speech-map

Name	speech-map
Version	0.1.1
home_page	None
Summary	Mean Average Precision over n-grams / words with speech features
upload_time	2025-09-19 11:18:22
maintainer	None
docs_url	None
author	Maxime Poli
requires_python	>=3.12
license	None
keywords	speech machine learning
requirements	No requirements were recorded.

# Mean Average Precision over words or n-grams with speech features

Compute the Mean Average Precision (MAP) with speech features.

This is the MAP@R from equation (3) of https://arxiv.org/abs/2003.08505.
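
For a query with R other samples sharing its label, that equation averages the precision at rank i over the first R retrievals, counting only the ranks where the retrieved sample is correct. The sketch below illustrates the metric in pure Python; it is not this package's implementation. The function name `map_at_r`, the Euclidean distance, the brute-force search, and the choice to skip queries that have no other sample with the same label are all assumptions made for the example.

```python
import math
from collections import Counter


def map_at_r(embeddings, labels):
    """Illustrative MAP@R with a brute-force Euclidean nearest-neighbour search."""
    counts = Counter(labels)
    scores = []
    for q, (query, label) in enumerate(zip(embeddings, labels)):
        r = counts[label] - 1  # R: number of other samples with the query's label
        if r == 0:
            continue  # assumption: queries without any positive are skipped
        # Rank every other sample by its distance to the query.
        ranked = sorted(
            (i for i in range(len(embeddings)) if i != q),
            key=lambda i: math.dist(query, embeddings[i]),
        )
        hits, total = 0, 0.0
        for rank, i in enumerate(ranked[:r], start=1):
            if labels[i] == label:
                hits += 1
                total += hits / rank  # precision at this rank, counted only on a hit
        scores.append(total / r)
    return sum(scores) / len(scores) if scores else 0.0


# Toy data: the last "b" sample sits next to the "a" cluster, so its score is 0.
embeddings = [(0.0,), (0.1,), (1.0,), (1.1,), (0.25,)]
labels = ["a", "a", "b", "b", "b"]
print(map_at_r(embeddings, labels))  # prints 0.8
```

Four of the five queries retrieve only correct neighbours within their first R results and score 1; the misplaced sample scores 0, which gives a mean of 0.8.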

## Installation

This package is available on PyPI:

```bash
pip install speech-map
```

The Faiss backend is much more efficient for the k-NN search than the naive PyTorch backend.
Since Faiss is not available on PyPI, install this package in a conda environment with your preferred conda variant:

- CPU version:
    ```bash
    micromamba create -f environment-cpu.yaml
    ```
- GPU version:
    ```bash
    CONDA_OVERRIDE_CUDA=12.6 micromamba create -f environment-gpu.yaml
    ```

## Usage

### CLI

```
❯ python -m speech_map --help
usage: __main__.py [-h] [--pooling {MEAN,MAX,MIN,HAMMING}] [--frequency FREQUENCY] [--backend {FAISS,TORCH}] features jsonl

Mean Average Precision over n-grams / words with speech features

positional arguments:
  features              Path to the directory with pre-computed features
  jsonl                 Path to the JSONL file with annotations

options:
  -h, --help            show this help message and exit
  --pooling {MEAN,MAX,MIN,HAMMING}
                        Pooling (default: MEAN)
  --frequency FREQUENCY
                        Feature frequency in Hz (default: 50 Hz)
  --backend {FAISS,TORCH}
                        KNN (default: FAISS)
```

### Python API

You will most likely need only two functions: `build_embeddings_and_labels` and `mean_average_precision`.
Use them like this:

```python
from speech_map import build_embeddings_and_labels, mean_average_precision

embeddings, labels = build_embeddings_and_labels(path_to_features, path_to_jsonl)
print(mean_average_precision(embeddings, labels))
```

In this example, `path_to_features` is a path to a directory containing features stored in individual PyTorch
tensor files, and `path_to_jsonl` is the path to the JSONL annotations file.

For more control, pass additional options to `build_embeddings_and_labels`:

```python
from speech_map import Pooling, build_embeddings_and_labels, mean_average_precision

embeddings, labels = build_embeddings_and_labels(
    path_to_features,
    path_to_jsonl,
    pooling=Pooling.MAX,
    frequency=100,
    feature_maker=my_model,
    file_extension=".wav",
)
print(mean_average_precision(embeddings, labels))
```
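
To make the `pooling` and `frequency` options more concrete, here is a rough sketch of how frame-level features could be reduced to a single embedding per annotated segment. It is an illustration under stated assumptions, not the code in this package: `pool_segment` is a hypothetical helper, segments are assumed to be given as start and end times in seconds, and pooling modes are plain strings here rather than the `Pooling` enum. The `HAMMING` mode is left out because its exact definition is not described here.

```python
def pool_segment(frames, start, end, frequency=50, pooling="mean"):
    """Pool the frames covering [start, end) seconds into a single vector.

    Hypothetical helper: `frames` is a list of equal-length feature vectors,
    with one vector every 1 / frequency seconds.
    """
    first = round(start * frequency)  # nearest frame index for the start time
    last = max(first + 1, round(end * frequency))  # keep at least one frame
    segment = frames[first:last]
    columns = list(zip(*segment))  # transpose: one tuple per feature dimension
    if pooling == "mean":
        return [sum(col) / len(col) for col in columns]
    if pooling == "max":
        return [max(col) for col in columns]
    if pooling == "min":
        return [min(col) for col in columns]
    raise ValueError(f"unsupported pooling: {pooling}")


# Four 2-dimensional frames at 2 Hz; the segment from 0.5 s to 1.5 s covers frames 1 and 2.
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
print(pool_segment(frames, 0.5, 1.5, frequency=2))
print(pool_segment(frames, 0.5, 1.5, frequency=2, pooling="max"))
```

With a higher `frequency`, the same time span maps to more frames, which is why the value has to match the frame rate of the features being evaluated.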

This is a minimal package, and you can easily go through the code in `src/speech_map/core.py` if you want to check the details.

## Data

The `data` directory contains the word and n-gram annotations for the LibriSpeech evaluation subsets. Decompress them with zstd.

We have not used the n-gram annotations recently; there are probably too many samples, and they would need some clever subsampling.

## References

MAP for speech representations:

```bibtex
@inproceedings{carlin11_interspeech,
  title     = {Rapid evaluation of speech representations for spoken term discovery},
  author    = {Michael A. Carlin and Samuel Thomas and Aren Jansen and Hynek Hermansky},
  year      = {2011},
  booktitle = {Interspeech 2011},
  pages     = {821--824},
  doi       = {10.21437/Interspeech.2011-304},
  issn      = {2958-1796},
}
```

Data and original implementation:

```bibtex
@inproceedings{algayres20_interspeech,
  title     = {Evaluating the Reliability of Acoustic Speech Embeddings},
  author    = {Robin Algayres and Mohamed Salah Zaiem and Benoît Sagot and Emmanuel Dupoux},
  year      = {2020},
  booktitle = {Interspeech 2020},
  pages     = {4621--4625},
  doi       = {10.21437/Interspeech.2020-2362},
  issn      = {2958-1796},
}
```

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "speech-map",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.12",
    "maintainer_email": null,
    "keywords": "speech, machine learning",
    "author": "Maxime Poli",
    "author_email": "CoML <dev@cognitive-ml.fr>",
    "download_url": "https://files.pythonhosted.org/packages/9d/0d/76a2806d7090cc0e8e00b08bb4b3c297b4e354f68135f86677d740815c84/speech_map-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Mean Average Precision over n-grams / words with speech features",
    "version": "0.1.1",
    "project_urls": {
        "repository": "https://github.com/bootphon/speech-map"
    },
    "split_keywords": [
        "speech",
        " machine learning"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "bd05ed0f23f9f468fc292f5b306dbafe5f9f6f109b2233bd7f702b27632bfcae",
                "md5": "6343ecb505c959ea298e2816551cfd6f",
                "sha256": "8fec00a00ca845b5450f1b5cb02fff7d7ac5e187d1e7efd7528bd048ade09298"
            },
            "downloads": -1,
            "filename": "speech_map-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6343ecb505c959ea298e2816551cfd6f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.12",
            "size": 7919,
            "upload_time": "2025-09-19T11:18:21",
            "upload_time_iso_8601": "2025-09-19T11:18:21.837679Z",
            "url": "https://files.pythonhosted.org/packages/bd/05/ed0f23f9f468fc292f5b306dbafe5f9f6f109b2233bd7f702b27632bfcae/speech_map-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9d0d76a2806d7090cc0e8e00b08bb4b3c297b4e354f68135f86677d740815c84",
                "md5": "c16471a302234f98e7b1090d3b98d40e",
                "sha256": "06857985f5e7f2d381b56972be9cacfc3fe0b6541ac4083859290206191e97c6"
            },
            "downloads": -1,
            "filename": "speech_map-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "c16471a302234f98e7b1090d3b98d40e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.12",
            "size": 6761,
            "upload_time": "2025-09-19T11:18:22",
            "upload_time_iso_8601": "2025-09-19T11:18:22.614066Z",
            "url": "https://files.pythonhosted.org/packages/9d/0d/76a2806d7090cc0e8e00b08bb4b3c297b4e354f68135f86677d740815c84/speech_map-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-19 11:18:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "bootphon",
    "github_project": "speech-map",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "speech-map"
}
