| Name | speech-map |
| --- | --- |
| Version | 0.1.1 |
| home_page | None |
| Summary | Mean Average Precision over n-grams / words with speech features |
| upload_time | 2025-09-19 11:18:22 |
| maintainer | None |
| docs_url | None |
| author | Maxime Poli |
| requires_python | >=3.12 |
| license | None |
| keywords | speech, machine learning |
| VCS | https://github.com/bootphon/speech-map |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# Mean Average Precision over words or n-grams with speech features
Compute the Mean Average Precision (MAP) with speech features.
This is the MAP@R from equation (3) of https://arxiv.org/abs/2003.08505.
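As a rough illustration of the metric, MAP@R can be sketched in a few lines of NumPy. This naive O(n²) version is for exposition only and is not the package's implementation (which pools features and delegates the k-NN search to Faiss or PyTorch):

```python
import numpy as np

def map_at_r(embeddings: np.ndarray, labels: np.ndarray) -> float:
    """Naive MAP@R: for each query, retrieve its R nearest neighbors
    (R = number of other samples sharing its label) and average
    precision-at-k over the ranks where the retrieval is correct."""
    n = len(labels)
    # Pairwise Euclidean distances; exclude the query itself.
    dists = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    aps = []
    for i in range(n):
        r = int((labels == labels[i]).sum()) - 1  # same-label neighbors
        if r == 0:
            continue
        nearest = np.argsort(dists[i])[:r]
        correct = labels[nearest] == labels[i]
        # P(k) only counts at ranks k where the k-th retrieval is correct.
        precisions = np.cumsum(correct) / np.arange(1, r + 1)
        aps.append((precisions * correct).sum() / r)
    return float(np.mean(aps))
```

With perfectly clustered embeddings (every query's nearest neighbors share its label), this returns 1.0; errors at early ranks pull it down fastest, which is what makes MAP@R sensitive to the geometry of the embedding space.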
## Installation
This package is available on PyPI:
```bash
pip install speech-map
```
The Faiss backend for the k-NN search is much more efficient than the naive PyTorch backend.
Since Faiss is not available on PyPI, you can instead install this package in a conda environment with your preferred conda variant:
- CPU version:
```bash
micromamba create -f environment-cpu.yaml
```
- GPU version:
```bash
CONDA_OVERRIDE_CUDA=12.6 micromamba create -f environment-gpu.yaml
```
## Usage
### CLI
```
❯ python -m speech_map --help
usage: __main__.py [-h] [--pooling {MEAN,MAX,MIN,HAMMING}] [--frequency FREQUENCY] [--backend {FAISS,TORCH}] features jsonl
Mean Average Precision over n-grams / words with speech features
positional arguments:
  features              Path to the directory with pre-computed features
  jsonl                 Path to the JSONL file with annotations

options:
  -h, --help            show this help message and exit
  --pooling {MEAN,MAX,MIN,HAMMING}
                        Pooling (default: MEAN)
  --frequency FREQUENCY
                        Feature frequency in Hz (default: 50 Hz)
  --backend {FAISS,TORCH}
                        KNN (default: FAISS)
```
### Python API
You most likely need only two functions: `build_embeddings_and_labels` and `mean_average_precision`.
Use them like this:
```python
from speech_map import build_embeddings_and_labels, mean_average_precision
embeddings, labels = build_embeddings_and_labels(path_to_features, path_to_jsonl)
print(mean_average_precision(embeddings, labels))
```
In this example, `path_to_features` is a path to a directory containing features stored in individual PyTorch
tensor files, and `path_to_jsonl` is the path to the JSONL annotations file.
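To make the pooling step concrete, here is a hedged sketch of how frame-level features might be reduced to one embedding per annotated segment. The field semantics (start/end times in seconds, features at `frequency` frames per second) mirror the CLI options above, but this NumPy implementation is an assumption for illustration, not the package's actual code (which operates on PyTorch tensors):

```python
import numpy as np

def pool_segment(features: np.ndarray, start: float, end: float,
                 frequency: float = 50.0, pooling: str = "MEAN") -> np.ndarray:
    """Pool a (n_frames, dim) feature matrix over one annotated segment.

    `start` and `end` are in seconds; `frequency` converts them to
    frame indices. Hypothetical sketch, not the package internals.
    """
    lo, hi = int(start * frequency), int(end * frequency)
    segment = features[lo:max(hi, lo + 1)]  # keep at least one frame
    if pooling == "MEAN":
        return segment.mean(axis=0)
    if pooling == "MAX":
        return segment.max(axis=0)
    if pooling == "MIN":
        return segment.min(axis=0)
    raise ValueError(f"unsupported pooling: {pooling}")
```

For example, with 50 Hz features, a segment annotated from 0.0 s to 0.1 s covers the first five frames, and `MEAN` pooling averages them into a single vector.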
You can also use those functions in a more advanced setting like this:
```python
from speech_map import Pooling, build_embeddings_and_labels, mean_average_precision
embeddings, labels = build_embeddings_and_labels(
path_to_features,
path_to_jsonl,
pooling=Pooling.MAX,
frequency=100,
feature_maker=my_model,
file_extension=".wav",
)
print(mean_average_precision(embeddings, labels))
```
This is a minimal package, and you can easily go through the code in `src/speech_map/core.py` if you want to check the details.
## Data
We distribute in `data` the word and n-gram annotations for the LibriSpeech evaluation subsets. Decompress them with zstd.
We have not used the n-gram annotations recently; there are probably too many samples, and they would need some clever subsampling.
## References
MAP for speech representations:
```bibtex
@inproceedings{carlin11_interspeech,
title = {Rapid evaluation of speech representations for spoken term discovery},
author = {Michael A. Carlin and Samuel Thomas and Aren Jansen and Hynek Hermansky},
year = {2011},
booktitle = {Interspeech 2011},
pages = {821--824},
doi = {10.21437/Interspeech.2011-304},
issn = {2958-1796},
}
```
Data and original implementation:
```bibtex
@inproceedings{algayres20_interspeech,
title = {Evaluating the Reliability of Acoustic Speech Embeddings},
author = {Robin Algayres and Mohamed Salah Zaiem and Benoît Sagot and Emmanuel Dupoux},
year = {2020},
booktitle = {Interspeech 2020},
pages = {4621--4625},
doi = {10.21437/Interspeech.2020-2362},
issn = {2958-1796},
}
```