Note: This project is still in development while I tune the system for the best-performing approach. Feel free to join me on the project.
# speaker-detector 🎙️
A lightweight CLI tool for speaker enrollment and voice identification, powered by [SpeechBrain](https://speechbrain.readthedocs.io/).
## 🔧 Features
- ✅ Enroll speakers from .wav audio
- 🕵️ Identify speakers from audio samples
- 🧠 ECAPA-TDNN embedding-based matching
- 🎛️ Simple, fast command-line interface
- 📁 Clean file storage in `~/.speaker-detector/`
- 🔊 Optional `--verbose` mode for debugging
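To make "embedding-based matching" concrete: an ECAPA-TDNN model maps each recording to a fixed-length vector, and identification then comes down to comparing vectors. The sketch below is not the tool's actual code. It assumes cosine similarity as the score, uses made-up 3-dimensional vectors in place of real embeddings, and the function names, the `Guest` speaker, and the 0.5 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def identify(
    sample: List[float],
    enrolled: Dict[str, List[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the tool
) -> Tuple[Optional[str], float]:
    """Return (best_speaker, score), or (None, score) if below the threshold."""
    best_name, best_score = None, -1.0
    for name, embedding in enrolled.items():
        score = cosine_similarity(sample, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Toy vectors standing in for real ECAPA-TDNN embeddings.
enrolled = {"Lara": [0.9, 0.1, 0.0], "Guest": [0.0, 0.2, 0.9]}
name, score = identify([0.8, 0.2, 0.1], enrolled)
print(name, round(score, 3))
```

Returning `None` below the threshold lets the caller report an unknown speaker instead of forcing the closest enrolled name.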
## 📦 Installation
Install from [TestPyPI](https://test.pypi.org/):
```bash
pip install --index-url https://test.pypi.org/simple/ speaker-detector
```

On WSL Ubuntu, if the install fails because of a stale requirements file, you may need to install `soundfile` separately:

```bash
pip install --break-system-packages soundfile
```

If you are contributing and want to run `server.py`, run it as a module with the `-m` flag:

```bash
python3 -m speaker_detector.server
```
## 🚀 Usage
## 🎙️ Enroll a speaker:
```bash
speaker-detector record --enroll Lara
```
## 🕵️ Identify a speaker:
```bash
speaker-detector record --test
```
## 📋 List enrolled speakers:
```bash
speaker-detector list
```
## 🗂️ Project Structure
- `~/.speaker-detector/enrollments/`: saved `.pt` voice embeddings
- `~/.speaker-detector/recordings/`: CLI-recorded `.wav` audio files
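
Because enrollments are stored as plain files, you can inspect them without the CLI. The sketch below lists speaker names by scanning the enrollments folder. It assumes one `<name>.pt` file per speaker (as in the `Lara.pt` example); the function name `list_enrolled` is hypothetical and not part of the package.

```python
from pathlib import Path
from typing import List

# Default location from the layout above; one file per speaker is an assumption.
ENROLLMENT_DIR = Path.home() / ".speaker-detector" / "enrollments"


def list_enrolled(directory: Path = ENROLLMENT_DIR) -> List[str]:
    """Return speaker names inferred from `<name>.pt` files, sorted."""
    if not directory.is_dir():
        return []  # nothing enrolled yet
    return sorted(p.stem for p in directory.glob("*.pt"))


print(list_enrolled())
```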
## 🧹 Clean vs Verbose Mode
By default, warnings from speechbrain, torch, and other dependencies are hidden for a clean CLI experience.
To enable full logs and deprecation warnings, pass `--verbose`:
```bash
speaker-detector --verbose identify samples/test_sample.wav
```
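
For contributors, here is one way such a switch could be wired up with the standard library. This is a sketch, not the package's actual implementation: the function names are hypothetical and the argument parser is simplified to a single command plus free-form arguments.

```python
import argparse
import logging
import warnings
from typing import List, Optional


def configure_output(verbose: bool) -> None:
    """Hide third-party warnings unless verbose output was requested."""
    if verbose:
        warnings.simplefilter("default")  # show warnings, including deprecations
        logging.basicConfig(level=logging.DEBUG, force=True)
    else:
        warnings.simplefilter("ignore")  # suppress all Python warnings
        logging.basicConfig(level=logging.ERROR, force=True)


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Simplified parser: a global --verbose flag, a command, and its arguments."""
    parser = argparse.ArgumentParser(prog="speaker-detector")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("command")
    parser.add_argument("args", nargs="*")
    return parser.parse_args(argv)


args = parse_args(["--verbose", "identify", "samples/test_sample.wav"])
configure_output(args.verbose)
```

Configuring output before the heavy libraries are imported matters in practice, since many of their warnings are emitted at import time.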
## 🛠 Requirements
- Python 3.8+
- torch
- speechbrain
- numpy
- soundfile
- onnxruntime

| Step | Command | When / Purpose | Output |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------------- | ----------------------------- | ---------------------------------------- |
| **1. Export ECAPA Model to ONNX** | `speaker-detector export-model --pt models/embedding_model.ckpt --out ecapa_model.onnx` | Run once unless model changes | `ecapa_model.onnx` |
| **2. Enroll Speaker** | `speaker-detector enroll <speaker_id> <audio_path>`<br>Example:<br>`speaker-detector enroll Lara samples/lara1.wav` | Run per new speaker | Individual `.pt` files (e.g., `Lara.pt`) |
| **3. Combine Embeddings** | `speaker-detector combine --folder data/embeddings/ --out data/enrolled_speakers.pt` | After enrolling speakers | `enrolled_speakers.pt` |
| **4. Export Speakers to JSON** | `speaker-detector export-speaker-json --pt data/enrolled_speakers.pt --out public/speakers.json` | For frontend use | `speakers.json` |
| **5. Identify Speaker** | `speaker-detector identify samples/test_sample.wav` | Identify speaker from audio | Console output: name + score |
| **6. List Enrolled Speakers** | `speaker-detector list-speakers` | Show all enrolled speakers | Console output: list of IDs |
| **Verbose Mode (optional)** | Add `--verbose` to any command:<br>`speaker-detector --verbose identify samples/test_sample.wav` | Show warnings, detailed logs | Developer debug info |
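
When enrolling several speakers, steps 2 to 5 can be scripted. The sketch below only assembles the commands from the table and prints them. It runs nothing unless `dry_run=False`, and it skips step 1 because the export is run once. The helper names and the second speaker (`Sam`) are hypothetical; the paths are the ones used in the table.

```python
import shlex
import subprocess
from typing import Dict, List


def build_workflow(speakers: Dict[str, str], test_audio: str) -> List[List[str]]:
    """Build the enroll, combine, export, and identify commands from the table."""
    commands = [
        ["speaker-detector", "enroll", speaker_id, audio_path]
        for speaker_id, audio_path in speakers.items()
    ]
    commands.append(["speaker-detector", "combine",
                     "--folder", "data/embeddings/",
                     "--out", "data/enrolled_speakers.pt"])
    commands.append(["speaker-detector", "export-speaker-json",
                     "--pt", "data/enrolled_speakers.pt",
                     "--out", "public/speakers.json"])
    commands.append(["speaker-detector", "identify", test_audio])
    return commands


def run_workflow(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step


commands = build_workflow(
    {"Lara": "samples/lara1.wav", "Sam": "samples/sam1.wav"},  # "Sam" is made up
    "samples/test_sample.wav",
)
run_workflow(commands)  # dry run: prints the commands without executing them
```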
NB: When pushing to GitHub, do not include any `.identifier` files.
You can manually clean up stale embeddings that don't match any existing speaker folder with a quick script:
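```bash
# Run inside your project root
cd storage/embeddings || exit 1
for f in *.pt; do
  [ -e "$f" ] || continue   # skip the unexpanded pattern when no .pt files exist
  speaker="${f%.pt}"        # file name without the .pt suffix
  # Delete the embedding if no matching speaker folder exists
  if [ ! -d "../speakers/$speaker" ]; then
    echo "Deleting stale embedding: $f"
    rm -- "$f"
  fi
done
```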
Raw data
{
"_id": null,
"home_page": null,
"name": "speaker-detector",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "speaker-recognition, speechbrain, voice, cli, ai",
"author": null,
"author_email": "Lara Whybrow <lara.whybrow@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/27/65/370c5ebca08e2972f05489fade19dfee660b070df999ded50b7ba6039e43/speaker_detector-0.1.6.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A CLI + Web tool for speaker enrollment and identification using SpeechBrain.",
"version": "0.1.6",
"project_urls": {
"Documentation": "https://github.com/P0llen/speaker-detector#readme",
"Homepage": "https://github.com/P0llen/speaker-detector",
"Issues": "https://github.com/P0llen/speaker-detector/issues",
"Repository": "https://github.com/P0llen/speaker-detector"
},
"split_keywords": [
"speaker-recognition",
" speechbrain",
" voice",
" cli",
" ai"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "3e6dc1c390313c6b7c425c8a741d2869ec22f3427828d552089681d55df77bd5",
"md5": "3db907e3eb1dcfc4fbd6043c7c3e7bb6",
"sha256": "5ebefcc0b8981504ed7d7398df32c1b176c040b0808f6ff5e604e7add90bded9"
},
"downloads": -1,
"filename": "speaker_detector-0.1.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3db907e3eb1dcfc4fbd6043c7c3e7bb6",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 83009530,
"upload_time": "2025-07-11T10:44:54",
"upload_time_iso_8601": "2025-07-11T10:44:54.111841Z",
"url": "https://files.pythonhosted.org/packages/3e/6d/c1c390313c6b7c425c8a741d2869ec22f3427828d552089681d55df77bd5/speaker_detector-0.1.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "2765370c5ebca08e2972f05489fade19dfee660b070df999ded50b7ba6039e43",
"md5": "c65c798d2be6536468e8474274b557b2",
"sha256": "26f15eb4fe706513d1265562ea8699f45b9be010078f8e4266f1f51eb593e8b0"
},
"downloads": -1,
"filename": "speaker_detector-0.1.6.tar.gz",
"has_sig": false,
"md5_digest": "c65c798d2be6536468e8474274b557b2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 83009355,
"upload_time": "2025-07-11T10:45:35",
"upload_time_iso_8601": "2025-07-11T10:45:35.394316Z",
"url": "https://files.pythonhosted.org/packages/27/65/370c5ebca08e2972f05489fade19dfee660b070df999ded50b7ba6039e43/speaker_detector-0.1.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-11 10:45:35",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "P0llen",
"github_project": "speaker-detector#readme",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "torch",
"specs": []
},
{
"name": "torchaudio",
"specs": []
},
{
"name": "speechbrain",
"specs": []
},
{
"name": "pydub",
"specs": []
},
{
"name": "sounddevice",
"specs": []
},
{
"name": "soundfile",
"specs": []
},
{
"name": "flask",
"specs": []
},
{
"name": "flask-cors",
"specs": []
},
{
"name": "numpy",
"specs": []
},
{
"name": "build",
"specs": []
},
{
"name": "twine",
"specs": []
}
],
"lcname": "speaker-detector"
}