08/08: Many improvements to the speaker-detector-client, which resulted in a few changes to this backend.

23/07/2025 - Lara Whybrow, Creator - It has a few bugs that need fixing, but I am still determining whether they are data related or software related. Feel free to clone from GitHub and help with bug fixes.
# speaker-detector 🎙️
A lightweight CLI tool for speaker enrollment and voice identification, powered by [SpeechBrain](https://speechbrain.readthedocs.io/).
## 🔧 Features
- ✅ Enroll speakers from .wav audio
- 🕵️ Identify speakers from audio samples
- 🧠 ECAPA-TDNN embedding-based matching
- 🎛️ Simple, fast command-line interface
- 📁 Clean file storage in `~/.speaker-detector/`
- 🔊 Optional `--verbose` mode for debugging

Web UI note: The web client uses a guided-only enrollment flow (multiple short recordings). Quick enroll with a single clip has been removed to ensure model accuracy.
## 📦 Installation
```bash
pip install speaker-detector

# If installation fails on WSL Ubuntu because of a stale requirements file,
# you may need to install soundfile separately:
pip install --break-system-packages soundfile

# If you have issues running server.py directly, run it as a module instead:
python3 -m speaker_detector.server
```
## 🚀 Example Usage
## 🎙️ Enroll a speaker:
```bash
speaker-detector record --enroll Lara
```
## 🕵️ Identify a speaker:
```bash
speaker-detector record --test
```
## 📋 List enrolled speakers:
```bash
speaker-detector list
```
## 🗂️ Project Structure
- `~/.speaker-detector/enrollments/`: saved `.pt` voice embeddings
- `~/.speaker-detector/recordings/`: CLI-recorded `.wav` audio files

## 🧹 Clean vs Verbose Mode
By default, warnings from speechbrain, torch, etc. are hidden for a clean CLI experience.
To enable full logs & deprecation warnings:

    speaker-detector --verbose identify samples/test_sample.wav

## 🛠 Requirements
- Python 3.8+
- torch
- speechbrain
- numpy
- soundfile
- onnxruntime

| Step | Command | When / Purpose | Output |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------------- | ----------------------------- | ---------------------------------------- |
| **1. Export ECAPA Model to ONNX** | `speaker-detector export-model --pt models/embedding_model.ckpt --out ecapa_model.onnx` | Run once unless model changes | `ecapa_model.onnx` |
| **2. Enroll Speaker** | `speaker-detector enroll <speaker_id> <audio_path>`<br>Example:<br>`speaker-detector enroll Lara samples/lara1.wav` | Run per new speaker | Individual `.pt` files (e.g., `Lara.pt`) |
| **3. Combine Embeddings** | `speaker-detector combine --folder data/embeddings/ --out data/enrolled_speakers.pt` | After enrolling speakers | `enrolled_speakers.pt` |
| **4. Export Speakers to JSON** | `speaker-detector export-speaker-json --pt data/enrolled_speakers.pt --out public/speakers.json` | For frontend use | `speakers.json` |
| **5. Identify Speaker** | `speaker-detector identify samples/test_sample.wav` | Identify speaker from audio | Console output: name + score |
| **6. List Enrolled Speakers** | `speaker-detector list-speakers` | Show all enrolled speakers | Console output: list of IDs |
| **Verbose Mode (optional)** | Add `--verbose` to any command:<br>`speaker-detector --verbose identify samples/test_sample.wav` | Show warnings, detailed logs | Developer debug info |
NB: When pushing to GitHub, do not include any .identifier files.

You can manually clean up stale embeddings that don't match any existing speaker folder with a quick script:

    # Run inside your project root.
    # The && guard skips the loop if the directory is missing,
    # so nothing is deleted from the wrong place.
    cd storage/embeddings &&
    for f in *.pt; do
      [ -e "$f" ] || continue  # no .pt files: the glob stays literal
      speaker="${f%.pt}"
      if [ ! -d "../speakers/$speaker" ]; then
        echo "Deleting stale embedding: $f"
        rm -- "$f"
      fi
    done

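For a dry run before deleting anything, the same matching rule can be sketched as a pure function. This is only an illustration, not part of the package: `findStaleEmbeddings` and the sample file names are hypothetical. Like the script above, it assumes that an embedding `<name>.pt` belongs to the speaker folder `<name>`.

```javascript
// Hypothetical helper: return the embedding files that have no matching
// speaker folder. Nothing is deleted; the caller decides what to do next.
function findStaleEmbeddings(embeddingFiles, speakerDirs) {
  const known = new Set(speakerDirs);
  return embeddingFiles
    .filter((file) => file.endsWith('.pt'))            // ignore non-embedding files
    .filter((file) => !known.has(file.slice(0, -3)));  // strip ".pt" and compare

}

// Sample names are made up for illustration.
console.log(findStaleEmbeddings(['Lara.pt', 'Old.pt', 'notes.txt'], ['Lara']));
```

In Node, the two input lists could come from `fs.readdirSync` on the embeddings and speakers directories.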
## HTTP API: Online & Detection State
This backend exposes simple endpoints to let a client know when the server is reachable and when live detection is ready to be polled.
### Online (one-shot SSE)
- Path: `GET /api/online`
- Headers:
  - `Content-Type: text/event-stream`
  - `Cache-Control: no-cache`
  - `Connection: keep-alive`
  - `Access-Control-Allow-Origin: http://localhost:5173` (override with env `CLIENT_ORIGIN`)
- Behavior: immediately emits a single event and closes the stream.
Example event:
```
event: online
data: 1
```
This removes the need for heartbeat polling: as soon as the client connects, it can mark the backend as reachable.
### Detection State (SSE)
- Path: `GET /api/detection-state`
- Emits an immediate state and then re-emits on changes; includes keep-alives.
- Event name: `detection`
- Data: `running` | `stopped`
Example stream excerpts:
```
event: detection
data: stopped

: keep-alive

event: detection
data: running

```
Clients can start polling `/api/active-speaker` only when the state is `running`, and pause when `stopped`.
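Browsers parse this stream for you through `EventSource`. For clients without it, the sketch below shows how the stream format could be read by hand. `parseSseBuffer` is a hypothetical helper, not part of this package. It assumes a complete text buffer and handles only the `event:` and `data:` fields plus `:` comment lines, which is all the excerpts above use.

```javascript
// Hypothetical helper: split a complete SSE text buffer into events.
// Only `event:` and `data:` fields and `:` comment lines are handled.
function parseSseBuffer(text) {
  const events = [];
  // A blank line terminates each event.
  for (const block of text.split(/\r?\n\r?\n/)) {
    let name = 'message'; // default event name when no `event:` field is sent
    const data = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === '' || line.startsWith(':')) continue; // comment, e.g. keep-alive
      const idx = line.indexOf(':');
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? '' : line.slice(idx + 1);
      if (value.startsWith(' ')) value = value.slice(1); // drop one leading space
      if (field === 'event') name = value;
      else if (field === 'data') data.push(value);
    }
    // Blocks without data (such as a lone keep-alive) produce no event.
    if (data.length > 0) events.push({ event: name, data: data.join('\n') });
  }
  return events;
}

const sample =
  'event: detection\ndata: stopped\n\n: keep-alive\n\nevent: detection\ndata: running\n\n';
console.log(parseSseBuffer(sample));
```

The keep-alive comment yields no event, so only the two `detection` state changes reach the client.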
### Active Speaker (readiness semantics)
- Path: `GET /api/active-speaker`
- Responses:
  - When listening mode is OFF: `200 { "status": "disabled", "speaker": null, "confidence": null, "is_speaking": false }`
  - When mode is ON but engine not yet ready (e.g., mic unavailable or loop not running): `200 { "status": "pending", ... }`
  - When running and healthy: `200` with the usual payload including `speaker`, `confidence`, `is_speaking`, `status: "listening"`, and optional `suggested`.
These semantics avoid red 503s in DevTools while still making state transitions explicit for the client.
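Because every response is a `200`, the client has to branch on `status` rather than on the HTTP code. A minimal sketch of that branching follows. `interpretActiveSpeaker`, its `action` values, and its labels are hypothetical and not part of the package. It relies only on the fields listed above.

```javascript
// Hypothetical helper: decide what a client should do with an
// /api/active-speaker payload, using only the documented fields.
function interpretActiveSpeaker(payload) {
  switch (payload.status) {
    case 'disabled':
      return { action: 'idle', label: 'Listening mode is off' };
    case 'pending':
      return { action: 'wait', label: 'Detection engine not ready yet' };
    case 'listening':
      if (payload.is_speaking && payload.speaker) {
        return { action: 'show', label: String(payload.speaker) };
      }
      return { action: 'show', label: 'No active speaker' };
    default:
      // Assumption: treat any status this sketch does not know as "keep waiting".
      return { action: 'wait', label: `Unrecognized status: ${payload.status}` };
  }
}

console.log(interpretActiveSpeaker({ status: 'pending' }));
```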
### Quick Examples
Curl (SSE streams)
```
# One-shot online event
curl -N -H 'Accept: text/event-stream' http://127.0.0.1:9000/api/online
# Detection state stream (emits running|stopped)
curl -N -H 'Accept: text/event-stream' http://127.0.0.1:9000/api/detection-state
```
Browser client (minimal)
```js
// Reachability: mark backend online as soon as server is up
const online = new EventSource('http://127.0.0.1:9000/api/online');
online.addEventListener('online', () => {
  console.log('Backend online');
  online.close(); // one-shot
});

// Detection state: start/stop polling active speaker
let pollTimer = null;
function startPolling() {
  if (pollTimer) return;
  pollTimer = setInterval(async () => {
    try {
      const r = await fetch('http://127.0.0.1:9000/api/active-speaker');
      const j = await r.json();
      if (j.status === 'disabled' || j.status === 'pending') return; // wait
      console.log('Active:', j);
    } catch (e) {
      console.warn('poll failed', e);
    }
  }, 500);
}
function stopPolling() { clearInterval(pollTimer); pollTimer = null; }

const detect = new EventSource('http://127.0.0.1:9000/api/detection-state');
detect.addEventListener('detection', (ev) => {
  const state = (ev.data || '').trim();
  if (state === 'running') startPolling(); else stopPolling();
});
```
Raw data
{
"_id": null,
"home_page": null,
"name": "speaker-detector",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "speaker-recognition, speechbrain, voice, cli, ai",
"author": null,
"author_email": "Lara Whybrow <lara.whybrow@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/e2/69/ba2f3f662605ada1a4c4aa78d442264d56db608b1085ee63d4c117e25d2c/speaker_detector-0.2.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A CLI + Web tool for speaker enrollment and identification using SpeechBrain.",
"version": "0.2.4",
"project_urls": {
"Documentation": "https://github.com/P0llen/speaker-detector#readme",
"Homepage": "https://github.com/P0llen/speaker-detector",
"Issues": "https://github.com/P0llen/speaker-detector/issues",
"Repository": "https://github.com/P0llen/speaker-detector"
},
"split_keywords": [
"speaker-recognition",
" speechbrain",
" voice",
" cli",
" ai"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "d791dd30a199cb9fbe28f216d455883903541450f4a8f8129ec8cdc5d2ed1754",
"md5": "18126e98ee5a24fc53f5f1b313c81331",
"sha256": "810670d99ec185d5fb8e9b5247f92bb2dbd62954fdc4c733e470a89b4d1ce9de"
},
"downloads": -1,
"filename": "speaker_detector-0.2.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "18126e98ee5a24fc53f5f1b313c81331",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 83098191,
"upload_time": "2025-09-07T13:39:03",
"upload_time_iso_8601": "2025-09-07T13:39:03.710031Z",
"url": "https://files.pythonhosted.org/packages/d7/91/dd30a199cb9fbe28f216d455883903541450f4a8f8129ec8cdc5d2ed1754/speaker_detector-0.2.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "e269ba2f3f662605ada1a4c4aa78d442264d56db608b1085ee63d4c117e25d2c",
"md5": "dec5419c151ac72229c8298f3bde78df",
"sha256": "1a24d98f7ff5812e6cdd8ae9ba8c21616c754b73ca460f7b430ca9d9b8bb0a21"
},
"downloads": -1,
"filename": "speaker_detector-0.2.4.tar.gz",
"has_sig": false,
"md5_digest": "dec5419c151ac72229c8298f3bde78df",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 83072439,
"upload_time": "2025-09-07T13:41:36",
"upload_time_iso_8601": "2025-09-07T13:41:36.950110Z",
"url": "https://files.pythonhosted.org/packages/e2/69/ba2f3f662605ada1a4c4aa78d442264d56db608b1085ee63d4c117e25d2c/speaker_detector-0.2.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-07 13:41:36",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "P0llen",
"github_project": "speaker-detector#readme",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "torch",
"specs": []
},
{
"name": "torchaudio",
"specs": []
},
{
"name": "speechbrain",
"specs": []
},
{
"name": "pydub",
"specs": []
},
{
"name": "sounddevice",
"specs": []
},
{
"name": "soundfile",
"specs": []
},
{
"name": "flask",
"specs": []
},
{
"name": "flask-cors",
"specs": []
},
{
"name": "numpy",
"specs": []
},
{
"name": "build",
"specs": []
},
{
"name": "twine",
"specs": []
}
],
"lcname": "speaker-detector"
}