speaker-detector

Name: speaker-detector
Version: 0.2.4
Summary: A CLI + Web tool for speaker enrollment and identification using SpeechBrain.
Upload time: 2025-09-07 13:41:36
Requires Python: >=3.8
License: MIT
Keywords: speaker-recognition, speechbrain, voice, cli, ai
Requirements: torch, torchaudio, speechbrain, pydub, sounddevice, soundfile, flask, flask-cors, numpy, build, twine
Repository: https://github.com/P0llen/speaker-detector

08/08: A lot of improvements to the speaker-detector-client resulted in a few changes to this backend.

23/07/2025 - Lara Whybrow, Creator - it has a few bugs that need fixing, but I am still determining whether they are data related or software related. Feel free to clone from GitHub and help with bug fixes.

# speaker-detector 🎙️

A lightweight CLI tool for speaker enrollment and voice identification, powered by [SpeechBrain](https://speechbrain.readthedocs.io/).

## 🔧 Features


- ✅ Enroll speakers from `.wav` audio
- 🕵️ Identify speakers from audio samples
- 🧠 ECAPA-TDNN embedding-based matching
- 🎛️ Simple, fast command-line interface
- 📁 Clean file storage in `~/.speaker-detector/`
- 🔊 Optional `--verbose` mode for debugging

Web UI note: The web client uses a guided-only enrollment flow (multiple short recordings). Quick enroll with a single clip has been removed to ensure model accuracy.
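
To see why several short clips beat a single one, here is a minimal sketch of guided enrollment as embedding averaging, assuming SpeechBrain's pretrained ECAPA-TDNN encoder; `enroll_from_clips` is an illustrative helper, not the package's actual internals.

```python
# Illustrative sketch (not the package's actual code): average several short
# clips into one enrollment embedding using SpeechBrain's ECAPA-TDNN encoder.
import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier

encoder = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")

def enroll_from_clips(wav_paths):
    """Encode each clip and average the embeddings into one voiceprint."""
    embeddings = []
    for path in wav_paths:
        signal, _sr = torchaudio.load(path)           # mono [1, time] expected
        emb = encoder.encode_batch(signal).squeeze()  # [emb_dim]
        embeddings.append(emb)
    return torch.stack(embeddings).mean(dim=0)        # averaged enrollment embedding
```

Averaging over several varied prompts smooths out per-clip noise, which is the accuracy rationale behind the guided flow.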


## 📦 Installation


```bash
pip install speaker-detector
```

On WSL Ubuntu (or any externally managed Python environment), installing an individual package from a stale requirements file may need the override flag, for example:

```bash
pip install --break-system-packages soundfile
```

If you have trouble running `server.py` directly, run the server as a module instead:

```bash
python3 -m speaker_detector.server
```

## 🚀 Example Usage

### 🎙️ Enroll a speaker:

```bash
speaker-detector record --enroll Lara
```

### 🕵️ Identify a speaker:

```bash
speaker-detector record --test
```

### 📋 List enrolled speakers:

```bash
speaker-detector list
```

## 🗂️ Project Structure

| Path | Contents |
| ---- | -------- |
| `~/.speaker-detector/enrollments/` | Saved `.pt` voice embeddings |
| `~/.speaker-detector/recordings/` | CLI-recorded `.wav` audio files |

## 🧹 Clean vs Verbose Mode

By default, warnings from `speechbrain`, `torch`, etc. are hidden for a clean CLI experience.
To enable full logs & deprecation warnings:

```bash
speaker-detector --verbose identify samples/test_sample.wav
```
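
The exact suppression mechanism is internal to the CLI, but a common pattern looks like the sketch below; the `--verbose` flag matches the CLI, while the logger names are assumptions.

```python
# Minimal sketch of gating third-party warnings on --verbose (illustrative,
# not necessarily how speaker-detector implements it).
import argparse
import logging
import warnings

parser = argparse.ArgumentParser(prog="speaker-detector")
parser.add_argument("--verbose", action="store_true")
args, _rest = parser.parse_known_args()

if not args.verbose:
    warnings.filterwarnings("ignore")                        # hide DeprecationWarnings etc.
    logging.getLogger("speechbrain").setLevel(logging.ERROR)
    logging.getLogger("torch").setLevel(logging.ERROR)
```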

## 🛠 Requirements

- Python 3.8+
- torch
- speechbrain
- numpy
- soundfile
- onnxruntime

| Step                              | Command                                                                                                             | When / Purpose                | Output                                   |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------------- | ----------------------------- | ---------------------------------------- |
| **1. Export ECAPA Model to ONNX** | `speaker-detector export-model --pt models/embedding_model.ckpt --out ecapa_model.onnx`                             | Run once unless model changes | `ecapa_model.onnx`                       |
| **2. Enroll Speaker**             | `speaker-detector enroll <speaker_id> <audio_path>`<br>Example:<br>`speaker-detector enroll Lara samples/lara1.wav` | Run per new speaker           | Individual `.pt` files (e.g., `Lara.pt`) |
| **3. Combine Embeddings**         | `speaker-detector combine --folder data/embeddings/ --out data/enrolled_speakers.pt`                                | After enrolling speakers      | `enrolled_speakers.pt`                   |
| **4. Export Speakers to JSON**    | `speaker-detector export-speaker-json --pt data/enrolled_speakers.pt --out public/speakers.json`                    | For frontend use              | `speakers.json`                          |
| **5. Identify Speaker**           | `speaker-detector identify samples/test_sample.wav`                                                                 | Identify speaker from audio   | Console output: name + score             |
| **6. List Enrolled Speakers**     | `speaker-detector list-speakers`                                                                                    | Show all enrolled speakers    | Console output: list of IDs              |
| **Verbose Mode (optional)**       | Add `--verbose` to any command:<br>`speaker-detector --verbose identify samples/test_sample.wav`                    | Show warnings, detailed logs  | Developer debug info                     |
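
For orientation, steps 3 and 5 boil down to a dictionary of embeddings and a cosine-similarity comparison. The sketch below assumes each per-speaker `.pt` file holds one embedding tensor and that `enrolled_speakers.pt` stores a `{name: embedding}` mapping; that layout is an assumption, not a documented format.

```python
# Illustrative sketch of "combine" and "identify" (assumed file layout).
from pathlib import Path
import torch
import torch.nn.functional as F

def combine(folder: str, out: str) -> None:
    """Merge per-speaker .pt embeddings into one {name: embedding} file."""
    enrolled = {p.stem: torch.load(p) for p in Path(folder).glob("*.pt")}
    torch.save(enrolled, out)

def identify(test_embedding: torch.Tensor, enrolled_path: str):
    """Return the best-matching speaker name and its cosine-similarity score."""
    enrolled = torch.load(enrolled_path)
    scores = {
        name: F.cosine_similarity(test_embedding.flatten(), emb.flatten(), dim=0).item()
        for name, emb in enrolled.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```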




NB: When pushing to GitHub, do not include any `.identifier` files.

You can manually clean up stale embeddings that don’t match any existing speaker folder with a quick script:

```bash
# Run inside your project root
cd storage/embeddings
for f in *.pt; do
  speaker="${f%.pt}"
  if [ ! -d "../speakers/$speaker" ]; then
    echo "Deleting stale embedding: $f"
    rm "$f"
  fi
done
```


## HTTP API: Online & Detection State

This backend exposes simple endpoints to let a client know when the server is reachable and when live detection is ready to be polled.

### Online (one-shot SSE)

- Path: `GET /api/online`
- Headers:
  - `Content-Type: text/event-stream`
  - `Cache-Control: no-cache`
  - `Connection: keep-alive`
  - `Access-Control-Allow-Origin: http://localhost:5173` (override with env `CLIENT_ORIGIN`)
- Behavior: immediately emits a single event and closes the stream.

Example event:

```
event: online
data: 1

```

This removes the need for heartbeat polling: as soon as the client connects, it can mark the backend as reachable.
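
For reference, a one-shot SSE endpoint like this fits in a few lines of Flask; the sketch below is illustrative and may not match the server's actual code.

```python
# Illustrative Flask sketch of a one-shot SSE "online" endpoint.
import os
from flask import Flask, Response

app = Flask(__name__)
CLIENT_ORIGIN = os.environ.get("CLIENT_ORIGIN", "http://localhost:5173")

@app.route("/api/online")
def online():
    def stream():
        yield "event: online\ndata: 1\n\n"   # single event, then the generator ends
    headers = {
        "Cache-Control": "no-cache",
        "Connection": "keep-alive",
        "Access-Control-Allow-Origin": CLIENT_ORIGIN,
    }
    return Response(stream(), mimetype="text/event-stream", headers=headers)
```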

### Detection State (SSE)

- Path: `GET /api/detection-state`
- Emits an immediate state and then re-emits on changes; includes keep-alives.
- Event name: `detection`
- Data: `running` | `stopped`

Example stream excerpts:

```
event: detection
data: stopped

: keep-alive

event: detection
data: running

```

Clients can start polling `/api/active-speaker` only when the state is `running`, and pause when `stopped`.
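
Server-side, the stream amounts to a long-lived generator that emits on state changes and sends comments otherwise. A sketch, assuming a module-level `detection_running` flag (an assumption; the real server tracks this state its own way):

```python
# Illustrative Flask sketch of the detection-state stream with keep-alives.
import time
from flask import Flask, Response

app = Flask(__name__)
detection_running = False  # assumed flag, toggled elsewhere by the detection loop

@app.route("/api/detection-state")
def detection_state():
    def stream():
        last = None
        while True:
            state = "running" if detection_running else "stopped"
            if state != last:
                yield f"event: detection\ndata: {state}\n\n"  # emit on change
                last = state
            else:
                yield ": keep-alive\n\n"  # SSE comment keeps the connection alive
            time.sleep(2)
    return Response(stream(), mimetype="text/event-stream")
```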

### Active Speaker (readiness semantics)

- Path: `GET /api/active-speaker`
- Responses:
  - When listening mode is OFF: `200 { "status": "disabled", "speaker": null, "confidence": null, "is_speaking": false }`
  - When mode is ON but engine not yet ready (e.g., mic unavailable or loop not running): `200 { "status": "pending", ... }`
  - When running and healthy: `200` with the usual payload including `speaker`, `confidence`, `is_speaking`, `status: "listening"`, and optional `suggested`.

These semantics avoid red 503s in DevTools while still making state transitions explicit for the client.
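
The branching behind those responses is simple; a sketch, where `listening_enabled` and `get_current_result()` are illustrative stand-ins for the server's real state and detection loop:

```python
# Illustrative sketch of the /api/active-speaker readiness branching.
from flask import Flask, jsonify

app = Flask(__name__)
listening_enabled = False      # stand-in for the real "listening mode" flag

def get_current_result():
    """Stand-in for the detection loop; returns None until the engine is ready."""
    return None

@app.route("/api/active-speaker")
def active_speaker():
    if not listening_enabled:  # mode OFF -> still 200, never a 503
        return jsonify(status="disabled", speaker=None, confidence=None, is_speaking=False)
    result = get_current_result()
    if result is None:         # mode ON but mic/loop not ready yet
        return jsonify(status="pending", speaker=None, confidence=None, is_speaking=False)
    return jsonify(status="listening", **result)  # speaker, confidence, is_speaking, optional suggested
```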

### Quick Examples

Curl (SSE streams)

```
# One-shot online event
curl -N -H 'Accept: text/event-stream' http://127.0.0.1:9000/api/online

# Detection state stream (emits running|stopped)
curl -N -H 'Accept: text/event-stream' http://127.0.0.1:9000/api/detection-state
```

Browser client (minimal)

```js
// Reachability: mark backend online as soon as server is up
const online = new EventSource('http://127.0.0.1:9000/api/online');
online.addEventListener('online', () => {
  console.log('Backend online');
  online.close(); // one-shot
});

// Detection state: start/stop polling active speaker
let pollTimer = null;
function startPolling() {
  if (pollTimer) return;
  pollTimer = setInterval(async () => {
    try {
      const r = await fetch('http://127.0.0.1:9000/api/active-speaker');
      const j = await r.json();
      if (j.status === 'disabled' || j.status === 'pending') return; // wait
      console.log('Active:', j);
    } catch (e) {
      console.warn('poll failed', e);
    }
  }, 500);
}
function stopPolling() { clearInterval(pollTimer); pollTimer = null; }

const detect = new EventSource('http://127.0.0.1:9000/api/detection-state');
detect.addEventListener('detection', (ev) => {
  const state = (ev.data || '').trim();
  if (state === 'running') startPolling(); else stopPolling();
});
```

            
