Name | speaker-diarization-system
Version | 1.0.2
home_page | None
Summary | An AI-powered script to identify speakers in an audio file and split them into separate, clean tracks.
upload_time | 2025-07-29 00:13:59
maintainer | None
docs_url | None
author | None
requires_python | >=3.11
license | MIT License
Copyright (c) [2025] [Lukium]
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. |
keywords | diarization, speaker, audio, pyannote, splitter, ai
VCS |
bugtrack_url |
requirements |
pyannote.audio==3.1.1
pyannote.core==5.0.0
pyannote.database==5.1.0
pyannote.metrics==3.2.1
pyannote.pipeline==3.0.1
pytorch-lightning==2.2.5
torchmetrics==1.3.2
numpy==1.26.4
python-dotenv
huggingface_hub
pydub
|
Travis-CI | No Travis.
coveralls test coverage | No coveralls.
```
██╗ ██╗ ██╗██╗ ██╗██╗██╗ ██╗███╗ ███╗
██║ ██║ ██║██║ ██╔╝██║██║ ██║████╗ ████║
██║ ██║ ██║█████╔╝ ██║██║ ██║██╔████╔██║
██║ ██║ ██║██╔═██╗ ██║██║ ██║██║╚██╔╝██║
███████╗╚██████╔╝██║ ██╗██║╚██████╔╝██║ ╚═╝ ██║
╚══════╝ ╚═════╝ ╚═╝ ╚═╝╚═╝ ╚═════╝ ╚═╝ ╚═╝
```
# Speaker Diarization & Splitting System
A powerful Python script that automatically identifies different speakers in an audio file and splits them into separate, clean tracks. Built by **Lukium**.
---
## Overview
This project uses AI-powered speaker diarization (thanks to `pyannote.audio`) to process audio files containing multiple speakers. It intelligently determines who is speaking and when, then exports a separate audio file for each person.
The key feature is its ability to remove **crosstalk**. The output tracks contain silence when the speaker is not talking, ensuring that overlapping speech is eliminated. This makes it an ideal tool for podcast editing, interview transcription, character animation workflows, and any other task requiring isolated speaker audio.
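Conceptually, the per-speaker split boils down to segment bookkeeping: the diarization model yields `(speaker, start, end)` windows, and each output track keeps audio only inside that speaker's windows, with silence everywhere else. The sketch below illustrates the idea in pure Python; it is a simplified illustration, not the actual implementation in `split_speakers.py`:

```python
from collections import defaultdict

def group_segments_by_speaker(segments):
    """Group diarization segments (speaker, start_s, end_s) by speaker.

    Each speaker's output track would contain audio only inside these
    windows and silence everywhere else, which is what removes crosstalk.
    """
    tracks = defaultdict(list)
    for speaker, start, end in sorted(segments, key=lambda s: s[1]):
        tracks[speaker].append((start, end))
    return dict(tracks)

# Example diarization result: two speakers with a brief overlap.
segments = [
    ("SPEAKER_00", 0.0, 4.2),
    ("SPEAKER_01", 3.8, 9.0),
    ("SPEAKER_00", 9.5, 12.0),
]
tracks = group_segments_by_speaker(segments)
# tracks["SPEAKER_00"] -> [(0.0, 4.2), (9.5, 12.0)]
```

In practice the windows would be applied to the audio (e.g. with `pydub`) by overlaying each segment onto a silent base track of the original length.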
## Features
- **🎙️ Multi-Speaker Diarization:** Identifies and separates an unlimited number of speakers in a single audio file.
- **🧹 Crosstalk Removal:** Generates clean, non-overlapping audio tracks for each speaker.
- **⚙️ Batch Processing:** Automatically processes all supported audio files (`.wav`, `.mp3`, `.m4a`, `.flac`) in the `audio/pending` directory.
- **🚀 GPU Acceleration:** Automatically detects and uses an NVIDIA GPU for significantly faster processing.
- **🗣️ Flexible Speaker Count:** You can specify an exact number of speakers, a min/max range, or let the model detect it automatically.
- **🤫 Verbose/Quiet Mode:** Run in quiet mode for clean output, or use the `--verbose` flag to see detailed logs for debugging.
## 🤖 Automated Sanity Checks
The main `split_speakers.py` script is designed to make the first run as smooth as possible by including automated checks for common setup problems. If you forget a step, the script will try to help you fix it.
- **Missing FFmpeg:** If the script can't find `ffmpeg` on your system's `PATH`, it prints an error with instructions and automatically opens the FFmpeg download page in your browser before exiting.
- **Hugging Face Model Access:** The script proactively checks whether you have accepted the user agreements for the required pyannote models. If one is missing, it prints a message identifying the specific model and automatically opens its Hugging Face page so you can accept the terms.
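The FFmpeg check can be sketched with the standard library alone (the function name and exact messages here are hypothetical; the actual script's behavior may differ in detail):

```python
import shutil
import webbrowser

FFMPEG_DOWNLOAD_URL = "https://www.gyan.dev/ffmpeg/builds/"

def check_ffmpeg(open_browser=True):
    """Return True if ffmpeg is on PATH; otherwise guide the user to a download."""
    if shutil.which("ffmpeg") is not None:
        return True
    print("ERROR: ffmpeg was not found on your PATH.")
    print(f"Download it from {FFMPEG_DOWNLOAD_URL} and add its bin/ folder to PATH.")
    if open_browser:
        webbrowser.open(FFMPEG_DOWNLOAD_URL)
    return False
```

`shutil.which` performs the same lookup your shell does, so it reliably mirrors whether `ffmpeg` would work from the command line.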
## Prerequisites
Before you begin, ensure you have the following installed on your system:
1. **Python 3.11+** (the package metadata requires `python >= 3.11`)
2. **Git** (for cloning the repository).
3. **NVIDIA GPU with CUDA Drivers** (required for GPU acceleration).
4. **FFmpeg:** The script requires FFmpeg for audio processing.
- Download from: [https://www.gyan.dev/ffmpeg/builds/](https://www.gyan.dev/ffmpeg/builds/)
- Ensure the `bin` folder from the download is added to your system's `PATH`.
## Setup & Installation
This project uses **`uv`** for fast and reliable Python package management. The setup process is guided by an interactive script.
1. **Clone the Repository**
```bash
git clone <your-repository-url>
cd <your-repository-folder>
```
2. **Install `uv`**
If you don't have `uv` installed, follow the official instructions for your OS:
[https://github.com/astral-sh/uv](https://github.com/astral-sh/uv)
3. **Create & Activate a Virtual Environment**
It's critical to run this project in a dedicated virtual environment. **Run your terminal as an Administrator** for this process on Windows.
```bash
# Create the environment with pip bootstrapped
uv venv .venv --seed
# Activate it (on Windows)
.venv\Scripts\activate
```
4. **Run the Interactive Setup Script**
This script will detect your hardware and install the correct dependencies.
```bash
python install.py
```
Follow the on-screen prompts. If you have an NVIDIA GPU, it will ask if you want to install the CUDA-enabled libraries.
5. **Create `.env` File**
Create a file named `.env` in the project's root directory. Get a **read** access token from [Hugging Face](https://huggingface.co/settings/tokens) and add it to the file:
```
HF_TOKEN=hf_YourAccessTokenGoesHere
```
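The script reads this token via `python-dotenv` (listed in the requirements). Conceptually, loading amounts to parsing simple `KEY=VALUE` lines into the environment; the stdlib-only sketch below shows roughly what `load_dotenv()` does (a simplified illustration without quoting or interpolation support):

```python
import os

def load_env_file(path=".env"):
    """Minimal .env parser: KEY=VALUE lines, '#' comments, no quoting."""
    values = {}
    try:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    except FileNotFoundError:
        pass  # no .env file; fall back to the existing environment
    os.environ.update(values)
    return values

# After loading, the token is available as an ordinary environment variable:
# token = os.environ["HF_TOKEN"]
```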
6. **Accept Hugging Face Agreements**
You must accept the user conditions for the gated models used by this project. Visit the links below, make sure you are logged in, and click the "Access repository" button on each page.
- [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)
- [pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0)
## Usage
1. **Place Files:** Add the audio files you want to process into the `audio/pending` directory.
2. **Run the Script:** Execute the script from your terminal with your virtual environment active.
#### Command Examples:
- **Automatic speaker detection:**
```bash
python split_speakers.py
```
- **Specify an exact number of speakers (e.g., 2):**
```bash
python split_speakers.py 2
```
- **Specify a range of speakers (e.g., min 2, max 4):**
```bash
python split_speakers.py 2 4
```
- **Run in verbose/debug mode:**
```bash
python split_speakers.py --verbose
```
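Under the hood, these positional arguments typically map onto pyannote's `num_speakers` / `min_speakers` / `max_speakers` pipeline options. A minimal parsing sketch (the helper name is hypothetical and the actual script may parse arguments differently):

```python
def parse_speaker_args(args):
    """Map positional CLI args to pyannote-style speaker-count options.

    []        -> automatic detection (no constraints)
    [n]       -> exactly n speakers
    [lo, hi]  -> between lo and hi speakers
    """
    nums = [int(a) for a in args if not a.startswith("-")]  # skip flags like --verbose
    if not nums:
        return {}
    if len(nums) == 1:
        return {"num_speakers": nums[0]}
    return {"min_speakers": nums[0], "max_speakers": nums[1]}

# parse_speaker_args(["2", "4"]) -> {"min_speakers": 2, "max_speakers": 4}
```

The resulting dictionary could then be passed as keyword arguments when invoking the diarization pipeline.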
#### File Workflow:
- **Input:** `audio/pending/your_file.wav`
- **Processed Original:** `audio/processed/your_file.wav`
- **Output:** `audio/completed/your_file_SPEAKER_00.wav`, `audio/completed/your_file_SPEAKER_01.wav`, etc.
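The naming scheme above can be expressed as a small `pathlib` helper (an illustrative sketch; the function name is hypothetical, but the paths follow the documented workflow):

```python
from pathlib import Path

def plan_output_paths(input_path, speakers, root="audio"):
    """Map a pending input file to its processed copy and per-speaker outputs."""
    src = Path(input_path)
    processed = Path(root) / "processed" / src.name
    completed = [
        Path(root) / "completed" / f"{src.stem}_{speaker}{src.suffix}"
        for speaker in speakers
    ]
    return processed, completed

processed, completed = plan_output_paths(
    "audio/pending/your_file.wav", ["SPEAKER_00", "SPEAKER_01"]
)
# processed    -> audio/processed/your_file.wav
# completed[0] -> audio/completed/your_file_SPEAKER_00.wav
```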
## Troubleshooting
- **`Permission denied` Errors during Setup:** You must run your terminal (PowerShell/Command Prompt) **as an Administrator** on Windows to ensure the setup script can write to the virtual environment directory.
- **`nvidia-smi` Not Found:** This means your NVIDIA drivers are not installed correctly or `nvidia-smi.exe` is not in your system's `PATH`.
- **Hugging Face Errors:** If you get a `401` or `GatedRepoError`, double-check that your `HF_TOKEN` in the `.env` file is correct and that you have accepted the user agreements for both required models.
- **Latest Libraries Causing Bugs?** If you suspect a new library version has introduced a bug, you can install a known-stable set of dependencies by running the setup script in failsafe mode: `python install.py --failsafe`.
## License
This project is licensed under the MIT License. See the `LICENSE` file for details.
Raw data
{
"_id": null,
"home_page": null,
"name": "speaker-diarization-system",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": null,
"keywords": "diarization, speaker, audio, pyannote, splitter, ai",
"author": null,
"author_email": "Lukium <lukium@pm.me>",
"download_url": "https://files.pythonhosted.org/packages/b5/c3/18e27205a74226b02c90fe4754fc6ce4d05470ae44bcefd3b94043eed827/speaker_diarization_system-1.0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"summary": "An AI-powered script to identify speakers in an audio file and split them into separate, clean tracks.",
"version": "1.0.2",
"project_urls": {
"Bug Tracker": "https://github.com/lukium/speaker-diarization-system/issues",
"Homepage": "https://github.com/lukium/speaker-diarization-system"
},
"split_keywords": [
"diarization",
" speaker",
" audio",
" pyannote",
" splitter",
" ai"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "cc44c76189cc22ac283586c89bd77e1e5e8dde43c26fd11b0814c5c552954309",
"md5": "01e0e6920846778412c029670d21622f",
"sha256": "d1624f6f91bfbc5f41aedb66b6b1b59342641ee24ba44e556a8a97a585252cdf"
},
"downloads": -1,
"filename": "speaker_diarization_system-1.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "01e0e6920846778412c029670d21622f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 14704,
"upload_time": "2025-07-29T00:13:58",
"upload_time_iso_8601": "2025-07-29T00:13:58.233291Z",
"url": "https://files.pythonhosted.org/packages/cc/44/c76189cc22ac283586c89bd77e1e5e8dde43c26fd11b0814c5c552954309/speaker_diarization_system-1.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "b5c318e27205a74226b02c90fe4754fc6ce4d05470ae44bcefd3b94043eed827",
"md5": "f1310626e4cc7b0a73430709206d5566",
"sha256": "3006a252963d8176e506e34b111fdc2feb6b14711117b51234686114e1e4a713"
},
"downloads": -1,
"filename": "speaker_diarization_system-1.0.2.tar.gz",
"has_sig": false,
"md5_digest": "f1310626e4cc7b0a73430709206d5566",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 14060,
"upload_time": "2025-07-29T00:13:59",
"upload_time_iso_8601": "2025-07-29T00:13:59.945401Z",
"url": "https://files.pythonhosted.org/packages/b5/c3/18e27205a74226b02c90fe4754fc6ce4d05470ae44bcefd3b94043eed827/speaker_diarization_system-1.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-29 00:13:59",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "lukium",
"github_project": "speaker-diarization-system",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "pyannote.audio",
"specs": [
[
"==",
"3.1.1"
]
]
},
{
"name": "pyannote.core",
"specs": [
[
"==",
"5.0.0"
]
]
},
{
"name": "pyannote.database",
"specs": [
[
"==",
"5.1.0"
]
]
},
{
"name": "pyannote.metrics",
"specs": [
[
"==",
"3.2.1"
]
]
},
{
"name": "pyannote.pipeline",
"specs": [
[
"==",
"3.0.1"
]
]
},
{
"name": "pytorch-lightning",
"specs": [
[
"==",
"2.2.5"
]
]
},
{
"name": "torchmetrics",
"specs": [
[
"==",
"1.3.2"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"1.26.4"
]
]
},
{
"name": "python-dotenv",
"specs": []
},
{
"name": "huggingface_hub",
"specs": []
},
{
"name": "pydub",
"specs": []
}
],
"lcname": "speaker-diarization-system"
}