| Field | Value |
| --- | --- |
| Name | precisetranscribe |
| Version | 0.2.1 |
| Summary | Utilities for transcribing audio files using the Whisper API. |
| upload_time | 2024-06-21 16:53:19 |
| home_page | None |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.9 |
| license | None |
| keywords | transcription |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
Transcription with OpenAI's [Whisper](https://github.com/openai/whisper) is very accurate, but it doesn't natively support speaker labelling (diarisation).
Existing diarisation libraries like [pyannote](https://github.com/pyannote/pyannote-audio) rely on audio features to separate and identify speakers, but they are computationally expensive and often inaccurate. A common failure mode arises when a speaker's audio characteristics change, for example when they move closer to or further from the microphone: the diarisation algorithm may mistake the same speaker for a new person.
I had a simple hypothesis: the cues in transcribed speech are sufficient to tell speakers apart. I developed a pipeline that passes the transcribed text to GPT-4o with a prompt asking it to label each segment with its speaker.
```mermaid
flowchart LR
A[Input file #40;audio, video#41;]
B[Whisper transcription]
C[Text output]
D[Label and tidy with GPT-4o]
E[Output in user-defined format]
A-- Convert to WAV -->B
B-->C
C-->D
D-->C
D-->E
```
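
The heart of this pipeline can be sketched directly against the OpenAI Python SDK. The snippet below is only an illustration of the two-stage idea (Whisper for transcription, then GPT-4o for text-only speaker labelling), not precisetranscribe's actual implementation; the `transcribe_and_label` helper, the prompt wording, and the choice of model names are assumptions made for demonstration.

```python
# Illustrative sketch of the two-stage approach using the OpenAI Python SDK.
# Not precisetranscribe's internal code; helper name, prompt, and models are
# assumptions for demonstration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def transcribe_and_label(wav_path: str) -> str:
    # Stage 1: transcribe the audio with Whisper.
    with open(wav_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )

    # Stage 2: ask GPT-4o to infer speaker turns from the text alone.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Split the transcript into speaker turns and label each turn "
                    "as 'Speaker 1', 'Speaker 2', etc., using only textual cues."
                ),
            },
            {"role": "user", "content": transcript.text},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(transcribe_and_label("interview.wav"))
```

Keeping diarisation purely text-based sidesteps the audio-feature failure mode described above, at the cost of relying on GPT-4o to infer speaker turns from wording alone.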
# Installation
```bash
pip install precisetranscribe
```
# Raw data

```json
{
"_id": null,
"home_page": null,
"name": "precisetranscribe",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "transcription",
"author": null,
"author_email": "Aeron Laffere <ajlaffere@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/ae/7b/b3ced5996b86562dbcb1ded3e42657651c4dbfa5ed0a4a40a0dd5aad62e2/precisetranscribe-0.2.1.tar.gz",
"platform": null,
"description": "Transcription with OpenAI's [Whisper](https://github.com/openai/whisper) is very accurate, but it doesn't natively support speaker labelling (diarisation). \n\nExisting libraries for diarisation like [pyannote](https://github.com/pyannote/pyannote-audio) rely on audio features to separate and identify speakers, but are computationally expensive and often inaccurate. A common failure mode arises when the speaker changes their audio quality, such as when they move closer to or further from the microphone. This can cause the diarisation algorithm to incorrectly identify the speaker as a new person.\n\nI had a simple hypothesis: the cues from transcribed speech are sufficient to identify speakers. I developed a pipeline which passes the transcribed text to GPT-4o with a prompt asking it to identify the speaker.\n\n```mermaid\nflowchart LR\n A[Input file #40;audio, video#41;]\n B[Whisper transcription]\n C[Text output]\n D[Label and tidy with GPT-4o]\n E[Output in user-defined format]\n A-- Convert to WAV -->B\n B-->C\n C-->D\n D-->C\n D-->E\n```\n\n# Installation\n\n```bash\npip install precisetranscribe\n```\n",
"bugtrack_url": null,
"license": null,
"summary": "Utilities for transcribing audio files using the Whisper API.",
"version": "0.2.1",
"project_urls": {
"Documentation": "https://transcribe.readthedocs.io/",
"Homepage": "https://github.com/aeronjl/transcribe",
"Repository": "https://github.com/aeronjl/transcribe.git"
},
"split_keywords": [
"transcription"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ee0f9e0792831fc2e0f0e8332d02a9a36f1351a678ee6efacb6f8911afc605f2",
"md5": "4f84035126efee289711141e3a40f36b",
"sha256": "4b6bbe41ff29cd7b992196970f79b176557f4a0b511d734b16e918c6370de70b"
},
"downloads": -1,
"filename": "precisetranscribe-0.2.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4f84035126efee289711141e3a40f36b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 8217,
"upload_time": "2024-06-21T16:53:18",
"upload_time_iso_8601": "2024-06-21T16:53:18.476649Z",
"url": "https://files.pythonhosted.org/packages/ee/0f/9e0792831fc2e0f0e8332d02a9a36f1351a678ee6efacb6f8911afc605f2/precisetranscribe-0.2.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ae7bb3ced5996b86562dbcb1ded3e42657651c4dbfa5ed0a4a40a0dd5aad62e2",
"md5": "53b388a92054c2ac1d7a8bc16db30426",
"sha256": "b7dc2ef8ee4262fef44909ace7c39610ab8934f579a4f7d4f206a53b4c8e4e2f"
},
"downloads": -1,
"filename": "precisetranscribe-0.2.1.tar.gz",
"has_sig": false,
"md5_digest": "53b388a92054c2ac1d7a8bc16db30426",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 7709,
"upload_time": "2024-06-21T16:53:19",
"upload_time_iso_8601": "2024-06-21T16:53:19.460028Z",
"url": "https://files.pythonhosted.org/packages/ae/7b/b3ced5996b86562dbcb1ded3e42657651c4dbfa5ed0a4a40a0dd5aad62e2/precisetranscribe-0.2.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-06-21 16:53:19",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "aeronjl",
"github_project": "transcribe",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "precisetranscribe"
}
```