faster-whisper-hotkey

- Name: faster-whisper-hotkey
- Version: 0.4.2
- Summary: Push-to-talk transcription
- Author: blakkd
- Upload time: 2025-09-14 11:47:50
- Requires Python: >=3.10
- Keywords: keyboard, recognition, voice, typing, speech, shortcut, speech-to-text, hotkey, stt, asr, faster-whisper, whisper, parakeet, canary, voxtral
# _faster-whisper Hotkey_

A minimalist push-to-talk transcription tool built on **[cutting-edge ASR models](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard)**.

**Hold the hotkey, speak, release ==> bam, the text lands in your text field!**

In the terminal, in a text editor, or even in the text chat of your online video game, anywhere!

## Current models

- (NEW) **[nvidia/canary-1b-v2](https://huggingface.co/nvidia/canary-1b-v2)**:

  - 25 languages supported
  - Transcription and translation
  - No automatic language recognition
  - Crazy fast, even on CPU in FP16

- (NEW) **[nvidia/parakeet-tdt-0.6b-v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3)**:

  - 25 languages supported
  - Transcription only
  - Automatic language recognition
  - Crazy fast, even on CPU in FP16

- **[mistralai/Voxtral-Mini-3B-2507](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507)**:

  - English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian
  - Transcription only
  - Automatic language recognition
  - Smart (it even guesses where to put quotes, etc.) and less error-prone for non-native English speakers
  - GPU only

- **[Systran/faster-whisper](https://github.com/SYSTRAN/faster-whisper)**:

  - Many languages
  - Transcription only

***What do I personally use currently?***

*- parakeet-tdt-0.6b-v3, on CPU, when I need all my VRAM to run my LMs*

*- Voxtral-Mini-3B-2507, on GPU, when I run smaller models and can fit it alongside them*

## Features

- **Automatic model downloads**: Missing models are downloaded from Hugging Face automatically.
- **User-friendly interface**: Set the input device, transcription model, compute type, device, and language directly through the menu.
- **Fast**: Almost instant transcription, even on CPU when picking parakeet or canary.

## Installation

_See https://docs.astral.sh/uv/ for more information on uv. uv is fast :)_

### From PyPI

- As a pip package:

  ```
  uv pip install faster-whisper-hotkey
  ```

- or as a uv tool, so that you can run faster-whisper-hotkey from any venv:

  ```
  uv tool install faster-whisper-hotkey
  ```

### From source

1. Clone the repository:

   ```
   git clone https://github.com/blakkd/faster-whisper-hotkey
   cd faster-whisper-hotkey
   ```

2. Install the package and dependencies:

- as a pip package:

  ```
  uv pip install .
  ```

- or as an uv tool:

  ```
  uv tool install .
  ```

### For Nvidia GPU

You need to install cuDNN: https://developer.nvidia.com/cudnn-downloads

## Usage

1. Whether you installed from PyPI or from source, just run `faster-whisper-hotkey`.
2. Go through the menu steps.
3. Once the model is loaded, focus on any text field.
4. Then simply press and hold the hotkey (PAUSE, F4, or F8) while you speak, release it when you're done, and see the magic happen!

Once the script is running, you can forget about it: the model stays loaded, ready to transcribe at any time.
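The press-and-hold flow can be pictured as a small state machine: key down starts buffering audio, key up stops buffering and hands the audio to the transcriber, whose output is then typed into the focused field. A minimal stdlib-only sketch (the class, method names, and callback wiring here are hypothetical illustrations, not the tool's actual API):

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class PushToTalk:
    """Hypothetical sketch of the hold-to-record flow: buffer audio
    chunks while the hotkey is held, transcribe once it is released."""
    transcribe: Callable[[bytes], str]  # e.g. a wrapper around an ASR model
    _recording: bool = False
    _chunks: List[bytes] = field(default_factory=list)

    def on_key_down(self) -> None:
        # Hotkey pressed: start a fresh recording.
        self._recording = True
        self._chunks.clear()

    def on_audio(self, chunk: bytes) -> None:
        # Called by the audio backend; only buffer while the key is held.
        if self._recording:
            self._chunks.append(chunk)

    def on_key_up(self) -> str:
        # Hotkey released: transcribe everything captured.
        self._recording = False
        return self.transcribe(b"".join(self._chunks))
        # The real tool would now type this text into the focused field.

ptt = PushToTalk(transcribe=lambda audio: f"<{len(audio)} bytes transcribed>")
ptt.on_key_down()
ptt.on_audio(b"\x00" * 4)
ptt.on_audio(b"\x01" * 4)
print(ptt.on_key_up())  # <8 bytes transcribed>
```

Audio arriving after release is simply ignored, which is what makes the hold-to-talk behavior robust against stray buffer callbacks.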

## Configuration File

The script automatically saves your settings to `~/.config/faster_whisper_hotkey/transcriber_settings.json`.
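Loading and saving such a settings file is straightforward; a sketch of the idea (the field names below are illustrative guesses, not the file's actual schema):

```python
import json
from pathlib import Path

SETTINGS_PATH = Path(
    "~/.config/faster_whisper_hotkey/transcriber_settings.json"
).expanduser()

# Illustrative field names only; the real schema may differ.
DEFAULTS = {
    "model": "parakeet-tdt-0.6b-v3",
    "device": "cpu",
    "compute_type": "float16",
    "language": "en",
}

def load_settings(path: Path = SETTINGS_PATH) -> dict:
    """Return saved settings merged over defaults; fall back to defaults."""
    try:
        return {**DEFAULTS, **json.loads(path.read_text())}
    except (FileNotFoundError, json.JSONDecodeError):
        return dict(DEFAULTS)

def save_settings(settings: dict, path: Path = SETTINGS_PATH) -> None:
    """Persist the chosen settings, creating the config directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2))
```

Merging over defaults means a settings file written by an older version still loads cleanly when new fields are added.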

## Limitations

- **Voxtral**: because of some upstream limitations, and to keep the automatic language recognition capabilities, we split the audio into chunks of 30 s. Long speech can still be transcribed, but results are best when the audio is shorter than that.
In the current state it seems impossible to reconcile processing long audio as a single chunk with automatic language detection. We may need to patch upstream: https://huggingface.co/docs/transformers/v4.56.1/en/model_doc/voxtral#transformers.VoxtralProcessor.apply_transcription_request

- Because the tool detects the window type to send the appropriate keystrokes, the VSCodium/VSCode integrated terminal unfortunately isn't supported for now. No clue if we can work around this. This is an edge case.
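The fixed 30 s chunking described for Voxtral above can be sketched as simple sample arithmetic. This is a simplified illustration under stated assumptions (mono audio, no overlap, no silence-aware cutting); the real splitter may behave differently:

```python
def split_into_chunks(samples: list, sample_rate: int = 16000,
                      chunk_seconds: int = 30) -> list:
    """Split a mono sample buffer into consecutive fixed-length chunks.

    Simplified sketch: the last chunk may be shorter, and no overlap or
    silence-aware cutting is attempted here.
    """
    size = sample_rate * chunk_seconds
    return [samples[i:i + size] for i in range(0, len(samples), size)]

# 70 s of (fake) 16 kHz audio -> chunks of 30 s, 30 s, and 10 s
audio = [0.0] * (16000 * 70)
chunks = split_into_chunks(audio)
print([len(c) // 16000 for c in chunks])  # [30, 30, 10]
```

Cutting at fixed boundaries like this can split a word in two, which is one reason results are best when the whole utterance fits in a single chunk.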

## Tricks

- If you pick a multilingual **faster-whisper** model and select `en` as the source language while speaking another language, your speech will be translated to English, provided you speak for at least a few seconds.
- If you pick parakeet-tdt-0.6b-v3, you can even use multiple languages during your recording!

## Acknowledgements

Many thanks to:

- **the developers of faster-whisper** for providing such an efficient transcription inference engine
- **NVIDIA** for their blazing fast parakeet and canary models
- **Mistral** for their impressively accurate Voxtral-Mini-3B model
- and to **all the contributors** of the libraries I used


Also thanks to [wgabrys88](https://huggingface.co/spaces/WJ88/NVIDIA-Parakeet-TDT-0.6B-v2-INT8-Real-Time-Mic-Transcription) and [MohamedRashad](https://huggingface.co/spaces/MohamedRashad/Voxtral) for their Hugging Face Spaces, which have been helpful!

And to finish, a special mention to **@siddhpant** for their useful [broo](https://github.com/siddhpant/broo) tool, which gave me a mic <3
