faster-whisper-hotkey


Name: faster-whisper-hotkey
Version: 0.2.6
Summary: Push-to-talk transcription using faster-whisper
Author: blakkd
Requires Python: >=3.10
Keywords: keyboard, recognition, voice, typing, speech, shortcut, speech-recognition, hotkey, speech-to-text, transcription, stt, asr, faster-whisper, whisper, parakeet, canary
Upload time: 2025-08-09 18:44:52
# _faster-whisper Hotkey_


A minimalist push-to-talk-style transcription tool built on **[cutting-edge ASR models](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard)** such as Canary, Parakeet, or Whisper.

**Hold the hotkey, speak, release ==> and bam, the text lands in your text field!**

In the terminal, in a text editor, or even in the text chat of your online video game, anywhere!

## Current models

- (NEW) **[canary-1b-flash](https://huggingface.co/nvidia/canary-1b-flash)**: 4 languages: en, fr, de, es
- (NEW) **[parakeet-tdt-0.6b-v2](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2)**: en-only model
- **[whisper](https://github.com/SYSTRAN/faster-whisper)**: many languages

## Features

- **Automatic Download**: Missing models are automatically retrieved from Hugging Face.
- **No clipboard usage**: Uses `pynput` to simulate keypresses directly instead (see the sketch after this list).
- **Zero impact on resources** apart from RAM/VRAM, because we want the model to stay loaded so it is always ready to use.
- **User-Friendly Interface**: Simple interactive menu for configuration, with quick "last config" reuse.
- **Configurable Settings**: Allows users to set the input device, transcription model, compute type, device, and language directly through the menu.
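
For reference, here is a tiny sketch (not the package's actual code) of how `pynput` can type text directly as simulated keypresses, leaving the clipboard untouched:

```
from pynput.keyboard import Controller

kb = Controller()
# Types the string character by character into whatever field has focus,
# without going through the clipboard.
kb.type("transcribed text goes here")
```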

## Performance

- **openai/whisper** (multilingual):
  - **GPU (CUDA)**: instant transcription with any model, even with automatic language detection.
  - **CPU**: Time to first word can be longer, but transcribing longer sequences adds little delay compared to just a few words. For the large model, time to first word should still be acceptable when language detection is disabled.

_Personal note:
I feel the distilled whisper models lack precision for non-native English speakers. I personally don't really like them, finding them a bit "rigid"._

- (New) **nvidia/parakeet-tdt-0.6b-v2** (English only):

  - **~20% lower error rate** than whisper-large-v3 despite being **~20x faster!**
  - **CPU**: instant transcription, even in F16!
  - **GPU**: really not necessary

- (New) **nvidia/canary-1b-flash** (4 languages):
  - **~20% lower error rate** than whisper-large-v3 despite being **~10x faster!**
  - **CPU**: almost instant transcription, even in F16!
  - **GPU**: really not necessary

See https://huggingface.co/spaces/hf-audio/open_asr_leaderboard for details.

## Installation

_See https://docs.astral.sh/uv/ for more information on uv. uv is fast :)_

### From PyPI

- As a pip package:

  ```
  uv pip install faster-whisper-hotkey
  ```

- or as a uv tool, so that you can run faster-whisper-hotkey from any venv:

  ```
  uv tool install faster-whisper-hotkey
  ```

### From source

1. Clone the repository:

   ```
   git clone https://github.com/blakkd/faster-whisper-hotkey
   cd faster-whisper-hotkey
   ```

2. Install the package and dependencies:

- as a pip package:

  ```
  uv pip install .
  ```

- or as an uv tool:

  ```
  uv tool install .
  ```

### For Nvidia GPU

You need to install cuDNN: https://developer.nvidia.com/cudnn-downloads
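
For example, on a CUDA 12 setup, one possible way to get the required libraries is through the NVIDIA wheels on PyPI (adjust to your CUDA version):

```
uv pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
```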

## Usage

1. Whether you installed from PyPI or from source, just run `faster-whisper-hotkey`.
2. Go through the menu steps.
3. Once the model is loaded, focus on any text field.
4. Then, simply press and hold the hotkey (PAUSE, F4, or F8) while you speak, release it when you're done, and watch the magic happen!

While the script is running you can forget about it: the model remains loaded and is ready to transcribe at any time.
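
For the curious, here is a minimal sketch of how such a push-to-talk loop can be wired together with `pynput`, `sounddevice`, and `faster-whisper`. It is illustrative only, not the tool's actual implementation: the hotkey, model name, and language below are placeholder choices.

```
# Minimal push-to-talk sketch (illustrative only, not the package's actual code).
# Assumes numpy, sounddevice, pynput and faster-whisper are installed.
import numpy as np
import sounddevice as sd
from pynput import keyboard
from faster_whisper import WhisperModel

HOTKEY = keyboard.Key.f8   # placeholder hotkey
SAMPLE_RATE = 16000

model = WhisperModel("small", device="cpu", compute_type="int8")  # loaded once, stays resident
typer = keyboard.Controller()
chunks, recording = [], False

def audio_callback(indata, frames, time, status):
    # Buffer microphone audio only while the hotkey is held down.
    if recording:
        chunks.append(indata.copy())

def on_press(key):
    global recording
    if key == HOTKEY and not recording:
        chunks.clear()
        recording = True

def on_release(key):
    global recording
    if key == HOTKEY and recording:
        recording = False
        if not chunks:
            return
        audio = np.concatenate(chunks)[:, 0]          # mono float32 waveform at 16 kHz
        segments, _ = model.transcribe(audio, language="en")
        text = "".join(segment.text for segment in segments)
        typer.type(text)                               # simulate keypresses into the focused field

with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32",
                    callback=audio_callback):
    with keyboard.Listener(on_press=on_press, on_release=on_release) as listener:
        listener.join()
```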

## Configuration File

The script automatically saves your settings to `~/.config/faster_whisper_hotkey/transcriber_settings.json`.
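
The exact contents depend on your menu choices; as a purely hypothetical illustration (the key names here are made up, not the real schema), it might look like:

```
{
  "input_device": "default",
  "model_name": "large-v3",
  "compute_type": "int8",
  "device": "cpu",
  "language": "en"
}
```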

## Limitations

- Canary is limited to 40 s of audio (we don't use the batching script provided by NVIDIA for now; maybe later, but that may be out of scope).
- Almost all text fields are supported, but there are rare exceptions, such as the Cinnamon start menu search bar.

## Tricks

- If you pick a multilingual **whisper** model and select `en` as the source language while speaking another language, your speech will be translated to English, provided you speak for at least a few seconds (see the example below).
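
As a hedged illustration with the `faster-whisper` API (the model size and file name are placeholders):

```
# Illustrative only: with a multilingual Whisper model, forcing language="en"
# on non-English speech tends to produce an English translation.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cpu", compute_type="int8")
segments, _ = model.transcribe("french_sample.wav", language="en")
print("".join(segment.text for segment in segments))
```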

## Acknowledgements

Many thanks to:

- **the developers of faster-whisper** for providing such an efficient transcription inference engine
- **NVIDIA** for their awesome parakeet-tdt-0.6b-v2 and canary-1b-flash models
- and to **all the contributors** of the libraries I used

Also thanks to @wgabrys88 for their [parakeet HF space demo example](https://huggingface.co/spaces/WJ88/NVIDIA-Parakeet-TDT-0.6B-v2-INT8-Real-Time-Mic-Transcription) that has been helpful!

And to finish, a special mention to **@siddhpant** for their useful [broo](https://github.com/siddhpant/broo) tool, which gave me a mic <3

            
