Name | scribe-cli |
Version | 0.11.0 |
home_page | None |
Summary | scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer |
upload_time | 2025-02-21 02:24:55 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.9 |
license | MIT License
Copyright (c) 2024 Mahé Perrette
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
---
Note: This project relies on external packages that may have more restrictive
licenses. For example, the `pynput` package is licensed under LGPLv3, which
has different requirements compared to the MIT License. Please review the
licenses of all dependencies before using or distributing this software to
ensure compliance with their respective terms. |
keywords | speech recognition, transcription, ai, language, vosk, whisper, openai, keyboard, clipboard |
# Scribe <img src="scribe_data/share/icon.png" width=48px>
`scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client for the OpenAI API via the `openaiapi` backend (API key required).
## Compatibility
In principle `scribe` is compatible with any OS, but I develop it under Ubuntu (Wayland) for my own purposes, so glitches are likely on other configurations.
Moreover, quite a few dependencies rely on very OS-specific protocols under the hood, such as access to the microphone, keyboard and clipboard,
and even though the Python dependencies `scribe` relies on are not restricted to a single platform, there may be limitations and additional binaries to install.
This guide is based on Python 3.12 running on Ubuntu 24.04 with GNOME + Wayland, which is a relatively standard setup at the time of writing.
Note that as of February 19, 2025, Python 3.13 does not seem to produce any transcription (I am not sure which dependency is to blame).
A test on macOS (M1 Air with 8 GB RAM) worked with Python 3.12, though with much lower performance than on my own system (Lenovo T14 Gen 5 with i5 125U and 32 GB RAM).
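As a quick pre-install check, the sketch below classifies the local interpreter against the versions mentioned above (`>=3.9` required, 3.13 reported as not transcribing). `check_python_minor` is a hypothetical helper written for this example, not part of `scribe`, and its "problematic" label only mirrors the note above rather than any official support matrix.

```bash
# check_python_minor is a hypothetical helper (not part of scribe).
# Given a "major.minor" version string it prints:
#   unsupported  - not Python 3, or older than 3.9
#   problematic  - 3.13 or newer (no transcription, per the note above)
#   ok           - 3.9 to 3.12
check_python_minor() {
    major="${1%%.*}"
    minor="${1#*.}"
    if [ "$major" -ne 3 ] || [ "$minor" -lt 9 ]; then
        echo "unsupported"
    elif [ "$minor" -ge 13 ]; then
        echo "problematic"
    else
        echo "ok"
    fi
}

# Classify the interpreter that `python3` resolves to, if there is one.
if command -v python3 >/dev/null 2>&1; then
    version="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
    echo "python $version: $(check_python_minor "$version")"
fi
```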
## Installation
Install the PortAudio library and the `xclip` utility, e.g. on Ubuntu:
```bash
sudo apt-get install portaudio19-dev xclip
```
See the additional requirements for the [tray icon](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The Python dependencies are then installed automatically by `pip`:
```bash
pip install "scribe-cli[all]"
```
(note the `-cli` suffix for client)
or for local development:
```bash
git clone https://github.com/perrette/scribe.git
cd scribe
pip install -e ".[all]"
```
You can skip the optional dependencies (leave out `[all]`), but you must install at least one of the `vosk`, `openai-whisper` or `openai` packages (see Usage below).
The language models for the local backends `vosk` and `whisper` are downloaded on the fly.
The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
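The folder rule above can be sketched in a few lines of shell, for instance to see how much disk space the downloaded models take. `model_cache_dir` is a hypothetical helper for illustration, not something `scribe` provides; it assumes only the `$XDG_CACHE_HOME/{backend}` layout described above.

```bash
# model_cache_dir is a hypothetical helper (not part of scribe): it prints
# the default download folder for a backend, following the rule above.
model_cache_dir() {
    printf '%s/%s\n' "${XDG_CACHE_HOME:-$HOME/.cache}" "$1"
}

# Report the folder and its disk usage for both local backends.
for backend in vosk whisper; do
    dir="$(model_cache_dir "$backend")"
    if [ -d "$dir" ]; then
        du -sh "$dir"
    else
        echo "$dir (nothing downloaded yet)"
    fi
done
```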
## Usage
Just type in the terminal:
```bash
scribe
```
and the script will guide you through the choice of backend (`whisper`, `vosk` or `openaiapi`) and of the specific language model.
After this, you will be prompted to start recording from your microphone. The transcribed text is printed in real time (`vosk`)
or once the recording is complete (`whisper`).
You can interrupt the recording via Ctrl + C and start again or change model.
The default backend (`whisper`) is excellent at transcribing full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
but it cannot do real-time transcription and, depending on the model, can take relatively long to run, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster, which is why it is the default model in `scribe`.
With the `whisper` backend, the recording stops once a 2-second silence is detected. You can also stop the recording manually before the transcription occurs (Ctrl + C, or Stop in the `--app` mode).
By default, the recording lasts at most 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
The `vosk` backend is much faster and very good at real-time transcription for one language, but it tended to make more mistakes in my tests and it does not do punctuation.
It becomes really powerful when used for longer or interactive typing sessions with the [keyboard](#virtual-keyboard-experimental) option, e.g. to take notes or chat with an AI.
There are many [vosk models](https://alphacephei.com/vosk/models) available; so far, `scribe` associates a few of them with [a handful of languages](scribe/models.toml): `en`, `fr`, `it` and `de`.
The `openaiapi` backend uses the `whisper-1` model at the time of writing. It requires an API key:
```bash
scribe --backend openaiapi --api YOURAPIKEY
```
Adding `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
## Output media
By default the transcription is printed on the terminal, but other output media are supported.
### Clipboard
The most straightforward is the clipboard:
```bash
scribe --clipboard
```
The (full) transcription is then copied to the clipboard, and it is up to the user to paste it (e.g. Ctrl + V).
### Output file
Alternatively, an output file can be specified:
```bash
scribe -o transcription.txt
```
### Virtual keyboard (experimental)
With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
```bash
scribe --keyboard
```
This can be extremely useful with the `vosk` backend and its real-time transcription, or alternatively with the `--restart` option with the `whisper` backend.
The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
#### Use the keyboard with Wayland (default for Ubuntu 24.04)
On my Ubuntu + Wayland system the keyboard simulation works out of the box in Chromium-based applications (including VS Code), but not in Firefox, Sublime Text or anything else I tried (not even in a terminal!). I am told this is because Chromium runs on top of an X server compatibility layer (XWayland) and so is compatible with the default `pynput` backend.
One workaround is to use the Xorg version of GNOME: in `/etc/gdm3/custom.conf`, uncomment `# WaylandEnable=false` and restart your computer.
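Before editing the file, the change can be previewed read-only. In the sketch below, `uncomment_wayland_disable` is a hypothetical helper (not part of `scribe` or GDM); it assumes the commented line looks like `#WaylandEnable=false`, with or without a space after the `#`. `XDG_SESSION_TYPE` is the usual session variable and should read `x11` once the Xorg session is active.

```bash
# uncomment_wayland_disable is a hypothetical helper (not part of scribe).
# It prints the given GDM config with the "WaylandEnable=false" line
# uncommented; the file itself is left untouched.
uncomment_wayland_disable() {
    sed 's/^#[[:space:]]*WaylandEnable=false/WaylandEnable=false/' "$1"
}

# Preview the relevant line as it would look after the edit.
conf=/etc/gdm3/custom.conf
if [ -r "$conf" ]; then
    uncomment_wayland_disable "$conf" | grep -n 'WaylandEnable' || true
fi

# After the restart, this is expected to print "x11" rather than "wayland".
echo "session type: ${XDG_SESSION_TYPE:-unknown}"
```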
Another workaround that keeps Wayland is to use the low-level `uinput` backend of `pynput`. This requires running `scribe` as root (sudo), and likely some extra configuration such as activating the `uinput` kernel module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make it persistent).
Moreover, the keyboard must be set to an appropriate layout: for example, to type the letter `é` you need a French or Italian layout, otherwise an English layout will drop it or replace it with something else.
Another caveat I encountered is that special characters (`é`) were inserted at the wrong place. Adding a small delay with the additional parameter `--latency 0.01` was enough to fix that.
Finally, if you run as sudo you may need to pass on some environment variables so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder (`HOME`) remain the same. To sum up, that gives something like:
```bash
sudo modprobe uinput
sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput $(which scribe) --latency 0.01
```
You're on the right path :)
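The state of the `uinput` device can be checked before launching `scribe`. The sketch below only inspects the `/dev/uinput` device node; `uinput_status` is a hypothetical helper (not part of `scribe` or `pynput`), and a writable node is a necessary condition at best, so it does not by itself prove that the `uinput` backend will work without root.

```bash
# uinput_status is a hypothetical helper (not part of scribe or pynput).
# It prints one of: absent, not-writable, writable.
uinput_status() {
    if [ ! -e /dev/uinput ]; then
        echo "absent"
    elif [ -w /dev/uinput ]; then
        echo "writable"
    else
        echo "not-writable"
    fi
}

case "$(uinput_status)" in
    absent)       echo "/dev/uinput not found: try 'sudo modprobe uinput'" ;;
    not-writable) echo "/dev/uinput exists but this user cannot write to it" ;;
    writable)     echo "/dev/uinput exists and is writable by this user" ;;
esac
```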
## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
To avoid switching back and forth with the terminal, you can interact with the program via a system tray icon.
To activate it, start with:
```bash
scribe --app
```
or toggle the app option in the interactive menu. The scribe icon will show up, with Record and other options. The icon changes depending on what the app is doing. It is possible to choose from a set
of predefined models, or to Quit and choose from the terminal before pressing Enter again.
With the `vosk` backend there are only two states: recording + transcribing, or idle. With the `whisper` backend there are three states visible from the icon: recording, transcribing and idle/waiting.
This option requires `pystray` to be installed, which is included with the `pip install ...[all]` option. On Ubuntu the following dependencies were required to make the menus appear:
```bash
sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
pip install PyGObject
```
## Start as an application in GNOME
If you run Ubuntu (or possibly another distribution) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
to make it available from the quick launch menu. Any option is passed on to `scribe`; `scribe-install` additionally accepts `--name` and `--no-terminal`.
`--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
In a relatively basic form:
```bash
scribe-install --clipboard --api YOUROPENAIAPIKEY
```
(`--api` is optional and only useful if you plan to use the `openaiapi` backend later on)
And to make an app that runs outside the terminal:
```bash
scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --api YOUROPENAIAPIKEY
```
Running both commands installs two separate apps (named "Scribe" and "Scribe App").
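To see what `scribe-install` actually wrote, the launcher files can be inspected directly. This sketch assumes only the folder named above; the exact file names and the content of the `Exec=` lines depend on `scribe-install` and are not documented here, so the loop simply matches any `.desktop` file that mentions scribe.

```bash
# Folder where scribe-install places its launcher files (per the text above).
apps_dir="$HOME/.local/share/applications"

# List every launcher that mentions scribe, with its Name= and Exec= lines.
for f in "$apps_dir"/*.desktop; do
    [ -e "$f" ] || continue              # no launcher files at all
    grep -qi 'scribe' "$f" || continue   # unrelated launcher
    echo "== $f"
    grep -E '^(Name|Exec)=' "$f" || true
done
```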
## Fine tuning
There are a number of options to control the silence threshold, the recording duration and more.
The best way to discover them is the built-in help:
```bash
scribe --help
```
Raw data
{
"_id": null,
"home_page": null,
"name": "scribe-cli",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "speech recognition, transcription, AI, language, vosk, whisper, openai, keyboard, clipboard",
"author": null,
"author_email": "Mah\u00e9 Perrette <mahe.perrette@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/f8/98/9aa73e4860016418271713f9a2d1d04e67bcc2141c1f320bcad3ddaedd28/scribe_cli-0.11.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"summary": "scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer",
"version": "0.11.0",
"project_urls": {
"Homepage": "https://github.com/perrette/scribe"
},
"split_keywords": [
"speech recognition",
" transcription",
" ai",
" language",
" vosk",
" whisper",
" openai",
" keyboard",
" clipboard"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "0de7cc7acf24b24c087aeefd7a4021e25427f1c6b86a45dadb0192d62bc90a15",
"md5": "21e1d17db16f20a7a86c5b2826b8601a",
"sha256": "2613c4b0cb616690e1a71b8349dd1c9f5ef7f84b5e0645dab30b8dfe1627e2c3"
},
"downloads": -1,
"filename": "scribe_cli-0.11.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "21e1d17db16f20a7a86c5b2826b8601a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 35080,
"upload_time": "2025-02-21T02:24:53",
"upload_time_iso_8601": "2025-02-21T02:24:53.151578Z",
"url": "https://files.pythonhosted.org/packages/0d/e7/cc7acf24b24c087aeefd7a4021e25427f1c6b86a45dadb0192d62bc90a15/scribe_cli-0.11.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "f8989aa73e4860016418271713f9a2d1d04e67bcc2141c1f320bcad3ddaedd28",
"md5": "7c3ed8d4dc44996db9996f515434b286",
"sha256": "2b15d53e8328d2d3b888b4c5f91ee4328a0100c007bdd02f689643e7bd0eb474"
},
"downloads": -1,
"filename": "scribe_cli-0.11.0.tar.gz",
"has_sig": false,
"md5_digest": "7c3ed8d4dc44996db9996f515434b286",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 61085,
"upload_time": "2025-02-21T02:24:55",
"upload_time_iso_8601": "2025-02-21T02:24:55.320010Z",
"url": "https://files.pythonhosted.org/packages/f8/98/9aa73e4860016418271713f9a2d1d04e67bcc2141c1f320bcad3ddaedd28/scribe_cli-0.11.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-21 02:24:55",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "perrette",
"github_project": "scribe",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "scribe-cli"
}