# ispeak
A keyboard-centric inline speech-to-text tool that works wherever you can type: [`vim`](https://www.vim.org/), [`emacs`](https://www.gnu.org/software/emacs/), [`firefox`](https://www.firefox.com), and CLI/AI tools like [`aider`](https://github.com/paul-gauthier/aider), [`codex`](https://github.com/openai/codex), [`claude`](https://claude.ai/code), or whatever you fancy
<img align="right" width="188" height="204" alt="ispeak logo" src="https://raw.githubusercontent.com/fetchTe/ispeak/master/docs/ispeak-logo.png" />
+ **Multilingual, Local, Fast** - Powered via [faster-whisper](https://github.com/SYSTRAN/faster-whisper)
+ **Transcribed Speech** - As keyboard (type) or clipboard (copy) events
+ **Inline UX** - Recording indicator displayed in the active buffer & self-deletes
+ **Hotkey-Driven & Configurable** - Tune the operation/model to your liking
+ **Post-Transcribe Plugin Pipeline** - [Replace](#-replace), [text2num](#-text2num), and [num2text](#-num2text)
+ **Cross-Platform** - Works on [Linux](#linux)/[macOS](#macos)/[Windows](#windows) with GPU or CPU
## Quick Start
<img align="right" src="https://img.shields.io/badge/Python-3.11%2B-3776AB?logo=python&logoColor=white" />
1. **Run**: `ispeak` (add `-b <program>` to target a specific executable)
2. **Activate**: Press the hotkey (default `shift_l`) - the 'recording indicator' is text-based (default `;`)
3. **Record**: Speak freely; no automatic timeout or voice activity cutoff
4. **Complete**: Press the hotkey again to delete the indicator and transcribe your speech (abort via `escape`)
5. **Output**: Your words appear as typed text at your cursor's location
> **IMPORTANT**: The output goes to the application that currently has keyboard focus, which allows you to use the same `ispeak` instance between applications. This may be a feature or a bug.
### ▎Install
```sh
#> copy'n'paste system/global install
pip install ispeak
uv tool install ispeak
# cpu-only + plugins; it's better to simply clone & run: uv tool install ".[cpu,plugin]"
uv pip install --system "ispeak[plugin]" --torch-backend=cpu
```
> [`uv`](https://docs.astral.sh/uv/) is a python package installer
```sh
#> clone'n'install
git clone https://github.com/fetchTe/ispeak && cd ispeak
# global install (extra: cpu, cu118, cu128, plugin)
uv tool install ".[plugin]" # CUDA + plugins
uv tool install ".[cpu,plugin]" # CPU-only (no CUDA) + plugins
# local install (extra: cpu, cu118, cu128, plugin)
uv sync --group dev # CUDA (default) + dev (ruff, pyright, pytest)
uv sync --extra cpu --extra plugin # CPU-only (no CUDA) + plugins
# pip install + plugins
pip install RealtimeSTT pynput pyperclip num2words text2num
```
### ▎Usage
```crystal
# USAGE (v0.3.0)
  ispeak [options...]

# OPTIONS
  -b, --binary      Executable to launch with voice input (default: none)
  -c, --config      Path to configuration file
  -l, --log-file    Path to voice transcription append log file
  -n, --no-output   Disables all output/actions - typing, copying, and record indicator
  -p, --copy        Use the 'clipboard' to copy instead of the 'keyboard' to type the output
  -s, --setup       Configure voice settings
  -t, --test        Test voice input functionality
  --config-show     Show current configuration

# EXAMPLES
ispeak --setup       # Interactive configuration wizard
ispeak --copy        # Start with the output mode set as 'clipboard'
ispeak -l words.log  # Log transcriptions to file

# DEV/LOCAL USAGE
uv run ispeak --setup  # via uv
```
<br/>
## Configuration
The configuration can be defined via [JSON](https://en.wikipedia.org/wiki/JSON) or [TOML](https://en.wikipedia.org/wiki/TOML); the lookup is performed in the following order:
1. **Environment Variable**: `ISPEAK_CONFIG` environment variable is set to the path of the config file
2. **Platform-Specific Config**
- **macOS**: `~/Library/Preferences/ispeak/ispeak.{json,toml}`
- **Windows**: `%APPDATA%\ispeak\ispeak.{json,toml}` (or `~/AppData/Roaming/ispeak/ispeak.{json,toml}`)
- **Linux**: `$XDG_CONFIG_HOME/ispeak/ispeak.{json,toml}` (or `~/.config/ispeak/ispeak.{json,toml}`)
3. **Local**: `./ispeak.{json,toml}` in the current working directory
4. **Default**: fallback
```json
{
  "ispeak": {
    "binary": null,
    "delete_key": null,
    "delete_keyword": ["delete", "undo"],
    "escape_key": "esc",
    "keyboard_interval": 0,
    "log_file": null,
    "output": "keyboard",
    "push_to_talk_key": "shift_l",
    "push_to_talk_key_delay": 0.3,
    "recording_indicator": ";",
    "strip_whitespace": true
  },
  "stt": {
    "model": "tiny",
    "language": "auto",
    "beam_size": 5,
    "compute_type": "auto",
    "download_root": null,
    "enable_realtime_transcription": false,
    "ensure_sentence_ends_with_period": true,
    "ensure_sentence_starting_uppercase": true,
    "initial_prompt": null,
    "no_log_file": true,
    "normalize_audio": true,
    "spinner": false
  },
  "plugin": {}
}
```
> **NOTE**: Highly recommend using `ispeak --setup` for initial setup
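Both formats share the same keys; for instance, an equivalent `ispeak.toml` overriding a few of the defaults above might look like this (values here are illustrative, not recommendations):

```toml
[ispeak]
push_to_talk_key = "shift_l"
recording_indicator = ";"
output = "keyboard"

[stt]
model = "base"
language = "en"
```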
<br/>
### ▎ `ispeak`
- `binary` (str/null): Default executable to launch with voice input
- `delete_key` (str/null): Key to trigger deletion of previous input via backspace
- `delete_keyword` (list/bool): Words that trigger deletion of previous input via backspace (must be exact)
- `escape_key` (str/null): Key to cancel current recording without transcription
- `keyboard_interval` (float/null): Delay applied after each 'keyboard' character
- `log_file` (str/null): Path to file for logging voice transcriptions
- `output` (str/false): Mode of output; 'keyboard' (type), 'clipboard' (copy), or false for none
- For all languages aside from English, using 'clipboard' is recommended
- `push_to_talk_key_delay` (float): Brief delay after hotkey press to prevent input conflicts
- `push_to_talk_key` (str/null): Hotkey to start/stop recording sessions
- `recording_indicator` (str/null): Visual indicator typed when recording starts - **must be typeable**
- `strip_whitespace` (bool): Remove extra whitespace from transcribed text
> Hotkeys work via [pynput](https://github.com/moses-palmer/pynput) and support: <br/>
> ╸ Simple characters: `a`, `b`, `c`, `1`, etc. <br/>
> ╸ Special keys: `end`, `alt_l`, `ctrl_l` - (see [pynput Key class](https://github.com/moses-palmer/pynput/blob/74c5220a61fecf9eec0734abdbca23389001ea6b/lib/pynput/keyboard/_base.py#L162)) <br/>
> ╸ Key combinations: `<ctrl>+<alt>+h`, `<shift>+<f1>`<br/>
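As an illustration of these three forms (a stdlib-only sketch; this is neither ispeak's nor pynput's actual parser), a hotkey spec can be classified like so:

```python
import re

# hypothetical classifier for the three hotkey forms listed above
_COMBO = re.compile(r"(<[a-z_0-9]+>|[a-z0-9])(\+(<[a-z_0-9]+>|[a-z0-9]))+")

def classify_hotkey(spec: str) -> str:
    """Return 'combination', 'char', or 'special' for a hotkey spec."""
    if _COMBO.fullmatch(spec):
        return "combination"   # e.g. <ctrl>+<alt>+h
    if len(spec) == 1:
        return "char"          # e.g. a, 1
    return "special"           # e.g. shift_l, end
```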
<br/>
### ▎`stt`
> A full config reference can be found in [`./docs/stt-options.md`](https://github.com/fetchTe/ispeak/blob/master/docs/stt-options.md) <br/>
> ╸ [`RealtimeSTT`](https://github.com/KoljaB/RealtimeSTT) handles the input/mic setup and processing <br/>
> ╸ [`faster-whisper`](https://github.com/SYSTRAN/faster-whisper) is the actual speech-to-text engine implementation
- `model` (str): Model size or path to local CTranslate2 model (for English variants append `.en`)
- `tiny`: Ultra fast, workable accuracy (~39MB, CPU/GPU)
- `base`: Respectable accuracy/speed (~74MB, CPU/GPU ~1GB/VRAM)
- `small`: Decent accuracy (~244MB, CPU+/GPU ~2GB/VRAM)
- `medium`: Good accuracy (~769MB, GPU ~3GB/VRAM)
- `large-v1`/`large-v2`: Superb accuracy (~1550MB, GPU ~4GB/VRAM)
- `language` (str): Language code (`en`, `es`, `fr`, `de`, etc) or `"auto"` for automatic detection
- `beam_size` (int): Size to use for beam search decoding (worth bumping up)
- `download_root` (str/null): Root path where the models are downloaded/loaded from
- `enable_realtime_transcription` (bool): Enable continuous transcription (2x computation)
- `ensure_sentence_ends_with_period` (bool): Add periods to sentences without punctuation
- `ensure_sentence_starting_uppercase` (bool): Ensure sentences start with uppercase letters
- `initial_prompt` (null/str): Initial prompt to be fed to the main transcription model
- `no_log_file` (bool): Skip debug log file creation
- `normalize_audio` (bool): Normalize audio range before processing for better transcription quality
- `spinner` (bool): Show spinner animation (set to `false` to avoid terminal conflicts)
> Apart from using [faster-distil-whisper-large-v3](https://huggingface.co/Systran/faster-distil-whisper-large-v3), I've had good results with the following
```json
{
  "model": "Systran/faster-distil-whisper-medium.en",
  "initial_prompt": "In this session, we'll discuss concise expression.",
  "beam_size": 8,
  "post_speech_silence_duration": 0.4
}
```
> **NOTE**: `initial_prompt` defines style and/or spelling, not instructions [cookbook](https://cookbook.openai.com/examples/whisper_prompting_guide#comparison-with-gpt-prompting)/[ref](https://platform.openai.com/docs/guides/speech-to-text/improving-reliability)
<br/>
## Plugin
The plugin system processes transcribed text through a configurable pipeline of text transformation plugins. Plugins are loaded and executed in order based on their configuration, and each can be configured with the following fields:
- `use` (bool): Enable/disable the plugin (default: `true`)
- `order` (int): Execution order - plugins run in ascending order (default: `999`)
- `settings` (dict): Plugin-specific configuration options
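In effect (a minimal sketch with hypothetical names, not ispeak's actual code), the pipeline filters on `use`, sorts by `order`, and threads the text through each plugin:

```python
def run_pipeline(text, plugin_config, registry):
    """Run transcribed text through the plugin pipeline.

    plugin_config: name -> {"use": bool, "order": int, "settings": dict}
    registry: name -> callable(text, settings) -> text (hypothetical)
    """
    # keep only enabled plugins, tagged with their execution order
    active = [
        (cfg.get("order", 999), name)
        for name, cfg in plugin_config.items()
        if cfg.get("use", True)
    ]
    # run in ascending order, feeding each plugin the previous output
    for _order, name in sorted(active):
        text = registry[name](text, plugin_config[name].get("settings", {}))
    return text
```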
### ▎ `replace`
Regex-based text replacement; handles simple string replacements as well as regex patterns with capture groups and flags.
```json5
{
  "plugin": {
    "replace": {
      "use": true,
      "order": 1,
      "settings": {
        // simple string replacements
        "iSpeak": "ispeak",
        " one ": " 1 ",
        "read me": "README",

        // regex with capture groups
        "(\\s+)(semi)(\\s+)": ";\\g<3>",
        "(\\s+)(comma)(\\s+)": ",\\g<3>",

        // common voice transcription cleanup
        "\\s+question\\s*mark\\.?": "?",
        "\\s+exclamation\\s*mark\\.?": "!",

        // code-specific replacements
        "\\s+open\\s*paren\\s*": "(",
        "\\s+close\\s*paren\\s*": ")",
        "\\s+open\\s*brace\\s*": "{",
        "\\s+close\\s*brace\\s*": "}",

        // regex patterns with flags (/pattern/flags format)
        "/hello/i": "HI",     // case insensitive
        "/^start/m": "BEGIN", // multiline
        "/comma/gmi": ","     // global, multiline, case insensitive
      }
    }
  }
}
```
> **Flags**: Use `/pattern/flags` format (supports `i`, `m`, `s`, `x` flags) <br/>
> **Substitution**: Use `\1`, `\2` or `\g<1>`, `\g<2>` syntax <br/>
> **Tests**: [`./tests/test_plugin_replace.py`](https://github.com/fetchTe/ispeak/blob/master/tests/test_plugin_replace.py) <br/>
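The `/pattern/flags` convention can be illustrated with Python's `re` module. This is a hedged sketch of the format, not ispeak's actual implementation; the `g` flag is treated as a no-op here since `re.sub` already replaces globally:

```python
import re

FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}

def apply_rule(key, repl, text):
    """Apply one replace-style rule: a plain regex key, or /pattern/flags."""
    m = re.fullmatch(r"/(.*)/([gimsx]*)", key)
    if m:
        flags = 0
        for ch in m.group(2):
            if ch != "g":  # 'g' is implicit: re.sub replaces all matches
                flags |= FLAGS[ch]
        return re.sub(m.group(1), repl, text, flags=flags)
    return re.sub(key, repl, text)  # a plain key is still a regex
```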
<br/>
### ▎ `num2text`
Convert digits to text numbers, like "42" into "forty-two" via [`num2words`](https://github.com/savoirfairelinux/num2words)
```json5
{
  "plugin": {
    "num2text": {
      "use": true,
      "order": 3,
      "settings": {
        "lang": "en",         // language code
        "to": "cardinal",     // cardinal, ordinal, ordinal_num, currency, year
        "min": null,          // minimum value to convert
        "max": null,          // maximum value to convert
        "currency": "USD",    // currency code for currency conversion
        "cents": true,        // include cents in currency
        "percent": "percent"  // suffix for percentage conversion
      }
    }
  }
}
```
> **Tests**: [`./tests/test_plugin_num2text.py`](https://github.com/fetchTe/ispeak/blob/master/tests/test_plugin_num2text.py) <br/>
> **Dependency**: [`num2words`](https://github.com/savoirfairelinux/num2words) -> `uv pip install num2words` <br/>
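For intuition on what the plugin does, here is a toy, stdlib-only stand-in for `num2words` limited to 0-99 (the real library covers far more), including the `min`/`max` filtering described above:

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = ("", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety")

def num2text_sketch(text, min_val=None, max_val=None):
    def word(n):
        if n < 20:
            return ONES[n]
        tens, ones = divmod(n, 10)
        return TENS[tens] + (f"-{ONES[ones]}" if ones else "")

    def repl(m):
        n = int(m.group())
        out_of_range = (
            (min_val is not None and n < min_val)
            or (max_val is not None and n > max_val)
            or n > 99  # toy limit; num2words has no such cap
        )
        return m.group() if out_of_range else word(n)

    return re.sub(r"\b\d+\b", repl, text)
```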
<br/>
### ▎ `text2num`
Convert text numbers to digits, like "forty-two" into "42" via [`text_to_num`](https://github.com/allo-media/text2num)
```json
{
  "plugin": {
    "text2num": {
      "use": true,
      "order": 2,
      "settings": {
        "lang": "en",
        "threshold": 0
      }
    }
  }
}
```
> **Tests**: [`./tests/test_plugin_text2num.py`](https://github.com/fetchTe/ispeak/blob/master/tests/test_plugin_text2num.py) <br/>
> **Dependency**: [`text_to_num`](https://github.com/allo-media/text2num) -> `uv pip install text_to_num` <br/>
> **IMPORTANT**: the `threshold` may or may not work with cardinal numbers; see the `TestWishyWashyThreshold` test for more details<br/>
<br/>
## Troubleshooting
+ **Hotkey Issues**: Check/grant permissions; see [Linux](#linux), [macOS](#macos), [Windows](#windows)
+ **Recording Indicator Misfire(s)**: Increase `push_to_talk_key_delay` (try 0.2-1.0)
+ **Typing/Character Issues**: Try using `"output": "clipboard"`
  + If characters are missing or skipped, try `"keyboard_interval": 0.1`
+ **Transcription Issues**: Try the CPU-only install and/or the following minimal test script to isolate the problem:
```python
# test_audio.py -> uv run ./test_audio.py
from RealtimeSTT import AudioToTextRecorder
def process_text(text):
    print(f"Transcribed: {text}")

if __name__ == '__main__':
    print("Testing RealtimeSTT - speak after you see 'Listening...'")
    try:
        recorder = AudioToTextRecorder()
        while True:
            recorder.text(process_text)
    except KeyboardInterrupt:
        print("\nTest completed.")
    except Exception as e:
        print(f"Error: {e}")
```
<br/>
## Platform Limitations
> These limitations/quirks come from the `pynput` [docs](https://pynput.readthedocs.io/en/latest/limitations.html)
### ▎Linux
When running under *X*, the following must be true:
- An *X server* must be running
- The environment variable `$DISPLAY` must be set
When running under *uinput*, the following must be true:
- You must run your script as root, so that it has the required permissions for *uinput*
The requirements for *X* mean that running *pynput* over *SSH* generally will not work. To work around that, make sure to set `$DISPLAY`:
``` sh
$ DISPLAY=:0 python -c 'import pynput'
```
Please note that the value `DISPLAY=:0` is just an example. To find the
actual value, please launch a terminal application from your desktop
environment and issue the command `echo $DISPLAY`.
When running under *Wayland*, the *X server* emulator `Xwayland` will usually run, providing limited functionality. Notably, you will only receive input events from applications running under this emulator.
### ▎macOS
Recent versions of *macOS* restrict monitoring of the keyboard for security reasons. For that reason, one of the following must be true:
- The process must run as root.
- Your application must be whitelisted under *Enable access for assistive devices*. Note that this might require that you package your application, since otherwise the entire *Python* installation must be whitelisted.
- On versions after *Mojave*, you may also need to whitelist your terminal application if running your script from a terminal.
All listener classes have the additional attribute `IS_TRUSTED`, which is `True` if no permissions are lacking.
### ▎Windows
Virtual events sent by *other* processes may not be received. This library takes precautions, however, to dispatch any virtual events generated to all currently running listeners of the current process.
<br/>
## Development
```
# USAGE (ispeak)
  make [flags...] <target>

# TARGET
  -------------------
  run              execute entry-point -> uv run main.py
  build            build wheel/source distributions -> hatch build
  clean            delete build artifacts, cache files, and temporary files
  -------------------
  publish          publish to pypi.org -> twine upload
  publish_test     publish to test.pypi.org -> twine upload --repository testpypi
  publish_check    check distributions -> twine check
  release          clean, format, lint, test, build, check, and optionally publish
  -------------------
  install          install dependencies -> uv sync
  install_cpu      install dependencies -> uv sync --extra cpu
  install_dev      install dev dependencies -> uv sync --group dev --extra plugin
  install_plugin   install plugin dependencies -> uv sync --extra plugin
  update           update dependencies -> uv lock --upgrade && uv sync
  update_dry       show outdated dependencies -> uv tree --outdated
  venv             setup virtual environment if needed -> uv venv -p 3.11
  -------------------
  check            run all checks: lint, type, and format
  format           format check -> ruff format --check
  lint             lint check -> ruff check
  type             type check -> pyright
  format_fix       auto-fix format -> ruff format
  lint_fix         auto-fix lint -> ruff check --fix
  -------------------
  test             test -> pytest
  test_fast        test & fail-fast -> pytest -x -q
  -------------------
  help             displays (this) help screen

# FLAGS
  -------------------
  UV       [? ]    uv build flag(s) (e.g: make UV="--no-build-isolation")
  -------------------
  BAIL     [?1]    fail fast (bail) on the first test or lint error
  PUBLISH  [?0]    publishes to PyPI after build (requires twine config)
  -------------------
  DEBUG    [?0]    enables verbose logging for tools (uv, pytest, ruff)
  QUIET    [?0]    disables pretty-printed/log target (INIT/DONE) info
  NO_COLOR [?0]    disables color logging/ANSI codes
```
<br/>
## Contributing
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Install development dependencies: `uv sync --group dev`
4. Make your changes following the existing code style
5. Run quality checks & test:
```sh
make format_fix # auto-fix format -> ruff format
make check # run all checks: lint, type, and format
make test # run all tests
```
6. Commit your changes: `git commit -m 'feat: add amazing feature'`
7. Push to your branch: `git push origin feature/amazing-feature`
8. Open a Pull Request with a clear description of your changes
<br/>
## Respects
- **[`RealtimeSTT`](https://github.com/KoljaB/RealtimeSTT)** - A swell wrapper around [`faster-whisper`](https://github.com/SYSTRAN/faster-whisper) that powers the speech-to-text engine
- **[`pynput`](https://github.com/moses-palmer/pynput)** - Cross-platform controller and monitor for the keyboard
- **[`pyperclip`](https://github.com/asweigart/pyperclip)** - Cross-platform clipboard
- **[`whisper`](https://github.com/openai/whisper)** - The foundational speech-to-text recognition model
<br/>
## License
```
MIT License
Copyright (c) 2025 te <legal@fetchTe.com>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```
Open a Pull Request with a clear description of your changes\n\n<br/>\n\n\n\n## Respects\n\n- **[`RealtimeSTT`](https://github.com/KoljaB/RealtimeSTT)** - A swell wrapper around [`faster-whisper`](https://github.com/SYSTRAN/faster-whisper) that powers the speech-to-text engine\n- **[`pynput`](https://github.com/moses-palmer/pynput)** - Cross-platform controller and monitorer for the keyboard\n- **[`pyperclip`](https://github.com/asweigart/pyperclip)** - Cross-platform clipboard\n- **[`whisper`](https://github.com/openai/whisper)** - The foundational speech-to-text recognition model\n\n\n<br/>\n\n\n\n## License\n\n```\nMIT License\n\nCopyright (c) 2025 te <legal@fetchTe.com>\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n```\n",