# speech2caret
<p align="center">
<img src="https://github.com/asmith26/speech2caret/raw/refs/heads/main/assets/speech2caret_logo.svg" alt="speech2caret logo" width="250"/>
</p>
<p align="center">
Use your speech to write to the current caret position!
</p>
## Goals
- ✅ **Simple**: A minimalist tool that does one thing well.
- ✅ **Local**: Runs entirely on your machine (uses [Hugging Face models](https://huggingface.co/models) for speech recognition).
- ✅ **Efficient**: Optimised for low CPU and memory usage, thanks to an event-driven architecture that responds instantly to key presses without wasting resources.
**Note**: Tested only on Linux (Ubuntu). Other operating systems are currently unsupported.
**Demo (turn volume on):**
[demo video](https://github.com/user-attachments/assets/6de72da8-0aa2-40c4-802d-82772881c862)
## Installation
### 1. System Dependencies
First, install the required system libraries:
```bash
sudo apt update
sudo apt install libportaudio2 ffmpeg
```
### 2. Grant Permissions
To read keyboard events and simulate key presses, [`evdev`](https://python-evdev.readthedocs.io/en/latest/usage.html#listing-accessible-event-devices) needs access to your keyboard input device. Add your user to the `input` group to grant the necessary permissions:
```bash
sudo usermod -aG input $USER
newgrp input # or log out and back in
```
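Group changes only apply to new sessions, so if you skip `newgrp`, log out and back in. As a quick sanity check (standard shell tools, not part of speech2caret), you can confirm the membership is active in your current session:
```bash
# Verify the current session actually has the "input" group.
# usermod changes only take effect in new sessions / after newgrp.
id -nG | grep -w input && echo "OK: input group active"
```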
### 3. Install and Run
You can install and run `speech2caret` using `pip` or `uv`:
```bash
# Install the package
uv add speech2caret # or pip install speech2caret
# Run the application
speech2caret
```
Alternatively, you can run it directly without installation using `uvx` (the `--index pytorch-cpu=...` flag ensures that only CPU builds of PyTorch are downloaded, avoiding heavy GPU-related dependencies):
```bash
uvx --index pytorch-cpu=https://download.pytorch.org/whl/cpu --from speech2caret speech2caret
```
## Configuration
The first time you run `speech2caret`, it creates a config file at `~/.config/speech2caret/config.ini`.
You’ll need to manually edit it with the following values:
#### `keyboard_device_path`
This is the path to your keyboard input device. You can find it either by following [this guide](https://python-evdev.readthedocs.io/en/latest/usage.html#listing-accessible-event-devices), or by running the command below and looking for an entry that ends with `-event-kbd`.
```bash
ls /dev/input/by-path/
```
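If you prefer, the same information can be read through `evdev` itself. The sketch below (assuming your user already has access to the input devices, per step 2) lists each accessible device path together with its reported name, which usually makes the keyboard easy to spot:
```bash
uvx --from evdev python -c '
from evdev import InputDevice, list_devices

# Print every readable input device path alongside its human-readable name.
for path in list_devices():
    print(path, "->", InputDevice(path).name)
'
```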
#### `start_stop_key` and `resume_pause_key`
These are the keys you'll use to control the app.
To find the correct name for a key, you can use the short Python snippet below. Paste your `keyboard_device_path` from the previous step into the script, then run:
```bash
uvx --from evdev python -c '
keyboard_device_path = "PASTE_YOUR_KEYBOARD_DEVICE_PATH_HERE"

from evdev import InputDevice, categorize, ecodes, KeyEvent

# Open the keyboard device and print the name of every key as it is pressed.
dev = InputDevice(keyboard_device_path)
print(f"Listening for key presses on {dev.name}...")
for event in dev.read_loop():
    if event.type == ecodes.EV_KEY:
        key_event = categorize(event)
        if key_event.keystate == KeyEvent.key_down:
            print(f"  {key_event.keycode}")
'
```
Press the keys you wish to use, and their names will be printed to the terminal. For a full list of available key names, see [here](https://github.com/torvalds/linux/blob/a79a588fc1761dc12a3064fc2f648ae66cea3c5a/include/uapi/linux/input-event-codes.h#L65).
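Once both values are filled in, the config file might look roughly like the illustrative sketch below (the section header and any defaults come from the file speech2caret generates; the device path and key names shown here are placeholders, so substitute your own):
```ini
# Illustrative values only: keep the section header from the generated file,
# and replace the path and key names with the ones you found above.
[speech2caret]
keyboard_device_path = /dev/input/by-path/platform-i8042-serio-0-event-kbd
start_stop_key = KEY_F9
resume_pause_key = KEY_F10
```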
## How to Use
1. Run the `speech2caret` command in your terminal.
2. Press your configured `start_stop_key` to begin recording.
3. Press the `resume_pause_key` to toggle between pausing and resuming.
4. When you are finished, press the `start_stop_key` again.
5. The recorded audio will be transcribed and typed at your current caret position.