# Listen: STT Services
This program is composed of two parts:
- A server, meant to run as a background service, that serves STT models over a socket.
- A client that queries those models to transcribe audio from files or directly from a live microphone stream.

Audio recorded from the microphone can be saved as WAV files for later use.
You can then use the `data.helper` script to verify the transcription of every WAV file and update the CSV training register before you start training a model.
## Requirements
- [`python-pyaudio`](https://people.csail.mit.edu/hubert/pyaudio/)
## Installation
Once you have a working `pyaudio` for your version of Python, install `listen`.
```zsh
pip install stt-listen
# Or from source
pip install git+https://gitlab.com/waser-technologies/technologies/listen.git
```
## Usage
```zsh
❯ listen --help
usage: listen [-h] [-f FILE] [--aggressive {0,1,2,3}] [-d MIC_DEVICE]
              [-w SAVE_WAV]

Transcribe long audio files using webRTC VAD or use the streaming interface
from a microphone

options:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  Path to the audio file to run (WAV format)
  --aggressive {0,1,2,3}
                        Determines how aggressive filtering out non-speech is.
                        (Integer between 0-3)
  -d MIC_DEVICE, --mic_device MIC_DEVICE
                        Device input index (Int) as listed by
                        pyaudio.PyAudio.get_device_info_by_index(). If not
                        provided, falls back to PyAudio.get_default_device().
  -w SAVE_WAV, --save_wav SAVE_WAV
                        Path to directory where to save recorded sentences
  --debug               Show debug info
```
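The `-d` option expects a device index. The following is a minimal sketch, assuming PyAudio is installed, of one way to list the devices that can record along with their indices. The helper names `input_devices` and `query_pyaudio` are illustrative and are not part of `listen`.

```python
from typing import Dict, List, Tuple


def input_devices(infos: List[Dict]) -> List[Tuple[int, str]]:
    """Keep only devices that can record (at least one input channel)."""
    return [
        (int(info["index"]), str(info["name"]))
        for info in infos
        if int(info.get("maxInputChannels", 0)) > 0
    ]


def query_pyaudio() -> List[Dict]:
    """Collect the device-info dicts PyAudio reports for this machine."""
    # Imported lazily so the filter above can be used without audio hardware.
    import pyaudio

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()
```

Printing the result of `input_devices(query_pyaudio())` gives `(index, name)` pairs; the index of the microphone you want is the value to pass to `listen -d`.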
## Start the server
To use `listen`, you first need a running server that exposes the STT models over a socket.
To enable it as a systemd user service:
```zsh
cp ./listen.service.example /usr/lib/systemd/user/listen.service
systemctl --user enable --now listen.service
```
Models for STT and punctuation are downloaded the first time you run the server.
Alternatively, start the server manually with Python:
```zsh
python -m listen.STT.as_service
```
### Get authorization to listen
You first need to authorize the system to listen. Set `is_allowed` in the service configuration as follows.
```toml
# ~/.assistant/stt.toml
...
[stt]
is_allowed = true
...
```
Then [start the server](#start-the-server) and use `listen` to start [transcribing audio](#use-the-client).
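To check the setting from a script, the configuration file can be parsed as ordinary TOML. This is a minimal sketch, not the way `listen` itself reads the file; the function names are hypothetical, and `tomllib` only ships with Python 3.11+, so the `tomli` package is assumed as a fallback on older interpreters.

```python
from pathlib import Path

try:
    import tomllib  # standard library from Python 3.11
except ImportError:  # older interpreters: pip install tomli
    import tomli as tomllib


def is_listening_allowed(config_text: str) -> bool:
    """Return True only when [stt].is_allowed is explicitly set to true."""
    config = tomllib.loads(config_text)
    return config.get("stt", {}).get("is_allowed", False) is True


def check_config(path: Path = Path.home() / ".assistant" / "stt.toml") -> bool:
    """Read the configuration file, treating a missing file as 'not allowed'."""
    if not path.is_file():
        return False
    return is_listening_allowed(path.read_text(encoding="utf-8"))
```

Treating a missing file or a missing key as "not allowed" is a design choice of this sketch: it errs on the side of not listening.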
## Use the client
### Transcribe a file
You can quickly transcribe a WAV file. In the example below, the transcript ends up in a `.txt` file named after the input.
```zsh
❯ listen -f savewav_2022-04-11_17-18-08_578756.wav
Filename Duration(s)
savewav_2022-04-11_17-18-08_578756.wav 3.580
❯ cat savewav_2022-04-11_17-18-08_578756.txt
───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: savewav_2022-04-11_17-18-08_578756.txt
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ Bonjour.
───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────
```
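The `Duration(s)` column reports the length of the audio. As a rough illustration of how such a value can be computed (not necessarily how `listen` does it), the standard `wave` module gives the frame count and sample rate of a PCM WAV file; `wav_duration_seconds` is a hypothetical helper name.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, computed as frame count / sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Running this on a file before passing it to `listen -f` is also a quick way to confirm that the file really is a readable WAV, since `wave.open` raises `wave.Error` otherwise.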
### Transcribe from a live microphone stream
You can also query the models in real time from a microphone.
```zsh
❯ listen
You can speak now.
Bonjour.
^C
Stopped listening.
```
## Supported languages
By default, the server uses the system's language according to the environment variable `$LANG`.
You can manually specify a supported language for the server to use.
```zsh
LANG="en_US.UTF-8" python -m listen.STT.as_service
```
Have a look at [stt-models-locals](https://github.com/wasertech/stt-models-locals#languages) for the complete list of supported languages.
If the provided `$LANG` is not supported by any STT model, English is used as a fallback.
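The fallback rule can be sketched as follows. This is an illustration under stated assumptions rather than the server's actual logic: the `SUPPORTED` set is a placeholder (the real list lives in stt-models-locals), and `resolve_language` is a hypothetical helper.

```python
from typing import Iterable, Optional

# Placeholder set for illustration only; see stt-models-locals for the real list.
SUPPORTED = {"en", "fr"}


def resolve_language(lang_env: Optional[str], supported: Iterable[str] = SUPPORTED) -> str:
    """Map a $LANG value such as 'fr_CH.UTF-8' to a language code, else 'en'."""
    if lang_env:
        # Drop the encoding ('.UTF-8') and modifier ('@euro'), then the territory ('_CH').
        code = lang_env.split(".")[0].split("@")[0].split("_")[0].lower()
        if code in set(supported):
            return code
    return "en"
```

With this sketch, `resolve_language(os.environ.get("LANG"))` would pick the language for the current shell, and an unset or unrecognized value resolves to English.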
Raw data
{
"_id": null,
"home_page": "https://gitlab.com/waser-technologies/technologies/listen",
"name": "stt-listen",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8,<3.11",
"maintainer_email": "",
"keywords": "",
"author": "Danny Waser",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/69/85/4166501cfcb5a342cafefe6b6df936d45a663a6251e76994715dc4f46553/stt-listen-2.4.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "LICENSE",
"summary": "Transcribe long audio files with STT or use the streaming interface",
"version": "2.4.2",
"project_urls": {
"Homepage": "https://gitlab.com/waser-technologies/technologies/listen"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ea0cd5ca2472c209abb7c38105255c40b1887655d4c3f26dc790bc0b54a8b2d1",
"md5": "0063b8407c6ae1c41009047027615ed3",
"sha256": "819861821d3a9aed787feb96e4163cccaebb16e1a99e4405e193955849b373dc"
},
"downloads": -1,
"filename": "stt_listen-2.4.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0063b8407c6ae1c41009047027615ed3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8,<3.11",
"size": 29465,
"upload_time": "2022-11-24T23:10:43",
"upload_time_iso_8601": "2022-11-24T23:10:43.002973Z",
"url": "https://files.pythonhosted.org/packages/ea/0c/d5ca2472c209abb7c38105255c40b1887655d4c3f26dc790bc0b54a8b2d1/stt_listen-2.4.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "69854166501cfcb5a342cafefe6b6df936d45a663a6251e76994715dc4f46553",
"md5": "e8752e9e89b7b78115ecc04506d59a6b",
"sha256": "69f8e307d53f801e04fc0a43800e9a6c0447cbecde3722cc9bbd1aacc81deda6"
},
"downloads": -1,
"filename": "stt-listen-2.4.2.tar.gz",
"has_sig": false,
"md5_digest": "e8752e9e89b7b78115ecc04506d59a6b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8,<3.11",
"size": 29187,
"upload_time": "2022-11-24T23:10:44",
"upload_time_iso_8601": "2022-11-24T23:10:44.303631Z",
"url": "https://files.pythonhosted.org/packages/69/85/4166501cfcb5a342cafefe6b6df936d45a663a6251e76994715dc4f46553/stt-listen-2.4.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-11-24 23:10:44",
"github": false,
"gitlab": true,
"bitbucket": false,
"codeberg": false,
"gitlab_user": "waser-technologies",
"gitlab_project": "technologies",
"lcname": "stt-listen"
}