stt-listen

- Name: stt-listen
- Version: 2.4.2
- Home page: https://gitlab.com/waser-technologies/technologies/listen
- Summary: Transcribe long audio files with STT or use the streaming interface
- Upload time: 2022-11-24 23:10:44
- Author: Danny Waser
- Requires Python: >=3.8,<3.11
- License: LICENSE

# Listen: STT Services

This program is composed of two parts:
- A server meant to run as a background service, serving STT models over a socket.
- A client that queries the models to transcribe audio from files or directly from a live microphone stream.


The output WAV files can be stored for later use (see the `-w` option under [Usage](#usage)).

You can then use the `data.helper` script to verify the transcription of every WAV file and update the CSV training register before you start training a model.

## Requirements

- [`python-pyaudio`](https://people.csail.mit.edu/hubert/pyaudio/)

## Installation

Once you have a working `pyaudio` for your version of Python, install `listen`.

```zsh
pip install stt-listen
# Or from source
pip install git+https://gitlab.com/waser-technologies/technologies/listen.git
```
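
Before installing, it can help to confirm that the interpreter falls inside the range the package declares (`>=3.8,<3.11`) and that `pyaudio` is importable. The following is a minimal sketch, not part of `listen`; the helper names `python_is_supported` and `pyaudio_is_available` are hypothetical.

```python
import importlib.util
import sys


def python_is_supported(version=sys.version_info):
    """Return True if (major, minor) satisfies the declared range >=3.8,<3.11."""
    return (3, 8) <= tuple(version[:2]) < (3, 11)


def pyaudio_is_available():
    """Return True if a module named `pyaudio` can be found, without importing it."""
    return importlib.util.find_spec("pyaudio") is not None


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("pyaudio importable:", pyaudio_is_available())
```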

## Usage

```zsh
❯ listen --help
usage: listen [-h] [-f FILE] [--aggressive {0,1,2,3}] [-d MIC_DEVICE]
                   [-w SAVE_WAV]

Transcribe long audio files using webRTC VAD or use the streaming interface
from a microphone

options:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  Path to the audio file to run (WAV format)
  --aggressive {0,1,2,3}
                        Determines how aggressive filtering out non-speech is.
                        (Integer between 0-3)
  -d MIC_DEVICE, --mic_device MIC_DEVICE
                        Device input index (Int) as listed by
                        pyaudio.PyAudio.get_device_info_by_index(). If not
                        provided, falls back to PyAudio.get_default_device().
  -w SAVE_WAV, --save_wav SAVE_WAV
                        Path to directory where to save recorded sentences
  --debug               Show debug info
```
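
To find a value for `-d`, you can list the devices that PyAudio reports and keep those with at least one input channel. This is a sketch under the assumption that `listen` uses the same indices as `pyaudio.PyAudio.get_device_info_by_index()`, as the help text suggests; the helper names `input_devices` and `query_pyaudio_devices` are hypothetical.

```python
def input_devices(device_infos):
    """Keep only devices that can record, as (index, name) pairs.

    `device_infos` is a list of dicts shaped like the ones returned by
    pyaudio.PyAudio.get_device_info_by_index().
    """
    return [
        (int(info["index"]), info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def query_pyaudio_devices():
    """Ask PyAudio for the info dict of every audio device it knows about."""
    import pyaudio  # imported lazily so input_devices() works without it

    pa = pyaudio.PyAudio()
    try:
        return [
            pa.get_device_info_by_index(i)
            for i in range(pa.get_device_count())
        ]
    finally:
        pa.terminate()  # always release PortAudio resources
```

Calling `input_devices(query_pyaudio_devices())` on a machine with a working `pyaudio` returns the candidate indices; one of them can then be passed as `listen -d <index>`.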

## Start the server

To use `listen`, you need a running server that has the STT models ready behind a socket.

For example, to enable it as a systemd user service:
```zsh
cp ./listen.service.example /usr/lib/systemd/user/listen.service
systemctl --user enable --now listen.service
```

Models for STT and punctuation will be downloaded the first time you run the server.

Alternatively, start the server manually with Python:

```zsh
python -m listen.STT.as_service
```

### Get authorization to listen

You need to authorize the system to listen first. Change the service configuration as follows.

```toml
# ~/.assistant/stt.toml
...
[stt]
is_allowed = true
...
```

Then [start the server](#start-the-server) and use `listen` to start [transcribing audio](#use-the-client).
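
If you want to check the flag from a script, the configuration can be parsed with a TOML reader. The sketch below is not the server's own code: `listening_is_allowed` is a hypothetical helper, treating a missing key as "not allowed" is an assumption, and on Python versions before 3.11 it relies on the third-party `tomli` backport.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older interpreters


def listening_is_allowed(toml_text):
    """Return True only if the [stt] table sets is_allowed = true.

    Assumption: a missing table or key is treated as "not allowed".
    """
    config = tomllib.loads(toml_text)
    return config.get("stt", {}).get("is_allowed", False) is True


if __name__ == "__main__":
    example = "[stt]\nis_allowed = true\n"
    print(listening_is_allowed(example))
```

To check the real file, read `~/.assistant/stt.toml` as text (for example with `pathlib.Path.home() / ".assistant" / "stt.toml"`) and pass its contents to the function.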

## Use the client

### Transcribe a file

To transcribe a WAV file, pass it with `-f`. In the example below, the transcription ends up in a `.txt` file with the same base name.
```zsh
❯ listen -f savewav_2022-04-11_17-18-08_578756.wav
Filename                       Duration(s)         
savewav_2022-04-11_17-18-08_578756.wav 3.580               

❯ cat savewav_2022-04-11_17-18-08_578756.txt
───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: savewav_2022-04-11_17-18-08_578756.txt
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ Bonjour.
───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────
```
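
The `Duration(s)` column can be cross-checked with Python's standard `wave` module, which reads the frame count and sample rate from the WAV header. This is an independent sketch, not how `listen` computes the value; `wav_duration_seconds` is a hypothetical helper.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a WAV file as frame count divided by sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

For instance, `wav_duration_seconds("savewav_2022-04-11_17-18-08_578756.wav")` should be close to the duration that `listen` reports for the same file.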

### Transcribe from a live microphone stream

You can also query the models in real time from a microphone.

```zsh
❯ listen
You can speak now.
Bonjour.
^C
Stopped listening.
```

## Supported languages

By default, the server uses the system's language according to the environment variable `$LANG`.

You can manually specify a supported language for the server to use.

```zsh
LANG="en_US.UTF-8" python -m listen.STT.as_service
```

Have a look at [stt-models-locals](https://github.com/wasertech/stt-models-locals#languages) to see the complete list of supported languages.

If the provided `$LANG` is not supported by any STT model, English is used as a fallback.
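
The fallback rule can be illustrated with a small function that extracts the language code from a `$LANG`-style value and falls back to English when no model matches. How the server actually parses `$LANG` is an assumption here, and both `resolve_language` and the `supported` set passed to it are hypothetical.

```python
def resolve_language(lang_env, supported, fallback="en"):
    """Map a locale string such as 'fr_CH.UTF-8' to a supported language code.

    The part before the first '.' and '_' is taken as the language code;
    if it is not in `supported`, `fallback` is returned instead.
    """
    code = (lang_env or "").split(".")[0].split("_")[0].lower()
    return code if code in supported else fallback


if __name__ == "__main__":
    import os

    # The set below is illustrative only; see stt-models-locals for the real list.
    print(resolve_language(os.environ.get("LANG"), {"en", "fr"}))
```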

            
