# PyListener

PyListener is a tool for near real-time voice processing and speech-to-text conversion. It can range
from fairly fast to slightly sluggish depending on the compute and memory available in the environment,
so I suggest using it where a delay of ~1 second is acceptable, e.g. AI assistants, voice command
processing, etc.

[![Watch a demo](https://img.youtube.com/vi/SEFm8rJRg_A/0.jpg)](https://www.youtube.com/watch?v=SEFm8rJRg_A)

## Installation

Use the package manager [pip](https://pip.pypa.io/en/stable/) to install py-listener.

```bash
pip install py-listener
```

## Basic Usage

```python
from listener import Listener

# prints whatever the speaker says; see the constructor's
# other parameters below for more features
listener = Listener(speech_handler=print)

# start listening
listener.listen()

# NOTE: listening happens on a separate thread, so your code
# must have other work to keep the interpreter alive, or it
# will exit. If there is nothing else to do, run a loop like
# the one below.

# --------------------
# import time

# while True:
#     time.sleep(1)
# -----------------------

# stop listening
listener.stop()

# start listening again
# listener.listen()
```
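
A slightly fuller sketch of the same pattern, assuming only the API shown above: accumulate transcriptions in a list and stop cleanly on Ctrl+C.

```python
import time

from listener import Listener

# every transcribed utterance is appended to this list
transcript = []
listener = Listener(speech_handler=transcript.append)
listener.listen()

try:
    # keep the main thread alive while the listener thread works
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    listener.stop()
    print(" ".join(transcript))
```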

## Documentation
There is only one class in the package: `Listener`.

After instantiation, it collects audio in `n`-second chunks, where `n` is a number passed to the constructor. Each chunk is checked for human voice: chunks that contain voice are kept for later processing (conversion to text or any other user-defined processing), and the rest are discarded.
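
For intuition, here is a simplified sketch of that keep-or-discard loop. It is only an illustration of the idea, not the library's actual implementation; `chunks`, `has_voice`, and `voice_handler` stand in for the real internals described below.

```python
def process_stream(chunks, has_voice, voice_handler):
    """Illustrative only: buffer voiced chunks, flush them as one utterance."""
    buffered = []
    for chunk in chunks:             # each chunk is ~n seconds of audio
        if has_voice(chunk):
            buffered.append(chunk)   # part of an ongoing utterance, keep it
        elif buffered:
            voice_handler(buffered)  # silence after speech: the utterance ended
            buffered = []
        # silent chunks outside an utterance are simply discarded
```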

#### Constructor parameters
- `speech_handler`: a function called with the transcribed text of the human voice in the recorded audio as its only argument, `speech_handler(string speech)`.

- `on_listening_start`: a parameterless function that is called right after the Listener object starts collecting audio.

- `time_window`: an integer that specifies the chunk size of the collected audio in seconds; the default is `2`.

- `no_channels`: the number of audio channels used for recording; the default is `1`.

- `has_voice`: a function called on each recorded audio chunk to determine whether it contains human voice; it receives the chunk as a `numpy.ndarray`, its only argument. [Silero](https://github.com/snakers4/silero-vad) is used by default, `has_voice(numpy.ndarray chunk)`.

- `voice_handler`: a function used to process [an utterance](https://en.wikipedia.org/wiki/Utterance), i.e. a continuous segment of speech; it receives a list of audio chunks as its only argument, `voice_handler(list<numpy.ndarray>)`.

- `voice_to_speech`: a function used to convert human voice to text; [whisper](https://github.com/openai/whisper) is used by default, `voice_to_speech(list<numpy.ndarray>)`.

- `use_fp16`: a boolean flag indicating whether the voice detection and speech-to-text models should use [half precision](https://en.wikipedia.org/wiki/Half-precision_floating-point_format) arithmetic to save memory and reduce latency. The default is `True` if CUDA is available; half precision has no effect on CPUs at the time of this writing, so it defaults to `False` in CPU environments.

- `en_only`: a flag indicating that only English will be spoken in the collected audio; this is used to pick the best whisper model for speech-to-text conversion.

- `show_model_download`: a flag specifying whether a progress bar should be displayed while downloading models.

- `device`: the device on which the voice detection and speech-to-text models run; the default is `cuda` if available, else `cpu`. See the sketch after this list for an example that combines several of these parameters.
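
To put a few of these together, here is a hedged sketch that assumes only the parameters documented above; the `loud_enough` threshold is a made-up stand-in for a real voice detector:

```python
import numpy as np

from listener import Listener

def loud_enough(chunk: np.ndarray) -> bool:
    # naive stand-in for the default Silero VAD: treat anything above
    # an arbitrary mean-amplitude threshold as voice
    return float(np.abs(chunk).mean()) > 0.01

listener = Listener(
    speech_handler=print,
    on_listening_start=lambda: print("listening..."),
    time_window=2,          # 2-second chunks (the default)
    has_voice=loud_enough,  # replace the default voice detector
    en_only=True,           # English-only speech, picks a suitable whisper model
    device="cpu",           # or "cuda" if available
)
listener.listen()
```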

## Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

## License

[MIT](https://choosealicense.com/licenses/mit/)

            
