| Field | Value |
| --- | --- |
| Name | listening_neuron |
| Version | 0.0.35 |
| Summary | None |
| Home page | None |
| Author | C. Thomas Brittain |
| Maintainer | None |
| Requires Python | >=3.10 |
| License | MIT |
| Keywords | None |
| Requirements | No requirements were recorded. |
| Upload time | 2025-02-15 13:34:57 |
<!-- start setup -->
## Setup
A simple toolset for using [Whisper](https://openai.com/index/whisper/) models to transcribe audio in real-time.
`listening_neuron` is a wrapper around the Whisper library that provides a simple interface for transcribing audio in real time. The module is designed to be versatile, piping transcriptions to local or remote endpoints for further processing. All aspects of transcription can be configured via a config file (see the Config section below).
## Other Neuron Modules
- [Speaking Neuron](https://github.com/Ladvien/speech_neuron) - A simple text-to-speech server using Kokoro models.
### Prerequisites
#### macOS
1. Install PortAudio: `brew install portaudio`
<!-- end setup -->
<!-- start quick_start -->
## Quick Start
Install the package and create a config file.
```bash
pip install listening_neuron
```
Next, create a `config.yaml` file using the configuration options described in the Config section below; a minimal sketch follows.
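The sketch below only keeps a handful of keys from the full example in the Config section; whether omitted keys fall back to sensible defaults depends on the library version, so treat this as a starting point rather than a reference.
```yaml
# Minimal sketch -- see the Config section below for every available option.
mic_config:
  mic_name: "Jabra SPEAK 410 USB: Audio (hw:3,0)"  # Linux only
  sample_rate: 16000
  energy_threshold: 3000

listening_neuron:
  record_timeout: 2
  phrase_timeout: 3
  in_memory: True
  transcribe_config:
    model: base.en
    word_timestamps: True

logging_config:
  level: INFO
  filepath: "talking.log"
```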
Below is a basic example of how to use the listening neuron to transcribe audio in real-time.
```python
from listening_neuron import Config, RecordingDevice, ListeningNeuron, TranscriptionResult

def transcription_callback(text: str, result: TranscriptionResult) -> None:
    print("Here's what I heard: ")
    print(result)

config = Config.load("config.yaml")

recording_device = RecordingDevice(config.mic_config)
listening_neuron = ListeningNeuron(
    config.listening_neuron,
    recording_device,
)

listening_neuron.listen(transcription_callback)
```
The `transcription_callback` function is called when a transcription is completed.
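Any callable with the same signature works. As a sketch, a callback that appends each finished transcription to a file, relying only on the `(text, result)` signature shown above:
```python
from datetime import datetime

from listening_neuron import TranscriptionResult

def transcription_callback(text: str, result: TranscriptionResult) -> None:
    # Append each completed transcription with a local timestamp.
    with open("transcripts.txt", "a", encoding="utf-8") as f:
        f.write(f"{datetime.now().isoformat()}\t{text}\n")
```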
<!-- end quick_start -->
## Documentation
- [Documentation](https://listening_neuron.readthedocs.io/en/latest/)
## Attribution
The core of this code was heavily influenced by, and includes some code from:
- https://github.com/davabase/whisper_real_time/tree/master
- https://github.com/openai/whisper/discussions/608
Huge thanks to [davabase](https://github.com/davabase) for the initial code! All I've done is wrap it up in a nice package.
<!-- start advanced_usage -->
### Send Text to Web API
```py
import requests

from listening_neuron import Config, RecordingDevice, ListeningNeuron, TranscriptionResult

def transcription_callback(text: str, result: TranscriptionResult) -> None:
    # Send the transcription to a REST API
    requests.post(
        "http://localhost:5000/transcribe",
        json={"text": text, "result": result.to_dict()}
    )

config = Config.load("config.yaml")
recording_device = RecordingDevice(config.mic_config)
listening_neuron = ListeningNeuron(
    config.listening_neuron,
    recording_device,
)
listening_neuron.listen(transcription_callback)
```
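On the receiving side, any web framework will do. Here is a minimal sketch of a matching `/transcribe` endpoint using Flask; Flask is an assumption for illustration, not something `listening_neuron` depends on:
```py
# Minimal sketch of the receiving endpoint; Flask is an assumption,
# not a dependency of listening_neuron.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/transcribe", methods=["POST"])
def transcribe():
    payload = request.get_json()
    # payload["text"] is the transcribed string; payload["result"] is the full dict.
    print(payload["text"])
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    app.run(port=5000)
```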
The `TranscriptionResult` object has a `.to_dict()` method that converts the object to a dictionary, which can be serialized to JSON.
```json
{
  "text": "This is only a test of words.",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 1.8,
      "text": " This is only a test of words.",
      "tokens": [50363, 770, 318, 691, 257, 1332, 286, 2456, 13, 50463],
      "temperature": 0.0,
      "avg_logprob": -0.43947878750887787,
      "compression_ratio": 0.8285714285714286,
      "no_speech_prob": 0.0012085052439942956,
      "words": [
        {"word": " This", "start": 0.0, "end": 0.36, "probability": 0.750191330909729},
        {"word": " is", "start": 0.36, "end": 0.54, "probability": 0.997636079788208},
        {"word": " only", "start": 0.54, "end": 0.78, "probability": 0.998072624206543},
        {"word": " a", "start": 0.78, "end": 1.02, "probability": 0.9984667897224426},
        {"word": " test", "start": 1.02, "end": 1.28, "probability": 0.9980781078338623},
        {"word": " of", "start": 1.28, "end": 1.48, "probability": 0.99817955493927},
        {"word": " words.", "start": 1.48, "end": 1.8, "probability": 0.9987621307373047}
      ]
    }
  ],
  "language": "en",
  "processing_secs": 5.410359,
  "local_starttime": "2025-01-31T06:19:03.322642-06:00",
  "processing_rolling_avg_secs": 22.098183908976
}
```
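Because the dictionary mirrors the JSON above, fields can be pulled out directly inside the callback. A sketch relying only on the keys shown above:
```py
import json

from listening_neuron import TranscriptionResult

def transcription_callback(text: str, result: TranscriptionResult) -> None:
    data = result.to_dict()
    print(json.dumps(data, indent=2))          # full result as JSON
    for segment in data["segments"]:
        for word in segment.get("words", []):  # present when word_timestamps is True
            print(f'{word["word"]!r}: {word["start"]:.2f}s -> {word["end"]:.2f}s')
```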
<!-- end advanced_usage -->
<!-- start config -->
## Config
The config is a `yaml` file that controls all aspects of audio recording, model selection, and transcription output formatting. Below is a full example config file.
```yaml
mic_config:
  mic_name: "Jabra SPEAK 410 USB: Audio (hw:3,0)" # Linux only
  sample_rate: 16000
  energy_threshold: 3000 # 0-4000

listening_neuron:
  record_timeout: 2 # 0-10
  phrase_timeout: 3 # 0-10
  in_memory: True
  transcribe_config:
    # 'tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small',
    # 'medium.en', 'medium', 'large-v1', 'large-v2', 'large-v3',
    # 'large', 'large-v3-turbo', 'turbo'
    model: medium.en

    # Whether to display the text being decoded to the console.
    # If True, displays all the details. If False, displays
    # minimal details. If None, does not display anything.
    verbose: True

    # Temperature for sampling. It can be a tuple of temperatures,
    # which will be used successively upon failures according to
    # either compression_ratio_threshold or logprob_threshold.
    temperature: "(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)" # "(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)"

    # If the gzip compression ratio is above this value,
    # treat the transcription as failed.
    compression_ratio_threshold: 2.4 # 2.4

    # If the average log probability over sampled tokens is below
    # this value, treat the transcription as failed.
    logprob_threshold: -1.0 # -1.0

    # If the no_speech probability is higher than this value AND
    # the average log probability over sampled tokens is below
    # logprob_threshold, consider the segment silent.
    no_speech_threshold: 0.6 # 0.6

    # If True, the previous output of the model is provided as a
    # prompt for the next window; disabling may make the text
    # inconsistent across windows, but the model becomes less
    # prone to getting stuck in a failure loop, such as repetition
    # looping or timestamps going out of sync.
    condition_on_previous_text: True # True

    # Extract word-level timestamps using the cross-attention
    # pattern and dynamic time warping, and include the timestamps
    # for each word in each segment.
    # NOTE: Setting this to True also adds word-level data to the
    # output, which can be useful for downstream processing. E.g.,
    # {
    #   'word': 'test',
    #   'start': np.float64(1.0),
    #   'end': np.float64(1.6),
    #   'probability': np.float64(0.8470910787582397)
    # }
    word_timestamps: True # False

    # If word_timestamps is True, merge these punctuation symbols
    # with the next word.
    prepend_punctuations: '"''“¿([{-'

    # If word_timestamps is True, merge these punctuation symbols
    # with the previous word.
    append_punctuations: '"''.。,,!!??::”)]}、'

    # Optional text to provide as a prompt for the first window.
    # This can be used to provide, or "prompt-engineer", a context
    # for transcription, e.g. custom vocabularies or proper nouns,
    # to make it more likely those words are predicted correctly.
    initial_prompt: "" # ""

    # Comma-separated list of start,end,start,end,... timestamps
    # (in seconds) of clips to process. The last end timestamp
    # defaults to the end of the file.
    clip_timestamps: "0" # "0"

    # When word_timestamps is True, skip silent periods longer
    # than this threshold (in seconds) when a possible
    # hallucination is detected.
    hallucination_silence_threshold: None # float | None

    # Keyword arguments used to construct DecodingOptions instances.
    # TODO: How can DecodingOptions work?

logging_config:
  level: INFO # DEBUG, INFO, WARNING, ERROR, CRITICAL
  filepath: "talking.log"
  log_entry_format: "%(asctime)s - %(levelname)s - %(message)s"
  date_format: "%Y-%m-%d %H:%M:%S"
```
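The `log_entry_format` and `date_format` values appear to be standard Python `logging` format strings, so they can be previewed with the standard library alone. A sketch unrelated to `listening_neuron` itself, using the same strings as the example config:
```python
# Sketch: preview the logging_config format strings with the stdlib.
import logging

logging.basicConfig(
    filename="talking.log",
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
logging.info("listening_neuron logging format preview")
```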
<!-- end config -->