| Field | Value |
| --- | --- |
| Name | speech_neuron |
| Version | 0.0.5 |
| home_page | None |
| Summary | None |
| upload_time | 2025-02-15 14:10:29 |
| maintainer | None |
| docs_url | None |
| author | C. Thomas Brittain |
| requires_python | <3.13,>=3.10 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# speech_neuron
A text-to-speech server built with FastAPI and the [Kokoro-TTS](https://huggingface.co/spaces/hexgrad/Kokoro-TTS) models.
## Other Neuron Packages
- [Listening Neuron](https://github.com/Ladvien/listening_neuron)
<!-- start quick_start -->
## Quick Start
Install the package:
```sh
pip install speech_neuron
```
Create a `config.yaml` file; see [Configuration](#configuration) below for its contents.
Create a `main.py` file with the following content:
```py
import os

import uvicorn
import yaml
from fastapi import FastAPI

from speech_neuron import NodeConfig, SpeechNeuronServer

# Load and validate the YAML configuration.
CONFIG_PATH = os.environ.get("NODE_CONFIG_PATH", "config.yaml")
with open(CONFIG_PATH, "r") as f:
    config = NodeConfig(**yaml.safe_load(f))

# Mount the speech endpoints on a FastAPI app.
app = FastAPI()
speech_neuron = SpeechNeuronServer(config)
app.include_router(speech_neuron.router)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
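Alternatively, the same app can be served with the `uvicorn` CLI instead of the `__main__` block (assuming the file is named `main.py` and exposes the `app` object):
```sh
uvicorn main:app --host 0.0.0.0 --port 8000
```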
Create a client file `client.py` with the following content:
```py
import requests
import io
import sounddevice as sd
import soundfile as sf
from datetime import datetime

HOST = "http://0.0.0.0:8000"  # <--- Change to your server IP
url = f"{HOST}/node/speech"

start = datetime.now()
response = requests.get(
    url,
    params={
        "text": """Anyway, it was the Saturday of the football game with Saxon Hall.
        The game with Saxon Hall was supposed to be a very big deal around Pencey.
        """,
        "voice": "af_bella",
        "speed": 1.1,
        "split_pattern": r"\n+",
    },
    stream=True,
)

# Read the streamed response into memory
audio_buffer = io.BytesIO()
for chunk in response.iter_content(chunk_size=4096):
    if chunk:
        audio_buffer.write(chunk)

# Play the audio in real-time
audio_buffer.seek(0)  # Reset buffer for reading
data, samplerate = sf.read(audio_buffer)
sd.play(data, samplerate)
sd.wait()  # Wait for audio to finish playing

print(f"Time taken: {datetime.now() - start}")
```
Start the server in the background:
```sh
python main.py &
```
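Because `main.py` reads the `NODE_CONFIG_PATH` environment variable, you can point the server at a different configuration file without editing the code (the path below is just an example):
```sh
NODE_CONFIG_PATH=/path/to/another_config.yaml python main.py &
```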
Then run the client:
```sh
python client.py
```
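For a quick sanity check without the Python client, the same endpoint can also be exercised with `curl` (a sketch assuming the default `wav` response format shown in [Configuration](#configuration)):
```sh
curl -G "http://0.0.0.0:8000/node/speech" \
  --data-urlencode "text=Hello from speech_neuron." \
  --data-urlencode "voice=af_bella" \
  --data-urlencode "speed=1.1" \
  --output sample.wav
```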
<!-- end quick_start -->
<!-- start config -->
## Configuration
Create a `config.yaml` file with the following content:
```yaml
name: "speech_node"
# "kokoro-v1.0.fp16-gpu.onnx",
# "kokoro-v1.0.fp16.onnx",
# "kokoro-v1.0.int8.onnx",
# "kokoro-v1.0.onnx"
model_name: kokoro-v1.0.int8.onnx
voices_name: voices-v1.0.bin
response:
# TODO: type: stream
sample_rate: 24000
format: wav
compression_level: 0
pipeline:
model:
device: cpu # cpu or cuda
use_transformer: true
# Model configuration
# 'a' = American English
# 'b' = British English
# 'e' = Spanish
# 'f' = French
# 'h' = Hindi
# 'i' = Italian
# 'p' = Portuguese
# 'j' = Japanese
# 'z' = Chinese
language_code: en-us
# Request defaults
speed: 1.0 # Can be set during request
voice: "af_heart" # Can be set during request
split_pattern: "\n" # Can be set during request
```
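The fields marked "Can be set during request" are only defaults; a request that omits them should fall back to the values configured here (a minimal sketch, assuming the same `/node/speech` route used in the Quick Start client):
```py
import requests

# voice, speed, and split_pattern are omitted, so the server should
# use the defaults above (af_heart, 1.0, "\n").
response = requests.get(
    "http://0.0.0.0:8000/node/speech",
    params={"text": "Hello with the configured defaults."},
    stream=True,
)

with open("defaults.wav", "wb") as f:
    for chunk in response.iter_content(chunk_size=4096):
        if chunk:
            f.write(chunk)
```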
<!-- end config -->
---
## Dependencies
### Linux
#### Ubuntu
```sh
sudo apt update
sudo apt install libglslang-dev
```
#### Manjaro
```sh
sudo pacman -S ffmpeg glslang

# Check which libglslang resource-limits library versions are installed
find /usr -name "libglslang-default-resource-limits.so*"
# If the versions do not match, add a compatibility symlink
sudo ln -s /usr/lib/libglslang-default-resource-limits.so.15 /usr/lib/libglslang-default-resource-limits.so.14

# Check which libSPIRV versions are installed
find /usr -name "libSPIRV.so*"
# If the versions do not match, refresh the linker cache
sudo ldconfig
```
If NVIDIA GPU acceleration is not working, reload the `nvidia_uvm` kernel module:
```sh
sudo modprobe -r nvidia_uvm
sudo modprobe nvidia_uvm
```
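Afterwards, confirm the GPU is visible again (standard NVIDIA tooling, not part of this package):
```sh
nvidia-smi
```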
### macOS
```sh
brew install ffmpeg
brew install glslang
```