| Field | Value |
| --- | --- |
| Name | speech_neuron |
| Version | 0.0.5 |
| home_page | None |
| Summary | None |
| upload_time | 2025-02-15 14:10:29 |
| maintainer | None |
| docs_url | None |
| author | C. Thomas Brittain |
| requires_python | <3.13,>=3.10 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# speech_neuron
A text-to-speech server built with FastAPI and the [Kokoro-TTS](https://huggingface.co/spaces/hexgrad/Kokoro-TTS) models.
## Other Neuron Packages
- [Listening Neuron](https://github.com/Ladvien/listening_neuron)
<!-- start quick_start -->
## Quick Start
Install the package:
```sh
pip install speech_neuron
```
Create a `config.yaml` file; see [Configuration](#configuration) below for its contents.
Create a `main.py` file with the following content:
```py
import os

import uvicorn
import yaml
from fastapi import FastAPI

from speech_neuron import NodeConfig, SpeechNeuronServer

# Load and validate the YAML configuration.
CONFIG_PATH = os.environ.get("NODE_CONFIG_PATH", "config.yaml")
with open(CONFIG_PATH, "r") as f:
    config = NodeConfig(**yaml.safe_load(f))

# Mount the speech endpoints on a FastAPI app.
app = FastAPI()
speech_neuron = SpeechNeuronServer(config)
app.include_router(speech_neuron.router)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
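Alternatively, the same app can be served with the `uvicorn` CLI instead of the `__main__` block (assuming the file is named `main.py` and exposes the `app` object):
```sh
uvicorn main:app --host 0.0.0.0 --port 8000
```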
Create a client file `client.py` with the following content:
```py
import requests
import io
import sounddevice as sd
import soundfile as sf
from datetime import datetime

HOST = "http://0.0.0.0:8000"  # <--- Change to your server IP
url = f"{HOST}/node/speech"

start = datetime.now()
response = requests.get(
    url,
    params={
        "text": """Anyway, it was the Saturday of the football game with Saxon Hall.
        The game with Saxon Hall was supposed to be a very big deal around Pencey.
        """,
        "voice": "af_bella",
        "speed": 1.1,
        "split_pattern": r"\n+",
    },
    stream=True,
)

# Read the streamed response into memory
audio_buffer = io.BytesIO()
for chunk in response.iter_content(chunk_size=4096):
    if chunk:
        audio_buffer.write(chunk)

# Play the audio in real-time
audio_buffer.seek(0)  # Reset buffer for reading
data, samplerate = sf.read(audio_buffer)
sd.play(data, samplerate)
sd.wait()  # Wait for audio to finish playing

print(f"Time taken: {datetime.now() - start}")
```
Start the server in the background:
```sh
python main.py &
```
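Because `main.py` reads the `NODE_CONFIG_PATH` environment variable, you can point the server at a different configuration file without editing the code (the path below is just an example):
```sh
NODE_CONFIG_PATH=/path/to/another_config.yaml python main.py &
```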
Then run the client:
```sh
python client.py
```
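For a quick sanity check without the Python client, the same endpoint can also be exercised with `curl` (a sketch assuming the default `wav` response format shown in [Configuration](#configuration)):
```sh
curl -G "http://0.0.0.0:8000/node/speech" \
  --data-urlencode "text=Hello from speech_neuron." \
  --data-urlencode "voice=af_bella" \
  --data-urlencode "speed=1.1" \
  --output sample.wav
```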
<!-- end quick_start -->
<!-- start config -->
## Configuration
Create a `config.yaml` file with the following content:
```yaml
name: "speech_node"
# "kokoro-v1.0.fp16-gpu.onnx",
# "kokoro-v1.0.fp16.onnx",
# "kokoro-v1.0.int8.onnx",
# "kokoro-v1.0.onnx"
model_name: kokoro-v1.0.int8.onnx
voices_name: voices-v1.0.bin
response:
# TODO: type: stream
sample_rate: 24000
format: wav
compression_level: 0
pipeline:
model:
device: cpu # cpu or cuda
use_transformer: true
# Model configuration
# 'a' = American English
# 'b' = British English
# 'e' = Spanish
# 'f' = French
# 'h' = Hindi
# 'i' = Italian
# 'p' = Portuguese
# 'j' = Japanese
# 'z' = Chinese
language_code: en-us
# Request defaults
speed: 1.0 # Can be set during request
voice: "af_heart" # Can be set during request
split_pattern: "\n" # Can be set during request
```
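The fields marked "Can be set during request" are only defaults; a request that omits them should fall back to the values configured here (a minimal sketch, assuming the same `/node/speech` route used in the Quick Start client):
```py
import requests

# voice, speed, and split_pattern are omitted, so the server should
# use the defaults above (af_heart, 1.0, "\n").
response = requests.get(
    "http://0.0.0.0:8000/node/speech",
    params={"text": "Hello with the configured defaults."},
    stream=True,
)

with open("defaults.wav", "wb") as f:
    for chunk in response.iter_content(chunk_size=4096):
        if chunk:
            f.write(chunk)
```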
<!-- end config -->
---
## Dependencies
### Linux
#### Ubuntu
```sh
sudo apt update
sudo apt install libglslang-dev
```
#### Manjaro
```sh
sudo pacman -S ffmpeg glslang

# Check which libglslang resource-limits library versions are installed
find /usr -name "libglslang-default-resource-limits.so*"
# If the versions do not match, add a compatibility symlink
sudo ln -s /usr/lib/libglslang-default-resource-limits.so.15 /usr/lib/libglslang-default-resource-limits.so.14

# Check which libSPIRV versions are installed
find /usr -name "libSPIRV.so*"
# If the versions do not match, refresh the linker cache
sudo ldconfig
```
If NVIDIA GPU acceleration is not working, reload the `nvidia_uvm` kernel module:
```sh
sudo modprobe -r nvidia_uvm
sudo modprobe nvidia_uvm
```
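Afterwards, confirm the GPU is visible again (standard NVIDIA tooling, not part of this package):
```sh
nvidia-smi
```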
### macOS
```sh
brew install ffmpeg
brew install glslang
```