# Cartesia Sonic TTS Wrapper
**You need your own [API key](https://play.cartesia.ai/keys) to use the demo.**
<a href="https://huggingface.co/spaces/daswer123/sonic-tts-webui" style='padding-left: 0.5rem;'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-orange'></a> <a href='https://colab.research.google.com/drive/1-AWeWhPvCTBX0KfMtgtMk10uPU05ihoA?usp=sharing' style='padding-left: 0.5rem;'></a>
## About
A simple and powerful wrapper for the [Cartesia Sonic Text-to-Speech (TTS) API](https://www.cartesia.ai/sonic), providing an easy-to-use interface for generating speech from text in multiple languages with advanced features. The package includes:
- A Python library for developers.
- A Command-Line Interface (CLI) for terminal interaction.
- A Gradio web interface for user-friendly interaction.
**Note**: To use this wrapper, you need a valid API key from Cartesia. A subscription is required to access the Sonic TTS API. Visit [Cartesia Sonic](https://www.cartesia.ai/sonic) for more information.
## Table of Contents
- [Features](#features)
- [Installation](#installation)
- [Getting Started](#getting-started)
  - [Setting Up the API Key](#setting-up-the-api-key)
- [Usage](#usage)
  - [As a Python Library](#as-a-python-library)
    - [Initializing the Voice Manager](#initializing-the-voice-manager)
    - [Voice Management](#voice-management)
    - [Text-to-Speech Generation](#text-to-speech-generation)
  - [Command-Line Interface (CLI)](#command-line-interface-cli)
    - [Commands and Usage](#commands-and-usage)
  - [Gradio Web Interface](#gradio-web-interface)
    - [Running the Interface](#running-the-interface)
    - [Online Demo](#online-demo)
- [Examples](#examples)
  - [Generating Speech with Emotions](#generating-speech-with-emotions)
  - [Creating and Using a Custom Voice](#creating-and-using-a-custom-voice)
- [Notes](#notes)
- [TODO](#todo)
- [License](#license)
## Features
- **Easy-to-use Python Wrapper**: Simplifies interaction with the Cartesia Sonic TTS API.
- **Text-to-Speech Generation**:
  - Supports multiple languages.
  - Speed control from very slow to very fast.
  - Emotion control with adjustable intensity.
  - Text improvement options for better TTS results.
- **Voice Management**:
  - List available voices with filtering options.
  - Create custom voices from audio files.
  - Get detailed information about voices.
- **Command-Line Interface (CLI)**: Interact with the TTS functionality via the terminal.
- **Gradio Web Interface**: User-friendly web application for interactive use.
## Installation
Install the `sonic-wrapper` package via pip:
```bash
pip install sonic-wrapper
```
**Note**: The package requires Python 3.9 or higher.
### Additional Dependencies for Gradio Interface
If you plan to use the Gradio web interface, install Gradio:
```bash
pip install "gradio>=5.0.0"  # quoted so the shell does not treat > as a redirect
```
## Getting Started
### Setting Up the API Key
To use the Cartesia Sonic TTS API, you need a valid API key. Obtain an API key by subscribing to the service on the [Cartesia Sonic](https://www.cartesia.ai/sonic) website.
Once you have your API key, you can set it up:
- **Using the Python Library**: Provide the API key when initializing the `CartesiaVoiceManager`.
- **Using the CLI**: Set the API key using the `set-api-key` command.
- **Using the Gradio Interface**: Enter the API key in the provided field.
The API key is stored in a `.env` file for subsequent use.
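The lookup order described above (environment variable first, then a `.env` file) can be sketched with the standard library alone. This is an illustrative approximation, not the wrapper's own code: `load_api_key` is a hypothetical helper, and the package itself lists `python-dotenv` as a dependency, so its real parsing may differ.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(env_path: str = ".env",
                 var_name: str = "CARTESIA_API_KEY") -> Optional[str]:
    """Return the API key from the environment, falling back to a .env file.

    Hypothetical helper for illustration; not part of sonic_wrapper.
    """
    # An exported environment variable takes priority over the file.
    value = os.environ.get(var_name)
    if value:
        return value

    path = Path(env_path)
    if not path.is_file():
        return None

    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == var_name:
            # Drop optional surrounding quotes; treat an empty value as missing.
            return val.strip().strip("'\"") or None
    return None
```

A caller could check `load_api_key()` before constructing the manager and show a clear error message when it returns `None`.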
## Usage
### As a Python Library
#### Initializing the Voice Manager
```python
from sonic_wrapper import CartesiaVoiceManager
# Initialize the manager with your API key
manager = CartesiaVoiceManager(api_key='your_api_key_here')
```
Alternatively, if you have set the `CARTESIA_API_KEY` environment variable or stored the API key in a `.env` file, you can initialize without passing the API key:
```python
manager = CartesiaVoiceManager()
```
#### Voice Management
**Listing Available Voices:**
```python
voices = manager.list_available_voices()
for voice in voices:
    print(f"ID: {voice['id']}, Name: {voice['name']}, Language: {voice['language']}")
```
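Once the list is in hand, plain Python is enough to organize it. The sketch below groups voice names by language, assuming each entry is a dict with the `id`, `name`, and `language` keys used in the loop above. `group_voices_by_language` and the sample entries are made up for illustration and are not real Cartesia voices.

```python
from collections import defaultdict


def group_voices_by_language(voices):
    """Map each language code to a sorted list of voice names."""
    grouped = defaultdict(list)
    for voice in voices:
        # Fall back to "unknown" if an entry has no language field.
        grouped[voice.get("language", "unknown")].append(voice["name"])
    return {lang: sorted(names) for lang, names in sorted(grouped.items())}


# Hypothetical sample data in the same shape as the entries printed above.
sample_voices = [
    {"id": "v1", "name": "Narrator", "language": "en"},
    {"id": "v2", "name": "Assistant", "language": "en"},
    {"id": "v3", "name": "Erzaehler", "language": "de"},
]

by_language = group_voices_by_language(sample_voices)
```

With a real manager, the result of `manager.list_available_voices()` would be passed in place of `sample_voices`.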
**Filtering Voices by Language and Accessibility:**
```python
from sonic_wrapper import VoiceAccessibility
voices = manager.list_available_voices(
    languages=['en'],
    accessibility=VoiceAccessibility.ONLY_PUBLIC
)
```
**Getting Voice Information:**
```python
voice_info = manager.get_voice_info('voice_id')
print(voice_info)
```
**Creating a Custom Voice:**
```python
voice_id = manager.create_custom_voice(
    name='My Custom Voice',
    source='path/to/your_voice_sample.wav',
    language='en',
    description='This is a custom voice created from my own sample.'
)
```
#### Text-to-Speech Generation
**Setting the Voice:**
```python
manager.set_voice('voice_id')
```
**Adjusting Speed and Emotions:**
```python
# Set speech speed (-1.0 to 1.0)
manager.speed = 0.5 # Faster speech
# Set emotions
emotions = [
    {'name': 'positivity', 'level': 'high'},
    {'name': 'surprise', 'level': 'medium'}
]
manager.set_emotions(emotions)
```
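Since speed is documented as a value between -1.0 and 1.0, it can help to normalize user input before assigning it. The sketch below is one way to do that. `clamp_speed` is a hypothetical helper, and whether the wrapper itself clamps or rejects out-of-range values is not stated here.

```python
def clamp_speed(value: float, low: float = -1.0, high: float = 1.0) -> float:
    """Clamp a requested speed into the documented -1.0..1.0 range."""
    return max(low, min(high, float(value)))
```

Usage would then look like `manager.speed = clamp_speed(user_value)`.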
**Generating Speech:**
```python
output_file = manager.speak(
    text='Hello, world!',
    output_file='output.wav'
)
print(f"Audio saved to {output_file}")
```
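For longer scripts, the same call can be repeated once per line of text. The following is a minimal sketch that relies only on the `speak(text=..., output_file=...)` call shown above. `speak_lines` and its file naming scheme are assumptions made for illustration, not part of the package.

```python
from pathlib import Path


def speak_lines(manager, lines, out_dir="out"):
    """Synthesize each non-empty line to its own numbered WAV file.

    `manager` is any object exposing speak(text=..., output_file=...),
    such as a configured CartesiaVoiceManager.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    saved = []
    for index, text in enumerate(lines, start=1):
        text = text.strip()
        if not text:
            # Skip blank lines but keep numbering aligned with the input.
            continue
        target = out_path / f"line_{index:03d}.wav"
        manager.speak(text=text, output_file=str(target))
        saved.append(str(target))
    return saved
```

Because the function only depends on the `speak` method, it can be exercised with a stand-in object in tests without spending API credits.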
**Improving Text Before Synthesis:**
```python
from sonic_wrapper import improve_tts_text
text = 'Your raw text here.'
improved_text = improve_tts_text(text, language='en')
manager.speak(text=improved_text, output_file='improved_output.wav')
```
### Command-Line Interface (CLI)
The package includes a CLI tool for interacting with the TTS functionality directly from the terminal.
#### Commands and Usage
**Set API Key**
Set your Cartesia API key:
```bash
python -m sonic_wrapper.cli set-api-key your_api_key_here
```
**List Voices**
List all available voices:
```bash
python -m sonic_wrapper.cli list-voices
```
With filters:
```bash
python -m sonic_wrapper.cli list-voices --language en --accessibility api
```
**Generate Speech**
Generate speech from text using a specific voice:
```bash
python -m sonic_wrapper.cli generate-speech --text "Hello, world!" --voice "Voice Name or ID"
```
Additional options:
- **Specify Output File:**

  ```bash
  --output output.wav
  ```

- **Adjust Speech Speed:**

  ```bash
  --speed 0.5 # Speed ranges from -1.0 (slowest) to 1.0 (fastest)
  ```

- **Add Emotions:**

  ```bash
  --emotions "positivity:medium" "surprise:high"
  ```

  Valid emotions: `anger`, `positivity`, `surprise`, `sadness`, `curiosity`

  Valid intensities: `lowest`, `low`, `medium`, `high`, `highest`
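The `name:level` strings accepted by `--emotions` correspond to the `{'name': ..., 'level': ...}` dictionaries used with `set_emotions` in the Python library. A small converter with validation against the lists above might look like the sketch below. `parse_emotions` is a hypothetical helper, and the CLI's own parsing may behave differently.

```python
VALID_EMOTIONS = {"anger", "positivity", "surprise", "sadness", "curiosity"}
VALID_LEVELS = {"lowest", "low", "medium", "high", "highest"}


def parse_emotions(specs):
    """Convert ['positivity:medium', ...] into a list of name/level dicts."""
    emotions = []
    for spec in specs:
        name, sep, level = spec.partition(":")
        name, level = name.strip().lower(), level.strip().lower()
        # Reject specs with no colon, an unknown emotion, or an unknown level.
        if not sep or name not in VALID_EMOTIONS or level not in VALID_LEVELS:
            raise ValueError(f"Invalid emotion spec: {spec!r}")
        emotions.append({"name": name, "level": level})
    return emotions
```

The returned list has the same shape as the `emotions` variable passed to `manager.set_emotions(...)` in the library section.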
**Create Custom Voice**
Create a custom voice from an audio file:
```bash
python -m sonic_wrapper.cli create-voice --name "My Custom Voice" --source path/to/audio.wav
```
### Gradio Web Interface
The Gradio interface provides a user-friendly web application for interacting with the TTS functionality.
#### Running the Interface
1. **Install Gradio** (if not already installed). Quote the requirement so the shell does not interpret `>` as an output redirect:

   ```bash
   pip install "gradio>=5.0.0"
   ```

2. **Run the Application**:

   ```bash
   python app.py
   ```

3. **Access the Web Interface**:

   Open the provided local URL in your web browser.
#### Online Demo
Try the Gradio interface online without installing anything:
[Hugging Face Space demo](https://huggingface.co/spaces/daswer123/sonic-tts-webui)
## Examples
### Generating Speech with Emotions
```bash
python -m sonic_wrapper.cli generate-speech \
  --text "I'm so excited to share this news with you!" \
  --voice "Enthusiastic Voice" \
  --emotions "positivity:high" "surprise:medium" \
  --speed 0.5 \
  --output excited_message.wav
```
### Creating and Using a Custom Voice
**Step 1: Create a Custom Voice**
```bash
python -m sonic_wrapper.cli create-voice \
  --name "Custom Voice" \
  --source path/to/your_voice_sample.wav \
  --description "A custom voice created from my own audio sample."
```
**Step 2: Generate Speech with the Custom Voice**
```bash
python -m sonic_wrapper.cli generate-speech \
  --text "This is my custom voice." \
  --voice "Custom Voice" \
  --output custom_voice_output.wav
```
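When these CLI calls are driven from another Python script, building the argument list programmatically avoids shell quoting problems. The sketch below uses only the flags shown in the commands above. `build_generate_command` is a hypothetical helper and not something the package provides.

```python
import sys


def build_generate_command(text, voice, output=None, speed=None, emotions=None):
    """Build an argv list for the generate-speech command shown above."""
    cmd = [
        sys.executable, "-m", "sonic_wrapper.cli", "generate-speech",
        "--text", text,
        "--voice", voice,
    ]
    if emotions:
        # Each emotion is its own argument, e.g. "positivity:high".
        cmd += ["--emotions", *emotions]
    if speed is not None:
        cmd += ["--speed", str(speed)]
    if output:
        cmd += ["--output", output]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Because no shell is involved, text containing quotes or spaces needs no extra escaping.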
## Notes
- **API Key**: A valid Cartesia API key is required to use this wrapper. Set your API key using the CLI or in your code. Visit [Cartesia Sonic](https://www.cartesia.ai/sonic) to obtain an API key.
- **Subscription**: Access to the Cartesia Sonic TTS API requires a subscription. Please refer to their [pricing page](https://www.cartesia.ai/sonic/pricing) for more details.
- **Voice Mixing**: Currently, voice mixing functionality is not available in the CLI and Gradio versions but is available in the Python library.
- **Voice Embeddings**: The wrapper handles voice embeddings for you, storing them locally for faster access.
## TODO
- [ ] Implement voice mixing functionality in Gradio interface and CLI.
- [ ] Enhance error handling and logging.
- [ ] Improve documentation with more examples and use cases.
- [ ] Add support for additional languages and voices as they become available.
## License
This project is licensed under the MIT License.
Raw data
{
"_id": null,
"home_page": null,
"name": "sonic-wrapper",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "daswer123 <daswerq123@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/5a/9c/0cf67c3dbbae732df3bfd04b50ebf57efb31c1497fec4d7e4ba2ca664b98/sonic_wrapper-0.1.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A simple wrapper for Cartesia Sonic TTS",
"version": "0.1.5",
"project_urls": {
"Bug Tracker": "https://github.com/daswer123/sonic_tts_api_wrapper/issues",
"Homepage": "https://github.com/daswer123/sonic_tts_api_wrapper"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "84a3a62800d7bd5111c1a1966ade1fdea961d42112d4a337d418f0d306c896f7",
"md5": "070c2cf50a21ea1a52e7f9d0caa353d3",
"sha256": "43178d49f071c7a93e55998c141ccf709c8fe07df8fcac17364a6a10f7431cfe"
},
"downloads": -1,
"filename": "sonic_wrapper-0.1.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "070c2cf50a21ea1a52e7f9d0caa353d3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 21077,
"upload_time": "2024-12-25T15:28:18",
"upload_time_iso_8601": "2024-12-25T15:28:18.929107Z",
"url": "https://files.pythonhosted.org/packages/84/a3/a62800d7bd5111c1a1966ade1fdea961d42112d4a337d418f0d306c896f7/sonic_wrapper-0.1.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "5a9c0cf67c3dbbae732df3bfd04b50ebf57efb31c1497fec4d7e4ba2ca664b98",
"md5": "df7e4fe99207f34daf26c116759fd0e5",
"sha256": "14d1afcfb4758a5bba5e892a118ee6535dd930a305fee1abddec641b481625eb"
},
"downloads": -1,
"filename": "sonic_wrapper-0.1.5.tar.gz",
"has_sig": false,
"md5_digest": "df7e4fe99207f34daf26c116759fd0e5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 20349,
"upload_time": "2024-12-25T15:28:21",
"upload_time_iso_8601": "2024-12-25T15:28:21.303191Z",
"url": "https://files.pythonhosted.org/packages/5a/9c/0cf67c3dbbae732df3bfd04b50ebf57efb31c1497fec4d7e4ba2ca664b98/sonic_wrapper-0.1.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-25 15:28:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "daswer123",
"github_project": "sonic_tts_api_wrapper",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "cartesia",
"specs": []
},
{
"name": "tqdm",
"specs": []
},
{
"name": "loguru",
"specs": []
},
{
"name": "gradio",
"specs": [
[
">=",
"5.0.0"
]
]
},
{
"name": "python-dotenv",
"specs": []
}
],
"lcname": "sonic-wrapper"
}