sonic-wrapper

Name	sonic-wrapper
Version	0.1.5
home_page	None
Summary	A simple wrapper for Cartesia Sonic TTS
upload_time	2024-12-25 15:28:21
maintainer	None
docs_url	None
author	None
requires_python	>=3.9
license	None
keywords
VCS
bugtrack_url
requirements	cartesia tqdm loguru gradio python-dotenv

# Cartesia Sonic TTS Wrapper

**You need your own [API key](https://play.cartesia.ai/keys) to use the demo.**

<a href="https://huggingface.co/spaces/daswer123/sonic-tts-webui"  style='padding-left: 0.5rem;'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-orange'></a> <a href='https://colab.research.google.com/drive/1-AWeWhPvCTBX0KfMtgtMk10uPU05ihoA?usp=sharing' style='padding-left: 0.5rem;'></a>


## About

A simple wrapper for the [Cartesia Sonic Text-to-Speech (TTS) API](https://www.cartesia.ai/sonic) that generates speech from text in multiple languages, with speed control, emotion control, and voice management. The package includes:

- A Python library for developers.
- A Command-Line Interface (CLI) for terminal interaction.
- A Gradio web interface for user-friendly interaction.

**Note**: To use this wrapper, you need a valid API key from Cartesia. A subscription is required to access the Sonic TTS API. Visit [Cartesia Sonic](https://www.cartesia.ai/sonic) for more information.

## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Getting Started](#getting-started)
  - [Setting Up the API Key](#setting-up-the-api-key)
- [Usage](#usage)
  - [As a Python Library](#as-a-python-library)
    - [Initializing the Voice Manager](#initializing-the-voice-manager)
    - [Voice Management](#voice-management)
    - [Text-to-Speech Generation](#text-to-speech-generation)
  - [Command-Line Interface (CLI)](#command-line-interface-cli)
    - [Commands and Usage](#commands-and-usage)
  - [Gradio Web Interface](#gradio-web-interface)
    - [Running the Interface](#running-the-interface)
    - [Online Demo](#online-demo)
- [Examples](#examples)
  - [Generating Speech with Emotions](#generating-speech-with-emotions)
  - [Creating and Using a Custom Voice](#creating-and-using-a-custom-voice)
- [Notes](#notes)
- [TODO](#todo)
- [License](#license)

## Features

- **Easy-to-use Python Wrapper**: Simplifies interaction with the Cartesia Sonic TTS API.
- **Text-to-Speech Generation**:
  - Supports multiple languages.
  - Speed control from very slow to very fast.
  - Emotion control with adjustable intensity.
  - Text improvement options for better TTS results.
- **Voice Management**:
  - List available voices with filtering options.
  - Create custom voices from audio files.
  - Get detailed information about voices.
- **Command-Line Interface (CLI)**: Interact with the TTS functionality via the terminal.
- **Gradio Web Interface**: User-friendly web application for interactive use.

## Installation

Install the `sonic-wrapper` package via pip:

```bash
pip install sonic-wrapper
```

**Note**: The package requires Python 3.9 or higher.

### Additional Dependencies for Gradio Interface

If you plan to use the Gradio web interface, install Gradio:

```bash
pip install "gradio>=5.0.0"
```

## Getting Started

### Setting Up the API Key

To use the Cartesia Sonic TTS API, you need a valid API key. Obtain an API key by subscribing to the service on the [Cartesia Sonic](https://www.cartesia.ai/sonic) website.

Once you have your API key, you can set it up:

- **Using the Python Library**: Provide the API key when initializing the `CartesiaVoiceManager`.
- **Using the CLI**: Set the API key using the `set-api-key` command.
- **Using the Gradio Interface**: Enter the API key in the provided field.

The API key is stored in a `.env` file for subsequent use.

## Usage

### As a Python Library

#### Initializing the Voice Manager

```python
from sonic_wrapper import CartesiaVoiceManager

# Initialize the manager with your API key
manager = CartesiaVoiceManager(api_key='your_api_key_here')
```

Alternatively, if you have set the `CARTESIA_API_KEY` environment variable or stored the API key in a `.env` file, you can initialize without passing the API key:

```python
manager = CartesiaVoiceManager()
```
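If you would rather fail early with a clear message when no key is configured, the lookup can be done explicitly in your own code before constructing the manager. The following is a minimal sketch under that assumption: `resolve_api_key` is a hypothetical helper, not part of `sonic_wrapper`, and only the `CARTESIA_API_KEY` variable name is taken from the text above. It reads the process environment only and does not parse a `.env` file.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the explicit key if given, else CARTESIA_API_KEY from env."""
    key = (explicit or env.get("CARTESIA_API_KEY", "")).strip()
    if not key:
        # Hypothetical error message; the wrapper itself may behave differently.
        raise RuntimeError(
            "No Cartesia API key found: pass one explicitly "
            "or set CARTESIA_API_KEY."
        )
    return key
```

The result can then be passed on as `CartesiaVoiceManager(api_key=resolve_api_key())`.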

#### Voice Management

**Listing Available Voices:**

```python
voices = manager.list_available_voices()
for voice in voices:
    print(f"ID: {voice['id']}, Name: {voice['name']}, Language: {voice['language']}")
```
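When the voice list is long, grouping it by language can make it easier to scan. The sketch below assumes each voice is a dictionary with `id`, `name`, and `language` keys, as in the loop above; `group_voices_by_language` and the sample records are made up for illustration and are not part of the package.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_voices_by_language(voices: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each language code to the sorted names of its voices."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for voice in voices:
        grouped[voice["language"]].append(voice["name"])
    return {lang: sorted(names) for lang, names in grouped.items()}


# Made-up records shaped like the dictionaries printed above.
sample_voices = [
    {"id": "v1", "name": "Narrator", "language": "en"},
    {"id": "v2", "name": "Assistant", "language": "en"},
    {"id": "v3", "name": "Sprecher", "language": "de"},
]
print(group_voices_by_language(sample_voices))
```

In real use, the result of `manager.list_available_voices()` would take the place of `sample_voices`.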

**Filtering Voices by Language and Accessibility:**

```python
from sonic_wrapper import VoiceAccessibility

voices = manager.list_available_voices(
    languages=['en'],
    accessibility=VoiceAccessibility.ONLY_PUBLIC
)
```

**Getting Voice Information:**

```python
voice_info = manager.get_voice_info('voice_id')
print(voice_info)
```

**Creating a Custom Voice:**

```python
voice_id = manager.create_custom_voice(
    name='My Custom Voice',
    source='path/to/your_voice_sample.wav',
    language='en',
    description='This is a custom voice created from my own sample.'
)
```
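Before uploading a sample, it can be useful to check that the file opens and to see how long it is. This is an optional sanity check using Python's standard `wave` module, not a requirement stated by the wrapper or by Cartesia; `wav_summary` is a hypothetical helper, and `wave` only reads uncompressed PCM WAV files.

```python
import wave


def wav_summary(path: str) -> dict:
    """Return channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            # Duration is the frame count divided by frames per second.
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

Calling `wav_summary('path/to/your_voice_sample.wav')` before `create_custom_voice` surfaces unreadable or empty files locally rather than after an API call.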

#### Text-to-Speech Generation

**Setting the Voice:**

```python
manager.set_voice('voice_id')
```

**Adjusting Speed and Emotions:**

```python
# Set speech speed (-1.0 to 1.0)
manager.speed = 0.5  # Faster speech

# Set emotions
emotions = [
    {'name': 'positivity', 'level': 'high'},
    {'name': 'surprise', 'level': 'medium'}
]
manager.set_emotions(emotions)
```
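Settings that come from user input can be checked locally before they are handed to the manager. The sketch below is an assumption about how such a check might look: `check_speed` and `check_emotions` are hypothetical helpers, the speed range is the one given in the comment above, and the emotion names and levels are the ones this README lists for the CLI, which the library is assumed (not confirmed) to share.

```python
from typing import Dict, Iterable, List

# Values listed in this README for the CLI; assumed to apply to the library too.
VALID_EMOTIONS = {"anger", "positivity", "surprise", "sadness", "curiosity"}
VALID_LEVELS = {"lowest", "low", "medium", "high", "highest"}


def check_speed(speed: float) -> float:
    """Return speed as a float, or raise if it is outside -1.0..1.0."""
    if not -1.0 <= speed <= 1.0:
        raise ValueError(f"speed must be between -1.0 and 1.0, got {speed}")
    return float(speed)


def check_emotions(emotions: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the emotions as a list, or raise on an unknown name or level."""
    checked = []
    for item in emotions:
        if item.get("name") not in VALID_EMOTIONS:
            raise ValueError(f"unknown emotion: {item.get('name')!r}")
        if item.get("level") not in VALID_LEVELS:
            raise ValueError(f"unknown level: {item.get('level')!r}")
        checked.append(item)
    return checked
```

With these in place, `manager.speed = check_speed(value)` and `manager.set_emotions(check_emotions(items))` reject bad input before any request is made.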

**Generating Speech:**

```python
output_file = manager.speak(
    text='Hello, world!',
    output_file='output.wav'
)
print(f"Audio saved to {output_file}")
```
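For several lines of text, the same call can be repeated with one output file per line. The following sketch assumes only the `speak(text=..., output_file=...)` keywords shown above; `speak_all`, the `tts_out` directory, and the file naming scheme are hypothetical choices made for illustration.

```python
from pathlib import Path
from typing import Iterable, List


def speak_all(manager, texts: Iterable[str], out_dir: str = "tts_out") -> List[str]:
    """Synthesize each text to its own numbered WAV file and return the paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for index, text in enumerate(texts, start=1):
        path = target / f"line_{index:03d}.wav"
        # Same keyword arguments as the single-call example above.
        manager.speak(text=text, output_file=str(path))
        written.append(str(path))
    return written
```

Because the manager is passed in as an argument, the helper can be exercised with a stand-in object in tests without calling the API.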

**Improving Text Before Synthesis:**

```python
from sonic_wrapper import improve_tts_text

text = 'Your raw text here.'
improved_text = improve_tts_text(text, language='en')
manager.speak(text=improved_text, output_file='improved_output.wav')
```

### Command-Line Interface (CLI)

The package includes a CLI tool for interacting with the TTS functionality directly from the terminal.

#### Commands and Usage

**Set API Key**

Set your Cartesia API key:

```bash
python -m sonic_wrapper.cli set-api-key your_api_key_here
```

**List Voices**

List all available voices:

```bash
python -m sonic_wrapper.cli list-voices
```

With filters:

```bash
python -m sonic_wrapper.cli list-voices --language en --accessibility api
```

**Generate Speech**

Generate speech from text using a specific voice:

```bash
python -m sonic_wrapper.cli generate-speech --text "Hello, world!" --voice "Voice Name or ID"
```

Additional options:

- **Specify Output File:**

  ```bash
  --output output.wav
  ```

- **Adjust Speech Speed:**

  ```bash
  --speed 0.5  # Speed ranges from -1.0 (slowest) to 1.0 (fastest)
  ```

- **Add Emotions:**

  ```bash
  --emotions "positivity:medium" "surprise:high"
  ```

  Valid emotions: `anger`, `positivity`, `surprise`, `sadness`, `curiosity`

  Valid intensities: `lowest`, `low`, `medium`, `high`, `highest`
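The `name:level` strings accepted by `--emotions` correspond to the `{'name': ..., 'level': ...}` dictionaries used with the Python library. The sketch below shows one way to convert between the two in your own scripts; `parse_emotion_specs` is a hypothetical helper and is not necessarily how the CLI parses its arguments.

```python
from typing import Dict, Iterable, List


def parse_emotion_specs(specs: Iterable[str]) -> List[Dict[str, str]]:
    """Turn 'name:level' strings into dictionaries with name and level keys."""
    parsed = []
    for spec in specs:
        name, sep, level = spec.partition(":")
        name, level = name.strip(), level.strip()
        if not sep or not name or not level:
            raise ValueError(f"expected 'name:level', got {spec!r}")
        parsed.append({"name": name, "level": level})
    return parsed
```

The returned list has the same shape as the `emotions` list passed to `manager.set_emotions` in the library section.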

**Create Custom Voice**

Create a custom voice from an audio file:

```bash
python -m sonic_wrapper.cli create-voice --name "My Custom Voice" --source path/to/audio.wav
```

### Gradio Web Interface

The Gradio interface provides a user-friendly web application for interacting with the TTS functionality.

#### Running the Interface

1. **Install Gradio** (if not already installed):

   ```bash
   pip install "gradio>=5.0.0"
   ```

2. **Run the Application**:

   ```bash
   python app.py
   ```

3. **Access the Web Interface**:

   Open the provided local URL in your web browser.

#### Online Demo

Try the Gradio interface online without installing anything:

[![Gradio Demo](https://img.shields.io/badge/Gradio-Demo-brightgreen)](https://huggingface.co/spaces/daswer123/sonic-tts-webui)

## Examples

### Generating Speech with Emotions

```bash
python -m sonic_wrapper.cli generate-speech \
  --text "I'm so excited to share this news with you!" \
  --voice "Enthusiastic Voice" \
  --emotions "positivity:high" "surprise:medium" \
  --speed 0.5 \
  --output excited_message.wav
```
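The same command can be assembled from a Python script, which avoids shell quoting issues when the text contains quotes or spaces. This is a sketch that uses only the flags shown in this README; `build_generate_command` is a hypothetical helper, and it only builds the argument list without running anything.

```python
import sys
from typing import List, Optional, Sequence


def build_generate_command(text: str,
                           voice: str,
                           output: Optional[str] = None,
                           speed: Optional[float] = None,
                           emotions: Sequence[str] = ()) -> List[str]:
    """Build the argument list for the generate-speech CLI command."""
    cmd = [sys.executable, "-m", "sonic_wrapper.cli", "generate-speech",
           "--text", text, "--voice", voice]
    if output is not None:
        cmd += ["--output", output]
    if speed is not None:
        cmd += ["--speed", str(speed)]
    if emotions:
        # Each emotion is passed as its own 'name:level' argument.
        cmd += ["--emotions", *emotions]
    return cmd
```

The list can be passed to `subprocess.run(cmd, check=True)`; since no shell is involved, the text is delivered to the CLI exactly as given.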

### Creating and Using a Custom Voice

**Step 1: Create a Custom Voice**

```bash
python -m sonic_wrapper.cli create-voice \
  --name "Custom Voice" \
  --source path/to/your_voice_sample.wav \
  --description "A custom voice created from my own audio sample."
```

**Step 2: Generate Speech with the Custom Voice**

```bash
python -m sonic_wrapper.cli generate-speech \
  --text "This is my custom voice." \
  --voice "Custom Voice" \
  --output custom_voice_output.wav
```

## Notes

- **API Key**: A valid Cartesia API key is required to use this wrapper. Set your API key using the CLI or in your code. Visit [Cartesia Sonic](https://www.cartesia.ai/sonic) to obtain an API key.
- **Subscription**: Access to the Cartesia Sonic TTS API requires a subscription. Please refer to their [pricing page](https://www.cartesia.ai/sonic/pricing) for more details.
- **Voice Mixing**: Voice mixing is currently available only in the Python library, not in the CLI or the Gradio interface.
- **Voice Embeddings**: The wrapper handles voice embeddings for you, storing them locally for faster access.

## TODO

- [ ] Implement voice mixing functionality in Gradio interface and CLI.
- [ ] Enhance error handling and logging.
- [ ] Improve documentation with more examples and use cases.
- [ ] Add support for additional languages and voices as they become available.

## License

This project is licensed under the MIT License.

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "sonic-wrapper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "daswer123 <daswerq123@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/5a/9c/0cf67c3dbbae732df3bfd04b50ebf57efb31c1497fec4d7e4ba2ca664b98/sonic_wrapper-0.1.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A simple wrapper for Cartesia Sonic TTS",
    "version": "0.1.5",
    "project_urls": {
        "Bug Tracker": "https://github.com/daswer123/sonic_tts_api_wrapper/issues",
        "Homepage": "https://github.com/daswer123/sonic_tts_api_wrapper"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "84a3a62800d7bd5111c1a1966ade1fdea961d42112d4a337d418f0d306c896f7",
                "md5": "070c2cf50a21ea1a52e7f9d0caa353d3",
                "sha256": "43178d49f071c7a93e55998c141ccf709c8fe07df8fcac17364a6a10f7431cfe"
            },
            "downloads": -1,
            "filename": "sonic_wrapper-0.1.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "070c2cf50a21ea1a52e7f9d0caa353d3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 21077,
            "upload_time": "2024-12-25T15:28:18",
            "upload_time_iso_8601": "2024-12-25T15:28:18.929107Z",
            "url": "https://files.pythonhosted.org/packages/84/a3/a62800d7bd5111c1a1966ade1fdea961d42112d4a337d418f0d306c896f7/sonic_wrapper-0.1.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5a9c0cf67c3dbbae732df3bfd04b50ebf57efb31c1497fec4d7e4ba2ca664b98",
                "md5": "df7e4fe99207f34daf26c116759fd0e5",
                "sha256": "14d1afcfb4758a5bba5e892a118ee6535dd930a305fee1abddec641b481625eb"
            },
            "downloads": -1,
            "filename": "sonic_wrapper-0.1.5.tar.gz",
            "has_sig": false,
            "md5_digest": "df7e4fe99207f34daf26c116759fd0e5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 20349,
            "upload_time": "2024-12-25T15:28:21",
            "upload_time_iso_8601": "2024-12-25T15:28:21.303191Z",
            "url": "https://files.pythonhosted.org/packages/5a/9c/0cf67c3dbbae732df3bfd04b50ebf57efb31c1497fec4d7e4ba2ca664b98/sonic_wrapper-0.1.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-25 15:28:21",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "daswer123",
    "github_project": "sonic_tts_api_wrapper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "cartesia",
            "specs": []
        },
        {
            "name": "tqdm",
            "specs": []
        },
        {
            "name": "loguru",
            "specs": []
        },
        {
            "name": "gradio",
            "specs": [
                [
                    ">=",
                    "5.0.0"
                ]
            ]
        },
        {
            "name": "python-dotenv",
            "specs": []
        }
    ],
    "lcname": "sonic-wrapper"
}
