# Yapp Python SDK
The official Python SDK for the Yapp Voice AI API. Easily integrate speech-to-text (STT) and text-to-speech (TTS) capabilities into your Python applications.
## Features
- **Simple, intuitive API** - Clean, Pythonic interface
- **Multiple TTS models** - Kokoro-82M and ResembleAI Chatterbox
- **Model-specific parameters** - Each model has its own unique parameters with validation
- **Voice cloning** with ResembleAI's audio prompt feature
- **Flexible audio handling** - Works with file paths, file objects, or raw bytes
- **Type hints** for better IDE support
- **Comprehensive error handling**
## Available Models
### Speech-to-Text (STT)
| Model | Parameters | Description |
|-------|-----------|-------------|
| **Parakeet** | `file` | Audio file to transcribe (required, positional) |
| | `start_time` | Start time in seconds for transcription window (optional) |
| | `end_time` | End time in seconds for transcription window (optional) |
**Example:**
```python
# Basic usage
client.speech_to_text.convert("parakeet", "audio.wav")
# With time window
client.speech_to_text.convert("parakeet", "audio.wav", start_time=3.0, end_time=9.0)
```
### Text-to-Speech (TTS)
| Model | Parameters | Description |
|-------|-----------|-------------|
| **Kokoro** | `voice` | Voice ID (default: "af_bella") |
| | `speed` | Speech speed multiplier: 0.5-2.0 (default: 1.0) |
| **ResembleAI** | `audio_prompt` | Reference audio file for voice cloning (optional) |
| | `exaggeration` | Emotional intensity: 0.0-1.0 (default: 0.5) |
| | `cfg_weight` | Adherence to reference voice: 0.0-1.0 (default: 0.5) |
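The numeric ranges in the table can be sanity-checked client-side before a request is sent. The helper below is a hypothetical pre-flight check (not part of the SDK) that simply mirrors the documented ranges:

```python
# Hypothetical client-side range checks mirroring the table above (not part of the SDK).
TTS_PARAM_RANGES = {
    ("kokoro", "speed"): (0.5, 2.0),
    ("resemble", "exaggeration"): (0.0, 1.0),
    ("resemble", "cfg_weight"): (0.0, 1.0),
}

def check_tts_param(model: str, name: str, value: float) -> None:
    """Raise ValueError if a documented numeric parameter is out of range."""
    bounds = TTS_PARAM_RANGES.get((model, name))
    if bounds is not None:
        lo, hi = bounds
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} for {model!r} must be in [{lo}, {hi}]")
```

Note that the SDK also validates parameters server-side; a check like this just fails faster.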
## Installation
Install from PyPI:
```bash
pip install yapp
```
Or install from source:
```bash
git clone https://github.com/yourusername/yapp-sdk.git
cd yapp-sdk
pip install -e .
```
## Quick Start
```python
import yapp
# Initialize the client
client = yapp.Yapp(api_key="your-api-key")
# Transcribe audio to text
transcription = client.speech_to_text.convert("parakeet", "audio.wav")
print(transcription.text)
# Generate speech from text
response = client.text_to_speech.convert("kokoro", "Hello world!")
response.save("output.wav")
```
## Authentication
You can provide your API key in two ways:
### 1. Pass it directly to the client:
```python
client = yapp.Yapp(api_key="your-api-key")
```
### 2. Set it as an environment variable:
```bash
export YAPP_API_KEY="your-api-key"
```
```python
client = yapp.Yapp() # Will use YAPP_API_KEY from environment
```
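The precedence between the two options presumably works like the sketch below; `resolve_api_key` is an illustrative helper, not an SDK function:

```python
import os
from typing import Optional

def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Illustrative sketch: an explicitly passed key wins; otherwise fall back to YAPP_API_KEY."""
    key = explicit_key or os.environ.get("YAPP_API_KEY")
    if not key:
        raise RuntimeError("No API key found: pass api_key= or set YAPP_API_KEY")
    return key
```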
## Usage Examples
### Speech-to-Text (Transcription)
#### Basic Transcription
The SDK uses the **Parakeet** multilingual STT model for transcription.
```python
import yapp
client = yapp.Yapp(api_key="your-api-key")
# Transcribe an entire audio file using Parakeet
response = client.speech_to_text.convert("parakeet", "audio.wav")
print(response.text)
```
#### Transcription with Time Window
```python
# Transcribe only a specific time range
response = client.speech_to_text.convert(
    "parakeet",       # Model (positional)
    "audio.wav",      # File (positional)
    start_time=3.0,   # Start at 3 seconds
    end_time=9.0      # End at 9 seconds
)
print(response.text)
```
#### Multiple Audio Formats
The SDK automatically handles various audio formats:
```python
# From file path (string)
response = client.speech_to_text.convert("parakeet", "audio.wav")
# From pathlib.Path
from pathlib import Path
response = client.speech_to_text.convert("parakeet", Path("audio.mp3"))
# From file object
with open("audio.wav", "rb") as f:
    response = client.speech_to_text.convert("parakeet", f)
# From bytes
audio_bytes = Path("audio.wav").read_bytes()
response = client.speech_to_text.convert("parakeet", audio_bytes)
```
### Text-to-Speech (Synthesis)
#### Using Kokoro-82M Model
Kokoro parameters: `voice`, `speed`
```python
import yapp
client = yapp.Yapp(api_key="your-api-key")
# Simple synthesis (uses defaults)
response = client.text_to_speech.convert(
    "kokoro",          # Model first
    "Hello world!"
)
response.save("output.wav")
# With custom voice and speed
response = client.text_to_speech.convert(
    "kokoro",          # Model first
    "The quick brown fox jumps over the lazy dog.",
    voice="af_bella",  # Kokoro parameter
    speed=1.2          # Kokoro parameter - 20% faster
)
response.save("output_fast.wav")
# Or use the kokoro() method directly
response = client.text_to_speech.kokoro(
    text="Direct method call",
    voice="af_bella",
    speed=0.8          # 20% slower
)
response.save("output_slow.wav")
```
#### Using ResembleAI Model (with Voice Cloning)
ResembleAI parameters: `audio_prompt`, `exaggeration`, `cfg_weight`
```python
# Simple generation
response = client.text_to_speech.convert(
    "resemble",          # Model first
    "Hello world!",
    exaggeration=0.5,    # Emotional intensity (0.0 - 1.0)
    cfg_weight=0.5       # Adherence to reference voice (0.0 - 1.0)
)
response.save("output.wav")
# Voice cloning with audio prompt
response = client.text_to_speech.convert(
    "resemble",          # Model first
    "This should sound like the reference voice.",
    audio_prompt="reference_voice.wav",  # Reference audio for cloning
    cfg_weight=0.9       # High adherence to reference
)
response.save("cloned_voice.wav")
# Highly expressive speech
response = client.text_to_speech.convert(
    "resemble",          # Model first
    "Wow! This is amazing!",
    exaggeration=0.9,    # High emotional intensity
    cfg_weight=0.5
)
response.save("expressive.wav")
# Or use the resemble() method directly
response = client.text_to_speech.resemble(
    text="Direct method call",
    audio_prompt="reference.wav",
    exaggeration=0.7,
    cfg_weight=0.8
)
response.save("output.wav")
```
#### Discovering Model Parameters
The SDK provides methods to discover what parameters are available for each model:
```python
# Parakeet (STT) parameters
# Model: parakeet
# Parameters:
# - file (required): Audio file to transcribe
# - start_time (optional): Start time in seconds
# - end_time (optional): End time in seconds
client.speech_to_text.convert("parakeet", "audio.wav", start_time=0, end_time=10)
# List all available TTS models
models = client.text_to_speech.list_models()
print(models) # ['kokoro', 'resemble']
# Print help for a specific TTS model
client.text_to_speech.print_model_help("kokoro")
# Output:
# Model: kokoro
# Parameters:
# - voice (str, default='af_bella'): Voice to use for synthesis
# - speed (float, default=1.0): Speech speed multiplier (0.5 = half speed, 2.0 = double speed)
client.text_to_speech.print_model_help("resemble")
# Output:
# Model: resemble
# Parameters:
# - audio_prompt (AudioFile, default=None): Reference audio file for voice cloning (optional)
# - exaggeration (float, default=0.5): Emotional intensity, range 0.0-1.0
# - cfg_weight (float, default=0.5): Adherence to reference voice, range 0.0-1.0
# Get parameter specifications programmatically
params = client.text_to_speech.get_model_parameters("kokoro")
for param_name, param_info in params.items():
    print(f"{param_name}: {param_info['description']}")
```
#### Parameter Validation
The SDK validates that you're using the correct parameters for each model:
```python
from yapp import ValidationError

# This works - correct Kokoro parameters
response = client.text_to_speech.convert(
    "kokoro", "Hello", voice="af_bella", speed=1.2
)
# This raises ValidationError with a helpful message
try:
    response = client.text_to_speech.convert(
        "resemble", "Hello", speed=1.2  # ERROR! speed is a Kokoro parameter
    )
except ValidationError as e:
    print(e)
    # Output: Unknown parameters for ResembleAI model: speed.
    # Valid parameters: audio_prompt, exaggeration, cfg_weight
    # Use client.text_to_speech.print_model_help('resemble') for more details.
# This also raises ValidationError
response = client.text_to_speech.convert(
    "kokoro", "Hello", exaggeration=0.5  # ERROR! exaggeration is a ResembleAI parameter
)
```
### Working with Audio Responses
```python
# Save to file
response = client.text_to_speech.convert("kokoro", "Hello world!")
response.save("output.wav")
# Or use stream_to_file (alias for save)
response.stream_to_file("output.wav")
# Get raw bytes
audio_bytes = response.content
print(f"Generated {len(audio_bytes)} bytes")
# Check content type
print(response.content_type) # e.g., "audio/wav"
```
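Since `response.content` is plain bytes, saving the audio yourself is straightforward; `save()` is presumably equivalent to something like this stdlib-only sketch (`write_audio` is an illustrative helper, not an SDK method):

```python
from pathlib import Path

def write_audio(audio_bytes: bytes, file_path) -> int:
    """Write raw audio bytes to disk and return the number of bytes written."""
    Path(file_path).write_bytes(audio_bytes)
    return len(audio_bytes)
```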
## API Reference
### `yapp.Yapp`
Main client class for the Yapp API.
**Parameters:**
- `api_key` (str, optional): Your Yapp API key
- `timeout` (int, default=30): Request timeout in seconds
**Properties:**
- `speech_to_text`: Speech-to-text (STT) resource for audio transcription
- `text_to_speech`: Text-to-speech (TTS) resource for audio synthesis
---
### `client.speech_to_text.convert()`
Transcribe audio to text using the Parakeet STT model.
**Parameters:**
- `model` (str): STT model to use (currently: "parakeet") - **positional, required**
- `file` (str | Path | BinaryIO | bytes): Audio file to transcribe - **positional, required**
- `start_time` (float, optional): Start time in seconds for transcription window
- `end_time` (float, optional): End time in seconds for transcription window
- `**kwargs`: Additional model-specific parameters (reserved for future use)
**Example:**
```python
# Both model and file are positional
response = client.speech_to_text.convert("parakeet", "audio.wav")
```
**Available Models:**
- `"parakeet"` - nvidia/parakeet-tdt-0.6b-v3 - Multilingual ASR with word-level timestamps
**Returns:** `TranscriptionResponse`
- `.text`: The transcribed text
- `.metadata`: Additional metadata from the API (may include word-level timestamps)
**Supported audio formats:** WAV, MP3, MP4, M4A, OGG, FLAC, PCM
---
### `client.text_to_speech.convert()`
Generate speech from text. This is a unified interface that routes to the appropriate model.
**Parameters:**
- `model` (str): Model to use ("kokoro" or "resemble") - **required, first parameter**
- `text` (str): Text to convert to speech
- `**kwargs`: Model-specific parameters (see below)
**Model-Specific Parameters:**
**For Kokoro model:**
- `voice` (str, default="af_bella"): Voice to use for synthesis
- `speed` (float, default=1.0): Speech speed multiplier (0.5 = half speed, 2.0 = double speed)
**For ResembleAI model:**
- `audio_prompt` (str | Path | BinaryIO | bytes, optional): Reference audio for voice cloning
- `exaggeration` (float, default=0.5): Emotional intensity, range 0.0-1.0
- `cfg_weight` (float, default=0.5): Adherence to reference voice, range 0.0-1.0
**Returns:** `AudioResponse`
**Examples:**
```python
# Kokoro - model comes first!
response = client.text_to_speech.convert(
    "kokoro", "Hello", voice="af_bella", speed=1.2
)
# ResembleAI - model comes first!
response = client.text_to_speech.convert(
    "resemble", "Hello", exaggeration=0.7, cfg_weight=0.6
)
```
**See also:** Use `print_model_help()` to discover parameters
---
### `client.text_to_speech.list_models()`
List all available TTS models.
**Returns:** `list` - List of model names
**Example:**
```python
models = client.text_to_speech.list_models()
print(models) # ['kokoro', 'resemble']
```
---
### `client.text_to_speech.get_model_parameters()`
Get parameter specifications for a specific model.
**Parameters:**
- `model` (str): Model name
**Returns:** `dict` - Parameter specifications with types, defaults, and descriptions
**Example:**
```python
params = client.text_to_speech.get_model_parameters("kokoro")
for name, info in params.items():
    print(f"{name}: {info['description']}")
```
---
### `client.text_to_speech.print_model_help()`
Print helpful information about a model's parameters to the console.
**Parameters:**
- `model` (str): Model name
**Example:**
```python
client.text_to_speech.print_model_help("kokoro")
# Prints:
# Model: kokoro
# Parameters:
# - voice (str, default='af_bella'): Voice to use for synthesis
# - speed (float, default=1.0): Speech speed multiplier...
```
---
### `client.text_to_speech.kokoro()`
Generate speech using the Kokoro-82M model.
**Parameters:**
- `text` (str): Text to convert to speech
- `voice` (str, default="af_bella"): Voice to use
- `speed` (float, default=1.0): Speech speed multiplier
**Returns:** `AudioResponse`
---
### `client.text_to_speech.resemble()`
Generate speech using ResembleAI Chatterbox with voice cloning.
**Parameters:**
- `text` (str): Text to convert to speech
- `audio_prompt` (str | Path | BinaryIO | bytes, optional): Reference audio for voice cloning
- `exaggeration` (float, default=0.5): Emotional intensity (0.0 - 1.0)
- `cfg_weight` (float, default=0.5): Adherence to reference voice (0.0 - 1.0)
**Returns:** `AudioResponse`
---
### `AudioResponse`
Response object containing generated audio.
**Properties:**
- `content`: Raw audio bytes
- `content_type`: MIME type of the audio
**Methods:**
- `save(file_path)`: Save audio to file
- `stream_to_file(file_path)`: Alias for `save()`
---
### `TranscriptionResponse`
Response object containing transcribed text.
**Properties:**
- `text`: The transcribed text
- `metadata`: Additional metadata
## Complete Workflow Example
```python
import yapp
# Initialize client
client = yapp.Yapp(api_key="your-api-key")
# 1. Transcribe audio
transcription = client.speech_to_text.convert(
    "parakeet",       # Model (positional)
    "original.wav",   # File (positional)
    start_time=0,
    end_time=10
)
print(f"Original: {transcription.text}")
# 2. Modify the text
modified_text = transcription.text.upper()
# 3. Generate new speech with Kokoro
response = client.text_to_speech.convert(
    "kokoro", modified_text, voice="af_bella", speed=1.0
)
response.save("output_kokoro.wav")
# 4. Clone voice from original audio
cloned = client.text_to_speech.convert(
    "resemble", "New text in the original voice",
    audio_prompt="original.wav", cfg_weight=0.9
)
cloned.save("cloned_voice.wav")
```
## Error Handling
The SDK provides specific exception types for different error scenarios:
```python
from yapp import YappError, APIError, AuthenticationError, ValidationError
try:
    response = client.text_to_speech.convert("kokoro", "Hello world!")
    response.save("output.wav")
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except ValidationError as e:
    print(f"Invalid parameters: {e}")
except APIError as e:
    print(f"API error (status {e.status_code}): {e.message}")
except YappError as e:
    print(f"Yapp SDK error: {e}")
```
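Transient `APIError` responses (rate limits, timeouts) are often worth retrying. The wrapper below is a generic sketch, not part of the SDK; pass the exception types you want to retry on:

```python
import time

def with_retries(fn, retries=3, backoff=0.5, retry_on=(Exception,)):
    """Call fn(), retrying on the given exception types with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            time.sleep(backoff * (2 ** attempt))
```

For example: `with_retries(lambda: client.text_to_speech.convert("kokoro", "Hi"), retry_on=(APIError,))`.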
## Supported Audio Formats
### Input (Transcription)
- WAV (.wav)
- MP3 (.mp3)
- MP4 Audio (.mp4, .m4a)
- OGG (.ogg)
- FLAC (.flac)
- PCM (.pcm)
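A quick client-side extension check against this list can catch unsupported files before an upload is attempted; this helper is illustrative, not part of the SDK:

```python
from pathlib import Path

# Extensions accepted for transcription, per the list above.
SUPPORTED_INPUT_EXTENSIONS = {".wav", ".mp3", ".mp4", ".m4a", ".ogg", ".flac", ".pcm"}

def is_supported_audio(path) -> bool:
    """True if the file's extension is in the documented input set."""
    return Path(path).suffix.lower() in SUPPORTED_INPUT_EXTENSIONS
```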
### Output (Synthesis)
- WAV (default output format)
## Development
### Running Examples
```bash
cd examples
python discover_parameters.py # Learn about model parameters
python transcribe_audio.py # Speech-to-text examples
python synthesize_speech.py # Text-to-speech examples
python voice_cloning.py # Voice cloning with ResembleAI
python model_parameters.py # Model-specific parameter examples
python full_workflow.py # Complete workflow
```
### Installing for Development
```bash
git clone https://github.com/yourusername/yapp-sdk.git
cd yapp-sdk
pip install -e .
```
## Roadmap
- [ ] Add streaming support for real-time TTS
- [ ] Support for additional TTS models
- [ ] Async client support
- [ ] Audio format conversion utilities
- [ ] Batch processing capabilities
- [ ] WebSocket support for real-time conversations
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
MIT License - see LICENSE file for details.
## Support
For issues and questions:
- GitHub Issues: https://github.com/yourusername/yapp-sdk/issues
- Documentation: https://docs.yapp.ai
- Email: support@yapp.ai