yapp

Name: yapp
Version: 0.2.0
Home page: https://github.com/yourusername/yapp-sdk
Summary: Python SDK for Yapp Voice AI API - Speech-to-Text and Text-to-Speech
Upload time: 2025-10-27 22:24:11
Author: Yapp
Requires Python: >=3.8
License: MIT
Keywords: voice, ai, speech-to-text, text-to-speech, tts, stt, audio, speech-recognition, voice-cloning, kokoro, resemble
# Yapp Python SDK

The official Python SDK for the Yapp Voice AI API. Easily integrate speech-to-text (STT) and text-to-speech (TTS) capabilities into your Python applications.

## Features

- **Simple, intuitive API** - Clean, Pythonic interface
- **Multiple TTS models** - Kokoro-82M and ResembleAI Chatterbox
- **Model-specific parameters** - Each model has its own unique parameters with validation
- **Voice cloning** with ResembleAI's audio prompt feature
- **Flexible audio handling** - Works with file paths, file objects, or raw bytes
- **Type hints** for better IDE support
- **Comprehensive error handling**

## Available Models

### Speech-to-Text (STT)

| Model | Parameters | Description |
|-------|-----------|-------------|
| **Parakeet** | `file` | Audio file to transcribe (required, positional) |
| | `start_time` | Start time in seconds for transcription window (optional) |
| | `end_time` | End time in seconds for transcription window (optional) |

**Example:**
```python
# Basic usage
client.speech_to_text.convert("parakeet", "audio.wav")

# With time window
client.speech_to_text.convert("parakeet", "audio.wav", start_time=3.0, end_time=9.0)
```

### Text-to-Speech (TTS)

| Model | Parameters | Description |
|-------|-----------|-------------|
| **Kokoro** | `voice` | Voice ID (default: "af_bella") |
| | `speed` | Speech speed multiplier: 0.5-2.0 (default: 1.0) |
| **ResembleAI** | `audio_prompt` | Reference audio file for voice cloning (optional) |
| | `exaggeration` | Emotional intensity: 0.0-1.0 (default: 0.5) |
| | `cfg_weight` | Adherence to reference voice: 0.0-1.0 (default: 0.5) |

## Installation

Install from PyPI:

```bash
pip install yapp
```

Or install from source:

```bash
git clone https://github.com/yourusername/yapp-sdk.git
cd yapp-sdk
pip install -e .
```

## Quick Start

```python
import yapp

# Initialize the client
client = yapp.Yapp(api_key="your-api-key")

# Transcribe audio to text
transcription = client.speech_to_text.convert("parakeet", "audio.wav")
print(transcription.text)

# Generate speech from text
response = client.text_to_speech.convert("kokoro", "Hello world!")
response.save("output.wav")
```

## Authentication

You can provide your API key in two ways:

### 1. Pass it directly to the client:
```python
client = yapp.Yapp(api_key="your-api-key")
```

### 2. Set it as an environment variable:
```bash
export YAPP_API_KEY="your-api-key"
```

```python
client = yapp.Yapp()  # Will use YAPP_API_KEY from environment
```

## Usage Examples

### Speech-to-Text (Transcription)

#### Basic Transcription

The SDK uses the **Parakeet** multilingual STT model for transcription.

```python
import yapp

client = yapp.Yapp(api_key="your-api-key")

# Transcribe an entire audio file using Parakeet
response = client.speech_to_text.convert("parakeet", "audio.wav")
print(response.text)
```

#### Transcription with Time Window

```python
# Transcribe only a specific time range
response = client.speech_to_text.convert(
    "parakeet",   # Model (positional)
    "audio.wav",  # File (positional)
    start_time=3.0,  # Start at 3 seconds
    end_time=9.0     # End at 9 seconds
)
print(response.text)
```

#### Multiple Audio Formats

The SDK automatically handles various audio formats:

```python
# From file path (string)
response = client.speech_to_text.convert("parakeet", "audio.wav")

# From pathlib.Path
from pathlib import Path
response = client.speech_to_text.convert("parakeet", Path("audio.mp3"))

# From file object
with open("audio.wav", "rb") as f:
    response = client.speech_to_text.convert("parakeet", f)

# From bytes (reading inside a context manager closes the file handle)
with open("audio.wav", "rb") as f:
    audio_bytes = f.read()
response = client.speech_to_text.convert("parakeet", audio_bytes)
```

### Text-to-Speech (Synthesis)

#### Using Kokoro-82M Model

Kokoro parameters: `voice`, `speed`

```python
import yapp

client = yapp.Yapp(api_key="your-api-key")

# Simple synthesis (uses defaults)
response = client.text_to_speech.convert(
    "kokoro",  # Model first
    "Hello world!"
)
response.save("output.wav")

# With custom voice and speed
response = client.text_to_speech.convert(
    "kokoro",  # Model first
    "The quick brown fox jumps over the lazy dog.",
    voice="af_bella",  # Kokoro parameter
    speed=1.2          # Kokoro parameter - 20% faster
)
response.save("output_fast.wav")

# Or use the kokoro() method directly
response = client.text_to_speech.kokoro(
    text="Direct method call",
    voice="af_bella",
    speed=0.8  # 20% slower
)
response.save("output_slow.wav")
```

#### Using ResembleAI Model (with Voice Cloning)

ResembleAI parameters: `audio_prompt`, `exaggeration`, `cfg_weight`

```python
# Simple generation
response = client.text_to_speech.convert(
    "resemble",  # Model first
    "Hello world!",
    exaggeration=0.5,  # Emotional intensity (0.0 - 1.0)
    cfg_weight=0.5     # Adherence to reference voice (0.0 - 1.0)
)
response.save("output.wav")

# Voice cloning with audio prompt
response = client.text_to_speech.convert(
    "resemble",  # Model first
    "This should sound like the reference voice.",
    audio_prompt="reference_voice.wav",  # Reference audio for cloning
    cfg_weight=0.9                       # High adherence to reference
)
response.save("cloned_voice.wav")

# Highly expressive speech
response = client.text_to_speech.convert(
    "resemble",  # Model first
    "Wow! This is amazing!",
    exaggeration=0.9,  # High emotional intensity
    cfg_weight=0.5
)
response.save("expressive.wav")

# Or use the resemble() method directly
response = client.text_to_speech.resemble(
    text="Direct method call",
    audio_prompt="reference.wav",
    exaggeration=0.7,
    cfg_weight=0.8
)
response.save("output.wav")
```

#### Discovering Model Parameters

The SDK provides methods for discovering which parameters each TTS model accepts (Parakeet's STT parameters are summarized in the comments below for reference):

```python
# Parakeet (STT) parameters
# Model: parakeet
# Parameters:
#   - file (required): Audio file to transcribe
#   - start_time (optional): Start time in seconds
#   - end_time (optional): End time in seconds
client.speech_to_text.convert("parakeet", "audio.wav", start_time=0, end_time=10)

# List all available TTS models
models = client.text_to_speech.list_models()
print(models)  # ['kokoro', 'resemble']

# Print help for a specific TTS model
client.text_to_speech.print_model_help("kokoro")
# Output:
# Model: kokoro
# Parameters:
#   - voice (str, default='af_bella'): Voice to use for synthesis
#   - speed (float, default=1.0): Speech speed multiplier (0.5 = half speed, 2.0 = double speed)

client.text_to_speech.print_model_help("resemble")
# Output:
# Model: resemble
# Parameters:
#   - audio_prompt (AudioFile, default=None): Reference audio file for voice cloning (optional)
#   - exaggeration (float, default=0.5): Emotional intensity, range 0.0-1.0
#   - cfg_weight (float, default=0.5): Adherence to reference voice, range 0.0-1.0

# Get parameter specifications programmatically
params = client.text_to_speech.get_model_parameters("kokoro")
for param_name, param_info in params.items():
    print(f"{param_name}: {param_info['description']}")
```

#### Parameter Validation

The SDK validates that you're using the correct parameters for each model:

```python
from yapp import ValidationError

# This works - correct Kokoro parameters
response = client.text_to_speech.convert(
    "kokoro", "Hello", voice="af_bella", speed=1.2
)

# This raises ValidationError with helpful message
try:
    response = client.text_to_speech.convert(
        "resemble", "Hello", speed=1.2  # ERROR!
    )
except ValidationError as e:
    print(e)
    # Output: Unknown parameters for ResembleAI model: speed.
    #         Valid parameters: audio_prompt, exaggeration, cfg_weight
    #         Use client.text_to_speech.print_model_help('resemble') for more details.

# This also raises ValidationError
response = client.text_to_speech.convert(
    "kokoro", "Hello", exaggeration=0.5  # ERROR!
)
```

### Working with Audio Responses

```python
# Save to file
response = client.text_to_speech.convert("kokoro", "Hello world!")
response.save("output.wav")

# Or use stream_to_file (alias for save)
response.stream_to_file("output.wav")

# Get raw bytes
audio_bytes = response.content
print(f"Generated {len(audio_bytes)} bytes")

# Check content type
print(response.content_type)  # e.g., "audio/wav"
```
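
Since synthesis output defaults to WAV, you can also sanity-check the audio in memory with the standard-library `wave` module (a sketch, assuming the response body is a valid WAV container):

```python
import io
import wave

# Parse the WAV header straight from the response bytes; no temp file needed.
with wave.open(io.BytesIO(response.content)) as wav:
    duration = wav.getnframes() / wav.getframerate()
    print(f"{wav.getnchannels()} channel(s), {wav.getframerate()} Hz, {duration:.2f}s")
```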

## API Reference

### `yapp.Yapp`

Main client class for the Yapp API.

**Parameters:**
- `api_key` (str, optional): Your Yapp API key
- `timeout` (int, default=30): Request timeout in seconds

**Properties:**
- `speech_to_text`: Speech-to-text (STT) resource for audio transcription
- `text_to_speech`: Text-to-speech (TTS) resource for audio synthesis
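
**Example** (a sketch; the 60-second timeout is an illustrative value):
```python
import yapp

# Raise the request timeout for large uploads or slow networks.
# api_key may be omitted if YAPP_API_KEY is set in the environment.
client = yapp.Yapp(api_key="your-api-key", timeout=60)
```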

---

### `client.speech_to_text.convert()`

Transcribe audio to text using the Parakeet STT model.

**Parameters:**
- `model` (str): STT model to use (currently: "parakeet") - **positional, required**
- `file` (str | Path | BinaryIO | bytes): Audio file to transcribe - **positional, required**
- `start_time` (float, optional): Start time in seconds for transcription window
- `end_time` (float, optional): End time in seconds for transcription window
- `**kwargs`: Additional model-specific parameters (reserved for future use)

**Example:**
```python
# Both model and file are positional
response = client.speech_to_text.convert("parakeet", "audio.wav")
```

**Available Models:**
- `"parakeet"` - nvidia/parakeet-tdt-0.6b-v3 - Multilingual ASR with word-level timestamps

**Returns:** `TranscriptionResponse`
- `.text`: The transcribed text
- `.metadata`: Additional metadata from the API (may include word-level timestamps)

**Supported audio formats:** WAV, MP3, MP4, M4A, OGG, FLAC, PCM
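
**Example** (a sketch; the exact schema of `.metadata` isn't documented here, so the `"words"` key below is hypothetical):
```python
response = client.speech_to_text.convert("parakeet", "audio.wav")
print(response.text)

# .metadata may include word-level timestamps; "words" is a hypothetical
# key used only for illustration.
for word in (response.metadata or {}).get("words", []):
    print(word)
```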

---

### `client.text_to_speech.convert()`

Generate speech from text. This is a unified interface that routes to the appropriate model.

**Parameters:**
- `model` (str): Model to use ("kokoro" or "resemble") - **required, first parameter**
- `text` (str): Text to convert to speech
- `**kwargs`: Model-specific parameters (see below)

**Model-Specific Parameters:**

**For Kokoro model:**
- `voice` (str, default="af_bella"): Voice to use for synthesis
- `speed` (float, default=1.0): Speech speed multiplier (0.5 = half speed, 2.0 = double speed)

**For ResembleAI model:**
- `audio_prompt` (str | Path | BinaryIO | bytes, optional): Reference audio for voice cloning
- `exaggeration` (float, default=0.5): Emotional intensity, range 0.0-1.0
- `cfg_weight` (float, default=0.5): Adherence to reference voice, range 0.0-1.0

**Returns:** `AudioResponse`

**Examples:**
```python
# Kokoro - model comes first!
response = client.text_to_speech.convert(
    "kokoro", "Hello", voice="af_bella", speed=1.2
)

# ResembleAI - model comes first!
response = client.text_to_speech.convert(
    "resemble", "Hello", exaggeration=0.7, cfg_weight=0.6
)
```

**See also:** Use `print_model_help()` to discover each model's parameters.

---

### `client.text_to_speech.list_models()`

List all available TTS models.

**Returns:** `list` - List of model names

**Example:**
```python
models = client.text_to_speech.list_models()
print(models)  # ['kokoro', 'resemble']
```

---

### `client.text_to_speech.get_model_parameters()`

Get parameter specifications for a specific model.

**Parameters:**
- `model` (str): Model name

**Returns:** `dict` - Parameter specifications with types, defaults, and descriptions

**Example:**
```python
params = client.text_to_speech.get_model_parameters("kokoro")
for name, info in params.items():
    print(f"{name}: {info['description']}")
```

---

### `client.text_to_speech.print_model_help()`

Print a summary of a model's parameters to the console.

**Parameters:**
- `model` (str): Model name

**Example:**
```python
client.text_to_speech.print_model_help("kokoro")
# Prints:
# Model: kokoro
# Parameters:
#   - voice (str, default='af_bella'): Voice to use for synthesis
#   - speed (float, default=1.0): Speech speed multiplier...
```

---

### `client.text_to_speech.kokoro()`

Generate speech using the Kokoro-82M model.

**Parameters:**
- `text` (str): Text to convert to speech
- `voice` (str, default="af_bella"): Voice to use
- `speed` (float, default=1.0): Speech speed multiplier

**Returns:** `AudioResponse`
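
**Example** (uses only the documented parameters):
```python
response = client.text_to_speech.kokoro(
    text="Welcome back!",
    voice="af_bella",  # documented default voice
    speed=1.1,         # slightly faster than normal
)
response.save("welcome.wav")
```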

---

### `client.text_to_speech.resemble()`

Generate speech using ResembleAI Chatterbox with voice cloning.

**Parameters:**
- `text` (str): Text to convert to speech
- `audio_prompt` (str | Path | BinaryIO | bytes, optional): Reference audio for voice cloning
- `exaggeration` (float, default=0.5): Emotional intensity (0.0 - 1.0)
- `cfg_weight` (float, default=0.5): Adherence to reference voice (0.0 - 1.0)

**Returns:** `AudioResponse`
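
**Example** (the parameter values are illustrative):
```python
# Clone the voice captured in reference.wav.
response = client.text_to_speech.resemble(
    text="Read this in the reference speaker's voice.",
    audio_prompt="reference.wav",
    exaggeration=0.4,  # calmer delivery
    cfg_weight=0.8,    # stay close to the reference voice
)
response.save("resemble_clone.wav")
```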

---

### `AudioResponse`

Response object containing generated audio.

**Properties:**
- `content`: Raw audio bytes
- `content_type`: MIME type of the audio

**Methods:**
- `save(file_path)`: Save audio to file
- `stream_to_file(file_path)`: Alias for `save()`

---

### `TranscriptionResponse`

Response object containing transcribed text.

**Properties:**
- `text`: The transcribed text
- `metadata`: Additional metadata

## Complete Workflow Example

```python
import yapp

# Initialize client
client = yapp.Yapp(api_key="your-api-key")

# 1. Transcribe audio
transcription = client.speech_to_text.convert(
    "parakeet",      # Model (positional)
    "original.wav",  # File (positional)
    start_time=0,
    end_time=10
)
print(f"Original: {transcription.text}")

# 2. Modify the text
modified_text = transcription.text.upper()

# 3. Generate new speech with Kokoro
response = client.text_to_speech.convert(
    "kokoro", modified_text, voice="af_bella", speed=1.0
)
response.save("output_kokoro.wav")

# 4. Clone voice from original audio
cloned = client.text_to_speech.convert(
    "resemble", "New text in the original voice",
    audio_prompt="original.wav", cfg_weight=0.9
)
cloned.save("cloned_voice.wav")
```

## Error Handling

The SDK provides specific exception types for different error scenarios:

```python
from yapp import YappError, APIError, AuthenticationError, ValidationError

try:
    response = client.text_to_speech.convert("kokoro", "Hello world!")
    response.save("output.wav")
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except ValidationError as e:
    print(f"Invalid parameters: {e}")
except APIError as e:
    print(f"API error (status {e.status_code}): {e.message}")
except YappError as e:
    print(f"Yapp SDK error: {e}")
```

## Supported Audio Formats

### Input (Transcription)
- WAV (.wav)
- MP3 (.mp3)
- MP4 Audio (.mp4, .m4a)
- OGG (.ogg)
- FLAC (.flac)
- PCM (.pcm)

### Output (Synthesis)
- WAV (default output format)

## Development

### Running Examples

```bash
cd examples
python discover_parameters.py   # Learn about model parameters
python transcribe_audio.py      # Speech-to-text examples
python synthesize_speech.py     # Text-to-speech examples
python voice_cloning.py         # Voice cloning with ResembleAI
python model_parameters.py      # Model-specific parameter examples
python full_workflow.py         # Complete workflow
```

### Installing for Development

```bash
git clone https://github.com/yourusername/yapp-sdk.git
cd yapp-sdk
pip install -e .
```

## Roadmap

- [ ] Add streaming support for real-time TTS
- [ ] Support for additional TTS models
- [ ] Async client support
- [ ] Audio format conversion utilities
- [ ] Batch processing capabilities
- [ ] WebSocket support for real-time conversations

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

MIT License - see LICENSE file for details.

## Support

For issues and questions:
- GitHub Issues: https://github.com/yourusername/yapp-sdk/issues
- Documentation: https://docs.yapp.ai
- Email: support@yapp.ai

            
