
<div align="center">
<a href="https://twitter.com/smallest_AI">
<img src="https://img.shields.io/twitter/url/https/twitter.com/smallest_AI.svg?style=social&label=Follow%20smallest_AI" alt="Twitter">
</a>
<a href="https://discord.gg/ywShEyXHBW">
<img src="https://dcbadge.vercel.app/api/server/ywShEyXHBW?style=flat" alt="Discord">
</a>
<a href="https://www.linkedin.com/company/smallest">
<img src="https://img.shields.io/badge/LinkedIn-Connect-blue" alt="Linkedin">
</a>
<a href="https://www.youtube.com/@smallest_ai">
<img src="https://img.shields.io/static/v1?message=smallest_ai&logo=youtube&label=&color=FF0000&logoColor=white&labelColor=&style=for-the-badge" height=20 alt="Youtube">
</a>
</div>
## Official Python Client for Smallest AI API
Smallest AI offers an end-to-end Voice AI suite for developers building real-time voice agents. You can use our Text-to-Speech APIs directly through the Waves client, or use the Atoms client to build and operate end-to-end, enterprise-ready voice agents.
With this SDK, you can interact with both Waves and Atoms from any Python 3.9+ application via the `WavesClient` and `AtomsClient` classes respectively. Currently, `WavesClient` supports direct synthesis as well as synthesis of streamed LLM output, both synchronously and asynchronously. `AtomsClient` provides a simple way to interact with all of our APIs to develop and run agentic workflows.
To learn how to use our APIs, check out our documentation for [Atoms](https://atoms-docs.smallest.ai/introduction) and [Waves](https://waves-docs.smallest.ai/content/introduction/).
## Table of Contents
- [Installation](#installation)
- [Get the API Key](#get-the-api-key)
- [What are Atoms?](#what-are-atoms)
  - [Creating your first Agent](#creating-your-first-agent)
  - [Placing an outbound call](#placing-an-outbound-call)
  - [Providing context to the agent](#providing-context-to-the-agent)
  - [Configuring workflows to drive conversations](#configuring-workflows-to-drive-conversations)
  - [Provisioning bulk calling using campaigns](#provisioning-bulk-calling-using-campaigns)
- [Getting started with Waves](#getting-started-with-waves)
  - [Best Practices for Input Text](#best-practices-for-input-text)
  - [Examples](#examples)
    - [Synchronous](#synchronous)
    - [Asynchronous](#asynchronous)
    - [LLM to Speech](#llm-to-speech)
    - [Add your Voice](#add-your-voice)
      - [Synchronously](#add-synchronously)
      - [Asynchronously](#add-asynchronously)
    - [Delete your Voice](#delete-your-voice)
      - [Synchronously](#delete-synchronously)
      - [Asynchronously](#delete-asynchronously)
    - [Available Methods](#available-methods)
    - [Technical Note: WAV Headers in Streaming Audio](#technical-note-wav-headers-in-streaming-audio)
## Installation
To install the latest available version:
```bash
pip install smallestai
```
When using the SDK in your application, pin to at least the major version (e.g., `smallestai==4.*`). This keeps your application stable and protects it from breaking changes in future major releases, as sketched below.
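For instance, assuming you install via pip, a constraint like the following stays within the current major series (`4.x` at the time of writing):
```bash
# Stay on the 4.x series; pip will pick up minor and patch updates only.
pip install "smallestai>=4,<5"
```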
## Get the API Key
1. Visit [console.smallest.ai](https://console.smallest.ai/) and sign up for an account, or log in if you already have one.
2. Navigate to the `API Keys` tab in your account dashboard.
3. Create a new API key and copy it.
4. Export the API key in your environment as `SMALLEST_API_KEY` so that your application can access it securely for authentication.
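For example, on macOS or Linux:
```bash
# Replace the placeholder with your actual key
export SMALLEST_API_KEY="your-api-key-here"
```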
## What are Atoms?
Atoms are agents that can talk to anyone over voice or text, in any language and in any voice. Imagine an AI that you can hire to perform end-to-end tasks for your business. The following examples give an overview of how `AtomsClient` leverages abstractions such as KnowledgeBase, Campaigns, and graph-based Workflows to let you build the smartest voice agent for your use case.
You can find the full reference for Atoms [here](./docs/atoms/Api.md).
### Creating your first Agent
```python
from smallestai.atoms import AtomsClient, Configuration

def main():
    # Alternatively, export your API key as the SMALLEST_API_KEY environment
    # variable and construct AtomsClient() with no arguments.
    config = Configuration(access_token="YOUR_API_KEY")

    atoms_client = AtomsClient(config)

    agent_id = atoms_client.create_agent(
        create_agent_request={
            "name": "Atoms Multi-Modal Agent",
            "description": "My first atoms agent",
            "language": {
                "enabled": "en",
                "switching": False
            },
            "synthesizer": {
                "voiceConfig": {
                    "model": "waves_lightning_large",
                    "voiceId": "nyah"
                },
                "speed": 1.2,
                "consistency": 0.5,
                "similarity": 0,
                "enhancement": 1
            },
            "slmModel": "electron",
        }
    ).data

    print(f"Successfully created agent with id: {agent_id}")

if __name__ == "__main__":
    main()
```
### Placing an outbound call
```python
from smallestai.atoms import AtomsClient

TARGET_PHONE_NUMBER = "+919666666666"
MY_AGENT_ID = "67e****ff*ec***82*3c9e**"

def main():
    # Assumes you have exported your API key in the SMALLEST_API_KEY environment variable.
    atoms_client = AtomsClient()

    call_response = atoms_client.start_outbound_call(
        start_outbound_call_request={
            "agent_id": MY_AGENT_ID,
            "phone_number": TARGET_PHONE_NUMBER,
        }
    )
    print(f"Successfully placed call with id: {call_response.conversation_id}")

if __name__ == "__main__":
    main()
```
### Providing context to the agent
An agent can be attached to a knowledge base, which it can look up during conversations. Here is how you can do it:
```python
from smallestai.atoms import AtomsClient

def main():
    # Assumes you have exported your API key in the SMALLEST_API_KEY environment variable.
    atoms_client = AtomsClient()

    # Create a new knowledge base
    knowledge_base = atoms_client.create_knowledge_base(
        create_knowledge_base_request={
            "name": "Customer Support Knowledge Base",
            "description": "Contains FAQs and product information"
        }
    )
    knowledge_base_id = knowledge_base.data

    # Upload a document for the agent to reference during conversations
    with open("product_manual.pdf", "rb") as f:
        media_content = f.read()
    media_response = atoms_client.upload_media_to_knowledge_base(
        id=knowledge_base_id,
        media=media_content
    )
    print("Added product_manual.pdf to knowledge base")

if __name__ == "__main__":
    main()
```
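To have the agent actually consult this knowledge base, attach it to the agent. A minimal sketch, using a hypothetical `update_agent` method and `knowledgeBaseId` field; check the [Atoms API reference](./docs/atoms/Api.md) for the exact method and field names:
```python
# Hypothetical sketch: the method and field names below are assumptions,
# not confirmed API; consult the Atoms API reference before use.
atoms_client.update_agent(
    id="YOUR_AGENT_ID",
    update_agent_request={
        "knowledgeBaseId": knowledge_base_id,
    },
)
```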
### Configuring workflows to drive conversations
An agent can be configured with a graph-based workflow to help it drive meaningful conversations. You can explore building one on our [platform](https://atoms.smallest.ai/dashboard/agents), and refer to our [documentation](https://atoms-docs.smallest.ai/deep-dive/workflow/what-is-a-workflow) to learn more.

### Provisioning bulk calling using campaigns
To manage bulk calls, use the [Atoms platform](https://atoms.smallest.ai/dashboard/audience) to create an [audience](https://atoms-docs.smallest.ai/deep-dive/audience/audience) (a collection of contacts) and then configure [campaigns](https://atoms-docs.smallest.ai/deep-dive/campaign/campaign) to run against it.
## Getting started with Waves
### Best Practices for Input Text
### Examples
#### Synchronous
A synchronous text-to-speech synthesis client.
**Basic Usage:**
```python
from smallestai.waves import WavesClient

def main():
    waves_client = WavesClient(api_key="SMALLEST_API_KEY")
    waves_client.synthesize(
        text="Hello, this is a test for sync synthesis function.",
        save_as="sync_synthesize.wav"
    )

if __name__ == "__main__":
    main()
```
**Parameters:**
- `api_key`: Your API key (can be set via SMALLEST_API_KEY environment variable)
- `model`: TTS model to use (default: "lightning")
- `sample_rate`: Audio sample rate (default: 24000)
- `voice_id`: Voice ID (default: "emily")
- `speed`: Speech speed multiplier (default: 1.0)
- `consistency`: Controls word repetition and skipping. Decrease it to prevent skipped words, and increase it to prevent repetition. Only supported in `lightning-large` model. (default: 0.5)
- `similarity`: Controls the similarity between the synthesized audio and the reference audio. Increase it to make the speech more similar to the reference audio. Only supported in `lightning-large` model. (default: 0)
- `enhancement`: Enhances speech quality at the cost of increased latency. Only supported in `lightning-large` model. (default: False)
- `add_wav_header`: Whether to add a WAV header to the output audio.
These parameters belong to the `WavesClient` instance. They can be set when creating the instance (as shown above), and the `synthesize` function also accepts `kwargs`, allowing you to override any of them for a specific synthesis request.
For example, you can modify the speech speed and sample rate just for a particular synthesis call:
```py
waves_client.synthesize(
    "Hello, this is a test for sync synthesis function.",
    save_as="sync_synthesize.wav",
    speed=1.5,          # overrides the default speed
    sample_rate=16000   # overrides the default sample rate
)
```
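The `lightning-large`-only parameters can be overridden the same way. A minimal sketch, assuming the `lightning-large` model and a voice available on your account:
```py
waves_client.synthesize(
    "Hello, this is a test of the lightning-large model.",
    save_as="large_synthesize.wav",
    model="lightning-large",
    consistency=0.6,   # raise to reduce repetition, lower to avoid skipped words
    similarity=0.3,    # raise to track the reference voice more closely
    enhancement=True   # higher quality at the cost of latency
)
```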
#### Asynchronous
Asynchronous text-to-speech synthesis client.
**Basic Usage:**
```python
import asyncio
import aiofiles
from smallestai.waves import AsyncWavesClient

async def main():
    client = AsyncWavesClient(api_key="SMALLEST_API_KEY")
    async with client as tts:
        audio_bytes = await tts.synthesize("Hello, this is a test of the async synthesis function.")
        async with aiofiles.open("async_synthesize.wav", "wb") as f:
            await f.write(audio_bytes)  # alternatively, you can use the `save_as` parameter

if __name__ == "__main__":
    asyncio.run(main())
```
**Running Asynchronously in a Jupyter Notebook**
If you are using a Jupyter Notebook, use the following approach to execute the asynchronous function within an existing event loop:
```python
import aiofiles
from smallestai.waves import AsyncWavesClient

async def main():
    client = AsyncWavesClient(api_key="SMALLEST_API_KEY")
    async with client as tts:
        audio_bytes = await tts.synthesize("Hello, this is a test of the async synthesis function.")
        async with aiofiles.open("async_synthesize.wav", "wb") as f:
            await f.write(audio_bytes)  # alternatively, you can use the `save_as` parameter

# A notebook cell already runs inside an event loop, so await the coroutine directly.
await main()
```
**Parameters:**
- `api_key`: Your API key (can be set via SMALLEST_API_KEY environment variable)
- `model`: TTS model to use (default: "lightning")
- `sample_rate`: Audio sample rate (default: 24000)
- `voice_id`: Voice ID (default: "emily")
- `speed`: Speech speed multiplier (default: 1.0)
- `consistency`: Controls word repetition and skipping. Decrease it to prevent skipped words, and increase it to prevent repetition. Only supported in `lightning-large` model.
- `similarity`: Controls the similarity between the synthesized audio and the reference audio. Increase it to make the speech more similar to the reference audio. Only supported in `lightning-large` model.
- `enhancement`: Enhances speech quality at the cost of increased latency. Only supported in `lightning-large` model.
- `add_wav_header`: Whether to add a WAV header to the output audio.
These parameters belong to the `AsyncWavesClient` instance. They can be set when creating the instance (as shown above), and the `synthesize` function also accepts `kwargs`, allowing you to override any of them on a per-request basis.
For example, you can modify the speech speed and sample rate just for a particular synthesis request:
```py
audio_bytes = await tts.synthesize(
    "Hello, this is a test of the async synthesis function.",
    speed=1.5,          # overrides the default speed
    sample_rate=16000   # overrides the default sample rate
)
```
#### LLM to Speech
The `TextToAudioStream` class provides real-time text-to-speech processing, converting streaming text into audio output. It is particularly useful for applications like voice assistants, live captioning, or interactive chatbots that need immediate audio feedback from text generation. It supports both synchronous and asynchronous TTS instances.
##### Stream through a WebSocket
```python
import asyncio
import websockets
from groq import Groq
from smallestai.waves import WavesClient, TextToAudioStream

# Initialize Groq (LLM) and Waves (TTS) instances
llm = Groq(api_key="GROQ_API_KEY")
tts = WavesClient(api_key="SMALLEST_API_KEY")
WEBSOCKET_URL = "wss://echo.websocket.events"  # mock WebSocket server that echoes back what it receives

# Async generator to stream text from the LLM
async def generate_text(prompt):
    completion = llm.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model="llama3-8b-8192",
        stream=True,
    )

    # Yield text as it is generated
    for chunk in completion:
        text = chunk.choices[0].delta.content
        if text:
            yield text

# Main function to run the process
async def main():
    # Initialize the TTS processor
    processor = TextToAudioStream(tts_instance=tts)

    # Generate text from the LLM
    llm_output = generate_text("Explain text to speech like I am five in 5 sentences.")

    # Stream the generated speech through a websocket
    async with websockets.connect(WEBSOCKET_URL) as ws:
        print("Connected to WebSocket server.")

        async for audio_chunk in processor.process(llm_output):
            await ws.send(audio_chunk)     # send the audio chunk
            echoed_data = await ws.recv()  # receive the echoed message
            print("Received from server:", echoed_data[:20], "...")  # print the first 20 bytes

    print("WebSocket connection closed.")

if __name__ == "__main__":
    asyncio.run(main())
```
##### Save to a File
```python
import wave
import asyncio
from groq import Groq
from smallestai.waves import WavesClient, TextToAudioStream

llm = Groq(api_key="GROQ_API_KEY")
tts = WavesClient(api_key="SMALLEST_API_KEY")

async def generate_text(prompt):
    """Async generator that streams text from Groq. You can use any LLM."""
    completion = llm.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model="llama3-8b-8192",
        stream=True,
    )

    for chunk in completion:
        text = chunk.choices[0].delta.content
        if text is not None:
            yield text

async def save_audio_to_wav(file_path, processor, llm_output):
    with wave.open(file_path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 16-bit samples
        wav_file.setframerate(24000)  # must match the synthesis sample rate

        async for audio_chunk in processor.process(llm_output):
            wav_file.writeframes(audio_chunk)

async def main():
    # Initialize the TTS processor with the TTS instance
    processor = TextToAudioStream(tts_instance=tts)

    # Generate text asynchronously and process it
    llm_output = generate_text("Explain text to speech like I am five in 5 sentences.")

    # As an example, save the generated audio to a WAV file
    await save_audio_to_wav("llm_to_speech.wav", processor, llm_output)

if __name__ == "__main__":
    asyncio.run(main())
```
**Parameters:**
- `tts_instance`: Text-to-speech engine (`WavesClient` or `AsyncWavesClient`)
- `queue_timeout`: Wait time for new text (seconds, default: 5.0)
- `max_retries`: Number of retry attempts for failed synthesis (default: 3)
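For example, a processor that waits less time for new text but retries more aggressively could be constructed like this (parameter names as documented above):
```python
processor = TextToAudioStream(
    tts_instance=tts,    # a WavesClient or AsyncWavesClient instance
    queue_timeout=3.0,   # wait up to 3 seconds for new text
    max_retries=5        # retry failed synthesis up to 5 times
)
```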
**Output Format:**
The processor yields raw audio data chunks without WAV headers for streaming efficiency. These chunks can be:
- Played directly through an audio device (see the sketch below)
- Saved to a file
- Streamed over a network
- Further processed as needed
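For instance, here is a minimal sketch of direct playback, assuming the third-party `sounddevice` package (not an SDK dependency) and 16-bit mono PCM at 24 kHz:
```python
import sounddevice as sd  # assumption: installed separately via `pip install sounddevice`

def play_pcm_chunks(chunks, sample_rate=24000):
    """Play raw 16-bit mono PCM chunks as they arrive."""
    with sd.RawOutputStream(samplerate=sample_rate, channels=1, dtype="int16") as stream:
        for chunk in chunks:
            stream.write(chunk)
```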
#### Add your Voice
The Smallest AI SDK allows you to clone your voice by uploading an audio file. This feature is available both synchronously and asynchronously, making it flexible for different use cases. Below are examples of how to use this functionality.
##### Add Synchronously
```python
from smallestai.waves import WavesClient

def main():
    client = WavesClient(api_key="SMALLEST_API_KEY")
    res = client.add_voice(display_name="My Voice", file_path="my_voice.wav")
    print(res)

if __name__ == "__main__":
    main()
```
##### Add Asynchronously
```python
import asyncio
from smallestai.waves import AsyncWavesClient

async def main():
    client = AsyncWavesClient(api_key="SMALLEST_API_KEY")
    res = await client.add_voice(display_name="My Voice", file_path="my_voice.wav")
    print(res)

if __name__ == "__main__":
    asyncio.run(main())
```
#### Delete your Voice
The SDK also allows you to delete a cloned voice, again both synchronously and asynchronously. Below are examples of both.
##### Delete Synchronously
```python
from smallestai.waves import WavesClient

def main():
    client = WavesClient(api_key="SMALLEST_API_KEY")
    res = client.delete_voice(voice_id="voice_id")
    print(res)

if __name__ == "__main__":
    main()
```
##### Delete Asynchronously
```python
import asyncio
from smallestai.waves import AsyncWavesClient

async def main():
    client = AsyncWavesClient(api_key="SMALLEST_API_KEY")
    res = await client.delete_voice(voice_id="voice_id")
    print(res)

if __name__ == "__main__":
    asyncio.run(main())
```
#### Available Methods
```python
from smallestai.waves import WavesClient

client = WavesClient(api_key="SMALLEST_API_KEY")

print(f"Available Languages: {client.get_languages()}")
print(f"Available Voices: {client.get_voices(model='lightning')}")
print(f"Cloned Voices: {client.get_cloned_voices()}")
print(f"Available Models: {client.get_models()}")
```
#### Technical Note: WAV Headers in Streaming Audio
When implementing audio streaming with chunks of synthesized speech, WAV headers are omitted from individual chunks because:
##### Technical Issues
- Each WAV header contains metadata about the entire audio file.
- Multiple headers would make chunks appear as separate audio files and add redundancy.
- Headers contain file-specific data (like total size) that's invalid for chunks.
- Playing back chunks that each carry their own header causes audible artifacts (pops) at chunk boundaries.
- Audio players would try to reinitialize audio settings for each chunk.
##### Best Practices for Audio Streaming
1. Stream raw PCM audio data without headers
2. Add a single WAV header only when:
- Saving the complete stream to a file
- Initializing the audio playback system
- Converting the stream to a standard audio format
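For reference, here is a minimal sketch of building that single header yourself with Python's standard library, assuming 16-bit mono PCM at 24 kHz (the `wave` module used earlier does this for you; the sketch just makes the layout explicit):
```python
import struct

def wav_header(pcm_size: int, sample_rate: int = 24000, channels: int = 1, sample_width: int = 2) -> bytes:
    """Build a 44-byte RIFF/WAVE header for pcm_size bytes of raw PCM."""
    byte_rate = sample_rate * channels * sample_width
    block_align = channels * sample_width
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + pcm_size, b"WAVE",        # RIFF chunk descriptor
        b"fmt ", 16, 1, channels, sample_rate,  # fmt subchunk (PCM format = 1)
        byte_rate, block_align, sample_width * 8,
        b"data", pcm_size,                      # data subchunk
    )

# Usage: prepend the header once to the concatenated raw chunks.
# with open("out.wav", "wb") as f:
#     f.write(wav_header(len(raw_pcm)))
#     f.write(raw_pcm)
```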