![image](https://i.imgur.com/TJ2tT4g.png)
<div align="center">
<a href="https://twitter.com/smallest_AI">
<img src="https://img.shields.io/twitter/url/https/twitter.com/smallest_AI.svg?style=social&label=Follow%20smallest_AI" alt="Twitter">
</a>
<a href="https://discord.gg/ywShEyXHBW">
<img src="https://dcbadge.vercel.app/api/server/ywShEyXHBW?style=flat" alt="Discord">
</a>
<a href="https://www.linkedin.com/company/smallest">
<img src="https://img.shields.io/badge/LinkedIn-Connect-blue" alt="Linkedin">
</a>
<a href="https://www.youtube.com/@smallest_ai">
<img src="https://img.shields.io/static/v1?message=smallest_ai&logo=youtube&label=&color=FF0000&logoColor=white&labelColor=&style=for-the-badge" height=20 alt="Youtube">
</a>
</div>
## Official Python Client for Smallest AI API
Smallest AI builds high-speed multilingual voice models tailored for real-time applications, generating ultra-realistic audio in as little as ~100 milliseconds for 10 seconds of audio. With this SDK, you can easily convert text into high-quality audio with human-like expressiveness.
Currently, the library supports direct synthesis and synthesis of streamed LLM output, both synchronously and asynchronously.
## Table of Contents
- [Installation](#installation)
- [Get the API Key](#get-the-api-key)
- [Best Practices for Input Text](#best-practices-for-input-text)
- [Examples](#examples)
  - [Sync](#sync)
  - [Async](#async)
  - [LLM to Speech](#llm-to-speech)
- [Available Methods](#available-methods)
- [Technical Note: WAV Headers in Streaming Audio](#technical-note-wav-headers-in-streaming-audio)
## Installation
To install the latest available version:
```bash
pip install smallestai
```
When using an SDK in your application, pin at least the major version (e.g., `smallestai==1.*`). This keeps your application stable and protects it from breaking changes in future releases.
## Get the API Key
1. Visit [waves.smallest.ai](https://waves.smallest.ai/) and sign up for an account or log in if you already have an account.
2. Navigate to the `API Key` tab in your account dashboard.
3. Create a new API key and copy it.
4. Export the API key as an environment variable named `SMALLEST_API_KEY` so your application can read it for authentication without hard-coding it.
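If the variable is missing, `os.environ.get("SMALLEST_API_KEY")` silently returns `None`. A minimal sketch of reading the key and failing early with a clear message instead; `require_api_key` is a hypothetical helper for illustration, not part of the SDK.

```python
import os


def require_api_key(var_name: str = "SMALLEST_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the smallestai SDK.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it first, e.g. "
            f"`export {var_name}=<your key>`."
        )
    return key
```

For example, `Smallest(api_key=require_api_key())` reports a missing key immediately rather than passing `None` to the client.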
## Best Practices for Input Text
While the `transliterate` parameter is provided, it is not fully supported and may not perform consistently across all inputs. We recommend using the model without relying on this parameter.
For optimal voice generation results:
1. For English, provide the input in Latin script (e.g., "Hello, how are you?").
2. For Hindi, provide the input in Devanagari script (e.g., "नमस्ते, आप कैसे हैं?").
3. For code-mixed input, use Latin script for English and Devanagari script for Hindi (e.g., "Hello, आप कैसे हैं?").
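To check input against these guidelines before sending it, you can inspect which scripts a string actually uses. A minimal sketch, assuming only the two scripts above matter and counting only ASCII letters as Latin for simplicity; `scripts_in` is a hypothetical helper, not part of the SDK. Devanagari occupies the Unicode block U+0900 to U+097F.

```python
def scripts_in(text: str) -> set[str]:
    """Return the scripts ('latin', 'devanagari') found among the letters of text."""
    found = set()
    for ch in text:
        cp = ord(ch)
        if 0x0900 <= cp <= 0x097F:
            # Devanagari block, including vowel signs and the virama
            found.add("devanagari")
        elif ch.isascii() and ch.isalpha():
            # ASCII letters only; punctuation, digits and spaces are ignored
            found.add("latin")
    return found
```

A Hindi sentence typed in Latin script (e.g., "aap kaise hain") comes back as `{'latin'}`, a hint that it should be rewritten in Devanagari.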
## Examples
### Sync
A synchronous text-to-speech synthesis client.
**Basic Usage:**
```python
import os
from smallest import Smallest

def main():
    client = Smallest(api_key=os.environ.get("SMALLEST_API_KEY"))
    client.synthesize("Hello, this is a test for sync synthesis function.", save_as="sync_synthesize.wav")

if __name__ == "__main__":
    main()
```
**Parameters:**
- `api_key`: Your API key (can be set via the `SMALLEST_API_KEY` environment variable)
- `model`: TTS model to use (default: "lightning")
- `sample_rate`: Audio sample rate (default: 24000)
- `voice`: Voice ID (default: "emily")
- `speed`: Speech speed multiplier (default: 1.0)
- `add_wav_header`: Include WAV header in output (default: True)
- `transliterate`: Enable text transliteration (default: False)
- `remove_extra_silence`: Remove additional silence (default: True)
These parameters belong to the `Smallest` instance and can be set when you create it (as shown above for `api_key`). The `synthesize` method also accepts keyword arguments that override these parameters for a single synthesis request.
For example, you can modify the speech speed and sample rate just for a particular synthesis call:
```py
client.synthesize(
    "Hello, this is a test for sync synthesis function.",
    save_as="sync_synthesize.wav",
    speed=1.5,  # Overrides default speed
    sample_rate=16000  # Overrides default sample rate
)
```
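Conceptually, per-call keyword arguments take precedence over the instance defaults for that call only. The sketch below models this with a plain dictionary merge; `TTSDefaults` and `resolve_options` are hypothetical names for illustration, not the SDK's actual implementation, and rejecting unknown names is a choice of this sketch rather than documented SDK behavior.

```python
from dataclasses import asdict, dataclass


@dataclass
class TTSDefaults:
    """Instance-level defaults (excluding api_key), mirroring the documented values."""
    model: str = "lightning"
    sample_rate: int = 24000
    voice: str = "emily"
    speed: float = 1.0
    add_wav_header: bool = True
    transliterate: bool = False
    remove_extra_silence: bool = True


def resolve_options(defaults: TTSDefaults, **overrides) -> dict:
    """Merge per-call overrides over the defaults without mutating them."""
    options = asdict(defaults)  # asdict returns a fresh dict each time
    unknown = set(overrides) - set(options)
    if unknown:
        raise TypeError(f"Unknown option(s): {sorted(unknown)}")
    options.update(overrides)
    return options
```

Because the merge produces a new dictionary, the defaults stay untouched for subsequent calls, which matches the "just for a particular synthesis call" behavior described above.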
### Async
An asynchronous text-to-speech synthesis client.
**Basic Usage:**
```python
import os
import asyncio
import aiofiles
from smallest import AsyncSmallest
client = AsyncSmallest(api_key=os.environ.get("SMALLEST_API_KEY"))
async def main():
async with client as tts:
audio_bytes = await tts.synthesize("Hello, this is a test of the async synthesis function.")
async with aiofiles.open("async_synthesize.wav", "wb") as f:
await f.write(audio_bytes) # alternatively you can use the `save_as` parameter.
if __name__ == "__main__":
asyncio.run(main())
```
**Parameters:**
- `api_key`: Your API key (can be set via the `SMALLEST_API_KEY` environment variable)
- `model`: TTS model to use (default: "lightning")
- `sample_rate`: Audio sample rate (default: 24000)
- `voice`: Voice ID (default: "emily")
- `speed`: Speech speed multiplier (default: 1.0)
- `add_wav_header`: Include WAV header in output (default: True)
- `transliterate`: Enable text transliteration (default: False)
- `remove_extra_silence`: Remove additional silence (default: True)
These parameters belong to the `AsyncSmallest` instance and can be set when you create it (as shown above for `api_key`). The `synthesize` method also accepts keyword arguments that override any of these parameters on a per-request basis.
For example, you can modify the speech speed and sample rate just for a particular synthesis request:
```py
# Run inside the `async with client as tts:` block shown above
audio_bytes = await tts.synthesize(
    "Hello, this is a test of the async synthesis function.",
    speed=1.5,  # Overrides default speed
    sample_rate=16000  # Overrides default sample rate
)
```
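Because `synthesize` on `AsyncSmallest` is awaited, several requests can run concurrently. A minimal sketch using `asyncio.gather` with a semaphore to cap the number of in-flight requests; `synthesize_many` and its `max_concurrency` limit are hypothetical, and the function takes the synthesize coroutine as an argument so it can be exercised with a stub. API-side rate limits are not covered here.

```python
import asyncio
from typing import Awaitable, Callable


async def synthesize_many(
    synthesize: Callable[..., Awaitable[bytes]],
    texts: list[str],
    max_concurrency: int = 4,
    **kwargs,
) -> list[bytes]:
    """Run synthesize(text, **kwargs) for every text, at most max_concurrency at once.

    Results come back in the same order as `texts`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(text: str) -> bytes:
        async with semaphore:  # wait for a free slot before calling the API
            return await synthesize(text, **kwargs)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(one(t) for t in texts))
```

Inside `async with client as tts:`, a call might look like `clips = await synthesize_many(tts.synthesize, ["First sentence.", "Second sentence."], speed=1.5)`.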
### LLM to Speech
The `TextToAudioStream` class provides real-time text-to-speech processing, converting streaming text into audio output. It is particularly useful for applications like voice assistants, live captioning, or interactive chatbots that require immediate audio feedback from text generation. It supports both synchronous (`Smallest`) and asynchronous (`AsyncSmallest`) TTS instances.
```python
import os
import wave
import asyncio
from groq import Groq
from smallest import Smallest
from smallest import TextToAudioStream

llm = Groq(api_key=os.environ.get("GROQ_API_KEY"))
tts = Smallest(api_key=os.environ.get("SMALLEST_API_KEY"))

async def generate_text(prompt):
    """Async generator for streaming text from Groq. You can use any LLM."""
    completion = llm.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model="llama3-8b-8192",
        stream=True,
    )

    for chunk in completion:
        text = chunk.choices[0].delta.content
        if text is not None:
            yield text

async def save_audio_to_wav(file_path, processor, llm_output):
    with wave.open(file_path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 16-bit samples
        wav_file.setframerate(24000)  # must match the TTS sample_rate (default: 24000)

        async for audio_chunk in processor.process(llm_output):
            wav_file.writeframes(audio_chunk)

async def main():
    # Initialize the TTS processor with the TTS instance
    processor = TextToAudioStream(tts_instance=tts)

    # Generate text asynchronously and process it
    llm_output = generate_text("Explain text to speech like I am five in 5 sentences.")

    # As an example, save the generated audio to a WAV file.
    await save_audio_to_wav("llm_to_speech.wav", processor, llm_output)

if __name__ == "__main__":
    asyncio.run(main())
```
**Parameters:**
- `tts_instance`: Text-to-speech engine (Smallest or AsyncSmallest)
- `queue_timeout`: Wait time for new text (seconds, default: 5.0)
- `max_retries`: Number of retry attempts for failed synthesis (default: 3)
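To try `TextToAudioStream` without an LLM API key, `processor.process` can presumably be fed any async generator of text fragments, just as `generate_text` is used above. A minimal sketch; `fake_llm_stream` and its `delay` argument are hypothetical, and keeping `delay` well below `queue_timeout` (5.0 seconds by default) keeps the gaps between fragments within the documented wait time for new text.

```python
import asyncio
from typing import AsyncIterator


async def fake_llm_stream(text: str, delay: float = 0.05) -> AsyncIterator[str]:
    """Yield text word by word, simulating token-by-token LLM output.

    Hypothetical stand-in for generate_text(); keep `delay` well below queue_timeout.
    """
    words = text.split()
    for i, word in enumerate(words):
        await asyncio.sleep(delay)  # simulate generation latency
        # re-attach the space removed by split(), except after the last word
        yield word if i == len(words) - 1 else word + " "
```

For example: `await save_audio_to_wav("fake_stream.wav", processor, fake_llm_stream("Hello there. This is a test."))`.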
**Output Format:**
The processor yields raw audio data chunks without WAV headers for streaming efficiency. These chunks can be:
- Played directly through an audio device
- Saved to a file
- Streamed over a network
- Further processed as needed
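As an example of further processing: if the chunks are 16-bit mono PCM at 24 kHz (the format the WAV example above assumes), each sample occupies two bytes, so a chunk's duration follows directly from its length. A small sketch; `chunk_duration_seconds` and `peak_amplitude` are hypothetical helpers, and the format values are assumptions that must match your TTS settings.

```python
import sys
from array import array


def chunk_duration_seconds(chunk: bytes, sample_rate: int = 24000,
                           sample_width: int = 2, channels: int = 1) -> float:
    """Duration of a raw PCM chunk: bytes / (bytes per frame * sample rate)."""
    frame_size = sample_width * channels
    if len(chunk) % frame_size:
        raise ValueError("chunk length is not a whole number of frames")
    return len(chunk) / (frame_size * sample_rate)


def peak_amplitude(chunk: bytes) -> int:
    """Largest absolute sample value in a 16-bit little-endian PCM chunk."""
    samples = array("h")  # signed 16-bit integers
    samples.frombytes(chunk)
    if sys.byteorder == "big":
        samples.byteswap()  # PCM data in WAV streams is little-endian
    return max((abs(s) for s in samples), default=0)
```

A running total of `chunk_duration_seconds` is one way to track how much audio has been produced while the stream is still in progress.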
## Available Methods
```python
import os
from smallest.tts import Smallest

client = Smallest(api_key=os.environ.get("SMALLEST_API_KEY"))

print(f"Available Languages: {client.get_languages()}")
print(f"Available Voices: {client.get_voices()}")
print(f"Available Models: {client.get_models()}")
```
## Technical Note: WAV Headers in Streaming Audio
When implementing audio streaming with chunks of synthesized speech, WAV headers are omitted from individual chunks because:
### Technical Issues
- Each WAV header contains metadata about the entire audio file.
- Multiple headers would make chunks appear as separate audio files and add redundancy.
- Headers contain file-specific data (like total size) that's invalid for chunks.
- Concatenating chunks that each carry a header, or playing them back to back, produces audible artifacts (pops).
- Audio players would try to reinitialize audio settings for each chunk.
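To see why a per-chunk header is misleading, it helps to look at what the canonical 44-byte PCM WAV header stores: the RIFF chunk size and the `data` chunk size are both derived from the total number of audio bytes, so they are only correct once the full length is known. The sketch below builds that header with `struct`; `pcm_wav_header` is a hypothetical helper for illustration.

```python
import struct


def pcm_wav_header(data_len: int, sample_rate: int = 24000,
                   channels: int = 1, sample_width: int = 2) -> bytes:
    """Build the canonical 44-byte header for a PCM WAV file with data_len audio bytes."""
    byte_rate = sample_rate * channels * sample_width
    block_align = channels * sample_width
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF",
        36 + data_len,     # RIFF chunk size: depends on the TOTAL audio length
        b"WAVE",
        b"fmt ",
        16,                # fmt chunk size for PCM
        1,                 # audio format 1 = PCM
        channels,
        sample_rate,
        byte_rate,
        block_align,
        sample_width * 8,  # bits per sample
        b"data",
        data_len,          # data chunk size: again the TOTAL audio length
    )
```

Prepending such a header to every chunk records each chunk's own length rather than the stream's. Depending on the decoder, a naive concatenation then either stops after the first chunk or plays the later 44-byte headers as sample data, which is heard as clicks.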
### Best Practices for Audio Streaming
1. Stream raw PCM audio data without headers
2. Add a single WAV header only when:
   - Saving the complete stream to a file
   - Initializing the audio playback system
   - Converting the stream to a standard audio format
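For the last case, header-less chunks can be wrapped into a single in-memory WAV file with Python's standard `wave` module, which writes one header up front and patches its size fields when the writer is closed. A minimal sketch; `pcm_chunks_to_wav_bytes` is a hypothetical helper, and the format values are assumptions that must match the settings used for synthesis.

```python
import io
import wave
from typing import Iterable


def pcm_chunks_to_wav_bytes(chunks: Iterable[bytes], sample_rate: int = 24000,
                            channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM chunks in a single WAV header and return the whole file as bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            wav_file.writeframes(chunk)  # header sizes are patched on close
    # wave does not close a file object it did not open, so the buffer is still readable
    return buffer.getvalue()
```

With `TextToAudioStream`, the chunks would first be collected, e.g. `chunks = [c async for c in processor.process(llm_output)]`, before converting them, for instance to return a complete file from an HTTP endpoint.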