# Arthemis TTS
A simple and efficient transformer-based text-to-speech library for Python.
## Overview
Arthemis TTS is a PyPI package that provides an easy-to-use interface for
interacting with the Arthemis TTS model.
## Features
- **Simple API**: Easy-to-use functions for text-to-speech conversion
- **Pretrained Models**: Use ready-to-go pretrained models
- **GPU Support**: Automatic GPU acceleration when available
- **Multiple Output Formats**: Support for WAV and MP3 output
- **Lightweight**: Minimal dependencies and an efficient implementation that also runs on CPU
## Installation
### From PyPI (Recommended)
```bash
pip install arthemis-tts
```
## Using Pretrained Models
### Basic Usage
```python
import arthemis_tts
# Simple text-to-speech with pretrained model (tested example)
model_path = "your_model.pt"
audio = arthemis_tts.text_to_speech("Hello, world!", model_path=model_path)
# Save to file
arthemis_tts.text_to_speech(
    "Hello, world!",
    model_path=model_path,
    output_path="hello_world.wav",
)
```
### Advanced Usage
```python
from arthemis_tts import load_model
from arthemis_tts.audio_processing import write_audio_to_file
# Load a pretrained model (tested with actual model)
model_path = "your_model.pt"
model = load_model(model_path)
# Generate speech
audio = model.synthesize("This is a test of the synthesize function.")
# Save the audio
write_audio_to_file(audio, "synthesized_speech.wav")
```
### Step-by-Step Usage (Complete Example)
```python
import torch
from arthemis_tts import ArthemisTTS
from arthemis_tts.text_processing import text_to_sequence
from arthemis_tts.audio_processing import inverse_mel_spec_to_wav, write_audio_to_file
from arthemis_tts.utils import get_device
# 1. Load your pretrained model (tested example)
model_path = "your_model.pt"
device = get_device() # Automatically detects best device
print(f"Using device: {device}")
model = ArthemisTTS(device=device)
state_dict = torch.load(model_path, map_location=device)
# Handle different state dict formats
if isinstance(state_dict, dict) and "model" in state_dict:
    model.load_state_dict(state_dict["model"])
else:
    model.load_state_dict(state_dict)
model.eval()
# 2. Convert text to sequence
text = "Hello, world!"
text_seq = text_to_sequence(text).unsqueeze(0).to(device)
# 3. Generate mel spectrogram
with torch.no_grad():
    mel_postnet, gate_outputs = model.inference(
        text_seq,
        max_length=100,  # Shorter for faster generation
        stop_token_threshold=0.5,
        with_tqdm=True,  # Show progress bar
    )
# 4. Convert to audio
audio = inverse_mel_spec_to_wav(mel_postnet.detach()[0].T)
# 5. Save audio file
write_audio_to_file(audio, "step_by_step_output.wav")
print(f"Generated audio shape: {audio.shape}")
```
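The "different state dict formats" branch in step 1 can be isolated into a small helper. The sketch below is illustrative only: `unwrap_state_dict` is a hypothetical name, not part of `arthemis_tts`, and plain dictionaries stand in for what `torch.load` would return so the snippet runs without a model file. It checks against `Mapping` rather than `dict`, which is slightly broader than the check in the example above.

```python
from collections.abc import Mapping
from typing import Any


def unwrap_state_dict(checkpoint: Any, key: str = "model") -> Any:
    """Return the nested state dict if the checkpoint wraps it under `key`.

    Hypothetical helper: mirrors the if/else in the step-by-step example.
    """
    if isinstance(checkpoint, Mapping) and key in checkpoint:
        return checkpoint[key]
    return checkpoint


# Plain dicts stand in for the objects torch.load() would return.
wrapped = {"model": {"encoder.weight": [0.1, 0.2]}, "epoch": 12}
bare = {"encoder.weight": [0.1, 0.2]}

# Both checkpoint layouts resolve to the same weights.
print(unwrap_state_dict(wrapped) == unwrap_state_dict(bare))
```

With a helper like this, the loading step would reduce to `model.load_state_dict(unwrap_state_dict(torch.load(model_path, map_location=device)))`.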
## API Reference
### Main Functions
#### `text_to_speech(text, model_path=None, output_path=None, max_length=800, gate_threshold=0.5)`
Convert text to speech using a pretrained model.
**Parameters:**
- `text` (str): Input text to synthesize
- `model_path` (str): Path to pretrained model file. Required in practice, even though the signature defaults to `None`
- `output_path` (str, optional): Path to save audio file
- `max_length` (int): Maximum generation length (default: 800)
- `gate_threshold` (float): Stop token threshold (default: 0.5)
**Returns:**
- `torch.Tensor` or `None`: The audio tensor when `output_path` is omitted; `None` when the audio is saved to `output_path`
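Because `model_path` defaults to `None` in the signature but is documented as required, a caller may want to validate it before invoking `text_to_speech`. The following is a minimal caller-side sketch; `require_model_path` is a hypothetical helper, and how the library itself reacts to a missing path is not documented here.

```python
from pathlib import Path
from typing import Optional, Union


def require_model_path(model_path: Optional[Union[str, Path]]) -> Path:
    """Fail early with a clear message if the model file is missing.

    Hypothetical helper, not part of arthemis_tts.
    """
    if model_path is None:
        raise ValueError("model_path is required: pass the path to a .pt file")
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"No model file found at {path}")
    return path


# Example: a missing argument is rejected before any model code runs.
try:
    require_model_path(None)
except ValueError as err:
    print(f"Rejected: {err}")
```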
#### `load_model(model_path)`
Load a pretrained model.
**Parameters:**
- `model_path` (str): Path to pretrained model file
**Returns:**
- `ArthemisTTS`: Loaded model instance
### Classes
#### `ArthemisTTS`
Main TTS model class for using pretrained models.
**Methods:**
- `inference(text_tensor, max_length=800, stop_token_threshold=0.5, with_tqdm=True)`: Generate mel spectrogram
- `synthesize(text, max_length=800, stop_token_threshold=0.5)`: High-level synthesis function
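The two limits shared by these methods, `max_length` and `stop_token_threshold`, can be pictured as the exit conditions of an autoregressive decoding loop. The sketch below shows the general stop-token technique on a list of made-up gate probabilities. It is not the library's implementation: the function name, the strict `>` comparison, and the choice to count the frame on which the gate fires are all assumptions.

```python
from typing import Iterable


def frames_until_stop(
    gate_probs: Iterable[float],
    max_length: int = 800,
    stop_token_threshold: float = 0.5,
) -> int:
    """Count decoder steps taken before generation halts.

    Generation ends when the gate probability exceeds the threshold
    or when max_length steps have been produced, whichever comes first.
    """
    steps = 0
    for prob in gate_probs:
        if steps >= max_length:
            break  # hard cap reached
        steps += 1
        if prob > stop_token_threshold:
            break  # the model signalled end of speech
    return steps


# Made-up gate probabilities: the third frame crosses the 0.5 threshold.
print(frames_until_stop([0.1, 0.2, 0.7, 0.1]))
```

Under these assumptions, a lower `max_length` bounds the work done when the gate never fires, which matches the "shorter for faster generation" comment in the step-by-step example.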
## Supported Audio Formats
- WAV (recommended)
- MP3
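A caller can check an output filename against this list before starting a long synthesis run. The sketch below is a caller-side pre-check only; `audio_format_for` is a hypothetical name, and whether the library infers the format from the file extension is an assumption, not something this README states.

```python
from pathlib import Path

# Extensions taken from the list above.
SUPPORTED_FORMATS = {".wav": "wav", ".mp3": "mp3"}


def audio_format_for(output_path: str) -> str:
    """Map an output filename to a supported format name, or raise."""
    suffix = Path(output_path).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        supported = ", ".join(sorted(SUPPORTED_FORMATS))
        raise ValueError(f"Unsupported extension {suffix!r}; expected one of: {supported}")
    return SUPPORTED_FORMATS[suffix]


print(audio_format_for("hello_world.wav"))
```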
## Requirements
- Python >= 3.7
- PyTorch >= 1.9.0
- torchaudio >= 0.9.0
- NumPy >= 1.19.0
- pandas >= 1.2.0
- tqdm >= 4.60.0
- pydub >= 0.25.0 (for MP3 support)
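To see which of these packages are present in the current environment, the standard library's `importlib.metadata` can be queried. This is a sketch under two assumptions: it needs Python 3.8 or newer (where `importlib.metadata` was added), and it uses the PyPI distribution names, where PyTorch is published as `torch`. It reports only presence, not whether the minimum versions are met.

```python
from importlib import metadata
from typing import Iterable, List

# PyPI distribution names for the requirements listed above.
REQUIRED = ["torch", "torchaudio", "numpy", "pandas", "tqdm", "pydub"]


def missing_distributions(names: Iterable[str]) -> List[str]:
    """Return the distributions from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


print("Missing:", missing_distributions(REQUIRED))
```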
## Performance Notes
- **GPU Acceleration**: The model will automatically use CUDA if available
- **Memory Usage**: Adjust the `max_length` parameter to fit available memory; lower values cap the generated length
- **Generation Speed**: Depends on text length and hardware capabilities
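The automatic CUDA selection described above usually follows a common PyTorch pattern. The sketch below shows that pattern, not the source of `get_device()`: `pick_device` is a hypothetical name, it returns a plain string rather than a `torch.device`, and it falls back to `"cpu"` when PyTorch is not importable so the snippet runs anywhere.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA device is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate; report CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Using device: {pick_device()}")
```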
## Model Requirements
- Models should be saved as PyTorch state dictionaries (.pt files)
- Models must be compatible with the transformer architecture used in this library
- Models trained on the LJ Speech dataset work best for English text
## Examples
### Batch Processing with Pretrained Model
```python
import arthemis_tts
# Path to your pretrained model (tested example)
model_path = "your_model.pt"
texts = [
    "Hello, world!",
    "This is Arthemis TTS.",
    "Text-to-speech synthesis."
]
for i, text in enumerate(texts):
    arthemis_tts.text_to_speech(
        text,
        model_path=model_path,
        output_path=f"batch_output_{i+1}.wav"
    )
    print(f"Generated audio {i+1}: {text}")
```
### Efficient Multiple Generation (Load Once, Use Many Times)
```python
from arthemis_tts import load_model
from arthemis_tts.audio_processing import write_audio_to_file
# Load model once (tested with actual model)
model_path = "your_model.pt"
model = load_model(model_path)
texts = [
    "Hello, this is the first sentence.",
    "This is the second sentence.",
    "And this is the third sentence."
]
# Generate multiple times without reloading model
for i, text in enumerate(texts):
    audio = model.synthesize(text)
    write_audio_to_file(audio, f"efficient_output_{i+1}.wav")
    print(f"Generated efficient audio {i+1}")
```
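The load-once loop can also be wrapped in a reusable function. In this sketch the `synthesize` and `write` arguments are stand-ins for `model.synthesize` and `write_audio_to_file`, passed in as callables so the function can be exercised without a model. `synthesize_all`, its filename pattern, and the choice to skip blank strings are all illustrative assumptions.

```python
from typing import Any, Callable, Iterable, List


def synthesize_all(
    texts: Iterable[str],
    synthesize: Callable[[str], Any],
    write: Callable[[Any, str], None],
    pattern: str = "efficient_output_{index}.wav",
) -> List[str]:
    """Synthesize each non-blank text and return the paths written."""
    written = []
    for index, text in enumerate(texts, start=1):
        if not text.strip():
            continue  # nothing to say; keep numbering aligned with the input
        path = pattern.format(index=index)
        write(synthesize(text), path)
        written.append(path)
    return written


# Stub callables stand in for the real model and audio writer.
saved = {}
paths = synthesize_all(
    ["First sentence.", "   ", "Third sentence."],
    synthesize=lambda text: text.upper(),
    write=lambda audio, path: saved.update({path: audio}),
)
print(paths)
```

With a loaded model, the call would look like `synthesize_all(texts, model.synthesize, write_audio_to_file)`.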
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- Based on the [Neural Speech Synthesis with Transformer Network](https://arxiv.org/pdf/1809.08895.pdf) paper
- Inspired by the original SimpleTransformerTTS implementation
- Uses PyTorch and torchaudio for audio processing
## Support
For questions and support, please open an issue on GitHub or Hugging Face.
---
Raw data
{
"_id": null,
"home_page": "https://github.com/yourusername/arthemis-tts",
"name": "arthemis-tts",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "text-to-speech, tts, transformer, neural, speech synthesis",
"author": "Harish Santhnakakshmi Ganesan",
"author_email": "harishsg99@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/aa/7a/4906688fce955e15141ffdbba3a7cc137f7731842f778ea256fac11516fe/arthemis_tts-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A simple transformer-based text-to-speech library",
"version": "0.1.2",
"project_urls": {
"Bug Reports": "https://github.com/yourusername/arthemis-tts/issues",
"Documentation": "https://github.com/yourusername/arthemis-tts#readme",
"Homepage": "https://github.com/yourusername/arthemis-tts",
"Source": "https://github.com/yourusername/arthemis-tts"
},
"split_keywords": [
"text-to-speech",
" tts",
" transformer",
" neural",
" speech synthesis"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "0df0e80367aeb048e23a7fee10495bd5953f8035a0b48ae6dfc411b9e0a88108",
"md5": "cbe65c712fef4130569b66dc28a84c1f",
"sha256": "ab3aef97bd40cd16dc8c3e2b5f835666a1a756f14d9e960d0d82be3c53072092"
},
"downloads": -1,
"filename": "arthemis_tts-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "cbe65c712fef4130569b66dc28a84c1f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 13906,
"upload_time": "2025-09-09T03:25:56",
"upload_time_iso_8601": "2025-09-09T03:25:56.102684Z",
"url": "https://files.pythonhosted.org/packages/0d/f0/e80367aeb048e23a7fee10495bd5953f8035a0b48ae6dfc411b9e0a88108/arthemis_tts-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "aa7a4906688fce955e15141ffdbba3a7cc137f7731842f778ea256fac11516fe",
"md5": "f839fc282ee798e1b14185e97b2c57ef",
"sha256": "ae676f14eaed9a6288c8266e0e4d1e1a6a419e8482b3da4db00f1a7cf63d0860"
},
"downloads": -1,
"filename": "arthemis_tts-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "f839fc282ee798e1b14185e97b2c57ef",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 15034,
"upload_time": "2025-09-09T03:25:57",
"upload_time_iso_8601": "2025-09-09T03:25:57.412729Z",
"url": "https://files.pythonhosted.org/packages/aa/7a/4906688fce955e15141ffdbba3a7cc137f7731842f778ea256fac11516fe/arthemis_tts-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-09 03:25:57",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "yourusername",
"github_project": "arthemis-tts",
"github_not_found": true,
"lcname": "arthemis-tts"
}