arthemis-tts


- **Name**: arthemis-tts
- **Version**: 0.1.2
- **Home page**: https://github.com/yourusername/arthemis-tts
- **Summary**: A simple transformer-based text-to-speech library
- **Upload time**: 2025-09-09 03:25:57
- **Maintainer**: None
- **Docs URL**: None
- **Author**: Harish Santhnakakshmi Ganesan
- **Requires Python**: >=3.7
- **License**: None
- **Keywords**: text-to-speech, tts, transformer, neural, speech synthesis
- **Requirements**: No requirements were recorded.

# Arthemis TTS

A simple and efficient transformer-based text-to-speech library for Python.

## Overview

Arthemis TTS is a PyPI package that provides an easy-to-use interface for
interacting with the Arthemis TTS model.

## Features

- **Simple API**: Easy-to-use functions for text-to-speech conversion
- **Pretrained Models**: Use ready-to-go pretrained models
- **GPU Support**: Automatic GPU acceleration when available
- **Multiple Output Formats**: Support for WAV and MP3 audio output
- **Lightweight**: Minimal dependencies and an efficient implementation that also runs on CPU

## Installation

### From PyPI (Recommended)

```bash
pip install arthemis-tts
```


## Using Pretrained Models

### Basic Usage

```python
import arthemis_tts

# Simple text-to-speech with pretrained model (tested example)
model_path = "your_model.pt"
audio = arthemis_tts.text_to_speech("Hello, world!", model_path=model_path)

# Save to file
arthemis_tts.text_to_speech("Hello, world!", 
                           model_path=model_path,
                           output_path="hello_world.wav")
```

### Advanced Usage

```python
from arthemis_tts import load_model
from arthemis_tts.audio_processing import write_audio_to_file

# Load a pretrained model (tested with actual model)
model_path = "your_model.pt"
model = load_model(model_path)

# Generate speech
audio = model.synthesize("This is a test of the synthesize function.")

# Save the audio
write_audio_to_file(audio, "synthesized_speech.wav")
```

### Step-by-Step Usage (Complete Example)

```python
import torch
from arthemis_tts import ArthemisTTS
from arthemis_tts.text_processing import text_to_sequence
from arthemis_tts.audio_processing import inverse_mel_spec_to_wav, write_audio_to_file
from arthemis_tts.utils import get_device

# 1. Load your pretrained model (tested example)
model_path = "your_model.pt"
device = get_device()  # Automatically detects best device
print(f"Using device: {device}")

model = ArthemisTTS(device=device)
state_dict = torch.load(model_path, map_location=device)

# Handle different state dict formats
if isinstance(state_dict, dict) and "model" in state_dict:
    model.load_state_dict(state_dict["model"])
else:
    model.load_state_dict(state_dict)
model.eval()

# 2. Convert text to sequence
text = "Hello, world!"
text_seq = text_to_sequence(text).unsqueeze(0).to(device)

# 3. Generate mel spectrogram
with torch.no_grad():
    mel_postnet, gate_outputs = model.inference(
        text_seq,
        max_length=100,  # Shorter for faster generation
        stop_token_threshold=0.5,
        with_tqdm=True  # Show progress bar
    )

# 4. Convert to audio
audio = inverse_mel_spec_to_wav(mel_postnet.detach()[0].T)

# 5. Save audio file
write_audio_to_file(audio, "step_by_step_output.wav")
print(f"Generated audio shape: {audio.shape}")
```
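Step 2 above turns the input text into a sequence of integer IDs before it reaches the model. The symbol table used by `text_to_sequence` is not documented here, so the following is only a minimal sketch of the general idea. `SYMBOLS`, `SYMBOL_TO_ID`, and `encode_text` are hypothetical names with a made-up vocabulary, not part of the library.

```python
import string

# Hypothetical symbol table: a padding symbol, an end-of-sequence symbol,
# then lowercase letters, space, and basic punctuation.
SYMBOLS = ["<pad>", "<eos>"] + list(string.ascii_lowercase + " .,!?'-")
SYMBOL_TO_ID = {symbol: index for index, symbol in enumerate(SYMBOLS)}


def encode_text(text):
    """Map text to a list of integer IDs, skipping unknown characters."""
    ids = [SYMBOL_TO_ID[ch] for ch in text.lower() if ch in SYMBOL_TO_ID]
    ids.append(SYMBOL_TO_ID["<eos>"])  # mark the end of the sequence
    return ids


print(encode_text("Hello, world!"))
```

In the real pipeline the resulting IDs are wrapped in a tensor and given a batch dimension, which is what `.unsqueeze(0).to(device)` does in the example above.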

## API Reference

### Main Functions

#### `text_to_speech(text, model_path=None, output_path=None, max_length=800, gate_threshold=0.5)`

Convert text to speech using a pretrained model.

**Parameters:**
- `text` (str): Input text to synthesize
- `model_path` (str): Path to the pretrained model file. Required, even though the signature defaults to `None`
- `output_path` (str, optional): Path to save audio file
- `max_length` (int): Maximum generation length (default: 800)
- `gate_threshold` (float): Stop token threshold (default: 0.5)

**Returns:**
- `torch.Tensor` or `None`: The audio tensor if `output_path` is omitted; `None` if the audio was saved to a file

#### `load_model(model_path)`

Load a pretrained model.

**Parameters:**
- `model_path` (str): Path to pretrained model file

**Returns:**
- `ArthemisTTS`: Loaded model instance

### Classes

#### `ArthemisTTS`

Main TTS model class for using pretrained models.

**Methods:**
- `inference(text_tensor, max_length=800, stop_token_threshold=0.5, with_tqdm=True)`: Generate mel spectrogram
- `synthesize(text, max_length=800, stop_token_threshold=0.5)`: High-level synthesis function
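The `max_length` and `stop_token_threshold` parameters suggest an autoregressive loop that emits one frame per step and ends either when a predicted stop probability crosses the threshold or when the step limit is reached. The sketch below illustrates that control flow in plain Python. It is an assumption about the general technique, not the library's implementation, and `decode_until_stop` and `toy_step` are hypothetical names.

```python
def decode_until_stop(step_fn, max_length=800, stop_token_threshold=0.5):
    """Collect frames from step_fn until the stop probability crosses the threshold."""
    frames = []
    for step in range(max_length):
        frame, stop_probability = step_fn(step)
        frames.append(frame)
        # Assumed rule: stop once the probability exceeds the threshold.
        if stop_probability > stop_token_threshold:
            break
    return frames


def toy_step(step):
    # Stand-in for one decoder step: a dummy 4-value frame and a stop
    # probability that rises with every step.
    return [0.0] * 4, min(1.0, (step + 1) / 7)


frames = decode_until_stop(toy_step, max_length=800, stop_token_threshold=0.5)
print(len(frames))
```

Under this reading, a lower threshold ends generation earlier, and `max_length` acts as a safety cap when the stop probability never rises high enough.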

## Supported Audio Formats

- WAV (recommended)
- MP3

## Requirements

- Python >= 3.7
- PyTorch >= 1.9.0
- torchaudio >= 0.9.0
- NumPy >= 1.19.0
- pandas >= 1.2.0
- tqdm >= 4.60.0
- pydub >= 0.25.0 (for MP3 support)
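A quick way to see which of these packages are installed is to query their metadata. The sketch below uses `importlib.metadata`, which needs Python 3.8 or newer. `REQUIRED` and `installed_versions` are illustrative names, and the script only reports versions without comparing them against the minimums listed above.

```python
from importlib import metadata

# Distribution names as published on PyPI; "torch" is the PyPI name for PyTorch.
REQUIRED = ["torch", "torchaudio", "numpy", "pandas", "tqdm", "pydub"]


def installed_versions(names):
    """Return {name: version string, or None if the package is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```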

## Performance Notes

- **GPU Acceleration**: The model will automatically use CUDA if available
- **Memory Usage**: Adjust `max_length` parameter based on available memory
- **Generation Speed**: Depends on text length and hardware capabilities
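To get a feel for what a given `max_length` means, it can help to translate it into an upper bound on clip duration. The sketch below assumes that each decoder step yields one mel frame, a hop length of 256 samples, and a 22050 Hz sample rate. These are common settings for LJ Speech models but are not confirmed for this library, and `max_duration_seconds` is a hypothetical helper.

```python
def max_duration_seconds(max_length, hop_length=256, sample_rate=22050):
    """Upper bound on clip length if each decoder step yields one mel frame."""
    return max_length * hop_length / sample_rate


for frames in (100, 800):
    print(f"max_length={frames}: up to {max_duration_seconds(frames):.2f} s")
```

If the output is cut off mid-sentence, the cap is probably too low for the text; if memory is tight, lowering it or synthesizing shorter texts is the simpler option.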


## Model Requirements

- Models should be saved as PyTorch state dictionaries (.pt files)
- Compatible with the transformer architecture used in this library
- Models trained on the LJ Speech dataset work best for English text
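Checkpoint files come in two common shapes: a bare state dictionary, or a wrapper dictionary that stores the weights under a key such as `"model"` next to training metadata. The step-by-step example handles both cases inline; the sketch below isolates that logic. `extract_state_dict` is a hypothetical helper, and plain dictionaries stand in for what `torch.load` would return.

```python
def extract_state_dict(checkpoint, key="model"):
    """Return the weights mapping from a bare or wrapped checkpoint."""
    if isinstance(checkpoint, dict) and key in checkpoint:
        return checkpoint[key]
    return checkpoint


# Plain dicts stand in for loaded checkpoints.
bare = {"encoder.weight": [0.1, 0.2]}
wrapped = {"model": bare, "epoch": 12}

print(extract_state_dict(bare) is bare)
print(extract_state_dict(wrapped) is bare)
```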

## Examples

### Batch Processing with Pretrained Model

```python
import arthemis_tts

# Path to your pretrained model (tested example)
model_path = "your_model.pt"

texts = [
    "Hello, world!",
    "This is Arthemis TTS.",
    "Text-to-speech synthesis."
]

for i, text in enumerate(texts):
    arthemis_tts.text_to_speech(
        text, 
        model_path=model_path,
        output_path=f"batch_output_{i+1}.wav"
    )
    print(f"Generated audio {i+1}: {text}")
```

### Efficient Multiple Generation (Load Once, Use Many Times)

```python
from arthemis_tts import load_model
from arthemis_tts.audio_processing import write_audio_to_file

# Load model once (tested with actual model)
model_path = "your_model.pt"
model = load_model(model_path)

texts = [
    "Hello, this is the first sentence.",
    "This is the second sentence.",
    "And this is the third sentence."
]

# Generate multiple times without reloading model
for i, text in enumerate(texts):
    audio = model.synthesize(text)
    write_audio_to_file(audio, f"efficient_output_{i+1}.wav")
    print(f"Generated efficient audio {i+1}")
```
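Because generation is capped by `max_length`, long passages are usually easier to handle when split into sentences first, with each sentence passed to `model.synthesize` in a loop like the one above. The following is a minimal sketch of such a splitter. `split_sentences` is a hypothetical helper, and its simple rule will also split after abbreviations such as "Dr.".

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


long_text = "Hello, world! This is Arthemis TTS. Text-to-speech synthesis."
for i, sentence in enumerate(split_sentences(long_text), start=1):
    print(i, sentence)
```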

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the LICENSE file for details.



## Acknowledgments

- Based on the [Neural Speech Synthesis with Transformer Network](https://arxiv.org/pdf/1809.08895.pdf) paper
- Inspired by the original SimpleTransformerTTS implementation
- Uses PyTorch and torchaudio for audio processing

## Support

For questions and support, please open an issue on GitHub or Hugging Face.

---

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/yourusername/arthemis-tts",
    "name": "arthemis-tts",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "text-to-speech, tts, transformer, neural, speech synthesis",
    "author": "Harish Santhnakakshmi Ganesan",
    "author_email": "harishsg99@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/aa/7a/4906688fce955e15141ffdbba3a7cc137f7731842f778ea256fac11516fe/arthemis_tts-0.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A simple transformer-based text-to-speech library",
    "version": "0.1.2",
    "project_urls": {
        "Bug Reports": "https://github.com/yourusername/arthemis-tts/issues",
        "Documentation": "https://github.com/yourusername/arthemis-tts#readme",
        "Homepage": "https://github.com/yourusername/arthemis-tts",
        "Source": "https://github.com/yourusername/arthemis-tts"
    },
    "split_keywords": [
        "text-to-speech",
        " tts",
        " transformer",
        " neural",
        " speech synthesis"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0df0e80367aeb048e23a7fee10495bd5953f8035a0b48ae6dfc411b9e0a88108",
                "md5": "cbe65c712fef4130569b66dc28a84c1f",
                "sha256": "ab3aef97bd40cd16dc8c3e2b5f835666a1a756f14d9e960d0d82be3c53072092"
            },
            "downloads": -1,
            "filename": "arthemis_tts-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "cbe65c712fef4130569b66dc28a84c1f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 13906,
            "upload_time": "2025-09-09T03:25:56",
            "upload_time_iso_8601": "2025-09-09T03:25:56.102684Z",
            "url": "https://files.pythonhosted.org/packages/0d/f0/e80367aeb048e23a7fee10495bd5953f8035a0b48ae6dfc411b9e0a88108/arthemis_tts-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "aa7a4906688fce955e15141ffdbba3a7cc137f7731842f778ea256fac11516fe",
                "md5": "f839fc282ee798e1b14185e97b2c57ef",
                "sha256": "ae676f14eaed9a6288c8266e0e4d1e1a6a419e8482b3da4db00f1a7cf63d0860"
            },
            "downloads": -1,
            "filename": "arthemis_tts-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "f839fc282ee798e1b14185e97b2c57ef",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 15034,
            "upload_time": "2025-09-09T03:25:57",
            "upload_time_iso_8601": "2025-09-09T03:25:57.412729Z",
            "url": "https://files.pythonhosted.org/packages/aa/7a/4906688fce955e15141ffdbba3a7cc137f7731842f778ea256fac11516fe/arthemis_tts-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-09 03:25:57",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yourusername",
    "github_project": "arthemis-tts",
    "github_not_found": true,
    "lcname": "arthemis-tts"
}
        