# AI Audiobook Generator: CLI Tool with GPU & NPU Acceleration
**Transform long-form text into professional audiobooks with character-aware voices, emotion analysis, and intelligent processing.**
Perfect for novels, articles, textbooks, research papers, and any long-form content. Built with Kokoro-82M TTS for production-quality narration. Works on all platforms with optimizations for Apple Silicon (M1/M2/M3/M4 Neural Engine), NVIDIA GPUs, and AMD/Intel GPUs.
## ✨ Core Features
### ⚡ **High-Performance Conversion**
- **Up to 6x faster than real-time** on Apple Silicon (M1/M2/M3/M4) with Neural Engine
- **GPU acceleration** for NVIDIA (CUDA) and AMD/Intel (DirectML on Windows)
- **Efficient CPU processing** on all platforms
- Kokoro-82M engine optimized for speed + quality balance
### 🎭 **Character-Aware Narration**
- **Automatic character detection** in dialogue
- **Auto-assign different voices** with automatic gender detection when possible
- Assigns gender-appropriate voices (e.g., Alice gets `af_sarah`, Bob gets `am_adam`)
- Perfect for fiction, interviews, dialogues, and multi-speaker content
### 😊 **Emotion Analysis**
- **VADER sentiment analysis** adjusts prosody as each passage is processed
- Excitement, sadness, tension automatically reflected in voice tone
- Natural emotional narration without manual SSML tagging
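As a rough illustration of how a sentiment score could steer narration, the sketch below maps a VADER-style `compound` score (a float from -1 to 1) to a speech-rate multiplier. The function name, the 10% range, and the choice to adjust only speed are assumptions made for this example; the prosody logic inside audiobook-reader may work differently.

```python
def speed_for_sentiment(compound: float, base_speed: float = 1.0,
                        max_shift: float = 0.10) -> float:
    """Map a compound sentiment score in [-1, 1] to a speech-rate multiplier.

    Hypothetical mapping: positive (excited) text is read slightly faster,
    negative (sad or tense) text slightly slower.
    """
    # Clamp so out-of-range inputs cannot produce extreme speeds.
    compound = max(-1.0, min(1.0, compound))
    return round(base_speed * (1.0 + max_shift * compound), 3)


if __name__ == "__main__":
    for score in (-0.8, 0.0, 0.6):
        print(score, speed_for_sentiment(score))
```

Clamping the input keeps a malformed score from producing an unlistenable rate, which matters when the adjustment runs unattended over a whole book.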
### 💾 **Checkpoint Resumption**
- **Resume interrupted conversions** from where you left off
- Essential for extra-long texts (500+ page books, textbooks, research papers)
- Reliable production workflow for lengthy content
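The resume behaviour can be pictured with a small sketch: progress is written to a JSON file after every chunk, and the next run starts from the recorded position. The function name, the `next_chunk` key, and the file layout are hypothetical; this README does not describe the checkpoint format audiobook-reader actually uses.

```python
import json
from pathlib import Path


def convert_with_checkpoint(chunks, synthesize, checkpoint_path):
    """Process chunks in order, skipping those finished in an earlier run.

    `synthesize` is any callable taking (index, text); `checkpoint_path`
    is a JSON file holding the index of the next chunk to process.
    Returns the index the run started from.
    """
    path = Path(checkpoint_path)
    start = json.loads(path.read_text())["next_chunk"] if path.exists() else 0
    for index in range(start, len(chunks)):
        synthesize(index, chunks[index])
        # Persist progress after every chunk so a crash loses at most one.
        path.write_text(json.dumps({"next_chunk": index + 1}))
    return start
```

Writing the checkpoint only after a chunk succeeds means an interrupted chunk is simply redone on the next run.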
### 📚 **Chapter Management**
- **Automatic chapter detection** from EPUB TOC, PDF structure, or text patterns
- **M4B audiobook format** with chapter metadata
- Chapter timestamps and navigation
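For plain-text input, pattern-based detection might look like the sketch below. The regular expression and the 80-character cutoff are assumptions chosen for illustration, not the patterns audiobook-reader ships with.

```python
import re

# Hypothetical heading pattern: "Chapter 1", "CHAPTER II: The Road", "Chapter Twelve".
CHAPTER_RE = re.compile(r"^\s*chapter\s+\w+\b.*$", re.IGNORECASE)


def find_chapters(text: str, max_heading_length: int = 80) -> list:
    """Return (line_number, heading) pairs for lines that look like chapter headings."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        # Long lines starting with "chapter" are more likely prose than headings.
        if len(line) <= max_heading_length and CHAPTER_RE.match(line)
    ]
```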
### 📊 **Professional Production Tools**
- **4 progress visualization styles**: simple, tqdm, rich, timeseries charts
- **Real-time metrics**: processing speed, ETA, completion percentage
- **Batch processing** with queue management
- **Multiple output formats**: MP3 (48kHz mono optimized), WAV, M4A, M4B
### 🎙️ **Production-Quality TTS**
- **Kokoro-82M**: 48 high-quality neural voices across 8 languages
- **Near-human quality** narration
- **Consistent voice** throughout long documents
- No voice cloning overhead
---
## ⚖️ Copyright Notice
**IMPORTANT**: This software is a tool for converting text to audio. Users are solely responsible for:
- Ensuring they have the legal right to convert any text to audio
- Obtaining necessary permissions for copyrighted materials
- Complying with all applicable copyright laws and licensing terms
- Understanding that creating audiobooks from copyrighted text without authorization may constitute copyright infringement
**Recommended Use Cases:**
- ✅ Your own original content
- ✅ Public domain works
- ✅ Content you have explicit permission to convert
- ✅ Educational materials you legally own
- ✅ Open-source or Creative Commons licensed texts (per their terms)
The developers of audiobook-reader do not condone or support copyright infringement. By using this software, you agree to use it only for content you have the legal right to convert.
---
## 📚 Supported Input Formats
EPUB, PDF, TXT, Markdown, reStructuredText
## 📦 Installation
### Using pip (recommended for users)
```bash
# Default installation (Kokoro TTS + core features)
pip install audiobook-reader
# Extras are quoted so shells such as zsh do not treat the brackets as a glob
# With all progress visualizations (tqdm, rich, plotext)
pip install "audiobook-reader[progress-full]"
# With system monitoring
pip install "audiobook-reader[monitoring]"
# With everything
pip install "audiobook-reader[all]"
```
### Hardware Acceleration Options
audiobook-reader works great on **all platforms**. For maximum performance, enable hardware acceleration:
#### ✅ Apple Silicon (M1/M2/M3/M4)
**Neural Engine (CoreML) works automatically** - no additional setup needed!
```bash
pip install audiobook-reader
# That's it! CoreML acceleration is built-in
```
#### ✅ NVIDIA GPU (Windows/Linux)
Get **CUDA acceleration** with a simple package swap:
```bash
pip install audiobook-reader
pip uninstall onnxruntime
pip install onnxruntime-gpu
```
#### ✅ AMD/Intel GPU (Windows)
Get **DirectML acceleration**:
```bash
pip install audiobook-reader
pip uninstall onnxruntime
pip install onnxruntime-directml
```
#### ✅ CPU Only (All Platforms)
**No GPU? No problem!** The default installation works efficiently on any CPU:
```bash
pip install audiobook-reader
# Works great on Intel, AMD, ARM processors
```
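To check which backend a given install can use, you can ask ONNX Runtime for its available execution providers. The sketch below pairs that call with a simple preference order; the helper name and the ordering are assumptions for illustration, and how audiobook-reader selects a provider internally is not documented here.

```python
# Preference order: accelerators first, CPU as the universal fallback.
PREFERRED_PROVIDERS = [
    "CoreMLExecutionProvider",  # Apple CoreML (macOS)
    "CUDAExecutionProvider",    # NVIDIA, via onnxruntime-gpu
    "DmlExecutionProvider",     # DirectML, via onnxruntime-directml
    "CPUExecutionProvider",
]


def pick_provider(available: list) -> str:
    """Return the most preferred execution provider that is available."""
    for name in PREFERRED_PROVIDERS:
        if name in available:
            return name
    return "CPUExecutionProvider"


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed")
    else:
        available = ort.get_available_providers()
        print("Available:", available)
        print("Selected:", pick_provider(available))
```

If the selected provider is `CPUExecutionProvider` on a machine with a supported GPU, the package swap described above has probably not taken effect.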
## 🚀 Quick Start
```bash
# 1. Install
pip install audiobook-reader
# 2. Models auto-download on first use (~310MB)
# Or manually: reader download models
# For permanent local storage: reader download models --local
# 3. Add a text file (create the text/ folder first if it does not exist)
mkdir -p text
echo "Hello world! This is my first audiobook." > text/hello.txt
# 4. Convert to audiobook (Neural Engine optimized)
reader convert
# 5. Listen to finished/hello_kokoro_am_michael.mp3
```
### 🎭 Character Voices (Optional)
For books with dialogue, assign different voices to each character:
```bash
# Auto-detect characters and generate config
reader characters detect text/mybook.txt --auto-assign
# OR manually create mybook.characters.yaml:
# characters:
#   - name: Alice
#     voice: af_sarah
#     gender: female
#   - name: Bob
#     voice: am_michael
#     gender: male
# Convert with character voices
reader convert --characters --file text/mybook.txt
```
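The idea behind `--auto-assign` can be sketched as picking a voice from a gender-specific pool for each detected character. The pools below contain only voice IDs mentioned in this README, and the function name, round-robin reuse, and fallback voice are assumptions; the real assignment logic may differ.

```python
from itertools import cycle

# Voice IDs that appear in this README; the pools are illustrative, not exhaustive.
VOICE_POOLS = {
    "female": ["af_sarah", "bf_emma"],
    "male": ["am_adam", "am_michael"],
}


def assign_voices(characters: dict, default_voice: str = "am_michael") -> dict:
    """Give each character a voice from the pool matching its gender.

    `characters` maps name -> "female", "male", or anything else (unknown).
    Voices are reused round-robin once a pool runs out.
    """
    pools = {gender: cycle(voices) for gender, voices in VOICE_POOLS.items()}
    return {
        name: next(pools[gender]) if gender in pools else default_voice
        for name, gender in characters.items()
    }
```

Characters whose gender could not be detected fall back to the default voice, which mirrors the "when possible" wording in the feature list.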
## 📖 Documentation
- **[Usage Guide](https://github.com/danielcorsano/reader/blob/main/docs/USAGE.md)** - Complete command reference and workflows
- **[Examples](https://github.com/danielcorsano/reader/blob/main/docs/EXAMPLES.md)** - Real-world examples and use cases
- **[Advanced Features](https://github.com/danielcorsano/reader/blob/main/docs/ADVANCED_FEATURES.md)** - Professional audiobook production features
- **[Kokoro Setup](https://github.com/danielcorsano/reader/blob/main/docs/KOKORO_SETUP.md)** - Neural TTS model setup guide
## 🎙️ Command Reference
### Basic Conversion
```bash
# Convert single file with Neural Engine acceleration
reader convert --file text/book.epub
# Convert with specific voice
reader convert --file text/book.epub --voice am_michael
# Kokoro is the TTS engine
# Enable debug mode to see Neural Engine status
reader convert --file text/book.epub --debug
```
### 📊 Progress Visualization Options
```bash
# Simple text progress (default)
reader convert --progress-style simple --file "book.epub"
# Professional progress bars with speed metrics
reader convert --progress-style tqdm --file "book.epub"
# Beautiful Rich formatted displays with colors
reader convert --progress-style rich --file "book.epub"
# Real-time ASCII charts showing processing speed
reader convert --progress-style timeseries --file "book.epub"
```
### Configuration Management
```bash
# Save permanent settings to config file
reader config --engine kokoro --voice am_michael --format mp3
# List available Kokoro voices
reader voices
# View current configuration
reader config
# View application info and features
reader info
```
### **Parameter Hierarchy (How Settings Work)**
1. **CLI parameters** (highest priority) - temporary overrides, never saved
2. **Config file** (middle priority) - your saved preferences
3. **Code defaults** (lowest priority) - sensible fallbacks
Example:
```bash
# Save your preferred settings
reader config --engine kokoro --voice am_michael --format mp3
# Use temporary override (doesn't change your saved config)
reader convert --voice af_sarah
# Your config file still has kokoro/am_michael/mp3 saved
```
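The same precedence can be expressed in a few lines of Python. This is a minimal sketch of the layering idea, not the project's implementation; the default values are copied from the example `settings.yaml` in this README and the function name is hypothetical.

```python
from collections import ChainMap

# Lowest-priority layer; values taken from the example settings in this README.
CODE_DEFAULTS = {"engine": "kokoro", "voice": "am_michael", "format": "mp3", "speed": 1.0}


def resolve_settings(cli_args: dict, config_file: dict) -> dict:
    """Merge settings so CLI beats config file, which beats code defaults.

    CLI options the user did not pass (value None) are dropped so they
    do not mask lower-priority values.
    """
    cli_given = {key: value for key, value in cli_args.items() if value is not None}
    # ChainMap looks keys up left to right, so earlier maps win.
    return dict(ChainMap(cli_given, config_file, CODE_DEFAULTS))
```

Because the merged result is a fresh dictionary, a CLI override never writes back to the saved configuration.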
## 📁 File Support
### Input Formats
| Format | Extension | Chapter Detection |
|--------|-----------|------------------|
| EPUB | `.epub` | ✅ Automatic from TOC |
| PDF | `.pdf` | ✅ Page-based |
| Text | `.txt` | ✅ Simple patterns |
| Markdown | `.md` | ✅ Header-based |
| reStructuredText | `.rst` | ✅ Header-based |
### Output Formats
- **MP3** (default) - 48kHz mono, configurable bitrate (32k-64k, default 48k)
- **WAV** - Uncompressed, high quality
- **M4A** - Apple-friendly format
- **M4B** - Audiobook format with chapter support
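Chapter markers in an M4B file are commonly supplied to FFmpeg as a metadata text file. The sketch below builds such a file from chapter titles and start times; the function names are hypothetical, and whether audiobook-reader writes chapters this way is an assumption.

```python
def _escape(value: str) -> str:
    """Backslash-escape the characters FFmpeg treats as special in metadata values."""
    for char in ("\\", "=", ";", "#"):
        value = value.replace(char, "\\" + char)
    return value.replace("\n", "\\\n")


def build_ffmetadata(chapters: list, total_seconds: float) -> str:
    """Build FFmpeg metadata text with one [CHAPTER] block per chapter.

    `chapters` is a list of (title, start_seconds) pairs sorted by start time.
    """
    lines = [";FFMETADATA1"]
    for i, (title, start) in enumerate(chapters):
        # A chapter ends where the next one starts; the last ends at the total length.
        end = chapters[i + 1][1] if i + 1 < len(chapters) else total_seconds
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START and END are in milliseconds
            f"START={round(start * 1000)}",
            f"END={round(end * 1000)}",
            f"title={_escape(title)}",
        ]
    return "\n".join(lines) + "\n"
```

Saved as, say, `chapters.txt`, the result can be merged into an existing file with `ffmpeg -i book.m4a -i chapters.txt -map_metadata 1 -codec copy book.m4b`.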
## 🏗️ Project Structure
```
reader/
├── text/           # 📂 Input files (your books)
├── audio/          # 🔊 Temporary processing
├── finished/       # ✅ Completed audiobooks
├── config/         # ⚙️ Configuration files
├── models/         # 🤖 Kokoro TTS models
└── reader/
    ├── engines/    # 🎙️ TTS engine (Kokoro)
    ├── parsers/    # 📖 File format parsers
    ├── batch/      # 💾 Neural Engine processor
    ├── analysis/   # 🎭 Emotion/dialogue detection
    └── cli.py      # 💻 Command-line interface
```
## 🎨 Example Workflows
### Simple Book Conversion
```bash
# Add your book
cp "My Novel.epub" text/
# Convert with Neural Engine acceleration
reader convert
# Result: finished/My Novel_kokoro_am_michael.mp3
```
### Voice Comparison
```bash
# Test different Kokoro voices on same content
reader convert --voice af_sarah --file text/sample.txt
reader convert --voice am_adam --file text/sample.txt
reader convert --voice bf_emma --file text/sample.txt
# Compare finished/sample_*.mp3 outputs
```
### Batch Processing
```bash
# Add multiple books
cp book1.epub book2.pdf story.txt text/
# Set default voice and convert all
reader config --voice am_michael --speed 1.0
reader convert
# Results: finished/book1_*.mp3, finished/book2_*.mp3, finished/story_*.mp3
```
## ⚙️ Configuration
Settings are saved to `config/settings.yaml`:
```yaml
tts:
  engine: kokoro              # TTS engine (Kokoro)
  voice: am_michael           # Default voice
  speed: 1.0                  # Speech rate multiplier
  volume: 1.0                 # Volume level
audio:
  format: mp3                 # Output format (mp3, wav, m4a, m4b)
  bitrate: 48k                # MP3 bitrate (32k-64k typical for audiobooks)
  add_metadata: true          # Metadata support
processing:
  chunk_size: 400             # Text chunk size for processing (Kokoro optimal)
  auto_detect_chapters: true  # Chapter detection
```
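To give a feel for what `chunk_size` controls, here is a minimal sketch of sentence-aware chunking. It assumes the value counts characters, which this README does not state, and the function is illustrative and not taken from the project.

```python
import re


def chunk_text(text: str, chunk_size: int = 400) -> list:
    """Split text into chunks of up to `chunk_size` characters on sentence boundaries.

    A single sentence longer than `chunk_size` is kept whole, not cut mid-word.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence ends keeps each synthesized segment a natural unit of speech, so joins between chunks fall on pauses.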
## 🛠️ Development
**Modular Architecture Benefits:**
- **Easy TTS upgrades**: pyttsx3 → Kokoro → Custom engines
- **New format support**: Add parsers for Word, HTML, etc.
- **Enhanced processing**: Audio effects, normalization, etc.
- **Cloud integration**: Azure, AWS, Google TTS services
**Component Swapping:**
```python
# Each component implements abstract interfaces
class MyCustomTTS(TTSEngine):
    def synthesize(self, text, voice, speed): ...
    def list_voices(self): ...
```
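A fuller, self-contained version of the same idea is sketched below. The `TTSEngine` class here is a stand-in written for this example (its import path, return types, and any further methods in the real project are not shown in this README), and `SilentTTS` is a toy engine that only produces silence.

```python
from abc import ABC, abstractmethod


class TTSEngine(ABC):
    """Stand-in for the project's engine interface (method names from this README)."""

    @abstractmethod
    def synthesize(self, text: str, voice: str, speed: float) -> bytes: ...

    @abstractmethod
    def list_voices(self) -> list: ...


class SilentTTS(TTSEngine):
    """Toy engine that returns silence, useful for testing a pipeline without a model."""

    SAMPLE_RATE = 48_000  # samples per second, 16-bit mono

    def synthesize(self, text: str, voice: str, speed: float) -> bytes:
        # Hypothetical pacing: about 15 characters per second at speed 1.0.
        seconds = len(text) / (15 * speed)
        return b"\x00\x00" * int(seconds * self.SAMPLE_RATE)

    def list_voices(self) -> list:
        return ["silent"]
```

Declaring the methods abstract means a subclass that forgets one fails at instantiation, well before a long conversion starts.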
## 🎯 Quick Examples
See **[docs/EXAMPLES.md](https://github.com/danielcorsano/reader/blob/main/docs/EXAMPLES.md)** for detailed examples including:
- Voice testing and selection
- PDF processing workflows
- Markdown chapter handling
- Batch processing scripts
- Configuration optimization
## 📊 Technical Specs
- **TTS Engine**: Kokoro-82M (82M parameters, Apache 2.0 license)
- **Model Size**: ~310MB ONNX models (auto-downloaded on first use to cache)
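- **Model Cache**: Platform cache directory (`~/.cache/audiobook-reader/models/` on Linux, following the XDG convention; `~/Library/Caches/audiobook-reader/models/` on macOS; `%LOCALAPPDATA%\audiobook-reader\models\` on Windows)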
- **Python**: 3.10-3.13 compatibility
- **Platforms**: macOS, Linux, Windows (all fully supported)
- **Audio Quality**: 48kHz mono MP3, configurable bitrate (32k-64k, default 48k)
- **Hardware Acceleration**:
  - ✅ Apple Silicon (M1/M2/M3/M4): CoreML (Neural Engine) - automatic
  - ✅ NVIDIA GPUs: CUDA via onnxruntime-gpu
  - ✅ AMD/Intel GPUs: DirectML on Windows
  - ✅ CPU: Works efficiently on all processors
- **Performance**: Hardware-accelerated on all major platforms
- **Memory**: Efficient streaming processing for large books
## 🎵 Audio Quality
**Kokoro TTS** (primary engine):
- ✅ Near-human quality neural voices
- ✅ 48 voices across 8 languages
- ✅ Apple Neural Engine acceleration
- ✅ Professional audiobook production
- ✅ Consistent narration (no hallucinations)
---
## 🔧 Troubleshooting
### FFmpeg Not Found
**Error**: `FFmpeg not found` or `Command 'ffmpeg' not found`
**Solution**:
```bash
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt-get install ffmpeg
# Windows
# Download from https://ffmpeg.org/download.html
# Or use: choco install ffmpeg
```
### Models Not Downloading
**Error**: `Failed to download Kokoro models`
**Solution**:
Models auto-download on first use (~310MB). If automatic download fails:
```bash
# Download to system cache (default)
reader download models
# Download to local models/ folder (permanent storage)
reader download models --local
# Force re-download
reader download models --force
```
**Model Storage Options:**
- **Cache** (default): System cache directory, shared across installations
  - macOS: `~/Library/Caches/audiobook-reader/models/`
  - Linux: `~/.cache/audiobook-reader/models/`
  - Windows: `%LOCALAPPDATA%\audiobook-reader\models\`
- **Local** (`--local` flag): `models/` folder in package root
  - Permanent local storage, survives cache clears
  - Priority: Reader checks `models/` first, then falls back to cache
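The lookup order can be sketched as follows, using the paths listed above. The function names and the "non-empty folder" check are assumptions for illustration; the actual resolution code in audiobook-reader is not shown in this README.

```python
import os
import sys
from pathlib import Path
from typing import Optional


def cache_dir(platform: str = sys.platform, home: Optional[Path] = None) -> Path:
    """Return the per-platform cache directory listed above."""
    home = home or Path.home()
    if platform == "darwin":
        return home / "Library" / "Caches" / "audiobook-reader" / "models"
    if platform.startswith("win"):
        local = os.environ.get("LOCALAPPDATA", str(home / "AppData" / "Local"))
        return Path(local) / "audiobook-reader" / "models"
    return home / ".cache" / "audiobook-reader" / "models"


def resolve_model_dir(project_root: Path, **kwargs) -> Path:
    """Prefer a non-empty local models/ folder, otherwise fall back to the cache."""
    local = Path(project_root) / "models"
    if local.is_dir() and any(local.iterdir()):
        return local
    return cache_dir(**kwargs)
```

Printing the result of `resolve_model_dir(Path.cwd())` is a quick way to see where a download should end up on your machine under these assumptions.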
### Neural Engine Not Detected (Apple Silicon)
**Error**: `Neural Engine not available, using CPU`
**Solution**:
- Ensure you're on Apple Silicon (M1/M2/M3/M4 Mac)
- Update macOS to latest version
- Reinstall onnxruntime: `pip uninstall onnxruntime && pip install onnxruntime`
- CPU processing works fine but is slower than GPU/NPU
### Permission Errors
**Error**: `Permission denied` when creating directories
**Solution**:
```bash
# Ensure write permissions in project directory
chmod -R u+w /path/to/reader
# Or run from a directory you own
cd ~/Documents
git clone https://github.com/danielcorsano/reader.git
cd reader
```
### Import Errors
**Error**: `ModuleNotFoundError: No module named 'kokoro_onnx'`
**Solution**:
```bash
# Reinstall package
pip install --force-reinstall audiobook-reader
```
### Invalid Input Format
**Error**: `Unsupported file format`
**Supported formats**: `.epub`, `.pdf`, `.txt`, `.md`, `.rst`
**Solution**:
```bash
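# Convert your file to a supported format first
# For Word docs: save as .txt or .pdf, or convert with pandoc
pandoc book.docx -t plain -o text/book.txt
# For HTML: save as .txt, or convert with pandoc
pandoc page.html -t plain -o text/page.txt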
```
### GPU Acceleration Issues
**NVIDIA GPU**: Requires `onnxruntime-gpu` instead of `onnxruntime`
```bash
pip uninstall onnxruntime
pip install onnxruntime-gpu
```
**AMD/Intel GPU (Windows)**: Requires `onnxruntime-directml`
```bash
pip uninstall onnxruntime
pip install onnxruntime-directml
```
### Still Having Issues?
- Check the [GitHub Issues](https://github.com/danielcorsano/reader/issues)
- Run with debug mode: `reader convert --debug --file yourfile.txt`
- Verify Python version: `python --version` (requires 3.10-3.13)
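If you script your setup, the version requirement can be checked programmatically. This is a small sketch with a hypothetical helper name; the bounds come from the supported range stated above.

```python
import sys


def is_supported(version: tuple = sys.version_info[:2]) -> bool:
    """True for Python 3.10 through 3.13, the range this README lists."""
    return (3, 10) <= tuple(version) < (3, 14)


if __name__ == "__main__":
    status = "supported" if is_supported() else "unsupported"
    print(sys.version.split()[0], status)
```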
## 📜 Credits & Licensing
### Kokoro TTS Model
This project uses the [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) text-to-speech model by [hexgrad](https://github.com/hexgrad/kokoro), licensed under Apache 2.0.
**Model Credits:**
- Original Model: [hexgrad/Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) (Apache 2.0)
- ONNX Wrapper: [kokoro-onnx](https://github.com/thewh1teagle/kokoro-onnx) by thewh1teagle (MIT)
- Training datasets: Koniwa (CC BY 3.0), SIWIS (CC BY 4.0)
### Reader Package
This audiobook CLI tool is licensed under the MIT License. See `LICENSE` file for details.
---
**Ready to create your first audiobook?** Check out the **[Usage Guide](https://github.com/danielcorsano/reader/blob/main/docs/USAGE.md)** for step-by-step instructions!