audiobook-reader


Name: audiobook-reader
Version: 0.1.3
Home page: https://github.com/danielcorsano/reader
Summary: AI-powered audiobook generator with GPU/NPU acceleration (up to 6x faster than real-time). Built-in Kokoro-82M TTS with character-aware voices and emotion analysis.
Upload time: 2025-10-11 21:04:26
Author: danielcorsano
Requires Python: >=3.10, <3.14
License: MIT
Keywords: audiobook, text-to-speech, tts, kokoro, neural-engine, apple-silicon, m1, m2, m3, m4, epub, pdf, cli, audio, conversion, character-voices, emotion-analysis
# AI Audiobook Generator: CLI Tool with GPU & NPU Acceleration

[![PyPI](https://img.shields.io/pypi/v/audiobook-reader)](https://pypi.org/project/audiobook-reader/)
[![Python](https://img.shields.io/pypi/pyversions/audiobook-reader)](https://pypi.org/project/audiobook-reader/)
[![License](https://img.shields.io/pypi/l/audiobook-reader)](https://github.com/danielcorsano/reader/blob/main/LICENSE)
[![Downloads](https://img.shields.io/pypi/dm/audiobook-reader)](https://pypi.org/project/audiobook-reader/)

**Transform long-form text into professional audiobooks with character-aware voices, emotion analysis, and intelligent processing.**

Perfect for novels, articles, textbooks, research papers, and any long-form content. Built with Kokoro-82M TTS for production-quality narration. Works on all platforms with optimizations for Apple Silicon (M1/M2/M3/M4 Neural Engine), NVIDIA GPUs, and AMD/Intel GPUs.

## ✨ Core Features

### ⚡ **High-Performance Conversion**
- **Up to 6x faster than real-time** on Apple Silicon (M1/M2/M3/M4) with Neural Engine
- **GPU acceleration** for NVIDIA (CUDA), AMD/Intel (DirectML on Windows)
- **Efficient CPU processing** on all platforms
- Kokoro-82M engine optimized for speed + quality balance
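To make the "faster than real-time" figure concrete: a real-time factor of 6 means six seconds of audio are produced per second of processing. The helper names below are illustrative only and not part of audiobook-reader; this is just the arithmetic.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio produced per second of wall-clock processing."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def estimated_processing_seconds(audio_seconds: float, factor: float) -> float:
    """Wall-clock time needed to produce audio_seconds at a given real-time factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return audio_seconds / factor


# A 10-hour book (36,000 s of audio) at 6x real time needs about 6,000 s, i.e. 100 minutes.
print(estimated_processing_seconds(36_000, 6.0) / 60)  # 100.0
```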

### 🎭 **Character-Aware Narration**
- **Automatic character detection** in dialogue
- **Auto-assign different voices** with automatic gender detection when possible
- Assigns gender-appropriate voices (e.g., Alice gets `af_sarah`, Bob gets `am_adam`)
- Perfect for fiction, interviews, dialogues, and multi-speaker content
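A minimal sketch of gender-based voice assignment, assuming gender has already been detected (or is unknown). It relies on Kokoro's voice-ID convention, where the prefix encodes accent and gender (`af_` American female, `am_` American male, `bf_` British female). The voice pools, the narrator fallback and the function name are hypothetical, not audiobook-reader's actual logic.

```python
from __future__ import annotations

from itertools import cycle

# Hypothetical voice pools. In Kokoro voice IDs the prefix encodes accent and
# gender: "af" = American female, "am" = American male, "bf" = British female.
FEMALE_VOICES = ["af_sarah", "bf_emma"]
MALE_VOICES = ["am_adam", "am_michael"]


def assign_voices(characters: dict[str, str | None], narrator: str = "am_michael") -> dict[str, str]:
    """Map each character name to a voice ID.

    `characters` maps name -> "female", "male", or None (gender unknown).
    Known genders cycle through their pool; unknown ones get the narrator voice.
    """
    pools = {"female": cycle(FEMALE_VOICES), "male": cycle(MALE_VOICES)}
    assignments: dict[str, str] = {}
    for name, gender in characters.items():
        pool = pools.get(gender) if gender is not None else None
        assignments[name] = next(pool) if pool is not None else narrator
    return assignments


print(assign_voices({"Alice": "female", "Bob": "male", "Carol": "female", "Sam": None}))
# {'Alice': 'af_sarah', 'Bob': 'am_adam', 'Carol': 'bf_emma', 'Sam': 'am_michael'}
```

Cycling per gender keeps voices distinct until a pool is exhausted; a real tool would also want to avoid reusing the narrator's voice for a character.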

### 😊 **Emotion Analysis**
- **VADER sentiment analysis** adjusts prosody in real-time
- Excitement, sadness, tension automatically reflected in voice tone
- Natural emotional narration without manual SSML tagging
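VADER produces a "compound" score between -1 (most negative) and +1 (most positive), and its documentation treats scores between -0.05 and 0.05 as neutral. The sketch below shows one hypothetical way such a score could be turned into a speech-rate adjustment; the 10% strength and the linear mapping are assumptions for illustration, not the values audiobook-reader uses.

```python
def speed_for_sentiment(compound: float, base_speed: float = 1.0, strength: float = 0.1) -> float:
    """Map a VADER compound score in [-1, 1] to a speech-rate multiplier.

    Scores inside VADER's neutral band (-0.05, 0.05) keep the base speed;
    positive scores speed up and negative scores slow down, by at most
    `strength` (10% by default). The mapping itself is an illustrative assumption.
    """
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must be within [-1, 1]")
    if abs(compound) < 0.05:
        return base_speed
    return round(base_speed * (1.0 + strength * compound), 3)


# With the vaderSentiment package installed, the score would come from e.g.:
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   compound = SentimentIntensityAnalyzer().polarity_scores(sentence)["compound"]
print(speed_for_sentiment(0.8), speed_for_sentiment(0.0), speed_for_sentiment(-0.5))  # 1.08 1.0 0.95
```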

### 💾 **Checkpoint Resumption**
- **Resume interrupted conversions** from where you left off
- Essential for extra-long texts (500+ page books, textbooks, research papers)
- Reliable production workflow for lengthy content
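One common way to implement resumable conversion is a small checkpoint file that records which text chunks have already been synthesized, rewritten after each chunk. The sketch below assumes a JSON file and a caller-supplied `synthesize` callback; the class, file layout and function names are hypothetical and do not describe how audiobook-reader actually stores its checkpoints.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable


class Checkpoint:
    """Hypothetical JSON checkpoint of completed chunk indices."""

    def __init__(self, path: Path | str):
        self.path = Path(path)
        self.done: set[int] = set()
        if self.path.exists():
            self.done = set(json.loads(self.path.read_text())["completed"])

    def mark_done(self, index: int) -> None:
        self.done.add(index)
        # Write a temp file, then swap it in with Path.replace, so an
        # interruption mid-write does not leave a half-written checkpoint.
        tmp = self.path.with_name(self.path.name + ".tmp")
        tmp.write_text(json.dumps({"completed": sorted(self.done)}))
        tmp.replace(self.path)


def convert_with_resume(chunks: list[str], checkpoint: Checkpoint,
                        synthesize: Callable[[str], None]) -> list[int]:
    """Synthesize chunks not yet in the checkpoint; return indices handled this run."""
    processed = []
    for i, chunk in enumerate(chunks):
        if i in checkpoint.done:
            continue  # already finished before the interruption
        synthesize(chunk)
        checkpoint.mark_done(i)
        processed.append(i)
    return processed
```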

### 📚 **Chapter Management**
- **Automatic chapter detection** from EPUB TOC, PDF structure, or text patterns
- **M4B audiobook format** with chapter metadata
- Chapter timestamps and navigation
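M4B chapter markers are stored as container metadata. As an illustration (not necessarily what audiobook-reader does internally), FFmpeg accepts chapters in its `FFMETADATA1` text format; this sketch builds such a file from hypothetical `(title, duration_seconds)` pairs using a 1/1000 s timebase.

```python
def ffmetadata_chapters(chapters: list[tuple[str, float]]) -> str:
    """Build an FFmpeg FFMETADATA1 document from (title, duration_seconds) pairs."""

    def escape(value: str) -> str:
        # FFmpeg requires '=', ';', '#', backslash and newlines inside values to be
        # backslash-escaped; the backslash itself must be escaped first.
        for ch in ("\\", "=", ";", "#", "\n"):
            value = value.replace(ch, "\\" + ch)
        return value

    lines = [";FFMETADATA1"]
    start_ms = 0
    for title, duration in chapters:
        end_ms = start_ms + round(duration * 1000)
        lines += [
            "",
            "[CHAPTER]",
            "TIMEBASE=1/1000",
            f"START={start_ms}",
            f"END={end_ms}",
            f"title={escape(title)}",
        ]
        start_ms = end_ms  # next chapter starts where this one ends
    return "\n".join(lines) + "\n"


print(ffmetadata_chapters([("Chapter 1", 60.0), ("Chapter 2", 90.5)]))
```

A file like this could then be merged into an AAC audio file with FFmpeg, for example `ffmpeg -i book.m4a -i chapters.txt -map_metadata 1 -map_chapters 1 -c copy book.m4b`.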

### 📊 **Professional Production Tools**
- **4 progress visualization styles**: simple, tqdm, rich, timeseries charts
- **Real-time metrics**: processing speed, ETA, completion percentage
- **Batch processing** with queue management
- **Multiple output formats**: MP3 (48kHz mono optimized), WAV, M4A, M4B

### 🎙️ **Production-Quality TTS**
- **Kokoro-82M**: 48 high-quality neural voices across 8 languages
- **Near-human quality** narration
- **Consistent voice** throughout long documents
- No voice cloning overhead

---

## ⚖️ Copyright Notice

**IMPORTANT**: This software is a tool for converting text to audio. Users are solely responsible for:

- Ensuring they have the legal right to convert any text to audio
- Obtaining necessary permissions for copyrighted materials
- Complying with all applicable copyright laws and licensing terms
- Understanding that creating audiobooks from copyrighted text without authorization may constitute copyright infringement

**Recommended Use Cases:**
- ✅ Your own original content
- ✅ Public domain works
- ✅ Content you have explicit permission to convert
- ✅ Educational materials you legally own
- ✅ Open-source or Creative Commons licensed texts (per their terms)

The developers of audiobook-reader do not condone or support copyright infringement. By using this software, you agree to use it only for content you have the legal right to convert.

---

## 📚 Supported Input Formats

EPUB, PDF, TXT, Markdown, reStructuredText

## 📦 Installation

### Using pip (recommended for users)
```bash
# Default installation (Kokoro TTS + core features)
pip install audiobook-reader

# With all progress visualizations (tqdm, rich, plotext)
# (quote the extras so shells such as zsh don't try to expand the brackets)
pip install "audiobook-reader[progress-full]"

# With system monitoring
pip install "audiobook-reader[monitoring]"

# With everything
pip install "audiobook-reader[all]"
```

### Hardware Acceleration Options

audiobook-reader works great on **all platforms**. For maximum performance, enable hardware acceleration:

#### ✅ Apple Silicon (M1/M2/M3/M4)
**Neural Engine (CoreML) works automatically** - no additional setup needed!

```bash
pip install audiobook-reader
# That's it! CoreML acceleration is built-in
```

#### ✅ NVIDIA GPU (Windows/Linux)
Get **CUDA acceleration** with a simple package swap:

```bash
pip install audiobook-reader
pip uninstall onnxruntime
pip install onnxruntime-gpu
```

#### ✅ AMD/Intel GPU (Windows)
Get **DirectML acceleration**:

```bash
pip install audiobook-reader
pip uninstall onnxruntime
pip install onnxruntime-directml
```

#### ✅ CPU Only (All Platforms)
**No GPU? No problem!** The default installation works efficiently on any CPU:

```bash
pip install audiobook-reader
# Works great on Intel, AMD, ARM processors
```
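To check which acceleration path your installed ONNX Runtime build actually offers, you can query `onnxruntime.get_available_providers()`. Only the provider names and that function come from ONNX Runtime; the preference order in this sketch (CUDA, then DirectML, then CoreML, then CPU) is an assumption for illustration, and audiobook-reader's own selection logic may differ.

```python
# Preferred order (assumed); the names are ONNX Runtime execution-provider identifiers.
PREFERRED = [
    "CUDAExecutionProvider",    # NVIDIA, via onnxruntime-gpu
    "DmlExecutionProvider",     # AMD/Intel on Windows, via onnxruntime-directml
    "CoreMLExecutionProvider",  # Apple Silicon, included in the default macOS wheel
    "CPUExecutionProvider",     # always available fallback
]


def pick_providers(available: list[str]) -> list[str]:
    """Return preferred providers that are actually available, CPU last."""
    chosen = [p for p in PREFERRED if p in available]
    return chosen or ["CPUExecutionProvider"]


if __name__ == "__main__":
    try:
        import onnxruntime as ort

        print(pick_providers(ort.get_available_providers()))
    except ImportError:
        print("onnxruntime is not installed")
```

A list like this is what you would pass as `providers=` to `onnxruntime.InferenceSession`. If the GPU provider is missing after swapping packages, the plain `onnxruntime` package may still be installed alongside the GPU build.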

## 🚀 Quick Start

```bash
# 1. Install
pip install audiobook-reader

# 2. Models auto-download on first use (~310MB)
#    Or manually: reader download models
#    For permanent local storage: reader download models --local

# 3. Add a text file (create the text/ input folder first if it doesn't exist)
mkdir -p text
echo "Hello world! This is my first audiobook." > text/hello.txt

# 4. Convert to audiobook (Neural Engine optimized)
reader convert

# 5. Listen to finished/hello_kokoro_am_michael.mp3
```

### 🎭 Character Voices (Optional)

For books with dialogue, assign different voices to each character:

```bash
# Auto-detect characters and generate config
reader characters detect text/mybook.txt --auto-assign

# OR manually create mybook.characters.yaml:
# characters:
#   - name: Alice
#     voice: af_sarah
#     gender: female
#   - name: Bob
#     voice: am_michael
#     gender: male

# Convert with character voices
reader convert --characters --file text/mybook.txt
```
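If you prefer to generate the character file from a script, here is a sketch that renders the same layout as the commented example above. The function name is hypothetical. Names are emitted as double-quoted JSON strings (which YAML also accepts) so punctuation in a name cannot break the file; for anything more complex, a YAML library such as PyYAML is the safer choice.

```python
import json


def characters_yaml(characters: list[dict[str, str]]) -> str:
    """Render a characters config in the name/voice/gender layout shown above."""
    lines = ["characters:"]
    for c in characters:
        lines.append(f"  - name: {json.dumps(c['name'])}")  # quoted, YAML-safe
        lines.append(f"    voice: {c['voice']}")
        lines.append(f"    gender: {c['gender']}")
    return "\n".join(lines) + "\n"


print(characters_yaml([
    {"name": "Alice", "voice": "af_sarah", "gender": "female"},
    {"name": "Bob", "voice": "am_michael", "gender": "male"},
]))
```

The README does not say where `reader` looks for `mybook.characters.yaml`; saving it next to the source text is an assumption.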

## 📖 Documentation

- **[Usage Guide](https://github.com/danielcorsano/reader/blob/main/docs/USAGE.md)** - Complete command reference and workflows
- **[Examples](https://github.com/danielcorsano/reader/blob/main/docs/EXAMPLES.md)** - Real-world examples and use cases
- **[Advanced Features](https://github.com/danielcorsano/reader/blob/main/docs/ADVANCED_FEATURES.md)** - Professional audiobook production features
- **[Kokoro Setup](https://github.com/danielcorsano/reader/blob/main/docs/KOKORO_SETUP.md)** - Neural TTS model setup guide

## 🎙️ Command Reference

### Basic Conversion
```bash
# Convert single file with Neural Engine acceleration
reader convert --file text/book.epub

# Convert with specific voice
reader convert --file text/book.epub --voice am_michael

# Note: Kokoro is the TTS engine used for every conversion

# Enable debug mode to see Neural Engine status
reader convert --file text/book.epub --debug
```

### 📊 Progress Visualization Options

```bash
# Simple text progress (default)
reader convert --progress-style simple --file "book.epub"

# Professional progress bars with speed metrics
reader convert --progress-style tqdm --file "book.epub"

# Beautiful Rich formatted displays with colors
reader convert --progress-style rich --file "book.epub"

# Real-time ASCII charts showing processing speed
reader convert --progress-style timeseries --file "book.epub"
```
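All four styles report the same kind of numbers: speed, ETA and completion percentage. A sketch of how such metrics can be derived, assuming progress is counted in text chunks and that the average rate so far predicts the remainder. The class is hypothetical; the clock is injectable so the arithmetic can be tested deterministically.

```python
from __future__ import annotations

import time
from typing import Callable


class ProgressTracker:
    """Hypothetical tracker for chunk-based conversion progress."""

    def __init__(self, total_chunks: int, clock: Callable[[], float] = time.monotonic):
        self.total = total_chunks
        self.done = 0
        self.clock = clock
        self.start = clock()

    def update(self, n: int = 1) -> None:
        self.done += n

    def percent(self) -> float:
        return 100.0 * self.done / self.total if self.total else 100.0

    def rate(self) -> float:
        """Average chunks per second since the tracker was created."""
        elapsed = self.clock() - self.start
        return self.done / elapsed if elapsed > 0 else 0.0

    def eta_seconds(self) -> float | None:
        """Remaining seconds if the average rate holds; None until a rate exists."""
        rate = self.rate()
        return (self.total - self.done) / rate if rate > 0 else None

    def simple_line(self) -> str:
        """One-line summary similar in spirit to a plain-text progress style."""
        eta = self.eta_seconds()
        eta_text = f"{eta:.0f}s" if eta is not None else "?"
        return (f"{self.done}/{self.total} chunks ({self.percent():.1f}%), "
                f"{self.rate():.2f} chunks/s, ETA {eta_text}")
```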

### Configuration Management
```bash
# Save permanent settings to config file
reader config --engine kokoro --voice am_michael --format mp3

# List available Kokoro voices
reader voices

# View current configuration
reader config

# View application info and features
reader info
```

### **Parameter Hierarchy (How Settings Work)**
1. **CLI parameters** (highest priority) - temporary overrides, never saved
2. **Config file** (middle priority) - your saved preferences  
3. **Code defaults** (lowest priority) - sensible fallbacks

Example:
```bash
# Save your preferred settings
reader config --engine kokoro --voice am_michael --format mp3

# Use temporary override (doesn't change your saved config)
reader convert --voice af_sarah

# Your config file still has kokoro/am_michael/mp3 saved
```
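The three-level precedence maps neatly onto Python's `collections.ChainMap`, which looks each key up in its mappings in order. This sketch assumes unset CLI options arrive as `None`; the default values are copied from the sample `settings.yaml` for illustration and may not match the package's actual code defaults.

```python
from collections import ChainMap

# Illustrative code defaults (values copied from the sample settings.yaml).
DEFAULTS = {"engine": "kokoro", "voice": "am_michael", "format": "mp3", "speed": 1.0}


def effective_settings(cli_args: dict, config_file: dict) -> ChainMap:
    """Look settings up in CLI args, then the saved config, then the defaults.

    Options the user didn't pass (None) are dropped so they don't hide the
    lower layers. Nothing is written back to config_file.
    """
    cli = {key: value for key, value in cli_args.items() if value is not None}
    return ChainMap(cli, config_file, DEFAULTS)


saved = {"engine": "kokoro", "voice": "am_michael", "format": "mp3"}
settings = effective_settings({"voice": "af_sarah", "format": None}, saved)
print(settings["voice"], settings["format"], settings["speed"])  # af_sarah mp3 1.0
```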

## 📁 File Support

### Input Formats
| Format | Extension | Chapter Detection |
|--------|-----------|------------------|
| EPUB | `.epub` | ✅ Automatic from TOC |
| PDF | `.pdf` | ✅ Page-based |
| Text | `.txt` | ✅ Simple patterns |
| Markdown | `.md` | ✅ Header-based |
| reStructuredText | `.rst` | ✅ Header-based |

### Output Formats
- **MP3** (default) - 48kHz mono, configurable bitrate (32k-64k, default 48k)
- **WAV** - Uncompressed, high quality
- **M4A** - Apple-friendly format
- **M4B** - Audiobook format with chapter support
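For planning disk space, a constant-bitrate MP3's size is roughly bitrate × duration. This sketch assumes constant-bitrate encoding and ignores ID3 tags and frame overhead: at the default 48k, one hour comes to about 21.6 MB, so a 10-hour book is roughly 216 MB.

```python
def mp3_size_mb(bitrate_kbps: float, hours: float) -> float:
    """Approximate size of a constant-bitrate MP3 in megabytes (1 MB = 10**6 bytes)."""
    bits = bitrate_kbps * 1000 * hours * 3600  # bits per second x seconds
    return bits / 8 / 1_000_000


for kbps in (32, 48, 64):
    print(f"{kbps}k: {mp3_size_mb(kbps, 1):.1f} MB per hour")
```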

## 🏗️ Project Structure

```
reader/
├── text/                   # 📂 Input files (your books)
├── audio/                  # 🔊 Temporary processing
├── finished/               # ✅ Completed audiobooks
├── config/                 # ⚙️ Configuration files
├── models/                 # 🤖 Kokoro TTS models
└── reader/
    ├── engines/           # 🎙️ TTS engine (Kokoro)
    ├── parsers/           # 📖 File format parsers
    ├── batch/             # 💾 Neural Engine processor
    ├── analysis/          # 🎭 Emotion/dialogue detection
    └── cli.py             # 💻 Command-line interface
```

## 🎨 Example Workflows

### Simple Book Conversion
```bash
# Add your book
cp "My Novel.epub" text/

# Convert with Neural Engine acceleration
reader convert

# Result: finished/My Novel_kokoro_am_michael.mp3
```

### Voice Comparison
```bash
# Test different Kokoro voices on same content
reader convert --voice af_sarah --file text/sample.txt
reader convert --voice am_adam --file text/sample.txt
reader convert --voice bf_emma --file text/sample.txt

# Compare finished/sample_*.mp3 outputs
```

### Batch Processing
```bash
# Add multiple books
cp book1.epub book2.pdf story.txt text/

# Set default voice and convert all
reader config --voice am_michael --speed 1.0
reader convert

# Results: finished/book1_*.mp3, finished/book2_*.mp3, finished/story_*.mp3
```
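Conceptually, batch mode processes every supported file found in `text/`. A sketch of that discovery step, using the extensions from the input-format table; the function is hypothetical, and audiobook-reader's own scanner may apply different rules (for example recursion into subfolders or a different ordering).

```python
from pathlib import Path

# Extensions from the "Input Formats" table.
SUPPORTED = {".epub", ".pdf", ".txt", ".md", ".rst"}


def queued_books(text_dir: Path = Path("text")) -> list[Path]:
    """Return supported input files directly inside text_dir, sorted for a stable order."""
    return sorted(
        p for p in Path(text_dir).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
```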

## ⚙️ Configuration

Settings are saved to `config/settings.yaml`:

```yaml
tts:
  engine: kokoro           # TTS engine (Kokoro)
  voice: am_michael        # Default voice
  speed: 1.0               # Speech rate multiplier
  volume: 1.0              # Volume level
audio:
  format: mp3              # Output format (mp3, wav, m4a, m4b)
  bitrate: 48k             # MP3 bitrate (32k-64k typical for audiobooks)
  add_metadata: true       # Metadata support
processing:
  chunk_size: 400          # Text chunk size for processing (Kokoro optimal)
  auto_detect_chapters: true  # Chapter detection
```

## 🛠️ Development

**Modular Architecture Benefits:**
- **Easy TTS upgrades**: pyttsx3 → Kokoro → Custom engines
- **New format support**: Add parsers for Word, HTML, etc.  
- **Enhanced processing**: Audio effects, normalization, etc.
- **Cloud integration**: Azure, AWS, Google TTS services

**Component Swapping:**
```python
# Each component implements abstract interfaces
class MyCustomTTS(TTSEngine):
    def synthesize(self, text, voice, speed): ...
    def list_voices(self): ...
```
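For a self-contained illustration of the pattern, here is a stand-in `TTSEngine` built with Python's `abc` module, using the method names from the snippet above. The real base class presumably lives under `reader/engines/` and may declare different signatures or return types; returning raw 16-bit PCM `bytes` at 24 kHz is an assumption made only for this toy engine.

```python
from abc import ABC, abstractmethod


class TTSEngine(ABC):
    """Stand-in for the package's engine interface (assumed signatures)."""

    @abstractmethod
    def synthesize(self, text: str, voice: str, speed: float) -> bytes:
        """Return raw audio for `text` (assumed: 16-bit mono PCM)."""

    @abstractmethod
    def list_voices(self) -> list[str]:
        """Return the voice IDs this engine accepts."""


class SilentTTS(TTSEngine):
    """Toy engine producing silence, handy for testing a pipeline without a model."""

    SAMPLE_RATE = 24_000  # assumed output rate for this toy engine

    def synthesize(self, text: str, voice: str, speed: float) -> bytes:
        if voice not in self.list_voices():
            raise ValueError(f"unknown voice: {voice}")
        if speed <= 0:
            raise ValueError("speed must be positive")
        # About 50 ms of silence per character (1,200 samples at 24 kHz), scaled by speed.
        samples = int(len(text) * self.SAMPLE_RATE // 20 / speed)
        return bytes(2 * samples)  # 2 bytes per 16-bit sample, all zeros

    def list_voices(self) -> list[str]:
        return ["silent"]


engine: TTSEngine = SilentTTS()
print(len(engine.synthesize("Hello", "silent", 1.0)))  # 12000
```

Because `TTSEngine` declares abstract methods, instantiating a subclass that forgets one of them raises `TypeError`, which catches incomplete engines early.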

## 🎯 Quick Examples

See **[docs/EXAMPLES.md](https://github.com/danielcorsano/reader/blob/main/docs/EXAMPLES.md)** for detailed examples including:
- Voice testing and selection
- PDF processing workflows  
- Markdown chapter handling
- Batch processing scripts
- Configuration optimization

## 📊 Technical Specs

- **TTS Engine**: Kokoro-82M (82M parameters, Apache 2.0 license)
- **Model Size**: ~310MB ONNX models (auto-downloaded on first use to cache)
- **Model Cache**: Platform cache directory, e.g. `~/.cache/audiobook-reader/models/` on Linux (XDG convention); see Troubleshooting for the macOS and Windows paths
- **Python**: 3.10-3.13 compatibility
- **Platforms**: macOS, Linux, Windows (all fully supported)
- **Audio Quality**: 48kHz mono MP3, configurable bitrate (32k-64k, default 48k)
- **Hardware Acceleration**:
  - ✅ Apple Silicon (M1/M2/M3/M4): CoreML (Neural Engine) - automatic
  - ✅ NVIDIA GPUs: CUDA via onnxruntime-gpu
  - ✅ AMD/Intel GPUs: DirectML on Windows
  - ✅ CPU: Works efficiently on all processors
- **Performance**: Up to 6x faster than real-time on Apple Silicon (Neural Engine); hardware-accelerated on all major platforms
- **Memory**: Efficient streaming processing for large books
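The `chunk_size: 400` setting and the streaming claim suggest text is fed to the model in bounded pieces. A sketch of lazy, sentence-aware chunking under that assumption; this is not the package's actual splitter, and its regex only breaks after `.`, `!` or `?` followed by whitespace, so abbreviations such as "Dr." cause extra splits.

```python
import re
from collections.abc import Iterable, Iterator

# Naive sentence boundary: whitespace preceded by '.', '!' or '?'.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_chunks(paragraphs: Iterable[str], chunk_size: int = 400) -> Iterator[str]:
    """Lazily yield chunks of at most chunk_size characters.

    Sentences are packed together until the next one would overflow; a single
    sentence longer than chunk_size is hard-split.
    """
    buf = ""
    for paragraph in paragraphs:
        for sentence in _SENTENCE_END.split(paragraph.strip()):
            if not sentence:
                continue
            while len(sentence) > chunk_size:
                if buf:
                    yield buf
                    buf = ""
                yield sentence[:chunk_size]
                sentence = sentence[chunk_size:]
            if not sentence:
                continue
            candidate = f"{buf} {sentence}" if buf else sentence
            if len(candidate) <= chunk_size:
                buf = candidate
            else:
                yield buf
                buf = sentence
    if buf:
        yield buf
```

Because it is a generator over an iterable of paragraphs, only the current chunk is buffered, which is what keeps memory flat for long books.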

## 🎵 Audio Quality

**Kokoro TTS** (primary engine):
- ✅ Near-human quality neural voices
- ✅ 48 voices across 8 languages
- ✅ Apple Neural Engine acceleration
- ✅ Professional audiobook production
- ✅ Consistent narration (no hallucinations)

---

## 🔧 Troubleshooting

### FFmpeg Not Found
**Error**: `FFmpeg not found` or `Command 'ffmpeg' not found`

**Solution**:
```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html
# Or use: choco install ffmpeg
```
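After installing, you can confirm from Python that FFmpeg is visible on your `PATH` (equivalent to running `ffmpeg -version` in a terminal). The helper name is illustrative.

```python
from __future__ import annotations

import shutil
import subprocess


def ffmpeg_version() -> str | None:
    """Return FFmpeg's first version line, or None if ffmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    return result.stdout.splitlines()[0] if result.stdout else None


print(ffmpeg_version() or "ffmpeg not found on PATH")
```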

### Models Not Downloading
**Error**: `Failed to download Kokoro models`

**Solution**:
Models auto-download on first use (~310MB). If automatic download fails:
```bash
# Download to system cache (default)
reader download models

# Download to local models/ folder (permanent storage)
reader download models --local

# Force re-download
reader download models --force
```

**Model Storage Options:**
- **Cache** (default): System cache directory, shared across installations
  - macOS: `~/Library/Caches/audiobook-reader/models/`
  - Linux: `~/.cache/audiobook-reader/models/`
  - Windows: `%LOCALAPPDATA%\audiobook-reader\models\`
- **Local** (`--local` flag): `models/` folder in package root
  - Permanent local storage, survives cache clears
  - Priority: Reader checks `models/` first, then falls back to cache

### Neural Engine Not Detected (Apple Silicon)
**Error**: `Neural Engine not available, using CPU`

**Solution**:
- Ensure you're on Apple Silicon (M1/M2/M3/M4 Mac)
- Update macOS to latest version
- Reinstall onnxruntime: `pip uninstall onnxruntime && pip install onnxruntime`
- CPU processing works fine but is slower than GPU/NPU

### Permission Errors
**Error**: `Permission denied` when creating directories

**Solution**:
```bash
# Ensure write permissions in project directory
chmod -R u+w /path/to/reader

# Or run from a directory you own
cd ~/Documents
git clone https://github.com/danielcorsano/reader.git
cd reader
```

### Import Errors
**Error**: `ModuleNotFoundError: No module named 'kokoro_onnx'`

**Solution**:
```bash
# Reinstall package
pip install --force-reinstall audiobook-reader
```

### Invalid Input Format
**Error**: `Unsupported file format`

**Supported formats**: `.epub`, `.pdf`, `.txt`, `.md`, `.rst`

**Solution**:
```bash
# Convert your file to a supported format first
# For Word docs: Save as .txt or .pdf
# For HTML: Save as .txt, or convert it to Markdown with pandoc:
#   pandoc input.html -o output.md
```

### GPU Acceleration Issues
**NVIDIA GPU**: Requires `onnxruntime-gpu` instead of `onnxruntime`
```bash
pip uninstall onnxruntime
pip install onnxruntime-gpu
```

**AMD/Intel GPU (Windows)**: Requires `onnxruntime-directml`
```bash
pip uninstall onnxruntime
pip install onnxruntime-directml
```

### Still Having Issues?
- Check the [GitHub Issues](https://github.com/danielcorsano/reader/issues)
- Run with debug mode: `reader convert --debug --file yourfile.txt`
- Verify Python version: `python --version` (requires 3.10-3.13)
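The same check can be run from inside Python, using the `requires_python` range from the package metadata (`>=3.10,<3.14`); the helper is illustrative:

```python
import sys

SUPPORTED_RANGE = ((3, 10), (3, 13))  # inclusive, from requires_python ">=3.10,<3.14"


def python_supported(version: tuple = tuple(sys.version_info[:2])) -> bool:
    """True if the (major, minor) version falls within the supported range."""
    low, high = SUPPORTED_RANGE
    return low <= tuple(version[:2]) <= high


print(f"Python {sys.version_info.major}.{sys.version_info.minor} supported: {python_supported()}")
```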

## 📜 Credits & Licensing

### Kokoro TTS Model
This project uses the [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) text-to-speech model by [hexgrad](https://github.com/hexgrad/kokoro), licensed under Apache 2.0.

**Model Credits:**
- Original Model: [hexgrad/Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) (Apache 2.0)
- ONNX Wrapper: [kokoro-onnx](https://github.com/thewh1teagle/kokoro-onnx) by thewh1teagle (MIT)
- Training datasets: Koniwa (CC BY 3.0), SIWIS (CC BY 4.0)

### Reader Package
This audiobook CLI tool is licensed under the MIT License. See `LICENSE` file for details.

---

**Ready to create your first audiobook?** Check out the **[Usage Guide](https://github.com/danielcorsano/reader/blob/main/docs/USAGE.md)** for step-by-step instructions!
            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/danielcorsano/reader",
    "name": "audiobook-reader",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.14,>=3.10",
    "maintainer_email": null,
    "keywords": "audiobook, text-to-speech, tts, kokoro, neural-engine, apple-silicon, m1, m2, m3, m4, epub, pdf, cli, audio, conversion, character-voices, emotion-analysis",
    "author": "danielcorsano",
    "author_email": "danielcorsano@users.noreply.github.com",
    "download_url": "https://files.pythonhosted.org/packages/de/d8/272512ebd2dcfe34abc009d9ddfd223ef0c7da7c12877310ce3cd159a0e9/audiobook_reader-0.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "AI-powered audiobook generator with GPU/NPU acceleration (up to 6x faster than real-time). Built-in Kokoro-82M TTS with character-aware voices and emotion analysis.",
    "version": "0.1.3",
    "project_urls": {
        "Documentation": "https://github.com/danielcorsano/reader/tree/main/docs",
        "Homepage": "https://github.com/danielcorsano/reader",
        "Repository": "https://github.com/danielcorsano/reader"
    },
    "split_keywords": [
        "audiobook",
        " text-to-speech",
        " tts",
        " kokoro",
        " neural-engine",
        " apple-silicon",
        " m1",
        " m2",
        " m3",
        " m4",
        " epub",
        " pdf",
        " cli",
        " audio",
        " conversion",
        " character-voices",
        " emotion-analysis"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3c30d923e45d9956e30905ed57bf1f93131d928430511141271191dfbc261644",
                "md5": "8c4700fa213707827c9ff15c2ccbc615",
                "sha256": "e557e4318b3a8fefa9061474be30035d95975c538e201d80af0912ae8c7d1b1b"
            },
            "downloads": -1,
            "filename": "audiobook_reader-0.1.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8c4700fa213707827c9ff15c2ccbc615",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.14,>=3.10",
            "size": 84308,
            "upload_time": "2025-10-11T21:04:24",
            "upload_time_iso_8601": "2025-10-11T21:04:24.977034Z",
            "url": "https://files.pythonhosted.org/packages/3c/30/d923e45d9956e30905ed57bf1f93131d928430511141271191dfbc261644/audiobook_reader-0.1.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ded8272512ebd2dcfe34abc009d9ddfd223ef0c7da7c12877310ce3cd159a0e9",
                "md5": "c6da466ec39d9757895a9ddc70e576b0",
                "sha256": "c62f56824a5077a9272cc194e924f07b87b1030fd566c73615ac0d995ef6a04d"
            },
            "downloads": -1,
            "filename": "audiobook_reader-0.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "c6da466ec39d9757895a9ddc70e576b0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.14,>=3.10",
            "size": 75434,
            "upload_time": "2025-10-11T21:04:26",
            "upload_time_iso_8601": "2025-10-11T21:04:26.628445Z",
            "url": "https://files.pythonhosted.org/packages/de/d8/272512ebd2dcfe34abc009d9ddfd223ef0c7da7c12877310ce3cd159a0e9/audiobook_reader-0.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-11 21:04:26",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "danielcorsano",
    "github_project": "reader",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "audiobook-reader"
}
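The `digests` entries above publish a SHA-256 hash for each release file. The following is a minimal sketch for checking a downloaded file against those hashes. It assumes the file has already been fetched from its listed `url`. The helper names `sha256_of` and `verify_download` are illustrative only and are not part of audiobook-reader. The sketch compares against the SHA-256 value rather than the listed MD5, because MD5 is no longer collision-resistant.

```python
import hashlib
from pathlib import Path

# Expected SHA-256 digests, copied from the "digests" entries in the PyPI JSON above.
EXPECTED_SHA256 = {
    "audiobook_reader-0.1.3-py3-none-any.whl":
        "e557e4318b3a8fefa9061474be30035d95975c538e201d80af0912ae8c7d1b1b",
    "audiobook_reader-0.1.3.tar.gz":
        "c62f56824a5077a9272cc194e924f07b87b1030fd566c73615ac0d995ef6a04d",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: Path, expected: str) -> bool:
    """True if the file's SHA-256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

As an example, `verify_download(Path("audiobook_reader-0.1.3.tar.gz"), EXPECTED_SHA256["audiobook_reader-0.1.3.tar.gz"])` should return `True` only for an unmodified download. pip can run the same check at install time in hash-checking mode. To use it, pass `--require-hashes` and list the requirement with a `--hash=sha256:<digest>` option in a requirements file.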
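The troubleshooting notes in the description above give a model lookup order. The reader checks a `models/` folder in the package root first and otherwise falls back to a per-platform cache directory. The following is a minimal sketch of that lookup order. It assumes the cache paths listed there are exact; for example, it does not assume that `XDG_CACHE_HOME` is honored on Linux. The names `cache_models_dir` and `resolve_models_dir` and the `package_root` parameter are hypothetical and are not audiobook-reader's actual API.

```python
import os
import sys
from pathlib import Path


def cache_models_dir() -> Path:
    """Per-platform cache location as listed in the README (assumed, not verified)."""
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Caches" / "audiobook-reader" / "models"
    if sys.platform.startswith("win"):
        # %LOCALAPPDATA% normally points to C:\Users\<name>\AppData\Local
        base = os.environ.get("LOCALAPPDATA") or str(Path.home() / "AppData" / "Local")
        return Path(base) / "audiobook-reader" / "models"
    # Linux and other POSIX systems: the README lists ~/.cache/audiobook-reader/models/
    return Path.home() / ".cache" / "audiobook-reader" / "models"


def resolve_models_dir(package_root: Path) -> Path:
    """Prefer a local models/ folder in package_root, else fall back to the cache."""
    local = package_root / "models"
    if local.is_dir():
        return local
    return cache_models_dir()
```

Checking the local folder first means a copy saved with `reader download models --local` takes precedence and survives cache clears. The shared cache remains the default for everything else.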