# EPUB Pinyin
A Python tool that adds Pinyin annotations above Chinese characters in EPUB files. It gives Chinese language learners automatic pronunciation guides while preserving the EPUB's format and structure.
## Features
- Automatically detects Chinese characters in EPUB content
- Adds Pinyin annotations using `<ruby>` tags
- Preserves original EPUB structure and formatting
- Supports tone marks in Pinyin
- Handles complex EPUB structures
- Maintains original file organization
- Processes files in correct spine order
- **NEW**: Convert EPUB to PDF with Pinyin annotations
- **NEW**: Smart context-aware pronunciation for polyphonic characters
- **NEW**: Custom font support for Chinese text and Pinyin
- **NEW**: Professional PDF layout with proper Chinese typography
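"Spine order" refers to the reading order declared in the EPUB package document (the `.opf` file): the `<spine>` lists `itemref` elements whose `idref` values point at entries in the `<manifest>`. The following is a minimal standard-library sketch of resolving that order. The function name `spine_hrefs` and the sample OPF are illustrative assumptions, not code taken from this package.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by EPUB package documents.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

def spine_hrefs(opf_xml: str) -> List[str]:
    """Return manifest hrefs in the order the spine lists them."""
    root = ET.fromstring(opf_xml)
    # Map manifest item ids to their file paths.
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # Walk the spine and look up each idref in the manifest.
    return [
        manifest[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
        if ref.get("idref") in manifest
    ]

# Hypothetical package document: the manifest order differs from the spine order.
SAMPLE_OPF = """<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""

print(spine_hrefs(SAMPLE_OPF))
```

Reading the spine rather than sorting file names matters because manifest order and file names are not guaranteed to match reading order.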
## Installation
You can install the package directly from PyPI:
```bash
pip install epub-pinyin
```
Or install from source:
```bash
git clone https://github.com/tony-develop-2025/epub-pinyin.git
cd epub-pinyin
pip install -e .
```
## Usage
### EPUB Processing
#### Command Line
After installation, you can use the tool from the command line:
```bash
epub-pinyin input.epub -o output.epub
```
Or use the module directly:
```bash
python -m epub_pinyin.main input.epub -o output.epub
```
Where:
- `input.epub` is your source EPUB file containing Chinese text
- `-o output.epub` (optional) specifies the output file name (defaults to "input_annotated.epub")
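The default name above suggests the output is derived from the input file name. As a sketch of that derivation (an assumption: this README only states the default for `input.epub`, and the helper `default_output_path` is hypothetical), `pathlib` can build such a name:

```python
from pathlib import Path

def default_output_path(input_path: str, suffix: str = "_annotated") -> Path:
    """Build an output path next to the input, e.g. book.epub -> book_annotated.epub."""
    source = Path(input_path)
    # with_name keeps the parent directory and swaps only the file name.
    return source.with_name(f"{source.stem}{suffix}{source.suffix}")

print(default_output_path("input.epub"))  # input_annotated.epub
```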
#### Python API
You can also use the tool programmatically in your Python code:
```python
from epub_pinyin import process_epub
# Process an EPUB file
process_epub("input.epub", "output.epub")
```
### PDF Conversion
#### Command Line
Convert EPUB to PDF with Pinyin annotations:
```bash
python -m epub_pinyin.main input.epub --pdf output.pdf
```
Or use the dedicated PDF converter:
```bash
python -m epub_pinyin.pdf_converter input.epub output.pdf
```
#### Python API
```python
from epub_pinyin.pdf_converter import convert_epub_to_pdf
# Convert EPUB to PDF
convert_epub_to_pdf("input.epub", "output.pdf")
```
#### Advanced PDF Usage
For more control over PDF generation:
```python
from epub_pinyin.epub_parser import EpubParser
from epub_pinyin.pdf_converter import PdfConverter
import tempfile
# Extract EPUB content
with tempfile.TemporaryDirectory() as temp_dir:
    parser = EpubParser(temp_dir)
    parser.extract_epub("input.epub")
    # Convert to PDF with custom settings
    converter = PdfConverter(parser)
    converter.convert_to_pdf("output.pdf")
```
## Examples
### EPUB Output
When rendered in an EPUB reader, the Pinyin appears above each character:
```
 nǐ hǎo shì jiè
你好，世界！
```
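Inside the EPUB file itself, the annotation is ruby markup rather than two lines of text. The sketch below shows the general shape of such markup by wrapping each Han character in `<ruby>` with an `<rt>` reading. The `READINGS` table is a hand-written stand-in so the example runs without dependencies (the tool itself relies on pypinyin), and the exact markup this package emits is not documented here, so treat the output format as an assumption.

```python
import re
from html import escape

# Matches one character in the CJK Unified Ideographs block (U+4E00..U+9FFF).
HAN = re.compile(r"[\u4e00-\u9fff]")

# Stand-in readings for this example only.
READINGS = {"你": "nǐ", "好": "hǎo", "世": "shì", "界": "jiè"}

def annotate(text: str) -> str:
    """Wrap each Han character that has a known reading in <ruby>...<rt>...</rt></ruby>."""
    def wrap(match):
        char = match.group(0)
        reading = READINGS.get(char)
        if reading is None:
            return char  # unknown character: leave it unannotated
        return f"<ruby>{char}<rt>{escape(reading)}</rt></ruby>"
    return HAN.sub(wrap, text)

print(annotate("你好，世界！"))
```

Punctuation falls outside the matched range, so it passes through without an annotation.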

### PDF Output
The PDF conversion creates a professional document with:
- **Smart Pinyin**: Context-aware pronunciation for polyphonic characters
  - 长大 (zhǎng dà) vs 长度 (cháng dù)
  - 银行 (yín háng) vs 行走 (xíng zǒu)
  - 觉得 (jué de) vs 得到 (dé dào)
- **Professional Layout**:
  - Custom Chinese fonts (SimSun, FangSong, STSong)
  - Separate font styling for Pinyin annotations
  - Proper Chinese typography rules
  - No punctuation at line start (except quotes)
- **Typography Features**:
  - Automatic line breaking with proper Chinese rules
  - Ruby text positioning for Pinyin
  - Optimized spacing and margins
  - Chapter and page organization
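The "no punctuation at line start" rule can be illustrated with a simple greedy line breaker. This is a sketch of the general idea only, not this package's layout code: the function name `wrap_cjk` and the punctuation set are assumptions, and quotes are left out of the set in line with the exception noted above.

```python
from typing import List

# Punctuation that should not begin a line (a small illustrative subset).
NO_LINE_START = set("，。！？；：、）》")

def wrap_cjk(text: str, width: int) -> List[str]:
    """Greedy wrap at `width` characters, keeping listed punctuation off line starts."""
    lines = []
    i = 0
    while i < len(text):
        end = min(i + width, len(text))
        # Pull punctuation that would start the next line onto the
        # current line instead, letting this line run slightly long.
        while end < len(text) and text[end] in NO_LINE_START:
            end += 1
        lines.append(text[i:end])
        i = end
    return lines

for line in wrap_cjk("你好，世界！", 2):
    print(line)
```

Letting the line overflow by a character is one option; another is to move the last character of the line down with the punctuation. Full Chinese line breaking also restricts what may end a line, which this sketch ignores.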
## Requirements
### Core Requirements
- Python 3.7 or higher
- beautifulsoup4
- pypinyin
- lxml
### PDF Conversion Requirements
- reportlab (for PDF generation)
- Custom Chinese fonts (optional, will use system fonts as fallback)
### Installation with PDF Support
```bash
# Install with PDF support (quotes keep shells such as zsh from expanding the brackets)
pip install "epub-pinyin[pdf]"
# Or install all dependencies
pip install "epub-pinyin[dev]"
```
## Development
To set up the development environment:
1. Clone the repository:
```bash
git clone https://github.com/tony-develop-2025/epub-pinyin.git
cd epub-pinyin
```
2. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. Install development dependencies:
```bash
pip install -r requirements-dev.txt
```
4. Run tests:
```bash
pytest tests/
```
5. Test PDF conversion:
```bash
# Create a test EPUB file first, then convert to PDF
python -m epub_pinyin.main test.epub --pdf test.pdf
```
## Configuration
### Custom Fonts
You can add custom Chinese fonts to improve PDF output quality:
1. Place your font files in the `epub_pinyin/fonts/` directory
2. Supported formats: `.ttf`, `.ttc`
3. The system will automatically detect and use available fonts
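To check which files in a directory would qualify under the supported formats, a small standard-library scan is enough. This is a sketch under stated assumptions: `find_fonts` is a hypothetical helper, and how the package itself discovers and ranks fonts is not described here.

```python
from pathlib import Path
from typing import List

# Extensions listed as supported above.
FONT_SUFFIXES = {".ttf", ".ttc"}

def find_fonts(font_dir: str) -> List[Path]:
    """List .ttf/.ttc files in font_dir, sorted by path; empty if the directory is missing."""
    directory = Path(font_dir)
    if not directory.is_dir():
        return []
    return sorted(
        path for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in FONT_SUFFIXES
    )

for font in find_fonts("epub_pinyin/fonts"):
    print(font.name)
```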
### PDF Settings
The PDF converter includes several configurable options:
- **Font sizes**: Adjustable for title, chapter, body, and Pinyin text
- **Margins**: Customizable page margins
- **Line spacing**: Configurable line height
- **Character width**: Uniform character spacing for Chinese text
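With uniform character width, the number of characters per line follows directly from the page width, the margins, and the width of one character. The arithmetic is sketched below; the parameter names are hypothetical, since the converter's actual option names are not listed in this README.

```python
import math

def chars_per_line(page_width_pt: float, margin_pt: float, char_width_pt: float) -> int:
    """Number of fixed-width characters that fit between the left and right margins."""
    usable = page_width_pt - 2 * margin_pt
    return max(0, math.floor(usable / char_width_pt))

# A4 is 210 mm wide; 1 pt = 1/72 inch, so that is about 595.28 pt.
A4_WIDTH_PT = 210 / 25.4 * 72
print(chars_per_line(A4_WIDTH_PT, margin_pt=72, char_width_pt=12))  # 37
```

For A4 with 1-inch margins and 12 pt characters, the usable width is about 451.28 pt, which fits 37 characters.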
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
### Areas for Contribution
- **Font Support**: Add support for more Chinese font formats
- **Typography**: Improve Chinese typography rules
- **Performance**: Optimize PDF generation speed
- **Testing**: Add more comprehensive test cases
- **Documentation**: Improve user guides and examples
## Troubleshooting
### Common Issues
#### PDF Generation Fails
- **Error**: "No Chinese fonts found"
  - **Solution**: Install the package with PDF support: `pip install "epub-pinyin[pdf]"`
  - **Solution**: Add custom fonts to the `epub_pinyin/fonts/` directory
#### Pinyin Accuracy Issues
- **Issue**: Incorrect pronunciation for polyphonic characters
  - **Solution**: The system uses context-aware pronunciation. For better accuracy, ensure proper word boundaries in your text.
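Word boundaries matter because the reading of a polyphonic character depends on the word it belongs to. The sketch below illustrates this with a greedy longest-match lookup: phrase readings win, and a per-character default is the fallback. It is not this package's implementation, and the `PHRASES` and `SINGLE` tables are tiny hand-written stand-ins for a real dictionary.

```python
from typing import Dict, List

# Illustrative tables only; a real tool would use a full dictionary.
PHRASES: Dict[str, List[str]] = {
    "银行": ["yín", "háng"],
    "行走": ["xíng", "zǒu"],
}
SINGLE: Dict[str, str] = {"银": "yín", "行": "xíng", "走": "zǒu"}

def readings(text: str, max_len: int = 4) -> List[str]:
    """Greedy longest-match lookup: prefer phrase readings, fall back to per-character ones."""
    out: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate phrase first, down to two characters.
        for size in range(min(max_len, len(text) - i), 1, -1):
            phrase = text[i:i + size]
            if phrase in PHRASES:
                out.extend(PHRASES[phrase])
                i += size
                break
        else:
            # No phrase matched: use the single-character default, or the character itself.
            out.append(SINGLE.get(text[i], text[i]))
            i += 1
    return out

print(readings("银行"))
print(readings("银 行"))
```

In the second call the space splits the word, so 行 falls back to its default reading instead of the one used in 银行. Stray whitespace or markup inside a word can have the same effect.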
#### Font Display Issues
- **Issue**: Chinese characters not displaying correctly in PDF
  - **Solution**: Check that your font files are valid and supported
  - **Solution**: Try different font formats (.ttf, .ttc)
### Performance Tips
- For large EPUB files, PDF conversion may take some time
- Consider processing chapters separately for very large documents
- Ensure sufficient disk space for temporary files during conversion
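An EPUB is a ZIP archive, so the space needed to extract it can be estimated before conversion starts. The following sketch compares that estimate with the free space in a working directory. Both helper names and the safety factor of 2 are assumptions for illustration; the package's real temporary-file usage may differ.

```python
import shutil
import zipfile

def uncompressed_size(epub_path: str) -> int:
    """Total size in bytes of all entries once the EPUB (a ZIP archive) is extracted."""
    with zipfile.ZipFile(epub_path) as archive:
        return sum(info.file_size for info in archive.infolist())

def has_room_for(epub_path: str, work_dir: str, safety_factor: float = 2.0) -> bool:
    """True if work_dir has at least safety_factor times the extracted size free."""
    free = shutil.disk_usage(work_dir).free
    return free >= uncompressed_size(epub_path) * safety_factor
```

A call such as `has_room_for("input.epub", tempfile.gettempdir())` before starting a long conversion gives an early warning instead of a failure partway through.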
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.