epub-pinyin


- **Name**: epub-pinyin
- **Version**: 0.2.1
- **Summary**: Add Pinyin annotations to Chinese text in EPUB files with smart polyphonic character handling
- **Upload time**: 2025-08-31 11:18:43
- **Requires Python**: >=3.7
- **Keywords**: epub, pinyin, chinese, annotation, pdf, text-processing
- **Requirements**: beautifulsoup4 (>=4.12.0), pypinyin (>=0.49.0), lxml (>=4.9.0)
- **Repository**: https://github.com/tony-develop-2025/epub-pinyin

# EPUB Pinyin

A Python tool to add Pinyin annotations above Chinese characters in EPUB files. This tool helps Chinese language learners by automatically adding pronunciation guides to Chinese text while maintaining the EPUB format and structure.

## Features

- Automatically detects Chinese characters in EPUB content
- Adds Pinyin annotations using `<ruby>` tags
- Preserves original EPUB structure and formatting
- Supports tone marks in Pinyin
- Handles complex EPUB structures
- Maintains original file organization
- Processes files in correct spine order
- **NEW**: Convert EPUB to PDF with Pinyin annotations
- **NEW**: Smart context-aware pronunciation for polyphonic characters
- **NEW**: Custom font support for Chinese text and Pinyin
- **NEW**: Professional PDF layout with proper Chinese typography

## Installation

You can install the package directly from PyPI:

```bash
pip install epub-pinyin
```

Or install from source:

```bash
git clone https://github.com/tony-develop-2025/epub-pinyin.git
cd epub-pinyin
pip install -e .
```

## Usage

### EPUB Processing

#### Command Line

After installation, you can use the tool from the command line:

```bash
epub-pinyin input.epub -o output.epub
```

Or use the module directly:

```bash
python -m epub_pinyin.main input.epub -o output.epub
```

Where:
- `input.epub` is your source EPUB file containing Chinese text
- `-o output.epub` (optional) specifies the output file name (defaults to "input_annotated.epub")
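
The default name follows the pattern "input name plus `_annotated`". A minimal sketch of deriving such a name is shown here; the helper `default_output_path` is hypothetical, and whether the tool writes the file next to the input or into the working directory is an assumption of this sketch.

```python
from pathlib import Path


def default_output_path(input_path: str) -> Path:
    """Insert "_annotated" before the file extension, keeping the directory."""
    src = Path(input_path)
    return src.with_name(f"{src.stem}_annotated{src.suffix}")


print(default_output_path("input.epub"))
```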

#### Python API

You can also use the tool programmatically in your Python code:

```python
from epub_pinyin import process_epub

# Process an EPUB file
process_epub("input.epub", "output.epub")
```
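
For several books at once, the call above can be wrapped in a small loop. The helper below is a sketch rather than part of the package: `annotate_folder` and its `suffix` parameter are hypothetical, and the processing function is passed in so the loop can be tried without the package installed.

```python
from pathlib import Path


def annotate_folder(folder, process, suffix="_annotated"):
    """Run process(src, dst) on every .epub in folder and return the outputs."""
    outputs = []
    for src in sorted(Path(folder).glob("*.epub")):
        if src.stem.endswith(suffix):
            continue  # skip files produced by an earlier run
        dst = src.with_name(f"{src.stem}{suffix}.epub")
        process(str(src), str(dst))
        outputs.append(dst)
    return outputs


# Usage, assuming the package is installed:
#   from epub_pinyin import process_epub
#   annotate_folder("books", process_epub)
```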

### PDF Conversion

#### Command Line

Convert EPUB to PDF with Pinyin annotations:

```bash
python -m epub_pinyin.main input.epub --pdf output.pdf
```

Or use the dedicated PDF converter:

```bash
python -m epub_pinyin.pdf_converter input.epub output.pdf
```

#### Python API

```python
from epub_pinyin.pdf_converter import convert_epub_to_pdf

# Convert EPUB to PDF
convert_epub_to_pdf("input.epub", "output.pdf")
```

#### Advanced PDF Usage

For more control over PDF generation:

```python
from epub_pinyin.epub_parser import EpubParser
from epub_pinyin.pdf_converter import PdfConverter
import tempfile

# Extract EPUB content
with tempfile.TemporaryDirectory() as temp_dir:
    parser = EpubParser(temp_dir)
    parser.extract_epub("input.epub")
    
    # Convert the extracted content to PDF
    converter = PdfConverter(parser)
    converter.convert_to_pdf("output.pdf")
```

## Examples

### EPUB Output

When rendered in an EPUB reader, the Pinyin appears above each character:
```
 nǐ  hǎo    shì  jiè
你好,世界!
```
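
In the EPUB markup itself, this layout comes from `<ruby>` elements, where each character is followed by an `<rt>` element holding its reading. The sketch below shows the shape of that markup. It is an illustration only: the readings are hard-coded in a demo dictionary instead of coming from a pinyin library, the character test covers only the basic CJK block, and `annotate` is a hypothetical name, not the package's function.

```python
import re
from html import escape

# Hand-written readings for this demo only; a real run would obtain them
# from a pinyin library.
DEMO_READINGS = {"你": "nǐ", "好": "hǎo", "世": "shì", "界": "jiè"}

# Basic CJK Unified Ideographs block; extension blocks are ignored here.
HAN = re.compile(r"[\u4e00-\u9fff]")


def annotate(text: str, readings: dict) -> str:
    """Wrap each Chinese character that has a known reading in <ruby> markup."""
    def wrap(match):
        char = match.group(0)
        reading = readings.get(char)
        if reading is None:
            return char  # leave characters without a reading untouched
        return f"<ruby>{char}<rt>{escape(reading)}</rt></ruby>"

    return HAN.sub(wrap, text)


print(annotate("你好，世界！", DEMO_READINGS))
```

Punctuation and non-Chinese text pass through unchanged, so only the characters that need a pronunciation guide gain markup.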

### PDF Output

The PDF conversion creates a professional document with:

- **Smart Pinyin**: Context-aware pronunciation for polyphonic characters
  - 长大 (zhǎng dà) vs 长度 (cháng dù)
  - 银行 (yín háng) vs 行走 (xíng zǒu)
  - 觉得 (jué de) vs 得到 (dé dào)

- **Professional Layout**:
  - Custom Chinese fonts (SimSun, FangSong, STSong)
  - Separate font styling for Pinyin annotations
  - Proper Chinese typography rules
  - No punctuation at line start (except quotes)

- **Typography Features**:
  - Automatic line breaking with proper Chinese rules
  - Ruby text positioning for Pinyin
  - Optimized spacing and margins
  - Chapter and page organization
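
The "no punctuation at line start" rule can be made concrete with a small line breaker. This sketch is one possible strategy (letting the punctuation hang at the end of the previous line) and is not the package's layout algorithm; `wrap_chinese` and the `NO_LINE_START` set are hypothetical, and the set is a small subset of the punctuation a real typesetter would handle. Quotes are deliberately left out of the set, matching the exception noted above.

```python
from typing import List

# Punctuation that should not begin a line (illustrative subset, no quotes).
NO_LINE_START = set("，。！？；：、）》」』")


def wrap_chinese(text: str, width: int) -> List[str]:
    """Break text into lines of `width` characters, letting prohibited
    punctuation hang on the previous line instead of starting a new one."""
    if width < 1:
        raise ValueError("width must be at least 1")
    lines = []
    i = 0
    while i < len(text):
        end = min(i + width, len(text))
        # Pull any prohibited characters onto the current line.
        while end < len(text) and text[end] in NO_LINE_START:
            end += 1
        lines.append(text[i:end])
        i = end
    return lines


for line in wrap_chinese("你好世界，再见世界。", 4):
    print(line)
```

With a width of four, the comma and the full stop would each fall at the start of a line; the sketch attaches them to the preceding line instead, so some lines are one character longer than `width`.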

## Requirements

### Core Requirements
- Python 3.7 or higher
- beautifulsoup4
- pypinyin
- lxml

### PDF Conversion Requirements
- reportlab (for PDF generation)
- Custom Chinese fonts (optional, will use system fonts as fallback)

### Installation with PDF Support

```bash
# Install with PDF support (the quotes stop shells such as zsh from expanding the brackets)
pip install "epub-pinyin[pdf]"

# Or install all dependencies
pip install "epub-pinyin[dev]"
```

## Development

To set up the development environment:

1. Clone the repository:
```bash
git clone https://github.com/tony-develop-2025/epub-pinyin.git
cd epub-pinyin
```

2. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

3. Install development dependencies:
```bash
pip install -r requirements-dev.txt
```

4. Run tests:
```bash
pytest tests/
```

5. Test PDF conversion:
```bash
# Create a test EPUB file first, then convert to PDF
python -m epub_pinyin.main test.epub --pdf test.pdf
```

## Configuration

### Custom Fonts

You can add custom Chinese fonts to improve PDF output quality:

1. Place your font files in the `epub_pinyin/fonts/` directory
2. Supported formats: `.ttf`, `.ttc`
3. The system will automatically detect and use available fonts
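
To check which files in a folder would qualify under the supported formats, a scan like the following can help. It is a sketch under assumptions: `find_fonts` is a hypothetical helper, and the package's own detection logic may differ, for example in how it treats upper-case extensions or subdirectories.

```python
from pathlib import Path
from typing import List

# The two formats listed as supported.
FONT_SUFFIXES = {".ttf", ".ttc"}


def find_fonts(font_dir: str) -> List[Path]:
    """List .ttf/.ttc files directly inside font_dir, sorted by path.

    Returns an empty list when the directory does not exist.
    """
    root = Path(font_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in FONT_SUFFIXES
    )


for font in find_fonts("epub_pinyin/fonts"):
    print(font.name)
```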

### PDF Settings

The PDF converter includes several configurable options:

- **Font sizes**: Adjustable for title, chapter, body, and Pinyin text
- **Margins**: Customizable page margins
- **Line spacing**: Configurable line height
- **Character width**: Uniform character spacing for Chinese text

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

### Areas for Contribution

- **Font Support**: Add support for more Chinese font formats
- **Typography**: Improve Chinese typography rules
- **Performance**: Optimize PDF generation speed
- **Testing**: Add more comprehensive test cases
- **Documentation**: Improve user guides and examples

## Troubleshooting

### Common Issues

#### PDF Generation Fails
- **Error**: "No Chinese fonts found"
  - **Solution**: Install the PDF extras, which include ReportLab: `pip install "epub-pinyin[pdf]"`
  - **Solution**: Add custom fonts to `epub_pinyin/fonts/` directory
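
Before digging into fonts, it can be worth confirming that the PDF dependency is importable in the active environment. The check below uses only the standard library; `missing_modules` is a hypothetical helper, and `reportlab` is the module named under the PDF conversion requirements.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_modules(["reportlab"])
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All PDF dependencies are importable.")
```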

#### Pinyin Accuracy Issues
- **Issue**: Incorrect pronunciation for polyphonic characters
  - **Solution**: The system uses context-aware pronunciation. For better accuracy, ensure proper word boundaries in your text.
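
Why word boundaries matter can be seen in a toy model of phrase-based lookup, a common way to disambiguate polyphonic characters. The sketch below is not this package's implementation: the dictionaries are tiny hand-written samples and `read_aloud` is a hypothetical name. It tries the longest known phrase first and falls back to a per-character default.

```python
# Hand-written mini-dictionaries for this demo only.
PHRASES = {
    "银行": ["yín", "háng"],
    "行走": ["xíng", "zǒu"],
    "长度": ["cháng", "dù"],
}
SINGLE = {"行": "xíng", "长": "cháng", "银": "yín", "走": "zǒu", "度": "dù"}


def read_aloud(text, phrases=PHRASES, single=SINGLE):
    """Greedy longest-match lookup: phrase readings win over character defaults."""
    longest = max((len(p) for p in phrases), default=1)
    out, i = [], 0
    while i < len(text):
        # Try the longest candidate phrase first, down to two characters.
        for size in range(min(longest, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in phrases:
                out.extend(phrases[chunk])
                i += size
                break
        else:
            # No phrase matched: fall back to the single-character default.
            out.append(single.get(text[i], text[i]))
            i += 1
    return out


print(read_aloud("银行"))
print(read_aloud("行走"))
```

In this model 行 is read "háng" only when it directly follows 银. If markup or stray whitespace splits the two characters into separate text runs, the phrase no longer matches and the default reading is used, which is one way a wrong reading can arise.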

#### Font Display Issues
- **Issue**: Chinese characters not displaying correctly in PDF
  - **Solution**: Check that your font files are valid and supported
  - **Solution**: Try different font formats (.ttf, .ttc)

### Performance Tips

- For large EPUB files, PDF conversion may take some time
- Consider processing chapters separately for very large documents
- Ensure sufficient disk space for temporary files during conversion
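
A quick pre-flight check for the disk-space tip might look like the sketch below. The helper `has_room_for` is hypothetical, and the multiplier of 3 is an arbitrary safety margin chosen for the illustration, not a figure measured for this tool.

```python
import os
import shutil
import tempfile


def has_room_for(epub_path, factor=3, tmp_dir=None):
    """Return True if the temp location has at least `factor` times the
    EPUB's size in free space."""
    needed = os.path.getsize(epub_path) * factor
    free = shutil.disk_usage(tmp_dir or tempfile.gettempdir()).free
    return free >= needed


# Usage: check before starting a long conversion.
#   if not has_room_for("input.epub"):
#       print("Not enough free space for temporary files")
```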

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.




            

        