# EPUB TOC
[![Python Version](https://img.shields.io/badge/python-3.7%2B-blue)](https://www.python.org)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
A Python tool for extracting the table of contents from EPUB files while preserving its hierarchical structure.
## Features
- Support for multiple extraction methods (NCX, epub_meta, OPF)
- Automatic selection of the best method
- Hierarchical TOC structure preservation
- Russian and English language support
- JSON output format
- Detailed logging
- EPUB file analysis reports
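
To give a sense of what NCX-based extraction involves, here is a minimal sketch that turns an NCX `navMap` into nested entries shaped like the ones in the Output Format section. It is not the library's implementation: the names `parse_ncx`, `parse_nav_points`, and `SAMPLE_NCX` are hypothetical and only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the NCX specification.
NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

# Hypothetical sample document used only for illustration.
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="ch1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="chapter1.html"/>
      <navPoint id="s1" playOrder="2">
        <navLabel><text>Section 1.1</text></navLabel>
        <content src="chapter1.html#section1"/>
      </navPoint>
    </navPoint>
  </navMap>
</ncx>"""


def parse_nav_points(parent, level=0):
    """Convert the direct <navPoint> children of `parent` into nested entries."""
    entries = []
    for nav_point in parent.findall("ncx:navPoint", NCX_NS):
        label = nav_point.find("ncx:navLabel/ncx:text", NCX_NS)
        content = nav_point.find("ncx:content", NCX_NS)
        entries.append({
            "title": (label.text or "").strip() if label is not None else "",
            "href": content.get("src", "") if content is not None else "",
            "level": level,
            # Nested navPoints become children one level deeper.
            "children": parse_nav_points(nav_point, level + 1),
        })
    return entries


def parse_ncx(ncx_xml):
    """Parse NCX markup and return a list of top-level TOC entries."""
    root = ET.fromstring(ncx_xml)
    nav_map = root.find("ncx:navMap", NCX_NS)
    return parse_nav_points(nav_map) if nav_map is not None else []
```

Calling `parse_ncx(SAMPLE_NCX)` yields one top-level entry with a single child at level 1. A real EPUB would first require locating the NCX file inside the archive, which this sketch leaves out.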
## Installation
```bash
pip install epub_toc
```
## Usage
### As a module
```python
from epub_toc import EPUBTOCParser
# Create parser
parser = EPUBTOCParser('path/to/book.epub')
# Extract TOC
toc = parser.extract_toc()
# Print to console
parser.print_toc()
# Save to JSON
parser.save_toc_to_json('output.json')
```
### From command line
```bash
epub-toc path/to/book.epub
```
### EPUB File Analysis
To analyze all EPUB files in the `tests/data/epub_samples` directory:
```bash
python tests/integration/test_epub_analysis.py
```
Analysis results are saved in the `reports/` directory:
- `epub_analysis_YYYYMMDD_HHMMSS.json` - detailed report in JSON format
- `epub_analysis_YYYYMMDD_HHMMSS.txt` - brief report in text format
- `toc/*.json` - extracted TOCs for each EPUB file
Report structure:
1. JSON report contains:
   - Overall statistics for all files
   - Extraction methods success rate
   - Detailed results for each file
   - Links to extracted TOC files
2. Text report includes:
   - Brief statistics
   - Information about each file
   - Paths to extracted TOCs
3. TOC files:
   - Saved in `toc/` subdirectory
   - Named as `book_name_toc.json`
   - Contain complete TOC in JSON format
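
The extracted TOC files can be post-processed with a few lines of Python. The sketch below counts the TOC entries in every `*_toc.json` file of a directory. It assumes each file follows the structure shown in the Output Format section (a top-level `toc` list with nested `children`); the helper names `count_entries` and `summarize_toc_dir` are hypothetical and not part of the package.

```python
import json
from pathlib import Path


def count_entries(entries):
    """Count TOC entries recursively, including all nested children."""
    return sum(1 + count_entries(entry.get("children", [])) for entry in entries)


def summarize_toc_dir(toc_dir):
    """Map each *_toc.json file name in `toc_dir` to its total entry count."""
    summary = {}
    for path in sorted(Path(toc_dir).glob("*_toc.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Assumes the documented layout: {"metadata": {...}, "toc": [...]}
        summary[path.name] = count_entries(data.get("toc", []))
    return summary
```

For example, `summarize_toc_dir("reports/toc")` would return a dictionary such as `{"book_name_toc.json": 12}` if that directory exists and contains one file with twelve entries.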
## Output Format
TOC is saved in JSON format with the following structure:
```json
{
    "metadata": {
        "title": "Book Title",
        "authors": ["Author 1", "Author 2"],
        "publisher": "Publisher Name",
        "publication_date": "2024-01-01",
        "language": "en",
        "description": "Book description",
        "cover_image_path": "path/to/cover.jpg",
        "isbn": "978-3-16-148410-0",
        "rights": "Copyright information",
        "series": "Series Name",
        "series_index": 1,
        "identifiers": {
            "isbn13": "978-3-16-148410-0",
            "uuid": "550e8400-e29b-41d4-a716-446655440000"
        },
        "subjects": ["Fiction", "Adventure"],
        "file_size": 1234567,
        "file_name": "book.epub"
    },
    "toc": [
        {
            "title": "Chapter 1",
            "href": "chapter1.html",
            "level": 0,
            "children": [
                {
                    "title": "Section 1.1",
                    "href": "chapter1.html#section1",
                    "level": 1,
                    "children": []
                }
            ]
        }
    ]
}
```
All metadata fields are optional and will be omitted if not available in the EPUB file.
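
Because the `toc` list is recursive, consumers usually walk it depth-first. The following sketch shows one way to do that, assuming only the JSON structure documented above; `iter_toc`, `format_toc`, and `SAMPLE_TOC` are hypothetical names introduced for illustration, not part of the package API.

```python
# Hypothetical sample mirroring the "toc" part of the documented output.
SAMPLE_TOC = [
    {
        "title": "Chapter 1",
        "href": "chapter1.html",
        "level": 0,
        "children": [
            {
                "title": "Section 1.1",
                "href": "chapter1.html#section1",
                "level": 1,
                "children": [],
            }
        ],
    }
]


def iter_toc(entries):
    """Yield (level, title, href) for every entry, depth-first."""
    for entry in entries:
        yield entry.get("level", 0), entry.get("title", ""), entry.get("href", "")
        yield from iter_toc(entry.get("children", []))


def format_toc(entries, indent="  "):
    """Render the TOC as text, indenting each entry by its level."""
    return "\n".join(
        f"{indent * level}{title} -> {href}"
        for level, title, href in iter_toc(entries)
    )


print(format_toc(SAMPLE_TOC))
```

To apply this to a saved file, load it with `json.load` and pass the value of its `toc` key to `format_toc`.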
## Testing
The module has been successfully tested on various EPUB files:
- Russian books (NCX method)
- English books (epub_meta method)
- Files with different TOC structures
- Files of different sizes (from 400 KB to 8 MB)
## Requirements
- Python 3.7+
- epub_meta>=0.0.7
- lxml>=4.9.3
- beautifulsoup4>=4.12.2
## Contributing
We welcome contributions! If you'd like to help:
1. Fork the repository
2. Create a branch for your changes
3. Make changes and add tests
4. Ensure all tests pass
5. Create a Pull Request
See [CONTRIBUTING.md](CONTRIBUTING.md) for details.
## Security
If you discover a security vulnerability, please DO NOT create a public issue.
Instead, send a report following the instructions in [SECURITY.md](SECURITY.md).
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
## Roadmap
- [ ] Additional EPUB format support
- [ ] Improved complex hierarchical structure handling
- [ ] Integration with popular e-readers
- [ ] Web service API
- [ ] Additional language support
Raw data
{
"_id": null,
"home_page": "https://github.com/almazilaletdinov/epub_toc",
"name": "epub-toc",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "epub, ebook, toc, table of contents, parser, ebook-tools",
"author": "Almaz Ilaletdinov",
"author_email": "a.ilaletdinov@yandex.ru",
"download_url": "https://files.pythonhosted.org/packages/2e/f8/bfcc5667925c9d39d0320757bece1b5a10402c7a8ed0db41a26ca1e9b23a/epub_toc-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A Python tool for extracting table of contents from EPUB files with hierarchical structure support",
"version": "1.0.0",
"project_urls": {
"Documentation": "https://github.com/almazilaletdinov/epub_toc/docs",
"Homepage": "https://github.com/almazilaletdinov/epub_toc",
"Source": "https://github.com/almazilaletdinov/epub_toc",
"Tracker": "https://github.com/almazilaletdinov/epub_toc/issues"
},
"split_keywords": [
"epub",
" ebook",
" toc",
" table of contents",
" parser",
" ebook-tools"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2ef8bfcc5667925c9d39d0320757bece1b5a10402c7a8ed0db41a26ca1e9b23a",
"md5": "d6c9bc5e7364326198d633866c92d196",
"sha256": "a39a3109b8c4f6120e0c11ec794cc7f72018f0c957a6d09ed77daaf0b3cf0f17"
},
"downloads": -1,
"filename": "epub_toc-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "d6c9bc5e7364326198d633866c92d196",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 14794,
"upload_time": "2024-12-26T12:52:53",
"upload_time_iso_8601": "2024-12-26T12:52:53.212613Z",
"url": "https://files.pythonhosted.org/packages/2e/f8/bfcc5667925c9d39d0320757bece1b5a10402c7a8ed0db41a26ca1e9b23a/epub_toc-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-26 12:52:53",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "almazilaletdinov",
"github_project": "epub_toc",
"github_not_found": true,
"lcname": "epub-toc"
}