epub-toc


| Field | Value |
| --- | --- |
| Name | epub-toc |
| Version | 1.0.0 |
| home_page | https://github.com/almazilaletdinov/epub_toc |
| Summary | A Python tool for extracting table of contents from EPUB files with hierarchical structure support |
| upload_time | 2024-12-26 12:52:53 |
| maintainer | None |
| docs_url | None |
| author | Almaz Ilaletdinov |
| requires_python | >=3.7 |
| license | None |
| keywords | epub, ebook, toc, table of contents, parser, ebook-tools |
| requirements | No requirements were recorded. |

# EPUB TOC

[![Python Version](https://img.shields.io/badge/python-3.7%2B-blue)](https://www.python.org)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

A Python tool for extracting table of contents from EPUB files with hierarchical structure support.

## Features
- Support for multiple extraction methods (NCX, epub_meta, OPF)
- Automatic selection of the best extraction method
- Hierarchical TOC structure preservation
- Russian and English language support
- JSON output format
- Detailed logging
- EPUB file analysis reports
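
To illustrate what NCX-based extraction with hierarchy preservation involves, here is a minimal sketch that uses only the standard library. It is not taken from the epub_toc source: the function names `parse_nav_point` and `parse_ncx` and the sample document are assumptions made for illustration. The dictionary keys mirror the documented output format (`title`, `href`, `level`, `children`).

```python
import xml.etree.ElementTree as ET

# Namespace used by NCX navigation documents (EPUB 2).
NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def parse_nav_point(nav_point, level=0):
    """Convert one <navPoint> element into a nested dict (hypothetical helper)."""
    text = nav_point.find("ncx:navLabel/ncx:text", NCX_NS)
    content = nav_point.find("ncx:content", NCX_NS)
    return {
        "title": (text.text or "").strip() if text is not None else "",
        "href": content.get("src", "") if content is not None else "",
        "level": level,
        # Only direct <navPoint> children belong to this entry.
        "children": [
            parse_nav_point(child, level + 1)
            for child in nav_point.findall("ncx:navPoint", NCX_NS)
        ],
    }


def parse_ncx(ncx_xml):
    """Return the top-level TOC entries of an NCX document given as a string."""
    root = ET.fromstring(ncx_xml)
    nav_map = root.find("ncx:navMap", NCX_NS)
    if nav_map is None:
        return []
    return [parse_nav_point(p) for p in nav_map.findall("ncx:navPoint", NCX_NS)]


# Made-up sample document, for illustration only.
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="ch1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="chapter1.html"/>
      <navPoint id="s1" playOrder="2">
        <navLabel><text>Section 1.1</text></navLabel>
        <content src="chapter1.html#section1"/>
      </navPoint>
    </navPoint>
  </navMap>
</ncx>"""

toc = parse_ncx(SAMPLE_NCX)
print(toc[0]["title"], "->", toc[0]["children"][0]["title"])
```

The recursion follows the nesting of `navPoint` elements, so the depth of each entry comes directly from the document structure instead of being guessed from titles or file names.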

## Installation

```bash
pip install epub_toc
```

## Usage

### As a module

```python
from epub_toc import EPUBTOCParser

# Create parser
parser = EPUBTOCParser('path/to/book.epub')

# Extract TOC
toc = parser.extract_toc()

# Print to console
parser.print_toc()

# Save to JSON
parser.save_toc_to_json('output.json')
```

### From command line

```bash
epub-toc path/to/book.epub
```

### EPUB File Analysis

To analyze all EPUB files in the `tests/data/epub_samples` directory:

```bash
python tests/integration/test_epub_analysis.py
```

Analysis results are saved in the `reports/` directory:
- `epub_analysis_YYYYMMDD_HHMMSS.json` - detailed report in JSON format
- `epub_analysis_YYYYMMDD_HHMMSS.txt` - brief report in text format
- `toc/*.json` - extracted TOCs for each EPUB file

Report structure:
1. The JSON report contains:
   - Overall statistics for all files
   - Success rate of each extraction method
   - Detailed results for each file
   - Links to extracted TOC files

2. The text report includes:
   - Brief statistics
   - Information about each file
   - Paths to extracted TOCs

3. TOC files:
   - Are saved in the `toc/` subdirectory
   - Are named `book_name_toc.json`
   - Contain the complete TOC in JSON format
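
The extracted TOC files can be post-processed with a few lines of Python. The sketch below counts the entries in every `*_toc.json` file under `reports/toc/`. It assumes that each file follows the structure described in the Output Format section (a top-level `toc` list with nested `children`). The helper names `count_entries` and `summarize_toc_dir` are hypothetical and not part of epub_toc.

```python
import json
from pathlib import Path


def count_entries(entries):
    """Count TOC entries, including all nested children."""
    return sum(1 + count_entries(e.get("children", [])) for e in entries)


def summarize_toc_dir(reports_dir="reports"):
    """Map each `*_toc.json` file under `<reports_dir>/toc/` to its entry count."""
    summary = {}
    for path in sorted(Path(reports_dir, "toc").glob("*_toc.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Assumption: the TOC list lives under the top-level "toc" key.
        summary[path.name] = count_entries(data.get("toc", []))
    return summary


if __name__ == "__main__":
    for name, total in summarize_toc_dir().items():
        print(f"{name}: {total} entries")
```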

## Output Format

The TOC is saved as JSON with the following structure:

```json
{
  "metadata": {
    "title": "Book Title",
    "authors": ["Author 1", "Author 2"],
    "publisher": "Publisher Name",
    "publication_date": "2024-01-01",
    "language": "en",
    "description": "Book description",
    "cover_image_path": "path/to/cover.jpg",
    "isbn": "978-3-16-148410-0",
    "rights": "Copyright information",
    "series": "Series Name",
    "series_index": 1,
    "identifiers": {
      "isbn13": "978-3-16-148410-0",
      "uuid": "550e8400-e29b-41d4-a716-446655440000"
    },
    "subjects": ["Fiction", "Adventure"],
    "file_size": 1234567,
    "file_name": "book.epub"
  },
  "toc": [
    {
      "title": "Chapter 1",
      "href": "chapter1.html",
      "level": 0,
      "children": [
        {
          "title": "Section 1.1",
          "href": "chapter1.html#section1",
          "level": 1,
          "children": []
        }
      ]
    }
  ]
}
```

All metadata fields are optional; a field is omitted when the EPUB file does not provide it.
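
Because the output is plain JSON, it can be consumed without epub_toc installed. The following sketch walks the nested `toc` list depth-first and renders an indented outline. The helper names `iter_toc`, `format_outline`, and `outline_from_file` are assumptions for illustration. Every key is read with a default, since metadata fields may be missing.

```python
import json


def iter_toc(entries):
    """Yield (level, title, href) for every entry, depth-first."""
    for entry in entries:
        yield entry.get("level", 0), entry.get("title", ""), entry.get("href", "")
        yield from iter_toc(entry.get("children", []))


def format_outline(document):
    """Render the book title (if present) followed by an indented TOC."""
    lines = []
    book_title = document.get("metadata", {}).get("title")
    if book_title:
        lines.append(book_title)
    for level, title, href in iter_toc(document.get("toc", [])):
        lines.append(f"{'  ' * level}- {title} ({href})")
    return "\n".join(lines)


def outline_from_file(path):
    """Load a saved TOC file (e.g. from save_toc_to_json) and format it."""
    with open(path, encoding="utf-8") as fh:
        return format_outline(json.load(fh))


# Trimmed copy of the example structure shown above.
SAMPLE = {
    "metadata": {"title": "Book Title"},
    "toc": [
        {
            "title": "Chapter 1",
            "href": "chapter1.html",
            "level": 0,
            "children": [
                {
                    "title": "Section 1.1",
                    "href": "chapter1.html#section1",
                    "level": 1,
                    "children": [],
                }
            ],
        }
    ],
}

print(format_outline(SAMPLE))
# Book Title
# - Chapter 1 (chapter1.html)
#   - Section 1.1 (chapter1.html#section1)
```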

## Testing

The module has been successfully tested on various EPUB files:
- Russian books (NCX method)
- English books (epub_meta method)
- Files with different TOC structures
- Files of different sizes (from 400 KB to 8 MB)

## Requirements
- Python 3.7+
- epub_meta>=0.0.7
- lxml>=4.9.3
- beautifulsoup4>=4.12.2 

## Contributing

We welcome contributions! If you'd like to help:

1. Fork the repository
2. Create a branch for your changes
3. Make changes and add tests
4. Ensure all tests pass
5. Create a Pull Request

See [CONTRIBUTING.md](CONTRIBUTING.md) for details.

## Security

If you discover a security vulnerability, please DO NOT create a public issue.
Instead, send a report following the instructions in [SECURITY.md](SECURITY.md).

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Roadmap

- [ ] Additional EPUB format support
- [ ] Improved complex hierarchical structure handling
- [ ] Integration with popular e-readers
- [ ] Web service API
- [ ] Additional language support 

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/almazilaletdinov/epub_toc",
    "name": "epub-toc",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "epub, ebook, toc, table of contents, parser, ebook-tools",
    "author": "Almaz Ilaletdinov",
    "author_email": "a.ilaletdinov@yandex.ru",
    "download_url": "https://files.pythonhosted.org/packages/2e/f8/bfcc5667925c9d39d0320757bece1b5a10402c7a8ed0db41a26ca1e9b23a/epub_toc-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A Python tool for extracting table of contents from EPUB files with hierarchical structure support",
    "version": "1.0.0",
    "project_urls": {
        "Documentation": "https://github.com/almazilaletdinov/epub_toc/docs",
        "Homepage": "https://github.com/almazilaletdinov/epub_toc",
        "Source": "https://github.com/almazilaletdinov/epub_toc",
        "Tracker": "https://github.com/almazilaletdinov/epub_toc/issues"
    },
    "split_keywords": [
        "epub",
        " ebook",
        " toc",
        " table of contents",
        " parser",
        " ebook-tools"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2ef8bfcc5667925c9d39d0320757bece1b5a10402c7a8ed0db41a26ca1e9b23a",
                "md5": "d6c9bc5e7364326198d633866c92d196",
                "sha256": "a39a3109b8c4f6120e0c11ec794cc7f72018f0c957a6d09ed77daaf0b3cf0f17"
            },
            "downloads": -1,
            "filename": "epub_toc-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "d6c9bc5e7364326198d633866c92d196",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 14794,
            "upload_time": "2024-12-26T12:52:53",
            "upload_time_iso_8601": "2024-12-26T12:52:53.212613Z",
            "url": "https://files.pythonhosted.org/packages/2e/f8/bfcc5667925c9d39d0320757bece1b5a10402c7a8ed0db41a26ca1e9b23a/epub_toc-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-26 12:52:53",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "almazilaletdinov",
    "github_project": "epub_toc",
    "github_not_found": true,
    "lcname": "epub-toc"
}
        