axa-fr-splitter


| Field | Value |
| --- | --- |
| Name | axa-fr-splitter |
| Version | 1.0.0 |
| home_page | None |
| Summary | This package splits PDF and TIFF files into separate PNGs and extracts text from input files. |
| upload_time | 2024-10-09 08:43:50 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.10 |
| license | MIT License |
| keywords | doc-proc, pdf, tif, tiff, word |
| requirements | No requirements were recorded. |

# @axa-fr/axa-fr-splitter
![PyPI](https://img.shields.io/pypi/v/axa-fr-splitter)
![PyPI - License](https://img.shields.io/pypi/l/axa-fr-splitter)
![PyPI - Wheel](https://img.shields.io/pypi/wheel/axa-fr-splitter)

![Tests](https://github.com/AxaFrance/axa-fr-splitter/actions/workflows/tests.yml/badge.svg)
![python: 3.10 (shields.io)](https://img.shields.io/badge/python-3.10-green)
![python: 3.11 (shields.io)](https://img.shields.io/badge/python-3.11-green)
![python: 3.12 (shields.io)](https://img.shields.io/badge/python-3.12-green)

[//]: # ([![Continuous Integration](https://github.com/AxaFrance/axa-fr-splitter/actions/workflows/python-publish.yml/badge.svg)](https://github.com/AxaFrance/axa-fr-splitter/actions/workflows/python-publish.yml))

[//]: # ([![Quality Gate]&#40;https://sonarcloud.io/api/project_badges/measure?project=<INSERT SONAR SPLITTER PROJECT>&metric=alert_status&#41;]&#40;https://sonarcloud.io/dashboard?id=<INSERT SONAR SPLITTER PROJECT>&#41;)

[//]: # ([![Reliability]&#40;https://sonarcloud.io/api/project_badges/measure?project=<INSERT SONAR SPLITTER PROJECT>&metric=reliability_rating&#41;]&#40;https://sonarcloud.io/component_measures?id=<INSERT SONAR SPLITTER PROJECT>&metric=reliability_rating&#41;)

[//]: # ([![Security]&#40;https://sonarcloud.io/api/project_badges/measure?project=<INSERT SONAR SPLITTER PROJECT>&metric=security_rating&#41;]&#40;https://sonarcloud.io/component_measures?id=A<INSERT SONAR SPLITTER PROJECT>&metric=security_rating&#41;)

[//]: # ([![Code Coverage]&#40;https://sonarcloud.io/api/project_badges/measure?project=<INSERT SONAR SPLITTER PROJECT>&metric=coverage&#41;]&#40;https://sonarcloud.io/component_measures?id=<INSERT SONAR SPLITTER PROJECT>&metric=Coverage&#41;)

[//]: # ([![Twitter]&#40;https://img.shields.io/twitter/follow/GuildDEvOpen?style=social&#41;]&#40;https://twitter.com/intent/follow?screen_name=GuildDEvOpen&#41;)

- [About](#about)
- [Quick Start](#quick-start)
- [Contribute](#contribute)

## About
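The axa-fr-splitter package provides Python tools for converting several types of documents (PDF, TIFF, ...) into images, one image per page. According to the package summary, pages are exported as PNGs and text is extracted from the input files.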

## Quick Start
```sh
pip install axa-fr-splitter
```
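
To confirm that the install succeeded, you can query the installed distribution's version with the standard library. This is a small sketch; `installed_version` is a hypothetical helper name introduced here, not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("axa-fr-splitter"))
```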


```python
from pathlib import Path
from splitter import FileHandler
from splitter.image.tiff_handler import TifHandler
from splitter.pdf.pdf_handler import FitzPdfHandler


def create_file_handler() -> FileHandler:
    """Factory to create customized file handler"""

    # Create File Handler
    file_handler = FileHandler()

    # Create pdf Handler
    pdf_handler = FitzPdfHandler()

    # Create tiff Handler
    tiff_handler = TifHandler()

    # Register PDF Handler
    file_handler.register_converter(
        pdf_handler,
        extensions=['.pdf'],
        mime_types=['application/pdf']
    )

    # Register tiff Handler
    file_handler.register_converter(
        tiff_handler,
        extensions=['.tif', '.tiff'],
        mime_types=['image/tiff']
    )

    return file_handler


def main(filepath, output_path):
    file_handler = create_file_handler()
    output_path = Path(output_path)

    for file_or_exception in file_handler.split_document(filepath):
        # unwrap() returns the page on success and raises if the page failed
        file = file_or_exception.unwrap()

        print(file.metadata)
        # {
        #     'original_filename': 'specimen.tiff',
        #     'page_number': 1,
        #     'total_pages': 4,
        #     'width': 1554,
        #     'height': 2200,
        #     'resized_ratio': 0.9405728943993159
        # }

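        # Export the page's bytes, creating parent directories if needed
        export_path = output_path.joinpath(file.relative_path)
        export_path.parent.mkdir(parents=True, exist_ok=True)
        export_path.write_bytes(file.file_bytes)


if __name__ == '__main__':
    # Placeholder output directory; replace with your own path
    MY_OUTPUT_PATH = "outputs"
    main(r"tests/inputs/specimen.tiff", MY_OUTPUT_PATH)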
```
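
The `register_converter` calls above associate each handler with file extensions and MIME types. As a rough illustration of that registry idea, the sketch below resolves a handler for a path using only the standard library. It is not the library's actual implementation: `HandlerRegistry`, `register`, and `resolve` are hypothetical names, and plain strings stand in for real handler objects.

```python
import mimetypes
from pathlib import Path
from typing import Any


class HandlerRegistry:
    """Hypothetical registry mapping extensions and MIME types to handlers."""

    def __init__(self) -> None:
        self._by_extension: dict[str, Any] = {}
        self._by_mime_type: dict[str, Any] = {}

    def register(self, handler: Any, extensions: list[str], mime_types: list[str]) -> None:
        # Store keys in lowercase so lookups are case-insensitive.
        for extension in extensions:
            self._by_extension[extension.lower()] = handler
        for mime_type in mime_types:
            self._by_mime_type[mime_type.lower()] = handler

    def resolve(self, filepath: str) -> Any:
        # Try the file extension first, then fall back to the guessed MIME type.
        suffix = Path(filepath).suffix.lower()
        if suffix in self._by_extension:
            return self._by_extension[suffix]
        mime_type, _ = mimetypes.guess_type(filepath)
        if mime_type is not None and mime_type.lower() in self._by_mime_type:
            return self._by_mime_type[mime_type.lower()]
        raise ValueError(f"No handler registered for {filepath!r}")


registry = HandlerRegistry()
registry.register("pdf-handler", extensions=[".pdf"], mime_types=["application/pdf"])
registry.register("tiff-handler", extensions=[".tif", ".tiff"], mime_types=["image/tiff"])

print(registry.resolve("tests/inputs/specimen.tiff"))  # prints: tiff-handler
```

Registering by both extension and MIME type lets a lookup still succeed when one of the two signals is missing or unreliable.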

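Each item yielded by `split_document` can be matched against `Success` and `Failure` from the `returns` library, so you can use a `match` statement to handle failures explicitly instead of calling `unwrap()`: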

```python
from returns.result import Failure, Success

...

def main(filepath, output_path):
    file_handler = create_file_handler()
    output_path = Path(output_path)

    for file_or_exception in file_handler.split_document(filepath):
        match file_or_exception:
            case Success(file):
                print(file.metadata)
                export_path = output_path.joinpath(file.relative_path)
                export_path.write_bytes(file.file_bytes)
            case Failure(exception):
                # Handle Exception ...
                raise exception

```

## Contribute

- [How to run the solution and to contribute](./.github/CONTRIBUTING.md)
- [Please respect our code of conduct](./.github/CODE_OF_CONDUCT.md)

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "axa-fr-splitter",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "doc-proc, pdf, tif, tiff, word",
    "author": null,
    "author_email": "Dhia Hmila <dhia.hmila@axa.fr>, Guillaume Chervet <guillaume.chervet@axa.fr>, Hicham Dakhli <hicham.dakhli@axa.fr>",
    "download_url": "https://files.pythonhosted.org/packages/48/24/7db02fef5c9b7ba550e3fea3eda4bc7938305f78dca0c6fa008a4631a505/axa_fr_splitter-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "This package splits PDF and TIFF files into separate PNGs and extracts text from input files.",
    "version": "1.0.0",
    "project_urls": {
        "Issues": "https://github.com/AxaFrance/axa-fr-splitter/issues",
        "Repository": "https://github.com/AxaFrance/axa-fr-splitter"
    },
    "split_keywords": [
        "doc-proc",
        " pdf",
        " tif",
        " tiff",
        " word"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "245aa5f25af013d4d586344544bf1ad1ac22edb346e94097bf591eee32d7184e",
                "md5": "81962d8cafc11b8755690636971e7021",
                "sha256": "341f8bca5ea8792c6bb2aad704f583bc38f0eb122b5bd6711db53ee499c51e7c"
            },
            "downloads": -1,
            "filename": "axa_fr_splitter-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "81962d8cafc11b8755690636971e7021",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 16896,
            "upload_time": "2024-10-09T08:43:48",
            "upload_time_iso_8601": "2024-10-09T08:43:48.933492Z",
            "url": "https://files.pythonhosted.org/packages/24/5a/a5f25af013d4d586344544bf1ad1ac22edb346e94097bf591eee32d7184e/axa_fr_splitter-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "48247db02fef5c9b7ba550e3fea3eda4bc7938305f78dca0c6fa008a4631a505",
                "md5": "dfe45bae393fb49db5aaa5274b3b4ba7",
                "sha256": "1b7dc685de4a769207d62d8d0309df8996cd99155eea1c4702e2c9636c5162a9"
            },
            "downloads": -1,
            "filename": "axa_fr_splitter-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "dfe45bae393fb49db5aaa5274b3b4ba7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 1163147,
            "upload_time": "2024-10-09T08:43:50",
            "upload_time_iso_8601": "2024-10-09T08:43:50.794292Z",
            "url": "https://files.pythonhosted.org/packages/48/24/7db02fef5c9b7ba550e3fea3eda4bc7938305f78dca0c6fa008a4631a505/axa_fr_splitter-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-09 08:43:50",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "AxaFrance",
    "github_project": "axa-fr-splitter",
    "github_not_found": true,
    "lcname": "axa-fr-splitter"
}
        