Name | axa-fr-splitter |
Version | 1.0.0 |
home_page | None |
Summary | This package splits PDF and TIFF files into separate PNGs and extracts text from input files. |
upload_time | 2024-10-09 08:43:50 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.10 |
license | MIT License |
keywords | doc-proc, pdf, tif, tiff, word |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# @axa-fr/axa-fr-splitter
![PyPI](https://img.shields.io/pypi/v/axa-fr-splitter)
![PyPI - License](https://img.shields.io/pypi/l/axa-fr-splitter)
![PyPI - Wheel](https://img.shields.io/pypi/wheel/axa-fr-splitter)
![Tests](https://github.com/AxaFrance/axa-fr-splitter/actions/workflows/tests.yml/badge.svg)
![python: 3.10 (shields.io)](https://img.shields.io/badge/python-3.10-green)
![python: 3.11 (shields.io)](https://img.shields.io/badge/python-3.11-green)
![python: 3.12 (shields.io)](https://img.shields.io/badge/python-3.12-green)
[//]: # ([![Continuous Integration](https://github.com/AxaFrance/axa-fr-splitter/actions/workflows/python-publish.yml/badge.svg)](https://github.com/AxaFrance/axa-fr-splitter/actions/workflows/python-publish.yml))
[//]: # ([![Quality Gate](https://sonarcloud.io/api/project_badges/measure?project=<INSERT SONAR SPLITTER PROJECT>&metric=alert_status)](https://sonarcloud.io/dashboard?id=<INSERT SONAR SPLITTER PROJECT>))
[//]: # ([![Reliability](https://sonarcloud.io/api/project_badges/measure?project=<INSERT SONAR SPLITTER PROJECT>&metric=reliability_rating)](https://sonarcloud.io/component_measures?id=<INSERT SONAR SPLITTER PROJECT>&metric=reliability_rating))
[//]: # ([![Security](https://sonarcloud.io/api/project_badges/measure?project=<INSERT SONAR SPLITTER PROJECT>&metric=security_rating)](https://sonarcloud.io/component_measures?id=A<INSERT SONAR SPLITTER PROJECT>&metric=security_rating))
[//]: # ([![Code Coverage](https://sonarcloud.io/api/project_badges/measure?project=<INSERT SONAR SPLITTER PROJECT>&metric=coverage)](https://sonarcloud.io/component_measures?id=<INSERT SONAR SPLITTER PROJECT>&metric=Coverage))
[//]: # ([![Twitter](https://img.shields.io/twitter/follow/GuildDEvOpen?style=social)](https://twitter.com/intent/follow?screen_name=GuildDEvOpen))
- [About](#about)
- [Quick Start](#quick-start)
- [Contribute](#contribute)
## About
The axa-fr-splitter package provides Python tools that split several types of documents (PDF, TIFF, ...) into one image per page.
## Quick Start
```sh
pip install axa-fr-splitter
```
```python
from pathlib import Path

from splitter import FileHandler
from splitter.image.tiff_handler import TifHandler
from splitter.pdf.pdf_handler import FitzPdfHandler

# Placeholder value: replace with your own output directory.
MY_OUTPUT_PATH = "outputs"


def create_file_handler() -> FileHandler:
    """Factory to create a customized file handler."""
    # Create the file handler
    file_handler = FileHandler()

    # Create the PDF handler
    pdf_handler = FitzPdfHandler()

    # Create the TIFF handler
    tiff_handler = TifHandler()

    # Register the PDF handler
    file_handler.register_converter(
        pdf_handler,
        extensions=['.pdf'],
        mime_types=['application/pdf']
    )

    # Register the TIFF handler
    file_handler.register_converter(
        tiff_handler,
        extensions=['.tif', '.tiff'],
        mime_types=['image/tiff']
    )

    return file_handler


def main(filepath, output_path):
    file_handler = create_file_handler()
    output_path = Path(output_path)

    for file_or_exception in file_handler.split_document(filepath):
        # unwrap() returns the split file, or raises if the page failed
        file = file_or_exception.unwrap()

        print(file.metadata)
        # {
        #     'original_filename': 'specimen.tiff',
        #     'page_number': 1,
        #     'total_pages': 4,
        #     'width': 1554,
        #     'height': 2200,
        #     'resized_ratio': 0.9405728943993159
        # }

        # Export the file bytes, creating the target directory if needed
        export_path = output_path.joinpath(file.relative_path)
        export_path.parent.mkdir(parents=True, exist_ok=True)
        export_path.write_bytes(file.file_bytes)


if __name__ == '__main__':
    main(r"tests/inputs/specimen.tiff", MY_OUTPUT_PATH)
```
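To make the role of `register_converter` more concrete, here is a minimal sketch of how a registry keyed by extension and MIME type could pick a converter for a given file. It is an illustration only, not the actual `FileHandler` implementation: the `ConverterRegistry` class, its `resolve` method, and the extension-first lookup order are all assumptions made for this example, and the converters are plain stand-in functions.

```python
import mimetypes
from pathlib import Path
from typing import Callable

# Hypothetical stand-in for a converter: any callable taking a file path.
Converter = Callable[[str], str]


class ConverterRegistry:
    """Illustrative registry; NOT the real FileHandler implementation."""

    def __init__(self) -> None:
        self._by_extension: dict[str, Converter] = {}
        self._by_mime_type: dict[str, Converter] = {}

    def register_converter(self, converter, extensions=(), mime_types=()):
        # Store the same converter under every extension and MIME type given.
        for extension in extensions:
            self._by_extension[extension.lower()] = converter
        for mime_type in mime_types:
            self._by_mime_type[mime_type.lower()] = converter

    def resolve(self, filepath: str) -> Converter:
        """Return the converter for filepath, trying the extension first."""
        suffix = Path(filepath).suffix.lower()
        if suffix in self._by_extension:
            return self._by_extension[suffix]
        # Fall back to the MIME type guessed from the file name (assumption).
        mime_type, _ = mimetypes.guess_type(filepath)
        if mime_type in self._by_mime_type:
            return self._by_mime_type[mime_type]
        raise ValueError(f"No converter registered for {filepath!r}")


registry = ConverterRegistry()
registry.register_converter(
    lambda path: f"pdf:{path}",
    extensions=[".pdf"],
    mime_types=["application/pdf"],
)
registry.register_converter(
    lambda path: f"tiff:{path}",
    extensions=[".tif", ".tiff"],
    mime_types=["image/tiff"],
)

print(registry.resolve("scan.TIFF")("scan.TIFF"))
```

Registering one converter under several keys is what lets `.tif` and `.tiff` files share a single handler in the Quick Start code.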
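`split_document` yields result containers from the `returns` library, and calling `unwrap()` on a failure raises an exception. To handle failures explicitly instead, use the `match` statement: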
```python
from returns.result import Failure, Success

...

def main(filepath, output_path):
    file_handler = create_file_handler()
    output_path = Path(output_path)

    for file_or_exception in file_handler.split_document(filepath):
        match file_or_exception:
            case Success(file):
                print(file.metadata)
                export_path = output_path.joinpath(file.relative_path)
                export_path.write_bytes(file.file_bytes)
            case Failure(exception):
                # Handle Exception ...
                raise exception
```
## Contribute
- [How to run the solution and to contribute](./.github/CONTRIBUTING.md)
- [Please respect our code of conduct](./.github/CODE_OF_CONDUCT.md)
Raw data
{
"_id": null,
"home_page": null,
"name": "axa-fr-splitter",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "doc-proc, pdf, tif, tiff, word",
"author": null,
"author_email": "Dhia Hmila <dhia.hmila@axa.fr>, Guillaume Chervet <guillaume.chervet@axa.fr>, Hicham Dakhli <hicham.dakhli@axa.fr>",
"download_url": "https://files.pythonhosted.org/packages/48/24/7db02fef5c9b7ba550e3fea3eda4bc7938305f78dca0c6fa008a4631a505/axa_fr_splitter-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "This package splits PDF and TIFF files into separate PNGs and extracts text from input files.",
"version": "1.0.0",
"project_urls": {
"Issues": "https://github.com/AxaFrance/axa-fr-splitter/issues",
"Repository": "https://github.com/AxaFrance/axa-fr-splitter"
},
"split_keywords": [
"doc-proc",
" pdf",
" tif",
" tiff",
" word"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "245aa5f25af013d4d586344544bf1ad1ac22edb346e94097bf591eee32d7184e",
"md5": "81962d8cafc11b8755690636971e7021",
"sha256": "341f8bca5ea8792c6bb2aad704f583bc38f0eb122b5bd6711db53ee499c51e7c"
},
"downloads": -1,
"filename": "axa_fr_splitter-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "81962d8cafc11b8755690636971e7021",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 16896,
"upload_time": "2024-10-09T08:43:48",
"upload_time_iso_8601": "2024-10-09T08:43:48.933492Z",
"url": "https://files.pythonhosted.org/packages/24/5a/a5f25af013d4d586344544bf1ad1ac22edb346e94097bf591eee32d7184e/axa_fr_splitter-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "48247db02fef5c9b7ba550e3fea3eda4bc7938305f78dca0c6fa008a4631a505",
"md5": "dfe45bae393fb49db5aaa5274b3b4ba7",
"sha256": "1b7dc685de4a769207d62d8d0309df8996cd99155eea1c4702e2c9636c5162a9"
},
"downloads": -1,
"filename": "axa_fr_splitter-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "dfe45bae393fb49db5aaa5274b3b4ba7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 1163147,
"upload_time": "2024-10-09T08:43:50",
"upload_time_iso_8601": "2024-10-09T08:43:50.794292Z",
"url": "https://files.pythonhosted.org/packages/48/24/7db02fef5c9b7ba550e3fea3eda4bc7938305f78dca0c6fa008a4631a505/axa_fr_splitter-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-09 08:43:50",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "AxaFrance",
"github_project": "axa-fr-splitter",
"github_not_found": true,
"lcname": "axa-fr-splitter"
}