amazon-textract-overlayer


Name: amazon-textract-overlayer
Version: 0.0.12
Home page: https://github.com/aws-samples/amazon-textract-textractor/tree/master/overlayer
Summary: Amazon Textract Overlay tools
Upload time: 2023-11-14 16:44:58
Author: Amazon Rekognition Textract Demoes
Requires Python: >=3.6
License: Apache License Version 2.0
Keywords: amazon-textract-textractor amazon textract textractor helper overlayer
Requirements: No requirements were recorded.
# Textract-Overlayer

amazon-textract-overlayer provides functions to help overlay bounding boxes on documents.

# Install

```bash
python -m pip install amazon-textract-overlayer
```

Make sure your environment is set up with AWS credentials, either through configuration files, environment variables, or an attached role (https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html).

# Samples

The primary function provided is ```get_bounding_boxes```, which returns bounding boxes for the ```Textract_Types``` values passed in.
Its logic is mostly taken from the ```amazon-textract``` command in the package ```amazon-textract-helper```.

The following example returns the bounding boxes for the WORD and CELL block types.

```python
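from PIL import Image
from textractcaller.t_call import Textract_Features, Textract_Types, call_textract
from textractoverlayer.t_overlay import DocumentDimensions, get_bounding_boxes

# placeholder, replace with the path to your own image file
input_document = "<<replace with the local path to your image file>>"
# CELL blocks are only returned when table analysis is requested
features = [Textract_Features.TABLES]

doc = call_textract(input_document=input_document, features=features)
# the pixel size of the image is used to scale the bounding boxes
image = Image.open(input_document)
document_dimension: DocumentDimensions = DocumentDimensions(doc_width=image.size[0], doc_height=image.size[1])
overlay = [Textract_Types.WORD, Textract_Types.CELL]

bounding_box_list = get_bounding_boxes(textract_json=doc, document_dimensions=document_dimension, overlay_features=overlay)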
```
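For intuition about why the document dimensions are needed: Textract reports each block's geometry as ratios of the page width and height (`Left`, `Top`, `Width`, `Height`), so absolute coordinates have to be derived by scaling. The following is a minimal sketch of that conversion, not the package's actual implementation; the function name `to_pixel_box` is hypothetical.

```python
from typing import Dict, Tuple


def to_pixel_box(
    relative_box: Dict[str, float], doc_width: float, doc_height: float
) -> Tuple[float, float, float, float]:
    """Scale a Textract-style relative bounding box to absolute coordinates.

    Hypothetical helper for illustration; returns (xmin, ymin, xmax, ymax).
    """
    xmin = relative_box["Left"] * doc_width
    ymin = relative_box["Top"] * doc_height
    xmax = (relative_box["Left"] + relative_box["Width"]) * doc_width
    ymax = (relative_box["Top"] + relative_box["Height"]) * doc_height
    return xmin, ymin, xmax, ymax


# a box covering the top-left quarter of a 1000 x 800 pixel image
example = {"Left": 0.0, "Top": 0.0, "Width": 0.5, "Height": 0.5}
print(to_pixel_box(example, doc_width=1000, doc_height=800))
```

This is why the width and height of the rendered image (or PDF page) must match the document that was sent to Textract: the same ratios produce different absolute coordinates for different page sizes.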

The actual drawing of bounding boxes on images is implemented in the ```amazon-textract``` command from the package ```amazon-textract-helper``` and looks like this:

```python
from PIL import Image, ImageDraw

image = Image.open(input_document)
rgb_im = image.convert('RGB')
draw = ImageDraw.Draw(rgb_im)

# check the impl in amazon-textract-helper for ways to associate different colors to types
for bbox in bounding_box_list:
    draw.rectangle(xy=[bbox.xmin, bbox.ymin, bbox.xmax, bbox.ymax], outline=(128, 128, 0), width=2)

rgb_im.show()
```
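One possible way to give each block type its own outline color is a simple lookup table with a fallback. This is only a sketch and not the approach used in amazon-textract-helper: the `SampleBox` class, its `block_type` attribute, and the color values are assumptions made for illustration, so check the package source for the attribute that actually carries the type.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Color = Tuple[int, int, int]

# illustrative color choices, keyed by block type name
TYPE_COLORS: Dict[str, Color] = {
    "WORD": (128, 128, 0),
    "CELL": (0, 128, 128),
}
DEFAULT_COLOR: Color = (128, 128, 128)


@dataclass
class SampleBox:
    """Hypothetical stand-in for a bounding box that carries its block type."""

    block_type: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def color_for(box: SampleBox) -> Color:
    """Return the outline color for a box, falling back to a neutral gray."""
    return TYPE_COLORS.get(box.block_type, DEFAULT_COLOR)


boxes = [SampleBox("WORD", 10, 10, 60, 30), SampleBox("LINE", 10, 40, 200, 60)]
for box in boxes:
    print(box.block_type, color_for(box))
```

The returned tuple can be passed as the `outline` argument of `draw.rectangle` in place of the fixed color.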

To draw bounding boxes within PDF documents, the following code can be used:

```python
import fitz

# for locally stored files
file_path = "<<replace with the local path to your pdf file>>"
pdf_doc = fitz.open(file_path)
# for files stored in S3 the streaming object can be used
# pdf_doc = fitz.open(stream="<<replace with stream_object_variable>>", filetype="pdf")

# draw boxes; page numbers on the bounding boxes start at 1
for p, page in enumerate(pdf_doc, start=1):
    for bbox in bounding_box_list:
        if bbox.page_number == p:
            page.draw_rect(
                [bbox.xmin, bbox.ymin, bbox.xmax, bbox.ymax], color=(0, 1, 0), width=2
            )

# save file locally
pdf_doc.save("<<local path for output file>>")
```
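The loop above scans the full bounding box list once per page. For documents with many pages, the boxes could be grouped by page number up front. The following is a sketch under the assumption that each box exposes `page_number` as in the example above; `PageBox` and `group_by_page` are hypothetical names used only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class PageBox:
    """Hypothetical stand-in exposing only the attributes used for drawing."""

    page_number: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def group_by_page(boxes: Iterable[PageBox]) -> Dict[int, List[PageBox]]:
    """Build a mapping from page number to the boxes on that page."""
    grouped: Dict[int, List[PageBox]] = defaultdict(list)
    for box in boxes:
        grouped[box.page_number].append(box)
    return grouped


sample = [
    PageBox(1, 10, 10, 50, 20),
    PageBox(2, 15, 30, 80, 45),
    PageBox(1, 10, 40, 90, 55),
]
by_page = group_by_page(sample)
print({page: len(items) for page, items in by_page.items()})
```

Inside the page loop, the inner iteration then becomes a lookup such as `by_page.get(p, [])`, which returns an empty list for pages without boxes.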


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/aws-samples/amazon-textract-textractor/tree/master/overlayer",
    "name": "amazon-textract-overlayer",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "amazon-textract-textractor amazon textract textractor helper overlayer",
    "author": "Amazon Rekognition Textract Demoes",
    "author_email": "rekognition-textract-demos@amazon.com",
    "download_url": "https://files.pythonhosted.org/packages/be/41/cdfc5dcab9eaf3c2b3aedc7d49bfa18cecae06d0f87e2732bf39ce2f5aa7/amazon-textract-overlayer-0.0.12.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache License Version 2.0",
    "summary": "Amazon Textract Overlay tools",
    "version": "0.0.12",
    "project_urls": {
        "Homepage": "https://github.com/aws-samples/amazon-textract-textractor/tree/master/overlayer"
    },
    "split_keywords": [
        "amazon-textract-textractor",
        "amazon",
        "textract",
        "textractor",
        "helper",
        "overlayer"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7bd665dd95f8807c7bba6f6ace217ae00c505504b09ca39d2c7559a2f4edff18",
                "md5": "4c23cdcda519fe9683c52969617490ef",
                "sha256": "68ac82fbee1fa8080a79cb2cba304d94e07862b856fbbaebe50fc2f23195926c"
            },
            "downloads": -1,
            "filename": "amazon_textract_overlayer-0.0.12-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4c23cdcda519fe9683c52969617490ef",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.6",
            "size": 9400,
            "upload_time": "2023-11-14T16:44:57",
            "upload_time_iso_8601": "2023-11-14T16:44:57.077697Z",
            "url": "https://files.pythonhosted.org/packages/7b/d6/65dd95f8807c7bba6f6ace217ae00c505504b09ca39d2c7559a2f4edff18/amazon_textract_overlayer-0.0.12-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "be41cdfc5dcab9eaf3c2b3aedc7d49bfa18cecae06d0f87e2732bf39ce2f5aa7",
                "md5": "af47810e9f5d286af3dc34e654ccfca9",
                "sha256": "f6b7f87381d62a84aa8f159c218600f7e6742771a58e6126515b1849a105e288"
            },
            "downloads": -1,
            "filename": "amazon-textract-overlayer-0.0.12.tar.gz",
            "has_sig": false,
            "md5_digest": "af47810e9f5d286af3dc34e654ccfca9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 9100,
            "upload_time": "2023-11-14T16:44:58",
            "upload_time_iso_8601": "2023-11-14T16:44:58.594285Z",
            "url": "https://files.pythonhosted.org/packages/be/41/cdfc5dcab9eaf3c2b3aedc7d49bfa18cecae06d0f87e2732bf39ce2f5aa7/amazon-textract-overlayer-0.0.12.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-11-14 16:44:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "aws-samples",
    "github_project": "amazon-textract-textractor",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "amazon-textract-overlayer"
}
        