# Textract-Overlayer
amazon-textract-overlayer provides functions to help overlay bounding boxes on documents.
# Install
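```bash
python -m pip install amazon-textract-overlayer
# the samples below also use these packages
python -m pip install amazon-textract-caller Pillow PyMuPDF
```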
The samples call Amazon Textract, so make sure your environment is set up with AWS credentials through configuration files, environment variables, or an attached role (see https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html).
# Samples
The primary function is `get_bounding_boxes`, which returns the bounding boxes for the `Textract_Types` passed in. The samples are mostly taken from the `amazon-textract` command in the `amazon-textract-helper` package.

The following example returns the bounding boxes for the WORD and CELL block types.
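```python
from PIL import Image
from textractoverlayer.t_overlay import DocumentDimensions, get_bounding_boxes
from textractcaller.t_call import Textract_Features, Textract_Types, call_textract

input_document = "<<replace with the local path to your image file>>"
# CELL blocks are only returned by table analysis, so request the TABLES feature
features = [Textract_Features.TABLES]
doc = call_textract(input_document=input_document, features=features)

# the document dimensions come from the image (a PIL.Image.Image) in this case
image = Image.open(input_document)
document_dimension: DocumentDimensions = DocumentDimensions(doc_width=image.size[0], doc_height=image.size[1])
overlay = [Textract_Types.WORD, Textract_Types.CELL]
bounding_box_list = get_bounding_boxes(textract_json=doc, document_dimensions=document_dimension, overlay_features=overlay)
```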
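To make the coordinate handling concrete, here is a minimal stand-alone sketch of the idea behind the call above. It is a hypothetical helper, not the package's implementation: it assumes the standard Textract response layout (`Blocks`, `BlockType`, `Page`, and `Geometry.BoundingBox` with `Left`, `Top`, `Width`, `Height` expressed as fractions of the page size), scales those fractions by the document width and height, and keeps only the requested block types. The names `PixelBox` and `to_pixel_boxes` are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PixelBox:
    """Hypothetical absolute-coordinate box (not the package's own class)."""
    block_type: str
    page_number: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def to_pixel_boxes(textract_json: dict, doc_width: float, doc_height: float,
                   block_types: Iterable[str]) -> List[PixelBox]:
    """Scale Textract's normalized BoundingBox values to document coordinates."""
    wanted = set(block_types)
    boxes: List[PixelBox] = []
    for block in textract_json.get("Blocks", []):
        if block.get("BlockType") not in wanted:
            continue
        bb = block["Geometry"]["BoundingBox"]
        # Left/Top/Width/Height are fractions of the page width/height
        xmin = bb["Left"] * doc_width
        ymin = bb["Top"] * doc_height
        boxes.append(PixelBox(
            block_type=block["BlockType"],
            # assumption: a response without "Page" describes a single page
            page_number=block.get("Page", 1),
            xmin=xmin,
            ymin=ymin,
            xmax=xmin + bb["Width"] * doc_width,
            ymax=ymin + bb["Height"] * doc_height,
        ))
    return boxes


# usage with a tiny hand-written response: the LINE block is filtered out
sample = {"Blocks": [
    {"BlockType": "WORD", "Page": 1,
     "Geometry": {"BoundingBox": {"Left": 0.25, "Top": 0.5, "Width": 0.5, "Height": 0.25}}},
    {"BlockType": "LINE", "Page": 1,
     "Geometry": {"BoundingBox": {"Left": 0.0, "Top": 0.0, "Width": 1.0, "Height": 1.0}}},
]}
boxes = to_pixel_boxes(sample, doc_width=1000, doc_height=800, block_types=["WORD", "CELL"])
# boxes[0] spans (250.0, 400.0) to (750.0, 600.0)
```

Multiplying fractions by the target size is what lets the same Textract response be drawn on an image (pixels) or a PDF page (points), as long as the dimensions passed in match the surface being drawn on.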
The `amazon-textract` command from the `amazon-textract-helper` package does the actual drawing of the bounding boxes on images, like this:
```python
from PIL import Image, ImageDraw

image = Image.open(input_document)
# convert to RGB so a colored outline can be drawn regardless of the input mode
rgb_im = image.convert('RGB')
draw = ImageDraw.Draw(rgb_im)
# check the impl in amazon-textract-helper for ways to associate different colors to types
for bbox in bounding_box_list:
    draw.rectangle(xy=[bbox.xmin, bbox.ymin, bbox.xmax, bbox.ymax], outline=(128, 128, 0), width=2)
rgb_im.show()
```
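The comment in the drawing loop points to `amazon-textract-helper` for using different colors per type. A minimal stand-alone sketch of that idea is a lookup table with a fallback color. The `TYPE_COLORS` palette, its colors, and `outline_color` are hypothetical choices, not taken from that package, and the sketch assumes you have the block type available as a name such as `"WORD"` or `"CELL"`.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette: one outline color per Textract block type name
TYPE_COLORS: Dict[str, RGB] = {
    "WORD": (128, 128, 0),
    "LINE": (0, 0, 255),
    "CELL": (255, 0, 0),
    "KEY": (0, 128, 0),
    "VALUE": (128, 0, 128),
}
DEFAULT_COLOR: RGB = (128, 128, 128)


def outline_color(block_type: str) -> RGB:
    """Return the outline color for a block type name, falling back to gray."""
    return TYPE_COLORS.get(block_type.upper(), DEFAULT_COLOR)


color = outline_color("cell")  # case-insensitive lookup
```

In the drawing loop, the fixed `outline=(128, 128, 0)` would then become `outline=outline_color(type_name)`, where `type_name` is however your boxes record their type.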
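To draw the bounding boxes on PDF documents, the following code can be used. It relies on PyMuPDF (imported as `fitz`). Because `page.draw_rect` works in PDF points, `bounding_box_list` should be computed with a `DocumentDimensions` that matches the PDF page size (for example `page.rect.width` and `page.rect.height`) rather than the pixel size of a rendered image.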
```python
import fitz  # PyMuPDF

# for locally stored files
file_path = "<<replace with the local path to your pdf file>>"
pdf_doc = fitz.open(file_path)
# for files stored in S3, the streaming object can be used instead
# pdf_doc = fitz.open(stream="<<replace with stream_object_variable>>", filetype="pdf")

# draw the boxes; enumerate from 1 because bbox.page_number is 1-based
for page_number, page in enumerate(pdf_doc, start=1):
    for bbox in bounding_box_list:
        if bbox.page_number == page_number:
            page.draw_rect(
                [bbox.xmin, bbox.ymin, bbox.xmax, bbox.ymax], color=(0, 1, 0), width=2
            )

# save the annotated file locally
pdf_doc.save("<<local path for output file>>")
```
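The loop above scans the full `bounding_box_list` once per page. For long documents, one alternative is to bucket the boxes by `page_number` in a single pass so that each page only visits its own boxes. The sketch below uses only the standard library; `group_by_page` and the `Box` named tuple are hypothetical stand-ins, and the sketch assumes only that each box exposes a `page_number` attribute, as in the sample above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Box(NamedTuple):
    """Hypothetical stand-in for a bounding box with a 1-based page number."""
    page_number: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def group_by_page(boxes: Iterable) -> Dict[int, List]:
    """Bucket boxes by page_number in one pass, preserving their original order."""
    pages: Dict[int, List] = defaultdict(list)
    for box in boxes:
        pages[box.page_number].append(box)
    return dict(pages)


boxes = [Box(1, 0, 0, 10, 10), Box(2, 5, 5, 20, 20), Box(1, 1, 1, 2, 2)]
by_page = group_by_page(boxes)
# by_page[1] holds the two page-1 boxes; pages without boxes have no key
```

Inside the PDF page loop, the inner loop over `bounding_box_list` would then iterate over `by_page.get(page_number, [])` and drop the `if` check, turning the work from pages × boxes into a single pass over the boxes.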
Raw data
{
"_id": null,
"home_page": "https://github.com/aws-samples/amazon-textract-textractor/tree/master/overlayer",
"name": "amazon-textract-overlayer",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "amazon-textract-textractor amazon textract textractor helper overlayer",
"author": "Amazon Rekognition Textract Demoes",
"author_email": "rekognition-textract-demos@amazon.com",
"download_url": "https://files.pythonhosted.org/packages/be/41/cdfc5dcab9eaf3c2b3aedc7d49bfa18cecae06d0f87e2732bf39ce2f5aa7/amazon-textract-overlayer-0.0.12.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache License Version 2.0",
"summary": "Amazon Textract Overlay tools",
"version": "0.0.12",
"project_urls": {
"Homepage": "https://github.com/aws-samples/amazon-textract-textractor/tree/master/overlayer"
},
"split_keywords": [
"amazon-textract-textractor",
"amazon",
"textract",
"textractor",
"helper",
"overlayer"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7bd665dd95f8807c7bba6f6ace217ae00c505504b09ca39d2c7559a2f4edff18",
"md5": "4c23cdcda519fe9683c52969617490ef",
"sha256": "68ac82fbee1fa8080a79cb2cba304d94e07862b856fbbaebe50fc2f23195926c"
},
"downloads": -1,
"filename": "amazon_textract_overlayer-0.0.12-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "4c23cdcda519fe9683c52969617490ef",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.6",
"size": 9400,
"upload_time": "2023-11-14T16:44:57",
"upload_time_iso_8601": "2023-11-14T16:44:57.077697Z",
"url": "https://files.pythonhosted.org/packages/7b/d6/65dd95f8807c7bba6f6ace217ae00c505504b09ca39d2c7559a2f4edff18/amazon_textract_overlayer-0.0.12-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "be41cdfc5dcab9eaf3c2b3aedc7d49bfa18cecae06d0f87e2732bf39ce2f5aa7",
"md5": "af47810e9f5d286af3dc34e654ccfca9",
"sha256": "f6b7f87381d62a84aa8f159c218600f7e6742771a58e6126515b1849a105e288"
},
"downloads": -1,
"filename": "amazon-textract-overlayer-0.0.12.tar.gz",
"has_sig": false,
"md5_digest": "af47810e9f5d286af3dc34e654ccfca9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 9100,
"upload_time": "2023-11-14T16:44:58",
"upload_time_iso_8601": "2023-11-14T16:44:58.594285Z",
"url": "https://files.pythonhosted.org/packages/be/41/cdfc5dcab9eaf3c2b3aedc7d49bfa18cecae06d0f87e2732bf39ce2f5aa7/amazon-textract-overlayer-0.0.12.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-11-14 16:44:58",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "aws-samples",
"github_project": "amazon-textract-textractor",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "amazon-textract-overlayer"
}