[![ci](https://github.com/amenezes/aiopytesseract/actions/workflows/ci.yml/badge.svg)](https://github.com/amenezes/aiopytesseract/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/amenezes/aiopytesseract/branch/master/graph/badge.svg)](https://codecov.io/gh/amenezes/aiopytesseract)
[![PyPI version](https://badge.fury.io/py/aiopytesseract.svg)](https://badge.fury.io/py/aiopytesseract)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/aiopytesseract)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
# aiopytesseract
A Python [asyncio](https://docs.python.org/3/library/asyncio.html) wrapper for [Tesseract-OCR](https://tesseract-ocr.github.io/tessdoc/).
## Installation
Install and update using pip:
````bash
pip install aiopytesseract
````
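`pip` installs only the Python package. Since this is a wrapper, it presumably still needs the Tesseract-OCR executable to be installed separately and reachable on `PATH`. A minimal sketch of a pre-flight check, using only the standard library; the helper name `find_tesseract` is hypothetical and not part of aiopytesseract:

```python
import shutil
from typing import Optional


def find_tesseract(binary: str = "tesseract") -> Optional[str]:
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    # Assumption: the executable is named "tesseract", as in the upstream CLI.
    return shutil.which(binary)
```

If this returns `None`, install Tesseract through your operating system's package manager before calling any of the functions below.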
## Usage
```python
from pathlib import Path
import aiopytesseract
# list all available languages by tesseract installation
await aiopytesseract.languages()
await aiopytesseract.get_languages()
# tesseract version
await aiopytesseract.tesseract_version()
await aiopytesseract.get_tesseract_version()
# tesseract parameters
await aiopytesseract.tesseract_parameters()
# confidence only info
await aiopytesseract.confidence("tests/samples/file-sample_150kB.png")
# deskew info
await aiopytesseract.deskew("tests/samples/file-sample_150kB.png")
# extract text from an image: locally or bytes
await aiopytesseract.image_to_string("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_string(
    Path("tests/samples/file-sample_150kB.png").read_bytes(), dpi=220, lang='eng+por'
)
# box estimates
await aiopytesseract.image_to_boxes("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_boxes(Path("tests/samples/file-sample_150kB.png").read_bytes())
# boxes, confidence and page numbers
await aiopytesseract.image_to_data("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_data(Path("tests/samples/file-sample_150kB.png").read_bytes())
# information about orientation and script detection
await aiopytesseract.image_to_osd("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_osd(Path("tests/samples/file-sample_150kB.png").read_bytes())
# generate a searchable PDF
await aiopytesseract.image_to_pdf("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_pdf(Path("tests/samples/file-sample_150kB.png").read_bytes())
# generate HOCR output
await aiopytesseract.image_to_hocr("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_hocr(Path("tests/samples/file-sample_150kB.png").read_bytes())
# multi output
async with aiopytesseract.run(
    Path('tests/samples/file-sample_150kB.png').read_bytes(),
    'output',
    'alto tsv txt'
) as resp:
    # will generate (output.xml, output.tsv and output.txt)
    print(resp)
    alto_file, tsv_file, txt_file = resp
```
For more details on Tesseract best practices and on aiopytesseract itself, see the `docs` folder.
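The snippets above use `await` at the top level, which only works inside a coroutine or an asyncio-aware REPL. As a rough sketch of how the calls could be driven from a regular script, the helper below runs OCR over several images concurrently and caps the number of simultaneous calls. The name `ocr_many` and its `ocr` and `max_concurrency` parameters are illustrative assumptions, not part of the library; only `aiopytesseract.image_to_string` comes from the examples above.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Optional

OcrFunc = Callable[[str], Awaitable[str]]


async def ocr_many(
    paths: Iterable[str],
    ocr: Optional[OcrFunc] = None,
    max_concurrency: int = 4,
) -> List[str]:
    """Run OCR over several images concurrently, keeping the input order."""
    if ocr is None:
        # Imported lazily so the helper can also be used with a stand-in coroutine.
        import aiopytesseract

        ocr = aiopytesseract.image_to_string

    # Cap how many OCR calls run at once (each one is assumed to be CPU-heavy).
    limit = asyncio.Semaphore(max_concurrency)

    async def worker(path: str) -> str:
        async with limit:
            return await ocr(path)

    # gather() returns results in the same order as the awaitables it was given.
    return list(await asyncio.gather(*(worker(p) for p in paths)))
```

From a script, this could be called as `asyncio.run(ocr_many(["tests/samples/file-sample_150kB.png"]))`. Passing a custom `ocr` coroutine is one way to plug in other options (for example a wrapper that sets `lang` or `dpi`) or to test the flow without Tesseract installed.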
## Examples
If you want to try **aiopytesseract** quickly, you can use one of these options:
- docker/docker-compose
- [streamlit](https://streamlit.io)
### Docker / docker-compose
After cloning this repository, run the command below:
```bash
docker-compose up -d
```
### streamlit app
For this option, first install `aiopytesseract` and `streamlit`, then run one of the following:
```bash
# remote option:
streamlit run https://github.com/amenezes/aiopytesseract/blob/master/examples/streamlit/app.py
```
```bash
# local option:
streamlit run examples/streamlit/app.py
```
> note: The streamlit example needs **Python >= 3.10**
## Links
- License: [Apache License](https://choosealicense.com/licenses/apache-2.0/)
- Code: [https://github.com/amenezes/aiopytesseract](https://github.com/amenezes/aiopytesseract)
- Issue tracker: [https://github.com/amenezes/aiopytesseract/issues](https://github.com/amenezes/aiopytesseract/issues)
- Docs: [https://github.com/amenezes/aiopytesseract](https://github.com/amenezes/aiopytesseract)
Raw data
{
"_id": null,
"home_page": "https://github.com/amenezes/aiopytesseract",
"name": "aiopytesseract",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "\"asyncio\",\"ocr\",\"tesseract\"",
"author": "Alexandre Menezes",
"author_email": "alexandre.fmenezes@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/27/bb/975580dc5546bb8f0f1dd9907b56771a295afcb07c50018b6991f477dec3/aiopytesseract-0.13.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "asyncio tesseract wrapper for Tesseract-OCR",
"version": "0.13.0",
"project_urls": {
"Changes": "https://github.com/amenezes/aiopytesseract/releases",
"Code": "https://github.com/amenezes/aiopytesseract",
"Documentation": "https://github.com/amenezes/aiopytesseract",
"Homepage": "https://github.com/amenezes/aiopytesseract",
"Issue tracker": "https://github.com/amenezes/aiopytesseract/issues"
},
"split_keywords": [
"\"asyncio\"",
"\"ocr\"",
"\"tesseract\""
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0467c8bed9438fb5c77a86c9d08f0c73ed810bd333dbca84283621ad9745d485",
"md5": "3502915eef3c05fe5389b536c442601f",
"sha256": "6b195a6ab492fd0898e61f5222a1b113b9cc5f0d3ed64c8b185fafb9a3b38e4a"
},
"downloads": -1,
"filename": "aiopytesseract-0.13.0-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "3502915eef3c05fe5389b536c442601f",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.8",
"size": 23649,
"upload_time": "2023-12-07T15:22:37",
"upload_time_iso_8601": "2023-12-07T15:22:37.708849Z",
"url": "https://files.pythonhosted.org/packages/04/67/c8bed9438fb5c77a86c9d08f0c73ed810bd333dbca84283621ad9745d485/aiopytesseract-0.13.0-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "27bb975580dc5546bb8f0f1dd9907b56771a295afcb07c50018b6991f477dec3",
"md5": "60440227b11f4a2d7179a0c97d2c297c",
"sha256": "255317273185668055b86d0e7a526a1d688d8bfeecee732f6a90ed0ca4267cad"
},
"downloads": -1,
"filename": "aiopytesseract-0.13.0.tar.gz",
"has_sig": false,
"md5_digest": "60440227b11f4a2d7179a0c97d2c297c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 18663,
"upload_time": "2023-12-07T15:22:39",
"upload_time_iso_8601": "2023-12-07T15:22:39.786926Z",
"url": "https://files.pythonhosted.org/packages/27/bb/975580dc5546bb8f0f1dd9907b56771a295afcb07c50018b6991f477dec3/aiopytesseract-0.13.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-12-07 15:22:39",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "amenezes",
"github_project": "aiopytesseract",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"requirements": [
{
"name": "aiofiles",
"specs": []
},
{
"name": "cattrs",
"specs": []
}
],
"lcname": "aiopytesseract"
}