aiopytesseract


- Name: aiopytesseract
- Version: 0.13.0
- Home page: https://github.com/amenezes/aiopytesseract
- Summary: asyncio tesseract wrapper for Tesseract-OCR
- Upload time: 2023-12-07 15:22:39
- Docs URL: None
- Author: Alexandre Menezes
- Requires Python: >=3.8
- License: Apache-2.0
- Keywords: asyncio, ocr, tesseract
- Requirements: aiofiles, cattrs

[![ci](https://github.com/amenezes/aiopytesseract/actions/workflows/ci.yml/badge.svg)](https://github.com/amenezes/aiopytesseract/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/amenezes/aiopytesseract/branch/master/graph/badge.svg)](https://codecov.io/gh/amenezes/aiopytesseract)
[![PyPI version](https://badge.fury.io/py/aiopytesseract.svg)](https://badge.fury.io/py/aiopytesseract)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/aiopytesseract)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

# aiopytesseract

A Python [asyncio](https://docs.python.org/3/library/asyncio.html) wrapper for [Tesseract-OCR](https://tesseract-ocr.github.io/tessdoc/).

## Installation

Install and update using pip:

````bash
pip install aiopytesseract
````
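
`pip` installs only the Python wrapper. Because the package wraps Tesseract-OCR, a quick check that the `tesseract` executable is reachable can save a confusing first run. The sketch below assumes the executable is found through `PATH`; the helper name `find_tesseract` is hypothetical and not part of aiopytesseract.

```python
import shutil
from typing import Optional


def find_tesseract(binary: str = "tesseract") -> Optional[str]:
    """Return the full path of the executable, or None if it is not on PATH.

    Hypothetical helper for illustration; it is not part of aiopytesseract.
    """
    return shutil.which(binary)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; install Tesseract-OCR first")
    else:
        print(f"using tesseract at {path}")
```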

## Usage

```python
from pathlib import Path

import aiopytesseract


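# The calls below use `await`, so run them inside a coroutine
# or in the asyncio REPL (`python -m asyncio`).

# list the languages available in the local Tesseract installation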
await aiopytesseract.languages()
await aiopytesseract.get_languages()


# tesseract version
await aiopytesseract.tesseract_version()
await aiopytesseract.get_tesseract_version()


# tesseract parameters
await aiopytesseract.tesseract_parameters()


# confidence only info
await aiopytesseract.confidence("tests/samples/file-sample_150kB.png")


# deskew info
await aiopytesseract.deskew("tests/samples/file-sample_150kB.png")


# extract text from an image, given a file path or raw bytes
await aiopytesseract.image_to_string("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_string(
    Path("tests/samples/file-sample_150kB.png").read_bytes(), dpi=220, lang='eng+por'
)


# box estimates
await aiopytesseract.image_to_boxes("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_boxes(Path("tests/samples/file-sample_150kB.png").read_bytes())


# boxes, confidence and page numbers
await aiopytesseract.image_to_data("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_data(Path("tests/samples/file-sample_150kB.png").read_bytes())


# information about orientation and script detection
await aiopytesseract.image_to_osd("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_osd(Path("tests/samples/file-sample_150kB.png").read_bytes())


# generate a searchable PDF
await aiopytesseract.image_to_pdf("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_pdf(Path("tests/samples/file-sample_150kB.png").read_bytes())


# generate hOCR output
await aiopytesseract.image_to_hocr("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_hocr(Path("tests/samples/file-sample_150kB.png").read_bytes())


# multiple output formats in one run
async with aiopytesseract.run(
    Path('tests/samples/file-sample_150kB.png').read_bytes(),
    'output',
    'alto tsv txt'
) as resp:
    # generates output.xml, output.tsv and output.txt
    print(resp)
    alto_file, tsv_file, txt_file = resp
```
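
Since every call above is a coroutine, several images can be processed concurrently. The following is a minimal sketch, not part of aiopytesseract: the helper `ocr_many` and its `limit` parameter are hypothetical names, and the OCR coroutine is passed in as an argument so the helper depends only on the standard library. A semaphore caps how many OCR calls run at once.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def ocr_many(
    images: Iterable[str],
    ocr: Callable[[str], Awaitable[str]],
    limit: int = 4,
) -> List[str]:
    """Run `ocr` over every image, with at most `limit` calls in flight.

    Hypothetical helper for illustration; `ocr` is any coroutine function
    that takes an image path and returns the recognized text.
    """
    semaphore = asyncio.Semaphore(limit)

    async def worker(image: str) -> str:
        # Wait for a free slot before starting the OCR call.
        async with semaphore:
            return await ocr(image)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(worker(image) for image in images))
```

From a script, this could be called as `asyncio.run(ocr_many(paths, aiopytesseract.image_to_string))`, where `paths` is a list of image file paths. A suitable value for `limit` depends on the number of CPU cores available, since each call is expected to do CPU-bound work in Tesseract.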

For more details on Tesseract best practices and on aiopytesseract itself, see the `docs` folder.

## Examples

To try **aiopytesseract** quickly, you can use one of these options:

- docker/docker-compose
- [streamlit](https://streamlit.io)

### Docker / docker-compose

After cloning this repository, run the command below:

```bash
docker-compose up -d
```

### streamlit app

This option requires `aiopytesseract` and `streamlit` to be installed first. Then run:

```bash
# remote option:
streamlit run https://github.com/amenezes/aiopytesseract/blob/master/examples/streamlit/app.py
```

```bash
# local option:
streamlit run examples/streamlit/app.py
```

> Note: the streamlit example needs **Python >= 3.10**.

## Links

- License: [Apache License](https://choosealicense.com/licenses/apache-2.0/)
- Code: [https://github.com/amenezes/aiopytesseract](https://github.com/amenezes/aiopytesseract)
- Issue tracker: [https://github.com/amenezes/aiopytesseract/issues](https://github.com/amenezes/aiopytesseract/issues)
- Docs: [https://github.com/amenezes/aiopytesseract](https://github.com/amenezes/aiopytesseract)

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/amenezes/aiopytesseract",
    "name": "aiopytesseract",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "\"asyncio\",\"ocr\",\"tesseract\"",
    "author": "Alexandre Menezes",
    "author_email": "alexandre.fmenezes@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/27/bb/975580dc5546bb8f0f1dd9907b56771a295afcb07c50018b6991f477dec3/aiopytesseract-0.13.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "asyncio tesseract wrapper for Tesseract-OCR",
    "version": "0.13.0",
    "project_urls": {
        "Changes": "https://github.com/amenezes/aiopytesseract/releases",
        "Code": "https://github.com/amenezes/aiopytesseract",
        "Documentation": "https://github.com/amenezes/aiopytesseract",
        "Homepage": "https://github.com/amenezes/aiopytesseract",
        "Issue tracker": "https://github.com/amenezes/aiopytesseract/issues"
    },
    "split_keywords": [
        "\"asyncio\"",
        "\"ocr\"",
        "\"tesseract\""
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0467c8bed9438fb5c77a86c9d08f0c73ed810bd333dbca84283621ad9745d485",
                "md5": "3502915eef3c05fe5389b536c442601f",
                "sha256": "6b195a6ab492fd0898e61f5222a1b113b9cc5f0d3ed64c8b185fafb9a3b38e4a"
            },
            "downloads": -1,
            "filename": "aiopytesseract-0.13.0-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3502915eef3c05fe5389b536c442601f",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.8",
            "size": 23649,
            "upload_time": "2023-12-07T15:22:37",
            "upload_time_iso_8601": "2023-12-07T15:22:37.708849Z",
            "url": "https://files.pythonhosted.org/packages/04/67/c8bed9438fb5c77a86c9d08f0c73ed810bd333dbca84283621ad9745d485/aiopytesseract-0.13.0-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "27bb975580dc5546bb8f0f1dd9907b56771a295afcb07c50018b6991f477dec3",
                "md5": "60440227b11f4a2d7179a0c97d2c297c",
                "sha256": "255317273185668055b86d0e7a526a1d688d8bfeecee732f6a90ed0ca4267cad"
            },
            "downloads": -1,
            "filename": "aiopytesseract-0.13.0.tar.gz",
            "has_sig": false,
            "md5_digest": "60440227b11f4a2d7179a0c97d2c297c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 18663,
            "upload_time": "2023-12-07T15:22:39",
            "upload_time_iso_8601": "2023-12-07T15:22:39.786926Z",
            "url": "https://files.pythonhosted.org/packages/27/bb/975580dc5546bb8f0f1dd9907b56771a295afcb07c50018b6991f477dec3/aiopytesseract-0.13.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-07 15:22:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "amenezes",
    "github_project": "aiopytesseract",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "requirements": [
        {
            "name": "aiofiles",
            "specs": []
        },
        {
            "name": "cattrs",
            "specs": []
        }
    ],
    "lcname": "aiopytesseract"
}
        