apple-vision-utils

Name	apple-vision-utils
Version	1.0.2
home_page	https://github.com/tddschn/apple-vision-utils
Summary	Fast and accurate OCR on images and PDFs using Apple Vision framework directly from command line.
upload_time	2025-01-31 01:53:37
maintainer	None
docs_url	None
author	Teddy Xinyuan Chen
requires_python	<4.0,>=3.11
license	MIT
keywords	ocr apple-vision-framework

# Apple Vision Framework Python Utilities

Fast and accurate OCR on images and PDFs using Apple's Vision framework (via `pyobjc-framework-Vision`), directly from the command line.

- [Apple Vision Framework Python Utilities](#apple-vision-framework-python-utilities)
  - [Features](#features)
  - [Demo](#demo)
  - [Installation](#installation)
    - [pipx](#pipx)
    - [pip](#pip)
    - [`uv tool` installation doesn't work](#uv-tool-installation-doesnt-work)
  - [Usage](#usage)
    - [Command Line](#command-line)
    - [As a Library](#as-a-library)
  - [Develop](#develop)
  - [Test](#test)

## Features

- Fast and accurate OCR with multi-language support (`-l`, `--lang`), powered by Apple's industrial-strength Vision framework (accessed through `pyobjc-framework-Vision`).
- Supports common input image formats: PNG, JPEG, TIFF, and WebP.
- Supports PDF input (the file is converted to images first). The tool does NOT assume a file is a PDF just because it has a `.pdf` extension; you need to pass the `-p` / `--pdf` flag.
- Outputs only the extracted text by default; with the `-j` / `--json` flag, it outputs JSON that includes the recognition confidence for each line.
- Supports text clipping based on start and end markers (`-s`, `-S`, `-e`, `-E`).
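
As a rough illustration of the flags listed above, the following sketch builds an `apple-ocr` command line from Python. The helper names `looks_like_pdf`, `build_ocr_command`, and `run_ocr` are hypothetical and not part of this package; only the `-p`, `-j`, and `-l` flags come from the tool itself. Because the tool does not infer PDF input from the file extension, the sketch uses a simple heuristic (the leading `%PDF-` bytes) to decide whether to pass `-p`.

```python
import subprocess
from pathlib import Path


def looks_like_pdf(path) -> bool:
    """Heuristic: PDF files normally begin with the bytes '%PDF-'."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"


def build_ocr_command(path, *, pdf=False, json_output=False, lang=None) -> list[str]:
    """Build the argv list for apple-ocr using its documented flags."""
    cmd = ["apple-ocr"]
    if pdf:
        cmd.append("-p")  # input must be declared as PDF explicitly
    if json_output:
        cmd.append("-j")  # JSON output with per-line confidence
    if lang is not None:
        cmd += ["-l", lang]  # recognition language, e.g. "eng"
    cmd.append(str(path))
    return cmd


def run_ocr(path, *, json_output=False, lang=None) -> str:
    """Run apple-ocr and return its stdout (needs macOS and the tool installed)."""
    cmd = build_ocr_command(
        path, pdf=looks_like_pdf(path), json_output=json_output, lang=lang
    )
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Keeping command construction separate from execution makes the flag logic easy to check on any platform, while `run_ocr` itself only works where `apple-ocr` is installed.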

## Demo

Below is the output of running the [tests](#test):

https://g.teddysc.me/96d5b1217b90035c163b3c97ce99112f

## Installation

Requires Python >= 3.11, <4.0.

Since this package uses Apple's Vision framework, it only works on macOS.

To OCR PDFs with `-p`, you also need the `poppler` dependency, which you can install with `brew install poppler` ([detailed guide](https://github.com/Belval/pdf2image)).
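
The requirements above can be checked before installing. The sketch below is a hypothetical preflight helper (`check_environment` is not part of this package). It assumes that `poppler` is detectable by looking for its `pdftoppm` executable on `PATH`, which is one of the poppler utilities that `pdf2image` relies on.

```python
import shutil
import sys


def check_environment(platform=None, version=None, which=shutil.which) -> list[str]:
    """Return a list of unmet requirements (empty if everything looks fine).

    The parameters default to the running interpreter's values; they exist
    only so the check can be exercised with other values.
    """
    platform = sys.platform if platform is None else platform
    version = sys.version_info[:2] if version is None else version

    problems = []
    if platform != "darwin":
        problems.append("macOS is required (Apple's Vision framework).")
    if not ((3, 11) <= version < (4, 0)):
        problems.append("Python >=3.11,<4.0 is required.")
    if which("pdftoppm") is None:
        # Only matters for PDF input (-p); image OCR does not need poppler.
        problems.append("poppler not found on PATH: brew install poppler")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```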

### pipx

This is the recommended installation method.

```
$ pipx install apple-vision-utils
```

### [pip](https://pypi.org/project/apple-vision-utils/)

```
$ pip install apple-vision-utils
```

### `uv tool` installation doesn't work

I tried installing this with `uv tool install` under several Python versions on an Apple Silicon Mac, and it didn't work. This may be caused by peculiarities of the Objective-C bridging libraries. Use `pipx` for now.

## Usage

### Command Line

```
$ apple-ocr --help

usage: apple-ocr [-h] [-j] [-p] [-l LANG] [--pdf2image-only] [--pdf2image-dir PDF2IMAGE_DIR] [-s START_MARKER_INCLUSIVE] [-S START_MARKER_EXCLUSIVE] [-e END_MARKER_INCLUSIVE] [-E END_MARKER] [-V] file_path

Extract text from an image or PDF using Apple's Vision framework.

positional arguments:
  file_path             Path to the image or PDF file.

options:
  -h, --help            show this help message and exit
  -j, --json            Output results in JSON format.
  -p, --pdf             Specify if the input file is a PDF.
  -l LANG, --lang LANG  Specify the language for text recognition (e.g., eng,
                        fra, deu, zh-Hans for Simplified Chinese, zh-Hant for
                        Traditional Chinese). Default is 'zh-Hant', which
                        works with images containing both Chinese characters
                        and latin letters.
  --pdf2image-only      Only convert PDF to images without performing OCR.
  --pdf2image-dir PDF2IMAGE_DIR
                        Specify the directory to store output images. By
                        default, a secure temporary directory is created.
  -s START_MARKER_INCLUSIVE, --start-marker-inclusive START_MARKER_INCLUSIVE
                        Specify the start marker (included, as the first line of the extracted text) for text extraction in PDF.
  -S START_MARKER_EXCLUSIVE, --start-marker-exclusive START_MARKER_EXCLUSIVE
                        Specify the start marker (excluded, as the first line of the extracted text) for text extraction in PDF.
  -e END_MARKER_INCLUSIVE, --end-marker-inclusive END_MARKER_INCLUSIVE
                        Specify the end marker (included, as the last line of the extracted text) for text extraction in PDF.
  -E END_MARKER, --end-marker END_MARKER
                        Specify the end marker (excluded, as the last line of the extracted text) for text extraction in PDF.
  -V, --version         show program's version number and exit
```
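
To make the inclusive and exclusive marker options in the help text more concrete, here is a minimal sketch of marker-based clipping over a list of text lines. It is not the package's implementation: `clip_lines` is a hypothetical function, and it assumes that a marker matches any line containing it as a substring and that a missing marker leaves that side unclipped. In this sketch, `include_start=True` corresponds to `-s`, `include_start=False` to `-S`, `include_end=True` to `-e`, and `include_end=False` to `-E`.

```python
def clip_lines(lines, start=None, end=None, *, include_start=True, include_end=False):
    """Return the lines between the first start-marker line and the next end-marker line."""
    begin = 0
    if start is not None:
        for i, line in enumerate(lines):
            if start in line:
                # Keep or skip the marker line itself.
                begin = i if include_start else i + 1
                break

    stop = len(lines)
    if end is not None:
        # Search for the end marker only at or after the start position.
        for j in range(begin, len(lines)):
            if end in lines[j]:
                stop = j + 1 if include_end else j
                break

    return lines[begin:stop]


lines = ["Header", "Start here", "body 1", "body 2", "End of section", "Footer"]
print(clip_lines(lines, "Start", "End"))
# ['Start here', 'body 1', 'body 2']
print(clip_lines(lines, "Start", "End", include_start=False, include_end=True))
# ['body 1', 'body 2', 'End of section']
```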

### As a Library

You can also use the utility functions in your own Python code:

```python
from apple_vision_utils.utils import image_to_text, pdf_to_images, process_pdf, clip_results

# Extract text from an image
results = image_to_text("path/to/image.png", lang="eng")

# Convert PDF to images
images = pdf_to_images("path/to/document.pdf")

# Process PDF for text recognition
pdf_results = process_pdf("path/to/document.pdf", lang="eng")

# Clip text results based on markers
clipped_results = clip_results(results, start_marker_inclusive="Start", end_marker_exclusive="End")
```

## Develop

```
$ git clone https://github.com/tddschn/apple-vision-utils.git
$ cd apple-vision-utils
$ poetry install
```

## Test

```
# in the root of the project
poetry install
poetry shell
cd tests && ./test.sh
```

Raw data

{
    "_id": null,
    "home_page": "https://github.com/tddschn/apple-vision-utils",
    "name": "apple-vision-utils",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.11",
    "maintainer_email": null,
    "keywords": "ocr, apple-vision-framework",
    "author": "Teddy Xinyuan Chen",
    "author_email": "45612704+tddschn@users.noreply.github.com",
    "download_url": "https://files.pythonhosted.org/packages/40/86/1947d4267acdbe80883e425a0956d2c2ff8ea9554411b0c0e72dd2565e0e/apple_vision_utils-1.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Fast and accurate OCR on images and PDFs using Apple Vision framework directly from command line.",
    "version": "1.0.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/tddschn/apple-vision-utils/issues",
        "Homepage": "https://github.com/tddschn/apple-vision-utils",
        "Repository": "https://github.com/tddschn/apple-vision-utils"
    },
    "split_keywords": [
        "ocr",
        " apple-vision-framework"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c6b2ab79ad54ad6882d421825c904b3d6818de2cc690b0f693c3620d30d77247",
                "md5": "a285de5e880376c5c8d6d59c9cb91f4c",
                "sha256": "eb8aee26f820b9f053824c52a7b2f666f6a3452d14b46bdade40a11257b152e5"
            },
            "downloads": -1,
            "filename": "apple_vision_utils-1.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a285de5e880376c5c8d6d59c9cb91f4c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.11",
            "size": 7561,
            "upload_time": "2025-01-31T01:53:35",
            "upload_time_iso_8601": "2025-01-31T01:53:35.391267Z",
            "url": "https://files.pythonhosted.org/packages/c6/b2/ab79ad54ad6882d421825c904b3d6818de2cc690b0f693c3620d30d77247/apple_vision_utils-1.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "40861947d4267acdbe80883e425a0956d2c2ff8ea9554411b0c0e72dd2565e0e",
                "md5": "9237274e63d37984ddc7798c3f75db0e",
                "sha256": "574d560a5d0b1885ec09e02c31d8c26678ce58ecd6d4fb767dbb81854bea60d0"
            },
            "downloads": -1,
            "filename": "apple_vision_utils-1.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "9237274e63d37984ddc7798c3f75db0e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.11",
            "size": 5920,
            "upload_time": "2025-01-31T01:53:37",
            "upload_time_iso_8601": "2025-01-31T01:53:37.493908Z",
            "url": "https://files.pythonhosted.org/packages/40/86/1947d4267acdbe80883e425a0956d2c2ff8ea9554411b0c0e72dd2565e0e/apple_vision_utils-1.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-31 01:53:37",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "tddschn",
    "github_project": "apple-vision-utils",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "apple-vision-utils"
}
