# Apple Vision Framework Python Utilities
Fast and accurate OCR on images and PDFs using Apple's Vision framework (`pyobjc-framework-Vision`), directly from the command line.
- [Apple Vision Framework Python Utilities](#apple-vision-framework-python-utilities)
  - [Features](#features)
  - [Demo](#demo)
  - [Installation](#installation)
    - [pipx](#pipx)
    - [pip](#pip)
    - [`uv tool` installation doesn't work](#uv-tool-installation-doesnt-work)
  - [Usage](#usage)
    - [Command Line](#command-line)
    - [As a Library](#as-a-library)
  - [Develop](#develop)
  - [Test](#test)
## Features
- Fast and accurate, multi-language support (`-l`, `--lang`), powered by Apple's industry-strength Vision framework (`pyobjc-framework-Vision`).
- Supports all common input image formats: PNG, JPEG, TIFF and WebP.
- Supports PDF input (the file is converted to images first). This tool does NOT assume a file is a PDF just because it has a `.pdf` extension; you must pass the `-p`, `--pdf` flag.
- Outputs only the extracted text by default; with the `-j`, `--json` flag, it outputs JSON that includes the recognition confidence for each line.
- Supports text clipping based on start and end markers (`-s`, `-S`, `-e`, `-E`).
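To make the inclusive/exclusive marker semantics concrete, here is a minimal standalone sketch of marker-based clipping over a list of recognized lines. It is not the package's implementation: the function name `clip_lines`, the substring matching, and the first-match behaviour are all assumptions made for illustration.

```python
from typing import Optional


def clip_lines(
    lines: list[str],
    start_inclusive: Optional[str] = None,
    start_exclusive: Optional[str] = None,
    end_inclusive: Optional[str] = None,
    end_exclusive: Optional[str] = None,
) -> list[str]:
    """Keep only the lines between a start marker and an end marker.

    Hypothetical helper: markers are matched as substrings, and the first
    matching line wins. If a marker is never found, that side is not clipped.
    """
    # Find where the kept region begins.
    start = 0
    for i, line in enumerate(lines):
        if start_inclusive is not None and start_inclusive in line:
            start = i  # the marker line itself is kept
            break
        if start_exclusive is not None and start_exclusive in line:
            start = i + 1  # the marker line is dropped
            break

    # Find where the kept region ends, searching from the start position.
    end = len(lines)
    for i in range(start, len(lines)):
        if end_inclusive is not None and end_inclusive in lines[i]:
            end = i + 1  # the marker line itself is kept
            break
        if end_exclusive is not None and end_exclusive in lines[i]:
            end = i  # the marker line is dropped
            break

    return lines[start:end]
```

With this sketch, an inclusive start marker corresponds to `-s`, an exclusive one to `-S`, and likewise `-e` and `-E` for the end marker.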
## Demo
Below is the output of running the [tests](#test):
https://g.teddysc.me/96d5b1217b90035c163b3c97ce99112f
## Installation
Requires Python >= 3.11, <4.0.
Since this package uses Apple's Vision framework, it only works on macOS.
To OCR PDFs with `-p`, install the required `poppler` dependency with `brew install poppler` ([detailed guide](https://github.com/Belval/pdf2image)).
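The requirements above can be checked up front with a short script. This is only a sketch: `preflight_problems` is a hypothetical helper, not part of the package, and it assumes that PDF conversion relies on poppler's `pdftoppm` and `pdfinfo` executables being on `PATH`.

```python
import shutil
import sys


def preflight_problems(need_pdf: bool = False) -> list[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems: list[str] = []

    # Apple's Vision framework is only available on macOS.
    if sys.platform != "darwin":
        problems.append("macOS is required")

    # The package declares Python >= 3.11, < 4.0.
    if sys.version_info < (3, 11):
        problems.append("Python 3.11 or newer is required")

    # Assumption: PDF support needs these poppler command-line tools.
    if need_pdf:
        for tool in ("pdftoppm", "pdfinfo"):
            if shutil.which(tool) is None:
                problems.append(f"{tool} not found on PATH; try `brew install poppler`")

    return problems


if __name__ == "__main__":
    for problem in preflight_problems(need_pdf=True):
        print(problem)
```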
### pipx
This is the recommended installation method.
```
$ pipx install apple-vision-utils
```
### [pip](https://pypi.org/project/apple-vision-utils/)
```
$ pip install apple-vision-utils
```
### `uv tool` installation doesn't work
I tried installing this with `uv tool install` under several Python versions on an Apple Silicon Mac, and it didn't work. This may be caused by peculiarities of the Objective-C interfacing libraries. Just use `pipx` for now.
## Usage
### Command Line
```
$ apple-ocr --help
usage: apple-ocr [-h] [-j] [-p] [-l LANG] [--pdf2image-only] [--pdf2image-dir PDF2IMAGE_DIR] [-s START_MARKER_INCLUSIVE] [-S START_MARKER_EXCLUSIVE] [-e END_MARKER_INCLUSIVE] [-E END_MARKER] [-V] file_path

Extract text from an image or PDF using Apple's Vision framework.

positional arguments:
  file_path             Path to the image or PDF file.

options:
  -h, --help            show this help message and exit
  -j, --json            Output results in JSON format.
  -p, --pdf             Specify if the input file is a PDF.
  -l LANG, --lang LANG  Specify the language for text recognition (e.g., eng,
                        fra, deu, zh-Hans for Simplified Chinese, zh-Hant for
                        Traditional Chinese). Default is 'zh-Hant', which
                        works with images containing both Chinese characters
                        and latin letters.
  --pdf2image-only      Only convert PDF to images without performing OCR.
  --pdf2image-dir PDF2IMAGE_DIR
                        Specify the directory to store output images. By
                        default, a secure temporary directory is created.
  -s START_MARKER_INCLUSIVE, --start-marker-inclusive START_MARKER_INCLUSIVE
                        Specify the start marker (included, as the first line of the extracted text) for text extraction in PDF.
  -S START_MARKER_EXCLUSIVE, --start-marker-exclusive START_MARKER_EXCLUSIVE
                        Specify the start marker (excluded, as the first line of the extracted text) for text extraction in PDF.
  -e END_MARKER_INCLUSIVE, --end-marker-inclusive END_MARKER_INCLUSIVE
                        Specify the end marker (included, as the last line of the extracted text) for text extraction in PDF.
  -E END_MARKER, --end-marker END_MARKER
                        Specify the end marker (excluded, as the last line of the extracted text) for text extraction in PDF.
  -V, --version         show program's version number and exit
```
### As a Library
You can also use the utility functions in your own Python code:
```python
from apple_vision_utils.utils import (
    clip_results,
    image_to_text,
    pdf_to_images,
    process_pdf,
)

# Extract text from an image
results = image_to_text("path/to/image.png", lang="eng")

# Convert PDF to images
images = pdf_to_images("path/to/document.pdf")

# Process PDF for text recognition
pdf_results = process_pdf("path/to/document.pdf", lang="eng")

# Clip text results based on markers
clipped_results = clip_results(results, start_marker_inclusive="Start", end_marker_exclusive="End")
```
## Develop
```
$ git clone https://github.com/tddschn/apple-vision-utils.git
$ cd apple-vision-utils
$ poetry install
```
## Test
```
# in the root of the project
poetry install
poetry shell
cd tests && ./test.sh
```
Raw data
{
"_id": null,
"home_page": "https://github.com/tddschn/apple-vision-utils",
"name": "apple-vision-utils",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.11",
"maintainer_email": null,
"keywords": "ocr, apple-vision-framework",
"author": "Teddy Xinyuan Chen",
"author_email": "45612704+tddschn@users.noreply.github.com",
"download_url": "https://files.pythonhosted.org/packages/40/86/1947d4267acdbe80883e425a0956d2c2ff8ea9554411b0c0e72dd2565e0e/apple_vision_utils-1.0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Fast and accurate OCR on images and PDFs using Apple Vision framework directly from command line.",
"version": "1.0.2",
"project_urls": {
"Bug Tracker": "https://github.com/tddschn/apple-vision-utils/issues",
"Homepage": "https://github.com/tddschn/apple-vision-utils",
"Repository": "https://github.com/tddschn/apple-vision-utils"
},
"split_keywords": [
"ocr",
" apple-vision-framework"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "c6b2ab79ad54ad6882d421825c904b3d6818de2cc690b0f693c3620d30d77247",
"md5": "a285de5e880376c5c8d6d59c9cb91f4c",
"sha256": "eb8aee26f820b9f053824c52a7b2f666f6a3452d14b46bdade40a11257b152e5"
},
"downloads": -1,
"filename": "apple_vision_utils-1.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "a285de5e880376c5c8d6d59c9cb91f4c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.11",
"size": 7561,
"upload_time": "2025-01-31T01:53:35",
"upload_time_iso_8601": "2025-01-31T01:53:35.391267Z",
"url": "https://files.pythonhosted.org/packages/c6/b2/ab79ad54ad6882d421825c904b3d6818de2cc690b0f693c3620d30d77247/apple_vision_utils-1.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "40861947d4267acdbe80883e425a0956d2c2ff8ea9554411b0c0e72dd2565e0e",
"md5": "9237274e63d37984ddc7798c3f75db0e",
"sha256": "574d560a5d0b1885ec09e02c31d8c26678ce58ecd6d4fb767dbb81854bea60d0"
},
"downloads": -1,
"filename": "apple_vision_utils-1.0.2.tar.gz",
"has_sig": false,
"md5_digest": "9237274e63d37984ddc7798c3f75db0e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.11",
"size": 5920,
"upload_time": "2025-01-31T01:53:37",
"upload_time_iso_8601": "2025-01-31T01:53:37.493908Z",
"url": "https://files.pythonhosted.org/packages/40/86/1947d4267acdbe80883e425a0956d2c2ff8ea9554411b0c0e72dd2565e0e/apple_vision_utils-1.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-31 01:53:37",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "tddschn",
"github_project": "apple-vision-utils",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "apple-vision-utils"
}