# bOCR: OCR Framework with Vision LLMs
**bOCR** is an Optical Character Recognition (OCR) framework that uses Vision Large Language Models (VLLMs) for text extraction and document processing.
## Features
- **Minimal Setup**: Requires just a single backbone file (e.g., `qwen.py` or `ollamas.py`) for OCR execution, making it lightweight and easy to use.
- **Broad Vision LLM Support**: Integrates with vision LLMs such as `Qwen`, `Llama`, and `Phi`, as well as vision models served through `Ollama`.
- **Customizable Prompts**: Fine-tune OCR output using either a custom or default prompt.
- **Automated Preprocessing**: Image denoising, resizing, and PDF-to-image conversion.
- **Postprocessing & Export**: Supports merging pages and multiple export formats (`txt`, `md`, `docx`, `pdf`).
- **Configurable Pipeline**: A single `Config` object centralizes OCR settings.
- **Detailed Logging**: Integrated verbose logging for insights and debugging.
---
## Installation
### Install from PyPI (Recommended)
```bash
pip install bocr
```
### Install from Source (Development Version)
```bash
git clone https://github.com/adrianphoulady/bocr.git
cd bocr
pip install .
```
### Required Dependencies
For PDF and document processing, `poppler`, `pandoc`, and LaTeX are also required. You can install them as follows:
#### Linux (Debian/Ubuntu)
```bash
sudo apt install poppler-utils pandoc texlive-xetex texlive-fonts-recommended lmodern
```
#### macOS (using Homebrew)
```bash
brew install poppler pandoc
brew install --cask mactex-no-gui
```
#### Windows (using Chocolatey)
```powershell
choco install poppler pandoc miktex
```
---
## Quick Start
### Simple Example (Single File OCR)
Any backbone file in the `backbones` module, like `qwen.py`, is all you need to run OCR on an image:
```python
from bocr.backbones.qwen import extract_text
result = extract_text("sample1.png")
print(result)
```
---
### Advanced Usage
```python
from bocr import Config, ocr
config = Config(model_id="Qwen/Qwen2-VL-7B-Instruct", export_results=True, export_format="pdf", verbose=True)
files = ["sample2.pdf"]
results = ocr(files, config)
print(results)
```
### Command Line Example
```bash
bocr sample1.jpg --export-results --export-format docx --verbose
```
---
## Configuration
The `Config` class centralizes OCR settings. Key parameters:
| Parameter | Type | Description | Default |
|------------------|--------------|--------------------------------------------------------|-------------------------------|
| `prompt` | `str`/`None` | Custom OCR prompt or `None` for default. | `None` |
| `model_id` | `str` | Vision LLM model identifier. | `Qwen/Qwen2.5-VL-3B-Instruct` |
| `max_new_tokens` | `int` | Max tokens generated by model. | `1024` |
| `preprocess` | `bool` | Enable preprocessing of input files. | `False` |
| `resolution` | `int` | DPI for PDF-to-image conversion. | `150` |
| `max_image_size` | `int`/`None` | Resize images to a max size. No resizing if `None`. | `1920` |
| `result_format` | `str` | Result format (`plain`, `md`). | `md` |
| `merge_text` | `bool` | Merge extracted text. | `False` |
| `export_results` | `bool` | Save results to files. | `False` |
| `export_format` | `str` | File output format (`txt`, `md`, `docx`, `pdf`). | `md` |
| `export_dir` | `str`/`None` | Directory for output files. `./ocr_exports` if `None`. | `None` |
| `verbose` | `bool` | Enables detailed logging for debugging. | `False` |
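
A fuller configuration might look like the following. The parameter names come from the table above; the values are purely illustrative.

```python
from bocr import Config

# Example configuration; file paths and values are placeholders.
config = Config(
    model_id="Qwen/Qwen2.5-VL-3B-Instruct",
    preprocess=True,        # enable denoising, resizing, PDF-to-image conversion
    resolution=200,         # DPI used when rasterizing PDF pages
    max_image_size=1920,    # resize images so the longest side fits this limit
    merge_text=True,        # merge per-page results into one text
    export_results=True,
    export_format="pdf",
    export_dir="ocr_out",   # created if it does not exist
    verbose=True,
)
```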
---
## OCR Pipeline
### 1. Preprocessing
- **URL Handling**: Downloads remote files if input is a URL.
- **PDF Conversion**: Converts PDFs into image format (requires `poppler` installed and in `PATH`).
- **Image Enhancement**: Applies denoising and contrast adjustment.
- **Resizing**: Optimizes images for Vision LLMs.
### 2. Text Extraction
- Extracts text using Vision LLMs, with support for custom prompts to tailor the OCR instructions.
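
For example, a custom prompt can steer the model toward structure-preserving output. The prompt text and file name below are illustrative.

```python
from bocr import Config, ocr

# Illustrative prompt: ask the model to keep reading order and table layout.
table_prompt = (
    "Extract all text from this image, preserving reading order. "
    "Render any tables as Markdown tables."
)

config = Config(prompt=table_prompt, result_format="md", verbose=True)
results = ocr(["invoice.png"], config)  # "invoice.png" is a placeholder path
print(results)
```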
### 3. Postprocessing
- Formats the extracted text in the specified result format and optionally merges pages.
- Converts it to the specified export format (e.g., Markdown, PDF).
- Saves the results to disk if configured.
---
## Logging
Enable logging by setting `verbose=True` in the `Config` object. Logs provide insights into preprocessing, extraction, and postprocessing steps.
---
## Supported Models
bOCR supports Vision LLMs such as:
- `Qwen/Qwen2.5-VL-3B-Instruct`
- `Qwen/Qwen2.5-VL-7B-Instruct`
- `Qwen/Qwen2.5-VL-72B-Instruct`
- `Qwen/Qwen2-VL-2B-Instruct`
- `Qwen/Qwen2-VL-7B-Instruct`
- `Qwen/Qwen2-VL-72B-Instruct`
- `Qwen/QVQ-72B-Preview`
- `meta-llama/Llama-3.2-11B-Vision-Instruct`
- `meta-llama/Llama-3.2-90B-Vision-Instruct`
- `microsoft/Phi-3.5-vision-instruct`
- `llama3.2-vision:11b` from Ollama
- `llama3.2-vision:90b` from Ollama
Additional models can be supported by implementing a new backbone in `bocr/backbones/` and updating `mappings.yaml`.
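
A new backbone might be sketched as follows. The function name `extract_text` mirrors the existing backbones such as `qwen.py`; the module path, signature details, and stub body are illustrative assumptions, not bOCR's actual API.

```python
# Hypothetical sketch of a new backbone module, e.g. bocr/backbones/mymodel.py.
from typing import Optional


def extract_text(image_path: str, prompt: Optional[str] = None) -> str:
    """Run OCR on a single image and return the extracted text."""
    prompt = prompt or "Extract all text from this image."
    # A real backbone would load its vision model and run inference here;
    # this stub only shows the expected call shape.
    return f"(stub) would OCR {image_path!r} with prompt {prompt!r}"
```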
---
## License
This project is licensed under the MIT License.