# bOCR: OCR Framework with Vision LLMs
**bOCR** is an Optical Character Recognition (OCR) framework that uses Vision Large Language Models (VLLMs) for text extraction and document processing.
## Features
- **Minimal Setup**: Requires just a single backbone file (e.g., `qwen.py` or `ollamas.py`) for OCR execution, making it lightweight and easy to use.
- **Broad Vision LLM Support**: Integrates with vision LLMs such as `Qwen`, `Llama`, and `Phi`, as well as vision models served through `Ollama`.
- **Customizable Prompts**: Fine-tune OCR output using either a custom or default prompt.
- **Automated Preprocessing**: Image denoising, resizing, and PDF-to-image conversion.
- **Postprocessing & Export**: Supports merging pages and multiple export formats (`txt`, `md`, `docx`, `pdf`).
- **Configurable Pipeline**: A single `Config` object centralizes OCR settings.
- **Detailed Logging**: Integrated verbose logging for insights and debugging.
---
## Installation
### Install from PyPI (Recommended)
```bash
pip install bocr
```
### Install from Source (Development Version)
```bash
git clone https://github.com/adrianphoulady/bocr.git
cd bocr
pip install .
```
### Required Dependencies
For PDF and document processing, `poppler`, `pandoc`, and LaTeX are also required. You can install them as follows:
#### Linux (Debian/Ubuntu)
```bash
sudo apt install poppler-utils pandoc texlive-xetex texlive-fonts-recommended lmodern
```
#### macOS (using Homebrew)
```bash
brew install poppler pandoc
brew install --cask mactex-no-gui
```
#### Windows (using Chocolatey)
```powershell
choco install poppler pandoc miktex
```
---
## Quick Start
### Simple Example (Single File OCR)
Any backbone file in the `backbones` module, like `qwen.py`, is all you need to run OCR on an image:
```python
from bocr.backbones.qwen import extract_text
result = extract_text("sample1.png")
print(result)
```
---
### Advanced Usage
```python
from bocr import Config, ocr
config = Config(model_id="Qwen/Qwen2-VL-7B-Instruct", export_results=True, export_format="pdf", verbose=True)
files = ["sample2.pdf"]
results = ocr(files, config)
print(results)
```
### Command Line Example
```bash
bocr sample1.jpg --export-results --export-format docx --verbose
```
---
## Configuration
The `Config` class centralizes OCR settings. Key parameters:
| Parameter | Type | Description | Default |
|------------------|--------------|--------------------------------------------------------|-------------------------------|
| `prompt` | `str`/`None` | Custom OCR prompt or `None` for default. | `None` |
| `model_id` | `str` | Vision LLM model identifier. | `Qwen/Qwen2.5-VL-3B-Instruct` |
| `max_new_tokens` | `int` | Max tokens generated by model. | `1024` |
| `preprocess` | `bool` | Enable preprocessing of input files. | `False` |
| `resolution` | `int` | DPI for PDF-to-image conversion. | `150` |
| `max_image_size` | `int`/`None` | Resize images to a max size. No resizing if `None`. | `1920` |
| `result_format` | `str` | Result format (`plain`, `md`). | `md` |
| `merge_text` | `bool` | Merge extracted text. | `False` |
| `export_results` | `bool` | Save results to files. | `False` |
| `export_format` | `str` | File output format (`txt`, `md`, `docx`, `pdf`). | `md` |
| `export_dir` | `str`/`None` | Directory for output files. `./ocr_exports` if `None`. | `None` |
| `verbose` | `bool` | Enables detailed logging for debugging. | `False` |
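
A fuller configuration might look like the following. The parameter names come from the table above; the values are purely illustrative.

```python
from bocr import Config

# Example configuration; file paths and values are placeholders.
config = Config(
    model_id="Qwen/Qwen2.5-VL-3B-Instruct",
    preprocess=True,        # enable denoising, resizing, PDF-to-image conversion
    resolution=200,         # DPI used when rasterizing PDF pages
    max_image_size=1920,    # resize images so the longest side fits this limit
    merge_text=True,        # merge per-page results into one text
    export_results=True,
    export_format="pdf",
    export_dir="ocr_out",   # created if it does not exist
    verbose=True,
)
```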
---
## OCR Pipeline
### 1. Preprocessing
- **URL Handling**: Downloads remote files if input is a URL.
- **PDF Conversion**: Converts PDFs into image format (requires `poppler` installed and in `PATH`).
- **Image Enhancement**: Applies denoising and contrast adjustment.
- **Resizing**: Optimizes images for Vision LLMs.
### 2. Text Extraction
- Extracts text using Vision LLMs, with support for custom prompts to tailor the OCR instructions.
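
For example, a custom prompt can steer the model toward structure-preserving output. The prompt text and file name below are illustrative.

```python
from bocr import Config, ocr

# Illustrative prompt: ask the model to keep reading order and table layout.
table_prompt = (
    "Extract all text from this image, preserving reading order. "
    "Render any tables as Markdown tables."
)

config = Config(prompt=table_prompt, result_format="md", verbose=True)
results = ocr(["invoice.png"], config)  # "invoice.png" is a placeholder path
print(results)
```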
### 3. Postprocessing
- Formats the extracted text in the specified result format and optionally merges pages.
- Converts it to the specified export format (e.g., Markdown, PDF).
- Saves the results to disk if configured.
---
## Logging
Enable logging by setting `verbose=True` in the `Config` object. Logs provide insights into preprocessing, extraction, and postprocessing steps.
---
## Supported Models
bOCR supports Vision LLMs such as:
- `Qwen/Qwen2.5-VL-3B-Instruct`
- `Qwen/Qwen2.5-VL-7B-Instruct`
- `Qwen/Qwen2.5-VL-72B-Instruct`
- `Qwen/Qwen2-VL-2B-Instruct`
- `Qwen/Qwen2-VL-7B-Instruct`
- `Qwen/Qwen2-VL-72B-Instruct`
- `Qwen/QVQ-72B-Preview`
- `meta-llama/Llama-3.2-11B-Vision-Instruct`
- `meta-llama/Llama-3.2-90B-Vision-Instruct`
- `microsoft/Phi-3.5-vision-instruct`
- `llama3.2-vision:11b` from Ollama
- `llama3.2-vision:90b` from Ollama
Additional models can be supported by implementing a new backbone in `bocr/backbones/` and updating `mappings.yaml`.
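
A new backbone might be sketched as follows. The function name `extract_text` mirrors the existing backbones such as `qwen.py`; the module path, signature details, and stub body are illustrative assumptions, not bOCR's actual API.

```python
# Hypothetical sketch of a new backbone module, e.g. bocr/backbones/mymodel.py.
from typing import Optional


def extract_text(image_path: str, prompt: Optional[str] = None) -> str:
    """Run OCR on a single image and return the extracted text."""
    prompt = prompt or "Extract all text from this image."
    # A real backbone would load its vision model and run inference here;
    # this stub only shows the expected call shape.
    return f"(stub) would OCR {image_path!r} with prompt {prompt!r}"
```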
---
## License
This project is licensed under the MIT License.