vllmocr

Name: vllmocr
Version: 0.3.8
Summary: OCR project using LLMs
Upload time: 2025-03-13 10:43:39
Keywords: ocr, llm, image processing, vllm
Requirements: No requirements were recorded.
# vllmocr

[![PyPI version](https://badge.fury.io/py/vllmocr.svg)](https://badge.fury.io/py/vllmocr)

`vllmocr` is a command-line tool that performs Optical Character Recognition (OCR) on images and PDFs using Large Language Models (LLMs). It supports multiple LLM providers, including OpenAI, Anthropic, Google, and local models via Ollama.

## Features

*   **Image and PDF OCR:** Extracts text from both images (PNG, JPG, JPEG) and PDF files.
*   **Multiple LLM Providers:**  Supports a variety of LLMs:
    *   **OpenAI:**  GPT-4o
    *   **Anthropic:** Claude 3 Haiku, Claude 3 Sonnet
    *   **Google:** Gemini 1.5 Pro
    *   **Ollama:**  (Local models) Llama3, MiniCPM, and other models supported by Ollama.
*   **Configurable:**  Settings, including the LLM provider and model, can be adjusted via a configuration file or environment variables.
*   **Image Preprocessing:** Includes optional image rotation for improved OCR accuracy.

## Installation

It is recommended to install `vllmocr` using `uv`:

```bash
uv pip install vllmocr
```

If you don't have `uv` installed, you can install it with:
```bash
pipx install uv
```
You may need to restart your shell session for `uv` to be available.

Alternatively, you can use `pip`:

```bash
pip install vllmocr
```

## Usage

The `vllmocr` command-line tool has two main subcommands: `image` and `pdf`.

**1. Process a Single Image:**

```bash
vllmocr image <image_path> [options]
```

*   `<image_path>`:  The path to the image file (PNG, JPG, JPEG).

**Options:**

*   `--provider`: The LLM provider to use (`openai`, `anthropic`, `google`, `ollama`). Defaults to `openai`.
*   `--model`: The specific model to use (e.g., `gpt-4o`, `haiku`, `gemini-1.5-pro-002`, `llama3`). Defaults to the provider's default model.
*   `--api-key`: The API key for the LLM provider. Overrides API keys from the config file or environment variables.
*   `--config`: Path to a TOML configuration file.
*   `--help`: Show the help message and exit.

**Example:**

```bash
vllmocr image my_image.jpg --provider anthropic --model haiku
```

**2. Process a PDF:**

```bash
vllmocr pdf <pdf_path> [options]
```

*   `<pdf_path>`: The path to the PDF file.

**Options:** (Same as `image` subcommand, including `--api-key`)

**Example:**

```bash
vllmocr pdf my_document.pdf --provider openai --model gpt-4o
```

## Configuration

`vllmocr` can be configured using a TOML file or environment variables.  The configuration file is searched for in the following locations (in order of precedence):

1.  A path specified with the `--config` command-line option.
2.  `./config.toml` (current working directory)
3.  `~/.config/vllmocr/config.toml` (user's home directory)
4.  `/etc/vllmocr/config.toml` (system-wide)
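The lookup order above amounts to "first existing file wins." A minimal shell sketch of that behavior (the `find_config` helper is illustrative only, not part of `vllmocr`):

```shell
# Illustrative only: mimic the "first existing config file wins" lookup.
find_config() {
    for cfg in "$@"; do
        if [ -f "$cfg" ]; then
            echo "$cfg"   # first path that exists
            return 0
        fi
    done
    return 1              # no config found anywhere
}

# Deterministic demo using temporary files instead of the real search paths.
tmpdir=$(mktemp -d)
touch "$tmpdir/config.toml"
find_config "$tmpdir/missing.toml" "$tmpdir/config.toml" /etc/vllmocr/config.toml
rm -r "$tmpdir"
```

The demo prints `$tmpdir/config.toml`, since the first candidate does not exist and the second one does.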

**config.toml (Example):**

```toml
[llm]
provider = "anthropic"  # Default provider
model = "haiku"        # Default model for the provider

[image_processing]
rotation = 0           # Image rotation in degrees (optional)

[api_keys]
openai = "YOUR_OPENAI_API_KEY"
anthropic = "YOUR_ANTHROPIC_API_KEY"
google = "YOUR_GOOGLE_API_KEY"
# Ollama doesn't require an API key
```

**Environment Variables:**

You can also set API keys using environment variables:

*   `VLLM_OCR_OPENAI_API_KEY`
*   `VLLM_OCR_ANTHROPIC_API_KEY`
*   `VLLM_OCR_GOOGLE_API_KEY`

Environment variables override settings in the configuration file, and they are the recommended way to set API keys because they keep secrets out of files that might be shared or committed to version control. The `--api-key` command-line option takes the highest precedence of all.
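For example, to configure a key for the current shell session and run a job (the key value is a placeholder, and this assumes `vllmocr` is on your `PATH`):

```shell
# Placeholder key; the variable name is taken from the list above.
export VLLM_OCR_ANTHROPIC_API_KEY="sk-ant-xxxx"

# The exported key overrides any [api_keys] entry in config.toml:
vllmocr image my_image.jpg --provider anthropic --model haiku
```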

## Development

To set up a development environment:

1.  Clone the repository:

    ```bash
    git clone https://github.com/<your-username>/vllmocr.git
    cd vllmocr
    ```

2.  Create and activate a virtual environment (using `uv`), then install the package:

    ```bash
    uv venv
    source .venv/bin/activate
    uv pip install -e ".[dev]"
    ```

    This installs the package in editable mode (`-e`) along with development dependencies (like `pytest` and `pytest-mock`).

3.  Run tests:

    ```bash
    uv pip install pytest pytest-mock  # if not already installed as dev dependencies
    pytest
    ```

## License

This project is licensed under the MIT License (see `pyproject.toml` for details).

            
