upspawn-ocr-cli

- Version: 0.1.0b3 (uploaded 2025-08-15 23:24:29 UTC)
- Requires Python: >=3.9
- Keywords: ocr, mistral, pdf, cli, text-extraction
- Author: UpSpawn <opensource@upspawn.com>
- Homepage: https://github.com/upspawn/mistral-ocr-cli
# Mistral OCR CLI

[![CI](https://github.com/upspawn/mistral-ocr-cli/actions/workflows/ci.yml/badge.svg)](https://github.com/upspawn/mistral-ocr-cli/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/mistral-ocr-cli.svg)](https://pypi.org/project/mistral-ocr-cli/)
[![Python](https://img.shields.io/pypi/pyversions/mistral-ocr-cli.svg)](https://pypi.org/project/mistral-ocr-cli/)
[![License](https://img.shields.io/pypi/l/mistral-ocr-cli.svg)](LICENSE)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

Modern, polished CLI to extract text from PDFs using the Mistral OCR API.

## Features

- Elegant TUI with progress bars and rich output
- Single file or batch processing
- Output in text, JSON, or Markdown
- Parallel batch processing with `--jobs`
- Config helper and `.env` support

## Quickstart

1) Install

```bash
uv tool install upspawn-ocr-cli  # isolated tool install, similar to pipx
# or
uv pip install upspawn-ocr-cli   # into the current environment
```

2) Configure API key

```bash
export MISTRAL_API_KEY=your_key_here
# or
echo "MISTRAL_API_KEY=your_key_here" >> .env
```
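Either option works. The README does not say which source wins when both are set. The following is a minimal sketch of one common lookup order. It assumes that a real environment variable takes precedence over a `.env` entry and that `.env` holds simple `KEY=VALUE` lines. `parse_dotenv` and `resolve_api_key` are hypothetical helpers, not part of this package.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes, e.g. KEY="abc".
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    dotenv_path: Path = Path(".env"),
) -> Optional[str]:
    """Return MISTRAL_API_KEY, preferring the environment over .env (assumed order)."""
    if env.get("MISTRAL_API_KEY"):
        return env["MISTRAL_API_KEY"]
    if dotenv_path.is_file():
        return parse_dotenv(dotenv_path.read_text()).get("MISTRAL_API_KEY")
    return None
```

Because `echo ... >> .env` appends, a file can end up with the same key twice. In this sketch the last occurrence wins.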

3) Extract text

```bash
ocr extract file.pdf -o out.txt
ocr extract file1.pdf file2.pdf --batch --output-dir outputs --jobs 4
```
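The `--jobs 4` flag above presumably caps how many files are processed at the same time. The following minimal sketch shows that kind of bounded parallelism using the standard library's `ThreadPoolExecutor`. It assumes that threads are adequate because each job spends most of its time waiting on the OCR API. `run_batch` is hypothetical, and `extract` stands in for the real per-file OCR call, which this README does not show.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple


def run_batch(
    files: List[str],
    extract: Callable[[str], str],
    jobs: int = 1,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run `extract` over `files` with at most `jobs` workers.

    Returns (results, errors): text per successful file and an error
    message per failed file, so one bad PDF does not abort the batch.
    """
    if jobs < 1:
        raise ValueError("jobs must be >= 1")
    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Map each future back to the file it was submitted for.
        futures = {pool.submit(extract, path): path for path in files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going; report failures at the end
                errors[path] = str(exc)
    return results, errors
```

For example, `run_batch(["file1.pdf", "file2.pdf"], extract=my_extract, jobs=4)`. This sketch collects errors per file rather than failing fast, which is a deliberate design choice. The README does not say how the CLI itself handles a failed file.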

## Usage

```bash
ocr extract [OPTIONS] FILES...

Options:
  -o, --output PATH            Output file (single-file mode)
  -f, --format [text|json|markdown]
                               Output format
  -b, --batch                  Enable batch mode
  -O, --output-dir PATH        Directory for batch outputs
  -j, --jobs INTEGER RANGE     Parallel jobs for batch [default: 1]
  -v, --verbose                Verbose logs
  -q, --quiet                  Only errors
  --version                    Show version
  --help                       Show help
```
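This README does not document the CLI's actual output schema. The three `--format` choices can be pictured as three renderings of the same OCR result. The following sketch assumes that the result is a list of per-page strings. It uses an invented JSON shape (`{"pages": [{"index": ..., "text": ...}]}`) and invented file extensions for illustration only. `render` and `EXTENSIONS` are hypothetical.

```python
import json
from typing import List

# Hypothetical file extension per --format value.
EXTENSIONS = {"text": ".txt", "json": ".json", "markdown": ".md"}


def render(pages: List[str], fmt: str) -> str:
    """Render per-page text in one of the three --format styles."""
    if fmt == "text":
        # Plain text: pages separated by a blank line.
        return "\n\n".join(page.strip() for page in pages)
    if fmt == "json":
        # Structured output keeps page boundaries explicit.
        payload = {"pages": [{"index": i, "text": page} for i, page in enumerate(pages)]}
        return json.dumps(payload, ensure_ascii=False, indent=2)
    if fmt == "markdown":
        # One heading per page so boundaries survive in rendered Markdown.
        return "\n\n".join(f"## Page {i + 1}\n\n{page.strip()}" for i, page in enumerate(pages))
    raise ValueError(f"unknown format: {fmt!r}")
```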

## Programmatic use

```python
from ocr.pdf2text import pdf_to_text

text = pdf_to_text("/path/file.pdf")
```
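The call above can be extended to a whole folder. The following minimal sketch writes one `.txt` file per input PDF. It assumes only what the example shows: `pdf_to_text` takes a path and returns a string. Whether it reads `MISTRAL_API_KEY` the same way the CLI does is not confirmed here. The converter is passed in as a parameter so the loop can be exercised without calling the API. `convert_folder` is hypothetical.

```python
from pathlib import Path
from typing import Callable, List


def convert_folder(
    src: Path,
    dest: Path,
    to_text: Callable[[str], str],
) -> List[Path]:
    """Write <dest>/<stem>.txt for each *.pdf in src; return the written paths."""
    dest.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for pdf in sorted(src.glob("*.pdf")):
        out = dest / (pdf.stem + ".txt")
        # Skip files already converted, e.g. when re-running after a failure.
        if out.exists():
            continue
        out.write_text(to_text(str(pdf)), encoding="utf-8")
        written.append(out)
    return written
```

With the real function: `convert_folder(Path("scans"), Path("outputs"), pdf_to_text)`.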

## Development

```bash
uv pip install -e ".[dev]"   # quotes keep shells like zsh from globbing the brackets
uv run pre-commit install
uv run pytest -q
```

Releasing is handled via standard tags and GitHub Releases.

## Test coverage

```bash
# Terminal report
make coverage

# HTML report in htmlcov/
make coverhtml
```

## License

MIT