Name | upspawn-ocr-cli |
Version | 0.1.0b3 |
home_page | None |
Summary | Modern, polished CLI to extract text from PDFs using the Mistral OCR API. |
upload_time | 2025-08-15 23:24:29 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.9 |
license | None |
keywords | ocr, mistral, pdf, cli, text-extraction |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# Mistral OCR CLI
Modern, polished CLI to extract text from PDFs using the Mistral OCR API.
## Features
- Elegant TUI with progress bars and rich output
- Single file or batch processing
- Output in text, JSON, or Markdown
- Parallel batch processing with `--jobs`
- Config helper and `.env` support
## Quickstart
1) Install
```bash
uv tool install upspawn-ocr-cli   # isolated, pipx-style tool install
# or
uv pip install upspawn-ocr-cli    # into the current environment
```
2) Configure API key
```bash
export MISTRAL_API_KEY=your_key_here
# or
echo "MISTRAL_API_KEY=your_key_here" >> .env
```
3) Extract text
```bash
ocr extract file.pdf -o out.txt
ocr extract file1.pdf file2.pdf --batch --output-dir outputs --jobs 4
```
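Before running an extraction it can help to confirm that the key from step 2 is actually visible. The sketch below checks the environment first and then a local `.env` file. The helper names `read_dotenv` and `resolve_api_key` are hypothetical, and the lookup order (environment before `.env`) is an assumption; the README does not say how the CLI itself resolves the key.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(
    env: Optional[Mapping[str, str]] = None,
    dotenv_path: Path = Path(".env"),
) -> Optional[str]:
    """Return MISTRAL_API_KEY from the environment, else from .env, else None.

    Assumption: the environment takes precedence over the .env file.
    """
    env = os.environ if env is None else env
    return env.get("MISTRAL_API_KEY") or read_dotenv(dotenv_path).get("MISTRAL_API_KEY")


if __name__ == "__main__":
    print("API key found" if resolve_api_key() else "MISTRAL_API_KEY is not set")
```

The parser only handles plain `KEY=VALUE` lines like the one written in step 2; it does not attempt variable expansion or multi-line values.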
## Usage
```bash
ocr extract [OPTIONS] FILES...

Options:
  -o, --output PATH                  Output file (single-file mode)
  -f, --format [text|json|markdown]
  -b, --batch                        Enable batch mode
  -O, --output-dir PATH              Directory for batch outputs
  -j, --jobs INTEGER RANGE           Parallel jobs for batch [default: 1]
  -v, --verbose                      Verbose logs
  -q, --quiet                        Only errors
  --version                          Show version
  --help                             Show help
```
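When the CLI is driven from a script, the options above can be assembled into an argument list. The following is a minimal sketch: `build_extract_command` is a hypothetical helper, not part of the package, and the lower bound of 1 for `--jobs` is an assumption, since the help text only says `INTEGER RANGE` with a default of 1.

```python
import shlex
from typing import List, Optional, Sequence

FORMATS = ("text", "json", "markdown")  # choices listed for --format


def build_extract_command(
    files: Sequence[str],
    output: Optional[str] = None,
    output_dir: Optional[str] = None,
    fmt: Optional[str] = None,
    jobs: int = 1,
    batch: bool = False,
) -> List[str]:
    """Build an argv list for `ocr extract` from the documented options."""
    if not files:
        raise ValueError("at least one input file is required")
    if fmt is not None and fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    if jobs < 1:  # assumed lower bound; the real range is not documented
        raise ValueError("jobs must be >= 1")

    argv = ["ocr", "extract", *files]
    if fmt:
        argv += ["--format", fmt]
    if batch:
        argv.append("--batch")
        if output_dir:
            argv += ["--output-dir", output_dir]
        argv += ["--jobs", str(jobs)]
    elif output:
        # --output is documented for single-file mode only.
        argv += ["--output", output]
    return argv


if __name__ == "__main__":
    cmd = build_extract_command(
        ["file1.pdf", "file2.pdf"], batch=True, output_dir="outputs", jobs=4
    )
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` rather than a shell string avoids quoting problems with file names that contain spaces.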
## Programmatic use
```python
from ocr.pdf2text import pdf_to_text
text = pdf_to_text("/path/file.pdf")
```
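For several files, the same function can be fanned out over a thread pool. This is a rough sketch, not how the CLI's `--jobs` option is implemented: `extract_many` is a hypothetical helper, and it assumes that `pdf_to_text` takes a path string, returns the extracted text, and is safe to call from multiple threads. The extractor is passed in as a parameter so the helper can be tried without an API key.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable, Union


def extract_many(
    pdf_paths: Iterable[Union[str, Path]],
    extract: Callable[[str], str],
    jobs: int = 4,
) -> Dict[str, str]:
    """Run `extract` on each path concurrently and map path -> text."""
    paths = [str(p) for p in pdf_paths]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # pool.map preserves input order, so zip pairs each path with its text.
        texts = list(pool.map(extract, paths))
    return dict(zip(paths, texts))


# Example wiring (requires the package and a configured API key):
# from ocr.pdf2text import pdf_to_text
# results = extract_many(Path("docs").glob("*.pdf"), pdf_to_text, jobs=4)
```

Threads are used because OCR requests are network-bound; an exception raised by any single call propagates when its result is collected.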
## Development
```bash
uv pip install -e .[dev]
uv run pre-commit install
uv run pytest -q
```
Releases are cut from Git tags and published as GitHub Releases.
## License
MIT
## Test coverage
```bash
# Terminal report
make coverage
# HTML report in htmlcov/
make coverhtml
```
Raw data
{
"_id": null,
"home_page": null,
"name": "upspawn-ocr-cli",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "ocr, mistral, pdf, cli, text-extraction",
"author": null,
"author_email": "UpSpawn <opensource@upspawn.com>",
"download_url": "https://files.pythonhosted.org/packages/88/5d/e32c4d79b79e7112b398924d0db4579467d19938593847086bb45833d91c/upspawn_ocr_cli-0.1.0b3.tar.gz",
"platform": null,
"description": "# Mistral OCR CLI\n\n[](https://github.com/upspawn/mistral-ocr-cli/actions/workflows/ci.yml)\n[](https://pypi.org/project/mistral-ocr-cli/)\n[](https://pypi.org/project/mistral-ocr-cli/)\n[](LICENSE)\n[](https://github.com/pre-commit/pre-commit)\n[](https://github.com/psf/black)\n\nModern, polished CLI to extract text from PDFs using the Mistral OCR API.\n\n## Features\n\n- Elegant TUI with progress bars and rich output\n- Single file or batch processing\n- Output in text, JSON, or Markdown\n- Parallel batch processing with `--jobs`\n- Config helper and `.env` support\n\n## Quickstart\n\n1) Install\n\n```bash\nuv tool install mistral-ocr-cli # via pipx-like tool install\n# or\nuv pip install mistral-ocr-cli # into current environment\n```\n\n2) Configure API key\n\n```bash\nexport MISTRAL_API_KEY=your_key_here\n# or\necho \"MISTRAL_API_KEY=your_key_here\" >> .env\n```\n\n3) Extract text\n\n```bash\nocr extract file.pdf -o out.txt\nocr extract file1.pdf file2.pdf --batch --output-dir outputs --jobs 4\n```\n\n## Usage\n\n```bash\nocr extract [OPTIONS] FILES...\n\nOptions:\n -o, --output PATH Output file (single-file mode)\n -f, --format [text|json|markdown]\n -b, --batch Enable batch mode\n -O, --output-dir PATH Directory for batch outputs\n -j, --jobs INTEGER RANGE Parallel jobs for batch [default: 1]\n -v, --verbose Verbose logs\n -q, --quiet Only errors\n --version Show version\n --help Show help\n```\n\n## Programmatic use\n\n```python\nfrom ocr.pdf2text import pdf_to_text\n\ntext = pdf_to_text(\"/path/file.pdf\")\n```\n\n## Development\n\n```bash\nuv pip install -e .[dev]\nuv run pre-commit install\nuv run pytest -q\n```\n\nReleasing is handled via standard tags and GitHub Releases.\n\n## License\n\nMIT\n\n\n## Test coverage\n\n```bash\n# Terminal report\nmake coverage\n\n# HTML report in htmlcov/\nmake coverhtml\n```\n\n",
"bugtrack_url": null,
"license": null,
"summary": "Modern, polished CLI to extract text from PDFs using the Mistral OCR API.",
"version": "0.1.0b3",
"project_urls": {
"Homepage": "https://github.com/upspawn/mistral-ocr-cli",
"Issues": "https://github.com/upspawn/mistral-ocr-cli/issues",
"Release Notes": "https://github.com/upspawn/mistral-ocr-cli/releases",
"Repository": "https://github.com/upspawn/mistral-ocr-cli.git"
},
"split_keywords": [
"ocr",
" mistral",
" pdf",
" cli",
" text-extraction"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "bb842f9727840b72b84baa69880f54d7615b83526e483eb85617663af877b324",
"md5": "ea5356d0ee5f83e89c7b2fff797351e6",
"sha256": "8b0d2a9d813f94fe1632ddf2e57199101edc164bc0e83f211a0e5649045fc9d8"
},
"downloads": -1,
"filename": "upspawn_ocr_cli-0.1.0b3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ea5356d0ee5f83e89c7b2fff797351e6",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 9645,
"upload_time": "2025-08-15T23:24:28",
"upload_time_iso_8601": "2025-08-15T23:24:28.309165Z",
"url": "https://files.pythonhosted.org/packages/bb/84/2f9727840b72b84baa69880f54d7615b83526e483eb85617663af877b324/upspawn_ocr_cli-0.1.0b3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "885de32c4d79b79e7112b398924d0db4579467d19938593847086bb45833d91c",
"md5": "29fa681c6f238616ba10bf5164f2ee46",
"sha256": "6966f79be95e17865115930d1051a6d6b52a9e293b4a1de9bcee03d0bf98e67c"
},
"downloads": -1,
"filename": "upspawn_ocr_cli-0.1.0b3.tar.gz",
"has_sig": false,
"md5_digest": "29fa681c6f238616ba10bf5164f2ee46",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 13133,
"upload_time": "2025-08-15T23:24:29",
"upload_time_iso_8601": "2025-08-15T23:24:29.418542Z",
"url": "https://files.pythonhosted.org/packages/88/5d/e32c4d79b79e7112b398924d0db4579467d19938593847086bb45833d91c/upspawn_ocr_cli-0.1.0b3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-15 23:24:29",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "upspawn",
"github_project": "mistral-ocr-cli",
"github_not_found": true,
"lcname": "upspawn-ocr-cli"
}
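Each entry under `urls` in the raw data carries digests for the corresponding file, so a downloaded wheel or sdist can be checked against the `sha256` value. A minimal sketch, with hypothetical helper names `sha256_of` and `matches_metadata`, assuming the entry has already been parsed into a dict:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_metadata(path: Path, url_entry: dict) -> bool:
    """Compare a local file against the sha256 digest of one `urls` entry."""
    return sha256_of(path) == url_entry["digests"]["sha256"]


# Example, using the sdist entry from the raw data above:
# entry = {"digests": {"sha256": "6966f79be95e17865115930d1051a6d6b52a9e293b4a1de9bcee03d0bf98e67c"}}
# matches_metadata(Path("upspawn_ocr_cli-0.1.0b3.tar.gz"), entry)
```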