# OCR & LLM Parser
A Python package for parsing and processing documents with several providers:

- **Mistral OCR**: extracts text from PDFs and images.
- **LangChain**: processes or summarizes text with LLMs.
- **Llama Parser**: advanced parsing with Markdown or plain-text output.
- **HuggingFace**: OCR and document question answering with transformer models.

The package provides a **unified interface**, so you can switch between providers through a **factory pattern**.

---
## 🚀 Features
- Extract text from PDFs or images
- Summarize or process text with LLMs
- **Markdown** or **plain text** output
- Plug-and-play factory: switch providers with minimal code changes
- Loads API keys from environment variables automatically

---
## 🔑 Tokens
Create a `.env` file in your project root and add the API keys for the services you want to use. Only include the keys for the providers you plan to use.

### Mistral OCR

```
MISTRAL-OCR-API-TOKEN=your_mistral_api_key
```

### Llama Parser

```
LLAMA-PARSER-API-TOKEN=your_llama_parser_api_key
```

### HuggingFace

```
HF-API-TOKEN=your_huggingface_api_key
```
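The README does not show how the package reads these keys. As a rough, stdlib-only sketch of what loading API keys from a `.env` file involves, the helper below copies `KEY=VALUE` lines into `os.environ` and reports which providers have a token. `load_env_file`, `available_providers`, and the provider labels other than `mistralocr` are hypothetical; the package itself may use a library such as python-dotenv instead. Hyphenated names such as `MISTRAL-OCR-API-TOKEN` are not valid identifiers for `export` in POSIX shells, but Python's `os.environ` accepts them.

```python
import os
from pathlib import Path

# Token names documented above, keyed by a provider label.
# Only "mistralocr" appears in the README; the other labels are hypothetical.
TOKEN_NAMES = {
    "mistralocr": "MISTRAL-OCR-API-TOKEN",
    "llamaparser": "LLAMA-PARSER-API-TOKEN",
    "huggingface": "HF-API-TOKEN",
}


def load_env_file(path: Path) -> dict[str, str]:
    """Hypothetical helper: read KEY=VALUE lines from a .env file into os.environ.

    Variables that are already set in the environment are left unchanged.
    """
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        # Drop surrounding whitespace and optional quotes around the value
        value = value.strip().strip('"').strip("'")
        values[key] = value
        os.environ.setdefault(key, value)
    return values


def available_providers() -> list[str]:
    """Return the provider labels whose token is set to a non-empty value."""
    return [label for label, var in TOKEN_NAMES.items() if os.environ.get(var)]


if __name__ == "__main__":
    load_env_file(Path(".env"))
    print("Providers with a token:", available_providers())
```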
---
## 🛠️ Usage
```python
from pathlib import Path

from HowdenParser import ParserFactory

# Select the Mistral OCR provider and request Markdown output
parser = ParserFactory.get_parser("mistralocr:", result_type="md")
text = parser.parse(Path("document.pdf"))
print(text)
```
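If you use the HowdenConfig package, pass its stored parameters to the factory instead:

```python
# `config` is assumed to be a HowdenConfig object you have already loaded
parser = ParserFactory.get_parser("mistralocr:", **config.parameter.dump_model())
text = parser.parse(Path("document.pdf"))
```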
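To illustrate the factory pattern the package is built around, here is a toy re-implementation; it is not the package's actual code. Each provider subclasses a common interface and registers itself under a key, and the factory splits a spec such as `"mistralocr:"` into a provider key and a model name, then builds the matching class with the remaining keyword arguments (for example `result_type`). `BaseParser`, `register`, `ToyParserFactory`, `PlainTextParser`, and the `provider:model` reading of the spec string are all assumptions made for this sketch. With a registry, adding a provider only needs a new class and its `@register` line; caller code stays the same.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections.abc import Callable
from pathlib import Path
from typing import Any


class BaseParser(ABC):
    """Interface shared by all providers (hypothetical name)."""

    def __init__(self, model: str = "", result_type: str = "md", **options: Any) -> None:
        self.model = model              # part after ":" in the spec, may be empty
        self.result_type = result_type  # "md" or "txt" (assumed values)
        self.options = options          # extra provider-specific settings

    @abstractmethod
    def parse(self, path: Path) -> str:
        """Return the document content as Markdown or plain text."""


# Maps a provider key such as "mistralocr" to its parser class
_REGISTRY: dict[str, type[BaseParser]] = {}


def register(name: str) -> Callable[[type[BaseParser]], type[BaseParser]]:
    """Class decorator that adds a parser class to the registry under `name`."""

    def decorator(cls: type[BaseParser]) -> type[BaseParser]:
        _REGISTRY[name] = cls
        return cls

    return decorator


class ToyParserFactory:
    @staticmethod
    def get_parser(spec: str, **kwargs: Any) -> BaseParser:
        # Assumed spec format "provider:model"; "mistralocr:" has an empty model part
        provider, _, model = spec.partition(":")
        try:
            parser_cls = _REGISTRY[provider]
        except KeyError:
            raise ValueError(f"Unknown provider: {provider!r}") from None
        return parser_cls(model=model, **kwargs)


@register("plaintext")
class PlainTextParser(BaseParser):
    """Toy provider that reads a UTF-8 text file, standing in for a real OCR backend."""

    def parse(self, path: Path) -> str:
        text = path.read_text(encoding="utf-8")
        if self.result_type == "txt":
            return text
        # Markdown output: use the file name as a top-level heading
        return f"# {path.stem}\n\n{text}"


# Usage (with a real file path):
# parser = ToyParserFactory.get_parser("plaintext:", result_type="txt")
# print(parser.parse(Path("notes.txt")))
```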
Raw data
{
"_id": null,
"home_page": "https://github.com/yourusername/config",
"name": "HowdenParser",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.12",
"maintainer_email": null,
"keywords": "config, configuration, pydantic, json",
"author": "JesperThoftIllemannJ",
"author_email": "jesper.jaeger@howdendanmark.dk",
"download_url": "https://files.pythonhosted.org/packages/d4/89/c243171b4b065b1a02769a1722e57843d64375b44178ecccd1b55be63fae/howdenparser-0.1.12.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A simple configuration manager with Pydantic and JSON export.",
"version": "0.1.12",
"project_urls": {
"Documentation": "https://github.com/yourusername/config",
"Homepage": "https://github.com/yourusername/config",
"Repository": "https://github.com/yourusername/config"
},
"split_keywords": [
"config",
" configuration",
" pydantic",
" json"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "282a93ac5a3ab91d908137913681b237097c2ce535440e94fad9764f81f381a7",
"md5": "e0dee2981cc37d0f1fdab90da3db4ed2",
"sha256": "8aa301776fa993afb73a71686f7923dd55b1e7186aacd9f71b5d9abcc2a0d54c"
},
"downloads": -1,
"filename": "howdenparser-0.1.12-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e0dee2981cc37d0f1fdab90da3db4ed2",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.12",
"size": 5386,
"upload_time": "2025-08-14T12:17:49",
"upload_time_iso_8601": "2025-08-14T12:17:49.481426Z",
"url": "https://files.pythonhosted.org/packages/28/2a/93ac5a3ab91d908137913681b237097c2ce535440e94fad9764f81f381a7/howdenparser-0.1.12-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d489c243171b4b065b1a02769a1722e57843d64375b44178ecccd1b55be63fae",
"md5": "371787315d278e3b4ef38f4c50d79e6e",
"sha256": "ce2e781aa6e72d5ada87748466c50f865f7bc0b7a67c95d163a8bc9b5e4d304b"
},
"downloads": -1,
"filename": "howdenparser-0.1.12.tar.gz",
"has_sig": false,
"md5_digest": "371787315d278e3b4ef38f4c50d79e6e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.12",
"size": 3828,
"upload_time": "2025-08-14T12:17:50",
"upload_time_iso_8601": "2025-08-14T12:17:50.665456Z",
"url": "https://files.pythonhosted.org/packages/d4/89/c243171b4b065b1a02769a1722e57843d64375b44178ecccd1b55be63fae/howdenparser-0.1.12.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-14 12:17:50",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "yourusername",
"github_project": "config",
"github_not_found": true,
"lcname": "howdenparser"
}