HowdenParser

Name	HowdenParser
Version	0.1.12
home_page	https://github.com/yourusername/config
Summary	A simple configuration manager with Pydantic and JSON export.
upload_time	2025-08-14 12:17:50
maintainer	None
docs_url	None
author	JesperThoftIllemannJ
requires_python	<4.0,>=3.12
license	MIT
keywords	config configuration pydantic json
requirements	No requirements were recorded.

# OCR & LLM Parser

A powerful Python package for parsing and processing documents using multiple providers:
- **Mistral OCR** — Extracts text from PDFs and images with high accuracy.
- **LangChain** — Processes or summarizes text using LLMs.
- **Llama Parser** — Advanced parsing with Markdown or text output.
- **HuggingFace** — OCR and document question answering with transformer models.

The package provides a **unified interface** so you can switch between providers easily using a **factory pattern**.
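The factory pattern mentioned above can be illustrated with a small, self-contained sketch. It is not HowdenParser's implementation: `DocumentParser`, `EchoParser`, `UpperParser`, the registry, and the provider names `"echo"` and `"upper"` are all hypothetical stand-ins, chosen only to show how one lookup function can hide which provider is in use.

```python
from pathlib import Path
from typing import Callable, Protocol


class DocumentParser(Protocol):
    """Hypothetical shared interface: every provider exposes parse()."""

    def parse(self, path: Path) -> str: ...


class EchoParser:
    """Stand-in provider that only reports what it was asked to do."""

    def __init__(self, result_type: str = "md") -> None:
        self.result_type = result_type

    def parse(self, path: Path) -> str:
        return f"[{self.result_type}] parsed {path.name}"


class UpperParser(EchoParser):
    """Second stand-in provider, to show that calling code does not change."""

    def parse(self, path: Path) -> str:
        return super().parse(path).upper()


# Registry mapping a provider name to a constructor (names are illustrative).
_REGISTRY: dict[str, Callable[..., DocumentParser]] = {
    "echo": EchoParser,
    "upper": UpperParser,
}


def get_parser(provider: str, **kwargs) -> DocumentParser:
    """Look up the provider by name and build it with the given options."""
    try:
        factory = _REGISTRY[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}") from None
    return factory(**kwargs)


# Switching providers only changes the string passed to the factory.
for name in ("echo", "upper"):
    print(get_parser(name, result_type="txt").parse(Path("document.pdf")))
```

Because callers depend only on `parse()`, adding a provider in a design like this means registering one more constructor rather than touching the calling code.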

---

## 🚀 Features
- Extract text from PDFs or images
- Summarize or process text using LLMs
- Support for **Markdown** or **plain text** output
- Plug-and-play factory to switch providers without changing much code
- Handles environment variable loading for API keys automatically

---

## 🔑 Tokens

Create a `.env` file in your project root and add the API keys for the services you want to use.

### Mistral OCR
`MISTRAL-OCR-API-TOKEN=your_mistral_api_key`

### Llama Parser
`LLAMA-PARSER-API-TOKEN=your_llama_parser_api_key`

### HuggingFace
`HF-API-TOKEN=your_huggingface_api_key`

Only include the keys for the providers you plan to use.
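The package is described as loading these keys for you, so the following is only a sketch for checking a `.env` file before running a parser. It uses the variable names exactly as spelled above. The provider labels (`"mistral"`, `"llama"`, `"huggingface"`) and both helper functions are assumptions made for illustration and are not part of HowdenParser.

```python
# Variable names as spelled in this README; the dictionary keys are
# hypothetical labels used only by this sketch.
TOKEN_NAMES = {
    "mistral": "MISTRAL-OCR-API-TOKEN",
    "llama": "LLAMA-PARSER-API-TOKEN",
    "huggingface": "HF-API-TOKEN",
}


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_tokens(providers: list[str], env: dict[str, str]) -> list[str]:
    """Return the variable names that are absent or empty for the providers."""
    return [TOKEN_NAMES[p] for p in providers if not env.get(TOKEN_NAMES[p])]


sample = """
# Mistral OCR
MISTRAL-OCR-API-TOKEN=your_mistral_api_key
"""
env = parse_env_text(sample)
print(missing_tokens(["mistral", "huggingface"], env))  # ['HF-API-TOKEN']
```

Note that hyphens are not valid in POSIX shell variable names, so these keys generally cannot be set with `export` in a shell; keeping them in a `.env` file avoids that problem.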

---

## 🛠️ Usage

    from pathlib import Path

    from HowdenParser import ParserFactory

    parser = ParserFactory.get_parser("mistralocr:", result_type="md")
    text = parser.parse(Path("document.pdf"))
    print(text)

If you use the HowdenConfig package, you can pass its parameters to the factory as keyword arguments:

    parser = ParserFactory.get_parser("mistralocr:", **config.parameter.dump_model())
    text = parser.parse(Path("document.pdf"))
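Since every parser is used through `parse(Path(...))`, a small helper can run one parser over a whole folder. The sketch below assumes that `parse()` returns a string, which the `print(text)` call above suggests but this README does not state. `parse_folder`, `SupportsParse`, and `StubParser` are hypothetical names; the stub stands in for a real parser so the example runs without any API key.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class SupportsParse(Protocol):
    """Anything with a parse(path) method that returns text (assumed)."""

    def parse(self, path: Path) -> str: ...


def parse_folder(
    parser: SupportsParse,
    source_dir: Path,
    output_dir: Path,
    pattern: str = "*.pdf",
    suffix: str = ".md",
) -> list[Path]:
    """Parse every matching file and write one text file per document."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for source in sorted(source_dir.glob(pattern)):
        target = output_dir / (source.stem + suffix)
        target.write_text(parser.parse(source), encoding="utf-8")
        written.append(target)
    return written


class StubParser:
    """Stand-in for a real parser; returns a fixed Markdown heading."""

    def parse(self, path: Path) -> str:
        return f"# {path.stem}\n"


# Demo in a temporary directory with two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "in"
    src.mkdir()
    for name in ("b.pdf", "a.pdf"):
        (src / name).write_bytes(b"")
    outputs = parse_folder(StubParser(), src, Path(tmp) / "out")
    print([p.name for p in outputs])  # ['a.md', 'b.md']
```

In real use, the stub would be replaced by the object returned from `ParserFactory.get_parser(...)`, provided it behaves as assumed here.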

Raw data

{
    "_id": null,
    "home_page": "https://github.com/yourusername/config",
    "name": "HowdenParser",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.12",
    "maintainer_email": null,
    "keywords": "config, configuration, pydantic, json",
    "author": "JesperThoftIllemannJ",
    "author_email": "jesper.jaeger@howdendanmark.dk",
    "download_url": "https://files.pythonhosted.org/packages/d4/89/c243171b4b065b1a02769a1722e57843d64375b44178ecccd1b55be63fae/howdenparser-0.1.12.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A simple configuration manager with Pydantic and JSON export.",
    "version": "0.1.12",
    "project_urls": {
        "Documentation": "https://github.com/yourusername/config",
        "Homepage": "https://github.com/yourusername/config",
        "Repository": "https://github.com/yourusername/config"
    },
    "split_keywords": [
        "config",
        " configuration",
        " pydantic",
        " json"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "282a93ac5a3ab91d908137913681b237097c2ce535440e94fad9764f81f381a7",
                "md5": "e0dee2981cc37d0f1fdab90da3db4ed2",
                "sha256": "8aa301776fa993afb73a71686f7923dd55b1e7186aacd9f71b5d9abcc2a0d54c"
            },
            "downloads": -1,
            "filename": "howdenparser-0.1.12-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e0dee2981cc37d0f1fdab90da3db4ed2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.12",
            "size": 5386,
            "upload_time": "2025-08-14T12:17:49",
            "upload_time_iso_8601": "2025-08-14T12:17:49.481426Z",
            "url": "https://files.pythonhosted.org/packages/28/2a/93ac5a3ab91d908137913681b237097c2ce535440e94fad9764f81f381a7/howdenparser-0.1.12-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d489c243171b4b065b1a02769a1722e57843d64375b44178ecccd1b55be63fae",
                "md5": "371787315d278e3b4ef38f4c50d79e6e",
                "sha256": "ce2e781aa6e72d5ada87748466c50f865f7bc0b7a67c95d163a8bc9b5e4d304b"
            },
            "downloads": -1,
            "filename": "howdenparser-0.1.12.tar.gz",
            "has_sig": false,
            "md5_digest": "371787315d278e3b4ef38f4c50d79e6e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.12",
            "size": 3828,
            "upload_time": "2025-08-14T12:17:50",
            "upload_time_iso_8601": "2025-08-14T12:17:50.665456Z",
            "url": "https://files.pythonhosted.org/packages/d4/89/c243171b4b065b1a02769a1722e57843d64375b44178ecccd1b55be63fae/howdenparser-0.1.12.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-14 12:17:50",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yourusername",
    "github_project": "config",
    "github_not_found": true,
    "lcname": "howdenparser"
}
