llmextract

Name: llmextract
Version: 0.1.0
Summary: A library to extract structured information from unstructured text using LLMs, powered by LangChain.
Upload time: 2025-10-21 20:32:52
Home page: None
Maintainer: None
Docs URL: None
Author: None
License: None
Requires Python: >=3.10
Keywords: llm, langchain, extraction, structured-data, nlp, openrouter, ollama
            # llmextract

A Python library to extract structured information from unstructured text using LLMs, powered by LangChain. It supports multiple providers like OpenRouter and Ollama out of the box.

## Features

- **Multi-Provider Support**: Seamlessly switch between cloud-based models (via OpenRouter) and local models (via Ollama).
- **Structured Output**: Uses Pydantic and few-shot prompting to ensure reliable, structured JSON output.
- **Source Grounding**: Automatically finds the exact character positions of extractions in the source text.
- **Long Document Support**: Handles documents larger than a model's context window via intelligent chunking.
- **Interactive Visualization**: Generate a self-contained HTML report to review extractions in context.
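The source-grounding feature described above amounts to locating each extraction's text in the original document and recording its character span. The following is a minimal sketch of that idea in plain Python; `find_char_interval` is a hypothetical helper, not part of llmextract's actual API, and the library's real implementation may be more sophisticated (e.g. fuzzy matching):

```python
def find_char_interval(source: str, extraction_text: str, start_from: int = 0):
    """Locate extraction_text in source and return its (start, end) span.

    Returns None when the text cannot be found, mirroring the optional
    char_interval attribute shown in the Quick Start below.
    """
    idx = source.find(extraction_text, start_from)
    if idx == -1:
        return None
    return (idx, idx + len(extraction_text))


text = "The patient, John Doe, was prescribed Lisinopril for hypertension."
print(find_char_interval(text, "Lisinopril"))  # (38, 48)
```

Passing `start_from` lets a caller resolve repeated mentions of the same string to distinct, non-overlapping intervals.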

## Installation

```bash
pip install llmextract
```

## Quick Start

1.  **Set up your API keys.** Create a `.env` file in your project root:

    ```
    # For cloud models via OpenRouter
    OPENROUTER_API_KEY="sk-or-..."

    # For local models, ensure Ollama is running
    # ollama serve
    ```

2.  **Run an extraction.** Create a Python script and add the following:

    ```python
    from dotenv import load_dotenv
    from llmextract import extract, ExampleData, Extraction

    # Load API keys from .env file
    load_dotenv()

    # 1. Define the extraction task with a clear prompt and examples
    prompt = "Extract patient names, medications, and conditions."
    examples = [
        ExampleData(
            text="Jane took 20mg of Zoloft for depression.",
            extractions=[
                Extraction(extraction_class="patient", extraction_text="Jane"),
                Extraction(extraction_class="medication", extraction_text="Zoloft"),
                Extraction(extraction_class="condition", extraction_text="depression"),
            ],
        )
    ]

    # 2. Define the input text
    text = "The patient, John Doe, was prescribed Lisinopril for hypertension."

    # 3. Call the extract function
    result_doc = extract(
        text=text,
        prompt_description=prompt,
        examples=examples,
        model_name="mistralai/mistral-7b-instruct:free", # An OpenRouter model
    )

    # 4. Programmatically access the results
    print("--- Extracted Data ---")
    for extraction in result_doc.extractions:
        print(f"Class: {extraction.extraction_class}, Text: '{extraction.extraction_text}'")
        if extraction.char_interval:
            print(f"  -> Found at characters {extraction.char_interval.start}-{extraction.char_interval.end}")
    ```
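The long-document support listed under Features boils down to splitting text into overlapping windows that each fit within the model's context. Here is a rough, illustrative sketch of such chunking in plain Python; llmextract's internal strategy, function names, and parameters may differ:

```python
def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200):
    """Split text into overlapping chunks of at most max_chars characters.

    The overlap keeps entities that straddle a chunk boundary fully
    visible in at least one chunk; the returned offsets let downstream
    code map chunk-local character intervals back to the full document.
    """
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append((start, text[start:end]))
        if end == len(text):
            break
        start = end - overlap
    return chunks


doc = "x" * 5000
for offset, chunk in chunk_text(doc):
    print(offset, len(chunk))  # 0 2000, 1800 2000, 3600 1400
```

Extractions found in a chunk can then be grounded globally by adding the chunk's offset to each local character interval.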

## License

This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "llmextract",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "llm, langchain, extraction, structured-data, nlp, openrouter, ollama",
    "author": null,
    "author_email": "Harshaditya Sharma <mail@harshaditya.com>",
    "download_url": "https://files.pythonhosted.org/packages/95/ad/4d94ad0428a4cfe44cbcce90cfb5b8aad2932c013b6b657ac6fa197bc694/llmextract-0.1.0.tar.gz",
    "platform": null,
    "description": "# llmextract\n\nA Python library to extract structured information from unstructured text using LLMs, powered by LangChain. It supports multiple providers like OpenRouter and Ollama out of the box.\n\n## Features\n\n- **Multi-Provider Support**: Seamlessly switch between cloud-based models (via OpenRouter) and local models (via Ollama).\n- **Structured Output**: Uses Pydantic and few-shot prompting to ensure reliable, structured JSON output.\n- **Source Grounding**: Automatically finds the exact character positions of extractions in the source text.\n- **Long Document Support**: Handles documents larger than a model's context window via intelligent chunking.\n- **Interactive Visualization**: Generate a self-contained HTML report to review extractions in context.\n\n## Installation\n\n```bash\npip install llmextract\n```\n\n## Quick Start\n\n1.  **Set up your API keys.** Create a `.env` file in your project root:\n\n    ```\n    # For cloud models via OpenRouter\n    OPENROUTER_API_KEY=\"sk-or-...\"\n\n    # For local models, ensure Ollama is running\n    # ollama serve\n    ```\n\n2.  **Run an extraction.** Create a Python script and add the following:\n\n    ```python\n    from dotenv import load_dotenv\n    from llmextract import extract, ExampleData, Extraction\n\n    # Load API keys from .env file\n    load_dotenv()\n\n    # 1. Define the extraction task with a clear prompt and examples\n    prompt = \"Extract patient names, medications, and conditions.\"\n    examples = [\n        ExampleData(\n            text=\"Jane took 20mg of Zoloft for depression.\",\n            extractions=[\n                Extraction(extraction_class=\"patient\", extraction_text=\"Jane\"),\n                Extraction(extraction_class=\"medication\", extraction_text=\"Zoloft\"),\n                Extraction(extraction_class=\"condition\", extraction_text=\"depression\"),\n            ],\n        )\n    ]\n\n    # 2. 
Define the input text\n    text = \"The patient, John Doe, was prescribed Lisinopril for hypertension.\"\n\n    # 3. Call the extract function\n    result_doc = extract(\n        text=text,\n        prompt_description=prompt,\n        examples=examples,\n        model_name=\"mistralai/mistral-7b-instruct:free\", # An OpenRouter model\n    )\n\n    # 4. Programmatically access the results\n    print(\"--- Extracted Data ---\")\n    for extraction in result_doc.extractions:\n        print(f\"Class: {extraction.extraction_class}, Text: '{extraction.extraction_text}'\")\n        if extraction.char_interval:\n            print(f\"  -> Found at characters {extraction.char_interval.start}-{extraction.char_interval.end}\")\n    ```\n\n## License\n\nThis project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "A library to extract structured information from unstructured text using LLMs, powered by LangChain.",
    "version": "0.1.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/chromedupcivilian/llmextract/issues",
        "Homepage": "https://github.com/chromedupcivilian/llmextract",
        "Repository": "https://github.com/chromedupcivilian/llmextract"
    },
    "split_keywords": [
        "llm",
        " langchain",
        " extraction",
        " structured-data",
        " nlp",
        " openrouter",
        " ollama"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7a0bcf5fef02937fb01afebbe46809ca870de019ebcec10458075f4008c045a4",
                "md5": "b0854e3fd7877562b053f2a0df6e225c",
                "sha256": "af7f9e684d753a89ce9560b45a2bb1bda9faf861b1911b6df82307754cfc0c7c"
            },
            "downloads": -1,
            "filename": "llmextract-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b0854e3fd7877562b053f2a0df6e225c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 16064,
            "upload_time": "2025-10-21T20:32:51",
            "upload_time_iso_8601": "2025-10-21T20:32:51.030076Z",
            "url": "https://files.pythonhosted.org/packages/7a/0b/cf5fef02937fb01afebbe46809ca870de019ebcec10458075f4008c045a4/llmextract-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "95ad4d94ad0428a4cfe44cbcce90cfb5b8aad2932c013b6b657ac6fa197bc694",
                "md5": "a9e6d3afb0276c1c6670374681cddd5d",
                "sha256": "ed39b2c79fb9f2fd80e75689bc7c1859e18848fa9c0bd9caf7d157e6e778a5a5"
            },
            "downloads": -1,
            "filename": "llmextract-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "a9e6d3afb0276c1c6670374681cddd5d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 16598,
            "upload_time": "2025-10-21T20:32:52",
            "upload_time_iso_8601": "2025-10-21T20:32:52.925473Z",
            "url": "https://files.pythonhosted.org/packages/95/ad/4d94ad0428a4cfe44cbcce90cfb5b8aad2932c013b6b657ac6fa197bc694/llmextract-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-21 20:32:52",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "chromedupcivilian",
    "github_project": "llmextract",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "llmextract"
}
        