# llmextract
A Python library to extract structured information from unstructured text using LLMs, powered by LangChain. It supports multiple providers like OpenRouter and Ollama out of the box.
## Features
- **Multi-Provider Support**: Seamlessly switch between cloud-based models (via OpenRouter) and local models (via Ollama).
- **Structured Output**: Uses Pydantic and few-shot prompting to ensure reliable, structured JSON output.
- **Source Grounding**: Automatically finds the exact character positions of extractions in the source text.
- **Long Document Support**: Handles documents larger than a model's context window via intelligent chunking.
- **Interactive Visualization**: Generate a self-contained HTML report to review extractions in context.
## Installation
```bash
pip install llmextract
```
## Quick Start
1. **Set up your API keys.** Create a `.env` file in your project root:
   ```
   # For cloud models via OpenRouter
   OPENROUTER_API_KEY="sk-or-..."

   # For local models, ensure Ollama is running
   # ollama serve
   ```
2. **Run an extraction.** Create a Python script and add the following:
   ```python
   from dotenv import load_dotenv
   from llmextract import extract, ExampleData, Extraction

   # Load API keys from .env file
   load_dotenv()

   # 1. Define the extraction task with a clear prompt and examples
   prompt = "Extract patient names, medications, and conditions."
   examples = [
       ExampleData(
           text="Jane took 20mg of Zoloft for depression.",
           extractions=[
               Extraction(extraction_class="patient", extraction_text="Jane"),
               Extraction(extraction_class="medication", extraction_text="Zoloft"),
               Extraction(extraction_class="condition", extraction_text="depression"),
           ],
       )
   ]

   # 2. Define the input text
   text = "The patient, John Doe, was prescribed Lisinopril for hypertension."

   # 3. Call the extract function
   result_doc = extract(
       text=text,
       prompt_description=prompt,
       examples=examples,
       model_name="mistralai/mistral-7b-instruct:free",  # An OpenRouter model
   )

   # 4. Programmatically access the results
   print("--- Extracted Data ---")
   for extraction in result_doc.extractions:
       print(f"Class: {extraction.extraction_class}, Text: '{extraction.extraction_text}'")
       if extraction.char_interval:
           print(f"  -> Found at characters {extraction.char_interval.start}-{extraction.char_interval.end}")
   ```
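## Working with Results

The extractions on `result_doc` are plain objects, so you can post-process them however you like. The sketch below groups extracted spans by their class label; it uses a minimal stand-in dataclass with the same `extraction_class` and `extraction_text` fields shown in the Quick Start (in real code these objects come from `result_doc.extractions`), so it runs without calling a model:

```python
from collections import defaultdict
from dataclasses import dataclass

# Minimal stand-in mirroring the fields used in the Quick Start;
# in real code, iterate over result_doc.extractions instead.
@dataclass
class Extraction:
    extraction_class: str
    extraction_text: str

extractions = [
    Extraction("patient", "John Doe"),
    Extraction("medication", "Lisinopril"),
    Extraction("condition", "hypertension"),
]

# Group extracted spans by their class label
by_class = defaultdict(list)
for e in extractions:
    by_class[e.extraction_class].append(e.extraction_text)

print(dict(by_class))
# {'patient': ['John Doe'], 'medication': ['Lisinopril'], 'condition': ['hypertension']}
```

To run the same extraction against a local model instead, pass an Ollama model name via the same `model_name` argument; the exact naming convention for routing to Ollama is provider-specific, so check the project repository for details.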
## License
This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.