# LangExtract MCP Server
A FastMCP server for Google's [langextract](https://github.com/google/langextract) library. This server enables AI assistants like Claude Code to extract structured information from unstructured text using Large Language Models through an MCP interface.
<a href="https://glama.ai/mcp/servers/@larsenweigle/langextract-mcp">
<img width="380" height="200" src="https://glama.ai/mcp/servers/@larsenweigle/langextract-mcp/badge" alt="LangExtract Server MCP server" />
</a>
## Overview
LangExtract is a Python library that uses LLMs to extract structured information from text documents while maintaining precise source grounding. This MCP server exposes langextract's capabilities through the Model Context Protocol. The server includes intelligent caching, persistent connections, and server-side credential management to provide optimal performance in long-running environments like Claude Code.
## Quick Setup for Claude Code
### Prerequisites
- Claude Code installed and configured
- Google Gemini API key ([Get one here](https://aistudio.google.com/app/apikey))
- Python 3.10 or higher
### Installation
Install directly into Claude Code using the built-in MCP management:
```bash
claude mcp add langextract-mcp -e LANGEXTRACT_API_KEY=your-gemini-api-key -- uv run --with fastmcp fastmcp run src/langextract_mcp/server.py
```
The server will automatically start and integrate with Claude Code. No additional configuration is required.
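For teams that prefer a checked-in configuration over the CLI command, the same registration can be sketched as a project-scoped `.mcp.json` entry. The snippet below only builds and prints that JSON. The `mcpServers` layout is an assumption about Claude Code's project configuration format, so verify it against the Claude Code documentation before relying on it.

```python
import json

# Mirrors the `claude mcp add` command above: server name, env var, and the
# command that follows the `--` separator.
server_entry = {
    "command": "uv",
    "args": ["run", "--with", "fastmcp", "fastmcp", "run",
             "src/langextract_mcp/server.py"],
    "env": {"LANGEXTRACT_API_KEY": "your-gemini-api-key"},
}

# Assumed layout of a project-scoped .mcp.json file.
config = {"mcpServers": {"langextract-mcp": server_entry}}

config_text = json.dumps(config, indent=2)
print(config_text)
```

Avoid committing a real API key in such a file; keep the placeholder and supply the key through your own secret handling.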
### Verification
After installation, verify the integration by entering the following in Claude Code:
```
/mcp
```
You should see output indicating that the server is running. You can then select the server to view its tools.
## Available Tools
The server provides the following tools for text extraction workflows:
**Core Extraction**
- `extract_from_text` - Extract structured information from provided text
- `extract_from_url` - Extract information from web content
- `save_extraction_results` - Save results to JSONL format
- `generate_visualization` - Create interactive HTML visualizations
For more information, check out the resources available to the client under `src/langextract_mcp/resources`.
## Usage Examples
I am currently adding the ability for MCP clients to pass file paths pointing to unstructured text.
### Basic Text Extraction
Ask Claude Code to extract information using natural language:
```
Extract medication information from this text: "Patient prescribed 500mg amoxicillin twice daily for infection"
Use these examples to guide the extraction:
- Text: "Take 250mg ibuprofen every 4 hours"
- Expected: medication=ibuprofen, dosage=250mg, frequency=every 4 hours
```
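The prompt above combines three things: the text to process, a description of what to extract, and one worked example. A minimal sketch of that same information as structured data follows. The class names, the `build_request` helper, and the payload keys are hypothetical illustrations, not the server's actual tool schema.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class ExampleExtraction:
    """One labeled span inside an example text (hypothetical structure)."""
    extraction_class: str
    extraction_text: str
    attributes: dict = field(default_factory=dict)


@dataclass
class FewShotExample:
    """An example text plus the extractions expected from it."""
    text: str
    extractions: list


def build_request(text, prompt_description, examples):
    """Assemble a plain-dict payload; the key names are assumptions."""
    for ex in examples:
        for item in ex.extractions:
            # Sanity check: each expected span should appear verbatim in its
            # example text, so the example itself is grounded in the source.
            if item.extraction_text not in ex.text:
                raise ValueError(f"{item.extraction_text!r} not in example text")
    return {
        "text": text,
        "prompt_description": prompt_description,
        "examples": [asdict(ex) for ex in examples],
    }


example = FewShotExample(
    text="Take 250mg ibuprofen every 4 hours",
    extractions=[
        ExampleExtraction("medication", "ibuprofen"),
        ExampleExtraction("dosage", "250mg"),
        ExampleExtraction("frequency", "every 4 hours"),
    ],
)

request = build_request(
    text="Patient prescribed 500mg amoxicillin twice daily for infection",
    prompt_description="Extract medication information",
    examples=[example],
)
```

In practice Claude Code translates the natural-language request into tool arguments for you; the sketch only makes the shape of a few-shot example explicit.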
### Advanced Configuration
For complex extractions, specify configuration parameters:
```
Extract character emotions from Shakespeare using:
- Model: gemini-2.5-pro for better literary analysis
- Multiple passes: 3 for comprehensive extraction
- Temperature: 0.2 for consistent results
```
### URL Processing
Extract information directly from web content:
```
Extract key findings from this research paper: https://arxiv.org/abs/example
Focus on methodology, results, and conclusions
```
## Supported Models
This server currently supports **Google Gemini models only**, optimized for reliable structured extraction with advanced schema constraints:
- `gemini-2.5-flash` - **Recommended default** - Optimal balance of speed, cost, and quality
- `gemini-2.5-pro` - Best for complex reasoning and analysis tasks requiring highest accuracy
The server uses persistent connections, schema caching, and connection pooling for optimal performance with Gemini models. Support for additional providers may be added in future versions.
## Configuration Reference
### Environment Variables
Set during installation or in the server environment:
```bash
LANGEXTRACT_API_KEY=your-gemini-api-key # Required
```
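Because the key is required, a missing value is worth catching before any extraction is attempted. The helper below is a hypothetical sketch of such a check, not the server's own code. It accepts an optional mapping so it can be exercised without touching the real environment.

```python
import os


def get_api_key(env=None):
    """Return the Gemini API key, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    key = env.get("LANGEXTRACT_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "LANGEXTRACT_API_KEY is not set; pass it with -e when running "
            "`claude mcp add` or export it in the server environment."
        )
    return key


# Usage with an explicit mapping (placeholder value, not a real key).
print(get_api_key({"LANGEXTRACT_API_KEY": "your-gemini-api-key"}))
```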
### Tool Parameters
Configure extraction behavior through tool parameters:
```python
{
    "model_id": "gemini-2.5-flash",  # Language model selection
    "max_char_buffer": 1000,         # Text chunk size
    "temperature": 0.5,              # Sampling temperature (0.0-1.0)
    "extraction_passes": 1,          # Number of extraction attempts
    "max_workers": 10                # Parallel processing threads
}
```
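To illustrate how overrides such as those in the Advanced Configuration example combine with the values shown above, here is a minimal sketch. Treating the listed values as defaults, restricting `model_id` to the two documented models, and the `resolve_params` helper itself are all assumptions made for illustration; the server may validate differently.

```python
# Values copied from the parameter block above, assumed to act as defaults.
DEFAULTS = {
    "model_id": "gemini-2.5-flash",
    "max_char_buffer": 1000,
    "temperature": 0.5,
    "extraction_passes": 1,
    "max_workers": 10,
}

# The two models this README lists as supported.
SUPPORTED_MODELS = {"gemini-2.5-flash", "gemini-2.5-pro"}


def resolve_params(**overrides):
    """Merge overrides into the defaults and reject out-of-range values."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    if params["model_id"] not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported model: {params['model_id']}")
    if not 0.0 <= params["temperature"] <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    for name in ("max_char_buffer", "extraction_passes", "max_workers"):
        if not isinstance(params[name], int) or params[name] < 1:
            raise ValueError(f"{name} must be a positive integer")
    return params


# The settings from the Advanced Configuration example.
advanced = resolve_params(
    model_id="gemini-2.5-pro", extraction_passes=3, temperature=0.2
)
print(advanced)
```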
### Output Format
All extractions return consistent structured data:
```python
{
    "document_id": "doc_123",
    "total_extractions": 5,
    "extractions": [
        {
            "extraction_class": "medication",
            "extraction_text": "amoxicillin",
            "attributes": {"type": "antibiotic"},
            "start_char": 25,
            "end_char": 35
        }
    ],
    "metadata": {
        "model_id": "gemini-2.5-flash",
        "extraction_passes": 1,
        "temperature": 0.5
    }
}
```
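A result in this shape is straightforward to post-process. The sketch below counts extractions per class and flattens them into JSONL, one object per line. The sample is trimmed to a single extraction, and the flattened record layout is an assumption; it is not necessarily what `save_extraction_results` writes.

```python
import json
from collections import Counter

# Trimmed sample in the documented shape (one extraction only).
result = {
    "document_id": "doc_123",
    "total_extractions": 1,
    "extractions": [
        {
            "extraction_class": "medication",
            "extraction_text": "amoxicillin",
            "attributes": {"type": "antibiotic"},
            "start_char": 25,
            "end_char": 35,
        }
    ],
    "metadata": {"model_id": "gemini-2.5-flash"},
}


def count_by_class(result):
    """Count how many extractions fall under each extraction class."""
    return Counter(item["extraction_class"] for item in result["extractions"])


def to_jsonl(result):
    """Serialize each extraction as one JSON line tagged with its document id."""
    lines = []
    for item in result["extractions"]:
        record = {"document_id": result["document_id"], **item}
        lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines)


print(count_by_class(result))
print(to_jsonl(result))
```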
## Use Cases
LangExtract MCP Server supports a wide range of use cases across multiple domains. In healthcare and life sciences, it can extract medications, dosages, and treatment protocols from clinical notes, structure radiology and pathology reports, and process research papers or clinical trial data. For legal and compliance applications, it enables extraction of contract terms, parties, and obligations, as well as analysis of regulatory documents, compliance reports, and case law. In research and academia, the server is useful for extracting methodologies, findings, and citations from papers, analyzing survey responses and interview transcripts, and processing historical or archival materials. For business intelligence, it helps extract insights from customer feedback and reviews, analyze news articles and market reports, and process financial documents and earnings reports.
## Support and Documentation
**Primary Resources:**
- [LangExtract Documentation](https://github.com/google/langextract) - Core library reference
- [FastMCP Documentation](https://gofastmcp.com/) - MCP server framework
- [Model Context Protocol](https://modelcontextprotocol.io/) - Protocol specification
Raw data
{
"_id": null,
"home_page": null,
"name": "iflow-mcp_langextract-mcp",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "ai, fastmcp, langextract, llm, mcp, nlp, text-extraction",
"author": "Larsen Weigle",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/de/c0/c5fdaa9b04910dc03f3b32bb3debe344de1fec9d1b301f200a3cd744db68/iflow_mcp_langextract_mcp-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "FastMCP server for Google's langextract library - extract structured information from unstructured text using LLMs",
"version": "0.1.1",
"project_urls": {
"Documentation": "https://github.com/your-org/langextract-mcp/blob/main/README.md",
"Homepage": "https://github.com/your-org/langextract-mcp",
"Issues": "https://github.com/your-org/langextract-mcp/issues",
"Repository": "https://github.com/your-org/langextract-mcp"
},
"split_keywords": [
"ai",
" fastmcp",
" langextract",
" llm",
" mcp",
" nlp",
" text-extraction"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "71c37f5f58666a566c640eefab91327863b2b3f1c0135917592d84a9d5476251",
"md5": "4d9538b5a574585f07968d2fa2aa4f88",
"sha256": "a02504d69f85ffec23d728062ddaf62cd70707fb2ae75f81b469bfe3a32ae441"
},
"downloads": -1,
"filename": "iflow_mcp_langextract_mcp-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4d9538b5a574585f07968d2fa2aa4f88",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 18882,
"upload_time": "2025-08-26T08:42:34",
"upload_time_iso_8601": "2025-08-26T08:42:34.062416Z",
"url": "https://files.pythonhosted.org/packages/71/c3/7f5f58666a566c640eefab91327863b2b3f1c0135917592d84a9d5476251/iflow_mcp_langextract_mcp-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "dec0c5fdaa9b04910dc03f3b32bb3debe344de1fec9d1b301f200a3cd744db68",
"md5": "d6cb573cd5ff10a15fe742af7a7de776",
"sha256": "ad9808aae5fb3314400356953ca1e1b2e27c501b437af5b8ccf675d4304f8743"
},
"downloads": -1,
"filename": "iflow_mcp_langextract_mcp-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "d6cb573cd5ff10a15fe742af7a7de776",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 20913,
"upload_time": "2025-08-26T08:42:35",
"upload_time_iso_8601": "2025-08-26T08:42:35.362945Z",
"url": "https://files.pythonhosted.org/packages/de/c0/c5fdaa9b04910dc03f3b32bb3debe344de1fec9d1b301f200a3cd744db68/iflow_mcp_langextract_mcp-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-26 08:42:35",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "your-org",
"github_project": "langextract-mcp",
"github_not_found": true,
"lcname": "iflow-mcp_langextract-mcp"
}