medlitanno

Name	medlitanno
Version	1.1.1
home_page	https://github.com/chenxingqiang/medlitanno
Summary	Medical Literature Analysis and Annotation System with LLM-powered automation
upload_time	2025-08-14 09:35:20
maintainer	None
docs_url	None
author	Chen Xingqiang
requires_python	>=3.9
license	MIT
keywords	medical literature, annotation, pubmed search, mendelian randomization, llm, biomedical nlp, causal inference, gwas, automation, literature mining
VCS
bugtrack_url
requirements	pandas openpyxl openai requests beautifulsoup4 biopython numpy scipy dataclasses-json pathlib2 typing-extensions reportlab PyPDF2 rpy2 streamlit plotly pytest pytest-cov black flake8 mypy tqdm colorama python-dotenv
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# MedLitAnno: Medical Literature Annotation System

[![GitHub](https://img.shields.io/github/license/chenxingqiang/medlitanno)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.9%2B-blue)](https://www.python.org/)
[![PyPI](https://img.shields.io/badge/pypi-v1.1.1-blue)](https://pypi.org/project/medlitanno/)
[![CI](https://img.shields.io/badge/CI-passing-brightgreen)](https://github.com/chenxingqiang/medlitanno/actions)

MedLitAnno is a comprehensive medical literature analysis platform that combines automated annotation, PubMed search integration, and causal knowledge discovery. Extract structured information about bacteria-disease relationships from scientific texts, search and annotate PubMed literature automatically, and discover causal relationships through Mendelian Randomization (MR) analysis.

## 🌟 Features

### 🔍 PubMed Literature Search & Annotation
- **Direct PubMed Integration**: Search medical literature using keywords, diseases, bacteria, or recent publications
- **Automated Annotation Pipeline**: Seamlessly combine literature search with LLM-powered annotation
- **Multiple Search Strategies**: Basic search, disease-bacteria relationships, recent articles, keyword combinations
- **Excel Export**: Save search results with comprehensive metadata and citation information
- **Rate-Limited API Access**: Compliant with PubMed guidelines for responsible usage

### 📝 Advanced Medical Literature Annotation
- **Multi-model Support**: Use OpenAI, DeepSeek, DeepSeek Reasoner, or Qianwen models
- **Automatic Position Matching**: Intelligent text position calculation with 100% success rate
- **Smart Content Recognition**: LLM focuses on content identification while system handles positioning
- **Robust Processing**: Breakpoint resume and error retry mechanisms for network stability
- **Comprehensive Annotation**: Entity recognition, relation extraction, evidence detection
- **Batch Processing**: Process entire directories of Excel files with progress monitoring
- **Format Conversion**: Export to Label Studio compatible format

### MRAgent: Causal Knowledge Discovery
- **Automated Literature Analysis**: Scans scientific literature to discover potential exposure-outcome pairs
- **Causal Inference**: Performs Mendelian Randomization using GWAS data
- **Knowledge Discovery Mode**: Autonomously identifies potential causal factors for diseases
- **Causal Validation Mode**: Validates specific causal hypotheses
- **GWAS Integration**: Seamless integration with OpenGWAS database

## 🚀 Installation

### From PyPI (Recommended)

```bash
pip install medlitanno
```

### From Source

```bash
# Clone the repository
git clone https://github.com/chenxingqiang/medlitanno.git
cd medlitanno

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install the package
pip install -e .
```

## ⚙️ Configuration

### API Keys and Environment Setup

Set your API keys and configuration as environment variables:

```bash
# For LLM models
export DEEPSEEK_API_KEY="your-deepseek-api-key"
export QIANWEN_API_KEY="your-qianwen-api-key"
export OPENAI_API_KEY="your-openai-api-key"  # Optional

# For PubMed search (required for literature search)
export PUBMED_EMAIL="your_email@example.com"  # Required by PubMed API
export PUBMED_TOOL="medlitanno"              # Tool identifier

# For MR analysis (optional)
export OPENGWAS_JWT="your-opengwas-jwt-token"
```

### Configuration File

You can also create a `.env` file in your project directory:

```bash
# Copy the example configuration
cp config/env.example .env
# Edit .env with your actual API keys and settings
```
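
Before starting a long job, it can help to confirm that the variables listed above are actually set. The following is a minimal sketch using only the standard library. The helper name `missing_settings` and the grouping of variables by feature are assumptions made for illustration; they are not part of the medlitanno API.

```python
import os

# Variable names come from the configuration section above; the grouping by
# feature is an assumption made for this sketch.
REQUIRED_BY_FEATURE = {
    "search": ["PUBMED_EMAIL"],
    "annotate": ["DEEPSEEK_API_KEY"],
    "mr": ["OPENAI_API_KEY", "OPENGWAS_JWT"],
}


def missing_settings(feature, env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_BY_FEATURE[feature] if not env.get(name)]


if __name__ == "__main__":
    for feature in REQUIRED_BY_FEATURE:
        missing = missing_settings(feature)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{feature}: {status}")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.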

## 📊 Usage

### 🔍 PubMed Literature Search

#### Command Line Interface

```bash
# Search PubMed and automatically annotate results
medlitanno search "Helicobacter pylori gastric cancer" --max-results 50

# Search for disease-bacteria relationships
medlitanno search "diabetes microbiome" --disease "diabetes" --bacteria "gut bacteria"

# Search recent publications (last 30 days)
medlitanno search "COVID-19 microbiome" --recent-days 30

# Search and save results to Excel (without annotation)
medlitanno search "inflammatory bowel disease" --output-dir ./results --max-results 100
```

#### Python API

```python
from medlitanno.pubmed import PubMedSearcher, search_and_annotate
import os

# Initialize PubMed searcher
searcher = PubMedSearcher(
    email=os.environ.get("PUBMED_EMAIL"),
    tool="medlitanno"
)

# Search for articles
results = searcher.search("Helicobacter pylori gastric cancer", max_results=50)
print(f"Found {len(results.articles)} articles")

# Search and automatically annotate
search_and_annotate(
    query="microbiome inflammatory disease",
    api_key=os.environ.get("DEEPSEEK_API_KEY"),
    model="deepseek-chat",
    max_results=20,
    output_dir="./results"
)
```

### 📝 Medical Literature Annotation

#### Command Line Interface

```bash
# Annotate medical literature
medlitanno annotate --data-dir datatrain --model deepseek-chat

# Use DeepSeek Reasoner for enhanced inference
medlitanno annotate --data-dir datatrain --model deepseek-reasoner --model-type deepseek
```

#### Python API

```python
from medlitanno.annotation import MedicalAnnotationLLM
import os

# Initialize the annotator with automatic position matching
annotator = MedicalAnnotationLLM(
    api_key=os.environ.get("DEEPSEEK_API_KEY"),
    model="deepseek-chat",
    model_type="deepseek"
)

# Annotate text with automatic position calculation
text = "Helicobacter pylori infection is associated with gastric cancer."
result = annotator.annotate_text(text)

# Print results with position information
print(f"Entities: {result.entities}")
for entity in result.entities:
    print(f"  - {entity.text} ({entity.label}): pos {entity.start_pos}-{entity.end_pos}, confidence: {entity.confidence:.2f}")

print(f"Relations: {result.relations}")
print(f"Evidences: {result.evidences}")
for evidence in result.evidences:
    print(f"  - {evidence.text}: pos {evidence.start_pos}-{evidence.end_pos}, confidence: {evidence.confidence:.2f}")
```

### MRAgent: Causal Knowledge Discovery

#### Command Line Interface

```bash
# Knowledge Discovery mode
medlitanno mr --outcome "back pain" --model gpt-4o

# Causal Validation mode
medlitanno mr --exposure "osteoarthritis" --outcome "back pain" --mode causal
```

#### Python API

```python
from medlitanno.mragent import MRAgent, MRAgentOE
import os

# Knowledge Discovery mode
agent = MRAgent(
    outcome="back pain",
    AI_key=os.environ.get("OPENAI_API_KEY"),
    LLM_model="gpt-4o",
    gwas_token=os.environ.get("OPENGWAS_JWT")
)
agent.run()

# Causal Validation mode
agent_oe = MRAgentOE(
    exposure="osteoarthritis",
    outcome="back pain",
    AI_key=os.environ.get("OPENAI_API_KEY"),
    LLM_model="gpt-4o",
    gwas_token=os.environ.get("OPENGWAS_JWT")
)
agent_oe.run()
```

## 📄 Output Format

### PubMed Search Results

PubMed search provides:

1. **Article Metadata**: Title, abstract, authors, publication date, journal
2. **Citation Information**: PMID, DOI, publication details
3. **Search Statistics**: Total results, query details, search timestamp
4. **Excel Export**: Structured data export for further analysis

### Annotation System

The annotation system extracts structured information with automatic position matching:

1. **Entities**: Bacteria and Disease mentions with precise text positions
2. **Relations**: Connections between entities with relation types
3. **Evidences**: Text spans supporting the relations with confidence scores
4. **Position Statistics**: Success rates and confidence metrics for quality assessment
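
Entities that carry character positions map naturally onto Label Studio's span format, the export target listed under Features. The sketch below builds one Label Studio task with pre-annotations from `(start, end, label)` tuples. The function name `to_label_studio_task` and the default `from_name`/`to_name` values are assumptions: they must match the control and object names in your own Label Studio labeling configuration, and this is not necessarily how medlitanno's exporter is written.

```python
import json


def to_label_studio_task(text, entities, from_name="label", to_name="text"):
    """Convert (start, end, label) spans into one Label Studio task dict."""
    results = [
        {
            "from_name": from_name,  # name of the <Labels> control (assumed)
            "to_name": to_name,      # name of the <Text> object (assumed)
            "type": "labels",
            "value": {
                "start": start,
                "end": end,
                "text": text[start:end],
                "labels": [label],
            },
        }
        for start, end, label in entities
    ]
    return {"data": {"text": text}, "predictions": [{"result": results}]}


if __name__ == "__main__":
    sentence = "Helicobacter pylori infection is associated with gastric cancer."
    task = to_label_studio_task(sentence, [(0, 19, "Bacteria"), (49, 63, "Disease")])
    print(json.dumps(task, indent=2))
```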

#### Relation Types

- `contributes_to`: Bacteria contributes to disease development
- `ameliorates`: Bacteria improves or alleviates disease
- `correlated_with`: Bacteria and disease show correlation
- `biomarker_for`: Bacteria serves as a biomarker for disease

#### Position Matching Features

- **100% Success Rate**: Intelligent matching strategies ensure reliable position detection
- **Multiple Strategies**: Exact, case-insensitive, normalized, fuzzy, and partial matching
- **Confidence Scoring**: Average confidence >0.8 for quality assessment
- **Automatic Fallback**: Progressive matching strategies for robust results
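
The progressive fallback described above can be sketched with the standard library alone. This example tries exact, case-insensitive, whitespace-normalized, and fuzzy matching in that order, and returns the first hit together with the strategy that produced it. It is an illustration of the idea rather than medlitanno's implementation: the function name `find_span`, the 0.8 similarity cutoff, and the word-window approach to fuzzy matching are all assumptions, and partial matching is left out for brevity.

```python
import difflib
import re


def find_span(text, mention, fuzzy_cutoff=0.8):
    """Return (start, end, strategy) for mention in text, or None."""
    words = mention.split()
    if not words:
        return None

    # 1. Exact match.
    start = text.find(mention)
    if start != -1:
        return start, start + len(mention), "exact"

    # 2. Case-insensitive match.
    found = re.search(re.escape(mention), text, flags=re.IGNORECASE)
    if found:
        return found.start(), found.end(), "case_insensitive"

    # 3. Normalized match: also ignore differences in whitespace runs.
    pattern = r"\s+".join(re.escape(word) for word in words)
    found = re.search(pattern, text, flags=re.IGNORECASE)
    if found:
        return found.start(), found.end(), "normalized"

    # 4. Fuzzy match: score every window with the same number of words.
    tokens = list(re.finditer(r"\S+", text))
    best_ratio, best_span = 0.0, None
    for i in range(len(tokens) - len(words) + 1):
        start, end = tokens[i].start(), tokens[i + len(words) - 1].end()
        ratio = difflib.SequenceMatcher(
            None, mention.lower(), text[start:end].lower()
        ).ratio()
        if ratio > best_ratio:
            best_ratio, best_span = ratio, (start, end)
    if best_span is not None and best_ratio >= fuzzy_cutoff:
        return best_span[0], best_span[1], "fuzzy"
    return None
```

Returning the strategy name alongside the span lets a caller assign lower confidence to matches found by the later, looser strategies.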

### MRAgent System

MRAgent provides:

1. **Literature Analysis**: Summary of relevant scientific papers
2. **Potential Exposures**: List of potential causal factors
3. **MR Results**: Statistical evidence for causal relationships
4. **Visualizations**: Forest plots and other visual representations
5. **Recommendations**: Insights for further research
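
For readers unfamiliar with what an MR result contains, the sketch below computes the standard fixed-effect inverse-variance weighted (IVW) estimate from per-variant summary statistics. It is shown only to make the statistic concrete. Whether MRAgent reports exactly this estimator is an assumption, the function name `ivw_estimate` is hypothetical, and the numbers in the usage example are made up for illustration.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    Each argument is a sequence with one value per genetic variant:
    variant-exposure effect, variant-outcome effect, and the standard
    error of the variant-outcome effect.
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)


if __name__ == "__main__":
    # Illustrative values only, not real GWAS data.
    estimate, std_err = ivw_estimate([0.1, 0.2], [0.05, 0.10], [0.01, 0.02])
    print(f"IVW estimate: {estimate:.3f} (SE {std_err:.3f})")
```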

## 🚀 Performance

### Literature Search
- **PubMed Integration**: Real-time search with rate limiting (3 requests/second)
- **Search Speed**: ~2-5 seconds per query (depends on result count)
- **Result Processing**: Handles thousands of articles efficiently
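
The 3 requests/second figure matches the limit NCBI documents for E-utilities clients without an API key. A simple way to stay under such a limit is to space calls by a minimum interval, as in the sketch below. The class name `RateLimiter` and its interface are assumptions for illustration and are not taken from the medlitanno source.

```python
import time


class RateLimiter:
    """Space calls at least 1 / max_per_second seconds apart."""

    def __init__(self, max_per_second=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request may be sent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.min_interval
        return delay


if __name__ == "__main__":
    limiter = RateLimiter(max_per_second=3.0)
    for i in range(5):
        limiter.wait()  # call this immediately before each HTTP request
        print(f"request {i} at {time.monotonic():.2f}")
```

The `clock` and `sleep` parameters exist so the limiter can be tested with a fake clock instead of real waiting.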

### Annotation System
- **Processing Speed**: ~30-60 seconds per document (depends on model and text length)
- **Position Matching**: 100% success rate with <1 second processing per document
- **Batch Processing**: Optimized for large-scale literature analysis
- **Accuracy**: Comparable to manual annotation in controlled tests

### MR Analysis
- **Literature Processing**: Handles hundreds of articles and GWAS datasets efficiently
- **Causal Discovery**: Automated analysis of complex exposure-outcome relationships

## 💪 Stability & Reliability

### Network Resilience
- **Automatic Retry**: Smart retry mechanisms for network instability
- **Rate Limiting**: Compliant with API guidelines and rate limits
- **Connection Recovery**: Robust handling of network interruptions
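
A common way to implement automatic retry is exponential backoff with a little random jitter. The sketch below is a generic version of that pattern. The function name `retry`, the default of four attempts, and the choice of retryable exception types are assumptions, not medlitanno's actual settings.

```python
import random
import time


def retry(func, attempts=4, base_delay=1.0,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func(); on a listed exception wait and try again.

    The wait grows as base_delay * 2**n, plus up to 10% of base_delay
    as jitter. The last failure is re-raised unchanged.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            jitter = random.uniform(0, base_delay / 10)
            sleep(base_delay * 2 ** attempt + jitter)
```

Only exceptions listed in `retry_on` trigger another attempt, so programming errors such as `TypeError` still surface immediately.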

### Processing Reliability
- **Breakpoint Resume**: Automatically continues from the last processed file
- **Error Recovery**: Graceful handling of parsing and processing errors
- **Progress Monitoring**: Real-time tracking with detailed statistics
- **Data Validation**: Comprehensive validation of results and outputs
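
Breakpoint resume is typically built on a small checkpoint file that records which inputs are finished. The sketch below shows one way to do this with a JSON file. The function name `process_with_resume` and the checkpoint layout are assumptions for illustration; medlitanno's own checkpoint format may differ.

```python
import json
from pathlib import Path


def process_with_resume(files, process, checkpoint_path):
    """Process each file once, recording finished names in a JSON checkpoint."""
    checkpoint = Path(checkpoint_path)
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    for name in files:
        if name in done:
            continue  # finished in an earlier run
        process(name)
        done.add(name)
        # Save after every file so an interruption loses at most one item.
        checkpoint.write_text(json.dumps(sorted(done)))
    return done
```

Because a name is recorded only after `process` returns, a file that fails partway through is retried on the next run.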

## 📋 Project Structure

```
medlitanno/
├── src/                    # Source code
│   └── medlitanno/         # Main package
│       ├── annotation/     # Annotation system with position matching
│       ├── pubmed/         # PubMed search integration
│       ├── common/         # Shared utilities and base classes
│       └── mragent/        # MR analysis (optional, requires biopython)
├── docs/                   # Documentation
│   ├── PUBMED_SEARCH_GUIDE.md  # PubMed search usage guide
│   ├── README_annotation.md    # Annotation system documentation
│   └── ...
├── examples/               # Example scripts and demos
│   ├── pubmed_search_demo.py   # PubMed search examples
│   ├── position_matching_demo.py # Position matching examples
│   └── ...
├── tests/                  # Unit tests
├── scripts/                # Utility scripts
├── config/                 # Configuration files
│   ├── env.example         # Environment configuration template
│   └── requirements.txt    # Dependencies
├── CHANGELOG.md            # Version history
└── ...
```

## 📚 Documentation

- **[PubMed Search Guide](docs/PUBMED_SEARCH_GUIDE.md)**: Complete guide for literature search functionality
- **[Annotation Documentation](docs/README_annotation.md)**: Detailed annotation system documentation
- **[Setup Guide](docs/SETUP.md)**: Installation and configuration instructions
- **[Examples](examples/)**: Working examples and demo scripts

## 🔄 Version History

See [CHANGELOG.md](CHANGELOG.md) for detailed version history and feature updates.

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

### Development Setup

```bash
# Clone the repository
git clone https://github.com/chenxingqiang/medlitanno.git
cd medlitanno

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest tests/
```

## 📜 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 📧 Contact

For questions or feedback, please contact [joy66777@gmail.com](mailto:joy66777@gmail.com).

## 🙏 Acknowledgments

- **[MRAgent](https://github.com/xuwei1997/MRAgent)**: Innovative automated agent for causal knowledge discovery via Mendelian Randomization
- **[PyMed](https://github.com/gijswobben/pymed)**: Python library for PubMed API access
- **[OpenGWAS](https://gwas.mrcieu.ac.uk/)**: GWAS summary data for causal inference

---

**Latest Version**: v1.1.1, with PubMed search integration and automatic position matching.

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/chenxingqiang/medlitanno",
    "name": "medlitanno",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "medical literature, annotation, pubmed search, mendelian randomization, llm, biomedical nlp, causal inference, gwas, automation, literature mining",
    "author": "Chen Xingqiang",
    "author_email": "Chen Xingqiang <chenxingqiang@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/04/c5/5542fa77e58aca420ec82e2daf703ee295afe9eda6e02eb86c966342782f/medlitanno-1.1.1.tar.gz",
    "platform": null,
    "description": "# MedLitAnno: Medical Literature Annotation System\n\n[![GitHub](https://img.shields.io/github/license/chenxingqiang/medlitanno)](LICENSE)\n[![Python](https://img.shields.io/badge/python-3.8%2B-blue)](https://www.python.org/)\n[![PyPI](https://img.shields.io/badge/pypi-v1.1.0-blue)](https://pypi.org/project/medlitanno/)\n[![CI](https://img.shields.io/badge/CI-passing-brightgreen)](https://github.com/chenxingqiang/medlitanno/actions)\n\nMedLitAnno is a comprehensive medical literature analysis platform that combines automated annotation, PubMed search integration, and causal knowledge discovery. Extract structured information about bacteria-disease relationships from scientific texts, search and annotate PubMed literature automatically, and discover causal relationships through Mendelian Randomization (MR) analysis.\n\n## \ud83c\udf1f Features\n\n### \ud83d\udd0d PubMed Literature Search & Annotation\n- **Direct PubMed Integration**: Search medical literature using keywords, diseases, bacteria, or recent publications\n- **Automated Annotation Pipeline**: Seamlessly combine literature search with LLM-powered annotation\n- **Multiple Search Strategies**: Basic search, disease-bacteria relationships, recent articles, keyword combinations\n- **Excel Export**: Save search results with comprehensive metadata and citation information\n- **Rate-Limited API Access**: Compliant with PubMed guidelines for responsible usage\n\n### \ud83d\udcdd Advanced Medical Literature Annotation\n- **Multi-model Support**: Use OpenAI, DeepSeek, DeepSeek Reasoner, or Qianwen models\n- **Automatic Position Matching**: Intelligent text position calculation with 100% success rate\n- **Smart Content Recognition**: LLM focuses on content identification while system handles positioning\n- **Robust Processing**: Breakpoint resume and error retry mechanisms for network stability\n- **Comprehensive Annotation**: Entity recognition, relation extraction, evidence detection\n- **Batch 
Processing**: Process entire directories of Excel files with progress monitoring\n- **Format Conversion**: Export to Label Studio compatible format\n\n### MRAgent: Causal Knowledge Discovery\n- **Automated Literature Analysis**: Scans scientific literature to discover potential exposure-outcome pairs\n- **Causal Inference**: Performs Mendelian Randomization using GWAS data\n- **Knowledge Discovery Mode**: Autonomously identifies potential causal factors for diseases\n- **Causal Validation Mode**: Validates specific causal hypotheses\n- **GWAS Integration**: Seamless integration with OpenGWAS database\n\n## \ud83d\ude80 Installation\n\n### From PyPI (Recommended)\n\n```bash\npip install medlitanno\n```\n\n### From Source\n\n```bash\n# Clone the repository\ngit clone https://github.com/chenxingqiang/medlitanno.git\ncd medlitanno\n\n# Create and activate virtual environment\npython -m venv venv\nsource venv/bin/activate  # On Windows: venv\\Scripts\\activate\n\n# Install the package\npip install -e .\n```\n\n## \u2699\ufe0f Configuration\n\n### API Keys and Environment Setup\n\nSet your API keys and configuration as environment variables:\n\n```bash\n# For LLM models\nexport DEEPSEEK_API_KEY=\"your-deepseek-api-key\"\nexport QIANWEN_API_KEY=\"your-qianwen-api-key\"\nexport OPENAI_API_KEY=\"your-openai-api-key\"  # Optional\n\n# For PubMed search (required for literature search)\nexport PUBMED_EMAIL=\"your_email@example.com\"  # Required by PubMed API\nexport PUBMED_TOOL=\"medlitanno\"              # Tool identifier\n\n# For MR analysis (optional)\nexport OPENGWAS_JWT=\"your-opengwas-jwt-token\"\n```\n\n### Configuration File\n\nYou can also create a `.env` file in your project directory:\n\n```bash\n# Copy the example configuration\ncp config/env.example .env\n# Edit .env with your actual API keys and settings\n```\n\n## \ud83d\udcca Usage\n\n### \ud83d\udd0d PubMed Literature Search\n\n#### Command Line Interface\n\n```bash\n# Search PubMed and automatically annotate 
results\nmedlitanno search \"Helicobacter pylori gastric cancer\" --max-results 50\n\n# Search for disease-bacteria relationships\nmedlitanno search \"diabetes microbiome\" --disease \"diabetes\" --bacteria \"gut bacteria\"\n\n# Search recent publications (last 30 days)\nmedlitanno search \"COVID-19 microbiome\" --recent-days 30\n\n# Search and save results to Excel (without annotation)\nmedlitanno search \"inflammatory bowel disease\" --output-dir ./results --max-results 100\n```\n\n#### Python API\n\n```python\nfrom medlitanno.pubmed import PubMedSearcher, search_and_annotate\nimport os\n\n# Initialize PubMed searcher\nsearcher = PubMedSearcher(\n    email=os.environ.get(\"PUBMED_EMAIL\"),\n    tool=\"medlitanno\"\n)\n\n# Search for articles\nresults = searcher.search(\"Helicobacter pylori gastric cancer\", max_results=50)\nprint(f\"Found {len(results.articles)} articles\")\n\n# Search and automatically annotate\nsearch_and_annotate(\n    query=\"microbiome inflammatory disease\",\n    api_key=os.environ.get(\"DEEPSEEK_API_KEY\"),\n    model=\"deepseek-chat\",\n    max_results=20,\n    output_dir=\"./results\"\n)\n```\n\n### \ud83d\udcdd Medical Literature Annotation\n\n#### Command Line Interface\n\n```bash\n# Annotate medical literature\nmedlitanno annotate --data-dir datatrain --model deepseek-chat\n\n# Use DeepSeek Reasoner for enhanced inference\nmedlitanno annotate --data-dir datatrain --model deepseek-reasoner --model-type deepseek\n```\n\n#### Python API\n\n```python\nfrom medlitanno.annotation import MedicalAnnotationLLM\nimport os\n\n# Initialize the annotator with automatic position matching\nannotator = MedicalAnnotationLLM(\n    api_key=os.environ.get(\"DEEPSEEK_API_KEY\"),\n    model=\"deepseek-chat\",\n    model_type=\"deepseek\"\n)\n\n# Annotate text with automatic position calculation\ntext = \"Helicobacter pylori infection is associated with gastric cancer.\"\nresult = annotator.annotate_text(text)\n\n# Print results with position 
information\nprint(f\"Entities: {result.entities}\")\nfor entity in result.entities:\n    print(f\"  - {entity.text} ({entity.label}): pos {entity.start_pos}-{entity.end_pos}, confidence: {entity.confidence:.2f}\")\n\nprint(f\"Relations: {result.relations}\")\nprint(f\"Evidences: {result.evidences}\")\nfor evidence in result.evidences:\n    print(f\"  - {evidence.text}: pos {evidence.start_pos}-{evidence.end_pos}, confidence: {evidence.confidence:.2f}\")\n```\n\n### MRAgent: Causal Knowledge Discovery\n\n#### Command Line Interface\n\n```bash\n# Knowledge Discovery mode\nmedlitanno mr --outcome \"back pain\" --model gpt-4o\n\n# Causal Validation mode\nmedlitanno mr --exposure \"osteoarthritis\" --outcome \"back pain\" --mode causal\n```\n\n#### Python API\n\n```python\nfrom medlitanno.mragent import MRAgent, MRAgentOE\nimport os\n\n# Knowledge Discovery mode\nagent = MRAgent(\n    outcome=\"back pain\",\n    AI_key=os.environ.get(\"OPENAI_API_KEY\"),\n    LLM_model=\"gpt-4o\",\n    gwas_token=os.environ.get(\"OPENGWAS_JWT\")\n)\nagent.run()\n\n# Causal Validation mode\nagent_oe = MRAgentOE(\n    exposure=\"osteoarthritis\",\n    outcome=\"back pain\",\n    AI_key=os.environ.get(\"OPENAI_API_KEY\"),\n    LLM_model=\"gpt-4o\",\n    gwas_token=os.environ.get(\"OPENGWAS_JWT\")\n)\nagent_oe.run()\n```\n\n## \ud83d\udcc4 Output Format\n\n### PubMed Search Results\n\nPubMed search provides:\n\n1. **Article Metadata**: Title, abstract, authors, publication date, journal\n2. **Citation Information**: PMID, DOI, publication details\n3. **Search Statistics**: Total results, query details, search timestamp\n4. **Excel Export**: Structured data export for further analysis\n\n### Annotation System\n\nThe annotation system extracts structured information with automatic position matching:\n\n1. **Entities**: Bacteria and Disease mentions with precise text positions\n2. **Relations**: Connections between entities with relation types\n3. 
**Evidences**: Text spans supporting the relations with confidence scores\n4. **Position Statistics**: Success rates and confidence metrics for quality assessment\n\n#### Relation Types\n\n- `contributes_to`: Bacteria contributes to disease development\n- `ameliorates`: Bacteria improves or alleviates disease\n- `correlated_with`: Bacteria and disease show correlation\n- `biomarker_for`: Bacteria serves as a biomarker for disease\n\n#### Position Matching Features\n\n- **100% Success Rate**: Intelligent matching strategies ensure reliable position detection\n- **Multiple Strategies**: Exact, case-insensitive, normalized, fuzzy, and partial matching\n- **Confidence Scoring**: Average confidence >0.8 for quality assessment\n- **Automatic Fallback**: Progressive matching strategies for robust results\n\n### MRAgent System\n\nMRAgent provides:\n\n1. **Literature Analysis**: Summary of relevant scientific papers\n2. **Potential Exposures**: List of potential causal factors\n3. **MR Results**: Statistical evidence for causal relationships\n4. **Visualizations**: Forest plots and other visual representations\n5. 
**Recommendations**: Insights for further research\n\n## \ud83d\ude80 Performance\n\n### Literature Search\n- **PubMed Integration**: Real-time search with rate limiting (3 requests/second)\n- **Search Speed**: ~2-5 seconds per query (depends on result count)\n- **Result Processing**: Handles thousands of articles efficiently\n\n### Annotation System\n- **Processing Speed**: ~30-60 seconds per document (depends on model and text length)\n- **Position Matching**: 100% success rate with <1 second processing per document\n- **Batch Processing**: Optimized for large-scale literature analysis\n- **Accuracy**: Comparable to manual annotation in controlled tests\n\n### MR Analysis\n- **Literature Processing**: Handles hundreds of articles and GWAS datasets efficiently\n- **Causal Discovery**: Automated analysis of complex exposure-outcome relationships\n\n## \ud83d\udcaa Stability & Reliability\n\n### Network Resilience\n- **Automatic Retry**: Smart retry mechanisms for network instability\n- **Rate Limiting**: Compliant with API guidelines and rate limits\n- **Connection Recovery**: Robust handling of network interruptions\n\n### Processing Reliability\n- **Breakpoint Resume**: Automatically continues from the last processed file\n- **Error Recovery**: Graceful handling of parsing and processing errors\n- **Progress Monitoring**: Real-time tracking with detailed statistics\n- **Data Validation**: Comprehensive validation of results and outputs\n\n## \ud83d\udccb Project Structure\n\n```\nmedlitanno/\n\u251c\u2500\u2500 src/                    # Source code\n\u2502   \u2514\u2500\u2500 medlitanno/         # Main package\n\u2502       \u251c\u2500\u2500 annotation/     # Annotation system with position matching\n\u2502       \u251c\u2500\u2500 pubmed/         # PubMed search integration\n\u2502       \u251c\u2500\u2500 common/         # Shared utilities and base classes\n\u2502       \u2514\u2500\u2500 mragent/        # MR analysis (optional, requires 
biopython)\n\u251c\u2500\u2500 docs/                   # Documentation\n\u2502   \u251c\u2500\u2500 PUBMED_SEARCH_GUIDE.md  # PubMed search usage guide\n\u2502   \u251c\u2500\u2500 README_annotation.md    # Annotation system documentation\n\u2502   \u2514\u2500\u2500 ...\n\u251c\u2500\u2500 examples/               # Example scripts and demos\n\u2502   \u251c\u2500\u2500 pubmed_search_demo.py   # PubMed search examples\n\u2502   \u251c\u2500\u2500 position_matching_demo.py # Position matching examples\n\u2502   \u2514\u2500\u2500 ...\n\u251c\u2500\u2500 tests/                  # Unit tests\n\u251c\u2500\u2500 scripts/                # Utility scripts\n\u251c\u2500\u2500 config/                 # Configuration files\n\u2502   \u251c\u2500\u2500 env.example         # Environment configuration template\n\u2502   \u2514\u2500\u2500 requirements.txt    # Dependencies\n\u251c\u2500\u2500 CHANGELOG.md            # Version history\n\u2514\u2500\u2500 ...\n```\n\n## \ud83d\udcda Documentation\n\n- **[PubMed Search Guide](docs/PUBMED_SEARCH_GUIDE.md)**: Complete guide for literature search functionality\n- **[Annotation Documentation](docs/README_annotation.md)**: Detailed annotation system documentation\n- **[Setup Guide](docs/SETUP.md)**: Installation and configuration instructions\n- **[Examples](examples/)**: Working examples and demo scripts\n\n## \ud83d\udd04 Version History\n\nSee [CHANGELOG.md](CHANGELOG.md) for detailed version history and feature updates.\n\n## \ud83e\udd1d Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request. 
For major changes, please open an issue first to discuss what you would like to change.\n\n### Development Setup\n\n```bash\n# Clone the repository\ngit clone https://github.com/chenxingqiang/medlitanno.git\ncd medlitanno\n\n# Install in development mode\npip install -e \".[dev]\"\n\n# Run tests\npytest tests/\n```\n\n## \ud83d\udcdc License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## \ud83d\udce7 Contact\n\nFor questions or feedback, please contact [joy66777@gmail.com](mailto:joy66777@gmail.com).\n\n## \ud83d\ude4f Acknowledgments\n\n- **[MRAgent](https://github.com/xuwei1997/MRAgent)**: Innovative automated agent for causal knowledge discovery via Mendelian Randomization\n- **[PyMed](https://github.com/gijswobben/pymed)**: Python library for PubMed API access\n- **[OpenGWAS](https://gwas.mrcieu.ac.uk/)**: GWAS summary data for causal inference\n\n---\n\n**Latest Version**: v1.1.0 - Now with PubMed search integration and automatic position matching! \n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Medical Literature Analysis and Annotation System with LLM-powered automation",
    "version": "1.1.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/chenxingqiang/medlitanno/issues",
        "Documentation": "https://github.com/chenxingqiang/medlitanno/blob/main/docs/",
        "Homepage": "https://github.com/chenxingqiang/medlitanno",
        "Source Code": "https://github.com/chenxingqiang/medlitanno"
    },
    "split_keywords": [
        "medical literature",
        " annotation",
        " pubmed search",
        " mendelian randomization",
        " llm",
        " biomedical nlp",
        " causal inference",
        " gwas",
        " automation",
        " literature mining"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "30200a57822e8264c4a0196c00a6b10e582e15d316357719062f4b7ca85c2c4d",
                "md5": "634f374e9af647d03e008e30ecc8af28",
                "sha256": "d02f2eab331460c0296b7c888bfdb4ded7ddd35ac5651603aa5c7724e43b0146"
            },
            "downloads": -1,
            "filename": "medlitanno-1.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "634f374e9af647d03e008e30ecc8af28",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 1671340,
            "upload_time": "2025-08-14T09:35:18",
            "upload_time_iso_8601": "2025-08-14T09:35:18.164557Z",
            "url": "https://files.pythonhosted.org/packages/30/20/0a57822e8264c4a0196c00a6b10e582e15d316357719062f4b7ca85c2c4d/medlitanno-1.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "04c55542fa77e58aca420ec82e2daf703ee295afe9eda6e02eb86c966342782f",
                "md5": "495c416dc83690bbab2899ac8c00e2dd",
                "sha256": "7aeae60d27918cecdadbb26ba335d001323fb80d723ba36d3b41c20932a5276d"
            },
            "downloads": -1,
            "filename": "medlitanno-1.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "495c416dc83690bbab2899ac8c00e2dd",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 1624512,
            "upload_time": "2025-08-14T09:35:20",
            "upload_time_iso_8601": "2025-08-14T09:35:20.794873Z",
            "url": "https://files.pythonhosted.org/packages/04/c5/5542fa77e58aca420ec82e2daf703ee295afe9eda6e02eb86c966342782f/medlitanno-1.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-14 09:35:20",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "chenxingqiang",
    "github_project": "medlitanno",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.5.0"
                ]
            ]
        },
        {
            "name": "openpyxl",
            "specs": [
                [
                    ">=",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "openai",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    ">=",
                    "2.28.0"
                ]
            ]
        },
        {
            "name": "beautifulsoup4",
            "specs": [
                [
                    ">=",
                    "4.11.0"
                ]
            ]
        },
        {
            "name": "biopython",
            "specs": [
                [
                    ">=",
                    "1.81"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.24.0"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    ">=",
                    "1.10.0"
                ]
            ]
        },
        {
            "name": "dataclasses-json",
            "specs": [
                [
                    ">=",
                    "0.5.0"
                ]
            ]
        },
        {
            "name": "pathlib2",
            "specs": [
                [
                    ">=",
                    "2.3.0"
                ]
            ]
        },
        {
            "name": "typing-extensions",
            "specs": [
                [
                    ">=",
                    "4.5.0"
                ]
            ]
        },
        {
            "name": "reportlab",
            "specs": [
                [
                    ">=",
                    "4.0.0"
                ]
            ]
        },
        {
            "name": "PyPDF2",
            "specs": [
                [
                    ">=",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "rpy2",
            "specs": [
                [
                    ">=",
                    "3.5.0"
                ]
            ]
        },
        {
            "name": "streamlit",
            "specs": [
                [
                    ">=",
                    "1.28.0"
                ]
            ]
        },
        {
            "name": "plotly",
            "specs": [
                [
                    ">=",
                    "5.17.0"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    ">=",
                    "7.0.0"
                ]
            ]
        },
        {
            "name": "pytest-cov",
            "specs": [
                [
                    ">=",
                    "4.0.0"
                ]
            ]
        },
        {
            "name": "black",
            "specs": [
                [
                    ">=",
                    "23.0.0"
                ]
            ]
        },
        {
            "name": "flake8",
            "specs": [
                [
                    ">=",
                    "6.0.0"
                ]
            ]
        },
        {
            "name": "mypy",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    ">=",
                    "4.65.0"
                ]
            ]
        },
        {
            "name": "colorama",
            "specs": [
                [
                    ">=",
                    "0.4.6"
                ]
            ]
        },
        {
            "name": "python-dotenv",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        }
    ],
    "lcname": "medlitanno"
}
