biodisco


- **Name**: biodisco
- **Version**: 0.1.0
- **Home page**: https://github.com/yujingke/BioDisco
- **Summary**: AI-powered Biomedical Discovery Agent System
- **Upload time**: 2025-07-30 13:43:08
- **Maintainer**: None
- **Docs URL**: None
- **Author**: BioDisco Team
- **Requires Python**: >=3.11
- **License**: None
- **Keywords**: biomedical, ai, agents, hypothesis generation, literature mining, knowledge graph
- **Requirements**: langroid, transformers, neo4j, autogen, faiss-cpu, sentence-transformers, python-dotenv

# BioDisco 🧬🤖

**AI-powered Biomedical Discovery Agent System**

BioDisco is a comprehensive framework for scientific hypothesis generation and biomedical literature and knowledge graph mining using AI agents. It leverages multiple AI agents to automatically discover patterns, generate hypotheses, and gather supporting evidence from biomedical literature and knowledge databases.

## 🌟 Features

- **Multi-Agent AI System**: Coordinated AI agents for different aspects of scientific discovery
- **Hypothesis Generation**: Automated generation of novel biomedical hypotheses
- **Literature Mining**: Intelligent PubMed search and literature analysis
- **Knowledge Graph Integration**: Neo4j-based knowledge graph for storing and querying biomedical entities
- **Evidence Collection**: Systematic gathering and linking of supporting evidence
- **Simple Python Interface**: Easy-to-use API for scientific discovery

## 🚀 Quick Start

### Installation

```bash
pip install biodisco
```

### Basic Usage

BioDisco provides a simple interface for biomedical discovery:

```python
import BioDisco

# Simple disease-based discovery
results = BioDisco.generate("Role of GPR153 in vascular injury and disease")

```

**Set your OpenAI API key in the environment variable `OPENAI_API_KEY`.**

In your terminal:
```bash
export OPENAI_API_KEY=your_openai_api_key_here
```

Or create a `.env` file in your project directory (see `.env.example`):
```bash
# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here
```
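
A small pre-flight check can catch a missing key before any agent runs. The sketch below uses only the standard library; `missing_env_vars` is a hypothetical helper, not part of BioDisco, and it assumes the variable has already been exported or loaded from `.env` (for example with `python-dotenv`, which is listed in the package requirements).

```python
import os


def missing_env_vars(names, env=None):
    """Return the names in `names` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


# An explicit mapping stands in for the real environment in this demo.
print(missing_env_vars(["OPENAI_API_KEY"], env={"OPENAI_API_KEY": ""}))
# prints ['OPENAI_API_KEY']
```

In a real script, call `missing_env_vars(["OPENAI_API_KEY"])` with no `env` argument and stop early if the returned list is not empty.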


### Development Installation

```bash
git clone https://github.com/yujingke/BioDisco.git
cd BioDisco
pip install -e .
```

## 🔧 PubMed and Knowledge Graph Integration

PubMed and knowledge graph integration are both off by default. Follow the steps below to enable them.

### PubMed Setup

Set the environment variable `DISABLE_PUBMED=False`, either in your `.env` file or with the `export` command,

**or**

pass the argument directly to the `generate` function:

```python
# Turn on PubMed integration
results = BioDisco.generate("Role of GPR153 in vascular injury and disease", disable_pubmed=False)
```
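
Environment variables are always strings, and in Python the string `"False"` is truthy, so a flag such as `DISABLE_PUBMED=False` has to be parsed explicitly. How BioDisco itself reads the flag is not documented here; the helper below (`env_flag`, a hypothetical name) is only a sketch of one common convention, useful if you want to mirror the setting in your own scripts.

```python
import os

_FALSE_VALUES = {"0", "false", "no", "off", ""}


def env_flag(name, default, env=None):
    """Interpret an environment variable as a boolean.

    Assumed convention (not taken from BioDisco): "0", "false", "no", "off"
    and the empty string mean False; any other value means True.
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() not in _FALSE_VALUES


# Explicit mappings stand in for the real environment here.
print(env_flag("DISABLE_PUBMED", True, env={"DISABLE_PUBMED": "False"}))  # False
print(env_flag("DISABLE_PUBMED", True, env={}))  # True (falls back to the default)
```

The default of `True` in the demo mirrors the README's statement that PubMed integration is off unless enabled.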

### Neo4j Setup

#### 1. Install Neo4j server 

First, install the Neo4j server. Follow the [installation instructions](https://neo4j.com/docs/operations-manual/current/installation/) for your operating system.

#### 2. Add Neo4j login details as environment variables

```bash
export NEO4J_URI=bolt://localhost:7687
export NEO4J_USER=neo4j
export NEO4J_PASSWORD=your_neo4j_password
```
Alternatively, set these variables in your `.env` file (see `.env.example`).
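
Once the server is running, the credentials can be verified independently of BioDisco. This is a minimal sketch using the official `neo4j` Python driver (a listed requirement); `neo4j_settings` and `check_neo4j` are hypothetical helper names, and the fallback values for the URI and user are assumptions copied from the example above.

```python
import os


def neo4j_settings(env=None):
    """Collect the connection settings named in this README from the environment."""
    env = os.environ if env is None else env
    return {
        "uri": env.get("NEO4J_URI", "bolt://localhost:7687"),  # assumed fallback
        "user": env.get("NEO4J_USER", "neo4j"),  # assumed fallback
        "password": env.get("NEO4J_PASSWORD", ""),
    }


def check_neo4j(settings):
    """Raise an exception if the server cannot be reached or the login fails."""
    # Imported lazily so neo4j_settings works without the driver installed.
    from neo4j import GraphDatabase

    auth = (settings["user"], settings["password"])
    with GraphDatabase.driver(settings["uri"], auth=auth) as driver:
        driver.verify_connectivity()
```

Calling `check_neo4j(neo4j_settings())` after `neo4j start` should return silently when the login details are correct.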

#### 3. Download and Set Up PrimeKG

- Download PrimeKG 
```bash
wget -O kg.csv https://dataverse.harvard.edu/api/access/datafile/6180620
```

- Run `split_nodes_edges.py` (the script should be in the same directory as `kg.csv`) to create `nodes.csv` and `edges.csv`

- Run `build_kg_index.py` (the script should be in the same directory as `nodes.csv`)

- Set the environment variable `KG_PATH` to the directory containing these files (see `.env.example`)
```bash
export KG_PATH=/path/to/your/kg_specific_files
```
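
Before importing into Neo4j, it can help to confirm that `KG_PATH` points at a directory that actually holds the generated files. The names of the index files written by `build_kg_index.py` are not listed in this README, so the sketch below checks only `nodes.csv` and `edges.csv`; `missing_kg_files` is a hypothetical helper, not part of BioDisco.

```python
import os
from pathlib import Path


def missing_kg_files(kg_path, required=("nodes.csv", "edges.csv")):
    """Return the required file names that are not present in `kg_path`."""
    base = Path(kg_path)
    return [name for name in required if not (base / name).is_file()]


kg_path = os.environ.get("KG_PATH")
if kg_path is None:
    print("KG_PATH is not set")
else:
    print("Missing files:", missing_kg_files(kg_path) or "none")
```

Pass a longer `required` tuple once you know which index files your setup produces.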

#### Import PrimeKG to Neo4j

```bash
neo4j-admin database import full --nodes nodes.csv --relationships edges.csv --overwrite-destination
```

#### Start Neo4j

```bash
neo4j start
```

#### Turn on PubMed and KG Integration

```python
results = BioDisco.generate("Role of GPR153 in vascular injury and disease", disable_pubmed=False, disable_kg=False)
```

**or**

set the environment variable `DISABLE_KG=False` (see `.env.example`).

<!-- ## 🏗️ Architecture

BioDisco consists of several core components:


### AI Agents
- **KeywordExtractorAgent**: Extracts relevant keywords from text
- **HypothesisQueryAgent**: Generates search queries for hypothesis validation
- **DomainSelectorAgent**: Identifies relevant scientific domains
- **ScientistAgent**: Orchestrates the discovery process

### Data Integration
- **Neo4j Integration**: Graph database for complex biomedical relationships
- **PubMed API**: Automated literature search and retrieval
- **Embedding Models**: Semantic similarity and search capabilities -->

## 📖 Detailed Usage

### Setting number of iterations and PubMed cutoff

```python
import BioDisco

results = BioDisco.generate(
    "Role of GPR153 in vascular injury and disease",
    disable_pubmed=False,
    disable_kg=False,
    n_iterations=3,    # number of discovery iterations
    start_year=2020,   # earliest publication year to search
)
```


<!-- ## 📚 Examples

Check out the `/examples` directory for comprehensive examples:

- `basic_usage.py`: Getting started with core functionality
- `hypothesis_pipeline.ipynb`: Full hypothesis generation pipeline
- `literature_analysis.ipynb`: Advanced literature mining techniques
- `knowledge_graph_demo.ipynb`: Working with biomedical knowledge graphs -->


<!-- ## 📖 Documentation

Full documentation is available at: [https://biodisco.readthedocs.io/](https://biodisco.readthedocs.io/)


## 📚 API Reference

### Main Interface

#### `BioDisco.generate(input_data, **kwargs)`

The primary interface for biomedical discovery.

**Parameters:**
- `input_data` (str or dict): Disease name or structured input
- `genes` (list, optional): List of gene names 
- `background` (str, optional): Background research context
- `start_year` (int, optional): Earliest publication year (default: 2019)
- `min_results` (int, optional): Minimum search results (default: 3)
- `max_results` (int, optional): Maximum search results (default: 10)
- `n_iterations` (int, optional): Number of discovery iterations (default: 3)
- `max_articles_per_round` (int, optional): Articles per iteration (default: 10)
- `node_limit` (int, optional): Knowledge graph node limit (default: 50)
- `direct_edge_limit` (int, optional): Knowledge graph edge limit (default: 30)

**Returns:**
- `dict`: Discovery results with hypotheses, evidence, and analysis

**Examples:**
```python
# Simple usage
results = BioDisco.generate("diabetes")

# With genes
results = BioDisco.generate("cancer", genes=["BRCA1", "BRCA2"])

# Structured input
results = BioDisco.generate({
    "disease": "Alzheimer's", 
    "genes": ["APP", "PSEN1"],
    "background": "Amyloid cascade hypothesis"
})

# Custom parameters
results = BioDisco.generate(
    "multiple sclerosis",
    n_iterations=5,
    max_results=20,
    start_year=2020
)
```

### Advanced Functions

- `run_biodisco_full(disease, core_genes, **params)`: Full pipeline with background generation
- `run_full_pipeline(background, **params)`: Evidence-focused pipeline
- `run_background_only(disease, core_genes, **params)`: Generate research background only -->

<!-- ## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request -->

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

<!-- ## 🙏 Acknowledgments

- OpenAI for providing the foundation AI models
- Neo4j for graph database technology
- PubMed/NCBI for biomedical literature access
- The scientific community for open data and collaboration -->

<!-- ## 📧 Contact

- **Team**: BioDisco Team
- **Email**: contact@biodisco.ai
- **Issues**: [GitHub Issues](https://github.com/yourusername/BioDisco/issues) -->

<!-- ## 🔄 Version History

- **v0.1.0**: Initial release with core functionality
  - Basic library system for data management
  - AI agent framework
  - Neo4j and PubMed integration
  - Hypothesis generation and evidence collection -->


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/yujingke/BioDisco",
    "name": "biodisco",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "biomedical, AI, agents, hypothesis generation, literature mining, knowledge graph",
    "author": "BioDisco Team",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/0e/26/64cb58d6af1dbe3d70d2aebf14986d792e7a5ab113e9e8e8c59b19f7534f/biodisco-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "AI-powered Biomedical Discovery Agent System",
    "version": "0.1.0",
    "project_urls": {
        "Bug Reports": "https://github.com/yujingke/BioDisco/issues",
        "Homepage": "https://github.com/yujingke/BioDisco",
        "Source": "https://github.com/yujingke/BioDisco"
    },
    "split_keywords": [
        "biomedical",
        " ai",
        " agents",
        " hypothesis generation",
        " literature mining",
        " knowledge graph"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "cb60fbe131f40c6d65d83eda4451d1720cd9ebadb7ee02cf3720ce6a3702f153",
                "md5": "6493c8bec88e32b1e78224b2fd682bce",
                "sha256": "3dde5c34a40296b1dd53980ffcb434a1c03ecb6c1da6463ba90fa065a958520d"
            },
            "downloads": -1,
            "filename": "biodisco-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6493c8bec88e32b1e78224b2fd682bce",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 34214,
            "upload_time": "2025-07-30T13:43:06",
            "upload_time_iso_8601": "2025-07-30T13:43:06.797558Z",
            "url": "https://files.pythonhosted.org/packages/cb/60/fbe131f40c6d65d83eda4451d1720cd9ebadb7ee02cf3720ce6a3702f153/biodisco-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0e2664cb58d6af1dbe3d70d2aebf14986d792e7a5ab113e9e8e8c59b19f7534f",
                "md5": "5a0713ef22ef31fcc737a37e13caa410",
                "sha256": "6d2d81ca6ef493dedc7a4209acb35b6d8b1bee9e775a2172a0bb2bb839fba676"
            },
            "downloads": -1,
            "filename": "biodisco-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "5a0713ef22ef31fcc737a37e13caa410",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 32094,
            "upload_time": "2025-07-30T13:43:08",
            "upload_time_iso_8601": "2025-07-30T13:43:08.041207Z",
            "url": "https://files.pythonhosted.org/packages/0e/26/64cb58d6af1dbe3d70d2aebf14986d792e7a5ab113e9e8e8c59b19f7534f/biodisco-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-30 13:43:08",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yujingke",
    "github_project": "BioDisco",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "langroid",
            "specs": [
                [
                    ">=",
                    "0.58.0"
                ]
            ]
        },
        {
            "name": "transformers",
            "specs": [
                [
                    ">=",
                    "4.53.0"
                ]
            ]
        },
        {
            "name": "neo4j",
            "specs": [
                [
                    ">=",
                    "5.28.0"
                ]
            ]
        },
        {
            "name": "autogen",
            "specs": [
                [
                    ">=",
                    "0.9.6"
                ]
            ]
        },
        {
            "name": "faiss-cpu",
            "specs": [
                [
                    ">=",
                    "1.11.0"
                ]
            ]
        },
        {
            "name": "sentence-transformers",
            "specs": [
                [
                    ">=",
                    "5.0.0"
                ]
            ]
        },
        {
            "name": "python-dotenv",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        }
    ],
    "lcname": "biodisco"
}
        