biomapper

Name	biomapper
Version	0.5.0
home_page	https://github.com/arpanauts/biomapper
Summary	A unified Python toolkit for biological data harmonization and ontology mapping
upload_time	2025-03-21 20:35:02
maintainer	None
docs_url	None
author	Trent Leslie
requires_python	<4.0,>=3.11
license	MIT
keywords	bioinformatics ontology data mapping biological data standardization harmonization

# biomapper

A unified Python toolkit for biological data harmonization and ontology mapping. `biomapper` provides a single interface for standardizing identifiers and mapping between various biological ontologies, making multi-omic data integration more accessible and reproducible.

[Documentation](https://biomapper.readthedocs.io/) | [Examples](examples/) | [Contributing](CONTRIBUTING.md)

## Features

### Core Functionality
- **ID Standardization**: Unified interface for standardizing biological identifiers
- **Ontology Mapping**: Ontology mapping backed by major biological databases (ChEBI, UniChem, UniProt, RefMet) and Retrieval-Augmented Generation (RAG)
- **Data Validation**: Robust validation of input data and mappings
- **Extensible Architecture**: Easy integration of new data sources and mapping services

### Supported Systems

#### ID Standardization Tools
- RaMP-DB: Integration with the Relational Database of Metabolomics Pathways (RaMP-DB) for metabolites and pathways

#### Mapping Services
- ChEBI: Chemical Entities of Biological Interest database integration
- UniChem: Cross-referencing of chemical structure identifiers
- UniProt: Protein-focused mapping capabilities
- RefMet: Reference list of metabolite names and identifiers
- RAG-Based Mapping: AI-powered mapping using Retrieval-Augmented Generation (RAG)
- Multi-Provider RAG: Combining multiple data sources for improved mapping accuracy
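
To illustrate the general idea of combining several mapping services behind one interface, the sketch below queries a list of providers in order and returns the first hit. It is a conceptual illustration only: `Mapper`, `DictMapper`, and `first_match` are invented names and do not reflect biomapper's actual API, and the in-memory tables with placeholder identifiers merely stand in for real services.

```python
from typing import Optional, Protocol


class Mapper(Protocol):
    """Hypothetical interface: map a name to an identifier, or None."""

    def map_name(self, name: str) -> Optional[str]: ...


class DictMapper:
    """Toy provider backed by an in-memory table (stands in for a service)."""

    def __init__(self, table: dict[str, str]) -> None:
        # Normalize keys once so lookups are case-insensitive.
        self._table = {key.lower(): value for key, value in table.items()}

    def map_name(self, name: str) -> Optional[str]:
        return self._table.get(name.strip().lower())


def first_match(name: str, providers: list[Mapper]) -> Optional[str]:
    """Query providers in order and return the first non-None result."""
    for provider in providers:
        result = provider.map_name(name)
        if result is not None:
            return result
    return None


# Placeholder identifiers, not real database accessions.
primary = DictMapper({"compound a": "EX:0001"})
fallback = DictMapper({"compound a": "ALT:9001", "compound b": "ALT:9002"})

print(first_match("Compound A", [primary, fallback]))  # EX:0001
print(first_match("compound b", [primary, fallback]))  # ALT:9002
print(first_match("unknown", [primary, fallback]))     # None
```

Ordering the providers expresses priority: a more trusted source goes first, and later sources are consulted only when earlier ones return nothing.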

## Installation

### Using pip
```bash
pip install biomapper
```
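
After installing, a quick check can confirm that the interpreter satisfies the package's declared Python requirement (`<4.0,>=3.11`) and that the distribution is visible in the current environment. The following is a minimal standard-library sketch; the helper names `python_is_supported` and `installed_version` are illustrative and not part of biomapper.

```python
import sys
from importlib import metadata
from typing import Optional


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter version satisfies >=3.11,<4.0."""
    return (3, 11) <= tuple(version_info[:2]) < (4, 0)


def installed_version(dist_name: str = "biomapper") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("biomapper version:", installed_version() or "not installed")
```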

### Development Setup

1. Clone the repository:
```bash
git clone https://github.com/arpanauts/biomapper.git
cd biomapper
```

2. Install Python 3.11 with pyenv (if not already installed). The commands below assume a Debian/Ubuntu system (`apt-get`) and a bash shell (`~/.bashrc`):
```bash
# Install pyenv dependencies
sudo apt-get update
sudo apt-get install -y make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev \
libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python3-openssl

# Install pyenv
curl https://pyenv.run | bash

# Add to your shell configuration
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc

# Reload shell configuration
source ~/.bashrc

# Install Python 3.11
pyenv install 3.11
pyenv local 3.11
```

3. Install Poetry (if not already installed):
```bash
curl -sSL https://install.python-poetry.org | python3 -
```

4. Install project dependencies:
```bash
poetry install
```

5. Set up pre-commit hooks:
```bash
poetry run pre-commit install
```
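
Before working through the setup steps, it can help to check which of the command-line tools they rely on are already available. The sketch below uses `shutil.which` from the standard library; the helper name `missing_tools` is illustrative, and the tool list simply mirrors the commands used above (`pre-commit` is run through Poetry, so it is not checked separately).

```python
import shutil


def missing_tools(tools):
    """Return the subset of command names that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Commands used in the development setup steps.
    required = ["git", "pyenv", "poetry"]
    missing = missing_tools(required)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All development tools found.")
```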

### Running Examples

The `examples/` directory contains tutorials and utility scripts demonstrating various features of biomapper. Before running examples:

1. Install additional dependencies:
```bash
poetry install --with examples
```

2. Set up environment variables:
```bash
# Create a .env file from the template
cp .env.example .env

# Edit .env with your API keys and configurations
vim .env
```

3. Initialize the vector store:
```bash
# Create the vector store directory
mkdir -p vector_store

# Run the setup script
poetry run python examples/utilities/verify_chromadb_setup.py
```

4. Run an example:
```bash
poetry run python examples/tutorials/tutorial_basic_llm_mapping.py
```

Refer to the [examples documentation](examples/README.md) for detailed information about each example.
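
A `.env` file is conventionally a list of `KEY=VALUE` lines. As a rough illustration of how such a file can be inspected before running an example, here is a minimal standard-library parser. It is a sketch under assumptions: how the examples actually load their configuration is not stated here, and the helper names are hypothetical.

```python
from pathlib import Path


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path=".env"):
    """Return the parsed contents of an env file, or {} if it is missing."""
    env_path = Path(path)
    return parse_env(env_path.read_text()) if env_path.exists() else {}


if __name__ == "__main__":
    settings = load_env_file()
    # Report which keys are set without printing secret values.
    for name in sorted(settings):
        print(f"{name}: {'set' if settings[name] else 'EMPTY'}")
```

Running it from the repository root lists each key and whether it has a value, which makes an unfilled template entry easy to spot.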

## Project Structure

```
biomapper/
├── biomapper/              # Main package source code
├── docs/                   # Documentation
│   ├── source/             # Sphinx documentation source
│   │   ├── api/            # API reference
│   │   ├── guides/         # User guides
│   │   └── tutorials/      # Tutorial documentation
│   └── technical_notes/    # Technical design notes
├── examples/               # Example scripts and tutorials
│   ├── tutorials/          # Step-by-step tutorials
│   └── utilities/          # Utility scripts
├── tests/                  # Test suite
├── poetry.lock             # Poetry dependency lock file
├── pyproject.toml          # Project configuration
└── README.md               # This file
```

## Documentation

Full documentation is available at [biomapper.readthedocs.io](https://biomapper.readthedocs.io/), including:

- Getting Started Guide
- API Reference
- Tutorials
- Architecture Overview
- Example Scripts



## Contributing

Contributions are welcome! Please see our [Contributing Guide](CONTRIBUTING.md) for details.


## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.



## Roadmap

- [x] Initial release with core functionality
- [x] Implement RAG-based mapping capabilities
- [x] Add support for major chemical/biological databases (ChEBI, UniChem, UniProt)
- [ ] Enhance RAG-based mapping with additional data sources
- [ ] Improve compound name normalization
- [ ] Add support for pathway databases (KEGG, Reactome)
- [ ] Expand test coverage and documentation
- [ ] Add more example workflows and use cases


## Project Structure

```
biomapper/
├── biomapper/               # Main package directory
│   ├── core/                # Core functionality
│   │   ├── metadata.py      # Metadata handling
│   │   └── validators.py    # Data validation
│   ├── standardization/     # ID standardization components
│   ├── mapping/             # Ontology mapping components
│   ├── utils/               # Utility functions
│   └── schemas/             # Data schemas and models
├── tests/                   # Test files
├── docs/                    # Documentation
├── scripts/                 # Utility scripts
├── pyproject.toml           # Poetry configuration and dependencies
└── poetry.lock              # Lock file for dependencies
```

## Support

For support, please open an issue in the GitHub issue tracker.

## Roadmap

- [ ] Add caching layer for improved performance
- [ ] Expand RAG capabilities with more specialized models
- [ ] Add batch processing capabilities
- [ ] Develop REST API interface

## Acknowledgments

- [RaMP-DB](http://rampdb.org/)
- [ChEBI](https://www.ebi.ac.uk/chebi/)
- [UniChem](https://www.ebi.ac.uk/unichem/)
- [UniProt](https://www.uniprot.org/)
- [RefMet](https://refmet.metabolomicsworkbench.org/)

Raw data

{
    "_id": null,
    "home_page": "https://github.com/arpanauts/biomapper",
    "name": "biomapper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.11",
    "maintainer_email": null,
    "keywords": "bioinformatics, ontology, data mapping, biological data, standardization, harmonization",
    "author": "Trent Leslie",
    "author_email": "trent.leslie@phenomehealth.org",
    "download_url": "https://files.pythonhosted.org/packages/aa/61/34869699119430d766454425a234fc6749d99e208f5bafa350a2223100ed/biomapper-0.5.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A unified Python toolkit for biological data harmonization and ontology mapping",
    "version": "0.5.0",
    "project_urls": {
        "Documentation": "https://biomapper.readthedocs.io/",
        "Homepage": "https://github.com/arpanauts/biomapper",
        "Repository": "https://github.com/arpanauts/biomapper"
    },
    "split_keywords": [
        "bioinformatics",
        " ontology",
        " data mapping",
        " biological data",
        " standardization",
        " harmonization"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cebb16aaf393d4ba1914fccbaab81f47370f41f4f5a36e896ba7b95e0109ac19",
                "md5": "ed94ee233f2c166748805a49f624200b",
                "sha256": "accdb0f5414911b975236ec59e4de10cdc07a773bfab00089c8f8b91cea21372"
            },
            "downloads": -1,
            "filename": "biomapper-0.5.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ed94ee233f2c166748805a49f624200b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.11",
            "size": 100291,
            "upload_time": "2025-03-21T20:35:01",
            "upload_time_iso_8601": "2025-03-21T20:35:01.335447Z",
            "url": "https://files.pythonhosted.org/packages/ce/bb/16aaf393d4ba1914fccbaab81f47370f41f4f5a36e896ba7b95e0109ac19/biomapper-0.5.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "aa6134869699119430d766454425a234fc6749d99e208f5bafa350a2223100ed",
                "md5": "51d7f9818d2286686e7f661ab20f44a5",
                "sha256": "d0cab34a350eeca65cdf8ff3c5173fff1a528cb02ed745750810bfb5533491ec"
            },
            "downloads": -1,
            "filename": "biomapper-0.5.0.tar.gz",
            "has_sig": false,
            "md5_digest": "51d7f9818d2286686e7f661ab20f44a5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.11",
            "size": 72042,
            "upload_time": "2025-03-21T20:35:02",
            "upload_time_iso_8601": "2025-03-21T20:35:02.537778Z",
            "url": "https://files.pythonhosted.org/packages/aa/61/34869699119430d766454425a234fc6749d99e208f5bafa350a2223100ed/biomapper-0.5.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-03-21 20:35:02",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "arpanauts",
    "github_project": "biomapper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "biomapper"
}
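
The `digests` entries in the metadata above can be used to check a downloaded archive. The sketch below computes a file's SHA-256 with `hashlib` and compares it to the `sha256` value listed for `biomapper-0.5.0.tar.gz`; the helper names and the local file path are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA-256 equals the expected hex digest."""
    return sha256_of(path) == expected_hex.lower()


if __name__ == "__main__":
    # sha256 of biomapper-0.5.0.tar.gz as listed in the metadata above.
    expected = "d0cab34a350eeca65cdf8ff3c5173fff1a528cb02ed745750810bfb5533491ec"
    archive = Path("biomapper-0.5.0.tar.gz")  # assumed local download path
    if archive.exists():
        print("digest ok:", matches_digest(archive, expected))
    else:
        print("archive not found; download it first")
```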
