# biomapper
A unified Python toolkit for biological data harmonization and ontology mapping. `biomapper` provides a single interface for standardizing identifiers and mapping between various biological ontologies, making multi-omic data integration more accessible and reproducible.
## Features
### Core Functionality
- **ID Standardization**: Unified interface for standardizing biological identifiers
- **Ontology Mapping**: Mapping between ontologies using major biological databases (ChEBI, UniChem, UniProt, RefMet) and AI-powered techniques such as retrieval-augmented generation (RAG)
- **Data Validation**: Validation of input data and of the resulting mappings
- **Extensible Architecture**: Easy integration of new data sources and mapping services
### Supported Systems
#### ID Standardization Tools
- RaMP-DB: Integration with RaMP-DB (Relational database of Metabolomic Pathways) for metabolite and pathway queries
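RaMP-DB queries in the Quick Start pass HMDB accessions with a lowercase `hmdb:` prefix (`hmdb:HMDB0000064`). To illustrate what ID standardization involves, here is a minimal sketch that normalizes common HMDB spellings into that form. The function `to_ramp_hmdb_id` is hypothetical and not part of biomapper's API. It assumes that older 5-digit HMDB accessions (e.g. `HMDB00064`) should be zero-padded to the current 7-digit format.

```python
import re

# Hypothetical helper (not part of biomapper): normalizes HMDB accessions to the
# prefixed, 7-digit form used in the Quick Start RaMP-DB example.
_HMDB_PATTERN = re.compile(r"^(?:hmdb:)?HMDB(\d+)$", re.IGNORECASE)


def to_ramp_hmdb_id(raw: str) -> str:
    """Return an ID like 'hmdb:HMDB0000064' for any accepted HMDB spelling."""
    match = _HMDB_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"not an HMDB accession: {raw!r}")
    number = int(match.group(1))
    if number > 9_999_999:
        raise ValueError(f"HMDB number too long for the 7-digit format: {raw!r}")
    # Zero-pad to 7 digits so legacy 5-digit IDs (HMDB00064) match current ones.
    return f"hmdb:HMDB{number:07d}"


print(to_ramp_hmdb_id("HMDB00064"))  # hmdb:HMDB0000064
```

Normalizing before querying means the same metabolite is always looked up under one key, regardless of how the input file spelled it.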
#### Mapping Services
- ChEBI: Chemical Entities of Biological Interest database integration
- UniChem: Cross-referencing of chemical structure identifiers
- UniProt: Protein-focused mapping capabilities
- RefMet: Reference list of metabolite names and identifiers
- RAG-Based Mapping: AI-powered mapping using Retrieval-Augmented Generation (RAG)
- Multi-Provider RAG: Combining multiple data sources for improved mapping accuracy
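This README does not describe how biomapper combines providers. One common technique for merging ranked candidate lists from several sources is reciprocal rank fusion (RRF). It is sketched below as a swapped-in technique, not biomapper's implementation, assuming each provider returns candidate IDs ordered best first. The provider names and candidate IDs in the example are placeholders.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked candidate lists into a single ranking.

    rankings: mapping of provider name -> list of candidate IDs, best first.
    k: damping constant that limits the weight of top ranks; 60 is a
       commonly used default.
    """
    scores = defaultdict(float)
    for candidates in rankings.values():
        for rank, candidate in enumerate(candidates, start=1):
            scores[candidate] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Placeholder providers and candidates, for illustration only.
fused = reciprocal_rank_fusion({
    "provider_1": ["A", "B", "C"],
    "provider_2": ["B", "A"],
    "provider_3": ["B"],
})
print([candidate for candidate, _ in fused])  # ['B', 'A', 'C']
```

RRF only needs ranks, not comparable scores, which makes it convenient when providers report confidence on different scales.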
## Installation
### Using pip
Requires Python 3.11 or later (below 4.0).
```bash
pip install biomapper
```
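The package declares `requires_python` as `<4.0,>=3.11`. Here is a minimal sketch, using only the standard library, that checks the interpreter against those bounds and reports the installed `biomapper` version. The helper names are hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Bounds from the package's published requirement: >=3.11,<4.0.
MIN_PYTHON = (3, 11)
MAX_PYTHON_EXCLUSIVE = (4, 0)


def python_supported(info=None) -> bool:
    """Check a (major, minor, ...) tuple, defaulting to the running interpreter."""
    current = tuple(sys.version_info[:2]) if info is None else tuple(info[:2])
    return MIN_PYTHON <= current < MAX_PYTHON_EXCLUSIVE


def installed_biomapper_version():
    """Return the installed biomapper version string, or None if not installed."""
    try:
        return version("biomapper")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("biomapper version:", installed_biomapper_version())
```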
### Development Setup
1. Install Python 3.11 with pyenv (if not already installed). The package commands below are for Debian/Ubuntu (`apt-get`):
```bash
# Install pyenv dependencies
sudo apt-get update
sudo apt-get install -y make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev \
libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python3-openssl
# Install pyenv
curl https://pyenv.run | bash
# Add to your shell configuration
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc
# Reload shell configuration
source ~/.bashrc
# Install Python 3.11
pyenv install 3.11.7
```
2. Install Poetry (if not already installed):
```bash
curl -sSL https://install.python-poetry.org | python3 -
# Add Poetry to your PATH
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
```
3. Clone and set up the project:
```bash
git clone https://github.com/arpanauts/biomapper.git
cd biomapper
# Pin Python 3.11.7 for this project directory
pyenv local 3.11.7
# Make Poetry create its virtual environment with that interpreter
poetry env use "$(pyenv which python)"
# Install dependencies with Poetry
poetry install
```
## Quick Start
```python
from biomapper.mapping import MetaboliteNameMapper, RagMapper, UniProtFocusedMapper
from biomapper.standardization import RaMPClient

# Example 1: UniProt-focused mapping
uniprot_mapper = UniProtFocusedMapper()
protein_mapping = uniprot_mapper.map_identifier("P12345")

# Example 2: Metabolite name mapping
metabolite_mapper = MetaboliteNameMapper()
metabolite_mapping = metabolite_mapper.map_name("glucose")

# Example 3: RaMP-DB
ramp_client = RaMPClient()
# Get database versions
versions = ramp_client.get_source_versions()
# Get pathways for metabolites, e.g. Creatine (HMDB0000064)
pathways = ramp_client.get_pathways_from_analytes(["hmdb:HMDB0000064"])

# Example 4: RAG-based mapping
rag_mapper = RagMapper()
rag_results = rag_mapper.map_name("alpha-D-glucose")
```
## Development
### Using Poetry
```bash
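# Activate the virtual environment (Poetry 1.x; on Poetry 2.x install the
# poetry-plugin-shell plugin first, or run `eval $(poetry env activate)`)
poetry shell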
# Run a command in the virtual environment
poetry run python script.py
# Add a new dependency
poetry add package-name
# Add a development dependency
poetry add --group dev package-name
# Update dependencies
poetry update
# Show currently installed packages
poetry show
# Build the package
poetry build
```
### Running Tests
```bash
# Run tests
poetry run pytest
# Run tests with coverage
poetry run pytest --cov=biomapper
```
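Tests that depend on remote services such as RaMP-DB can be isolated from the network with `unittest.mock`. The sketch below could live in a file such as `tests/test_ramp_helpers.py` (a hypothetical name), where `poetry run pytest` would collect it. The helper `pathway_names` and its assumed response shape (a list of dicts with a `pathwayName` key) are illustrative assumptions, not documented `RaMPClient` behavior.

```python
from unittest.mock import MagicMock


def pathway_names(client, hmdb_id: str) -> list[str]:
    """Hypothetical helper: pathway names for one HMDB ID via a RaMP client.

    Assumes the client returns an iterable of dicts with a "pathwayName" key;
    that response shape is an assumption, not documented biomapper behavior.
    """
    return [row["pathwayName"] for row in client.get_pathways_from_analytes([hmdb_id])]


def test_pathway_names_uses_prefixed_id():
    # MagicMock stands in for RaMPClient, so the test never touches the network.
    client = MagicMock()
    client.get_pathways_from_analytes.return_value = [{"pathwayName": "Pathway A"}]

    assert pathway_names(client, "hmdb:HMDB0000064") == ["Pathway A"]
    client.get_pathways_from_analytes.assert_called_once_with(["hmdb:HMDB0000064"])
```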
### Code Quality
```bash
# Format code with black
poetry run black .
# Run linting
poetry run flake8 .
# Type checking
poetry run mypy .
```
## Project Structure
```
biomapper/
├── biomapper/              # Main package directory
│   ├── core/               # Core functionality
│   │   ├── metadata.py     # Metadata handling
│   │   └── validators.py   # Data validation
│   ├── standardization/    # ID standardization components
│   ├── mapping/            # Ontology mapping components
│   ├── utils/              # Utility functions
│   └── schemas/            # Data schemas and models
├── tests/                  # Test files
├── docs/                   # Documentation
├── scripts/                # Utility scripts
├── pyproject.toml          # Poetry configuration and dependencies
└── poetry.lock             # Lock file for dependencies
```
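This README does not show the contents of `core/validators.py`. As a hedged sketch of the kind of syntactic check a validator can run before a UniProt lookup, the hypothetical `is_uniprot_accession` below uses the accession regular expression published in UniProt's documentation. It checks format only, not whether the entry exists.

```python
import re

# UniProtKB accession format as published in UniProt's documentation.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(value: str) -> bool:
    """Return True if value is a syntactically valid UniProtKB accession."""
    # fullmatch rejects strings that merely contain an accession-like substring.
    return UNIPROT_ACCESSION.fullmatch(value.strip()) is not None


print(is_uniprot_accession("P12345"))       # True
print(is_uniprot_accession("HMDB0000064"))  # False
```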
## License
This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.
## Support
For support, please open an issue in the [GitHub issue tracker](https://github.com/arpanauts/biomapper/issues).
## Roadmap
- [x] Initial release with core functionality
- [x] Implement RAG-based mapping capabilities
- [x] Add support for major chemical/biological databases (ChEBI, UniChem, UniProt)
- [ ] Add caching layer for improved performance
- [ ] Expand RAG capabilities with more specialized models
- [ ] Add batch processing capabilities
- [ ] Develop REST API interface
## Acknowledgments
- [RaMP-DB](http://rampdb.org/)
- [ChEBI](https://www.ebi.ac.uk/chebi/)
- [UniChem](https://www.ebi.ac.uk/unichem/)
- [UniProt](https://www.uniprot.org/)
- [RefMet](https://refmet.metabolomicsworkbench.org/)