honeybee-ml


Name: honeybee-ml
Version: 0.1.0
Home page: None
Summary: A Scalable Modular Framework for Multimodal AI in Oncology
Upload time: 2025-10-14 02:14:14
Maintainer: None
Docs URL: None
Author: None
Requires Python: >=3.8
License: None
Keywords: multimodal AI, oncology, cancer research, medical imaging, clinical NLP, machine learning, pathology, radiology, biomedical, healthcare
Requirements: ipykernel, ipywidgets, numpy, pandas, llama_index, pymongo, transformers, torch, torchvision, torchaudio, accelerate, bitsandbytes, pytesseract, pdf2image, PyPDF2, pyarrow, fastparquet, pydicom, opencv-python, matplotlib, langchain, scikit-image, imageio, albumentations, peft, cucim, openslide-python, colour-science, scipy, onnxruntime, SimpleITK, nibabel, timm
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.
<div align="center">
  <img src="website/public/images/logo.png" alt="HoneyBee Logo" width="200">
  
  # HoneyBee
  
  **A Scalable Modular Framework for Multimodal AI in Oncology**
  
  [![arXiv](https://img.shields.io/badge/arXiv-2405.07460-b31b1b.svg)](https://arxiv.org/abs/2405.07460)
  [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
  [![GitHub stars](https://img.shields.io/github/stars/lab-rasool/HoneyBee?style=social)](https://github.com/lab-rasool/HoneyBee/stargazers)
  [![Python](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
  [![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-ee4c2c.svg)](https://pytorch.org/)
  
  [Documentation](https://lab-rasool.github.io/HoneyBee/) | [Paper](https://arxiv.org/abs/2405.07460) | [Examples](examples/) | [Demo](app.py) | [Google Colab](https://colab.research.google.com/)
</div>

## 🚀 Overview

HoneyBee is a comprehensive multimodal AI framework designed specifically for oncology research and clinical applications. It seamlessly integrates and processes diverse medical data types—clinical text, radiology images, pathology slides, and molecular data—through a unified, modular architecture. Built with scalability and extensibility in mind, HoneyBee empowers researchers to develop sophisticated AI models for cancer diagnosis, prognosis, and treatment planning.

> [!WARNING]
> **Alpha Release**: This framework is currently in alpha. APIs may change, and some features are still under development.

## ✨ Key Features

### 🏗️ Modular Architecture
- **3-Layer Design**: Clean separation between data loaders, embedding models, and processors
- **Unified API**: Consistent interface across all modalities
- **Extensible**: Easy to add new models and data sources
- **Production-Ready**: Optimized for both research and clinical deployment
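
The 3-layer separation above boils down to three small interfaces plus a pipeline that chains them. The sketch below is framework-agnostic: the class names are illustrative only, not HoneyBee's published API.

```python
# Framework-agnostic sketch of the 3-layer separation described above.
# These classes are illustrative only; they are not HoneyBee's actual API.
from dataclasses import dataclass
from typing import List, Protocol


class Loader(Protocol):
    def load(self, path: str) -> str: ...           # layer 1: raw data in


class Processor(Protocol):
    def process(self, raw: str) -> str: ...         # layer 2: cleaning / tokenization


class Embedder(Protocol):
    def embed(self, text: str) -> List[float]: ...  # layer 3: vector out


@dataclass
class Pipeline:
    loader: Loader
    processor: Processor
    embedder: Embedder

    def run(self, path: str) -> List[float]:
        # The same three-step flow applies to any modality:
        # load -> preprocess -> embed.
        return self.embedder.embed(self.processor.process(self.loader.load(path)))
```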

### 📊 Comprehensive Data Support

#### Medical Imaging
- **Pathology**: Whole Slide Images (WSI) in SVS and TIFF formats, with tissue detection
- **Radiology**: DICOM and NIfTI processing with 3D support
- **Preprocessing**: Advanced augmentation and normalization pipelines
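
The imaging formats above are read with libraries already listed in `requirements.txt`. The snippet below is a minimal, library-level sketch (file paths are placeholders), not HoneyBee's own loader API:

```python
# Minimal, library-level sketch using packages from requirements.txt
# (pydicom, nibabel, openslide-python). File paths are placeholders.
import pydicom                    # DICOM slices (radiology)
import nibabel as nib             # NIfTI volumes (radiology, 3D)
import openslide                  # Whole Slide Images (pathology: SVS/TIFF)

ct_slice = pydicom.dcmread("slice_0001.dcm").pixel_array   # 2D pixel array
mri_vol = nib.load("scan.nii.gz").get_fdata()              # 3D volume array

wsi = openslide.OpenSlide("tumor_slide.svs")
thumb = wsi.get_thumbnail((1024, 1024))                    # low-res PIL image for tissue detection
patch = wsi.read_region(location=(0, 0), level=0, size=(512, 512))
wsi.close()
```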

#### Clinical Text
- **Document Processing**: PDF support with OCR for scanned documents
- **NLP Pipeline**: Cancer entity extraction, temporal parsing, medical ontology integration
- **Database Integration**: Native [MINDS](https://github.com/lab-rasool/MINDS) format support
- **Long Document Handling**: Multiple tokenization strategies for clinical notes
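
For scanned reports, the OCR path combines `pdf2image` and `pytesseract` from `requirements.txt` (note that `pdf2image` additionally requires the poppler system package). The sketch below shows that path plus one simple chunking strategy for long notes; it illustrates the idea, not HoneyBee's document-processing API:

```python
# Sketch of the OCR path for scanned clinical PDFs (pdf2image + pytesseract).
# The file name is a placeholder; pdf2image needs poppler installed on the system.
from pdf2image import convert_from_path
import pytesseract

pages = convert_from_path("pathology_report.pdf", dpi=300)   # PDF -> list of PIL images
text = "\n".join(pytesseract.image_to_string(p) for p in pages)

# Long notes usually exceed transformer context windows, so split them into
# overlapping word chunks before embedding (one of several possible strategies).
chunk_size, overlap = 512, 64
words = text.split()
chunks = [" ".join(words[i:i + chunk_size])
          for i in range(0, len(words), chunk_size - overlap)]
```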

#### Molecular Data
- **Genomics**: Support for expression data and mutation profiles
- **Integration**: Seamless combination with imaging and clinical data

### 🧠 State-of-the-Art Embedding Models

#### Clinical Text Embeddings
- **GatorTron**: Domain-specific clinical language model
- **BioBERT**: Biomedical text understanding
- **PubMedBERT**: Scientific literature embeddings
- **Clinical-T5**: Text-to-text clinical transformers
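
All of the text encoders above are available through Hugging Face `transformers`. The snippet below shows the generic pattern with the public GatorTron-base checkpoint (swap in a BioBERT or PubMedBERT model ID the same way); it is a plain `transformers` example, not HoneyBee's embedding wrapper:

```python
# Sketch of clinical-text embedding extraction with Hugging Face transformers.
# "UFNLP/gatortron-base" is the public GatorTron checkpoint; adjust as needed.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "UFNLP/gatortron-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

note = "67-year-old female with stage II invasive ductal carcinoma, ER+/HER2-."
inputs = tokenizer(note, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state       # (1, seq_len, hidden_dim)

# Mean-pool over tokens to get one fixed-size vector per note.
mask = inputs["attention_mask"].unsqueeze(-1)
embedding = (hidden * mask).sum(1) / mask.sum(1)     # (1, hidden_dim)
```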

#### Medical Image Embeddings
- **REMEDIS**: Self-supervised medical image representations
- **RadImageNet**: Pre-trained radiological feature extractors
- **UNI**: Universal medical image encoder
- **Custom Models**: Easy integration of proprietary models

### 🛠️ Advanced Capabilities

#### Multimodal Integration
- **Cross-Modal Learning**: Unified representations across modalities
- **Attention Mechanisms**: Interpretable fusion strategies
- **Patient-Level Aggregation**: Comprehensive patient profiles
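
As a toy illustration of late fusion, per-modality embeddings can simply be concatenated into one patient-level vector (the dimensions below are made up; HoneyBee's attention-based fusion strategies are more involved):

```python
# Toy late-fusion example: concatenate per-modality embeddings into a single
# patient-level vector. Dimensions are invented for illustration only.
import torch

clinical_emb = torch.randn(1, 1024)    # e.g. clinical-note embedding
pathology_emb = torch.randn(1, 768)    # e.g. aggregated WSI patch embeddings
radiology_emb = torch.randn(1, 512)    # e.g. CT volume embedding

patient_vector = torch.cat([clinical_emb, pathology_emb, radiology_emb], dim=-1)
print(patient_vector.shape)            # torch.Size([1, 2304])
```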

#### Analysis Tools
- **Survival Analysis**: Cox PH, Random Survival Forest, DeepSurv
- **Classification**: Multi-class cancer type prediction
- **Retrieval**: Similar patient identification
- **Visualization**: Interactive t-SNE dashboards
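
For example, a Cox proportional-hazards model can be fit on embedding-derived features. The sketch below uses the `lifelines` package on synthetic data purely for illustration; `lifelines` is not in `requirements.txt`, and HoneyBee's own survival tooling may use a different backend:

```python
# Illustration only: Cox PH on synthetic embedding-derived features via lifelines
# (install lifelines separately; it is not listed in requirements.txt).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feat_0": rng.normal(size=200),        # e.g. components of a patient embedding
    "feat_1": rng.normal(size=200),
    "duration": rng.exponential(24, 200),  # months of follow-up (synthetic)
    "event": rng.integers(0, 2, 200),      # 1 = event observed, 0 = censored
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event")
cph.print_summary()                        # hazard ratios per feature
```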

#### Clinical Applications
- **Risk Stratification**: Patient outcome prediction
- **Treatment Planning**: Personalized therapy recommendations
- **Biomarker Discovery**: Multi-omic pattern identification

## 🚀 Quick Start

### Prerequisites

- Python 3.8+
- PyTorch 2.0+
- CUDA 11.7+ (optional, for GPU acceleration)

### System Dependencies

```bash
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y openslide-tools tesseract-ocr

# macOS
brew install openslide tesseract

# Windows
# Install from official websites:
# - OpenSlide: https://openslide.org/download/
# - Tesseract: https://github.com/UB-Mannheim/tesseract/wiki
```

### Installation

```bash
# Clone the repository
git clone https://github.com/lab-rasool/HoneyBee.git
cd HoneyBee

# Install dependencies
pip install -r requirements.txt

# Download required NLTK data
python -c "import nltk; nltk.download('punkt')"

# Install HoneyBee in development mode
pip install -e .
```

### Environment Setup

Create a `.env` file in the project root:

```bash
# MINDS database credentials (if using MINDS format)
HOST=your_server
PORT=5433
DB_USER=postgres
PASSWORD=your_password
DATABASE=minds

# HuggingFace API (for some models)
HF_API_KEY=your_huggingface_api_key
```
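
At runtime these variables are typically loaded with `python-dotenv` (a common convention, though it is not pinned in `requirements.txt`) or read directly from the shell environment:

```python
# Sketch of reading the .env values at runtime. python-dotenv is an assumption
# here (pip install python-dotenv); plain os.environ works if the variables are
# exported in the shell instead.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

minds_cfg = {
    "host": os.getenv("HOST"),
    "port": int(os.getenv("PORT", "5433")),
    "user": os.getenv("DB_USER"),
    "password": os.getenv("PASSWORD"),
    "database": os.getenv("DATABASE", "minds"),
}
hf_api_key = os.getenv("HF_API_KEY")
```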

## 🔬 Research Applications

HoneyBee has been successfully applied to:

- **Cancer Subtype Classification**: Automated identification of cancer subtypes from multimodal data
- **Survival Prediction**: Risk stratification and outcome prediction for treatment planning
- **Similar Patient Retrieval**: Finding patients with similar clinical profiles for precision medicine
- **Biomarker Discovery**: Identifying multimodal patterns associated with treatment response

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.

### Development Setup

```bash
# Fork and clone your fork
git clone https://github.com/YOUR_USERNAME/HoneyBee.git
cd HoneyBee

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -r requirements.txt
pip install -e .
```

## 🐛 Known Issues & Limitations

- **Alpha Status**: Some features are still under development
- **Memory Requirements**: WSI processing requires significant RAM (16GB+ recommended)
- **GPU Recommended**: While CPU fallback exists, GPU acceleration significantly improves performance
- **Limited Test Coverage**: Comprehensive test suite is planned for future releases

## 📜 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 📝 Citation

If you use HoneyBee in your research, please cite our paper:

```bibtex
@article{tripathi2024honeybee,
    title={HoneyBee: A Scalable Modular Framework for Creating Multimodal Oncology Datasets with Foundational Embedding Models},
    author={Aakash Tripathi and Asim Waqas and Yasin Yilmaz and Ghulam Rasool},
    journal={arXiv preprint arXiv:2405.07460},
    year={2024},
    eprint={2405.07460},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

---

<div align="center">
  Made with ❤️ by the <a href="https://github.com/lab-rasool">Lab Rasool</a> team
</div>

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "honeybee-ml",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Aakash Tripathi <aakash.tripathi@moffitt.org>",
    "keywords": "multimodal AI, oncology, cancer research, medical imaging, clinical NLP, machine learning, pathology, radiology, biomedical, healthcare",
    "author": null,
    "author_email": "Aakash Tripathi <aakash.tripathi@moffitt.org>, Lab Rasool <ghulam.rasool@moffitt.org>",
    "download_url": "https://files.pythonhosted.org/packages/50/80/d15659ee2b1f83fc67c2b610afcda7c99e8456fbb77f1bb9b1c685830fdf/honeybee_ml-0.1.0.tar.gz",
    "platform": null,
    "description": "<div align=\"center\">\n  <img src=\"website/public/images/logo.png\" alt=\"HoneyBee Logo\" width=\"200\">\n  \n  # HoneyBee\n  \n  **A Scalable Modular Framework for Multimodal AI in Oncology**\n  \n  [![arXiv](https://img.shields.io/badge/arXiv-2405.07460-b31b1b.svg)](https://arxiv.org/abs/2405.07460)\n  [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)\n  [![GitHub stars](https://img.shields.io/github/stars/lab-rasool/HoneyBee?style=social)](https://github.com/lab-rasool/HoneyBee/stargazers)\n  [![Python](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)\n  [![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-ee4c2c.svg)](https://pytorch.org/)\n  \n  [Documentation](https://lab-rasool.github.io/HoneyBee/) | [Paper](https://arxiv.org/abs/2405.07460) | [Examples](examples/) | [Demo](app.py) | [Google Colab](https://colab.research.google.com/)\n</div>\n\n## \ud83d\ude80 Overview\n\nHoneyBee is a comprehensive multimodal AI framework designed specifically for oncology research and clinical applications. It seamlessly integrates and processes diverse medical data types\u2014clinical text, radiology images, pathology slides, and molecular data\u2014through a unified, modular architecture. Built with scalability and extensibility in mind, HoneyBee empowers researchers to develop sophisticated AI models for cancer diagnosis, prognosis, and treatment planning.\n\n> [!WARNING]\n> **Alpha Release**: This framework is currently in alpha. APIs may change, and some features are still under development.\n\n## \u2728 Key Features\n\n### \ud83c\udfd7\ufe0f Modular Architecture\n- **3-Layer Design**: Clean separation between data loaders, embedding models, and processors\n- **Unified API**: Consistent interface across all modalities\n- **Extensible**: Easy to add new models and data sources\n- **Production-Ready**: Optimized for both research and clinical deployment\n\n### \ud83d\udcca Comprehensive Data Support\n\n#### Medical Imaging\n- **Pathology**: Whole Slide Images (WSI) - SVS, TIFF formats with tissue detection\n- **Radiology**: DICOM, NIFTI processing with 3D support\n- **Preprocessing**: Advanced augmentation and normalization pipelines\n\n#### Clinical Text\n- **Document Processing**: PDF support with OCR for scanned documents\n- **NLP Pipeline**: Cancer entity extraction, temporal parsing, medical ontology integration\n- **Database Integration**: Native [MINDS](https://github.com/lab-rasool/MINDS) format support\n- **Long Document Handling**: Multiple tokenization strategies for clinical notes\n\n#### Molecular Data\n- **Genomics**: Support for expression data and mutation profiles\n- **Integration**: Seamless combination with imaging and clinical data\n\n### \ud83e\udde0 State-of-the-Art Embedding Models\n\n#### Clinical Text Embeddings\n- **GatorTron**: Domain-specific clinical language model\n- **BioBERT**: Biomedical text understanding\n- **PubMedBERT**: Scientific literature embeddings\n- **Clinical-T5**: Text-to-text clinical transformers\n\n#### Medical Image Embeddings\n- **REMEDIS**: Self-supervised medical image representations\n- **RadImageNet**: Pre-trained radiological feature extractors\n- **UNI**: Universal medical image encoder\n- **Custom Models**: Easy integration of proprietary models\n\n### \ud83d\udee0\ufe0f Advanced Capabilities\n\n#### Multimodal Integration\n- **Cross-Modal Learning**: Unified representations across modalities\n- **Attention Mechanisms**: Interpretable fusion 
strategies\n- **Patient-Level Aggregation**: Comprehensive patient profiles\n\n#### Analysis Tools\n- **Survival Analysis**: Cox PH, Random Survival Forest, DeepSurv\n- **Classification**: Multi-class cancer type prediction\n- **Retrieval**: Similar patient identification\n- **Visualization**: Interactive t-SNE dashboards\n\n#### Clinical Applications\n- **Risk Stratification**: Patient outcome prediction\n- **Treatment Planning**: Personalized therapy recommendations\n- **Biomarker Discovery**: Multi-omic pattern identification\n\n## \ud83d\ude80 Quick Start\n\n### Prerequisites\n\n- Python 3.8+\n- PyTorch 2.0+\n- CUDA 11.7+ (optional, for GPU acceleration)\n\n### System Dependencies\n\n```bash\n# Ubuntu/Debian\nsudo apt-get update\nsudo apt-get install -y openslide-tools tesseract-ocr\n\n# macOS\nbrew install openslide tesseract\n\n# Windows\n# Install from official websites:\n# - OpenSlide: https://openslide.org/download/\n# - Tesseract: https://github.com/UB-Mannheim/tesseract/wiki\n```\n\n### Installation\n\n```bash\n# Clone the repository\ngit clone https://github.com/lab-rasool/HoneyBee.git\ncd HoneyBee\n\n# Install dependencies\npip install -r requirements.txt\n\n# Download required NLTK data\npython -c \"import nltk; nltk.download('punkt')\"\n\n# Install HoneyBee in development mode\npip install -e .\n```\n\n### Environment Setup\n\nCreate a `.env` file in the project root:\n\n```bash\n# MINDS database credentials (if using MINDS format)\nHOST=your_server\nPORT=5433\nDB_USER=postgres\nPASSWORD=your_password\nDATABASE=minds\n\n# HuggingFace API (for some models)\nHF_API_KEY=your_huggingface_api_key\n```\n\n## \ud83d\udd2c Research Applications\n\nHoneyBee has been successfully applied to:\n\n- **Cancer Subtype Classification**: Automated identification of cancer subtypes from multimodal data\n- **Survival Prediction**: Risk stratification and outcome prediction for treatment planning\n- **Similar Patient Retrieval**: Finding patients with similar clinical profiles for precision medicine\n- **Biomarker Discovery**: Identifying multimodal patterns associated with treatment response\n\n## \ud83e\udd1d Contributing\n\nWe welcome contributions! 
Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.\n\n### Development Setup\n\n```bash\n# Fork and clone your fork\ngit clone https://github.com/YOUR_USERNAME/HoneyBee.git\ncd HoneyBee\n\n# Create a virtual environment\npython -m venv venv\nsource venv/bin/activate  # On Windows: venv\\Scripts\\activate\n\n# Install in development mode\npip install -r requirements.txt\npip install -e .\n```\n\n## \ud83d\udc1b Known Issues & Limitations\n\n- **Alpha Status**: Some features are still under development\n- **Memory Requirements**: WSI processing requires significant RAM (16GB+ recommended)\n- **GPU Recommended**: While CPU fallback exists, GPU acceleration significantly improves performance\n- **Limited Test Coverage**: Comprehensive test suite is planned for future releases\n\n## \ud83d\udcdc License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## \ud83d\udcdd Citation\n\nIf you use HoneyBee in your research, please cite our paper:\n\n```bibtex\n@article{tripathi2024honeybee,\n    title={HoneyBee: A Scalable Modular Framework for Creating Multimodal Oncology Datasets with Foundational Embedding Models},\n    author={Aakash Tripathi and Asim Waqas and Yasin Yilmaz and Ghulam Rasool},\n    journal={arXiv preprint arXiv:2405.07460},\n    year={2024},\n    eprint={2405.07460},\n    archivePrefix={arXiv},\n    primaryClass={cs.LG}\n}\n```\n\n---\n\n<div align=\"center\">\n  Made with \u2764\ufe0f by the <a href=\"https://github.com/lab-rasool\">Lab Rasool</a> team\n</div>\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "A Scalable Modular Framework for Multimodal AI in Oncology",
    "version": "0.1.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/lab-rasool/HoneyBee/issues",
        "Documentation": "https://lab-rasool.github.io/HoneyBee/",
        "Homepage": "https://github.com/lab-rasool/HoneyBee",
        "Paper": "https://arxiv.org/abs/2405.07460",
        "Repository": "https://github.com/lab-rasool/HoneyBee"
    },
    "split_keywords": [
        "multimodal ai",
        " oncology",
        " cancer research",
        " medical imaging",
        " clinical nlp",
        " machine learning",
        " pathology",
        " radiology",
        " biomedical",
        " healthcare"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3269992dffcf350049039cabb3cf2efaa8ca7a77c229a71314b66ee483a9e99b",
                "md5": "fef9c27d74f737c17b8c328bcd4cf15d",
                "sha256": "d1c12de10edd3987aa9d56f5a616b992d8717d982a00f47704e34bbba9d0eb05"
            },
            "downloads": -1,
            "filename": "honeybee_ml-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fef9c27d74f737c17b8c328bcd4cf15d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 97486,
            "upload_time": "2025-10-14T02:14:12",
            "upload_time_iso_8601": "2025-10-14T02:14:12.260755Z",
            "url": "https://files.pythonhosted.org/packages/32/69/992dffcf350049039cabb3cf2efaa8ca7a77c229a71314b66ee483a9e99b/honeybee_ml-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5080d15659ee2b1f83fc67c2b610afcda7c99e8456fbb77f1bb9b1c685830fdf",
                "md5": "b882a234bf0764b0330b1cfeb8bcf567",
                "sha256": "bbfed4c984dd2210967b74ee24998f2393fc6f0db6dcf3acd671b6707efac709"
            },
            "downloads": -1,
            "filename": "honeybee_ml-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "b882a234bf0764b0330b1cfeb8bcf567",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 89202,
            "upload_time": "2025-10-14T02:14:14",
            "upload_time_iso_8601": "2025-10-14T02:14:14.837254Z",
            "url": "https://files.pythonhosted.org/packages/50/80/d15659ee2b1f83fc67c2b610afcda7c99e8456fbb77f1bb9b1c685830fdf/honeybee_ml-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-14 02:14:14",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lab-rasool",
    "github_project": "HoneyBee",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "ipykernel",
            "specs": []
        },
        {
            "name": "ipywidgets",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "llama_index",
            "specs": []
        },
        {
            "name": "pymongo",
            "specs": []
        },
        {
            "name": "transformers",
            "specs": []
        },
        {
            "name": "torch",
            "specs": []
        },
        {
            "name": "torchvision",
            "specs": []
        },
        {
            "name": "torchaudio",
            "specs": []
        },
        {
            "name": "accelerate",
            "specs": []
        },
        {
            "name": "bitsandbytes",
            "specs": []
        },
        {
            "name": "pytesseract",
            "specs": []
        },
        {
            "name": "pdf2image",
            "specs": []
        },
        {
            "name": "PyPDF2",
            "specs": []
        },
        {
            "name": "pyarrow",
            "specs": []
        },
        {
            "name": "fastparquet",
            "specs": []
        },
        {
            "name": "pydicom",
            "specs": []
        },
        {
            "name": "opencv-python",
            "specs": []
        },
        {
            "name": "matplotlib",
            "specs": []
        },
        {
            "name": "langchain",
            "specs": []
        },
        {
            "name": "scikit-image",
            "specs": []
        },
        {
            "name": "imageio",
            "specs": []
        },
        {
            "name": "albumentations",
            "specs": []
        },
        {
            "name": "peft",
            "specs": []
        },
        {
            "name": "cucim",
            "specs": []
        },
        {
            "name": "openslide-python",
            "specs": []
        },
        {
            "name": "colour-science",
            "specs": []
        },
        {
            "name": "scipy",
            "specs": []
        },
        {
            "name": "pytesseract",
            "specs": []
        },
        {
            "name": "onnxruntime",
            "specs": []
        },
        {
            "name": "SimpleITK",
            "specs": []
        },
        {
            "name": "nibabel",
            "specs": []
        },
        {
            "name": "timm",
            "specs": []
        }
    ],
    "lcname": "honeybee-ml"
}
        