Name | mosaicx |
Version | 1.0.0 |
home_page | None |
Summary | Medical cOmputational Suite for Advanced Intelligent eXtraction - Intelligent radiology report extraction using local LLMs |
upload_time | 2025-09-11 13:53:23 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.11 |
license | DUAL LICENSING NOTICE
====================
MOSAICX is dual-licensed under the terms of both the GNU Affero General Public License v3.0 (AGPL-3.0) and a Commercial License.
OPEN SOURCE LICENSE
===================
This software is available under the GNU Affero General Public License v3.0 (AGPL-3.0).
Under this license, you are free to use, modify, and distribute this software, provided that:
- Any derivative work or application that uses this software must also be open-sourced under AGPL-3.0
- If you run this software on a server and provide it as a service, you must make the complete source code of your application (including modifications) available to your users
- You must include this license notice and copyright information in all copies
For the complete AGPL-3.0 license terms, see LICENSE-AGPL-3.0.txt
COMMERCIAL LICENSE
==================
If you wish to use this software in a commercial product or service without the open-source requirements of AGPL-3.0, you must obtain a commercial license.
Commercial licenses are available from:
Zenta GmbH
For commercial licensing inquiries, please contact:
Email: info@zenta.solutions
Subject: MOSAICX Commercial License Request
Commercial licensing allows you to:
- Use this software in proprietary applications
- Distribute applications containing this software without open-source obligations
- Customize and modify the software without sharing changes
- Receive commercial support and maintenance
COPYRIGHT AND ATTRIBUTION
==========================
Copyright (c) 2024 DIGITX Lab, Department of Radiology, LMU Munich University Hospital
Developed by Lalith Kumar Shiyam Sundar, PhD
Commercial licensing managed by Zenta GmbH
IMPORTANT NOTICE
================
By using this software, you agree to comply with the terms of one of the above licenses.
If you are unsure which license applies to your use case, please contact Zenta GmbH for clarification. |
keywords | extraction, llm, medical, nlp, pdf, radiology |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# MOSAICX
**Medical cOmputational Suite for Advanced Intelligent eXtraction**
MOSAICX is an intelligent radiology report extraction tool that uses local Large Language Models (LLMs) to extract structured data from medical reports. It supports both PDF and text inputs, provides configurable output formats, and offers both programmatic and command-line interfaces.
## Features
- 🔬 **Intelligent Extraction**: Uses local LLMs (Ollama) for context-aware data extraction
- 📄 **Advanced Document Processing**: Powered by Docling for superior PDF and document parsing
- ⚙️ **Configurable Schemas**: Define custom extraction schemas with interactive brainstorming
- 📊 **Flexible Outputs**: Export to JSON, CSV, or custom formats
- 🔄 **Multi-Report Analysis**: Process multiple reports for patient history synthesis
- 🖥️ **Dual Interface**: Use as Python library or CLI tool
- 🏠 **Local Processing**: All processing happens locally using Ollama - no cloud dependencies
- ⚡ **Fast Development**: Built with uv for lightning-fast dependency management
## Quick Start
### Installation
```bash
pip install mosaicx
```
**For Development (with uv - recommended):**
```bash
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone and setup
git clone https://github.com/LalithShiyam/MOSAICX.git
cd MOSAICX
uv sync --dev
uv run pre-commit install
```
### Basic Usage
#### Command Line Interface
```bash
# Extract from a single PDF report
uv run mosaicx extract report.pdf --config extraction_config.yaml --output results.json
# Interactive schema building
uv run mosaicx brainstorm --report sample_report.pdf --schema-output custom_schema.yaml
# Batch processing multiple reports
uv run mosaicx extract-batch reports/ --config config.yaml --output-dir results/
```
#### Python Library
```python
from mosaicx import ReportExtractor, ExtractionConfig
# Initialize extractor
extractor = ReportExtractor()
# Extract from PDF
config = ExtractionConfig.from_file('config.yaml')
results = extractor.extract_from_pdf('report.pdf', config)
# Extract from text
text_content = "Patient shows signs of pneumonia..."
results = extractor.extract_from_text(text_content, config)
# Multi-report analysis
patient_reports = ['report1.pdf', 'report2.pdf', 'report3.pdf']
timeline = extractor.analyze_patient_history(patient_reports, config)
```
## Configuration
Create a YAML configuration file to define extraction schemas:
```yaml
schema:
  findings:
    - field: "primary_diagnosis"
      type: "string"
      description: "Main diagnosis from the report"
    - field: "severity"
      type: "enum"
      options: ["mild", "moderate", "severe"]
    - field: "follow_up_required"
      type: "boolean"

output:
  format: "json"
  include_confidence: true
  include_source_text: true

llm:
  model: "llama2"
  temperature: 0.1
  max_tokens: 1000
```
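To make the schema's field types concrete, here is a minimal sketch of how an extracted record could be checked against the `findings` entries above. It is not part of the MOSAICX API: `FINDINGS_SCHEMA` and `validate_findings` are hypothetical names, and the schema is written as plain Python data so the example runs without a YAML parser.

```python
from typing import Any

# Hypothetical mirror of the `schema.findings` entries from the YAML above.
FINDINGS_SCHEMA = [
    {"field": "primary_diagnosis", "type": "string"},
    {"field": "severity", "type": "enum", "options": ["mild", "moderate", "severe"]},
    {"field": "follow_up_required", "type": "boolean"},
]


def validate_findings(record: dict[str, Any], schema: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for spec in schema:
        name, kind = spec["field"], spec["type"]
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        if kind == "string" and not isinstance(value, str):
            problems.append(f"{name}: expected a string")
        elif kind == "boolean" and not isinstance(value, bool):
            problems.append(f"{name}: expected true/false")
        elif kind == "enum" and value not in spec.get("options", []):
            problems.append(f"{name}: {value!r} is not one of {spec.get('options', [])}")
    return problems
```

A check like this is useful after extraction because an LLM may return a value outside the `options` list of an `enum` field or omit a field entirely.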
## Documentation
- [Installation Guide](docs/installation.md)
- [Configuration Reference](docs/configuration.md)
- [API Documentation](docs/api.md)
- [Examples](examples/)
## Development
MOSAICX is developed by the DIGITX Lab at the Department of Radiology, LMU Munich University Hospital.
### Requirements
- Python 3.11+
- Ollama installed locally
- Local LLM model (e.g., Llama2, CodeLlama)
### Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
## License
Apache License 2.0 - see [LICENSE](LICENSE) for details.
## Authors
**Lalith Kumar Shiyam Sundar, PhD**
DIGITX Lab, Department of Radiology
LMU Munich University Hospital
📧 lalith.shiyam@med.uni-muenchen.de
## Citation
If you use MOSAICX in your research, please cite:
```bibtex
@software{mosaicx2024,
  title={MOSAICX: Medical cOmputational Suite for Advanced Intelligent eXtraction},
  author={Sundar, Lalith Kumar Shiyam},
  year={2024},
  institution={DIGITX Lab, Department of Radiology, LMU Munich University Hospital},
  url={https://github.com/LalithShiyam/MOSAICX}
}
```
Raw data
{
"_id": null,
"home_page": null,
"name": "mosaicx",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": null,
"keywords": "extraction, llm, medical, nlp, pdf, radiology",
"author": null,
"author_email": "Lalith Kumar Shiyam Sundar <lalith.shiyam@med.uni-muenchen.de>",
"download_url": "https://files.pythonhosted.org/packages/5d/51/3c8b33edc6cb17360b31efdda2854a086b430e1d32b34b0ec208094e520b/mosaicx-1.0.0.tar.gz",
"platform": null,
"description": "# MOSAICX\n\n**Medical cOmputational Suite for Advanced Intelligent eXtraction**\n\n[](https://www.python.org/downloads/)\n[](https://opensource.org/licenses/Apache-2.0)\n\nMOSAICX is an intelligent radiology report extraction tool that uses local Large Language Models (LLMs) to extract structured data from medical reports. It supports both PDF and text inputs, provides configurable output formats, and offers both programmatic and command-line interfaces.\n\n## Features\n\n\ud83d\udd2c **Intelligent Extraction**: Uses local LLMs (Ollama) for context-aware data extraction \n\ud83d\udcc4 **Advanced Document Processing**: Powered by Docling for superior PDF and document parsing \n\u2699\ufe0f **Configurable Schemas**: Define custom extraction schemas with interactive brainstorming \n\ud83d\udcca **Flexible Outputs**: Export to JSON, CSV, or custom formats \n\ud83d\udd04 **Multi-Report Analysis**: Process multiple reports for patient history synthesis \n\ud83d\udda5\ufe0f **Dual Interface**: Use as Python library or CLI tool \n\ud83c\udfe0 **Local Processing**: All processing happens locally using Ollama - no cloud dependencies \n\u26a1 **Fast Development**: Built with uv for lightning-fast dependency management \n\n## Quick Start\n\n### Installation\n\n```bash\npip install mosaicx\n```\n\n**For Development (with uv - recommended):**\n\n```bash\n# Install uv (if not already installed)\ncurl -LsSf https://astral.sh/uv/install.sh | sh\n\n# Clone and setup\ngit clone https://github.com/LalithShiyam/MOSAICX.git\ncd MOSAICX\nuv sync --dev\nuv run pre-commit install\n```\n\n### Basic Usage\n\n#### Command Line Interface\n\n```bash\n# Extract from a single PDF report \nuv run mosaicx extract report.pdf --config extraction_config.yaml --output results.json\n\n# Interactive schema building\nuv run mosaicx brainstorm --report sample_report.pdf --schema-output custom_schema.yaml\n\n# Batch processing multiple reports\nuv run mosaicx extract-batch reports/ --config 
config.yaml --output-dir results/\n```\n\n#### Python Library\n\n```python\nfrom mosaicx import ReportExtractor, ExtractionConfig\n\n# Initialize extractor\nextractor = ReportExtractor()\n\n# Extract from PDF\nconfig = ExtractionConfig.from_file('config.yaml')\nresults = extractor.extract_from_pdf('report.pdf', config)\n\n# Extract from text\ntext_content = \"Patient shows signs of pneumonia...\"\nresults = extractor.extract_from_text(text_content, config)\n\n# Multi-report analysis\npatient_reports = ['report1.pdf', 'report2.pdf', 'report3.pdf']\ntimeline = extractor.analyze_patient_history(patient_reports, config)\n```\n\n## Configuration\n\nCreate a YAML configuration file to define extraction schemas:\n\n```yaml\nschema:\n findings:\n - field: \"primary_diagnosis\"\n type: \"string\"\n description: \"Main diagnosis from the report\"\n - field: \"severity\"\n type: \"enum\"\n options: [\"mild\", \"moderate\", \"severe\"]\n - field: \"follow_up_required\"\n type: \"boolean\"\n\noutput:\n format: \"json\"\n include_confidence: true\n include_source_text: true\n\nllm:\n model: \"llama2\"\n temperature: 0.1\n max_tokens: 1000\n```\n\n## Documentation\n\n- [Installation Guide](docs/installation.md)\n- [Configuration Reference](docs/configuration.md)\n- [API Documentation](docs/api.md)\n- [Examples](examples/)\n\n## Development\n\nMOSAICX is developed by the DIGITX Lab at the Department of Radiology, LMU Munich University Hospital.\n\n### Requirements\n\n- Python 3.11+\n- Ollama installed locally\n- Local LLM model (e.g., Llama2, CodeLlama)\n\n### Contributing\n\nWe welcome contributions! 
Please see our [Contributing Guide](CONTRIBUTING.md) for details.\n\n## License\n\nApache License 2.0 - see [LICENSE](LICENSE) for details.\n\n## Authors\n\n**Lalith Kumar Shiyam Sundar, PhD** \nDIGITX Lab, Department of Radiology \nLMU Munich University Hospital \n\ud83d\udce7 lalith.shiyam@med.uni-muenchen.de\n\n## Citation\n\nIf you use MOSAICX in your research, please cite:\n\n```bibtex\n@software{mosaicx2024,\n title={MOSAICX: Medical cOmputational Suite for Advanced Intelligent eXtraction},\n author={Sundar, Lalith Kumar Shiyam},\n year={2024},\n institution={DIGITX Lab, Department of Radiology, LMU Munich University Hospital},\n url={https://github.com/LalithShiyam/MOSAICX}\n}\n```\n",
"bugtrack_url": null,
"license": "DUAL LICENSING NOTICE\n ====================\n \n MOSAICX is dual-licensed under the terms of both the GNU Affero General Public License v3.0 (AGPL-3.0) and a Commercial License.\n \n OPEN SOURCE LICENSE\n ===================\n \n This software is available under the GNU Affero General Public License v3.0 (AGPL-3.0).\n \n Under this license, you are free to use, modify, and distribute this software, provided that:\n - Any derivative work or application that uses this software must also be open-sourced under AGPL-3.0\n - If you run this software on a server and provide it as a service, you must make the complete source code of your application (including modifications) available to your users\n - You must include this license notice and copyright information in all copies\n \n For the complete AGPL-3.0 license terms, see LICENSE-AGPL-3.0.txt\n \n COMMERCIAL LICENSE\n ==================\n \n If you wish to use this software in a commercial product or service without the open-source requirements of AGPL-3.0, you must obtain a commercial license.\n \n Commercial licenses are available from:\n \n Zenta GmbH\n \n For commercial licensing inquiries, please contact:\n Email: info@zenta.solutions\n Subject: MOSAICX Commercial License Request\n \n Commercial licensing allows you to:\n - Use this software in proprietary applications\n - Distribute applications containing this software without open-source obligations\n - Customize and modify the software without sharing changes\n - Receive commercial support and maintenance\n \n COPYRIGHT AND ATTRIBUTION\n ==========================\n \n Copyright (c) 2024 DIGITX Lab, Department of Radiology, LMU Munich University Hospital\n Developed by Lalith Kumar Shiyam Sundar, PhD\n \n Commercial licensing managed by Zenta GmbH\n \n IMPORTANT NOTICE\n ================\n \n By using this software, you agree to comply with the terms of one of the above licenses.\n If you are unsure which license applies to your use case, please 
contact Zenta GmbH for clarification.",
"summary": "Medical cOmputational Suite for Advanced Intelligent eXtraction - Intelligent radiology report extraction using local LLMs",
"version": "1.0.0",
"project_urls": {
"Bug Tracker": "https://github.com/LalithShiyam/MOSAICX/issues",
"Documentation": "https://github.com/LalithShiyam/MOSAICX#readme",
"Homepage": "https://github.com/LalithShiyam/MOSAICX",
"Repository": "https://github.com/LalithShiyam/MOSAICX"
},
"split_keywords": [
"extraction",
" llm",
" medical",
" nlp",
" pdf",
" radiology"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "6d558d8165eb855c67b2d5b8f61159262437e345c93c4a5cf3218a60db2b44d4",
"md5": "23c0f839c78a6864214ecd3a3731e135",
"sha256": "ec0e16d3abd05863b9e72492c0efded846b2891e2602f3b619198beb09da136a"
},
"downloads": -1,
"filename": "mosaicx-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "23c0f839c78a6864214ecd3a3731e135",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 23946,
"upload_time": "2025-09-11T13:53:21",
"upload_time_iso_8601": "2025-09-11T13:53:21.729312Z",
"url": "https://files.pythonhosted.org/packages/6d/55/8d8165eb855c67b2d5b8f61159262437e345c93c4a5cf3218a60db2b44d4/mosaicx-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "5d513c8b33edc6cb17360b31efdda2854a086b430e1d32b34b0ec208094e520b",
"md5": "1abeadfb83d4648c4e3dec7b245b703a",
"sha256": "8f314e22747f5ffeac4c0b2e21bbbbf27c2f43841d8c5f9e94974c9259aabed3"
},
"downloads": -1,
"filename": "mosaicx-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "1abeadfb83d4648c4e3dec7b245b703a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 251964,
"upload_time": "2025-09-11T13:53:23",
"upload_time_iso_8601": "2025-09-11T13:53:23.407169Z",
"url": "https://files.pythonhosted.org/packages/5d/51/3c8b33edc6cb17360b31efdda2854a086b430e1d32b34b0ec208094e520b/mosaicx-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-11 13:53:23",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "LalithShiyam",
"github_project": "MOSAICX",
"github_not_found": true,
"lcname": "mosaicx"
}
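The `digests` entries in the raw data above can be used to check a downloaded archive before installing it. The following is a minimal sketch using the standard library's `hashlib`. The function names are illustrative, and the file is assumed to have already been downloaded to a local path.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA-256 of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 with an expected hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution listed above, the call would be `matches_digest("mosaicx-1.0.0.tar.gz", "8f314e22747f5ffeac4c0b2e21bbbbf27c2f43841d8c5f9e94974c9259aabed3")`.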