lung-sarg


- Name: lung-sarg
- Version: 1.0.0
- Home page: None
- Summary: None
- Upload time: 2024-08-22 00:33:02
- Maintainer: None
- Docs URL: None
- Author: Matt McCormick
- Requires Python: <=3.13,>=3.11
- License: MIT
- Requirements: No requirements were recorded.

<!-- markdownlint-disable MD033 MD041-->

<p align="center">
  <h1 style="font-size:80px; font-weight: 800;" align="center">L U N G - S A R G</h1>
  <p align="center">The Open Data Platform for Sustainable, Accessible Lung Radiogenomics</p>
</p>

<div align="center">
  <img alt="GitHub" src="https://img.shields.io/github/license/open-radiogenomics/lung-sarg?style=flat-square">
  <img alt="GitHub Workflow Status" src="https://img.shields.io/github/actions/workflow/status/open-radiogenomics/lung-sarg/ci.yml?style=flat-square">
  <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/open-radiogenomics/lung-sarg?style=flat-square">
</div>

<br>

Lung-SARG is a fully open-source, local-first platform that improves how communities collaborate on open data to diagnose lung cancer and perform epidemiology on local populations in low- and middle-income countries.

> [!TIP]
> Datasets generated by this project are ready to explore and consume on Hugging Face.
>
> [Check them out](https://huggingface.co/radiogenomics)!

### 💡 Principles

- **Open**: Code, standards, infrastructure, and data are public and open source.
- **Modular and Interoperable**: Each component can be replaced, extended, or removed. Works well in many environments (your laptop, in a cluster, or from the browser), can be deployed to many places (S3 + GH Pages, IPFS, ...) and integrates with multiple tools (thanks to the Arrow and Zarr ecosystems). Use open tools, standards, and infrastructure, and share data in accessible formats.
- **Data as Code**: Declarative, stateless transformations tracked in `git`. This improves data access, empowers data scientists to conduct research, and helps guide community-driven analysis and decisions. Version your data as code! Publish and share your reusable models for others to build on. Datasets should be both reproducible and accessible!
- **Glue**: Be a bridge between tools and approaches. E.g., use good software engineering practices like types, tests, materialized views, and more.
- [**FAIR**](https://www.go-fair.org/fair-principles/).
- **KISS**: Minimal and flexible. Rely on tools that do one thing and do it well.
- **No vendor lock-in**
  - Rely on Open code, standards, and infrastructure.
  - Use the tool you want to create, explore, and consume the datasets. Agnostic of any tooling or infrastructure provider.
  - Standard format for data and APIs! [Keep your data as future-friendly and future-proof as possible](https://indieweb.org/longevity)!
- **Distributed**: Permissionless ecosystem and collaboration. Open-source the code and make it easy to improve.
- **Community**: Build a community that incentivizes contributors.
- **Immutability**: Embrace idempotency. Rely on content-addressable storage and append-only logs.
- **Stateless and serverless**: as much as possible. E.g. use GitHub Pages, host datasets on S3, interface with HTML, JavaScript, and WASM. No servers to maintain, no databases to manage, no infrastructure to worry about. Keep infrastructure management lean.
- **Offline-first**: Rely on static files and offline-first tools.
- **Above all, have fun and enjoy the process** 🎉
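
The **Immutability** principle above mentions content-addressable storage and append-only logs. As a rough illustration of the idea only (this is not code from Lung-SARG, and the function names `put`, `get`, and `append_log` are hypothetical), a minimal sketch using just the Python standard library could look like this:

```python
import hashlib
import tempfile
from pathlib import Path


def put(store: Path, data: bytes) -> str:
    """Write data under its SHA-256 digest and return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    target = store / digest[:2] / digest
    if not target.exists():  # idempotent: identical content is stored once
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return digest


def get(store: Path, digest: str) -> bytes:
    """Read data back and check that it still matches its address."""
    data = (store / digest[:2] / digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError(f"content does not match address {digest}")
    return data


def append_log(log: Path, digest: str) -> None:
    """Record a digest in an append-only log, one entry per line."""
    with log.open("a", encoding="utf-8") as handle:
        handle.write(digest + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        first = put(store, b"example dataset bytes")
        second = put(store, b"example dataset bytes")  # no second copy is written
        append_log(store / "log.txt", first)
        print(first == second)
        print(get(store, first))
```

Because the address is derived from the content, writing the same bytes twice is a no-op, which is what makes repeated runs idempotent.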

## Overview

![Lung SARG dataflow](./docs/figures/lung-sarg.png)

*Lung SARG dataflow.*

## ⚙️ Setup and execution

### 🐍 Pixi

You can install all the dependencies inside a reproducible software environment via pixi. To do that, [install pixi](https://pixi.sh), clone the repository, and run the following command from the root folder.

```bash
pixi install -a
```

To see all tasks available:

```bash
pixi task list
```

Start and access the [Dagster UI](http://127.0.0.1:3000) locally.

```bash
pixi run dev
```
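
If you want to confirm from a script that the UI came up, a small check like the following may help. It is a sketch that assumes the default address linked above (`http://127.0.0.1:3000`); the helper name `ui_is_up` is hypothetical and not part of Lung-SARG or Dagster.

```python
import http.client
import urllib.request


def ui_is_up(url: str = "http://127.0.0.1:3000", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers successfully at url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    print("Dagster UI reachable:", ui_is_up())
```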

### 🧬 Run on sample data

In the Dagster UI, click

 > *Overview* -> *Jobs* -> *stage_idc_nsclc_radiogenomic_samples* -> *Materialize all*

![Materialize staging of samples](./docs/figures/lung-sarg-stage.png)

Observe what happens in the *Overview*, *Runs*, and *Assets* pages of the Dagster UI, and the content in the *lung-sarg/data* directory.
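
To get a quick overview of what the job wrote, you could count the files it produced by extension. The sketch below assumes the data directory is `data` relative to the repository root and makes no assumption about which file types appear; the `summarize` helper is hypothetical.

```python
from collections import Counter
from pathlib import Path


def summarize(data_dir: Path) -> Counter:
    """Count files under data_dir, grouped by file extension."""
    counts: Counter = Counter()
    for path in data_dir.rglob("*"):
        if path.is_file():
            counts[path.suffix or "(no extension)"] += 1
    return counts


if __name__ == "__main__":
    data_dir = Path("data")  # assumed location, relative to the repository root
    if data_dir.is_dir():
        for suffix, count in summarize(data_dir).most_common():
            print(f"{suffix}: {count}")
    else:
        print(f"{data_dir} not found; materialize the job first")
```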


## 🎯 Motivation

This project started after [thinking about what an Open Data Protocol could look like](https://publish.obsidian.md/davidgasquez/Open+Data)!

## 👏 Acknowledgements

- This project was built on the principles espoused by David Gasquez at [Datonic](https://datonic.io). It is built on the approach in the [Datadex](https://datadex.datonic.io/) Open Data Platform and extended for scientific imaging data with [OME-Zarr](https://ngff.openmicroscopy.org/) and the DICOM-based image data model in the [NIH Imaging Data Commons](https://portal.imaging.datacommons.cancer.gov/).
- Lung-SARG is possible thanks to amazing open source projects like [DuckDB](https://www.duckdb.org/), [dbt](https://getdbt.com), [Dagster](https://dagster.io/), [ITK](https://docs.itk.org) and many others...
- This project was built with support from Dr. James Gee in collaboration with the [UPenn PICSL Lab](https://picsl.upenn.edu/).

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "lung-sarg",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<=3.13,>=3.11",
    "maintainer_email": null,
    "keywords": null,
    "author": "Matt McCormick",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/09/96/32bc8386f73c633934e9bde686ec571ff35599a5e6a41a2d3136ed05b854/lung_sarg-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": null,
    "version": "1.0.0",
    "project_urls": {
        "CI": "https://github.com/open-radiogenomics/lung-sarg/actions",
        "Changelog": "https://github.com/open-radiogenomics/lung-sarg/commits/main/",
        "Homepage": "https://radiogenomics.github.io/lung-sarg/",
        "Issues": "https://github.com/open-radiogenomics/lung-sarg/issues"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "df5548697fb14b3676d24f2dedfa0f3af8a7bbe38b8da99eaebeda0e42c3ba13",
                "md5": "59000e04702c6d2b185f1fd00db5b929",
                "sha256": "32391fc670edad6b3e68e803a55f78f7df1ddd0565c6b66b032293afa90d6473"
            },
            "downloads": -1,
            "filename": "lung_sarg-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "59000e04702c6d2b185f1fd00db5b929",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<=3.13,>=3.11",
            "size": 9826,
            "upload_time": "2024-08-22T00:33:01",
            "upload_time_iso_8601": "2024-08-22T00:33:01.684909Z",
            "url": "https://files.pythonhosted.org/packages/df/55/48697fb14b3676d24f2dedfa0f3af8a7bbe38b8da99eaebeda0e42c3ba13/lung_sarg-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "099632bc8386f73c633934e9bde686ec571ff35599a5e6a41a2d3136ed05b854",
                "md5": "a0b8ccc2c2b9257cd3e143eb28f5448d",
                "sha256": "b5bec0f6491654d4cd05ffb8450c9c4162947d07d62aa10b7e598a17c0f63542"
            },
            "downloads": -1,
            "filename": "lung_sarg-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "a0b8ccc2c2b9257cd3e143eb28f5448d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<=3.13,>=3.11",
            "size": 11840,
            "upload_time": "2024-08-22T00:33:02",
            "upload_time_iso_8601": "2024-08-22T00:33:02.803788Z",
            "url": "https://files.pythonhosted.org/packages/09/96/32bc8386f73c633934e9bde686ec571ff35599a5e6a41a2d3136ed05b854/lung_sarg-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-22 00:33:02",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "open-radiogenomics",
    "github_project": "lung-sarg",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "lung-sarg"
}
        