# flatiron-cleaner

- Version: 0.1.4 (uploaded to PyPI 2025-10-30 05:30:51)
- Summary: A Python package for cleaning and harmonizing Flatiron Health cancer data
- Requires Python: >=3.10
- License: MIT
- Keywords: cancer, healthcare, data, flatiron
- Author: Xavier Orcutt
- Homepage: https://github.com/xavier-orcutt/FlatironCleaner
- Bug tracker: https://github.com/xavier-orcutt/FlatironCleaner/issues
- Requirements: asttokens, decorator, executing, ipython, jedi, matplotlib-inline, numpy, pandas, parso, pexpect, prompt_toolkit, ptyprocess, pure_eval, Pygments, python-dateutil, pytz, six, stack-data, traitlets, tzdata, wcwidth

`flatiron-cleaner` is a Python package that cleans Flatiron Health cancer datasets into analysis-ready formats, designed specifically with predictive modeling and survival analysis in mind. By automating tedious, error-prone data processing steps, it shortens data preparation and makes analyses easier to reproduce.

Key features of the package include:
- Providing a modular architecture that allows researchers to select which Flatiron files to process
- Converting long-format dataframes into wide-format dataframes with one row per unique PatientID
- Ensuring appropriate data types for predictive modeling and statistical analysis
- Standardizing data cleaning around a user-specified index date, such as metastatic diagnosis or treatment initiation
- Engineering clinically relevant variables for analysis
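To make the long-to-wide conversion concrete, here is a minimal pandas sketch of the general idea. It is not the package's implementation: the lab columns (`Test`, `Value`) are hypothetical, and averaging repeated results is an arbitrary choice made for illustration.

```python
import pandas as pd

# Hypothetical long-format lab records: several rows per patient.
# Column names are illustrative, not Flatiron's actual schema.
long_df = pd.DataFrame({
    "PatientID": ["P1", "P1", "P2", "P2", "P2"],
    "Test": ["hemoglobin", "albumin", "hemoglobin", "hemoglobin", "albumin"],
    "Value": [12.1, 3.9, 10.4, 10.8, 3.2],
})


def long_to_wide(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse long records to one row per PatientID.

    Repeated results for the same test are averaged (an assumption for this sketch).
    """
    wide = df.pivot_table(index="PatientID", columns="Test", values="Value", aggfunc="mean")
    wide.columns.name = None  # drop the leftover 'Test' axis label
    return wide.reset_index()  # PatientID becomes a regular column again


wide_df = long_to_wide(long_df)
print(wide_df)
```

Each test becomes its own numeric column, so the result can be passed directly to most modeling libraries.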

## Installation

Built and tested with Python 3.13; requires Python 3.10 or later.

```shell
pip install flatiron-cleaner
```

## Available Processors

### Cancer-Specific Processors

The following cancers have their own dedicated data processor class:

| Cancer Type | Processor Name | 
|-------------|-----------------|
| Advanced Urothelial Cancer | `DataProcessorUrothelial` |
| Advanced NSCLC | `DataProcessorNSCLC` |
| Metastatic Colorectal Cancer | `DataProcessorColorectal` |
| Metastatic Breast Cancer | `DataProcessorBreast` |
| Metastatic Prostate Cancer | `DataProcessorProstate` |
| Metastatic Renal Cell Cancer | `DataProcessorRenal` |

### General Processor

For cancer types without a dedicated processor, `DataProcessorGeneral` provides the standard methods listed under Processing Methods.
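If a pipeline handles several cancer types, a small lookup can pick the processor class by name. The mapping below is a hypothetical helper built from the table above: the short cancer labels are made up, and it assumes every processor, including `DataProcessorGeneral`, is importable from the top-level `flatiron_cleaner` package in the same way `DataProcessorUrothelial` is in the usage example.

```python
# Hypothetical helper (not part of flatiron-cleaner): maps short cancer labels
# to the processor class names listed in the table above.
PROCESSOR_BY_CANCER = {
    "urothelial": "DataProcessorUrothelial",
    "nsclc": "DataProcessorNSCLC",
    "colorectal": "DataProcessorColorectal",
    "breast": "DataProcessorBreast",
    "prostate": "DataProcessorProstate",
    "renal": "DataProcessorRenal",
}


def processor_name_for(cancer: str) -> str:
    """Return the processor class name for a cancer label.

    Labels without a dedicated processor fall back to DataProcessorGeneral.
    """
    return PROCESSOR_BY_CANCER.get(cancer.strip().lower(), "DataProcessorGeneral")
```

With the package installed, `getattr(flatiron_cleaner, processor_name_for("NSCLC"))()` would then create the processor instance.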

## Processing Methods

### Standard Methods

The following methods are available across all processor classes, including the general processor:

| Method | Description | File Processed |
|--------|-------------|----------------|
| `process_demographics()` | Processes patient demographic information | Demographics.csv |
| `process_mortality()` | Processes mortality data | Enhanced_Mortality_V2.csv |
| `process_ecog()` | Processes performance status data | ECOG.csv |
| `process_medications()` | Processes medication administration records | MedicationAdministration.csv |
| `process_diagnosis()` | Processes ICD coding information | Diagnosis.csv |
| `process_labs()` | Processes laboratory test results | Lab.csv |
| `process_vitals()` | Processes vital signs data | Vitals.csv |
| `process_insurance()` | Processes insurance information | Insurance.csv |
| `process_practice()` | Processes practice type data | Practice.csv |

### Cancer-Specific Methods

Cancer-specific classes provide additional methods (e.g., `process_enhanced()` and `process_biomarkers()`). For the complete list of methods available for a given cancer type, see the source code or use Python's built-in `help()`:

```python
from flatiron_cleaner import DataProcessorUrothelial

help(DataProcessorUrothelial)
```
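As a complement to `help()`, the standard-library `inspect` module can list just the `process_*` methods a class exposes. This helper is an illustrative sketch, not part of the package; it relies only on the `process_` naming convention used in the tables above.

```python
import inspect


def list_process_methods(cls: type) -> list[str]:
    """Return the sorted names of process_* methods defined on or inherited by cls."""
    return sorted(
        name
        # Methods accessed on a class are plain functions in Python 3.
        for name, _ in inspect.getmembers(cls, inspect.isfunction)
        if name.startswith("process_")
    )
```

For example, `list_process_methods(DataProcessorUrothelial)` would list the standard methods plus any cancer-specific ones that class defines.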

## Usage Example

```python
import pandas as pd

from flatiron_cleaner import DataProcessorUrothelial
from flatiron_cleaner import merge_dataframes

# Initialize the processor for the cancer type of interest
processor = DataProcessorUrothelial()

# Import a dataframe containing PatientIDs and the index date of interest
df = pd.read_csv('path/to/your/data')

# Load and clean ECOG data within a window around the index date
cleaned_ecog_df = processor.process_ecog('path/to/your/ECOG.csv',
                                         index_date_df=df,
                                         index_date_column='AdvancedDiagnosisDate',
                                         days_before=30,
                                         days_after=0)

# Load and clean medication data within a window around the index date
cleaned_medication_df = processor.process_medications('path/to/your/MedicationAdministration.csv',
                                                      index_date_df=df,
                                                      index_date_column='AdvancedDiagnosisDate',
                                                      days_before=180,
                                                      days_after=0)

# Merge the cleaned dataframes
merged_data = merge_dataframes(cleaned_ecog_df, cleaned_medication_df)
```
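The `days_before` and `days_after` arguments restrict each patient's records to a window around their index date. The sketch below shows one plausible reading of that behavior in plain pandas. It assumes an inclusive window from index date minus `days_before` to index date plus `days_after`, and it uses a made-up `EcogDate`/`EcogValue` schema, so consult the package source for the exact rules it applies.

```python
import pandas as pd


def filter_to_window(
    records: pd.DataFrame,
    index_dates: pd.DataFrame,
    index_date_column: str,
    record_date_column: str,
    days_before: int,
    days_after: int,
) -> pd.DataFrame:
    """Keep records dated within [index - days_before, index + days_after] (assumed inclusive)."""
    # Inner join: patients without an index date are dropped.
    merged = records.merge(index_dates[["PatientID", index_date_column]], on="PatientID", how="inner")
    delta_days = (
        pd.to_datetime(merged[record_date_column]) - pd.to_datetime(merged[index_date_column])
    ).dt.days
    in_window = delta_days.between(-days_before, days_after)  # inclusive on both ends
    return merged.loc[in_window].drop(columns=index_date_column).reset_index(drop=True)


# Hypothetical inputs; only PatientID and AdvancedDiagnosisDate come from the README.
index_df = pd.DataFrame({
    "PatientID": ["P1", "P2"],
    "AdvancedDiagnosisDate": ["2024-03-01", "2024-06-15"],
})
ecog_long = pd.DataFrame({
    "PatientID": ["P1", "P1", "P1", "P2", "P3"],
    "EcogDate": ["2024-01-15", "2024-02-20", "2024-03-05", "2024-06-15", "2024-01-01"],
    "EcogValue": [2, 1, 1, 0, 3],
})

windowed = filter_to_window(
    ecog_long, index_df, "AdvancedDiagnosisDate", "EcogDate", days_before=30, days_after=0
)
print(windowed)
```

Here P1's records from 46 days before and 4 days after the index date fall outside the 30/0 window, and P3 is dropped because it has no index date.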

For a more detailed usage demonstration, see the notebook titled "tutorial" in the `example/` directory.
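`merge_dataframes` combines the cleaned outputs, and its exact join behavior is defined in the package. One way to picture it, assuming each cleaned dataframe has one row per `PatientID` and that frames are outer-joined on that column (both assumptions, not confirmed by this README), is the plain-pandas sketch below with hypothetical feature columns.

```python
from functools import reduce

import pandas as pd


def outer_merge_on_patient(*frames: pd.DataFrame) -> pd.DataFrame:
    """Outer-join any number of one-row-per-patient frames on PatientID."""
    if not frames:
        raise ValueError("at least one dataframe is required")
    for frame in frames:
        if not frame["PatientID"].is_unique:
            raise ValueError("each frame must have one row per PatientID")
    # Patients missing from a frame get NaN in that frame's columns.
    return reduce(lambda left, right: left.merge(right, on="PatientID", how="outer"), frames)


# Hypothetical cleaned outputs with made-up feature columns.
ecog = pd.DataFrame({"PatientID": ["P1", "P2"], "ecog_index": [1, 0]})
meds = pd.DataFrame({"PatientID": ["P2", "P3"], "anticoagulant": [1, 0]})
merged = outer_merge_on_patient(ecog, meds)
print(merged)
```

An outer join keeps every patient who appears in any input; an inner join would instead keep only patients present in all of them.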

## Contact

Contributions and feedback are welcome. Contact: xavierorcutt@gmail.com

            
