scd-analysis


- **Name**: scd-analysis
- **Version**: 1.0.0
- **Summary**: Severe Chronic Disease (SCD) Analysis Pipeline for Danish National Registers
- **Upload time**: 2025-08-25 18:16:04
- **Requires Python**: >=3.9
- **License**: MIT
- **Keywords**: epidemiology, healthcare, chronic-disease, danish-registers, population-health, medical-research, data-analysis, biostatistics
- **Requirements**: No requirements were recorded.

# SCD Analysis - Severe Chronic Disease Analysis Pipeline

[![Python Version](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

A high-performance Python package for analyzing severe chronic diseases (SCD) using Danish national health registers. It provides a complete pipeline for processing, analyzing, and matching complex epidemiological data, and relies on lazy evaluation to keep memory usage low.

### Basic Usage

```python
from scd_analysis import run_scd_pipeline, get_default_config

# Run with default configuration
final_data = run_scd_pipeline()

# Customize configuration
config = get_default_config()
config["age_cutoff"] = 5
config["study_period"]["end_year"] = 2020

final_data = run_scd_pipeline(config)

# Basic descriptive analysis
from scd_analysis.pipeline import run_descriptive_analysis
summary_stats = run_descriptive_analysis(final_data)
print(summary_stats)
```
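
The customization above mutates a nested dictionary in place. When several runs share one base configuration, it can help to derive each variant from a copy instead. The sketch below is illustrative only: `DEFAULT_CONFIG`, its values, the `start_year` key, and `with_overrides` are hypothetical stand-ins and not part of `scd_analysis`; only the `age_cutoff` and `study_period` / `end_year` keys come from the example above.

```python
import copy

# Hypothetical stand-in for the dictionary returned by get_default_config().
# Only "age_cutoff" and "study_period"/"end_year" appear in this README;
# the values and the "start_year" key are assumptions for illustration.
DEFAULT_CONFIG = {
    "age_cutoff": 6,
    "study_period": {"start_year": 2000, "end_year": 2018},
}


def with_overrides(base, overrides):
    """Return a deep copy of `base` with `overrides` merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = with_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


config = with_overrides(
    DEFAULT_CONFIG,
    {"age_cutoff": 5, "study_period": {"end_year": 2020}},
)

print(config["age_cutoff"])    # 5
print(config["study_period"])  # start_year kept, end_year replaced
print(DEFAULT_CONFIG["study_period"]["end_year"])  # 2018, base is unchanged
```

A dictionary built this way could then be passed to `run_scd_pipeline(config)` as in the example above.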

### Advanced Usage

```python
from scd_analysis import get_default_config
from scd_analysis.data import process_lpr_data, process_mfr_data
from scd_analysis.socioeconomic import SocioeconomicProcessor

# Process specific components
config = get_default_config()

# Process hospital data
df_lpr = process_lpr_data(config)

# Process socioeconomic data with custom settings
socio_processor = SocioeconomicProcessor(config)
df_socio = socio_processor.process(df_lpr)
```

## Package Structure

- **`scd_analysis.config`**: Configuration management
- **`scd_analysis.data`**: Core data processing modules
- **`scd_analysis.socioeconomic`**: Socioeconomic data processing (SEPLINE-compliant)
- **`scd_analysis.pipeline`**: Pipeline orchestration and analysis
- **`scd_analysis.utils`**: Utility functions and helpers

## Data Requirements

This package is designed to work with Danish national health registers:

- **LPR**: Hospital discharge register
- **MFR**: Birth register
- **BEF**: Population register
- **AKM**: Employment register
- **FAIK**: Income register
- **UDDF**: Education register
- **DOD/VNDS**: Death/emigration registers

Data should be provided as Parquet files (single files or partitioned datasets).

## Performance Benefits

- **Lazy Evaluation**: Only loads necessary data into memory
- **Predicate Pushdown**: Filters applied at file level
- **Partitioned Support**: Efficient processing of time-partitioned data
- **Parallel Processing**: Automatic parallelization of operations
- **Memory Optimization**: Streaming processing for large datasets
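
To make the first three points concrete, here is a conceptual sketch of lazy evaluation combined with partition pruning (a coarse form of predicate pushdown), using only the standard library. It is not the package's implementation, and this README does not name the engine the package uses. The `year=YYYY` directory layout, the CSV files standing in for Parquet, the dummy rows, and the names `write_demo_dataset` and `scan_partitions` are all assumptions for illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_demo_dataset(root):
    """Create a tiny year-partitioned dataset: root/year=YYYY/part.csv."""
    rows_by_year = {  # dummy rows, not real register data
        2018: [("p1", "DE10"), ("p2", "DJ45")],
        2019: [("p3", "DE10")],
        2020: [("p4", "DG40"), ("p5", "DE10")],
    }
    for year, rows in rows_by_year.items():
        part_dir = root / f"year={year}"
        part_dir.mkdir(parents=True)
        with open(part_dir / "part.csv", "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["person_id", "diagnosis"])
            writer.writerows(rows)


def scan_partitions(root, min_year, opened):
    """Lazily yield rows, skipping whole partitions older than `min_year`.

    The year filter is evaluated on directory names, so excluded files are
    never opened. `opened` records which partitions were actually read.
    """
    for part_dir in sorted(root.glob("year=*")):
        year = int(part_dir.name.split("=", 1)[1])
        if year < min_year:
            continue  # pruned: the file is not touched
        opened.append(part_dir.name)
        with open(part_dir / "part.csv", newline="") as fh:
            for row in csv.DictReader(fh):
                row["year"] = year
                yield row


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_demo_dataset(root)

    opened = []
    rows = scan_partitions(root, min_year=2019, opened=opened)
    # Nothing has been read yet: a generator does no work until consumed.
    files_opened_before = list(opened)

    matches = [r["person_id"] for r in rows if r["diagnosis"] == "DE10"]

print(files_opened_before)  # []
print(opened)               # ['year=2019', 'year=2020']
print(matches)              # ['p3', 'p5']
```

The 2018 partition is never opened, which is the effect that time-partitioned storage is meant to deliver for year-restricted studies.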

## Key Features

### Socioeconomic Processing

- SEPLINE-compliant ethnicity categorization (A1, A2, B1, B2, C1, C2)
- Danish regional and municipal classifications
- Population density and urbanization categories
- Family structure and cohabitation status

### SCD Analysis

- Automated severe chronic disease flagging
- Age-appropriate diagnosis criteria
- Temporal analysis capabilities
- Cohort matching and controls
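
As an illustration of what cohort matching involves, the sketch below draws a fixed number of controls per case from people with the same birth year and sex, without reusing any control. The matching variables, the 1:2 ratio, and all names (`Person`, `match_controls`) are assumptions for illustration; the package's actual matching criteria are not described here.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    person_id: str
    birth_year: int
    sex: str
    has_scd: bool


def match_controls(people, ratio=2, seed=0):
    """Match each case to up to `ratio` unused controls with the same
    birth year and sex. Returns {case_id: [control_id, ...]}."""
    rng = random.Random(seed)  # fixed seed keeps the matching reproducible

    # Group eligible controls by the matching key.
    pools = defaultdict(list)
    for p in people:
        if not p.has_scd:
            pools[(p.birth_year, p.sex)].append(p.person_id)
    for pool in pools.values():
        rng.shuffle(pool)

    matches = {}
    for case in (p for p in people if p.has_scd):
        pool = pools[(case.birth_year, case.sex)]
        n_take = min(ratio, len(pool))
        # pop() removes each chosen control, so nobody is matched twice.
        matches[case.person_id] = [pool.pop() for _ in range(n_take)]
    return matches


# Dummy population: two cases (c1, c2) and five potential controls.
people = [
    Person("c1", 2010, "F", True),
    Person("c2", 2011, "M", True),
    Person("k1", 2010, "F", False),
    Person("k2", 2010, "F", False),
    Person("k3", 2010, "F", False),
    Person("k4", 2011, "M", False),
    Person("k5", 2012, "M", False),
]

matched = match_controls(people)
for case_id, control_ids in matched.items():
    print(case_id, sorted(control_ids))
```

Here `c1` receives two of the three eligible controls, while `c2` receives only `k4` because no other control shares its birth year and sex. A real study would also need to decide how to handle such under-matched cases.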

### Data Quality

- Comprehensive validation and quality checks
- Missing data reporting
- Data lineage tracking
- Performance monitoring
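
A missing-data report can be as simple as a per-column count of absent values. The following sketch uses plain dictionaries and hypothetical column names (`person_id`, `birth_weight`, `municipality`) with dummy values; it does not reflect the package's own reporting functions.

```python
def missing_report(records, columns):
    """Return {column: (n_missing, share_missing)} over a list of dicts.

    A value counts as missing when the key is absent, or when the value
    is None or an empty string.
    """
    total = len(records)
    report = {}
    for col in columns:
        n_missing = sum(1 for rec in records if rec.get(col) in (None, ""))
        share = n_missing / total if total else 0.0
        report[col] = (n_missing, share)
    return report


# Dummy records for illustration only.
records = [
    {"person_id": "p1", "birth_weight": 3400, "municipality": "0101"},
    {"person_id": "p2", "birth_weight": None, "municipality": "0461"},
    {"person_id": "p3", "municipality": ""},
    {"person_id": "p4", "birth_weight": 2950, "municipality": "0751"},
]

report = missing_report(records, ["person_id", "birth_weight", "municipality"])
for column, (n_missing, share) in report.items():
    print(f"{column:<14} missing={n_missing} ({share:.0%})")
```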

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "scd-analysis",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "epidemiology, healthcare, chronic-disease, danish-registers, population-health, medical-research, data-analysis, biostatistics",
    "author": null,
    "author_email": "Tobias Kragholm <tobias.kragholm@example.com>",
    "download_url": "https://files.pythonhosted.org/packages/0b/9e/2d663559fe691e121649b4983db00b4430398a6a22749a1399116c820a13/scd_analysis-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Severe Chronic Disease (SCD) Analysis Pipeline for Danish National Registers",
    "version": "1.0.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/tkragholm/scd-analysis/issues",
        "Documentation": "https://scd-analysis.readthedocs.io/",
        "Homepage": "https://github.com/tkragholm/scd-analysis",
        "Repository": "https://github.com/tkragholm/scd-analysis"
    },
    "split_keywords": [
        "epidemiology",
        " healthcare",
        " chronic-disease",
        " danish-registers",
        " population-health",
        " medical-research",
        " data-analysis",
        " biostatistics"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "f74df027577c93e9df8cced7a3fb109d96b6a8719b2fd0a80c66b282f69ec5b9",
                "md5": "7b6f0d52c38b4e1102b6cf5cb3c5942f",
                "sha256": "b05faa35f98559eafbf56f77d27ed56538777bbccfb148005e76e3ebd131e1a6"
            },
            "downloads": -1,
            "filename": "scd_analysis-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7b6f0d52c38b4e1102b6cf5cb3c5942f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 42039,
            "upload_time": "2025-08-25T18:16:02",
            "upload_time_iso_8601": "2025-08-25T18:16:02.928093Z",
            "url": "https://files.pythonhosted.org/packages/f7/4d/f027577c93e9df8cced7a3fb109d96b6a8719b2fd0a80c66b282f69ec5b9/scd_analysis-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0b9e2d663559fe691e121649b4983db00b4430398a6a22749a1399116c820a13",
                "md5": "7820ed55881adf835f058904bb757f2c",
                "sha256": "6df9f58b0ea250dc49f6e79a39c340bc8cc040e62051cc52ea0552ef96288693"
            },
            "downloads": -1,
            "filename": "scd_analysis-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "7820ed55881adf835f058904bb757f2c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 34284,
            "upload_time": "2025-08-25T18:16:04",
            "upload_time_iso_8601": "2025-08-25T18:16:04.510246Z",
            "url": "https://files.pythonhosted.org/packages/0b/9e/2d663559fe691e121649b4983db00b4430398a6a22749a1399116c820a13/scd_analysis-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-25 18:16:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "tkragholm",
    "github_project": "scd-analysis",
    "github_not_found": true,
    "lcname": "scd-analysis"
}