# SCD Analysis - Severe Chronic Disease Analysis Pipeline
A high-performance Python package for analyzing severe chronic diseases (SCD) using Danish national health registers. It provides a complete pipeline for processing, analyzing, and matching complex epidemiological data, using lazy evaluation to keep memory usage low.
### Basic Usage
```python
from scd_analysis import get_default_config, run_scd_pipeline
from scd_analysis.pipeline import run_descriptive_analysis

# Run with the default configuration
final_data = run_scd_pipeline()

# Or customize the configuration before running
config = get_default_config()
config["age_cutoff"] = 5
config["study_period"]["end_year"] = 2020
final_data = run_scd_pipeline(config)

# Basic descriptive analysis of the pipeline output
summary_stats = run_descriptive_analysis(final_data)
print(summary_stats)
```
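The nested keys in the example suggest that the configuration is a plain nested dictionary. Assuming that holds, a small helper can apply several overrides in one step without mutating the defaults. This is only a sketch: `merge_config`, `DEFAULTS`, and the default values are hypothetical stand-ins and not part of `scd_analysis`.

```python
import copy

# Hypothetical stand-in for get_default_config(); the real defaults may differ.
DEFAULTS = {
    "age_cutoff": 18,
    "study_period": {"start_year": 2000, "end_year": 2018},
}


def merge_config(base, overrides):
    """Return a deep copy of `base` with `overrides` applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections so sibling keys are preserved.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


config = merge_config(DEFAULTS, {"age_cutoff": 5, "study_period": {"end_year": 2020}})
print(config)
```

Because the helper copies before merging, the same defaults can be reused for several pipeline runs with different settings.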
### Advanced Usage
```python
from scd_analysis import get_default_config
# process_mfr_data is imported for the birth register (MFR); it is not used in this snippet
from scd_analysis.data import process_lpr_data, process_mfr_data
from scd_analysis.socioeconomic import SocioeconomicProcessor

# Process individual components instead of running the full pipeline
config = get_default_config()

# Process hospital data (LPR)
df_lpr = process_lpr_data(config)

# Process socioeconomic data with custom settings
socio_processor = SocioeconomicProcessor(config)
df_socio = socio_processor.process(df_lpr)
```
## Package Structure
- **`scd_analysis.config`**: Configuration management
- **`scd_analysis.data`**: Core data processing modules
- **`scd_analysis.socioeconomic`**: Socioeconomic data processing (SEPLINE-compliant)
- **`scd_analysis.pipeline`**: Pipeline orchestration and analysis
- **`scd_analysis.utils`**: Utility functions and helpers
## Data Requirements
This package is designed to work with Danish national health and administrative registers:
- **LPR**: Danish National Patient Register (hospital contacts and diagnoses)
- **MFR**: Danish Medical Birth Register
- **BEF**: Population register
- **AKM**: Employment register
- **FAIK**: Income register
- **UDDF**: Education register
- **DOD/VNDS**: Death/emigration registers
Data should be provided as Parquet files (single files or partitioned datasets).
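As a rough illustration of the "single file or partitioned dataset" distinction, the sketch below resolves either kind of input to a list of Parquet files using only the standard library. The function name `list_parquet_files` is hypothetical, and how `scd_analysis` itself locates its inputs is not documented here.

```python
from pathlib import Path


def list_parquet_files(source):
    """Return the Parquet files behind `source`, sorted for reproducibility.

    `source` may be a single .parquet file or a directory holding a
    partitioned dataset, which is searched recursively.
    """
    path = Path(source)
    if path.is_file():
        if path.suffix != ".parquet":
            raise ValueError(f"Not a Parquet file: {path}")
        return [path]
    if path.is_dir():
        return sorted(path.rglob("*.parquet"))
    raise FileNotFoundError(path)
```

For a directory laid out as, say, `year=2019/part-0.parquet` and `year=2020/part-0.parquet` (an assumed naming scheme), the function returns both files in sorted order; for a single file it returns a one-element list.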
## Performance Benefits
- **Lazy Evaluation**: Only loads necessary data into memory
- **Predicate Pushdown**: Filters are applied while files are read, not after loading
- **Partitioned Data Support**: Efficient processing of time-partitioned data
- **Parallel Processing**: Automatic parallelization of operations
- **Memory Optimization**: Streaming processing for large datasets
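To make lazy evaluation, predicate pushdown, and partition handling concrete, here is a library-free illustration built on Python generators. It is not how `scd_analysis` is implemented (this README does not name its query engine); the toy data, the `year -> rows` layout, and the `scan` function are all hypothetical.

```python
# Toy "dataset": one list of rows per year partition (made-up values).
PARTITIONS = {
    2018: [{"pnr": "A", "diag": "E10"}, {"pnr": "B", "diag": "J45"}],
    2019: [{"pnr": "C", "diag": "E10"}],
    2020: [{"pnr": "D", "diag": "J45"}, {"pnr": "E", "diag": "E10"}],
}

partitions_read = []  # records which partitions were actually opened


def scan(partitions, years=None, predicate=None):
    """Lazily yield rows; nothing is read until the generator is consumed."""
    for year, rows in partitions.items():
        # Partition pruning: skip whole partitions outside the requested years.
        if years is not None and year not in years:
            continue
        partitions_read.append(year)
        for row in rows:
            # Predicate pushdown: filter while scanning, not afterwards.
            if predicate is None or predicate(row):
                yield {**row, "year": year}


query = scan(PARTITIONS, years={2019, 2020}, predicate=lambda r: r["diag"] == "E10")
assert partitions_read == []  # building the query has read nothing yet

result = list(query)  # consuming the generator triggers the scan
print(result)
print(partitions_read)
```

Only the 2019 and 2020 partitions are touched, and rows that fail the filter are never materialized, which is the same idea a columnar query engine applies to Parquet files at much larger scale.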
## Key Features
### Socioeconomic Processing
- SEPLINE-compliant ethnicity categorization (A1, A2, B1, B2, C1, C2)
- Danish regional and municipal classifications
- Population density and urbanization categories
- Family structure and cohabitation status
### SCD Analysis
- Automated severe chronic disease flagging
- Age-appropriate diagnosis criteria
- Temporal analysis capabilities
- Cohort matching and controls
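Cohort matching usually pairs each case with a fixed number of controls that share key characteristics. The sketch below matches on birth year and sex, sampling controls without replacement. The field names, the 1:2 ratio, and `match_controls` are hypothetical; the package's actual matching criteria are not described in this README.

```python
import random
from collections import defaultdict


def match_controls(cases, candidates, ratio=2, seed=0):
    """Match each case to up to `ratio` controls with the same birth year and sex.

    Controls are drawn without replacement, so no control is reused.
    Returns {case_id: [control_id, ...]}.
    """
    rng = random.Random(seed)  # fixed seed keeps the matching reproducible
    pool = defaultdict(list)
    for person in candidates:
        pool[(person["birth_year"], person["sex"])].append(person["id"])
    for ids in pool.values():
        rng.shuffle(ids)

    matches = {}
    for case in cases:
        available = pool[(case["birth_year"], case["sex"])]
        # Fewer than `ratio` controls may remain in a small stratum.
        n = min(ratio, len(available))
        matches[case["id"]] = [available.pop() for _ in range(n)]
    return matches


cases = [{"id": "case1", "birth_year": 2010, "sex": "F"}]
candidates = [
    {"id": "c1", "birth_year": 2010, "sex": "F"},
    {"id": "c2", "birth_year": 2010, "sex": "F"},
    {"id": "c3", "birth_year": 2010, "sex": "M"},
    {"id": "c4", "birth_year": 2011, "sex": "F"},
]
matched = match_controls(cases, candidates)
print(matched)
```

The candidate list is assumed to exclude the cases themselves. A real study would typically also account for the index date, so that controls are alive, resident, and disease-free when their case is diagnosed.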
### Data Quality
- Comprehensive validation and quality checks
- Missing data reporting
- Data lineage tracking
- Performance monitoring
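As an example of what missing data reporting can look like, the following sketch counts missing values per column over a list of records. It is a minimal stand-in using made-up rows; `missing_report` is a hypothetical name, and the package's own report format may differ.

```python
def missing_report(rows, columns):
    """Return {column: (n_missing, share_missing)} over a list of dict rows.

    A value counts as missing when the key is absent or the value is None.
    """
    total = len(rows)
    report = {}
    for col in columns:
        n_missing = sum(1 for row in rows if row.get(col) is None)
        share = n_missing / total if total else 0.0
        report[col] = (n_missing, share)
    return report


rows = [
    {"id": 1, "income": 310_000, "education": "ISCED 3"},
    {"id": 2, "income": None, "education": "ISCED 5"},
    {"id": 3, "income": 280_000},
    {"id": 4, "income": 295_000, "education": None},
]
report = missing_report(rows, ["income", "education"])
print(report)
```

Reporting both the count and the share makes it easy to flag columns that exceed a chosen missingness threshold before they enter the analysis.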
## Raw data
{
"_id": null,
"home_page": null,
"name": "scd-analysis",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "epidemiology, healthcare, chronic-disease, danish-registers, population-health, medical-research, data-analysis, biostatistics",
"author": null,
"author_email": "Tobias Kragholm <tobias.kragholm@example.com>",
"download_url": "https://files.pythonhosted.org/packages/0b/9e/2d663559fe691e121649b4983db00b4430398a6a22749a1399116c820a13/scd_analysis-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Severe Chronic Disease (SCD) Analysis Pipeline for Danish National Registers",
"version": "1.0.0",
"project_urls": {
"Bug Tracker": "https://github.com/tkragholm/scd-analysis/issues",
"Documentation": "https://scd-analysis.readthedocs.io/",
"Homepage": "https://github.com/tkragholm/scd-analysis",
"Repository": "https://github.com/tkragholm/scd-analysis"
},
"split_keywords": [
"epidemiology",
" healthcare",
" chronic-disease",
" danish-registers",
" population-health",
" medical-research",
" data-analysis",
" biostatistics"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "f74df027577c93e9df8cced7a3fb109d96b6a8719b2fd0a80c66b282f69ec5b9",
"md5": "7b6f0d52c38b4e1102b6cf5cb3c5942f",
"sha256": "b05faa35f98559eafbf56f77d27ed56538777bbccfb148005e76e3ebd131e1a6"
},
"downloads": -1,
"filename": "scd_analysis-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7b6f0d52c38b4e1102b6cf5cb3c5942f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 42039,
"upload_time": "2025-08-25T18:16:02",
"upload_time_iso_8601": "2025-08-25T18:16:02.928093Z",
"url": "https://files.pythonhosted.org/packages/f7/4d/f027577c93e9df8cced7a3fb109d96b6a8719b2fd0a80c66b282f69ec5b9/scd_analysis-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "0b9e2d663559fe691e121649b4983db00b4430398a6a22749a1399116c820a13",
"md5": "7820ed55881adf835f058904bb757f2c",
"sha256": "6df9f58b0ea250dc49f6e79a39c340bc8cc040e62051cc52ea0552ef96288693"
},
"downloads": -1,
"filename": "scd_analysis-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "7820ed55881adf835f058904bb757f2c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 34284,
"upload_time": "2025-08-25T18:16:04",
"upload_time_iso_8601": "2025-08-25T18:16:04.510246Z",
"url": "https://files.pythonhosted.org/packages/0b/9e/2d663559fe691e121649b4983db00b4430398a6a22749a1399116c820a13/scd_analysis-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-25 18:16:04",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "tkragholm",
"github_project": "scd-analysis",
"github_not_found": true,
"lcname": "scd-analysis"
}