# synthea-pydantic

- **Version**: 0.1.0 (uploaded to PyPI 2025-07-24)
- **License**: MIT
- **Requires Python**: >=3.12
- **Keywords**: synthea, healthcare, pydantic, csv, synthetic-data, health-data

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)

Type-safe Pydantic models for parsing and validating [Synthea's](https://github.com/synthetichealth/synthea) synthetic healthcare data CSV exports.

## Overview

synthea-pydantic provides lightweight, type-annotated Pydantic models that make it easy to work with Synthea's CSV output format in Python. Synthea is a synthetic patient generator that creates realistic (but not real) patient health records for research, education, and software development.

### Key Features

- 🏥 **Complete Coverage**: Models for all Synthea CSV export types listed under Supported Models
- 🔍 **Type Safety**: Full type annotations, with UUIDs, dates, timestamps, and decimals parsed and validated by Pydantic
- 🚀 **Easy to Use**: Simple API that works with standard CSV libraries
- 📋 **Well Documented**: Comprehensive field descriptions from Synthea specifications
- 🔧 **Flexible**: Handles optional fields and empty values gracefully
- ⚡ **Lightweight**: Minimal dependencies (just Pydantic)

## Installation

```bash
pip install synthea-pydantic
```

Or with [uv](https://github.com/astral-sh/uv):

```bash
uv pip install synthea-pydantic
```

## Quick Start

```python
import csv
from synthea_pydantic import Patient, Medication

# Load patients from CSV (newline='' is what the csv module expects)
with open('patients.csv', newline='') as f:
    reader = csv.DictReader(f)
    patients = [Patient(**row) for row in reader]

# Access patient data with full type safety
for patient in patients:
    print(f"{patient.first} {patient.last} - Born: {patient.birthdate}")
    if patient.deathdate:
        print(f"  Died: {patient.deathdate}")

# Load related data
with open('medications.csv', newline='') as f:
    reader = csv.DictReader(f)
    medications = [Medication(**row) for row in reader]

# Filter medications for a specific patient (here, the first one loaded)
patient = patients[0]
patient_meds = [m for m in medications if m.patient == patient.id]
```

## Supported Models

synthea-pydantic includes models for all Synthea CSV export types:

| Model | Description | Key Fields |
|-------|-------------|------------|
| `Patient` | Patient demographics | id, birthdate, first/last, address, ssn |
| `Encounter` | Healthcare encounters | id, patient, start/stop, type, provider |
| `Condition` | Medical conditions | patient, code, description, onset |
| `Medication` | Prescriptions | patient, code, description, start/stop |
| `Observation` | Clinical observations | patient, code, value, units |
| `Procedure` | Medical procedures | patient, code, description, date |
| `Immunization` | Vaccination records | patient, code, date |
| `CarePlan` | Treatment plans | patient, code, activities |
| `Allergy` | Allergy records | patient, code, description |
| `Device` | Medical devices | patient, code, start/stop |
| `Supply` | Medical supplies | patient, code, quantity |
| `Organization` | Healthcare facilities | id, name, address, phone |
| `Provider` | Healthcare providers | id, name, speciality, organization |
| `Payer` | Insurance companies | id, name, ownership |
| `PayerTransition` | Insurance changes | patient, payer, start/stop |
| `Claim` | Insurance claims | id, patient, provider, total |
| `ClaimTransaction` | Claim line items | claim, type, amount |
| `ImagingStudy` | Medical imaging | patient, modality, body_site |

## Usage Examples

### Loading CSV Data

The models work with Python's built-in `csv` module:

```python
import csv
from synthea_pydantic import Patient

# Load from CSV file
with open('data/patients.csv', newline='') as f:
    reader = csv.DictReader(f)
    patients = [Patient(**row) for row in reader]
```
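Building a list keeps every row in memory, which can matter for large exports such as `observations.csv`. A minimal sketch of a lazy alternative; `iter_records` is a hypothetical helper, not part of synthea-pydantic, and it accepts any callable that takes a row's columns as keyword arguments (a model class such as `Patient` qualifies):

```python
import csv
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def iter_records(lines: Iterable[str], model: Callable[..., T]) -> Iterator[T]:
    """Yield one record per CSV row instead of building a full list up front."""
    # DictReader treats the first line as the header and maps each row to {header: value}
    for row in csv.DictReader(lines):
        yield model(**row)


# Usage (hypothetical):
# with open('data/patients.csv', newline='') as f:
#     for patient in iter_records(f, Patient):
#         print(patient.id)
```

Because the generator reads one row at a time, the file must stay open while you iterate.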

### Working with Optional Fields

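Synthea CSVs often leave optional columns empty. The models convert empty strings to `None`, so an empty `DEATHDATE` becomes a missing value rather than an unparseable date: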

```python
from synthea_pydantic import Patient

# Empty strings in CSV are converted to None
patient = Patient(**{
    'Id': '123e4567-e89b-12d3-a456-426614174000',
    'BIRTHDATE': '1980-01-01',
    'DEATHDATE': '',  # Empty string becomes None
    'PREFIX': '',     # Empty string becomes None
    'FIRST': 'John',
    'LAST': 'Doe',
    # ... other required fields
})

assert patient.deathdate is None
assert patient.prefix is None
```
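Because an empty `DEATHDATE` arrives as `None`, downstream code can branch on it directly. A small sketch using plain `date` values like those the models produce; `age_in_years` is a hypothetical helper, not part of the package:

```python
from datetime import date
from typing import Optional


def age_in_years(birthdate: date, deathdate: Optional[date], today: date) -> int:
    """Age at death if deathdate is set, otherwise age as of `today`."""
    end = deathdate if deathdate is not None else today
    years = end.year - birthdate.year
    # Subtract one if the birthday has not yet been reached in the end year
    if (end.month, end.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


# Usage (hypothetical): age_in_years(patient.birthdate, patient.deathdate, date.today())
```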

### Type Validation

All fields are validated according to their types:

```python
from datetime import date
from decimal import Decimal
from uuid import UUID

# `patient` is a Patient instance, such as one loaded in the Quick Start

# UUIDs are automatically parsed
assert isinstance(patient.id, UUID)

# Dates are parsed from YYYY-MM-DD format
assert isinstance(patient.birthdate, date)

# Decimals maintain precision for monetary values
assert isinstance(patient.healthcare_expenses, Decimal)
```
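The reason for `Decimal` over `float` can be seen with the standard library alone. This sketch uses hypothetical cost strings of the kind found in Synthea's monetary columns:

```python
from decimal import ROUND_HALF_UP, Decimal

# Binary floats cannot represent most decimal fractions exactly
float_sum = 0.1 + 0.2  # 0.30000000000000004, not 0.3

# Decimals built from the CSV strings keep exact cent values
costs = ["0.10", "0.20", "129.16"]  # hypothetical values, not real Synthea output
total = sum((Decimal(c) for c in costs), Decimal("0"))  # Decimal('129.46')

# Round explicitly when a cent value is needed, e.g. after averaging
average = (total / len(costs)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Constructing `Decimal` from the original string (not from a float) is what preserves the exact value.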

### Linking Related Data

Use the UUID foreign keys to link related records:

```python
# Find all medications for a patient
patient_meds = [
    med for med in medications
    if med.patient == patient.id
]

# Find all conditions diagnosed during an encounter
encounter_conditions = [
    cond for cond in conditions
    if cond.encounter == encounter.id
]
```
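The comprehensions above rescan the whole list for every patient or encounter. When linking many records, building an index once is cheaper. A minimal sketch, assuming records expose their foreign keys as attributes (as `med.patient` and `cond.encounter` do above); `group_by` is a hypothetical helper:

```python
from collections import defaultdict
from collections.abc import Callable, Hashable, Iterable
from typing import TypeVar

T = TypeVar("T")


def group_by(records: Iterable[T], key: Callable[[T], Hashable]) -> dict[Hashable, list[T]]:
    """Build a one-pass index from a foreign key to the records that reference it."""
    index: defaultdict[Hashable, list[T]] = defaultdict(list)
    for record in records:
        index[key(record)].append(record)
    return dict(index)


# Usage (hypothetical):
# meds_by_patient = group_by(medications, lambda m: m.patient)
# patient_meds = meds_by_patient.get(patient.id, [])
```

Each lookup is then a dictionary access instead of a full scan of the list.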

### Error Handling

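Invalid data raises Pydantic's `ValidationError`, which lists every field that failed validation:

```python
from pydantic import ValidationError
from synthea_pydantic import Patient

# Malformed UUID, impossible date, and missing required columns
invalid_data = {'Id': 'not-a-uuid', 'BIRTHDATE': '1980-13-45'}

try:
    patient = Patient(**invalid_data)
except ValidationError as e:
    print(f"Validation failed: {e}")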
```
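Stopping at the first bad row is not always convenient when checking a whole export. A minimal sketch, assuming Pydantic v2, where `ValidationError.errors()` returns one dict per failing field with `loc` and `msg` keys; `load_with_errors` is a hypothetical helper, and any model class such as `Patient` can be passed in:

```python
import csv
from collections.abc import Iterable
from typing import TypeVar

from pydantic import BaseModel, ValidationError

M = TypeVar("M", bound=BaseModel)


def load_with_errors(
    lines: Iterable[str], model: type[M]
) -> tuple[list[M], list[tuple[int, str, str]]]:
    """Validate every row, collecting (CSV line number, field, message) for failures."""
    records: list[M] = []
    problems: list[tuple[int, str, str]] = []
    reader = csv.DictReader(lines)
    for row in reader:
        try:
            records.append(model(**row))
        except ValidationError as e:
            for err in e.errors():
                field = ".".join(str(part) for part in err["loc"])
                # reader.line_num is the number of source lines read so far
                problems.append((reader.line_num, field, err["msg"]))
    return records, problems


# Usage (hypothetical):
# with open('patients.csv', newline='') as f:
#     patients, problems = load_with_errors(f, Patient)
```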

## Model Details

### Common Field Types

- **IDs**: UUID fields for primary and foreign keys
- **Dates**: `date` fields for dates (YYYY-MM-DD)
- **Timestamps**: `datetime` fields for date/time values
- **Money**: `Decimal` fields for monetary amounts
- **Codes**: String fields for medical codes (SNOMED-CT, RxNorm, etc.)

### Base Model Features

All models inherit from `SyntheaBaseModel`, which provides:

- Automatic whitespace stripping
- Empty string to None conversion
- Case-insensitive literal field matching
- Field alias support for CSV column mapping
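A minimal sketch of how a base class with these features could be written in Pydantic v2. This is not the package's actual `SyntheaBaseModel`; `CsvBaseModel` and `PatientSketch` are hypothetical names, and only a few `patients.csv` columns are modeled:

```python
from datetime import date
from typing import Any, Literal, Optional
from uuid import UUID

from pydantic import BaseModel, Field, field_validator, model_validator


def _clean(value: Any) -> Any:
    """Strip whitespace from strings and turn empty strings into None."""
    if isinstance(value, str):
        value = value.strip()
        return value or None
    return value


class CsvBaseModel(BaseModel):
    @model_validator(mode="before")
    @classmethod
    def clean_cells(cls, data: Any) -> Any:
        # Runs on the raw row dict before any individual field is validated
        if isinstance(data, dict):
            return {key: _clean(value) for key, value in data.items()}
        return data


class PatientSketch(CsvBaseModel):
    # Aliases map the CSV column headers onto snake_case attributes
    id: UUID = Field(alias="Id")
    birthdate: date = Field(alias="BIRTHDATE")
    deathdate: Optional[date] = Field(default=None, alias="DEATHDATE")
    gender: Literal["M", "F"] = Field(alias="GENDER")

    @field_validator("gender", mode="before")
    @classmethod
    def normalize_case(cls, value: Any) -> Any:
        # Upper-case first so "f" matches the Literal value "F"
        return value.upper() if isinstance(value, str) else value
```

Doing the cleanup in a single `mode="before"` model validator keeps it in one place for every subclass, at the cost of applying it to all string columns.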

## Development

### Setup

To develop or contribute to synthea-pydantic:

```bash
# Clone the repository
git clone https://github.com/yourusername/synthea-pydantic.git
cd synthea-pydantic

# Install in development mode
uv pip install -e .
```

### Running Tests

```bash
# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=synthea_pydantic

# Run specific test file
uv run pytest tests/test_patients.py
```

### Code Quality

```bash
# Type checking
uv run mypy synthea_pydantic/

# Linting
uv run ruff check

# Formatting
uv run ruff format
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- [Synthea](https://github.com/synthetichealth/synthea) - The synthetic patient generator
- [Pydantic](https://docs.pydantic.dev/) - Data validation using Python type annotations

## Resources

- [Synthea Documentation](https://github.com/synthetichealth/synthea/wiki)
- [Synthea CSV Format](https://github.com/synthetichealth/synthea/wiki/CSV-File-Data-Dictionary)
- [Sample Synthea Data](https://github.com/synthetichealth/synthea-sample-data)

            
