# mednotegen
`mednotegen` generates synthetic medical notes as PDF reports. It uses [Synthea™](https://github.com/synthetichealth/synthea) to produce the realistic synthetic patient data behind those notes.

---
## Usage
```python
from mednotegen.generator import NoteGenerator
gen = NoteGenerator.from_config("config.yaml")
gen.generate_notes(10, "output_dir")
# Or specify Synthea CSV directory directly:
gen = NoteGenerator(synthea_csv_dir="/path/to/synthea/output/csv")
gen.generate_notes(10, "output_dir")
```
## Using a Custom Synthea Directory with config.yaml
You can set the Synthea CSV directory in your config file with the `synthea_csv_dir` key. Example `config.yaml`:
```yaml
count: 10
output_dir: output_dir
synthea_csv_dir: /path/to/synthea/output/csv
```
Then generate notes using:
```python
from mednotegen.generator import NoteGenerator
gen = NoteGenerator.from_config("config.yaml")
gen.generate_notes(10, "output_dir")
```
## ⚠️ Synthea Dependency Required
This project requires [Synthea™](https://github.com/synthetichealth/synthea), an open-source synthetic patient generator, as an **external dependency**. You must clone and build Synthea yourself before using `mednotegen`.
**To set up Synthea:**
1. **Clone Synthea**
   ```sh
   git clone https://github.com/synthetichealth/synthea.git
   ```
2. **Build the Synthea JAR**
   ```sh
   cd synthea
   ./gradlew build check test
   cp build/libs/synthea-with-dependencies.jar .
   cd ..
   ```
   Ensure `synthea-with-dependencies.jar` is in the `synthea/` directory at the root of your project.
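If you want to verify that step from a script, here is a minimal standard-library sketch. The helper `find_synthea_jar` is hypothetical (it is not part of `mednotegen`) and only checks the location described above: `synthea/synthea-with-dependencies.jar` under your project root.

```python
from pathlib import Path
from typing import Union

JAR_NAME = "synthea-with-dependencies.jar"


def find_synthea_jar(project_root: Union[str, Path] = ".") -> Path:
    """Return the path to the Synthea JAR under <project_root>/synthea/.

    Hypothetical helper (not part of mednotegen): it only checks the
    location that the setup steps copy the JAR to.
    """
    jar = Path(project_root) / "synthea" / JAR_NAME
    if not jar.is_file():
        # Point the user back at the build/copy step that was probably skipped.
        raise FileNotFoundError(
            f"{jar} not found; build Synthea and copy {JAR_NAME} into synthea/"
        )
    return jar
```

Calling `find_synthea_jar()` from your project root returns the JAR path, or raises `FileNotFoundError` with a hint if the build or copy step was skipped.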
---
## Configuration (`config.yaml`)
You can customize patient generation and report output using a `config.yaml` file. Example options:
```yaml
count: 10 # Number of reports to generate
output_dir: output_dir # Output directory for PDFs
use_llm: false # Use LLM for report generation
synthea_csv_dir: /path/to/synthea/output/csv # Path to Synthea-generated CSV files
seed: 1234 # Random seed for reproducibility
reference_date: "20250628" # Reference date for data generation (YYYYMMDD)
clinician_seed: 5678 # Optional: separate seed for clinician assignment
gender: female # male, female, or any
min_age: 30 # Minimum patient age
max_age: 60 # Maximum patient age
state: New York # Synthea state parameter
modules:
- cardiovascular-disease
- diabetes
- hypertension
- asthma
local_config: custom_synthea.properties # Custom Synthea config file
local_modules: ./synthea_modules # Directory for custom modules
```
- **count**: Number of reports to generate
- **output_dir**: Directory to save generated PDFs
- **use_llm**: If `true`, uses an OpenAI LLM to write the report text
- **synthea_csv_dir**: Path to the directory containing Synthea-generated CSV files
- **seed**: Random seed for reproducibility
- **reference_date**: Reference date for age calculations (YYYYMMDD)
- **clinician_seed**: Optional, separate seed for clinician assignment
- **gender**: Gender filter for patients (`male`, `female`, or `any`)
- **min_age**, **max_age**: Age range for patients
- **state**: US state for Synthea simulation
- **modules**: Synthea disease modules to enable
- **local_config**: Path to a custom Synthea config file
- **local_modules**: Directory for custom Synthea modules
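Because several of these options constrain each other, it can help to sanity-check a config before generating notes. The sketch below is a hypothetical helper, not something `mednotegen` exposes: it takes the mapping you get after loading `config.yaml` (for example with PyYAML's `yaml.safe_load`) and applies only the constraints stated above. Treating a missing `gender` as `any` is an assumption.

```python
from datetime import datetime
from typing import Any, Dict, List

ALLOWED_GENDERS = {"male", "female", "any"}


def validate_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an already-loaded config mapping."""
    problems = []
    count = cfg.get("count")
    if count is not None and (not isinstance(count, int) or count < 1):
        problems.append("count must be a positive integer")
    # Assumption: a missing gender behaves like "any".
    gender = cfg.get("gender", "any")
    if gender not in ALLOWED_GENDERS:
        problems.append(f"gender must be one of {sorted(ALLOWED_GENDERS)}")
    min_age, max_age = cfg.get("min_age"), cfg.get("max_age")
    if min_age is not None and max_age is not None and min_age > max_age:
        problems.append("min_age must not exceed max_age")
    ref = cfg.get("reference_date")
    if ref is not None:
        try:
            # str() also accepts an unquoted YAML value parsed as an int.
            datetime.strptime(str(ref), "%Y%m%d")
        except ValueError:
            problems.append("reference_date must be a YYYYMMDD string")
    if not isinstance(cfg.get("modules", []), list):
        problems.append("modules must be a list of module names")
    return problems
```

For the example config above, loaded into a dict, `validate_config(cfg)` returns an empty list. Quoting `reference_date` as in the example keeps YAML from reading it as an integer; the check calls `str()` on it either way.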
---
### More Synthea Modules
For an up-to-date and complete list of available modules, see the [official Synthea modules directory](https://github.com/synthetichealth/synthea/tree/master/src/main/resources/modules).

---
### Troubleshooting
#### Synthea Data Location
If you see errors about missing `patients.csv`, `medications.csv`, or `conditions.csv`, make sure you have generated Synthea data and that the path you provide (via the `synthea_csv_dir` argument, the CLI, or `config.yaml`) points to the directory that contains those files.
If you installed `mednotegen` via pip, the default location is inside the package directory. For custom or system-wide Synthea runs, always specify the output CSV directory explicitly.
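A quick way to catch this before `mednotegen` does is to check the directory yourself. This is a minimal sketch; `missing_synthea_csvs` is a hypothetical helper, and it only looks for the three file names mentioned above.

```python
from pathlib import Path
from typing import List, Union

# The three files named in the error messages described above.
REQUIRED_CSVS = ("patients.csv", "medications.csv", "conditions.csv")


def missing_synthea_csvs(csv_dir: Union[str, Path]) -> List[str]:
    """Return the required Synthea CSV file names that are absent from csv_dir."""
    directory = Path(csv_dir).expanduser()
    if not directory.is_dir():
        # A directory that does not exist cannot contain any of the files.
        return list(REQUIRED_CSVS)
    return [name for name in REQUIRED_CSVS if not (directory / name).is_file()]
```

For example, `missing_synthea_csvs("/path/to/synthea/output/csv")` returns an empty list when all three files are present, and the missing names otherwise.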
- **No CSV files generated:**
  - Make sure you edited the correct `synthea.properties` and used the `-c` flag when running Synthea.
  - Ensure `exporter.csv.export = true` is set and not overridden elsewhere in the file.
- **`FileNotFoundError` for CSVs:**
  - Confirm the CSV files exist in the path specified by `synthea_csv_dir` or in the expected package location.
- **`ValueError: No patients found matching the specified filters`:**
  - Check the age and gender filters in `config.yaml`, and relax them if too few patients match.
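To see how many patients your filters leave before running the generator, you can count them directly in `patients.csv`. This is a rough, hypothetical sketch rather than `mednotegen`'s own filtering logic: it assumes Synthea's `BIRTHDATE` (`YYYY-MM-DD`) and `GENDER` (`M`/`F`) columns, computes age relative to `reference_date`, and ignores everything else (for example, deceased patients are still counted).

```python
import csv
from datetime import date, datetime
from pathlib import Path
from typing import Optional, Union


def age_on(birthdate: date, reference: date) -> int:
    """Whole years between birthdate and reference, accounting for birthdays."""
    had_birthday = (reference.month, reference.day) >= (birthdate.month, birthdate.day)
    return reference.year - birthdate.year - (0 if had_birthday else 1)


def count_matching_patients(
    csv_dir: Union[str, Path],
    reference_date: str,
    gender: str = "any",
    min_age: Optional[int] = None,
    max_age: Optional[int] = None,
) -> int:
    """Count rows in patients.csv that pass the age/gender filters.

    Assumes Synthea's BIRTHDATE (YYYY-MM-DD) and GENDER ("M"/"F") columns.
    """
    reference = datetime.strptime(reference_date, "%Y%m%d").date()
    wanted = {"male": "M", "female": "F"}.get(gender)  # None means "any"
    matches = 0
    with open(Path(csv_dir) / "patients.csv", newline="") as fh:
        for row in csv.DictReader(fh):
            if wanted is not None and row["GENDER"] != wanted:
                continue
            born = datetime.strptime(row["BIRTHDATE"], "%Y-%m-%d").date()
            age = age_on(born, reference)
            if min_age is not None and age < min_age:
                continue
            if max_age is not None and age > max_age:
                continue
            matches += 1
    return matches
```

If `count_matching_patients(csv_dir, "20250628", "female", 30, 60)` returns 0, widen the age range or regenerate with a larger population (`-p`).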
### Configure Synthea to Export CSVs
Edit `src/main/resources/synthea.properties` in your Synthea directory:
```properties
exporter.csv.export = true
```
(Ensure any `exporter.csv.export = false` lines are removed or commented out.)
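Because a later line can silently override an earlier one, it may be worth checking which value actually wins. The sketch below uses a simplified reading of Java `.properties` files (comments start with `#` or `!`, the first `=` or `:` separates key and value, later keys override earlier ones); it does not handle line continuations, escapes, or whitespace-only separators, and `csv_export_setting` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional, Union


def csv_export_setting(properties_path: Union[str, Path]) -> Optional[str]:
    """Return the effective value of exporter.csv.export, or None if unset.

    Simplified .properties reading: skips blank lines and '#'/'!' comments,
    splits on the first '=' or ':', and lets later duplicates override
    earlier ones. Line continuations and escapes are not handled.
    """
    value = None
    text = Path(properties_path).read_text(encoding="utf-8", errors="replace")
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split on whichever separator appears first in the line.
        seps = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not seps:
            continue
        cut = min(seps)
        key, val = line[:cut].strip(), line[cut + 1:].strip()
        if key == "exporter.csv.export":
            value = val  # a later assignment overrides an earlier one
    return value
```

Once the file is set up as described, `csv_export_setting("src/main/resources/synthea.properties")` should return `"true"`.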
### Generate Patient Data with Synthea
From your Synthea directory, clean any old output and generate new data:
```sh
rm -rf output/
java -jar synthea-with-dependencies.jar -c src/main/resources/synthea.properties -p 1000
```
- The `-p 1000` flag generates 1000 patients.
- After running, check for CSV files in `output/csv/`.
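If you prefer to drive this step from Python, the same commands can be wrapped with `subprocess` and `shutil`. This is a sketch under assumptions: `synthea_command` and `generate_patients` are hypothetical helpers, the JAR and properties paths are the ones used above, and the optional `-s` (seed) flag and trailing state argument follow Synthea's documented `[options] [state [city]]` usage. Check Synthea's README for the full option list.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional, Union

JAR_NAME = "synthea-with-dependencies.jar"
PROPERTIES = "src/main/resources/synthea.properties"


def synthea_command(population: int = 1000, seed: Optional[int] = None,
                    state: Optional[str] = None) -> List[str]:
    """Build the java command shown above, optionally adding a seed and state."""
    cmd = ["java", "-jar", JAR_NAME, "-c", PROPERTIES, "-p", str(population)]
    if seed is not None:
        cmd += ["-s", str(seed)]
    if state:
        # Positional argument after the options; passed as one argv element,
        # so multi-word states such as "New York" need no extra quoting.
        cmd.append(state)
    return cmd


def generate_patients(synthea_dir: Union[str, Path], **kwargs) -> Path:
    """Delete old output, run Synthea inside synthea_dir, return output/csv."""
    synthea_dir = Path(synthea_dir)
    shutil.rmtree(synthea_dir / "output", ignore_errors=True)  # rm -rf output/
    subprocess.run(synthea_command(**kwargs), cwd=synthea_dir, check=True)
    return synthea_dir / "output" / "csv"
```

For example, `generate_patients("synthea", population=100, seed=1234, state="New York")` would run Synthea inside `synthea/` and return `synthea/output/csv`, which you can then pass as `synthea_csv_dir`.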
### Attribution
See `README_SYNTHEA_NOTICE.md` and `LICENSE-APACHE-2.0` for license and attribution requirements.
Raw data
{
"_id": null,
"home_page": null,
"name": "mednotegen",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "synthea, medical notes, pdf",
"author": null,
"author_email": "Mikael Moise <mikael.moise@protonmail.com>",
"download_url": "https://files.pythonhosted.org/packages/f1/9f/b4dd049b9a73a93b35a666b110ebf3569c0bcdcaeb39eb6298caa9c238b7/mednotegen-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Generate fake patient reports as PDFs.",
"version": "0.1.2",
"project_urls": {
"Bug Tracker": "https://github.com/nortelabs/mednotegen/issues",
"Repository": "https://github.com/nortelabs/mednotegen"
},
"split_keywords": [
"synthea",
" medical notes",
" pdf"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "f7c59c20c2c5c3c9cb55e44cd904e4bb0590ab4f981d970429256b0505604d3b",
"md5": "5db869dea0fba542ee4251ba28e80da4",
"sha256": "9dea9607f01eb415a5c6bafb19725eed60d20db2898bcda3c000556ada22758f"
},
"downloads": -1,
"filename": "mednotegen-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5db869dea0fba542ee4251ba28e80da4",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 13829,
"upload_time": "2025-07-08T22:31:10",
"upload_time_iso_8601": "2025-07-08T22:31:10.917966Z",
"url": "https://files.pythonhosted.org/packages/f7/c5/9c20c2c5c3c9cb55e44cd904e4bb0590ab4f981d970429256b0505604d3b/mednotegen-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "f19fb4dd049b9a73a93b35a666b110ebf3569c0bcdcaeb39eb6298caa9c238b7",
"md5": "7e41d2f361dc63cc6d6b0cd8d28d161c",
"sha256": "23ec9e2edf97e77c004818d8335de72500e64673594a4cb3ce5c6874311fbd0e"
},
"downloads": -1,
"filename": "mednotegen-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "7e41d2f361dc63cc6d6b0cd8d28d161c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 14332,
"upload_time": "2025-07-08T22:31:11",
"upload_time_iso_8601": "2025-07-08T22:31:11.846705Z",
"url": "https://files.pythonhosted.org/packages/f1/9f/b4dd049b9a73a93b35a666b110ebf3569c0bcdcaeb39eb6298caa9c238b7/mednotegen-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-08 22:31:11",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "nortelabs",
"github_project": "mednotegen",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "faker",
"specs": []
},
{
"name": "fpdf",
"specs": []
}
],
"lcname": "mednotegen"
}