# pyeducationdata
Python package for accessing the Urban Institute's Education Data Portal API.
## Overview
`pyeducationdata` is a Python client library for the [Urban Institute's Education Data Portal API](https://educationdata.urban.org/). It provides convenient access to comprehensive US education data from kindergarten through postsecondary education, covering decades of data from multiple federal sources.
This package is a Python implementation inspired by the Urban Institute's [R package `educationdata`](https://github.com/UrbanInstitute/education-data-package-r), designed to provide the same functionality with a Pythonic interface.
## Features
- **Simple API**: Two main functions mirror the R package design
- **Automatic pagination**: Handles the API's 10,000 record limit transparently
- **Type-safe**: Full type hints and pydantic validation
- **Flexible filtering**: Filter by year, grade, location, and more
- **Label mapping**: Convert integer codes to human-readable labels
- **CSV support**: Download complete datasets efficiently
- **Summary statistics**: Server-side aggregation for fast statistics
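To illustrate what "automatic pagination" means in practice, here is a minimal sketch of a page-following loop. It is not the package's actual implementation. It assumes each JSON response carries a `results` list and a `next` URL that is null on the last page; `fetch_all_pages` and `get_json` are hypothetical names, and the fetcher is passed in so the loop can be exercised without network access.

```python
from typing import Any, Callable, Optional


def fetch_all_pages(
    first_url: str,
    get_json: Callable[[str], dict[str, Any]],
) -> list[dict[str, Any]]:
    """Follow `next` links until the response reports no further page."""
    records: list[dict[str, Any]] = []
    url: Optional[str] = first_url
    while url:
        page = get_json(url)                     # one request per page
        records.extend(page.get("results", []))  # accumulate this page's rows
        url = page.get("next")                   # None (or missing) ends the loop
    return records


# Stand-in for real HTTP responses: two fake pages keyed by URL.
fake_pages = {
    "page-1": {"results": [{"id": 1}, {"id": 2}], "next": "page-2"},
    "page-2": {"results": [{"id": 3}], "next": None},
}
all_records = fetch_all_pages("page-1", fake_pages.__getitem__)
print(len(all_records))
```

In real use, `get_json` would wrap an HTTP call (for example with `httpx`) and return the decoded JSON body.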
## Installation
### Using pip
```bash
pip install pyeducationdata
```
### Using uv
```bash
uv add pyeducationdata
```
### Development installation
```bash
git clone https://github.com/shaneorr/pyeducationdata.git
cd pyeducationdata
uv pip install -e ".[dev]"
```
## Quick Start
```python
import pyeducationdata as ped

# Get school enrollment data with demographic breakdowns
df = ped.get_education_data(
    level='schools',
    source='ccd',
    topic='enrollment',
    subtopic=['race', 'sex'],
    filters={'year': 2020, 'grade': [9, 10, 11, 12], 'fips': 13},
    add_labels=True
)

print(df.head())
```
## Main Functions
### `get_education_data()`
Retrieve data from the Education Data Portal API.
**Parameters:**
- `level` (str, required): API data level - `'schools'`, `'school-districts'`, or `'college-university'`
- `source` (str, required): Data source - `'ccd'`, `'crdc'`, `'ipeds'`, `'edfacts'`, etc.
- `topic` (str, required): Data topic - `'enrollment'`, `'directory'`, `'finance'`, etc.
- `subtopic` (list[str] | None): Grouping parameters like `['race', 'sex']`
- `filters` (dict | None): Query filters like `{'year': 2020, 'grade': 9}`
- `add_labels` (bool): Convert integer codes to descriptive labels (default: `False`)
- `csv` (bool): Download full CSV instead of using JSON API (default: `False`)
**Returns:** `pandas.DataFrame`
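The effect of `add_labels=True` can be pictured as a lookup from integer codes to strings. The sketch below shows the idea on plain dictionaries. The `SEX_LABELS` table and the `apply_labels` helper are hypothetical and exist only for illustration; the package obtains its real mappings elsewhere and returns a `pandas.DataFrame` rather than a list of dicts.

```python
# Hypothetical code-to-label table, for illustration only.
SEX_LABELS = {1: "Male", 2: "Female", 99: "Total"}


def apply_labels(rows, column, labels):
    """Return copies of `rows` with codes in `column` replaced by labels.

    Codes missing from `labels` are left unchanged.
    """
    return [
        {**row, column: labels.get(row[column], row[column])}
        for row in rows
    ]


# Made-up rows; the enrollment numbers are not real data.
rows = [
    {"sex": 1, "enrollment": 120},
    {"sex": 2, "enrollment": 135},
    {"sex": 7, "enrollment": 3},  # unknown code stays as-is
]
labeled = apply_labels(rows, "sex", SEX_LABELS)
print(labeled)
```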
### `get_education_data_summary()`
Retrieve aggregated summary statistics from the API.
**Parameters:**
- `level`, `source`, `topic`, `subtopic`: Same as `get_education_data()`
- `stat` (str, required): Statistic to compute - `'sum'`, `'avg'`, `'median'`, `'max'`, `'min'`, `'count'`
- `var` (str, required): Variable to aggregate
- `by` (str | list[str]): Variables to group by
- `filters` (dict | None): Query filters
**Returns:** `pandas.DataFrame`
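To make the roles of `stat`, `var`, and `by` concrete, the following sketch computes the equivalent of `stat='sum'`, `var='enrollment'`, `by='fips'` locally on a few made-up rows. The summary endpoint performs this aggregation on the server; `summarize_sum` is a hypothetical helper and not part of the package.

```python
from collections import defaultdict


def summarize_sum(rows, var, by):
    """Sum `var` within each distinct value of `by`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[by]] += row[var]
    return dict(totals)


# Made-up rows, not real enrollment figures.
rows = [
    {"fips": 6, "enrollment": 500},
    {"fips": 6, "enrollment": 300},
    {"fips": 36, "enrollment": 400},
]
state_totals = summarize_sum(rows, var="enrollment", by="fips")
print(state_totals)
```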
## Usage Examples
### Example 1: School Directory Data
Get information about schools in California for 2020:
```python
import pyeducationdata as ped

schools = ped.get_education_data(
    level='schools',
    source='ccd',
    topic='directory',
    filters={'year': 2020, 'fips': 6},  # fips=6 is California
    add_labels=True
)

print(f"Found {len(schools)} schools")
print(schools[['school_name', 'city', 'charter', 'school_level']].head())
```
### Example 2: Enrollment by Demographics
Get enrollment by race and sex for high school grades:
```python
enrollment = ped.get_education_data(
    level='schools',
    source='ccd',
    topic='enrollment',
    subtopic=['race', 'sex'],
    filters={
        'year': 2020,
        'grade': [9, 10, 11, 12],
        'fips': 36  # New York
    },
    add_labels=True
)

# Analyze enrollment patterns
enrollment_summary = enrollment.groupby(['race', 'sex'])['enrollment'].sum()
print(enrollment_summary)
```
### Example 3: College/University Data
Get IPEDS data for 4-year public universities:
```python
colleges = ped.get_education_data(
    level='college-university',
    source='ipeds',
    topic='directory',
    filters={'year': 2023}
)

# Filter to 4-year public institutions
public_4year = colleges[
    (colleges['inst_level'] == 1) &   # 4-year
    (colleges['inst_control'] == 1)   # Public
]
print(f"Found {len(public_4year)} public 4-year institutions")
```
### Example 4: Summary Statistics
Get state-level enrollment totals:
```python
state_totals = ped.get_education_data_summary(
    level='schools',
    source='ccd',
    topic='enrollment',
    stat='sum',
    var='enrollment',
    by='fips',
    filters={'year': 2020}
)

print(state_totals.sort_values('enrollment', ascending=False).head(10))
```
### Example 5: Multi-Year Analysis
Get enrollment trends over multiple years:
```python
trends = ped.get_education_data(
    level='schools',
    source='ccd',
    topic='enrollment',
    filters={
        'year': [2015, 2016, 2017, 2018, 2019, 2020],
        'grade': 99,  # All grades total
        'fips': 17    # Illinois
    }
)

# Analyze yearly trends
yearly_totals = trends.groupby('year')['enrollment'].sum()
print(yearly_totals)
```
## Available Data
The Education Data Portal provides 160+ endpoints across three institutional levels:
### Schools (K-12 school level)
- **CCD (Common Core of Data)**: School directory, enrollment, demographics (1986-2023)
- **CRDC (Civil Rights Data Collection)**: Discipline, advanced coursework, school characteristics (2011-2020, biennial)
- **EdFacts**: Assessment results, graduation rates (2009-2020)
- **NHGIS**: Census data at school locations
### School Districts (K-12 district level)
- **CCD**: District directory, enrollment, finance data (1986-2023)
- **EdFacts**: District assessments and graduation rates
- **SAIPE**: Poverty estimates for school-age children (1995-2023)
### Colleges and Universities
- **IPEDS**: Comprehensive postsecondary data - admissions, enrollment, completions, finance, student aid (1980-2023)
- **College Scorecard**: Student outcomes, earnings, loan repayment (1996-2020)
- **FSA**: Federal student aid data
- **Other**: Campus crime, athletics, endowments
## API Structure
The Education Data Portal API is organized hierarchically:
```
https://educationdata.urban.org/api/v1/{level}/{source}/{topic}/{subtopic}/{year}/
```
For example:
```
https://educationdata.urban.org/api/v1/schools/ccd/enrollment/race/2020/
```
This package handles URL construction, pagination, and data formatting automatically.
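As a rough sketch of the URL construction step, the helper below assembles a URL from the template shown above. `build_url` is a hypothetical function, not the package's API. It simply follows the segment order of the template, and it assumes that non-path filters travel as query parameters with list values joined by commas; the live API may order segments or encode filters differently.

```python
from urllib.parse import urlencode

BASE_URL = "https://educationdata.urban.org/api/v1"


def build_url(level, source, topic, year, subtopic=None, filters=None):
    """Assemble an endpoint URL following the README's template."""
    segments = [level, source, topic]
    if subtopic:
        segments.append(subtopic)
    segments.append(str(year))
    url = f"{BASE_URL}/" + "/".join(segments) + "/"
    if filters:
        # Assumption: list-valued filters are sent as comma-separated values.
        query = {
            key: ",".join(str(v) for v in value)
            if isinstance(value, (list, tuple))
            else str(value)
            for key, value in filters.items()
        }
        url += "?" + urlencode(query, safe=",")
    return url


plain = build_url("schools", "ccd", "enrollment", 2020, subtopic="race")
filtered = build_url(
    "schools", "ccd", "enrollment", 2020,
    subtopic="race", filters={"fips": 13, "grade": [9, 10]},
)
print(plain)
print(filtered)
```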
## Data Attribution
By using this package, you agree to the Urban Institute's Data Policy and Terms of Use. The data is provided under the Open Data Commons Attribution License (ODC-By) v1.0.
**When using the data in publications, please provide attribution:**
```
[Dataset names], Education Data Portal (Version 0.23.0), Urban Institute,
accessed [Month DD, YYYY], https://educationdata.urban.org/documentation/,
made available under the ODC Attribution License.
```
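If you cite the data often, a small helper can fill in the template above. This is only a convenience sketch: `format_attribution` is a hypothetical function that is not provided by the package, and the default version string is copied from the template, so update it to match the portal version you actually used.

```python
from datetime import date


def format_attribution(datasets, accessed, version="0.23.0"):
    """Fill the attribution template with dataset names and an access date."""
    names = ", ".join(datasets)
    return (
        f"{names}, Education Data Portal (Version {version}), Urban Institute, "
        f"accessed {accessed.strftime('%B %d, %Y')}, "
        "https://educationdata.urban.org/documentation/, "
        "made available under the ODC Attribution License."
    )


citation = format_attribution(["Common Core of Data"], date(2025, 3, 4))
print(citation)
```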
## Comparison to R Package
This package aims for feature parity with the Urban Institute's R `educationdata` package:
| Feature | R Package | Python Package |
|---------|-----------|----------------|
| Main function | `get_education_data()` | `get_education_data()` |
| Summary function | `get_education_data_summary()` | `get_education_data_summary()` |
| Automatic pagination | ✓ | ✓ |
| Label mapping | ✓ | ✓ |
| CSV downloads | ✓ | ✓ |
| Type safety | R types | Python type hints + pydantic |
| Async support | N/A | Not yet (sync only) |
## Technical Details
### Implementation
- **HTTP Client**: Uses `httpx` for reliable HTTP communication
- **Data Handling**: Returns `pandas.DataFrame` objects
- **Validation**: Uses `pydantic` v2 for parameter validation
- **Sync Only**: The implementation is currently synchronous (async support may be added in the future)
### Requirements
- Python 3.9+
- httpx >= 0.27.0
- pandas >= 2.0.0
- pydantic >= 2.0.0
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Development
```bash
# Clone the repository
git clone https://github.com/shaneorr/pyeducationdata.git
cd pyeducationdata
# Install with development dependencies
uv pip install -e ".[dev]"
# Run tests
pytest
# Run linting
ruff check .
# Format code
ruff format .
```
## License
This package is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
The data accessed through this package is provided by the Urban Institute under the Open Data Commons Attribution License (ODC-By) v1.0.
## Links
- **Education Data Portal**: https://educationdata.urban.org/
- **API Documentation**: https://educationdata.urban.org/documentation/
- **R Package**: https://github.com/UrbanInstitute/education-data-package-r
- **Urban Institute**: https://www.urban.org/
## Support
For questions about the package, please open an issue on GitHub.
For questions about the data or API, contact the Urban Institute at educationdata@urban.org.
Raw data
{
"_id": null,
"home_page": null,
"name": "pyeducationdata",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "api, data, education, research, urban-institute",
"author": null,
"author_email": "Shane Orr <shane.j.orr@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/3d/3b/3dddf1b2827cab4863d53678265d513c545e7e4131d93813bcbe0f1b601c/pyeducationdata-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Python package for accessing the Urban Institute's Education Data Portal API",
"version": "0.1.0",
"project_urls": {
"Bug Tracker": "https://github.com/shaneorr/pyeducationdata/issues",
"Documentation": "https://github.com/shaneorr/pyeducationdata#readme",
"Homepage": "https://github.com/shaneorr/pyeducationdata",
"Repository": "https://github.com/shaneorr/pyeducationdata"
},
"split_keywords": [
"api",
" data",
" education",
" research",
" urban-institute"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "2b5132e32642a1f08de07b715fc181be7efe63e16a195c3098e322cb3c17a236",
"md5": "377792af01a482108c9ddde8e5de98ad",
"sha256": "567eedcdda4797b819c00b2e2b127b41aa83e41e9a614f0367e6ccb55f129d71"
},
"downloads": -1,
"filename": "pyeducationdata-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "377792af01a482108c9ddde8e5de98ad",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 28362,
"upload_time": "2025-10-29T21:28:07",
"upload_time_iso_8601": "2025-10-29T21:28:07.266381Z",
"url": "https://files.pythonhosted.org/packages/2b/51/32e32642a1f08de07b715fc181be7efe63e16a195c3098e322cb3c17a236/pyeducationdata-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "3d3b3dddf1b2827cab4863d53678265d513c545e7e4131d93813bcbe0f1b601c",
"md5": "0cbcedccf1cea8ea56f172989036c8a0",
"sha256": "16256b3e4641828af93d668a55f8f91f38290834c29468c1a5a075fb4813e5f1"
},
"downloads": -1,
"filename": "pyeducationdata-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "0cbcedccf1cea8ea56f172989036c8a0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 139091,
"upload_time": "2025-10-29T21:28:08",
"upload_time_iso_8601": "2025-10-29T21:28:08.489979Z",
"url": "https://files.pythonhosted.org/packages/3d/3b/3dddf1b2827cab4863d53678265d513c545e7e4131d93813bcbe0f1b601c/pyeducationdata-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-29 21:28:08",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "shaneorr",
"github_project": "pyeducationdata",
"github_not_found": true,
"lcname": "pyeducationdata"
}