# EOIR FOIA Data Processing Tool

[![PyPI version](https://badge.fury.io/py/eoir.svg)](https://badge.fury.io/py/eoir)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)

A high-performance tool for downloading, processing, and managing U.S. immigration court data from the Department of Justice's public FOIA releases.

## Overview

The Executive Office for Immigration Review (EOIR) publishes anonymized immigration court data as public FOIA releases. This tool automates the full pipeline: downloading, extracting, and cleaning the data, then loading it into a PostgreSQL database for analysis.

## Features

- **Automated Downloads**: Fetch the latest FOIA data releases with progress tracking
- **Smart Extraction**: Automatically extract and organize ZIP files
- **Data Cleaning**: Clean and validate CSV files with parallel processing support
- **Database Management**: Load data into PostgreSQL with versioned table names
- **Pipeline Automation**: One-command execution of the entire workflow
- **Docker Support**: Fully containerized with Docker Compose
- **Progress Tracking**: Real-time progress bars and status updates
- **Incremental Updates**: Only download new data when available

## Requirements

- Python 3.10+
- PostgreSQL database
- Docker and Docker Compose (optional, for containerized deployment)

## Installation

### Install from PyPI

```bash
pip install eoir
```

### Local Development Installation

1. Clone the repository:
```bash
git clone https://github.com/marrowb/eoir.git
cd eoir
```

2. Install the package in development mode:
```bash
pip install -e .
```

3. Copy the environment template and configure:
```bash
cp .env.example .env
# Edit .env with your PostgreSQL credentials
```

### Docker Installation

1. Clone the repository:
```bash
git clone https://github.com/marrowb/eoir.git
cd eoir
```

2. Copy the environment template:
```bash
cp .env.example .env
# Edit .env if needed (defaults work for Docker)
```

3. Start the services:
```bash
docker-compose up -d
```

4. Access the application:
```bash
docker-compose exec app bash
# Or use the run script:
./run shell
```

## Quick Start

### Using Docker (Recommended)

```bash
# Run the complete pipeline
./run eoir run-pipeline

# Or run individual commands
./run eoir download status      # Check for new data
./run eoir download fetch       # Download latest data
./run eoir db init             # Initialize database
./run eoir clean               # Clean CSV files
```

### Local Development

```bash
# Run the complete pipeline
eoir run-pipeline

# Or run individual commands
eoir download status           # Check for new data
eoir download fetch           # Download latest data
eoir db init                  # Initialize database
eoir clean                    # Clean CSV files
```

## CLI Commands

### `eoir download`

Manage FOIA data downloads from the DOJ.

```bash
# Check if new data is available
eoir download status

# Download the latest FOIA release
eoir download fetch

# Download without extracting
eoir download fetch --no-unzip
```
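
How `eoir download status` decides that new data is available is not described here. One common approach, shown only as a sketch, compares the server's `Last-Modified` header with the value recorded at the previous download. The function names below are hypothetical, and it is an assumption that the file server sends this header at all.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen

# URL taken from the Data Flow section of this README.
FOIA_URL = "https://fileshare.eoir.justice.gov/FOIA-TRAC-Report.zip"


def remote_last_modified(url=FOIA_URL, timeout=30):
    """Return the Last-Modified header of the remote file, or None if absent."""
    request = Request(url, method="HEAD")  # HEAD avoids downloading the ZIP body
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Last-Modified")


def is_newer(remote_header, last_seen_header):
    """Hypothetical check: True when the remote file looks newer than the last one seen."""
    if remote_header is None or last_seen_header is None:
        # Without both timestamps we cannot compare, so treat the data as new.
        return True
    return parsedate_to_datetime(remote_header) > parsedate_to_datetime(last_seen_header)
```

With this sketch, a caller would store the header string after each successful fetch and pass it back in as `last_seen_header` on the next check.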

### `eoir db`

Database management commands.

```bash
# Initialize database and create tables
eoir db init

# Create a database dump
eoir db dump

# Dump with custom output directory
eoir db dump -o /path/to/dumps
```

### `eoir clean`

Clean and process CSV files.

```bash
# Clean all CSV files in the latest download
eoir clean

# Clean with custom worker count
eoir clean --workers 16

# Clean specific input directory
eoir clean --input-dir /path/to/csvs
```

### `eoir run-pipeline`

Execute the complete data pipeline.

```bash
# Run full pipeline with defaults
eoir run-pipeline

# Run with custom settings
eoir run-pipeline --workers 16 --output-dir custom_dumps

# Skip download if data exists
eoir run-pipeline --skip-download
```

### `eoir config`

View configuration settings.

```bash
# Show current configuration
eoir config show
```

## Architecture

### Project Structure

```
eoir/
├── src/eoir/
│   ├── cli/               # Command-line interface modules
│   │   ├── download.py    # Download commands
│   │   ├── db.py          # Database commands
│   │   ├── clean.py       # CSV cleaning commands
│   │   └── pipeline.py    # Pipeline orchestration
│   ├── core/              # Core business logic
│   │   ├── download.py    # Download functionality
│   │   ├── db.py          # Database operations
│   │   ├── clean.py       # CSV processing
│   │   └── models.py      # Data models
│   ├── metadata/          # Data definitions
│   │   ├── foia_tables.sql      # Database schema
│   │   └── json/                # Table and column metadata
│   ├── logging/           # Structured logging
│   └── settings.py        # Configuration management
├── docker-compose.yml     # Docker services
├── Dockerfile            # Container definition
└── run                   # Development helper script
```

### Data Flow

1. **Download**: Fetches ZIP file from `https://fileshare.eoir.justice.gov/FOIA-TRAC-Report.zip`
2. **Extract**: Unzips to timestamped directory in `downloads/`
3. **Clean**: Processes CSV files to handle encoding and data issues
4. **Load**: Imports cleaned data into PostgreSQL with versioned table names
5. **Track**: Records download history and file metadata
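
The extract step above can be sketched with the standard library. This is an illustration rather than the tool's actual code: the function name and the timestamp format of the directory name are assumptions.

```python
import zipfile
from datetime import datetime
from pathlib import Path


def extract_to_timestamped_dir(zip_path, downloads_root="downloads", now=None):
    """Unzip an archive into downloads_root/<timestamp>/ and return that directory."""
    now = now or datetime.now()
    # The directory name format is an assumption; the tool may use a different one.
    target = Path(downloads_root) / now.strftime("%Y%m%d_%H%M%S")
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to test against a small sample archive.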

### Database Schema

The tool creates versioned tables based on the download date. For example, a download on June 25th creates tables like:
- `foia_appeal_06_25`
- `foia_case_06_25`
- `foia_schedule_06_25`

See `src/eoir/metadata/foia_tables.sql` for the complete schema.
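
As a small illustration of the naming scheme, the helper below builds a versioned table name from a base name and a date. It assumes the suffix is month and day (`MM_DD`), matching the June 25th example above; the function name is hypothetical and not part of the package's documented API.

```python
from datetime import date


def versioned_table_name(base, download_date):
    """Append an MM_DD suffix (assumed format) to a base table name."""
    return f"{base}_{download_date:%m_%d}"


print(versioned_table_name("foia_case", date(2025, 6, 25)))  # foia_case_06_25
```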

## Data Reference

### Processed Tables

The tool processes 20 CSV files of immigration court records. The table below shows a sample of the file-to-table mapping:

| CSV File | Database Table | Description |
|----------|----------------|-------------|
| `A_TblCase.csv` | `foia_case_XX_XX` | Case information |
| `tblAppeal.csv` | `foia_appeal_XX_XX` | Appeal records |
| `tbl_schedule.csv` | `foia_schedule_XX_XX` | Court schedules |
| `B_TblProceeding.csv` | `foia_proceeding_XX_XX` | Proceeding details |
| `tbl_EOIR_Attorney.csv` | `foia_atty_XX_XX` | Attorney information |

See `src/eoir/metadata/json/tables.json` for the complete mapping.

### Data Format

- **Encoding**: Latin-1
- **Delimiter**: Tab (`\t`)
- **Escape Character**: Backslash (`\`)
- **Dialect**: `excel-tab` (the tab-delimited dialect in Python's `csv` module)
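
For anyone reading the raw files outside the tool, the settings above map onto Python's `csv` module roughly as follows. This is a minimal sketch, not the package's own reader: the function name is hypothetical, and the quoting behaviour inherited from the `excel-tab` dialect is an assumption that may need adjusting for the real files.

```python
import csv


def read_foia_rows(path):
    """Yield rows from a tab-delimited, Latin-1 encoded FOIA CSV file."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, encoding="latin-1", newline="") as handle:
        reader = csv.reader(handle, dialect="excel-tab", escapechar="\\")
        yield from reader
```

Because the function is a generator, large files are read one row at a time instead of being loaded into memory.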

## Development

### Using the Run Script

The `run` script provides convenient commands for development:

```bash
./run eoir --help          # Run EOIR CLI
./run shell                # Start interactive shell
./run manage               # Database management
./run psql                 # PostgreSQL console
./run pip install package  # Install Python packages
./run yarn                 # Manage frontend (if applicable)
```

### Environment Variables

Configure the following in your `.env` file:

```env
# PostgreSQL Configuration
POSTGRES_USER=eoir
POSTGRES_PASSWORD=changeme
POSTGRES_DB=eoir
POSTGRES_HOST=postgres      # 'postgres' for Docker, 'localhost' for local
POSTGRES_PORT=5434         # External port (internal always 5432)

# Logging
LOG_LEVEL=INFO             # DEBUG, INFO, WARNING, ERROR
```
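
As an illustration of how these variables fit together, the sketch below assembles a PostgreSQL connection URI from them. The function name and the fallback defaults are assumptions, and the package's own settings module may build its connection differently.

```python
import os
from urllib.parse import quote


def postgres_dsn(env=None):
    """Build a postgresql:// URI from the POSTGRES_* variables (hypothetical helper)."""
    env = os.environ if env is None else env
    # Percent-encode credentials so characters like '@' or ':' do not break the URI.
    user = quote(env.get("POSTGRES_USER", "eoir"), safe="")
    password = quote(env.get("POSTGRES_PASSWORD", ""), safe="")
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")
    database = env.get("POSTGRES_DB", "eoir")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

When connecting from the host machine, use the external port from `.env` (5434 in the example). From inside the Docker network the internal port 5432 applies, as the comment in the `.env` example notes.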

### Docker Development

```bash
# Build and start services
docker-compose up -d --build

# View logs
docker-compose logs -f app

# Stop services
docker-compose down

# Remove all data (including database)
docker-compose down -v
```

## Troubleshooting

### Common Issues

1. **Port conflicts**: If port 5434 is in use, change `POSTGRES_PORT` in `.env`
2. **Permission errors**: Ensure `downloads/` and `dumps/` directories are writable
3. **Memory issues**: Reduce worker count with `--workers` flag for large files
4. **Encoding errors**: The source CSV files are Latin-1 encoded; the tool handles this encoding automatically

### Debug Mode

Enable debug logging for troubleshooting:

```bash
export LOG_LEVEL=DEBUG
eoir run-pipeline
```
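
For scripts that sit alongside the CLI, the same `LOG_LEVEL` variable can be mapped to Python's `logging` levels. The helper below is a hypothetical sketch and does not describe how the package configures its own structured logging.

```python
import logging
import os


def resolve_log_level(env=None, default="INFO"):
    """Translate LOG_LEVEL (DEBUG, INFO, WARNING, ERROR) into a logging constant."""
    env = os.environ if env is None else env
    name = env.get("LOG_LEVEL", default).upper()
    level = logging.getLevelName(name)
    # getLevelName returns a string such as "Level BOGUS" for unknown names.
    return level if isinstance(level, int) else logging.INFO


logging.basicConfig(level=resolve_log_level())
```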

## Contributing

Contributions are welcome! Please follow these guidelines:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature-name`)
3. Make your changes
4. Run tests (if available)
5. Submit a pull request

## License

[MIT License Copyright (c) 2025 Backlog Immigration LLC](LICENSE)

## Acknowledgments

This tool processes publicly available FOIA data from the U.S. Department of Justice Executive Office for Immigration Review.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "eoir",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "Backlog Immigration LLC <info@bklg.org>",
    "keywords": "eoir, foia, immigration, data-processing, etl",
    "author": null,
    "author_email": "Backlog Immigration LLC <info@bklg.org>",
    "download_url": "https://files.pythonhosted.org/packages/5e/c9/e925c5aabadb1fc7ee1d12ddb0bff55b5c928826260212394c7c7dbafc19/eoir-0.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "EOIR FOIA data processing tools",
    "version": "0.0.1",
    "project_urls": {
        "Documentation": "https://github.com/marrowb/eoir#readme",
        "Homepage": "https://github.com/marrowb/eoir",
        "Issues": "https://github.com/marrowb/eoir/issues",
        "Repository": "https://github.com/marrowb/eoir"
    },
    "split_keywords": [
        "eoir",
        " foia",
        " immigration",
        " data-processing",
        " etl"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b2acf99832e452cc51fb64205a522a2e4efd9cb11e026e11277873c2d7d4ff8c",
                "md5": "6315180c0967c6d27f6ece0b9e4b2430",
                "sha256": "be964ea77f061edf4ce4e7de4396bda477865c48017892faaa9a2ba2d3104b4a"
            },
            "downloads": -1,
            "filename": "eoir-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6315180c0967c6d27f6ece0b9e4b2430",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 281660,
            "upload_time": "2025-07-08T21:32:10",
            "upload_time_iso_8601": "2025-07-08T21:32:10.490978Z",
            "url": "https://files.pythonhosted.org/packages/b2/ac/f99832e452cc51fb64205a522a2e4efd9cb11e026e11277873c2d7d4ff8c/eoir-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5ec9e925c5aabadb1fc7ee1d12ddb0bff55b5c928826260212394c7c7dbafc19",
                "md5": "e541733a82d62af974d9eb5947d3de3b",
                "sha256": "75d0cf03b278f357f4d3d85d318e097b102cb769b62ec35c6eddffb68d9fd567"
            },
            "downloads": -1,
            "filename": "eoir-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "e541733a82d62af974d9eb5947d3de3b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 253098,
            "upload_time": "2025-07-08T21:32:11",
            "upload_time_iso_8601": "2025-07-08T21:32:11.877375Z",
            "url": "https://files.pythonhosted.org/packages/5e/c9/e925c5aabadb1fc7ee1d12ddb0bff55b5c928826260212394c7c7dbafc19/eoir-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-08 21:32:11",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "marrowb",
    "github_project": "eoir#readme",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "eoir"
}
        