bicam 0.1.3

- Summary: BICAM Dataset Downloader - Access comprehensive congressional data
- Upload time: 2025-07-14 13:03:14
- Requires Python: >=3.8
- License: MIT
- Keywords: congress, legislation, political-science, dataset, government-data

# BICAM - Comprehensive Congressional Data Downloader

[![PyPI version](https://badge.fury.io/py/bicam.svg)](https://badge.fury.io/py/bicam)
[![Python versions](https://img.shields.io/pypi/pyversions/bicam.svg)](https://pypi.org/project/bicam/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

The BICAM package provides easy programmatic access to the Bulk Ingestion of Congressional Actions & Materials (BICAM) dataset, a comprehensive collection of congressional data (bills, amendments, committee reports, hearings, and more) sourced from the official [congress.gov](https://congress.gov) and [GovInfo](https://govinfo.gov) APIs.

## Features

- 📦 **11 Dataset Types**: Access bills, amendments, members, committees, hearings, and more
- 🚀 **Fast Downloads**: Optimized S3 downloads with progress tracking
- 💾 **Smart Caching**: Automatic local caching to avoid re-downloads
- 🔧 **Simple API**: Both Python API and command-line interface
- ✅ **Data Integrity**: Automatic checksum verification
- 📊 **Large Scale**: Efficiently handles datasets from 100MB to 12GB+

## Installation

### From PyPI (Recommended)

```bash
# Using uv (faster, recommended)
uv pip install bicam

# Using pip (alternative)
pip install bicam
```

### From Source

```bash
# Clone and install in development mode
git clone https://github.com/bicam-data/bicam
cd bicam
uv pip install -e .
```

## Quick Start

### Python API

```python
import bicam

# Download a dataset
bills_path = bicam.download_dataset('bills')
print(f"Bills data available at: {bills_path}")

# Load data directly into a DataFrame (downloads if needed, auto-confirms for large datasets)
bills_df = bicam.load_dataframe('bills', 'bills.csv', download=True)
print(f"Loaded {len(bills_df)} bills")

# Load members data (will raise error if not cached)
try:
    members_df = bicam.load_dataframe('members', 'members.csv')
except ValueError as e:
    print(f"Dataset not cached: {e}")
    # Download it first
    members_df = bicam.load_dataframe('members', 'members.csv', download=True)

# Load first available CSV file from a dataset
df = bicam.load_dataframe('bills', download=True)

# Use different DataFrame engines
bills_df = bicam.load_dataframe('bills', 'bills.csv', df_engine='polars')  # Faster for large datasets
bills_df = bicam.load_dataframe('bills', 'bills.csv', df_engine='dask')    # Out-of-memory processing
bills_df = bicam.load_dataframe('bills', 'bills.csv', df_engine='spark')   # Distributed processing
bills_df = bicam.load_dataframe('bills', 'bills.csv', df_engine='duckdb')  # SQL-like queries

# List available datasets
datasets = bicam.list_datasets()
print(f"Available datasets: {datasets}")

# Get dataset information
info = bicam.get_dataset_info('bills')
print(f"Size: {info['size_mb']} MB")

# Advanced options
bills_path = bicam.download_dataset('bills', force_download=True)  # Force re-download
bills_path = bicam.download_dataset('bills', cache_dir='/custom/path')  # Custom cache directory
bills_path = bicam.download_dataset('complete', confirm=True)  # Skip confirmation for large datasets
bills_path = bicam.download_dataset('bills', quiet=True)  # Suppress logging
```

### Command Line Interface

```bash
# List all available datasets
bicam list-datasets

# List with detailed information
bicam list-datasets --detailed

# Download a specific dataset
bicam download bills

# Download with options
bicam download bills --force          # Force re-download
bicam download bills --cache-dir /path/to/cache  # Custom cache directory
bicam download complete --confirm     # Skip confirmation for large datasets
bicam download bills --quiet          # Suppress output

# Get detailed information about a dataset
bicam info bills

# Show cache usage
bicam cache

# Clear cached data
bicam clear bills        # Clear specific dataset
bicam clear --all       # Clear all cached data
```

## Available Datasets

**NOTE:** Make sure you have enough free disk space to unzip these datasets: they are distributed as .zip archives and are automatically extracted into the cache directory. Larger datasets, such as amendments, may require up to roughly 30 GB of space.

| Dataset | Size | Description |
|---------|------|-------------|
| **bills** | ~1.8GB | Complete bills data including text, summaries, and related records |
| **amendments** | ~6.6GB | All amendments with amended items |
| **members** | ~1MB | Historical and current member information |
| **nominations** | ~21MB | Presidential nominations data |
| **committees** | ~17MB | Committee information, including history of committee names |
| **committeereports** | ~570MB | Committee reports, with full text and related information |
| **committeemeetings** | ~5MB | Committee meeting records |
| **committeeprints** | ~91MB | Committee prints, including full text and topics |
| **hearings** | ~1.7GB | Hearing information, such as addresses and transcripts |
| **treaties** | <1MB | Treaty documents with actions, titles, and more |
| **congresses** | ~1MB | Congressional session metadata, like directories and session dates |
| **complete** | ~12GB | Complete BICAM dataset with all data types |
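
Before committing disk space, you can also read these sizes programmatically. A minimal sketch using the `list_datasets()` and `get_dataset_info()` calls shown above; it assumes `list_datasets()` returns dataset names and that `size_mb` is the only key needed here:

```python
import bicam

# Print the approximate download size of every available dataset.
for name in bicam.list_datasets():
    info = bicam.get_dataset_info(name)
    print(f"{name}: ~{info['size_mb']} MB")
```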

## Working with Data

### Basic Analysis

```python
import bicam
import pandas as pd

# Load bills data directly into DataFrame
bills_df = bicam.load_dataframe('bills', 'bills.csv', download=True)

# Basic analysis
print(f"Total bills: {len(bills_df)}")
print(f"Congress range: {bills_df['congress'].min()} - {bills_df['congress'].max()}")

# Filter recent bills
recent_bills = bills_df[bills_df['congress'] >= 115]
print(f"Recent bills: {len(recent_bills)}")
```
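
Building on the DataFrame above, a short follow-on sketch that counts bills per congress; it assumes `bills_df` is the pandas DataFrame loaded in the previous block and that the `congress` column is numeric, as the filter above implies:

```python
# Count how many bills appear in each congress, most recent last.
bills_per_congress = bills_df.groupby('congress').size().sort_index()
print(bills_per_congress.tail())
```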

### Working with Multiple Datasets

```python
import bicam

# Load multiple datasets as DataFrames
bills_sponsors_df = bicam.load_dataframe('bills', 'bills_sponsors.csv', download=True)
members_df = bicam.load_dataframe('members', 'members.csv', download=True)

# Join data (example)
# bills_with_sponsors_detailed = bills_sponsors_df.merge(members_df, on='bioguide_id')
```
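
A worked version of the commented join, as a minimal sketch: it assumes both tables share a `bioguide_id` column, which the example above implies but which you should verify against the actual files.

```python
# Attach member details to each sponsorship record, then count bills per sponsor.
bills_with_sponsors_detailed = bills_sponsors_df.merge(
    members_df, on='bioguide_id', how='left'
)
bills_per_member = (
    bills_with_sponsors_detailed
    .groupby('bioguide_id')
    .size()
    .sort_values(ascending=False)
)
print(bills_per_member.head(10))
```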

## Configuration

### Environment Variables

```bash
# Set custom cache directory (default: ~/.bicam)
export BICAM_DATA=/path/to/cache

# Control logging
export BICAM_LOG_LEVEL=DEBUG

# Disable version check
export BICAM_CHECK_VERSION=false
```
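
The same settings can be applied from Python by configuring the environment before importing `bicam`. This is a sketch only: it assumes the package reads these variables at import or first use, which matches the variables documented above but is not guaranteed here.

```python
import os

# Configure BICAM before importing it, using the variables documented above.
os.environ["BICAM_DATA"] = "/path/to/cache"      # custom cache directory
os.environ["BICAM_LOG_LEVEL"] = "INFO"           # logging verbosity
os.environ["BICAM_CHECK_VERSION"] = "false"      # skip the version check

import bicam  # imported after the environment is configured

print(bicam.get_cache_size()["total"])
```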

### Python Configuration

```python
import bicam

# Get current cache size
cache_info = bicam.get_cache_size()
print(f"Total cache size: {cache_info['total']}")

# Clear specific dataset cache
bicam.clear_cache('bills')

# Clear all cached data
bicam.clear_cache()
```

## Best Practices

### Dataset Selection

- Start with smaller datasets like `congresses` or `members`
- Use `bills` for legislative analysis
- Download `complete` only if you need all data

### Performance Tips

- Use `--quiet` for automated scripts
- Use `--confirm` to skip prompts in batch operations
- Monitor disk space before downloading large datasets
- Use `df_engine='polars'` for faster loading of large datasets (see the sketch below)
- Use `df_engine='dask'` for out-of-memory processing
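
For example, a minimal polars sketch; it assumes `load_dataframe` returns a `polars.DataFrame` when `df_engine='polars'`, that polars is installed separately, and that the `congress` column is numeric as in the pandas example above:

```python
import bicam
import polars as pl  # optional engine; install it separately if needed

# Load the bills table with the polars engine for faster scans of large files.
bills_df = bicam.load_dataframe('bills', 'bills.csv', df_engine='polars', download=True)

# Filter with polars expressions instead of pandas boolean indexing.
recent = bills_df.filter(pl.col('congress') >= 115)
print(f"Recent bills: {recent.height}")
```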

### Data Management

- Use `bicam cache` to monitor storage usage
- Clear unused datasets with `bicam clear`
- Consider using custom cache directories for different projects (see the sketch below)
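
A per-project cache layout might look like the following sketch; the `cache_dir` argument is documented for `download_dataset` above, while the directory path and layout are just a convention:

```python
import bicam

# Keep this project's congressional data separate from the default ~/.bicam cache.
project_cache = "/path/to/my-project/.bicam"
bills_path = bicam.download_dataset('bills', cache_dir=project_cache)
print(f"Bills for this project live at: {bills_path}")
```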

### Error Handling

```python
import bicam

try:
    bills_df = bicam.load_dataframe('bills', download=True)
except Exception as e:
    print(f"Download failed: {e}")
    # Handle error appropriately
```

## Contributing

We may open the project to external contributions in the future. For now, please visit <https://bicam.net/feedback> with suggestions, concerns, or reports of data inaccuracies.

## Citation

If you use BICAM in your research, please cite:

{FUTURE CITATION GOES HERE}

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Support

- 📧 Email: <bicam.data@gmail.com>
- 🐛 Issues: [GitHub Issues](https://github.com/bicam-data/bicam/issues)
- 📖 Documentation: [Read the Docs](https://bicam.readthedocs.io)
- 💬 Feedback: [BICAM.net/feedback](https://bicam.net/feedback)

## Acknowledgments

- Congressional data provided by <https://api.congress.gov> and <https://api.govinfo.gov>
- Built with support from MIT and the [LobbyView](https://lobbyview.org) team.

---

            
