redcap-eda


- Name: redcap-eda
- Version: 0.2.1
- Home page: https://github.com/yourusername/redcap-eda
- Summary: Perform exploratory data analysis on REDCap data
- Upload time: 2025-02-28 21:26:45
- Author: Robert Portelli
- Requires Python: >=3.13
- License: MIT
- Keywords: redcap, eda, exploratory data analysis, data visualization
# REDCap-EDA

![CI Status](https://github.com/robertp/REDCap-EDA/actions/workflows/ci.yaml/badge.svg)

## 📌 Overview
REDCap-EDA is a command-line tool for performing **Exploratory Data Analysis (EDA)** on **REDCap datasets**. It automates data inspection, schema enforcement, statistical analysis, visualization, and report generation.

## 🚀 Features
- ✅ **Automatic Data Type Enforcement** (casts columns based on a predefined or user-defined schema)
- 📊 **Summary Statistics** (mean, median, std dev, outliers, categorical distributions)
- 📉 **Visualizations** (histograms, box plots, categorical distributions, time trends, word clouds)
- 📂 **Comprehensive PDF Report Generation** with **UnifiedReport**
- 🔄 **Multiprocessing for Faster Execution**
- 🔍 **Progress Bars with `tqdm`**
- 📂 **Exports Reports** (JSON, PDF, and saved visualizations)
- 📝 **Interactive Schema Creation** for custom datasets
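The README does not say how summary statistics or outliers are computed. As a rough illustration only, the sketch below uses Python's standard `statistics` module and assumes Tukey's 1.5 × IQR rule for outliers. The helpers `summarize` and `category_counts` are hypothetical and are not part of the redcap-eda API.

```python
from __future__ import annotations

import statistics
from collections import Counter


def summarize(values: list[float | None]) -> dict[str, object]:
    """Summary statistics for one numeric column (hypothetical helper)."""
    clean = [v for v in values if v is not None]  # skip missing cells
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(clean, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # assumed Tukey fences
    return {
        "count": len(clean),
        "mean": statistics.mean(clean),
        "median": statistics.median(clean),
        "std_dev": statistics.stdev(clean),  # sample standard deviation
        "outliers": [v for v in clean if v < low or v > high],
    }


def category_counts(values: list[str | None]) -> list[tuple[str, int]]:
    """Frequency table for one categorical column, most common first."""
    return Counter(v for v in values if v is not None).most_common()


print(summarize([10, 12, 11, 13, 12, 11, 95, None])["outliers"])  # [95]
print(category_counts(["yes", "no", "yes", None]))  # [('yes', 2), ('no', 1)]
```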

## 📦 Installation
```bash
pip install redcap-eda
```

## 🛠️ Usage

### 🔹 Running EDA on the Sample Dataset with Interactive Schema Creation
```bash
redcap-eda analyze --sample
```

### 🔹 Running EDA on the Sample Dataset with a Predefined Schema
```bash
redcap-eda analyze --sample --sample-schema
```

### 🔹 Running EDA on a Custom Dataset with Interactive Schema Creation
```bash
redcap-eda analyze --csv path/to/your_data.csv
```
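Interactive schema creation is only named in this README, not described. A minimal sketch of how such a step could work, assuming each column's type is first guessed from its raw CSV values and then confirmed or overridden by the user: the functions `infer_type`, `propose_schema`, and `confirm_schema`, the type names (`int`, `float`, `datetime`, `category`, `text`), and the distinct-value heuristic for categories are all hypothetical, not redcap-eda's actual logic.

```python
from __future__ import annotations

import csv
import io
from datetime import date
from typing import Callable


def _all_parse(values: list[str], parse: Callable[[str], object]) -> bool:
    """True if every value can be parsed by `parse`."""
    try:
        for v in values:
            parse(v)
    except ValueError:
        return False
    return True


def infer_type(values: list[str]) -> str:
    """Guess a (hypothetical) type name for one column from its raw strings."""
    filled = [v.strip() for v in values if v.strip()]  # ignore blank cells
    if not filled:
        return "text"
    if _all_parse(filled, int):
        return "int"
    if _all_parse(filled, float):
        return "float"
    if _all_parse(filled, date.fromisoformat):
        return "datetime"
    # Few distinct values relative to the row count suggests a categorical field
    if len(set(filled)) <= max(2, len(filled) // 2):
        return "category"
    return "text"


def propose_schema(csv_text: str) -> dict[str, str]:
    """Map each CSV column to a guessed type name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns: dict[str, list[str]] = {name: [] for name in reader.fieldnames or []}
    for row in reader:
        for name in columns:
            columns[name].append(row.get(name) or "")
    return {name: infer_type(vals) for name, vals in columns.items()}


def confirm_schema(
    proposed: dict[str, str], ask: Callable[[str], str] = input
) -> dict[str, str]:
    """Let the user accept a guess (press Enter) or type a replacement."""
    return {
        col: ask(f"Type for '{col}' [{guess}]: ").strip() or guess
        for col, guess in proposed.items()
    }
```

Passing `ask` as a parameter keeps the prompt loop testable: a test can supply a lambda instead of reading from the terminal.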

### 🔹 Running EDA on a Custom Dataset with a Predefined Schema
```bash
redcap-eda analyze --csv path/to/your_data.csv --schema path/to/schema.json
```
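The README does not document the schema file format. Assuming, purely for illustration, a flat JSON object that maps column names to type names (for example `{"age": "int", "visit_date": "datetime"}`), loading and applying such a schema could look like the sketch below. `CASTERS`, `load_schema`, and `cast_row` are hypothetical names, and turning blank or unparseable cells into `None` is an assumed policy; the real `schema_sample_dataset.json` may be structured differently.

```python
from __future__ import annotations

import json
from datetime import date
from typing import Any, Callable

# Hypothetical type names mapped to converters; not the package's real schema format
CASTERS: dict[str, Callable[[str], Any]] = {
    "int": int,
    "float": float,
    "datetime": date.fromisoformat,
    "category": str,
    "text": str,
}


def load_schema(path: str) -> dict[str, str]:
    """Read an assumed {column: type_name} mapping from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def cast_row(row: dict[str, str], schema: dict[str, str]) -> dict[str, Any]:
    """Cast one CSV row; blank or unparseable cells become None."""
    out: dict[str, Any] = {}
    for column, raw in row.items():
        # Columns missing from the schema are left as plain strings
        caster = CASTERS.get(schema.get(column, "text"), str)
        value = (raw or "").strip()
        if not value:
            out[column] = None
            continue
        try:
            out[column] = caster(value)
        except ValueError:
            out[column] = None  # assumed policy: treat failed casts as missing
    return out
```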

### 🔹 Running in Debug Mode
```bash
redcap-eda --debug analyze --sample
```

### 🔹 Listing Available Test Cases
```bash
redcap-eda list-cases
```
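All of the commands above share one entry point, with a global `--debug` option placed before the `analyze` or `list-cases` subcommand. The `argparse` sketch below mirrors only the flags shown in this README. Whether redcap-eda itself uses `argparse` (rather than another CLI library), and whether `--sample` and `--csv` are truly mutually exclusive, are assumptions; `build_parser` and `parse_cli` are hypothetical names.

```python
from __future__ import annotations

import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="redcap-eda")
    # Global option: it precedes the subcommand, as in `redcap-eda --debug analyze`
    parser.add_argument("--debug", action="store_true", help="enable debug logging")
    sub = parser.add_subparsers(dest="command", required=True)

    analyze = sub.add_parser("analyze", help="run EDA on a dataset")
    # Assumed: exactly one data source per run
    source = analyze.add_mutually_exclusive_group(required=True)
    source.add_argument("--sample", action="store_true", help="use the bundled sample dataset")
    source.add_argument("--csv", metavar="PATH", help="path to a CSV export")
    analyze.add_argument(
        "--sample-schema",
        action="store_true",
        help="use the predefined schema for the sample dataset",
    )
    analyze.add_argument("--schema", metavar="PATH", help="path to a schema JSON file")

    sub.add_parser("list-cases", help="list available test cases")
    return parser


def parse_cli(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse arguments and set the root log level from --debug."""
    args = build_parser().parse_args(argv)
    logging.basicConfig(level=logging.DEBUG if args.debug else logging.INFO)
    return args
```

In this sketch, `redcap-eda analyze --debug --sample` would be rejected, because `--debug` is registered on the top-level parser rather than on the `analyze` subparser.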

## 📂 Project Structure
```text
.
├── Makefile                # Helper commands
├── README.md               # Project documentation
├── dist                    # Distribution files for PyPI
├── mypy.ini                # Type checking configuration
├── poetry.lock             # Poetry dependency lock file
├── pyproject.toml          # Poetry project configuration
├── schemas                 # Saved schema files
│   └── schema_sample_dataset.json
├── src
│   ├── logs
│   │   └── redcap_eda.log  # Log files
│   └── redcap_eda
│       ├── analysis        # EDA analysis modules
│       │   ├── categorical
│       │   │   └── mixins.py # Categorical data analysis
│       │   ├── datetime
│       │   │   └── mixins.py # Datetime data analysis
│       │   ├── eda.py      # Main EDA module
│       │   ├── json_report_handler.py # JSON export utility
│       │   ├── lib.py       # Shared data structures (e.g., AnalysisResult)
│       │   ├── missing
│       │   │   └── mixins.py # Missing data analysis
│       │   ├── numerical
│       │   │   └── mixins.py # Numerical data analysis
│       │   └── text
│       │       └── mixins.py # Text data analysis
│       ├── cast_schema.py  # Schema enforcement
│       ├── cli.py          # Command-line interface
│       ├── load_case_data.py # Dataset loader
│       ├── logger.py       # Logging utilities
│       └── unified_report.py # PDF report generation
└── tests                   # Unit tests
    ├── __init__.py
    └── fixtures
        └── toy_data.csv    # Sample test data
```
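The tree suggests one `mixins.py` module per data type (categorical, datetime, missing, numerical, text), a main `eda.py`, and shared structures such as `AnalysisResult` in `lib.py`. One plausible, simplified reading of that layout is an `EDA` class that inherits analysis methods from per-type mixins and collects `AnalysisResult` objects for JSON export. Apart from the name `AnalysisResult`, every class, method, and field below is hypothetical, and the datetime and text mixins are omitted for brevity.

```python
from __future__ import annotations

import json
import statistics
from collections import Counter
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class AnalysisResult:
    """Hypothetical stand-in for the result type kept in analysis/lib.py."""
    column: str
    kind: str
    stats: dict[str, Any] = field(default_factory=dict)


class _HasData:
    # Each mixin expects the host class to provide `data`: column -> values
    data: dict[str, list[Any]]


class MissingMixin(_HasData):
    def analyze_missing(self, column: str) -> AnalysisResult:
        values = self.data[column]
        missing = sum(v is None for v in values)
        pct = round(100 * missing / len(values), 1) if values else 0.0
        return AnalysisResult(column, "missing", {"missing": missing, "pct": pct})


class NumericalMixin(_HasData):
    def analyze_numerical(self, column: str) -> AnalysisResult:
        values = [v for v in self.data[column] if v is not None]
        stats = {"mean": statistics.mean(values), "median": statistics.median(values)}
        return AnalysisResult(column, "numerical", stats)


class CategoricalMixin(_HasData):
    def analyze_categorical(self, column: str) -> AnalysisResult:
        counts = Counter(v for v in self.data[column] if v is not None)
        return AnalysisResult(column, "categorical", dict(counts))


class EDA(MissingMixin, NumericalMixin, CategoricalMixin):
    """Composes the per-type mixins and dispatches on the schema's type names."""

    def __init__(self, data: dict[str, list[Any]], schema: dict[str, str]) -> None:
        self.data = data
        self.schema = schema

    def run(self) -> list[AnalysisResult]:
        results: list[AnalysisResult] = []
        for column, kind in self.schema.items():
            results.append(self.analyze_missing(column))  # every column gets this
            if kind in ("int", "float"):
                results.append(self.analyze_numerical(column))
            elif kind == "category":
                results.append(self.analyze_categorical(column))
        return results


def to_json(results: list[AnalysisResult]) -> str:
    """Serialize results to JSON, loosely in the role of json_report_handler.py."""
    return json.dumps([asdict(r) for r in results], indent=2)
```

A benefit of this pattern is that each data type's logic stays in its own module, and supporting a new type means adding one mixin to the `EDA` base list plus a dispatch branch in `run`.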

## 📝 Contributing
1. **Fork the repository** and create a feature branch.
2. **Run tests** to ensure code integrity:
   ```bash
   poetry run pytest tests/
   ```
3. **Submit a pull request** with a detailed description.

## 📜 License
This project is licensed under the **MIT License**.

## 🤝 Acknowledgments
- [REDCap](https://projectredcap.org/) for enabling structured data collection.
- The **Open Source Community** for inspiration & contributions!

            
