# REDCap-EDA

## 📌 Overview
REDCap-EDA is a command-line tool for performing **Exploratory Data Analysis (EDA)** on **REDCap datasets**. It automates data inspection, schema enforcement, statistical analysis, visualization, and report generation.
## 🚀 Features
- ✅ **Automatic Data Type Enforcement** (casts columns based on a predefined or user-defined schema)
- 📊 **Summary Statistics** (mean, median, std dev, outliers, categorical distributions)
- 📉 **Visualizations** (histograms, box plots, categorical distributions, time trends, word clouds)
- 📂 **Comprehensive PDF Report Generation** with **UnifiedReport**
- 🔄 **Multiprocessing for Faster Execution**
- 🔍 **Progress Bars with `tqdm`**
- 📂 **Report Export** (JSON, PDF, and saved visualizations)
- 📝 **Interactive Schema Creation** for custom datasets
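
The feature list mentions outliers but not the rule used to flag them. As a minimal sketch, assuming the common 1.5 × IQR (Tukey) fence, a per-column numeric summary could be computed with the standard library alone. `summarize_numeric` is a hypothetical helper, not part of the `redcap_eda` API, and `method="inclusive"` is just one of several quartile conventions.

```python
import statistics
from typing import Optional


def summarize_numeric(values: list[Optional[float]]) -> dict[str, object]:
    """Hypothetical helper: basic statistics plus 1.5 * IQR outliers for one column."""
    # Drop missing entries (None) before computing anything.
    present = [v for v in values if v is not None]
    # With n=4, quantiles() returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey fences
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.fmean(present),
        "median": statistics.median(present),
        "std_dev": statistics.stdev(present),  # sample standard deviation
        "outliers": [v for v in present if v < low or v > high],
    }


summary = summarize_numeric([34, 41, 29, None, 38, 45, 36, 120])
print(summary["median"], summary["outliers"])  # → 38 [120]
```

Here Q1 = 35 and Q3 = 43, so the fences are 23 and 55 and only the value 120 is flagged.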
## 📦 Installation
REDCap-EDA requires Python 3.13 or later.
```bash
pip install redcap-eda
```
## 🛠️ Usage
### 🔹 Running EDA on the Sample Dataset with Interactive Schema Creation
```bash
redcap-eda analyze --sample
```
### 🔹 Running EDA on the Sample Dataset with a Predefined Schema
```bash
redcap-eda analyze --sample --sample-schema
```
### 🔹 Running EDA on a Custom Dataset with Interactive Schema Creation
```bash
redcap-eda analyze --csv path/to/your_data.csv
```
### 🔹 Running EDA on a Custom Dataset with a Predefined Schema
```bash
redcap-eda analyze --csv path/to/your_data.csv --schema path/to/schema.json
```
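
This README does not document the schema file format. Purely as an illustration, assuming a flat JSON object that maps column names to hypothetical type labels (`"int"`, `"float"`, `"datetime"`, `"category"`, `"text"`), schema-driven casting of a CSV could look like the sketch below. The real `schemas/schema_sample_dataset.json` and `cast_schema.py` may work quite differently.

```python
import csv
import io
import json
from datetime import datetime
from typing import Any, Callable

# Hypothetical type labels mapped to converters; NOT the documented schema format.
CASTERS: dict[str, Callable[[str], Any]] = {
    "int": int,
    "float": float,
    "datetime": datetime.fromisoformat,
    "category": str,
    "text": str,
}


def cast_rows(csv_text: str, schema: dict[str, str]) -> list[dict[str, Any]]:
    """Convert every CSV cell with the converter its column is mapped to."""
    rows: list[dict[str, Any]] = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row: dict[str, Any] = {}
        for column, value in raw.items():
            if value == "":
                row[column] = None  # treat blank cells as missing
            else:
                # Columns the schema does not mention stay as plain strings.
                row[column] = CASTERS[schema.get(column, "text")](value)
        rows.append(row)
    return rows


schema = json.loads(
    '{"record_id": "int", "age": "float", "visit_date": "datetime", "sex": "category"}'
)
data = "record_id,age,visit_date,sex\n1,34.5,2024-03-01,F\n2,,2024-03-08,M\n"
rows = cast_rows(data, schema)
print(rows[0]["age"], rows[1]["age"], rows[0]["visit_date"].date())  # → 34.5 None 2024-03-01
```

An unknown type label raises `KeyError`, which keeps the sketch fail-fast rather than silently leaving a column uncast.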
### 🔹 Running in Debug Mode
```bash
redcap-eda --debug analyze --sample
```
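
In this example `--debug` comes before the `analyze` subcommand, which indicates it is a top-level option rather than a flag of `analyze`. The README does not say which CLI framework `cli.py` uses; the hypothetical `argparse` parser below only mirrors the commands and flags shown in this section to illustrate that layout, and the logging behavior of `--debug` is an assumption.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI; not redcap-eda's own code."""
    parser = argparse.ArgumentParser(prog="redcap-eda")
    # Top-level option: argparse only accepts it before the subcommand name.
    parser.add_argument("--debug", action="store_true", help="enable debug logging")
    sub = parser.add_subparsers(dest="command", required=True)

    analyze = sub.add_parser("analyze", help="run EDA on a dataset")
    analyze.add_argument("--sample", action="store_true", help="use the sample dataset")
    analyze.add_argument("--sample-schema", action="store_true", help="use the sample schema")
    analyze.add_argument("--csv", metavar="PATH", help="path to a custom CSV file")
    analyze.add_argument("--schema", metavar="PATH", help="path to a JSON schema")

    sub.add_parser("list-cases", help="list available test cases")
    return parser


args = build_parser().parse_args(["--debug", "analyze", "--sample"])
# Assumption: --debug simply lowers the log threshold to DEBUG.
logging.basicConfig(level=logging.DEBUG if args.debug else logging.INFO)
print(args.command, args.debug, args.sample)  # → analyze True True
```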
### 🔹 Listing Available Test Cases
```bash
redcap-eda list-cases
```
## 📂 Project Structure
```text
.
├── Makefile                       # Helper commands
├── README.md                      # Project documentation
├── dist                           # Distribution files for PyPI
├── mypy.ini                       # Type-checking configuration
├── poetry.lock                    # Poetry dependency lock file
├── pyproject.toml                 # Poetry project configuration
├── schemas                        # Saved schema files
│   └── schema_sample_dataset.json
├── src
│   ├── logs
│   │   └── redcap_eda.log         # Log files
│   └── redcap_eda
│       ├── analysis               # EDA analysis modules
│       │   ├── categorical
│       │   │   └── mixins.py      # Categorical data analysis
│       │   ├── datetime
│       │   │   └── mixins.py      # Datetime data analysis
│       │   ├── eda.py             # Main EDA module
│       │   ├── json_report_handler.py # JSON export utility
│       │   ├── lib.py             # Shared data structures (e.g., AnalysisResult)
│       │   ├── missing
│       │   │   └── mixins.py      # Missing data analysis
│       │   ├── numerical
│       │   │   └── mixins.py      # Numerical data analysis
│       │   └── text
│       │       └── mixins.py      # Text data analysis
│       ├── cast_schema.py         # Schema enforcement
│       ├── cli.py                 # Command-line interface
│       ├── load_case_data.py      # Dataset loader
│       ├── logger.py              # Logging utilities
│       └── unified_report.py      # PDF report generation
└── tests                          # Unit tests
    ├── __init__.py
    └── fixtures
        └── toy_data.csv           # Sample test data
```
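
The layout suggests that `eda.py` composes per-type analysis mixins (`categorical`, `datetime`, `missing`, `numerical`, `text`) and that `lib.py` holds shared structures such as `AnalysisResult`. The sketch below shows how such a mixin design typically fits together; apart from the name `AnalysisResult`, every class, method, and field here is hypothetical and not taken from the package.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AnalysisResult:
    """Stand-in for lib.AnalysisResult; these fields are hypothetical."""

    name: str
    stats: dict[str, Any] = field(default_factory=dict)


class MissingMixin:
    columns: dict[str, list[Any]]  # provided by the class that uses the mixin

    def analyze_missing(self) -> AnalysisResult:
        counts = {col: sum(v is None for v in vals) for col, vals in self.columns.items()}
        return AnalysisResult("missing", counts)


class NumericalMixin:
    columns: dict[str, list[Any]]

    def analyze_numerical(self) -> AnalysisResult:
        stats: dict[str, Any] = {}
        for col, vals in self.columns.items():
            nums = [v for v in vals if isinstance(v, (int, float))]
            if nums:  # skip columns with no numeric values
                stats[col] = {"min": min(nums), "max": max(nums)}
        return AnalysisResult("numerical", stats)


class EDA(MissingMixin, NumericalMixin):
    """Each inherited mixin contributes one analyze_* method."""

    def __init__(self, columns: dict[str, list[Any]]) -> None:
        self.columns = columns

    def run(self) -> list[AnalysisResult]:
        return [self.analyze_missing(), self.analyze_numerical()]


for result in EDA({"age": [34, None, 41], "sex": ["F", "M", None]}).run():
    print(result.name, result.stats)
```

Adding a new data type then means writing one more mixin and listing it in the `EDA` bases, without touching the existing analyses.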
## 📝 Contributing
1. **Fork the repository** and create a feature branch.
2. **Install the dependencies and run the test suite** to confirm nothing is broken:
   ```bash
   poetry install
   poetry run pytest tests/
   ```
3. **Submit a pull request** with a detailed description.
## 📜 License
This project is licensed under the **MIT License**.
## 🤝 Acknowledgments
- [REDCap](https://projectredcap.org/) for enabling structured data collection.
- The **Open Source Community** for inspiration & contributions!