# AutoCSV Profiler
A Python toolkit for automated CSV data analysis with statistical profiling and visualization.

[PyPI](https://pypi.org/project/autocsv-profiler/) | [Python versions](https://www.python.org/downloads/) | [License: MIT](LICENSE) | [GitHub](https://github.com/dhaneshbb/autocsv-profiler)
## Overview
AutoCSV Profiler provides automated analysis of CSV files with statistical summaries, data quality assessment, and visualization generation. It features memory-efficient processing, automatic delimiter detection, and a rich console interface.

**Key Features:**
- Interactive analysis mode with step-by-step guidance
- Automatic delimiter detection and encoding validation
- Memory-efficient chunked processing for large files
- Statistical analysis with descriptive statistics and data quality metrics
- Visualization generation (KDE plots, box plots, Q-Q plots, bar charts, pie charts)
- Rich console interface with progress tracking
- Configurable via CLI flags or environment variables
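
This README does not describe how the package's own delimiter detection works. As a rough illustration of the general idea only, the sketch below uses the standard library's `csv.Sniffer` to guess the separator from a sample of the file. The helper name `guess_delimiter`, the candidate set `,;\t|`, the UTF-8 assumption, and the comma fallback are all assumptions, not part of AutoCSV Profiler.

```python
import csv


def guess_delimiter(path: str, candidates: str = ",;\t|", sample_chars: int = 64 * 1024) -> str:
    """Guess the field separator from the start of a text file using csv.Sniffer.

    Hypothetical helper; falls back to a comma when the sniffer cannot decide.
    """
    # Only a sample is read, so this stays cheap even for very large files.
    with open(path, newline="", encoding="utf-8", errors="replace") as handle:
        sample = handle.read(sample_chars)
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        return ","
```

If you prefer to override the automatic choice, the guessed value can be passed explicitly, e.g. `autocsv-profiler data.csv --delimiter ";"`.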
## Installation
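**Requirements:** Python 3.8 - 3.13 (note: the pinned `numpy==2.2.6` dependency requires Python 3.10 or newer, so in practice version 2.0.0 needs Python 3.10+)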
```bash
pip install autocsv-profiler
```
## Quick Start
**Interactive Mode:**
```bash
autocsv-profiler
```
Step-by-step guidance for first-time users.

**Direct Analysis:**
```bash
autocsv-profiler data.csv
```
Quick analysis with sensible defaults.
## Usage
### Command Line Interface
```bash
# Show help
autocsv-profiler --help
# Basic analysis
autocsv-profiler data.csv
# Custom output directory
autocsv-profiler data.csv --output results/
# Custom delimiter
autocsv-profiler data.csv --delimiter ";"
# Large file processing
autocsv-profiler data.csv --memory-limit 4.0 --chunk-size 20000
# Non-interactive mode
autocsv-profiler data.csv --non-interactive
# Debug mode
autocsv-profiler data.csv --debug
```
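The `--chunk-size` flag (and `chunk_size` in the Python API) presumably sets how many rows are read per batch. The stdlib-only sketch below shows the general chunking pattern: read a fixed number of rows, update running aggregates, then let the batch go, so memory use tracks the chunk size rather than the file size. It is a hypothetical illustration, not AutoCSV Profiler's implementation; `iter_chunks` and `running_numeric_stats` are made-up names, and the file is assumed to be UTF-8 with a header row.

```python
import csv
import math
from itertools import islice
from typing import Dict, Iterator, List


def iter_chunks(path: str, chunk_size: int = 10_000, delimiter: str = ",") -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `chunk_size` rows (as dicts) without reading the whole file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk


def running_numeric_stats(path: str, chunk_size: int = 10_000, delimiter: str = ",") -> Dict[str, Dict[str, float]]:
    """Accumulate count/sum/min/max per column one chunk at a time, then derive the mean.

    Cells that do not parse as finite floats (blanks, text, NaN) are skipped,
    so a purely textual column never appears in the result.
    """
    stats: Dict[str, Dict[str, float]] = {}
    for chunk in iter_chunks(path, chunk_size, delimiter):
        for row in chunk:
            for column, raw in row.items():
                try:
                    value = float(raw)
                except (TypeError, ValueError):
                    continue
                if not math.isfinite(value):
                    continue
                acc = stats.setdefault(column, {"count": 0, "sum": 0.0, "min": value, "max": value})
                acc["count"] += 1
                acc["sum"] += value
                acc["min"] = min(acc["min"], value)
                acc["max"] = max(acc["max"], value)
        # Only the small `stats` dict survives between chunks; the previous
        # chunk is released when `chunk` is rebound on the next iteration.
    for acc in stats.values():
        acc["mean"] = acc["sum"] / acc["count"]
    return stats
```

Running sums keep the mean exact without holding every value; statistics such as medians or quantiles need either the full column or an approximate streaming algorithm.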
### Python API
```python
import autocsv_profiler

# Basic analysis
result_dir = autocsv_profiler.analyze('data.csv')
print(f"Analysis saved to: {result_dir}")

# Custom configuration
result_dir = autocsv_profiler.analyze(
    csv_file_path='data.csv',
    output_dir='results/',
    delimiter=',',
    chunk_size=10000,
    memory_limit_gb=1
)

# Interactive mode
result_dir = autocsv_profiler.analyze(
    csv_file_path='data.csv',
    interactive=True
)
```
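To profile several files, `analyze()` can be called in a loop. This sketch passes only the keyword arguments shown above (`csv_file_path`, `output_dir`). The helper `profile_directory` is hypothetical, and creating each output folder up front is an assumption made in case `analyze()` expects it to exist. The optional `analyze` parameter lets the loop be tried with a stand-in function.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Optional


def profile_directory(
    input_dir: str,
    output_root: str,
    analyze: Optional[Callable[..., Any]] = None,
) -> Dict[str, Any]:
    """Analyze every *.csv file in `input_dir`, writing each report to its own subfolder.

    Returns a mapping from CSV file name to whatever `analyze` returned
    (the result directory, per the examples above).
    """
    if analyze is None:
        # Deferred import so the loop can be exercised with a stand-in function.
        import autocsv_profiler

        analyze = autocsv_profiler.analyze

    results: Dict[str, Any] = {}
    for csv_path in sorted(Path(input_dir).glob("*.csv")):
        target = Path(output_root) / csv_path.stem
        # Assumption: creating the folder first is harmless even if analyze() would create it itself.
        target.mkdir(parents=True, exist_ok=True)
        # Only the documented keyword arguments csv_file_path and output_dir are passed.
        results[csv_path.name] = analyze(csv_file_path=str(csv_path), output_dir=str(target))
    return results
```

For example, `profile_directory("data/", "reports/")` would write one report folder per CSV file under `reports/`.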
### Configuration
Settings can also be supplied as environment variables with the `AUTOCSV_` prefix:
```bash
# Performance settings
export AUTOCSV_PERFORMANCE_MEMORY_LIMIT_GB=2
export AUTOCSV_PERFORMANCE_CHUNK_SIZE=20000
# Logging settings
export AUTOCSV_LOGGING_LEVEL=DEBUG
export AUTOCSV_LOGGING_CONSOLE_LEVEL=INFO
```
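The same variables can be set from Python for a single run. The hypothetical `autocsv_settings` context manager below sets `AUTOCSV_*` variables and restores the previous environment afterwards. It assumes the package reads these variables when `analyze()` runs; if configuration is instead loaded once at import time, set the variables before importing `autocsv_profiler`.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Union


@contextmanager
def autocsv_settings(**settings: Union[str, int, float]) -> Iterator[None]:
    """Temporarily set AUTOCSV_* environment variables, restoring the old values on exit.

    Keyword names are upper-cased and prefixed, e.g.
    performance_chunk_size=20000 -> AUTOCSV_PERFORMANCE_CHUNK_SIZE="20000".
    """
    new_values = {f"AUTOCSV_{key.upper()}": str(value) for key, value in settings.items()}
    # Remember what was there before (None means "was not set").
    previous = {name: os.environ.get(name) for name in new_values}
    os.environ.update(new_values)
    try:
        yield
    finally:
        for name, old in previous.items():
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old
```

Usage might look like `with autocsv_settings(performance_memory_limit_gb=2, logging_level="DEBUG"): autocsv_profiler.analyze('data.csv')`.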
## Output Files
Analysis generates the following files in the output directory:

**Data Summaries:**
- `dataset_analysis.txt` - Dataset overview and basic statistics
- `numerical_summary.csv` - Summary statistics for numeric columns
- `categorical_summary.csv` - Summary of categorical columns
- `numerical_stats.csv` - Descriptive statistics using researchpy
- `categorical_stats.csv` - Categorical frequency analysis
- `distinct_values.txt` - Unique value counts per column

**Visualizations:**
- `kde_plots/` - Kernel density estimation plots
- `box_plots/` - Box plots for numerical variables
- `qq_plots/` - Q-Q plots for visually assessing normality
- `bar_charts/` - Bar charts for categorical variables
- `pie_charts/` - Categorical distribution pie charts

**Process Logs:**
- `autocsv_profiler.log` - Processing log file
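
In scripts it can be handy to check that these artifacts exist after a run. The sketch below compares a result directory against the names listed in this section. `missing_outputs` is a hypothetical helper, and the README does not say whether every file or folder is produced for every dataset (for example, a CSV with no categorical columns), so treat a non-empty result as something to inspect rather than a failure.

```python
from pathlib import Path
from typing import List, Union

# File and folder names as listed in this README's "Output Files" section.
EXPECTED_FILES = [
    "dataset_analysis.txt",
    "numerical_summary.csv",
    "categorical_summary.csv",
    "numerical_stats.csv",
    "categorical_stats.csv",
    "distinct_values.txt",
    "autocsv_profiler.log",
]
EXPECTED_DIRS = ["kde_plots", "box_plots", "qq_plots", "bar_charts", "pie_charts"]


def missing_outputs(result_dir: Union[str, Path]) -> List[str]:
    """Return expected output names absent from `result_dir` (folders get a trailing '/')."""
    root = Path(result_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

For example, `missing_outputs(autocsv_profiler.analyze('data.csv'))` returns an empty list when everything listed above was written.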
## Documentation
**User Documentation:**
- [User Guide](https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/user-guide.md) - Installation, CLI usage, and examples
- [Configuration](https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/configuration.md) - Settings and environment variables
- [Troubleshooting](https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/troubleshooting.md) - Problem-solving guide

**Developer Documentation:**
- [API Reference](https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/api-reference.md) - Python API documentation
- [Developer Guide](https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/developer-guide.md) - Development workflow and architecture
- [Architecture Diagrams](https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/diagrams.md) - Visual system architecture

**Complete Index:**
- [Documentation Index](https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/index.md) - Complete documentation overview
## Contributing
Contributions are welcome! See [CONTRIBUTING.md](https://github.com/dhaneshbb/autocsv-profiler/blob/master/CONTRIBUTING.md) for guidelines.
## License
MIT License - see [LICENSE](https://github.com/dhaneshbb/autocsv-profiler/blob/master/LICENSE) for details.

This software includes third-party components. See [NOTICE](https://github.com/dhaneshbb/autocsv-profiler/blob/master/NOTICE) for complete license information.
## Links
- **PyPI:** https://pypi.org/project/autocsv-profiler/
- **Repository:** https://github.com/dhaneshbb/autocsv-profiler
- **Documentation:** https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/index.md
- **Issues:** https://github.com/dhaneshbb/autocsv-profiler/issues
- **Changelog:** https://github.com/dhaneshbb/autocsv-profiler/blob/master/CHANGELOG.md

---

Version: 2.0.0 | Status: Beta | Python: 3.8-3.13

Copyright 2025 dhaneshbb | License: MIT | Homepage: https://github.com/dhaneshbb/autocsv-profiler

Raw data
{
"_id": null,
"home_page": null,
"name": "autocsv-profiler",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "dhaneshbb <dhaneshbb5@gmail.com>",
"keywords": "csv, data-analysis, profiling, statistics, pandas, data-quality, exploratory-analysis, visualization, statistical-analysis, data-profiling, automated-analysis, csv-analysis, data-exploration",
"author": null,
"author_email": "dhaneshbb <dhaneshbb5@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/4f/2c/05e20e74ca2528a2076b14d43babc2f5d878c19b6f180157bd30d30d0ad6/autocsv_profiler-2.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Automated CSV data analysis with statistical profiling and visualization",
"version": "2.0.0",
"project_urls": {
"Changelog": "https://github.com/dhaneshbb/autocsv-profiler/blob/master/CHANGELOG.md",
"Documentation": "https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/index.md",
"Homepage": "https://github.com/dhaneshbb/autocsv-profiler",
"Issues": "https://github.com/dhaneshbb/autocsv-profiler/issues",
"Repository": "https://github.com/dhaneshbb/autocsv-profiler.git"
},
"split_keywords": [
"csv",
" data-analysis",
" profiling",
" statistics",
" pandas",
" data-quality",
" exploratory-analysis",
" visualization",
" statistical-analysis",
" data-profiling",
" automated-analysis",
" csv-analysis",
" data-exploration"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "13f96bf3243ba11486352d31036f27572d9dd896fa08e6900e843ea4c5758495",
"md5": "3546392d2254ac755188596569618b4d",
"sha256": "bd271f2fb909c0dbe5a060beffd8b33f6887f1837acf9eb9763ec15c4add2541"
},
"downloads": -1,
"filename": "autocsv_profiler-2.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3546392d2254ac755188596569618b4d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 154098,
"upload_time": "2025-10-09T11:50:37",
"upload_time_iso_8601": "2025-10-09T11:50:37.822810Z",
"url": "https://files.pythonhosted.org/packages/13/f9/6bf3243ba11486352d31036f27572d9dd896fa08e6900e843ea4c5758495/autocsv_profiler-2.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "4f2c05e20e74ca2528a2076b14d43babc2f5d878c19b6f180157bd30d30d0ad6",
"md5": "58903947dd83204c26808e76d92915c4",
"sha256": "05396fcb48448aaeaf19c575edc1e6381a97bc2b36b95e0c1b2cbc53c5ed9fd4"
},
"downloads": -1,
"filename": "autocsv_profiler-2.0.0.tar.gz",
"has_sig": false,
"md5_digest": "58903947dd83204c26808e76d92915c4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 148220,
"upload_time": "2025-10-09T11:50:39",
"upload_time_iso_8601": "2025-10-09T11:50:39.001551Z",
"url": "https://files.pythonhosted.org/packages/4f/2c/05e20e74ca2528a2076b14d43babc2f5d878c19b6f180157bd30d30d0ad6/autocsv_profiler-2.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-09 11:50:39",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "dhaneshbb",
"github_project": "autocsv-profiler",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "pandas",
"specs": [
[
"==",
"2.3.1"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"2.2.6"
]
]
},
{
"name": "scipy",
"specs": [
[
">=",
"1.10.0"
]
]
},
{
"name": "researchpy",
"specs": [
[
">=",
"0.3.0"
]
]
},
{
"name": "statsmodels",
"specs": [
[
">=",
"0.14.0"
]
]
},
{
"name": "matplotlib",
"specs": [
[
">=",
"3.5.0"
]
]
},
{
"name": "seaborn",
"specs": [
[
">=",
"0.11.0"
]
]
},
{
"name": "rich",
"specs": [
[
"==",
"14.1.0"
]
]
},
{
"name": "tqdm",
"specs": [
[
"==",
"4.67.1"
]
]
},
{
"name": "psutil",
"specs": [
[
"==",
"7.0.0"
]
]
},
{
"name": "charset-normalizer",
"specs": [
[
">=",
"3.0.0"
]
]
},
{
"name": "pyyaml",
"specs": [
[
"==",
"6.0.2"
]
]
},
{
"name": "tabulate",
"specs": [
[
"==",
"0.9.0"
]
]
},
{
"name": "tableone",
"specs": [
[
"==",
"0.9.5"
]
]
}
],
"lcname": "autocsv-profiler"
}