autocsv-profiler


- **Name:** autocsv-profiler
- **Version:** 2.0.0
- **Summary:** Automated CSV data analysis with statistical profiling and visualization
- **Upload time:** 2025-10-09 11:50:39
- **Requires Python:** >=3.8
- **Keywords:** csv, data-analysis, profiling, statistics, pandas, data-quality, exploratory-analysis, visualization, statistical-analysis, data-profiling, automated-analysis, csv-analysis, data-exploration
- **Requirements:** pandas, numpy, scipy, researchpy, statsmodels, matplotlib, seaborn, rich, tqdm, psutil, charset-normalizer, pyyaml, tabulate, tableone

# AutoCSV Profiler

A Python toolkit for automated CSV data analysis with statistical profiling and visualization.

[![PyPI version](https://badge.fury.io/py/autocsv-profiler.svg)](https://pypi.org/project/autocsv-profiler/)
[![Python Version](https://img.shields.io/badge/python-3.8--3.13-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
[![Status](https://img.shields.io/badge/status-beta-orange.svg)](https://github.com/dhaneshbb/autocsv-profiler)

## Overview

AutoCSV Profiler provides automated analysis of CSV files with statistical summaries, data quality assessment, and visualization generation. It features memory-efficient processing, automatic delimiter detection, and a rich console interface.

**Key Features:**
- Interactive analysis mode with step-by-step guidance
- Automatic delimiter detection and encoding validation
- Memory-efficient chunked processing for large files
- Statistical analysis with descriptive statistics and data quality metrics
- Visualization generation (KDE plots, box plots, Q-Q plots, bar charts, pie charts)
- Rich console interface with progress tracking
- Configurable via CLI flags or environment variables
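
To make the delimiter-detection feature concrete, here is a minimal sketch that uses `csv.Sniffer` from the Python standard library. It illustrates the general technique only and is not taken from AutoCSV Profiler's source; the `detect_delimiter` helper, its candidate set, and the comma fallback are assumptions made for this example.

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter of a CSV text sample (hypothetical helper).

    Uses the standard library's csv.Sniffer, restricted to a small set
    of candidate characters. This is not AutoCSV Profiler's own logic.
    """
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # The sniffer could not decide; fall back to a comma (assumption).
        return ","


# Small in-memory sample with semicolon-separated fields.
sample = "id;name;score\n1;Ada;9.5\n2;Linus;8.0\n"
print(detect_delimiter(sample))
```

In practice only the first few kilobytes of a file need to be passed as the sample, which keeps detection cheap even for large inputs.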

## Installation

**Requirements:** Python 3.8 - 3.13

```bash
pip install autocsv-profiler
```

## Quick Start

**Interactive Mode:**
```bash
autocsv-profiler
```

Run without arguments, the tool starts in interactive mode and guides first-time users through the analysis step by step.

**Direct Analysis:**
```bash
autocsv-profiler data.csv
```

Passing a file path runs the analysis directly with the default settings.

## Usage

### Command Line Interface

```bash
# Show help
autocsv-profiler --help

# Basic analysis
autocsv-profiler data.csv

# Custom output directory
autocsv-profiler data.csv --output results/

# Custom delimiter
autocsv-profiler data.csv --delimiter ";"

# Large file processing
autocsv-profiler data.csv --memory-limit 4.0 --chunk-size 20000

# Non-interactive mode
autocsv-profiler data.csv --non-interactive

# Debug mode
autocsv-profiler data.csv --debug
```
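
The idea behind a chunk-size setting such as `--chunk-size` can be sketched with the standard library alone: rows are read in fixed-size batches, so only one batch is held in memory at a time. The sketch below is an assumption-laden illustration, not the tool's implementation; `iter_chunks` and `column_mean` are hypothetical names, and the real package may well rely on pandas for this.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def iter_chunks(handle: TextIO, chunk_size: int,
                delimiter: str = ",") -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most chunk_size rows from an open CSV file."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def column_mean(handle: TextIO, column: str, chunk_size: int = 20000,
                delimiter: str = ",") -> float:
    """Compute a column mean one chunk at a time, skipping non-numeric cells."""
    total, count = 0.0, 0
    for chunk in iter_chunks(handle, chunk_size, delimiter):
        for row in chunk:
            try:
                total += float(row.get(column, ""))
                count += 1
            except (TypeError, ValueError):
                continue  # empty or non-numeric value
    return total / count if count else float("nan")


# In-memory demo; a real call would pass an open file handle instead.
demo = io.StringIO("x,label\n1,a\n2,b\n3,c\n4,d\n")
print(column_mean(demo, "x", chunk_size=2))
```

A smaller chunk size lowers peak memory at the cost of more iterations, which is the trade-off the chunk-size and memory-limit options expose.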

### Python API

```python
import autocsv_profiler

# Basic analysis
result_dir = autocsv_profiler.analyze('data.csv')
print(f"Analysis saved to: {result_dir}")

# Custom configuration
result_dir = autocsv_profiler.analyze(
    csv_file_path='data.csv',
    output_dir='results/',
    delimiter=',',
    chunk_size=10000,
    memory_limit_gb=1
)

# Interactive mode
result_dir = autocsv_profiler.analyze(
    csv_file_path='data.csv',
    interactive=True
)
```
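
Building on the calls above, the following sketch profiles every CSV file in a folder. It assumes only what the examples show: that `analyze` accepts `csv_file_path` and `output_dir` keywords and returns the result directory. The `profile_directory` helper and its injectable `analyze` parameter are hypothetical and exist so the loop can be tried without the package installed.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def profile_directory(input_dir: str, output_root: str,
                      analyze: Optional[Callable[..., object]] = None) -> Dict[str, str]:
    """Run an analyze function on each *.csv file in input_dir (hypothetical helper).

    Returns a mapping of file name to result directory, or to a
    "failed: ..." message when the analysis raised an exception.
    """
    if analyze is None:
        # Imported lazily so this helper can be exercised with a stand-in.
        import autocsv_profiler
        analyze = autocsv_profiler.analyze

    results: Dict[str, str] = {}
    for csv_path in sorted(Path(input_dir).glob("*.csv")):
        out_dir = Path(output_root) / csv_path.stem  # one folder per input file
        try:
            results[csv_path.name] = str(
                analyze(csv_file_path=str(csv_path), output_dir=str(out_dir))
            )
        except Exception as exc:  # keep going if one file fails
            results[csv_path.name] = f"failed: {exc}"
    return results
```

Giving each input file its own output folder avoids one run overwriting another's summaries; whether the package itself protects against that is not stated here.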

### Configuration

Settings can also be supplied through environment variables that use the `AUTOCSV_` prefix:

```bash
# Performance settings
export AUTOCSV_PERFORMANCE_MEMORY_LIMIT_GB=2
export AUTOCSV_PERFORMANCE_CHUNK_SIZE=20000

# Logging settings
export AUTOCSV_LOGGING_LEVEL=DEBUG
export AUTOCSV_LOGGING_CONSOLE_LEVEL=INFO
```
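
The variable names above suggest a `AUTOCSV_<SECTION>_<KEY>` pattern. As a rough sketch of how such prefixed variables could be grouped into nested settings, the helper below treats the first segment after the prefix as the section and the remainder as the key. That split rule and the `read_prefixed_settings` name are assumptions for illustration; the package's actual parsing is described in its configuration documentation.

```python
import os
from typing import Dict, Mapping, Optional

PREFIX = "AUTOCSV_"


def read_prefixed_settings(
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, Dict[str, str]]:
    """Group AUTOCSV_* variables into {section: {key: value}} (hypothetical)."""
    environ = os.environ if environ is None else environ
    settings: Dict[str, Dict[str, str]] = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue
        # Assumed convention: AUTOCSV_<SECTION>_<KEY...>
        section, _, key = name[len(PREFIX):].partition("_")
        if not key:
            continue  # no key part after the section; ignore
        settings.setdefault(section.lower(), {})[key.lower()] = value
    return settings


example_env = {
    "AUTOCSV_PERFORMANCE_CHUNK_SIZE": "20000",
    "AUTOCSV_LOGGING_LEVEL": "DEBUG",
    "PATH": "/usr/bin",
}
print(read_prefixed_settings(example_env))
```

Values stay as strings in this sketch; converting them to numbers or log levels would be a separate step.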

## Output Files

Each analysis run writes the following files and folders to the output directory:

**Data Summaries:**
- `dataset_analysis.txt` - Dataset overview and basic statistics
- `numerical_summary.csv` - Summary statistics for numeric columns
- `categorical_summary.csv` - Summary for categorical columns
- `numerical_stats.csv` - Descriptive statistics using researchpy
- `categorical_stats.csv` - Categorical frequency analysis
- `distinct_values.txt` - Unique value counts per column

**Visualizations:**
- `kde_plots/` - Kernel density estimation plots
- `box_plots/` - Box plots for numerical variables
- `qq_plots/` - Q-Q plots for normality testing
- `bar_charts/` - Bar charts for categorical variables
- `pie_charts/` - Categorical distribution pie charts

**Process Logs:**
- `autocsv_profiler.log` - Processing log file
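
As a quick way to inspect a finished run, the sketch below checks a result directory against the names listed above. The `missing_outputs` helper is hypothetical, and it is an assumption that every listed item is produced for every dataset (a file with no categorical columns, for instance, might not yield the categorical outputs), so treat the result as a hint rather than an error report.

```python
from pathlib import Path
from typing import List

# Names taken from the output list above.
EXPECTED_FILES = [
    "dataset_analysis.txt",
    "numerical_summary.csv",
    "categorical_summary.csv",
    "numerical_stats.csv",
    "categorical_stats.csv",
    "distinct_values.txt",
    "autocsv_profiler.log",
]
EXPECTED_DIRS = ["kde_plots", "box_plots", "qq_plots", "bar_charts", "pie_charts"]


def missing_outputs(result_dir: str) -> List[str]:
    """Return the expected files and folders that are absent from result_dir."""
    root = Path(result_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

A call such as `missing_outputs("results/")` returns an empty list when everything in the list is present.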

## Documentation

**User Documentation:**
- [User Guide](https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/user-guide.md) - Installation, CLI usage, and examples
- [Configuration](https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/configuration.md) - Settings and environment variables
- [Troubleshooting](https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/troubleshooting.md) - Problem-solving guide

**Developer Documentation:**
- [API Reference](https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/api-reference.md) - Python API documentation
- [Developer Guide](https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/developer-guide.md) - Development workflow and architecture
- [Architecture Diagrams](https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/diagrams.md) - Visual system architecture

**Complete Index:**
- [Documentation Index](https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/index.md) - Complete documentation overview

## Contributing

Contributions are welcome! See [CONTRIBUTING.md](https://github.com/dhaneshbb/autocsv-profiler/blob/master/CONTRIBUTING.md) for guidelines.

## License

MIT License - see [LICENSE](https://github.com/dhaneshbb/autocsv-profiler/blob/master/LICENSE) for details.

This software includes third-party components. See [NOTICE](https://github.com/dhaneshbb/autocsv-profiler/blob/master/NOTICE) for complete license information.

## Links

- **PyPI:** https://pypi.org/project/autocsv-profiler/
- **Repository:** https://github.com/dhaneshbb/autocsv-profiler
- **Documentation:** https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/index.md
- **Issues:** https://github.com/dhaneshbb/autocsv-profiler/issues
- **Changelog:** https://github.com/dhaneshbb/autocsv-profiler/blob/master/CHANGELOG.md

---

Version: 2.0.0 | Status: Beta | Python: 3.8-3.13

Copyright 2025 dhaneshbb | License: MIT | Homepage: https://github.com/dhaneshbb/autocsv-profiler

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "autocsv-profiler",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "dhaneshbb <dhaneshbb5@gmail.com>",
    "keywords": "csv, data-analysis, profiling, statistics, pandas, data-quality, exploratory-analysis, visualization, statistical-analysis, data-profiling, automated-analysis, csv-analysis, data-exploration",
    "author": null,
    "author_email": "dhaneshbb <dhaneshbb5@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/4f/2c/05e20e74ca2528a2076b14d43babc2f5d878c19b6f180157bd30d30d0ad6/autocsv_profiler-2.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Automated CSV data analysis with statistical profiling and visualization",
    "version": "2.0.0",
    "project_urls": {
        "Changelog": "https://github.com/dhaneshbb/autocsv-profiler/blob/master/CHANGELOG.md",
        "Documentation": "https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/index.md",
        "Homepage": "https://github.com/dhaneshbb/autocsv-profiler",
        "Issues": "https://github.com/dhaneshbb/autocsv-profiler/issues",
        "Repository": "https://github.com/dhaneshbb/autocsv-profiler.git"
    },
    "split_keywords": [
        "csv",
        " data-analysis",
        " profiling",
        " statistics",
        " pandas",
        " data-quality",
        " exploratory-analysis",
        " visualization",
        " statistical-analysis",
        " data-profiling",
        " automated-analysis",
        " csv-analysis",
        " data-exploration"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "13f96bf3243ba11486352d31036f27572d9dd896fa08e6900e843ea4c5758495",
                "md5": "3546392d2254ac755188596569618b4d",
                "sha256": "bd271f2fb909c0dbe5a060beffd8b33f6887f1837acf9eb9763ec15c4add2541"
            },
            "downloads": -1,
            "filename": "autocsv_profiler-2.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3546392d2254ac755188596569618b4d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 154098,
            "upload_time": "2025-10-09T11:50:37",
            "upload_time_iso_8601": "2025-10-09T11:50:37.822810Z",
            "url": "https://files.pythonhosted.org/packages/13/f9/6bf3243ba11486352d31036f27572d9dd896fa08e6900e843ea4c5758495/autocsv_profiler-2.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "4f2c05e20e74ca2528a2076b14d43babc2f5d878c19b6f180157bd30d30d0ad6",
                "md5": "58903947dd83204c26808e76d92915c4",
                "sha256": "05396fcb48448aaeaf19c575edc1e6381a97bc2b36b95e0c1b2cbc53c5ed9fd4"
            },
            "downloads": -1,
            "filename": "autocsv_profiler-2.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "58903947dd83204c26808e76d92915c4",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 148220,
            "upload_time": "2025-10-09T11:50:39",
            "upload_time_iso_8601": "2025-10-09T11:50:39.001551Z",
            "url": "https://files.pythonhosted.org/packages/4f/2c/05e20e74ca2528a2076b14d43babc2f5d878c19b6f180157bd30d30d0ad6/autocsv_profiler-2.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-09 11:50:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "dhaneshbb",
    "github_project": "autocsv-profiler",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.3.1"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "2.2.6"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    ">=",
                    "1.10.0"
                ]
            ]
        },
        {
            "name": "researchpy",
            "specs": [
                [
                    ">=",
                    "0.3.0"
                ]
            ]
        },
        {
            "name": "statsmodels",
            "specs": [
                [
                    ">=",
                    "0.14.0"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    ">=",
                    "3.5.0"
                ]
            ]
        },
        {
            "name": "seaborn",
            "specs": [
                [
                    ">=",
                    "0.11.0"
                ]
            ]
        },
        {
            "name": "rich",
            "specs": [
                [
                    "==",
                    "14.1.0"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    "==",
                    "4.67.1"
                ]
            ]
        },
        {
            "name": "psutil",
            "specs": [
                [
                    "==",
                    "7.0.0"
                ]
            ]
        },
        {
            "name": "charset-normalizer",
            "specs": [
                [
                    ">=",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "pyyaml",
            "specs": [
                [
                    "==",
                    "6.0.2"
                ]
            ]
        },
        {
            "name": "tabulate",
            "specs": [
                [
                    "==",
                    "0.9.0"
                ]
            ]
        },
        {
            "name": "tableone",
            "specs": [
                [
                    "==",
                    "0.9.5"
                ]
            ]
        }
    ],
    "lcname": "autocsv-profiler"
}
        