datatidy

PyPI package metadata:

- **Name**: datatidy
- **Version**: 0.1.1
- **Summary**: A powerful, configuration-driven data processing and cleaning package
- **Upload time**: 2025-08-04 13:38:16
- **Requires Python**: >=3.8
- **License**: MIT
- **Keywords**: etl, configuration-driven, data cleaning, data processing, data transformation, pandas
- **Requirements**: none recorded
            <div align="center">
  <img src="assets/datatidy-logo-pypi.png" alt="DataTidy Logo" width="300">
  
  <h3>Configuration-Driven Data Processing Made Simple</h3>
  
  [![PyPI version](https://badge.fury.io/py/datatidy.svg)](https://pypi.org/project/datatidy/)
  [![Python versions](https://img.shields.io/pypi/pyversions/datatidy.svg)](https://pypi.org/project/datatidy/)
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
  [![Downloads](https://pepy.tech/badge/datatidy)](https://pepy.tech/project/datatidy)
</div>

# DataTidy

A powerful, configuration-driven data processing and cleaning package for Python with robust fallback capabilities. DataTidy lets you define complex data transformations, validations, and cleaning steps through simple YAML configuration files, ensuring 100% reliability in production environments.

## 🚀 Key Features

- **🔧 Configuration-Driven**: Define all transformations in YAML - no code required
- **📊 Multiple Data Sources**: CSV, Excel, databases (PostgreSQL, MySQL, Snowflake, etc.)
- **🔗 Multi-Input Joins**: Combine data from multiple sources with flexible join operations
- **⚡ Advanced Operations**: Map/reduce/filter with lambda functions and chained operations
- **🧠 Dependency Resolution**: Automatic execution order planning for complex transformations
- **📈 Time Series Support**: Lag operations and rolling window calculations
- **🛡️ Safe Expressions**: Secure evaluation with whitelist-based security
- **🎯 Data Validation**: Comprehensive validation rules with detailed error reporting
- **⚙️ CLI Interface**: Easy-to-use command-line tools for batch processing

### 🔄 Enhanced Fallback System (v0.1.0)

- **🛡️ 100% Reliability**: Your application never fails to load data, thanks to automatic fallback mechanisms
- **⚖️ Graceful Degradation**: Applies sophisticated transformations when possible and falls back to basic data when needed
- **🔍 Enhanced Error Logging**: Detailed error categorization with actionable debugging suggestions
- **📊 Data Quality Metrics**: Compare DataTidy results with fallback data for quality assessment
- **🎛️ Multiple Processing Modes**: Strict, partial, and fallback modes for different reliability requirements
- **🔧 Partial Processing**: Skip problematic columns while processing successful ones
- **📋 Processing Recommendations**: Get specific suggestions for improving configurations


## Installation

```bash
pip install datatidy
```

For development installation:
```bash
git clone https://github.com/wwd1015/datatidy.git
cd datatidy
pip install -e ".[dev]"
```

## Quick Start

### 1. Create a sample configuration

```bash
datatidy sample config.yaml
```

### 2. Process your data

```bash
datatidy process config.yaml -i input.csv -o output.csv
```

### 3. Or use programmatically

```python
from datatidy import DataTidy

# Initialize with configuration
dt = DataTidy('config.yaml')

# Standard processing
result = dt.process_data('input.csv')

# Enhanced processing with fallback
result = dt.process_data_with_fallback('input.csv')

# Save result
dt.process_and_save('output.csv', 'input.csv')
```

## Configuration Structure

DataTidy uses YAML configuration files to define data processing pipelines:

```yaml
input:
  type: csv                    # csv, excel, database
  source: "data/input.csv"     # file path or SQL query
  options:
    encoding: utf-8
    delimiter: ","

output:
  columns:
    user_id:
      source: "id"             # Source column name
      type: int                # Data type conversion
      validation:
        required: true
        min_value: 1
    
    full_name:
      source: "name"
      type: string
      transformation: "str.title()"  # Python expression
      validation:
        required: true
        min_length: 2
        max_length: 100
    
    age_group:
      transformation: "'adult' if age >= 18 else 'minor'"
      type: string
      validation:
        allowed_values: ["adult", "minor"]

  filters:
    - condition: "age >= 0"
      action: keep

  sort:
    - column: user_id
      ascending: true

global_settings:
  ignore_errors: false
  max_errors: 100
  
  # Enhanced fallback settings
  processing_mode: partial           # strict, partial, or fallback
  enable_partial_processing: true
  enable_fallback: true
  max_column_failures: 5
  failure_threshold: 0.3             # 30% failure rate triggers fallback
  
  # Fallback transformations for problematic columns
  fallback_transformations:
    age_group:
      type: default_value
      value: "unknown"
```
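
As a rough illustration of the `failure_threshold` arithmetic (assuming, per the inline comment above, that the threshold is compared against the share of failed output columns):

```python
# Illustrative only: how a 0.3 failure threshold would play out.
total_columns = 10
failed_columns = 4                             # hypothetical failure count
failure_rate = failed_columns / total_columns  # 0.4
print(failure_rate > 0.3)                      # True -> fallback would trigger
```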

## Examples

### Basic CSV Processing

```python
from datatidy import DataTidy

config = {
    "input": {
        "type": "csv",
        "source": "users.csv"
    },
    "output": {
        "columns": {
            "clean_name": {
                "source": "name",
                "transformation": "str.strip().title()",
                "type": "string"
            },
            "age_category": {
                "transformation": "'senior' if age > 65 else ('adult' if age >= 18 else 'minor')",
                "type": "string"
            }
        }
    }
}

dt = DataTidy()
dt.load_config(config)
result = dt.process_data()
print(result)
```

### Database Processing

```yaml
input:
  type: database
  source: 
    query: "SELECT * FROM users WHERE active = true"
    connection_string: "postgresql://user:pass@localhost/db"

output:
  columns:
    user_email:
      source: "email"
      type: string
      validation:
        pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
    
    signup_date:
      source: "created_at"
      type: datetime
      format: "%Y-%m-%d"
```
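
The email pattern above is standard Python `re` syntax, so it can be sanity-checked outside DataTidy:

```python
import re

# The same pattern as in the config above.
EMAIL_PATTERN = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"

for candidate in ["alice@example.com", "bad@@example", "bob@sub.domain.org"]:
    ok = re.match(EMAIL_PATTERN, candidate) is not None
    print(candidate, "->", "valid" if ok else "invalid")
```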

### Excel Processing with Complex Transformations

```yaml
input:
  type: excel
  source:
    path: "sales_data.xlsx"
    sheet_name: "Q1_Sales"
    options:
      header: 0
      skiprows: 2

output:
  columns:
    revenue_category:
      transformation: |
        'high' if revenue > 100000 else (
          'medium' if revenue > 50000 else 'low'
        )
      validation:
        allowed_values: ["high", "medium", "low"]
    
    formatted_date:
      source: "sale_date"
      type: datetime
      format: "%Y-%m-%d"
    
    clean_product_name:
      source: "product"
      transformation: "str.strip().upper().replace('_', ' ')"
      validation:
        min_length: 1
        max_length: 50

  filters:
    - condition: "revenue > 0"
      action: keep
    - condition: "product != 'DELETED'"
      action: keep
```

## Enhanced Fallback Processing

### Production-Ready Data Processing
```python
import logging

import pandas as pd

from datatidy import DataTidy

logger = logging.getLogger(__name__)

# Initialize with a fallback-enabled configuration
dt = DataTidy('config.yaml')

# Define a fallback database query (db_connection is assumed to be an
# existing database connection, e.g. a SQLAlchemy engine)
def fallback_database_query():
    return pd.read_sql("SELECT * FROM facilities", db_connection)

# Process with guaranteed results
result = dt.process_data_with_fallback(
    data=input_df,
    fallback_query_func=fallback_database_query
)

# Your application always gets data!
if result.fallback_used:
    logger.warning("DataTidy processing failed, using database fallback")

# Check processing results
summary = dt.get_processing_summary()
print(f"Success: {summary['success']}")
print(f"Successful columns: {summary['successful_columns']}")
print(f"Failed columns: {summary['failed_columns']}")

# Get improvement recommendations
recommendations = dt.get_processing_recommendations()
for rec in recommendations:
    print(f"💡 {rec}")

# Compare data quality when both available
if not result.fallback_used:
    fallback_data = fallback_database_query()
    quality = dt.compare_with_fallback(fallback_data)
    print(f"Overall quality score: {quality.overall_quality_score:.2f}")
```

### Data Quality Monitoring
```python
from datatidy.fallback.metrics import DataQualityMetrics

# Compare processing results (processed_data and fallback_data are
# DataFrames from a prior run)
comparison = DataQualityMetrics.compare_results(
    datatidy_df=processed_data,
    fallback_df=fallback_data,
    datatidy_time=2.3,   # seconds spent in DataTidy processing
    fallback_time=0.8    # seconds spent in the fallback path
)

# Print detailed comparison
DataQualityMetrics.print_comparison_summary(comparison)

# Export for analysis
DataQualityMetrics.export_comparison_report(
    comparison, 
    'quality_report.json'
)
```

## Command Line Usage

### Enhanced Processing Modes
```bash
# Strict mode (default) - fails on any error
datatidy process config.yaml --mode strict

# Partial mode - skip problematic columns
datatidy process config.yaml --mode partial --show-summary

# Fallback mode - use fallback transformations
datatidy process config.yaml --mode fallback

# Development mode with detailed feedback
datatidy process config.yaml --mode partial \
  --show-summary \
  --show-recommendations \
  --error-log debug.json
```

### Process Data
```bash
# Basic processing
datatidy process config.yaml

# With input/output files
datatidy process config.yaml -i input.csv -o output.csv

# Ignore validation errors
datatidy process config.yaml --ignore-errors
```

### Validate Configuration
```bash
datatidy validate config.yaml
```

### Create Sample Configuration
```bash
datatidy sample my_config.yaml
```

## Expression System

DataTidy includes a safe expression parser that supports:

### Basic Operations
- Arithmetic: `+`, `-`, `*`, `/`, `//`, `%`, `**`
- Comparison: `==`, `!=`, `<`, `<=`, `>`, `>=`
- Logical: `and`, `or`, `not`
- Membership: `in`, `not in`

### Functions
- Type conversion: `str()`, `int()`, `float()`, `bool()`
- Math: `abs()`, `max()`, `min()`, `round()`
- String methods: `upper()`, `lower()`, `strip()`, `replace()`, etc.
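
To illustrate the whitelist idea (a minimal standalone sketch covering only arithmetic and function calls, not DataTidy's actual implementation), a restricted evaluator can walk the AST and reject anything outside an allow-list:

```python
import ast
import operator

# Whitelists: only these operators and functions are permitted.
_ALLOWED_BINOPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv, ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_ALLOWED_FUNCS = {"str": str, "int": int, "float": float, "bool": bool,
                  "abs": abs, "max": max, "min": min, "round": round}

def safe_eval(expression: str, variables: dict):
    """Evaluate a restricted arithmetic/function expression."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_BINOPS:
            return _ALLOWED_BINOPS[type(node.op)](_eval(node.left), _eval(node.right))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _ALLOWED_FUNCS):
            return _ALLOWED_FUNCS[node.func.id](*[_eval(a) for a in node.args])
        raise ValueError(f"Disallowed syntax: {ast.dump(node)}")
    return _eval(ast.parse(expression, mode="eval"))

print(safe_eval("round(weight / (height / 100) ** 2, 1)",
                {"weight": 70, "height": 175}))  # 22.9
```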

### Examples
```yaml
transformations:
  # Conditional expressions
  status: "'active' if last_login_days < 30 else 'inactive'"
  
  # String operations
  clean_name: "name.strip().title()"
  
  # Mathematical calculations
  bmi: "weight / (height / 100) ** 2"
  
  # Complex conditions
  risk_level: |
    'high' if (age > 65 and income < 30000) else (
      'medium' if age > 40 else 'low'
    )
```

## Validation Rules

DataTidy supports comprehensive validation:

```yaml
validation:
  required: true              # Value must be present
  nullable: false             # Null values are not allowed
  min_value: 0               # Minimum numeric value
  max_value: 100             # Maximum numeric value
  min_length: 2              # Minimum string length
  max_length: 50             # Maximum string length
  pattern: "^[A-Za-z]+$"     # Regex pattern
  allowed_values: ["A", "B"] # Whitelist of values
```

## Error Handling

```python
from datatidy import DataTidy

dt = DataTidy('config.yaml')
result = dt.process_data('input.csv')

# Check for errors
if dt.has_errors():
    for error in dt.get_errors():
        print(f"Error: {error['message']}")
```
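
For offline debugging, the documented `export_error_log` method can persist the same errors to JSON (a minimal sketch):

```python
# Persist the detailed error log for later inspection (see API Reference).
if dt.has_errors():
    dt.export_error_log('errors.json')
```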

## API Reference

### DataTidy Class

#### Core Methods
- `load_config(config)`: Load configuration from file or dict
- `process_data(data=None)`: Process data according to configuration
- `process_and_save(output_path, data=None)`: Process and save data
- `get_errors()`: Get list of processing errors
- `has_errors()`: Check if errors occurred

#### Enhanced Fallback Methods
- `process_data_with_fallback(data=None, fallback_query_func=None)`: Process with fallback capabilities
- `get_processing_summary()`: Get detailed processing summary with metrics
- `get_error_report()`: Get categorized error report with debugging info
- `get_processing_recommendations()`: Get actionable recommendations for improvements
- `compare_with_fallback(fallback_df)`: Compare DataTidy results with fallback data
- `export_error_log(file_path)`: Export detailed error log to JSON
- `set_processing_mode(mode)`: Set processing mode (strict, partial, fallback)
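
A sketch of how these methods can be combined (method and property names are taken from this reference; anything beyond them, such as the exact output, is illustrative):

```python
from datatidy import DataTidy

dt = DataTidy('config.yaml')
dt.set_processing_mode('partial')              # strict, partial, or fallback

result = dt.process_data_with_fallback('input.csv')
print(result.success, result.processing_mode, result.processing_time)

if result.failed_columns:
    dt.export_error_log('error_log.json')      # categorized errors as JSON
    for rec in dt.get_processing_recommendations():
        print('Suggestion:', rec)
```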

### Processing Result Class

#### Properties
- `success`: Boolean indicating overall processing success
- `data`: Processed DataFrame result
- `processing_mode`: Mode used for processing
- `successful_columns`: List of successfully processed columns
- `failed_columns`: List of failed columns
- `fallback_used`: Boolean indicating if fallback was activated
- `processing_time`: Time taken for processing
- `error_log`: Detailed list of processing errors

### Data Quality Metrics

#### Static Methods
- `DataQualityMetrics.compare_results(datatidy_df, fallback_df, datatidy_time, fallback_time)`: Compare two DataFrames, optionally with processing times
- `DataQualityMetrics.print_comparison_summary(comparison)`: Print formatted comparison
- `DataQualityMetrics.export_comparison_report(comparison, file_path)`: Export report to JSON

### Configuration Schema

See [Configuration Reference](docs/configuration.md) for complete schema documentation.

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request

## License

MIT License - see LICENSE file for details.

## Changelog

### Version 0.1.0
- Initial release
- Basic CSV, Excel, and database support
- Safe expression engine
- Comprehensive validation system
- CLI interface
            
