<div align="center">
<img src="assets/datatidy-logo-pypi.png" alt="DataTidy Logo" width="300">
<h3>Configuration-Driven Data Processing Made Simple</h3>
[PyPI](https://pypi.org/project/datatidy/) · [Python ≥3.8](https://pypi.org/project/datatidy/) · [MIT License](https://opensource.org/licenses/MIT) · [Downloads](https://pepy.tech/project/datatidy)
</div>
# DataTidy
A powerful, configuration-driven data processing and cleaning package for Python with robust fallback capabilities. DataTidy lets you define complex data transformations, validations, and cleaning steps in simple YAML configuration files, with fallback mechanisms designed to keep production pipelines running even when individual transformations fail.
## 🚀 Key Features
- **🔧 Configuration-Driven**: Define all transformations in YAML - no code required
- **📊 Multiple Data Sources**: CSV, Excel, databases (PostgreSQL, MySQL, Snowflake, etc.)
- **🔗 Multi-Input Joins**: Combine data from multiple sources with flexible join operations
- **⚡ Advanced Operations**: Map/reduce/filter with lambda functions and chained operations
- **🧠 Dependency Resolution**: Automatic execution order planning for complex transformations
- **📈 Time Series Support**: Lag operations and rolling window calculations
- **🛡️ Safe Expressions**: Secure evaluation with whitelist-based security
- **🎯 Data Validation**: Comprehensive validation rules with detailed error reporting
- **⚙️ CLI Interface**: Easy-to-use command-line tools for batch processing
### 🔄 Enhanced Fallback System (v0.1.0)
- **🛡️ High Reliability**: Applications never fail to load data, thanks to automatic fallback mechanisms
- **⚖️ Graceful Degradation**: Delivers sophisticated transformations when possible, basic data when needed
- **🔍 Enhanced Error Logging**: Detailed error categorization with actionable debugging suggestions
- **📊 Data Quality Metrics**: Compare DataTidy results with fallback data for quality assessment
- **🎛️ Multiple Processing Modes**: Strict, partial, and fallback modes for different reliability requirements
- **🔧 Partial Processing**: Skip problematic columns while processing successful ones
- **📋 Processing Recommendations**: Get specific suggestions for improving configurations
## Installation
```bash
pip install datatidy
```
For development installation:
```bash
git clone https://github.com/wwd1015/datatidy.git
cd datatidy
pip install -e ".[dev]"
```
## Quick Start
### 1. Create a sample configuration
```bash
datatidy sample config.yaml
```
### 2. Process your data
```bash
datatidy process config.yaml -i input.csv -o output.csv
```
### 3. Or use programmatically
```python
from datatidy import DataTidy
# Initialize with configuration
dt = DataTidy('config.yaml')
# Standard processing
result = dt.process_data('input.csv')
# Enhanced processing with fallback
result = dt.process_data_with_fallback('input.csv')
# Save result
dt.process_and_save('output.csv', 'input.csv')
```
## Configuration Structure
DataTidy uses YAML configuration files to define data processing pipelines:
```yaml
input:
  type: csv                  # csv, excel, database
  source: "data/input.csv"   # file path or SQL query
  options:
    encoding: utf-8
    delimiter: ","

output:
  columns:
    user_id:
      source: "id"           # Source column name
      type: int              # Data type conversion
      validation:
        required: true
        min_value: 1

    full_name:
      source: "name"
      type: string
      transformation: "str.title()"  # Python expression
      validation:
        required: true
        min_length: 2
        max_length: 100

    age_group:
      transformation: "'adult' if age >= 18 else 'minor'"
      type: string
      validation:
        allowed_values: ["adult", "minor"]

  filters:
    - condition: "age >= 0"
      action: keep

  sort:
    - column: user_id
      ascending: true

global_settings:
  ignore_errors: false
  max_errors: 100

  # Enhanced fallback settings
  processing_mode: partial   # strict, partial, or fallback
  enable_partial_processing: true
  enable_fallback: true
  max_column_failures: 5
  failure_threshold: 0.3     # 30% failure rate triggers fallback

  # Fallback transformations for problematic columns
  fallback_transformations:
    age_group:
      type: default_value
      value: "unknown"
```
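The `max_column_failures` and `failure_threshold` settings above gate when fallback kicks in. As an illustration only (this is not DataTidy's internal code, and the function name is hypothetical), the decision might look roughly like:

```python
# Hypothetical sketch of the fallback trigger implied by the settings above;
# names and logic are illustrative, not DataTidy's actual implementation.
def should_fall_back(failed_columns, total_columns,
                     max_column_failures=5, failure_threshold=0.3):
    """Return True when column failures exceed either configured limit."""
    if total_columns == 0:
        return False
    failure_rate = failed_columns / total_columns
    return (failed_columns > max_column_failures
            or failure_rate >= failure_threshold)

print(should_fall_back(2, 10))  # 20% failure rate: under both limits
print(should_fall_back(4, 10))  # 40% rate crosses the 0.3 threshold
```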
## Examples
### Basic CSV Processing
```python
from datatidy import DataTidy
config = {
    "input": {
        "type": "csv",
        "source": "users.csv"
    },
    "output": {
        "columns": {
            "clean_name": {
                "source": "name",
                "transformation": "str.strip().title()",
                "type": "string"
            },
            "age_category": {
                "transformation": "'senior' if age > 65 else ('adult' if age >= 18 else 'minor')",
                "type": "string"
            }
        }
    }
}
dt = DataTidy()
dt.load_config(config)
result = dt.process_data()
print(result)
```
### Database Processing
```yaml
input:
  type: database
  source:
    query: "SELECT * FROM users WHERE active = true"
    connection_string: "postgresql://user:pass@localhost/db"

output:
  columns:
    user_email:
      source: "email"
      type: string
      validation:
        pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"

    signup_date:
      source: "created_at"
      type: datetime
      format: "%Y-%m-%d"
```
### Excel Processing with Complex Transformations
```yaml
input:
  type: excel
  source:
    path: "sales_data.xlsx"
    sheet_name: "Q1_Sales"
  options:
    header: 0
    skiprows: 2

output:
  columns:
    revenue_category:
      transformation: |
        'high' if revenue > 100000 else (
          'medium' if revenue > 50000 else 'low'
        )
      validation:
        allowed_values: ["high", "medium", "low"]

    formatted_date:
      source: "sale_date"
      type: datetime
      format: "%Y-%m-%d"

    clean_product_name:
      source: "product"
      transformation: "str.strip().upper().replace('_', ' ')"
      validation:
        min_length: 1
        max_length: 50

  filters:
    - condition: "revenue > 0"
      action: keep
    - condition: "product != 'DELETED'"
      action: keep
```
## Enhanced Fallback Processing
### Production-Ready Data Processing
```python
import pandas as pd
from datatidy import DataTidy

# Initialize with fallback-enabled configuration
dt = DataTidy('config.yaml')

# Define a fallback database query (db_connection is your own DB handle)
def fallback_database_query():
    return pd.read_sql("SELECT * FROM facilities", db_connection)

# Process with guaranteed results
result = dt.process_data_with_fallback(
    data=input_df,
    fallback_query_func=fallback_database_query
)

# Your application always gets data!
if result.fallback_used:
    logger.warning("DataTidy processing failed, using database fallback")

# Check processing results
summary = dt.get_processing_summary()
print(f"Success: {summary['success']}")
print(f"Successful columns: {summary['successful_columns']}")
print(f"Failed columns: {summary['failed_columns']}")

# Get improvement recommendations
recommendations = dt.get_processing_recommendations()
for rec in recommendations:
    print(f"💡 {rec}")

# Compare data quality when both are available
if not result.fallback_used:
    fallback_data = fallback_database_query()
    quality = dt.compare_with_fallback(fallback_data)
    print(f"Overall quality score: {quality.overall_quality_score:.2f}")
```
### Data Quality Monitoring
```python
from datatidy.fallback.metrics import DataQualityMetrics
# Compare processing results
comparison = DataQualityMetrics.compare_results(
    datatidy_df=processed_data,
    fallback_df=fallback_data,
    datatidy_time=2.3,
    fallback_time=0.8
)

# Print detailed comparison
DataQualityMetrics.print_comparison_summary(comparison)

# Export for analysis
DataQualityMetrics.export_comparison_report(
    comparison,
    'quality_report.json'
)
```
## Command Line Usage
### Enhanced Processing Modes
```bash
# Strict mode (default) - fails on any error
datatidy process config.yaml --mode strict
# Partial mode - skip problematic columns
datatidy process config.yaml --mode partial --show-summary
# Fallback mode - use fallback transformations
datatidy process config.yaml --mode fallback
# Development mode with detailed feedback
datatidy process config.yaml --mode partial \
  --show-summary \
  --show-recommendations \
  --error-log debug.json
```
### Process Data
```bash
# Basic processing
datatidy process config.yaml
# With input/output files
datatidy process config.yaml -i input.csv -o output.csv
# Ignore validation errors
datatidy process config.yaml --ignore-errors
```
### Validate Configuration
```bash
datatidy validate config.yaml
```
### Create Sample Configuration
```bash
datatidy sample my_config.yaml
```
## Expression System
DataTidy includes a safe expression parser that supports:
### Basic Operations
- Arithmetic: `+`, `-`, `*`, `/`, `//`, `%`, `**`
- Comparison: `==`, `!=`, `<`, `<=`, `>`, `>=`
- Logical: `and`, `or`, `not`
- Membership: `in`, `not in`
### Functions
- Type conversion: `str()`, `int()`, `float()`, `bool()`
- Math: `abs()`, `max()`, `min()`, `round()`
- String methods: `upper()`, `lower()`, `strip()`, `replace()`, etc.
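The whitelist approach mentioned under Safe Expressions can be sketched with Python's `ast` module: parse the expression, reject any syntax node that is not on an allow-list, then evaluate with no builtins. This is an illustrative sketch of the technique, not DataTidy's actual parser:

```python
import ast

# Node types permitted in expressions; anything else (Call, Attribute,
# Import, etc.) is rejected before evaluation.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Compare, ast.BoolOp,
    ast.IfExp, ast.Name, ast.Load, ast.Constant, ast.List, ast.Tuple,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.FloorDiv, ast.Mod, ast.Pow,
    ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.And, ast.Or, ast.Not, ast.In, ast.NotIn, ast.USub,
)

def safe_eval(expression, variables):
    """Evaluate an expression after rejecting non-whitelisted syntax."""
    tree = ast.parse(expression, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"Disallowed syntax: {type(node).__name__}")
    return eval(compile(tree, "<expr>", "eval"),
                {"__builtins__": {}}, dict(variables))

print(safe_eval("'adult' if age >= 18 else 'minor'", {"age": 21}))  # adult
```

A real parser would also expose the whitelisted functions (`str()`, `abs()`, string methods, and so on); this sketch only shows the rejection step.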
### Examples
```yaml
transformations:
# Conditional expressions
  # Conditional expressions
  status: "'active' if last_login_days < 30 else 'inactive'"

  # String operations
  clean_name: "name.strip().title()"

  # Mathematical calculations
  bmi: "weight / (height / 100) ** 2"

  # Complex conditions
  risk_level: |
    'high' if (age > 65 and income < 30000) else (
      'medium' if age > 40 else 'low'
    )
```
## Validation Rules
DataTidy supports comprehensive validation:
```yaml
validation:
  required: true              # Field must not be null
  nullable: false             # Field cannot be null
  min_value: 0                # Minimum numeric value
  max_value: 100              # Maximum numeric value
  min_length: 2               # Minimum string length
  max_length: 50              # Maximum string length
  pattern: "^[A-Za-z]+$"      # Regex pattern
  allowed_values: ["A", "B"]  # Whitelist of values
```
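To make the rule semantics concrete, here is a minimal sketch of how these keys could be applied to a single value. The function name and return shape are illustrative assumptions, not DataTidy's API:

```python
import re

# Illustrative validator; rule names mirror the YAML keys above,
# but this is not DataTidy's actual implementation.
def validate_value(value, rules):
    """Return a list of human-readable violations for one value."""
    errors = []
    if value is None:
        if rules.get("required") or rules.get("nullable") is False:
            errors.append("value is null but required")
        return errors
    if isinstance(value, (int, float)):
        if "min_value" in rules and value < rules["min_value"]:
            errors.append(f"{value} < min_value {rules['min_value']}")
        if "max_value" in rules and value > rules["max_value"]:
            errors.append(f"{value} > max_value {rules['max_value']}")
    if isinstance(value, str):
        if "min_length" in rules and len(value) < rules["min_length"]:
            errors.append("string shorter than min_length")
        if "max_length" in rules and len(value) > rules["max_length"]:
            errors.append("string longer than max_length")
        if "pattern" in rules and not re.match(rules["pattern"], value):
            errors.append("pattern mismatch")
    if "allowed_values" in rules and value not in rules["allowed_values"]:
        errors.append("not in allowed_values")
    return errors

print(validate_value("Ab", {"required": True, "pattern": "^[A-Za-z]+$",
                            "min_length": 2}))  # []
```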
## Error Handling
```python
dt = DataTidy('config.yaml')
result = dt.process_data('input.csv')
# Check for errors
if dt.has_errors():
    for error in dt.get_errors():
        print(f"Error: {error['message']}")
```
## API Reference
### DataTidy Class
#### Core Methods
- `load_config(config)`: Load configuration from file or dict
- `process_data(data=None)`: Process data according to configuration
- `process_and_save(output_path, data=None)`: Process and save data
- `get_errors()`: Get list of processing errors
- `has_errors()`: Check if errors occurred
#### Enhanced Fallback Methods
- `process_data_with_fallback(data=None, fallback_query_func=None)`: Process with fallback capabilities
- `get_processing_summary()`: Get detailed processing summary with metrics
- `get_error_report()`: Get categorized error report with debugging info
- `get_processing_recommendations()`: Get actionable recommendations for improvements
- `compare_with_fallback(fallback_df)`: Compare DataTidy results with fallback data
- `export_error_log(file_path)`: Export detailed error log to JSON
- `set_processing_mode(mode)`: Set processing mode (strict, partial, fallback)
### Processing Result Class
#### Properties
- `success`: Boolean indicating overall processing success
- `data`: Processed DataFrame result
- `processing_mode`: Mode used for processing
- `successful_columns`: List of successfully processed columns
- `failed_columns`: List of failed columns
- `fallback_used`: Boolean indicating if fallback was activated
- `processing_time`: Time taken for processing
- `error_log`: Detailed list of processing errors
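For orientation, the properties above suggest a shape like the following dataclass. This is an illustrative reconstruction from the documented fields only, not the package's actual class definition:

```python
from dataclasses import dataclass, field
from typing import Any, List

# Illustrative shape of a processing result; field names follow the
# documented properties, but this is not DataTidy's source code.
@dataclass
class ProcessingResult:
    success: bool
    data: Any                         # the processed DataFrame
    processing_mode: str              # "strict", "partial", or "fallback"
    successful_columns: List[str] = field(default_factory=list)
    failed_columns: List[str] = field(default_factory=list)
    fallback_used: bool = False
    processing_time: float = 0.0
    error_log: List[dict] = field(default_factory=list)

result = ProcessingResult(success=True, data=None, processing_mode="partial",
                          successful_columns=["user_id"],
                          failed_columns=["age_group"])
print(result.fallback_used)  # False
```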
### Data Quality Metrics
#### Static Methods
- `DataQualityMetrics.compare_results(datatidy_df, fallback_df)`: Compare two DataFrames
- `DataQualityMetrics.print_comparison_summary(comparison)`: Print formatted comparison
- `DataQualityMetrics.export_comparison_report(comparison, file_path)`: Export report to JSON
### Configuration Schema
See [Configuration Reference](docs/configuration.md) for complete schema documentation.
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request
## License
MIT License - see LICENSE file for details.
## Changelog
### Version 0.1.0
- Initial release
- Basic CSV, Excel, and database support
- Safe expression engine
- Comprehensive validation system
- CLI interface