funputer


- **Name:** funputer
- **Version:** 1.3.1
- **Summary:** Simple, intelligent imputation analysis with PREFLIGHT validation and auto-metadata inference
- **Uploaded:** 2025-08-06 14:35:18
- **Requires Python:** >=3.9
- **License:** MIT
- **Keywords:** imputation, missing-data, data-science, machine-learning, pandas, auto-inference, metadata, preflight, validation
- **Requirements:** pandas, numpy, scipy, pyyaml, click, pydantic, requests, jsonschema
# FunPuter v1.3.0 - Intelligent Imputation Analysis

[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![PyPI](https://img.shields.io/pypi/v/funputer.svg)](https://pypi.org/project/funputer/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Test Coverage](https://img.shields.io/badge/coverage-69%25-brightgreen.svg)](#test-coverage)

**Production-ready intelligent imputation analysis with comprehensive constraint validation and auto-metadata inference.**

FunPuter analyzes your data and suggests the best imputation methods based on:
- 🤖 **Auto-metadata inference** (10/12 fields detected automatically)
- 🔍 **Missing data mechanisms** (MCAR, MAR, MNAR detection)
- 📊 **Data types** and statistical properties
- 🏢 **Business rules** and column dependencies
- ⚡ **Enhanced constraints** (nullable, allowed_values, max_length validation)
- 🛡️ **PREFLIGHT system** (8 core validation checks A1-A8)
- 🎯 **Adaptive thresholds** based on your dataset characteristics

## 🚀 Quick Start

### Installation
```bash
pip install funputer
```

### 30-Second Demo

**🤖 Auto-Inference Mode (Zero Configuration!)**
```python
import funputer

# Just point to your CSV - FunPuter figures out everything automatically!
suggestions = funputer.analyze_imputation_requirements("your_data.csv")

# Get intelligent suggestions
for suggestion in suggestions:
    if suggestion.missing_count > 0:
        print(f"📊 {suggestion.column_name}: {suggestion.proposed_method}")
        print(f"   Confidence: {suggestion.confidence_score:.2f}")
        print(f"   Reason: {suggestion.rationale}")
```

**📋 Production Mode (Full Control)**
```python
import pandas as pd

import funputer
from funputer.models import ColumnMetadata

# Define your data structure with constraints
metadata = [
    ColumnMetadata('customer_id', 'integer', unique_flag=True),
    ColumnMetadata('age', 'integer', min_value=18, max_value=100),
    ColumnMetadata('income', 'float', min_value=0),
    ColumnMetadata('category', 'categorical', allowed_values='A,B,C'),
]

# Load your data and get production-grade suggestions
your_dataframe = pd.read_csv("your_data.csv")
suggestions = funputer.analyze_dataframe(your_dataframe, metadata)
```

**🖥️ Command Line Interface**
```bash
# Auto-inference - easiest way
funputer analyze -d your_data.csv

# Production analysis with metadata
funputer analyze -d your_data.csv -m metadata.csv --verbose

# Data quality check first
funputer preflight your_data.csv
```

## 🚨 **IMPORTANT: v1.3.0 Naming Change (Deprecation Notice)**

**🎯 Consistent Naming**: Starting with v1.3.0, all imports and CLI commands use consistent `funputer` naming:

```python
# ✅ NEW (v1.3.0+): Consistent naming
import funputer
funputer.analyze_imputation_requirements("data.csv")
```

```bash
# ✅ NEW CLI command (v1.3.0+)
funputer analyze -d data.csv
```

**🔄 Migration**: For backward compatibility, old imports still work with deprecation warnings:

```python
# ⚠️ DEPRECATED (still works but shows warning)
import funimpute
# Old funimputer CLI command also still works
```

**📅 Timeline**: Deprecated imports will be removed in v2.0.0. Please update your code!

## 🎯 Enhanced Features (v1.3.0)

**What's New in v1.3.0:**
- 🎯 **Consistent Naming**: All imports and CLI use `funputer` (backward compatible)
- 🔄 **JSON Metadata Support**: SimpleImputationAnalyzer now handles both CSV and JSON metadata formats
- 📋 **Enhanced Documentation**: Updated examples and migration guides

**Previous Features (v1.2.1):**
- 🚨 **PREFLIGHT System**: Lean validation (75% test coverage) that runs before the `init` and `analyze` commands and catches problems before they cause crashes
- 🔍 **Smart Auto-Inference**: Intelligent metadata detection with confidence scoring (10/12 fields)
- ⚡ **Constraint Validation**: Real-time nullable, allowed_values, and max_length checking
- 🎯 **Enhanced Proposals**: Metadata-aware imputation method selection
- 🛡️ **Exception Detection**: Comprehensive constraint violation handling (68% test coverage)
- 📈 **Improved Confidence**: Dynamic scoring based on metadata compliance
- 🧹 **Warning Suppression**: Clean output with optimized pandas datetime parsing
- ✅ **Quality Assurance**: 51% overall test coverage with 220+ tests (98.3% pass rate)

## 🚨 PREFLIGHT System (NEW!)

**Fast validation to prevent crashes and guide your workflow**

### What PREFLIGHT Does
- **Runs automatically** before `init` and `analyze` commands
- **8 core checks** (A1-A8): file access, format detection, encoding, structure, memory estimation
- **Advisory recommendations**: "generate metadata first" vs "analyze now"
- **Zero crashes**: Catches problems before they break your workflow
- **Backward compatible**: All existing commands work exactly as before

### Independent Usage
```bash
# Basic preflight check
funputer preflight -d your_data.csv

# With custom options
funputer preflight -d data.csv --sample-rows 5000 --encoding utf-8

# JSON report output
funputer preflight -d data.csv --json-out report.json
```

### Exit Codes
- **0**: ✅ Ready for analysis
- **2**: ⚠️ OK with warnings (can proceed)
- **10**: ❌ Hard error (cannot proceed)

### Example Output
```text
🔍 PREFLIGHT REPORT
==================================================
Status: ✅ OK
File: data.csv
Size: 2.5 MB (csv)
Columns: 12
Recommendation: Analyze Infer Only
```
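To make the checks and exit codes above more concrete, here is a minimal, hypothetical sketch of a preflight-style validator in plain Python. It is not FunPuter's implementation: `mini_preflight`, the specific checks, and the 500 MB "large file" threshold are assumptions chosen for illustration. Only the exit-code meanings (0, 2, 10) come from this README.

```python
import csv
import itertools
import os
from typing import List, Tuple

# Exit codes as documented for `funputer preflight`
OK, OK_WITH_WARNINGS, HARD_ERROR = 0, 2, 10


def mini_preflight(path: str, sample_rows: int = 1000,
                   warn_size_bytes: int = 500 * 1024 ** 2) -> Tuple[int, List[str]]:
    """Hypothetical preflight-style checks; returns (exit_code, messages)."""
    # File access: the file must exist and be readable
    if not os.path.isfile(path) or not os.access(path, os.R_OK):
        return HARD_ERROR, [f"cannot read file: {path}"]
    # Format detection: this sketch only understands .csv
    if os.path.splitext(path)[1].lower() != ".csv":
        return HARD_ERROR, ["unsupported format: only .csv is handled here"]
    # Encoding + structure: decode a sample of rows as UTF-8 and parse it as CSV
    try:
        with open(path, newline="", encoding="utf-8") as fh:
            rows = list(itertools.islice(csv.reader(fh), sample_rows + 1))
    except UnicodeDecodeError:
        return HARD_ERROR, ["sample is not valid UTF-8"]
    except csv.Error as exc:
        return HARD_ERROR, [f"malformed CSV: {exc}"]
    if not rows or not any(cell.strip() for cell in rows[0]):
        return HARD_ERROR, ["missing header row"]

    warnings: List[str] = []
    width = len(rows[0])
    # Non-empty sampled rows whose width differs from the header are suspicious
    ragged = sum(1 for row in rows[1:] if row and len(row) != width)
    if ragged:
        warnings.append(f"{ragged} sampled row(s) do not have {width} columns")
    # Crude memory estimate: flag very large files (threshold is arbitrary)
    if os.path.getsize(path) > warn_size_bytes:
        warnings.append("large file: analysis may need a lot of memory")
    return (OK_WITH_WARNINGS if warnings else OK), warnings
```

For a small, well-formed UTF-8 CSV this returns `(0, [])`. The real command does more than this sketch (eight checks, JSON reports, options such as `--sample-rows`), so treat it only as an illustration of the 0/2/10 contract.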

## 🎯 Enhanced Metadata

FunPuter now supports comprehensive metadata fields that actively influence imputation recommendations:

### Metadata Schema

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `column_name` | string | Column identifier | `"age"` |
| `data_type` | string | Data type (integer, float, string, categorical, datetime) | `"integer"` |
| `nullable` | boolean | Allow null values | `false` |
| `min_value` | number | Minimum allowed value | `0` |
| `max_value` | number | Maximum allowed value | `120` |
| `max_length` | integer | Maximum string length | `50` |
| `allowed_values` | string | Comma-separated list of allowed values | `"A,B,C"` |
| `unique_flag` | boolean | Require unique values | `true` |
| `dependent_column` | string | Name of a related column this column depends on | `"age"` |
| `business_rule` | string | Custom validation rules | `"Must be positive"` |
| `description` | string | Human-readable description | `"User age in years"` |

### 🛠️ Creating Metadata

**Method 1: CLI Template Generation**
```bash
# Generate a metadata template from your data
funputer init -d data.csv -o metadata.csv

# Edit the generated file to add constraints
# Then analyze with enhanced metadata
funputer analyze -d data.csv -m metadata.csv
```

**Method 2: Manual CSV Creation** (`metadata.csv`)
```csv
column_name,data_type,nullable,min_value,max_value,max_length,allowed_values,unique_flag,dependent_column,business_rule,description
user_id,integer,false,,,50,,true,,,"Unique user identifier"
age,integer,false,0,120,,,,,Must be positive,"User age in years"
income,float,true,0,,,,,age,Higher with age,"Annual income in USD"
category,categorical,false,,,10,"A,B,C",,,,"User category classification"
email,string,true,,,255,,true,,,"User email address"
```
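A metadata file in this layout can be read with Python's standard `csv` module. The sketch below is hypothetical (`load_metadata_rows` is not part of FunPuter's API) and assumes that empty cells mean "not set", that booleans are spelled `true`/`false`, and that `allowed_values` is a comma-separated list; FunPuter's own `-m` loader may parse these fields differently.

```python
import csv
import io
from typing import Any, Dict, Iterable, List

BOOL_FIELDS = {"nullable", "unique_flag"}
FLOAT_FIELDS = {"min_value", "max_value"}
INT_FIELDS = {"max_length"}


def _parse_bool(text: str) -> bool:
    lowered = text.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise ValueError(f"not a boolean: {text!r}")


def load_metadata_rows(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Hypothetical loader: parse metadata CSV rows into dicts; empty cells become None."""
    rows: List[Dict[str, Any]] = []
    for raw in csv.DictReader(lines):
        row: Dict[str, Any] = {}
        for key, cell in raw.items():
            value = (cell or "").strip()
            if not value:
                row[key] = None
            elif key in BOOL_FIELDS:
                row[key] = _parse_bool(value)
            elif key in FLOAT_FIELDS:
                row[key] = float(value)
            elif key in INT_FIELDS:
                row[key] = int(value)
            elif key == "allowed_values":
                row[key] = [v.strip() for v in value.split(",")]
            else:
                row[key] = value
        rows.append(row)
    return rows


# Usage: parse two rows taken from the example above
example = io.StringIO(
    "column_name,data_type,nullable,min_value,max_value,max_length,"
    "allowed_values,unique_flag,dependent_column,business_rule,description\n"
    'age,integer,false,0,120,,,,,Must be positive,"User age in years"\n'
    'category,categorical,false,,,10,"A,B,C",,,,"User category classification"\n'
)
for meta in load_metadata_rows(example):
    print(meta["column_name"], meta["nullable"], meta["allowed_values"])
```

Note that `allowed_values` must be quoted in the CSV whenever it contains commas, as in the `category` row.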

### 🎯 Metadata in Action

**Example 1: Nullable Constraints**
```python
from funputer.models import ColumnMetadata

# When nullable=False but data has missing values
metadata = ColumnMetadata(
    column_name="age",
    data_type="integer",
    nullable=False,
    min_value=0,
    max_value=120
)

# FunPuter will:
# - Detect nullable constraint violations
# - Recommend immediate data quality fixes
# - Lower confidence score due to constraint violations
```

**Example 2: Allowed Values**
```python
from funputer.models import ColumnMetadata

# For categorical data with specific allowed values
metadata = ColumnMetadata(
    column_name="status",
    data_type="categorical",
    allowed_values="active,inactive,pending"
)

# FunPuter will:
# - Validate all values against allowed list
# - Recommend mode imputation using only allowed values
# - Increase confidence when data respects constraints
```

**Example 3: String Length Constraints**
```python
from funputer.models import ColumnMetadata

# For string data with length limits
metadata = ColumnMetadata(
    column_name="username",
    data_type="string",
    max_length=20,
    unique_flag=True
)

# FunPuter will:
# - Check string lengths against max_length
# - Recommend imputation respecting length limits
# - Consider uniqueness requirements in recommendations
```
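The three examples above boil down to a handful of per-column checks. Here is a rough, hypothetical sketch of such checks in plain Python; the `check_constraints` name and the violation messages are invented for illustration, and FunPuter's actual validation logic and output format may differ.

```python
from collections import Counter
from typing import List, Optional, Sequence


def check_constraints(
    values: Sequence[Optional[str]],
    nullable: bool = True,
    allowed_values: Optional[str] = None,  # comma-separated, as in ColumnMetadata
    max_length: Optional[int] = None,
    unique: bool = False,
) -> List[str]:
    """Return human-readable constraint violations for one column (None = missing)."""
    violations: List[str] = []
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # Nullable: missing values are a violation when nullable=False
    if not nullable and missing:
        violations.append(f"nullable=False but {missing} value(s) are missing")
    # Allowed values: every present value must be in the list
    if allowed_values is not None:
        allowed = {a.strip() for a in allowed_values.split(",")}
        bad = sorted({v for v in present if v not in allowed})
        if bad:
            violations.append(f"values outside allowed_values: {bad}")
    # String length limit
    if max_length is not None:
        too_long = [v for v in present if len(v) > max_length]
        if too_long:
            violations.append(f"{len(too_long)} value(s) exceed max_length={max_length}")
    # Uniqueness among present values
    if unique:
        dupes = sorted(v for v, n in Counter(present).items() if n > 1)
        if dupes:
            violations.append(f"duplicate values: {dupes}")
    return violations


# Usage: a "status" column with one missing and one unexpected value
print(check_constraints(["active", None, "archived"], nullable=False,
                        allowed_values="active,inactive,pending"))
```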

### 📊 Enhanced Analysis Results

```python
import funputer

# Results now include metadata-aware recommendations
suggestions = funputer.analyze_imputation_requirements("your_data.csv")
for suggestion in suggestions:
    print(f"Column: {suggestion.column_name}")
    print(f"Method: {suggestion.proposed_method}")
    print(f"Confidence: {suggestion.confidence_score:.3f}")
    print(f"Rationale: {suggestion.rationale}")
    
    # New: Metadata constraint information
    if suggestion.metadata_violations:
        print(f"Violations: {suggestion.metadata_violations}")
    
    # New: Enhanced parameters
    if suggestion.parameters:
        print(f"Parameters: {suggestion.parameters}")
```
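When the list of suggestions is long, it can help to sort by confidence and flag weak recommendations for manual review. The sketch below relies only on the attributes used in this README (`column_name`, `proposed_method`, `confidence_score`, `missing_count`); the `Suggestion` dataclass is a hypothetical stand-in so the snippet runs on its own, the method names in the demo data are made up, and the 0.5 review threshold is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Suggestion:
    """Hypothetical stand-in for FunPuter's suggestion objects."""
    column_name: str
    proposed_method: str
    confidence_score: float
    missing_count: int


def triage(suggestions: Iterable[Suggestion], review_below: float = 0.5
           ) -> Tuple[List[Suggestion], List[Suggestion]]:
    """Split columns that need imputation into (accept, review), most confident first."""
    needing = [s for s in suggestions if s.missing_count > 0]
    needing.sort(key=lambda s: s.confidence_score, reverse=True)
    accept = [s for s in needing if s.confidence_score >= review_below]
    review = [s for s in needing if s.confidence_score < review_below]
    return accept, review


# Usage with made-up data; real code would pass FunPuter's suggestions instead
demo = [
    Suggestion("age", "median", 0.85, 12),
    Suggestion("income", "regression", 0.40, 300),
    Suggestion("customer_id", "none", 1.0, 0),
]
accept, review = triage(demo)
for s in accept:
    print(f"OK      {s.column_name}: {s.proposed_method} ({s.confidence_score:.2f})")
for s in review:
    print(f"REVIEW  {s.column_name}: {s.proposed_method} ({s.confidence_score:.2f})")
```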

## 🔍 Confidence-Score Heuristics

FunPuter assigns a **`confidence_score`** (range **0 to 1**) to every imputation recommendation. The value is a transparent, rule-based estimate of how reliable the proposed method is, **not** a formal statistical uncertainty. Two calculators are used:

### Base heuristic
When only column-level data is available (no full DataFrame), the score is computed as follows:

| Signal | Condition | Δ Score |
|--------|-----------|---------|
| **Starting value** | (always) | **0.50** |
| Missing % | `< 5 %` | +0.20 |
| | `5–20 %` | +0.10 |
| | `> 50 %` | −0.20 |
| Mechanism | MCAR (weak evidence) | +0.10 |
| | MAR (related columns) | +0.05 |
| | MNAR / UNKNOWN | −0.10 |
| Outliers | `< 5 %` | +0.05 |
| | `> 20 %` | −0.10 |
| Metadata constraints | `allowed_values` defined (categorical/string) | +0.10 |
| | `max_length` defined (string) | +0.05 |
| Nullable constraint | `nullable=False` **with** missing values | −0.15 |
| | `nullable=False` **without** missing values | +0.05 |
| Data-quality checks | Strings within `max_length` | +0.05 |
| | Categorical values inside `allowed_values` | +*(valid_ratio × 0.10)* |

The final score is clipped to the **[0.10, 1.00]** interval.
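The table reads as a simple additive rule, so it can be written down as a short function. This is an illustrative re-implementation of the table, not FunPuter's source: the function name and parameters are hypothetical, fractions are used instead of percentages, and boundary handling (for example, treating exactly 20 % as part of the `5–20 %` band) is an assumption.

```python
from typing import Optional, Sequence


def base_confidence_score(
    missing_fraction: float,          # share of missing values, 0.0 to 1.0
    mechanism: str,                   # "MCAR", "MAR", "MNAR" or "UNKNOWN"
    outlier_fraction: float,          # share of outliers, 0.0 to 1.0
    data_type: str,                   # "integer", "float", "string", "categorical", ...
    nullable: Optional[bool] = None,
    allowed_values: Optional[Sequence[str]] = None,
    max_length: Optional[int] = None,
    strings_within_max_length: bool = False,
    allowed_valid_ratio: Optional[float] = None,  # share of values inside allowed_values
) -> float:
    score = 0.50  # starting value

    # Missing percentage (20-50 % gets no adjustment)
    if missing_fraction < 0.05:
        score += 0.20
    elif missing_fraction <= 0.20:
        score += 0.10
    elif missing_fraction > 0.50:
        score -= 0.20

    # Missingness mechanism: anything other than MCAR/MAR is penalized
    mech = mechanism.upper()
    if mech == "MCAR":
        score += 0.10
    elif mech == "MAR":
        score += 0.05
    else:
        score -= 0.10

    # Outliers (5-20 % gets no adjustment)
    if outlier_fraction < 0.05:
        score += 0.05
    elif outlier_fraction > 0.20:
        score -= 0.10

    # Metadata constraints
    if allowed_values and data_type in ("categorical", "string"):
        score += 0.10
    if max_length is not None and data_type == "string":
        score += 0.05

    # Nullable constraint
    if nullable is False:
        score += -0.15 if missing_fraction > 0 else 0.05

    # Data-quality checks
    if data_type == "string" and max_length is not None and strings_within_max_length:
        score += 0.05
    if data_type == "categorical" and allowed_valid_ratio is not None:
        score += allowed_valid_ratio * 0.10

    # Clip to [0.10, 1.00]
    return max(0.10, min(1.00, score))


# Usage: 2 % missing, MCAR, 1 % outliers, plain float column
print(round(base_confidence_score(0.02, "MCAR", 0.01, "float"), 2))
```

Because every adjustment is a fixed step, a score can be traced back to the exact table rows that produced it, which is the transparency the heuristic aims for.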

### Adaptive variant
When the analyzer receives the full DataFrame **and** complete metadata, it builds dataset-specific thresholds using `AdaptiveThresholds` and applies `calculate_adaptive_confidence_score`:

* Adaptive missing/outlier thresholds (based on row-count, variability, etc.)
* An additional adjustment factor (from −0.30 to +0.30) reflecting dataset characteristics

This yields a context-aware score that remains interpretable yet sensitive to each dataset.
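The exact adaptive formula is not spelled out here, so the following is only a guess at its general shape: a dataset-level adjustment is bounded to ±0.30 and combined with a base score. The assumptions (that the adjustment is added to the base score, that the result is clipped to the same [0.10, 1.00] range as the base heuristic, and the toy row-count rule) are ours; this does not reproduce `AdaptiveThresholds` or `calculate_adaptive_confidence_score`.

```python
def bounded_adjustment(raw: float, limit: float = 0.30) -> float:
    """Clamp a dataset-level adjustment to [-limit, +limit]."""
    return max(-limit, min(limit, raw))


def adaptive_confidence(base_score: float, raw_adjustment: float) -> float:
    """Combine a base score with a bounded adjustment (assumed additive), then clip."""
    return max(0.10, min(1.00, base_score + bounded_adjustment(raw_adjustment)))


def toy_row_count_adjustment(n_rows: int) -> float:
    """Purely illustrative rule: penalize tiny datasets, reward large ones."""
    if n_rows < 100:
        return -0.15
    if n_rows > 10_000:
        return 0.10
    return 0.0


# Usage: a base score of 0.60 on a 50-row dataset
print(round(adaptive_confidence(0.60, toy_row_count_adjustment(50)), 2))
```

Bounding the adjustment keeps dataset effects from overwhelming the column-level signals.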

### Future work
For maximum transparency and speed we use heuristics today.  Future releases may include probabilistic or conformal approaches (e.g., multiple-imputation variance or ensemble uncertainty) to provide statistically grounded confidence estimates.

## 🚀 Advanced Usage

### Programmatic Metadata Creation
```python
import pandas as pd

from funputer.models import ColumnMetadata
from funputer.simple_analyzer import SimpleImputationAnalyzer

metadata = [
    ColumnMetadata(
        column_name="product_code",
        data_type="string",
        max_length=10,
        allowed_values="A1,A2,B1,B2",
        nullable=False,
        description="Product classification code"
    ),
    ColumnMetadata(
        column_name="price",
        data_type="float",
        min_value=0,
        max_value=10000,
        business_rule="Must be non-negative"
    )
]

# Analyze with custom metadata
data = pd.read_csv("products.csv")
analyzer = SimpleImputationAnalyzer()
results = analyzer.analyze_dataframe(data, metadata)
```

### CLI Usage with Enhanced Metadata & PREFLIGHT
```bash
# PREFLIGHT runs automatically before init/analyze
funputer init -d products.csv -o products_metadata.csv
# 🔍 Preflight Check: ✅ OK - File validated, ready for processing

# Edit products_metadata.csv to add constraints, then:
funputer analyze -d products.csv -m products_metadata.csv -o results.csv
# 🔍 Preflight Check: ✅ OK - Recommendation: Analyze Now

# Run standalone preflight validation
funputer preflight -d products.csv --json-out validation_report.json

# Disable preflight if needed (not recommended)
export FUNPUTER_PREFLIGHT=off
funputer analyze -d products.csv

# Results are automatically saved in CSV format for easy viewing
```

## 📋 Requirements

- **Python**: 3.9 or higher
- **Dependencies**: pandas, numpy, scipy, pyyaml, click, pydantic, requests, jsonschema (installed automatically by `pip`)

## 🔧 Installation from Source

```bash
git clone https://github.com/RajeshRamachander/funputer.git
cd funputer
pip install -e .
```

## 📚 Comprehensive Examples

FunPuter comes with extensive real-world examples covering every feature:

### 🎯 **Quick Start Examples**
- **[quick_start_guide.py](examples/quick_start_guide.py)** - Get started in 5 minutes with common patterns
- **[comprehensive_usage_guide.py](examples/comprehensive_usage_guide.py)** - Every feature demonstrated
- **[cli_examples.sh](examples/cli_examples.sh)** - Complete CLI usage guide

### 🏭 **Industry Examples**
- **[real_world_examples.py](examples/real_world_examples.py)** - Production scenarios across industries:
  - 🛒 **E-commerce Customer Analytics** - Customer behavior, churn prediction
  - 🏥 **Healthcare Patient Records** - Clinical data with regulatory constraints
  - 💰 **Financial Risk Assessment** - Credit scoring, loan applications
  - 📢 **Marketing Campaign Analysis** - ROI optimization, A/B testing
  - 🌡️ **IoT Sensor Data** - Time series, equipment monitoring

### 📊 **Usage Patterns**

**Auto-Inference (Zero Configuration)**
```python
import funputer

# Perfect for data exploration and prototyping
suggestions = funputer.analyze_imputation_requirements("mystery_data.csv")
```

**Production Mode (Full Control)**
```python
# Enterprise-grade with constraint validation
import pandas as pd

import funputer
from funputer.models import ColumnMetadata, AnalysisConfig

metadata = [
    ColumnMetadata('customer_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('age', 'integer', min_value=18, max_value=100),
    ColumnMetadata('income', 'float', dependent_column='age',
                   business_rule='Income correlates with age'),
    ColumnMetadata('category', 'categorical', allowed_values='A,B,C,D')
]

config = AnalysisConfig(missing_percentage_threshold=0.25, skip_columns=['id'])
df = pd.read_csv("your_data.csv")
suggestions = funputer.analyze_dataframe(df, metadata, config)
```

**CLI Automation**
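```bash
# Batch processing workflow
mkdir -p results
for file in data/*.csv; do
    funputer preflight "$file"
    # Exit code 0 = ready, 2 = OK with warnings; anything else (e.g. 10) = stop
    case $? in
        0|2) funputer analyze -d "$file" --output "results/$(basename "$file" .csv)_plan.csv" ;;
        *) echo "Skipping $file: preflight reported a hard error" >&2 ;;
    esac
done
```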

### 🎓 **Learning Path**

1. **Start Here**: `quick_start_guide.py` - Master the basics in 5 minutes
2. **Go Deeper**: `comprehensive_usage_guide.py` - Learn every feature  
3. **Real World**: `real_world_examples.py` - See industry applications
4. **CLI Mastery**: `cli_examples.sh` - Automate your workflows
5. **Production**: Use the patterns in your specific domain

### 💡 **Pro Tips**

- **Exploration**: Use auto-inference for quick insights
- **Production**: Always use explicit metadata with constraints
- **Automation**: CLI is perfect for CI/CD and batch processing
- **Validation**: Run preflight checks before expensive analysis
- **Performance**: Skip unnecessary columns (`skip_columns`) and tune thresholds (e.g. `missing_percentage_threshold`) via `AnalysisConfig`

## 📚 Documentation

- **Examples Directory**: [examples/](examples/) - Comprehensive usage examples
- **API Reference**: See docstrings and type hints in the code
- **Changelog**: [CHANGELOG.md](CHANGELOG.md) - Version history and features

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.

## 📄 License

MIT License - see [LICENSE](LICENSE) file for details.

---

**Focus**: Get intelligent imputation recommendations with enhanced metadata support, not complex infrastructure.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "funputer",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "imputation, missing-data, data-science, machine-learning, pandas, auto-inference, metadata, preflight, validation",
    "author": null,
    "author_email": "Rajesh Ramachander <rajeshr.technocraft@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/a7/d9/1ecfc07f87fd9d30ce8ec4e6943d95c02f8f7f15cd5d9c295db386bfece7/funputer-1.3.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Simple, intelligent imputation analysis with PREFLIGHT validation and auto-metadata inference",
    "version": "1.3.1",
    "project_urls": {
        "Documentation": "https://github.com/RajeshRamachander/funputer#readme",
        "Homepage": "https://github.com/RajeshRamachander/funputer",
        "Issues": "https://github.com/RajeshRamachander/funputer/issues",
        "Repository": "https://github.com/RajeshRamachander/funputer"
    },
    "split_keywords": [
        "imputation",
        "missing-data",
        "data-science",
        "machine-learning",
        "pandas",
        "auto-inference",
        "metadata",
        "preflight",
        "validation"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "16b4874fb286d7703f2c1180b395e7c9eb86ddc63a8385f60c142cb6de339790",
                "md5": "46fbef84cb582f5f448785a4cc14cec7",
                "sha256": "04b645714f2e1ff89321cff07960b70048107b0afe097bfb2c66f72d9461e4ac"
            },
            "downloads": -1,
            "filename": "funputer-1.3.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "46fbef84cb582f5f448785a4cc14cec7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 66952,
            "upload_time": "2025-08-06T14:35:17",
            "upload_time_iso_8601": "2025-08-06T14:35:17.442004Z",
            "url": "https://files.pythonhosted.org/packages/16/b4/874fb286d7703f2c1180b395e7c9eb86ddc63a8385f60c142cb6de339790/funputer-1.3.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a7d91ecfc07f87fd9d30ce8ec4e6943d95c02f8f7f15cd5d9c295db386bfece7",
                "md5": "030b7ad306042f3abfad8a2486d21968",
                "sha256": "e4026a67ffcae811dc6576b900e7d2ab820ef3a85008c521ae6eceba1af1b7da"
            },
            "downloads": -1,
            "filename": "funputer-1.3.1.tar.gz",
            "has_sig": false,
            "md5_digest": "030b7ad306042f3abfad8a2486d21968",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 125790,
            "upload_time": "2025-08-06T14:35:18",
            "upload_time_iso_8601": "2025-08-06T14:35:18.859604Z",
            "url": "https://files.pythonhosted.org/packages/a7/d9/1ecfc07f87fd9d30ce8ec4e6943d95c02f8f7f15cd5d9c295db386bfece7/funputer-1.3.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-06 14:35:18",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "RajeshRamachander",
    "github_project": "funputer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.5.0"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.21.0"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    ">=",
                    "1.9.0"
                ]
            ]
        },
        {
            "name": "pyyaml",
            "specs": [
                [
                    ">=",
                    "6.0"
                ]
            ]
        },
        {
            "name": "click",
            "specs": [
                [
                    ">=",
                    "8.0.0"
                ]
            ]
        },
        {
            "name": "pydantic",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    ">=",
                    "2.20.0"
                ]
            ]
        },
        {
            "name": "jsonschema",
            "specs": [
                [
                    ">=",
                    "4.0.0"
                ]
            ]
        }
    ],
    "lcname": "funputer"
}
        