# FunPuter v1.3.0 - Intelligent Imputation Analysis
**Production-ready intelligent imputation analysis with comprehensive constraint validation and auto-metadata inference.**
FunPuter analyzes your data and suggests the best imputation methods based on:
- 🤖 **Auto-metadata inference** (10/12 fields detected automatically)
- 🔍 **Missing data mechanisms** (MCAR, MAR, MNAR detection)
- 📊 **Data types** and statistical properties
- 🏢 **Business rules** and column dependencies
- ⚡ **Enhanced constraints** (nullable, allowed_values, max_length validation)
- 🛡️ **PREFLIGHT system** (8 core validation checks A1-A8)
- 🎯 **Adaptive thresholds** based on your dataset characteristics
## 🚀 Quick Start
### Installation
```bash
pip install funputer
```
### 30-Second Demo
**🤖 Auto-Inference Mode (Zero Configuration!)**
```python
import funputer

# Point FunPuter at a CSV file; metadata is inferred automatically
suggestions = funputer.analyze_imputation_requirements("your_data.csv")

# Print a suggestion for every column that has missing values
for suggestion in suggestions:
    if suggestion.missing_count > 0:
        print(f"📊 {suggestion.column_name}: {suggestion.proposed_method}")
        print(f"   Confidence: {suggestion.confidence_score:.2f}")
        print(f"   Reason: {suggestion.rationale}")
```
**📋 Production Mode (Full Control)**
```python
import funputer
import pandas as pd
from funputer.models import ColumnMetadata

# Load the data to analyze
your_dataframe = pd.read_csv("your_data.csv")

# Define your data structure with constraints
metadata = [
    ColumnMetadata('customer_id', 'integer', unique_flag=True),
    ColumnMetadata('age', 'integer', min_value=18, max_value=100),
    ColumnMetadata('income', 'float', min_value=0),
    ColumnMetadata('category', 'categorical', allowed_values='A,B,C'),
]

# Get production-grade suggestions
suggestions = funputer.analyze_dataframe(your_dataframe, metadata)
```
**🖥️ Command Line Interface**
```bash
# Auto-inference - easiest way
funputer analyze -d your_data.csv

# Production analysis with metadata
funputer analyze -d your_data.csv -m metadata.csv --verbose

# Data quality check first
funputer preflight -d your_data.csv
```
## 🚨 **IMPORTANT: v1.3.0 Breaking Change**
**🎯 Consistent Naming**: Starting with v1.3.0, all imports and CLI commands use consistent `funputer` naming:
```python
# ✅ NEW (v1.3.0+): Consistent naming
import funputer
funputer.analyze_imputation_requirements("data.csv")
```
```bash
# ✅ NEW CLI command (v1.3.0+)
funputer analyze -d data.csv
```
**🔄 Migration**: For backward compatibility, old imports still work with deprecation warnings:
```python
# ⚠️ DEPRECATED (still works but shows warning)
import funimpute
# Old funimputer CLI command also still works
```
**📅 Timeline**: Deprecated imports will be removed in v2.0.0. Please update your code!
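One way to find lingering uses of the deprecated names before v2.0.0 is to record warnings while exercising your code. The sketch below uses only the standard-library `warnings` module. `legacy_entry_point` is a hypothetical stand-in for your own code path, and it assumes the old import raises a `DeprecationWarning`, which this README does not confirm.

```python
import warnings


def legacy_entry_point():
    """Hypothetical stand-in for code that still uses a deprecated import."""
    warnings.warn("use 'funputer' instead", DeprecationWarning, stacklevel=2)
    return "ok"


def uses_deprecated_api(func):
    """Return True if calling func emits at least one DeprecationWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is filtered out
        func()
    return any(issubclass(w.category, DeprecationWarning) for w in caught)


print(uses_deprecated_api(legacy_entry_point))
```

Wrapping a test-suite entry point this way gives a quick yes/no signal without changing global warning filters.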
## 🎯 Enhanced Features (v1.3.0)
**What's New in v1.3.0:**
- 🎯 **Consistent Naming**: All imports and CLI use `funputer` (backward compatible)
- 🔄 **JSON Metadata Support**: SimpleImputationAnalyzer now handles both CSV and JSON metadata formats
- 📋 **Enhanced Documentation**: Updated examples and migration guides
**Previous Features (v1.2.1):**
- 🚨 **PREFLIGHT System**: Lean validation (75% test coverage) that runs before ANY analysis - prevents crashes!
- 🔍 **Smart Auto-Inference**: Intelligent metadata detection with confidence scoring (10/12 fields)
- ⚡ **Constraint Validation**: Real-time nullable, allowed_values, and max_length checking
- 🎯 **Enhanced Proposals**: Metadata-aware imputation method selection
- 🛡️ **Exception Detection**: Comprehensive constraint violation handling (68% test coverage)
- 📈 **Improved Confidence**: Dynamic scoring based on metadata compliance
- 🧹 **Warning Suppression**: Clean output with optimized pandas datetime parsing
- ✅ **Quality Assurance**: 51% overall test coverage with 220+ tests (98.3% pass rate)
## 🚨 PREFLIGHT System (NEW!)
**Fast validation to prevent crashes and guide your workflow**
### What PREFLIGHT Does
- **Runs automatically** before `init` and `analyze` commands
- **8 core checks** (A1-A8), covering file access, format detection, encoding, structure, and memory estimation
- **Advisory recommendations**: "generate metadata first" vs "analyze now"
- **Zero crashes**: Catches problems before they break your workflow
- **Backward compatible**: All existing commands work exactly as before
### Independent Usage
```bash
# Basic preflight check
funputer preflight -d your_data.csv
# With custom options
funputer preflight -d data.csv --sample-rows 5000 --encoding utf-8
# JSON report output
funputer preflight -d data.csv --json-out report.json
```
### Exit Codes
- **0**: ✅ Ready for analysis
- **2**: ⚠️ OK with warnings (can proceed)
- **10**: ❌ Hard error (cannot proceed)
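When calling preflight from a script, the exit codes above can be mapped to a go/no-go decision. This is a minimal sketch, not part of FunPuter: the helper names `interpret_preflight` and `run_preflight` are hypothetical, and treating any undocumented exit code as "do not proceed" is an assumption.

```python
import subprocess

# Exit codes as documented in the list above.
PREFLIGHT_OUTCOMES = {
    0: "ready",
    2: "warnings",
    10: "error",
}


def interpret_preflight(returncode):
    """Map a preflight exit code to (outcome, may_proceed)."""
    # Assumption: codes not listed above are treated as "unknown" and block analysis.
    outcome = PREFLIGHT_OUTCOMES.get(returncode, "unknown")
    return outcome, outcome in ("ready", "warnings")


def run_preflight(path):
    """Run `funputer preflight -d <path>` and interpret its exit code."""
    completed = subprocess.run(["funputer", "preflight", "-d", path])
    return interpret_preflight(completed.returncode)


print(interpret_preflight(2))
```

Keeping the interpretation separate from the subprocess call makes the decision logic easy to unit-test without the CLI installed.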
### Example Output
```text
🔍 PREFLIGHT REPORT
==================================================
Status: ✅ OK
File: data.csv
Size: 2.5 MB (csv)
Columns: 12
Recommendation: Analyze Infer Only
```
FunPuter now supports comprehensive metadata fields that actively influence imputation recommendations:
### Metadata Schema
| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `column_name` | string | Column identifier | `"age"` |
| `data_type` | string | Data type (integer, float, string, categorical, datetime) | `"integer"` |
| `nullable` | boolean | Allow null values | `false` |
| `min_value` | number | Minimum allowed value | `0` |
| `max_value` | number | Maximum allowed value | `120` |
| `max_length` | integer | Maximum string length | `50` |
| `allowed_values` | string | Comma-separated list of allowed values | `"A,B,C"` |
| `unique_flag` | boolean | Require unique values | `true` |
| `dependent_column` | string | Column dependencies | `"age"` |
| `business_rule` | string | Custom validation rules | `"Must be positive"` |
| `description` | string | Human-readable description | `"User age in years"` |
### 🛠️ Creating Metadata
**Method 1: CLI Template Generation**
```bash
# Generate a metadata template from your data
funputer init -d data.csv -o metadata.csv
# Edit the generated file to add constraints
# Then analyze with enhanced metadata
funputer analyze -d data.csv -m metadata.csv
```
**Method 2: Manual CSV Creation** (save as `metadata.csv`)
```csv
column_name,data_type,nullable,min_value,max_value,max_length,allowed_values,unique_flag,dependent_column,business_rule,description
user_id,integer,false,,,50,,true,,,"Unique user identifier"
age,integer,false,0,120,,,,,Must be positive,"User age in years"
income,float,true,0,,,,,age,Higher with age,"Annual income in USD"
category,categorical,false,,,10,"A,B,C",,,,"User category classification"
email,string,true,,,255,,true,,,"User email address"
```
### 🎯 Metadata in Action
**Example 1: Nullable Constraints**
```python
from funputer.models import ColumnMetadata

# When nullable=False but data has missing values
metadata = ColumnMetadata(
    column_name="age",
    data_type="integer",
    nullable=False,
    min_value=0,
    max_value=120
)

# FunPuter will:
# - Detect nullable constraint violations
# - Recommend immediate data quality fixes
# - Lower confidence score due to constraint violations
```
**Example 2: Allowed Values**
```python
from funputer.models import ColumnMetadata

# For categorical data with specific allowed values
metadata = ColumnMetadata(
    column_name="status",
    data_type="categorical",
    allowed_values="active,inactive,pending"
)

# FunPuter will:
# - Validate all values against allowed list
# - Recommend mode imputation using only allowed values
# - Increase confidence when data respects constraints
```
**Example 3: String Length Constraints**
```python
from funputer.models import ColumnMetadata

# For string data with length limits
metadata = ColumnMetadata(
    column_name="username",
    data_type="string",
    max_length=20,
    unique_flag=True
)

# FunPuter will:
# - Check string lengths against max_length
# - Recommend imputation respecting length limits
# - Consider uniqueness requirements in recommendations
```
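To make the three constraint types concrete, the following sketch counts violations in a plain list of values. It illustrates the idea only and is not FunPuter's implementation: `find_violations` is a hypothetical helper, `None` stands in for a missing cell, and `allowed_values` is assumed to be the comma-separated string format shown in the schema.

```python
def find_violations(values, nullable=True, allowed_values=None, max_length=None):
    """Count constraint violations in a list of cell values (None = missing)."""
    allowed = set(allowed_values.split(",")) if allowed_values else None
    counts = {"nullable": 0, "allowed_values": 0, "max_length": 0}
    for value in values:
        if value is None:
            if not nullable:
                counts["nullable"] += 1
            continue  # the remaining checks only apply to present values
        text = str(value)
        if allowed is not None and text not in allowed:
            counts["allowed_values"] += 1
        if max_length is not None and len(text) > max_length:
            counts["max_length"] += 1
    return counts


status = ["active", None, "archived", "pending"]
print(find_violations(status, nullable=False,
                      allowed_values="active,inactive,pending"))
```

Here the missing entry breaks `nullable=False` and `"archived"` falls outside the allowed list, so each of those counters ends at 1.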
### 📊 Enhanced Analysis Results
```python
# `suggestions` is the list returned by an earlier analysis call
# Results now include metadata-aware recommendations
for suggestion in suggestions:
    print(f"Column: {suggestion.column_name}")
    print(f"Method: {suggestion.proposed_method}")
    print(f"Confidence: {suggestion.confidence_score:.3f}")
    print(f"Rationale: {suggestion.rationale}")

    # New: Metadata constraint information
    if suggestion.metadata_violations:
        print(f"Violations: {suggestion.metadata_violations}")

    # New: Enhanced parameters
    if suggestion.parameters:
        print(f"Parameters: {suggestion.parameters}")
```
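A common follow-up is to pull out the columns that deserve manual review. The sketch below is an illustration under stated assumptions: `Suggestion` is a hypothetical stand-in that mirrors only the attribute names used in this README, the method names in the demo data are made up, and the 0.6 cut-off is an arbitrary choice rather than a FunPuter default.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    """Hypothetical stand-in exposing attributes used in the README examples."""
    column_name: str
    proposed_method: str
    confidence_score: float
    missing_count: int


def needs_review(suggestions, min_confidence=0.6):
    """Columns with missing data and confidence below the cut-off, lowest first."""
    flagged = [s for s in suggestions
               if s.missing_count > 0 and s.confidence_score < min_confidence]
    return sorted(flagged, key=lambda s: s.confidence_score)


# Made-up demo data, not real FunPuter output.
demo = [
    Suggestion("age", "median", 0.85, 12),
    Suggestion("income", "regression", 0.45, 230),
    Suggestion("notes", "manual review", 0.30, 900),
    Suggestion("id", "none", 0.90, 0),
]
for s in needs_review(demo):
    print(f"{s.column_name}: {s.proposed_method} ({s.confidence_score:.2f})")
```

The same function should work on real results as long as the objects expose `missing_count` and `confidence_score`.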
## 🔍 Confidence-Score Heuristics
FunPuter assigns a **`confidence_score`** (range **0 – 1**) to every imputation recommendation. The value is a transparent, rule-based estimate of how reliable the proposed method is, **not** a formal statistical uncertainty. Two calculators are used:
### Base heuristic
When only column-level data is available (no full DataFrame), the score is computed as follows:
| Signal | Condition | Δ Score |
|--------|-----------|---------|
| **Starting value** | | **0.50** |
| Missing % | `< 5 %` | +0.20 |
| | `5 – 20 %` | +0.10 |
| | `> 50 %` | −0.20 |
| Mechanism | MCAR (weak evidence) | +0.10 |
| | MAR (related cols) | +0.05 |
| | MNAR/UNKNOWN | −0.10 |
| Outliers | `< 5 %` | +0.05 |
| | `> 20 %` | −0.10 |
| Metadata constraints | `allowed_values` (categorical/string) | +0.10 |
| | `max_length` (string) | +0.05 |
| Nullable constraint | `nullable=False` **with** missing | −0.15 |
| | `nullable=False` **without** missing | +0.05 |
| Data-quality checks | Strings within `max_length` | +0.05 |
| | Categorical values inside `allowed_values` | + *(valid_ratio × 0.10)* |
The final score is clipped to the **[0.10, 1.00]** interval.
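The table can be read as a simple additive rule. The sketch below re-implements it from the documented values only, so it is an approximation and not FunPuter's actual code: the function name and parameters are hypothetical, boundary handling (for example, whether exactly 5 % or 20 % falls in the middle band) is an assumption, and the final rounding is added only to keep the output readable.

```python
def base_confidence(missing_pct, mechanism, outlier_pct,
                    has_allowed_values=False, has_max_length=False,
                    nullable=True, within_max_length=False,
                    allowed_valid_ratio=None):
    """Additive score following the table above; percentages are on a 0-100 scale."""
    score = 0.50  # starting value

    # Missing %: the table lists no adjustment between 20 % and 50 %
    if missing_pct < 5:
        score += 0.20
    elif missing_pct <= 20:
        score += 0.10
    elif missing_pct > 50:
        score -= 0.20

    # Mechanism: anything other than MCAR or MAR is treated like MNAR/UNKNOWN
    score += {"MCAR": 0.10, "MAR": 0.05}.get(mechanism, -0.10)

    # Outliers: no adjustment between 5 % and 20 %
    if outlier_pct < 5:
        score += 0.05
    elif outlier_pct > 20:
        score -= 0.10

    # Metadata constraints present
    if has_allowed_values:
        score += 0.10
    if has_max_length:
        score += 0.05

    # Nullable constraint
    if not nullable:
        score += -0.15 if missing_pct > 0 else 0.05

    # Data-quality checks
    if within_max_length:
        score += 0.05
    if allowed_valid_ratio is not None:
        score += allowed_valid_ratio * 0.10

    # Clip to the documented [0.10, 1.00] interval
    return round(min(1.00, max(0.10, score)), 2)


print(base_confidence(2, "MCAR", 1))
print(base_confidence(60, "MNAR", 25, nullable=False))
```

The first call adds 0.20, 0.10, and 0.05 to the starting value; the second accumulates enough penalties to fall below zero and is clipped to the floor.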
### Adaptive variant
When the analyzer receives the full DataFrame **and** complete metadata, it builds dataset-specific thresholds using `AdaptiveThresholds` and applies `calculate_adaptive_confidence_score`:
* Adaptive missing/outlier thresholds (based on row-count, variability, etc.)
* An additional adjustment factor (−0.30 … +0.30) reflecting dataset characteristics
This yields a context-aware score that remains interpretable yet sensitive to each dataset.
### Future work
For maximum transparency and speed we use heuristics today. Future releases may include probabilistic or conformal approaches (e.g., multiple-imputation variance or ensemble uncertainty) to provide statistically grounded confidence estimates.
## 🚀 Advanced Usage
### Programmatic Metadata Creation
```python
import pandas as pd
from funputer.models import ColumnMetadata
from funputer.simple_analyzer import SimpleImputationAnalyzer

metadata = [
    ColumnMetadata(
        column_name="product_code",
        data_type="string",
        max_length=10,
        allowed_values="A1,A2,B1,B2",
        nullable=False,
        description="Product classification code"
    ),
    ColumnMetadata(
        column_name="price",
        data_type="float",
        min_value=0,
        max_value=10000,
        business_rule="Must be non-negative"
    )
]

# Analyze with custom metadata
data = pd.read_csv("products.csv")
analyzer = SimpleImputationAnalyzer()
results = analyzer.analyze_dataframe(data, metadata)
```
### CLI Usage with Enhanced Metadata & PREFLIGHT
```bash
# PREFLIGHT runs automatically before init/analyze
funputer init -d products.csv -o products_metadata.csv
# 🔍 Preflight Check: ✅ OK - File validated, ready for processing

# Edit products_metadata.csv to add constraints, then:
funputer analyze -d products.csv -m products_metadata.csv -o results.csv
# 🔍 Preflight Check: ✅ OK - Recommendation: Analyze Now

# Run standalone preflight validation
funputer preflight -d products.csv --json-out validation_report.json

# Disable preflight if needed (not recommended)
export FUNPUTER_PREFLIGHT=off
funputer analyze -d products.csv

# Results are automatically saved in CSV format for easy viewing
```
## 📋 Requirements
- **Python**: 3.9 or higher
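- **Dependencies**: pandas (>=1.5.0), numpy (>=1.21.0), scipy (>=1.9.0), pyyaml (>=6.0), click (>=8.0.0), pydantic (>=2.0.0), requests (>=2.20.0), jsonschema (>=4.0.0)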
## 🔧 Installation from Source
```bash
git clone https://github.com/RajeshRamachander/funputer.git
cd funputer
pip install -e .
```
## 📚 Comprehensive Examples
FunPuter comes with extensive real-world examples covering every feature:
### 🎯 **Quick Start Examples**
- **[quick_start_guide.py](examples/quick_start_guide.py)** - Get started in 5 minutes with common patterns
- **[comprehensive_usage_guide.py](examples/comprehensive_usage_guide.py)** - Every feature demonstrated
- **[cli_examples.sh](examples/cli_examples.sh)** - Complete CLI usage guide
### 🏭 **Industry Examples**
- **[real_world_examples.py](examples/real_world_examples.py)** - Production scenarios across industries:
  - 🛒 **E-commerce Customer Analytics** - Customer behavior, churn prediction
  - 🏥 **Healthcare Patient Records** - Clinical data with regulatory constraints
  - 💰 **Financial Risk Assessment** - Credit scoring, loan applications
  - 📢 **Marketing Campaign Analysis** - ROI optimization, A/B testing
  - 🌡️ **IoT Sensor Data** - Time series, equipment monitoring
### 📊 **Usage Patterns**
**Auto-Inference (Zero Configuration)**
```python
import funputer

# Perfect for data exploration and prototyping
suggestions = funputer.analyze_imputation_requirements("mystery_data.csv")
```
**Production Mode (Full Control)**
```python
# Enterprise-grade with constraint validation
import funputer
import pandas as pd
from funputer.models import ColumnMetadata, AnalysisConfig

# Load the data to analyze
df = pd.read_csv("your_data.csv")

metadata = [
    ColumnMetadata('customer_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('age', 'integer', min_value=18, max_value=100),
    ColumnMetadata('income', 'float', dependent_column='age',
                   business_rule='Income correlates with age'),
    ColumnMetadata('category', 'categorical', allowed_values='A,B,C,D')
]

config = AnalysisConfig(missing_percentage_threshold=0.25, skip_columns=['id'])
suggestions = funputer.analyze_dataframe(df, metadata, config)
```
**CLI Automation**
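```bash
# Batch processing workflow
# Note: && runs the analysis only when preflight exits with 0, so files
# that pass with warnings (exit code 2) are skipped as well
for file in data/*.csv; do
    funputer preflight -d "$file" && \
    funputer analyze -d "$file" --output "results/$(basename "$file" .csv)_plan.csv"
done
```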
### 🎓 **Learning Path**
1. **Start Here**: `quick_start_guide.py` - Master the basics in 5 minutes
2. **Go Deeper**: `comprehensive_usage_guide.py` - Learn every feature
3. **Real World**: `real_world_examples.py` - See industry applications
4. **CLI Mastery**: `cli_examples.sh` - Automate your workflows
5. **Production**: Use the patterns in your specific domain
### 💡 **Pro Tips**
- **Exploration**: Use auto-inference for quick insights
- **Production**: Always use explicit metadata with constraints
- **Automation**: CLI is perfect for CI/CD and batch processing
- **Validation**: Run preflight checks before expensive analysis
- **Performance**: Skip unnecessary columns, tune thresholds appropriately
## 📚 Documentation
- **Examples Directory**: [examples/](examples/) - Comprehensive usage examples
- **API Reference**: See docstrings and type hints in the code
- **Changelog**: [CHANGELOG.md](CHANGELOG.md) - Version history and features
## 🤝 Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
## 📄 License
MIT License - see [LICENSE](LICENSE) file for details.
---
**Focus**: Get intelligent imputation recommendations with enhanced metadata support, not complex infrastructure.