funputer


- **Name**: funputer
- **Version**: 1.5.2
- **Summary**: Intelligent imputation analysis with automatic data validation and metadata inference
- **Upload time**: 2025-08-19 17:21:23
- **Requires Python**: >=3.9
- **Keywords**: imputation, missing-data, data-science, machine-learning, pandas, auto-inference, metadata, preflight, validation

# FunPuter - Intelligent Imputation Analysis

[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![PyPI](https://img.shields.io/pypi/v/funputer.svg)](https://pypi.org/project/funputer/)
![License: Proprietary](https://img.shields.io/badge/License-Proprietary-red.svg)
[![Test Coverage](https://img.shields.io/badge/coverage-84%25-brightgreen.svg)](#documentation)

**Intelligent imputation analysis with automatic data validation and metadata inference**

FunPuter analyzes your data and recommends imputation methods based on data patterns, missingness mechanisms, and metadata constraints. Each suggestion comes with a confidence score and a rationale, so you can judge how far to trust it before filling in missing values.

## 🚀 Quick Start

### Installation

```bash
pip install funputer
```

### 30-Second Example

**Auto-Inference Mode** (Zero Configuration)
```python
import funputer

# Point to your CSV - FunPuter figures out everything automatically
suggestions = funputer.analyze_imputation_requirements("your_data.csv")

# Get intelligent suggestions with confidence scores
for suggestion in suggestions:
    if suggestion.missing_count > 0:
        print(f"📊 {suggestion.column_name}: {suggestion.proposed_method}")
        print(f"   Confidence: {suggestion.confidence_score:.3f}")
        print(f"   Reason: {suggestion.rationale}")
        print(f"   Missing: {suggestion.missing_count} ({suggestion.missing_percentage:.1f}%)")
```
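
Confidence scores are most useful as a triage signal. The sketch below flags columns whose suggestion falls under a chosen threshold so they can be reviewed by hand. It is an illustration only: `Suggestion` is a hypothetical stand-in that mirrors the attributes used in the example above, the method names in the sample data are placeholders, and the `0.7` cutoff is an arbitrary assumption rather than a FunPuter default.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    """Hypothetical stand-in exposing the attributes used in the example above."""
    column_name: str
    proposed_method: str
    confidence_score: float
    missing_count: int


def needs_review(suggestions, min_confidence=0.7):
    """Return names of columns that have missing values and a low-confidence suggestion."""
    return [
        s.column_name
        for s in suggestions
        if s.missing_count > 0 and s.confidence_score < min_confidence
    ]


# Placeholder data; real suggestions would come from funputer.
sample = [
    Suggestion("age", "method_a", 0.85, 12),
    Suggestion("income", "method_b", 0.55, 40),
    Suggestion("customer_id", "method_c", 0.95, 0),
]
flagged = needs_review(sample)
print(flagged)
```

The same function should work on the list returned by `funputer.analyze_imputation_requirements`, provided the returned objects expose these attributes as the example above suggests.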

**Production Mode** (Full Control)
```python
import pandas as pd

import funputer
from funputer.models import ColumnMetadata

# Load the data to analyze (the file name is a placeholder)
your_dataframe = pd.read_csv("your_data.csv")

# Define your data structure with constraints
metadata = [
    ColumnMetadata('customer_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('age', 'integer', min_value=18, max_value=100),
    ColumnMetadata('income', 'float', min_value=0),
    ColumnMetadata('category', 'categorical', allowed_values='A,B,C'),
]

# Get suggestions that respect the constraints above
suggestions = funputer.analyze_dataframe(your_dataframe, metadata)
```

## 🎯 Key Features

- **🤖 Automatic Metadata Inference** - Intelligent data type and constraint detection
- **📊 Missing Data Analysis** - MCAR, MAR, MNAR mechanism detection  
- **⚡ Data Validation** - Real-time constraint checking and validation
- **🎯 Smart Recommendations** - Context-aware imputation method suggestions
- **📈 Confidence Scoring** - Transparent reliability estimates for each recommendation
- **🛡️ Pre-flight Checks** - Comprehensive data validation before analysis
- **💻 CLI & Python API** - Flexible usage via command line or programmatic access
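
To make the MCAR/MAR/MNAR distinction concrete, here is a minimal pandas sketch of one common heuristic: correlate a column's missingness indicator with another observed column. A strong correlation hints that the data are not missing completely at random. This is not FunPuter's detection logic, which the README does not describe; the data and column names are made up for illustration.

```python
import pandas as pd

# Toy data: income is missing only for the older rows.
df = pd.DataFrame({
    "age": [22, 25, 31, 38, 45, 52, 60, 67],
    "income": [30.0, 32.0, 41.0, 48.0, None, None, None, None],
})

# 1 where income is missing, 0 where it is observed.
income_missing = df["income"].isna().astype(int)

# Pearson correlation between the missingness indicator and age.
corr = income_missing.corr(df["age"])
print(f"corr(missing(income), age) = {corr:.2f}")
```

A correlation near zero would be consistent with MCAR, while a large value like the one here points toward MAR (missingness depends on an observed column). MNAR cannot be confirmed from observed data alone.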

## 📊 Data Validation System

Comprehensive validation runs automatically to prevent crashes and guide your workflow:

- **File validation**: Format detection, encoding, accessibility
- **Structure validation**: Column analysis, data type inference  
- **Memory estimation**: Resource usage prediction
- **Advisory recommendations**: Guided workflow suggestions

**Independent Usage:**
```bash
# Basic validation check
funputer preflight -d your_data.csv

# With custom options  
funputer preflight -d data.csv --sample-rows 5000 --encoding utf-8

# JSON report output
funputer preflight -d data.csv --json-out report.json
```

**Exit Codes:**
- `0`: Ready for analysis
- `2`: OK with warnings (can proceed)
- `10`: Hard error (cannot proceed)
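
The exit codes make it straightforward to gate a pipeline on the preflight result. The sketch below wraps the `funputer preflight -d` command shown above with `subprocess` and maps the documented codes to labels. The function names and label strings are assumptions made for this example, and any undocumented exit code is treated as `"unknown"`.

```python
import subprocess

# Exit codes as documented above.
PREFLIGHT_STATUS = {
    0: "ready",
    2: "warnings",
    10: "error",
}


def interpret_exit_code(code: int) -> str:
    """Map a preflight exit code to a short status label."""
    return PREFLIGHT_STATUS.get(code, "unknown")


def can_proceed(status: str) -> bool:
    """Analysis may continue when preflight reports ready or warnings."""
    return status in ("ready", "warnings")


def run_preflight(data_path: str) -> str:
    """Run `funputer preflight -d <data_path>` and return its status label."""
    result = subprocess.run(["funputer", "preflight", "-d", data_path])
    return interpret_exit_code(result.returncode)
```

A caller might write `if can_proceed(run_preflight("data.csv")): ...` and only then invoke `funputer analyze`.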

## 💻 Command Line Interface

```bash
# Generate metadata template from your data
funputer init -d data.csv -o metadata.csv

# Analyze with auto-inference  
funputer analyze -d data.csv

# Analyze with custom metadata
funputer analyze -d data.csv -m metadata.csv --verbose

# Run a data quality check before analysis
funputer preflight -d data.csv
```

## 📚 Usage Examples

### Basic Analysis

```python
import funputer

# Simple analysis with auto-inference
suggestions = funputer.analyze_imputation_requirements("sales_data.csv")

# Display recommendations
for suggestion in suggestions:
    print(f"Column: {suggestion.column_name}")
    print(f"Method: {suggestion.proposed_method}")  
    print(f"Confidence: {suggestion.confidence_score:.3f}")
    print(f"Missing: {suggestion.missing_count} values")
    print()
```

### Advanced Configuration

```python
import pandas as pd

from funputer.models import ColumnMetadata, AnalysisConfig
from funputer.analyzer import ImputationAnalyzer

# Load the data to analyze (the file name is a placeholder)
df = pd.read_csv("your_data.csv")

# Custom metadata with business rules
metadata = [
    ColumnMetadata('product_id', 'string', unique_flag=True, max_length=10),
    ColumnMetadata('price', 'float', min_value=0, max_value=10000),
    ColumnMetadata('category', 'categorical', allowed_values='Electronics,Books,Clothing'),
    ColumnMetadata('rating', 'float', min_value=1.0, max_value=5.0),
]

# Custom analysis configuration
config = AnalysisConfig(
    missing_percentage_threshold=0.3,  # 30% threshold
    skip_columns=['internal_id'],
    outlier_threshold=0.1
)

# Run analysis
analyzer = ImputationAnalyzer(config)
suggestions = analyzer.analyze_dataframe(df, metadata)
```

### Industry-Specific Examples

**E-commerce Analytics**
```python
import funputer
from funputer.models import ColumnMetadata

# customer_df is a pandas DataFrame you have already loaded
metadata = [
    ColumnMetadata('customer_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('age', 'integer', min_value=13, max_value=120),
    ColumnMetadata('purchase_amount', 'float', min_value=0),
    ColumnMetadata('customer_segment', 'categorical', allowed_values='Premium,Standard,Basic'),
]
suggestions = funputer.analyze_dataframe(customer_df, metadata)
```

**Healthcare Data**  
```python
import funputer
from funputer.models import ColumnMetadata, AnalysisConfig

# patient_df is a pandas DataFrame you have already loaded
metadata = [
    ColumnMetadata('patient_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('age', 'integer', min_value=0, max_value=150),
    ColumnMetadata('blood_pressure', 'integer', min_value=50, max_value=300),
    ColumnMetadata('diagnosis', 'categorical', nullable=False),
]
config = AnalysisConfig(missing_threshold=0.05)  # Low tolerance for healthcare
suggestions = funputer.analyze_dataframe(patient_df, metadata, config)
```

**Financial Risk Assessment**
```python  
import funputer
from funputer.models import ColumnMetadata, AnalysisConfig

# loan_df is a pandas DataFrame you have already loaded
metadata = [
    ColumnMetadata('application_id', 'integer', unique_flag=True, nullable=False),
    ColumnMetadata('credit_score', 'integer', min_value=300, max_value=850),
    ColumnMetadata('debt_to_income', 'float', min_value=0.0, max_value=10.0),
    ColumnMetadata('loan_purpose', 'categorical', allowed_values='home,auto,personal,business'),
]
# Skip sensitive columns
config = AnalysisConfig(skip_columns=['ssn', 'account_number'])
suggestions = funputer.analyze_dataframe(loan_df, metadata, config)
```
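
To see what constraints such as `min_value`, `max_value`, and the comma-separated `allowed_values` string express, the following plain-pandas sketch counts values that fall outside them. It is a hypothetical helper written for this illustration and is not how FunPuter validates data internally; `constraint_violations` and the sample frame are assumptions.

```python
import pandas as pd


def constraint_violations(series, min_value=None, max_value=None, allowed_values=None):
    """Count non-missing values that break the given constraints."""
    values = series.dropna()
    violations = pd.Series(False, index=values.index)
    if min_value is not None:
        violations |= values < min_value
    if max_value is not None:
        violations |= values > max_value
    if allowed_values is not None:
        # allowed_values uses the same comma-separated form as the metadata above
        allowed = {v.strip() for v in allowed_values.split(",")}
        violations |= ~values.isin(allowed)
    return int(violations.sum())


# Made-up sample with one missing value per column.
loan_df = pd.DataFrame({
    "credit_score": [720, 250, None, 900],
    "loan_purpose": ["home", "boat", None, "auto"],
})

score_issues = constraint_violations(loan_df["credit_score"], min_value=300, max_value=850)
purpose_issues = constraint_violations(
    loan_df["loan_purpose"], allowed_values="home,auto,personal,business"
)
print(score_issues, purpose_issues)
```

Missing values are deliberately excluded from the count, since they are what imputation is meant to address; only observed values can violate a range or category constraint.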

## ⚙️ Requirements

- **Python**: 3.9 or higher
- **Dependencies**: pandas, numpy, scipy, pydantic, click, pyyaml

## 🔧 Installation from Source

```bash
git clone https://github.com/RajeshRamachander/funputer.git
cd funputer
pip install -e .
```

## 📚 Documentation

- **API Reference**: Complete docstrings and type hints throughout the codebase
- **Examples**: See usage examples above and in the codebase
- **Test Coverage**: 84% coverage with comprehensive test suite

## 📄 License  

Proprietary License - Source code is available for inspection but not for derivative works.

---

**Focus**: Get intelligent imputation recommendations, not complex infrastructure.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "funputer",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "imputation, missing-data, data-science, machine-learning, pandas, auto-inference, metadata, preflight, validation",
    "author": null,
    "author_email": "Rajesh Ramachander <rajeshr.technocraft@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/e3/87/c80b17641a9bcdf0aa3a6c8617052b32d106cf762f657038bf795ddc7191/funputer-1.5.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Intelligent imputation analysis with automatic data validation and metadata inference",
    "version": "1.5.2",
    "project_urls": {
        "Documentation": "https://pypi.org/project/funputer/",
        "Homepage": "https://pypi.org/project/funputer/"
    },
    "split_keywords": [
        "imputation",
        " missing-data",
        " data-science",
        " machine-learning",
        " pandas",
        " auto-inference",
        " metadata",
        " preflight",
        " validation"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "34635dede034e64458361783c88c1b5a17449b19c7dad599c6a9bdb6f229139e",
                "md5": "81af9506e41f612724e05c8430e2fbaa",
                "sha256": "3c857a56e2a21c4886417c616f8f0d5584b73900a5278d54410dff6d8220c97d"
            },
            "downloads": -1,
            "filename": "funputer-1.5.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "81af9506e41f612724e05c8430e2fbaa",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 36251,
            "upload_time": "2025-08-19T17:21:22",
            "upload_time_iso_8601": "2025-08-19T17:21:22.310760Z",
            "url": "https://files.pythonhosted.org/packages/34/63/5dede034e64458361783c88c1b5a17449b19c7dad599c6a9bdb6f229139e/funputer-1.5.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "e387c80b17641a9bcdf0aa3a6c8617052b32d106cf762f657038bf795ddc7191",
                "md5": "2ed0611a20a8447f91233b9f1ae6204b",
                "sha256": "ba99a039e9e8e6984501918dabe8ee574856bcccdd594e5b0a0ce70bf0b4b6d8"
            },
            "downloads": -1,
            "filename": "funputer-1.5.2.tar.gz",
            "has_sig": false,
            "md5_digest": "2ed0611a20a8447f91233b9f1ae6204b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 107101,
            "upload_time": "2025-08-19T17:21:23",
            "upload_time_iso_8601": "2025-08-19T17:21:23.603747Z",
            "url": "https://files.pythonhosted.org/packages/e3/87/c80b17641a9bcdf0aa3a6c8617052b32d106cf762f657038bf795ddc7191/funputer-1.5.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-19 17:21:23",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "funputer"
}
        