# edaflow
A Python package for streamlined exploratory data analysis workflows.
## Description
`edaflow` is designed to simplify and accelerate the exploratory data analysis (EDA) process by providing a collection of tools and utilities for data scientists and analysts. The package integrates popular data science libraries to create a cohesive workflow for data exploration, visualization, and preprocessing.
## Features
- **Missing Data Analysis**: Color-coded analysis of null values with customizable thresholds
- **Categorical Data Insights**: Identify object columns that might be numeric, detect data type issues
- **Automatic Data Type Conversion**: Smart conversion of object columns to numeric when appropriate
- **Categorical Values Visualization**: Detailed exploration of categorical column values with insights
- **Column Type Classification**: Simple categorization of DataFrame columns into categorical and numerical types
- **Data Imputation**: Smart missing value imputation using median for numerical and mode for categorical columns
- **Data Type Detection**: Smart analysis to flag potential data conversion needs
- **Styled Output**: Beautiful, color-coded results for Jupyter notebooks and terminals
- **Easy Integration**: Works seamlessly with pandas, numpy, and other popular libraries
## Installation
### From PyPI
```bash
pip install edaflow
```
### From Source
```bash
git clone https://github.com/evanlow/edaflow.git
cd edaflow
pip install -e .
```
### Development Installation
```bash
git clone https://github.com/evanlow/edaflow.git
cd edaflow
pip install -e ".[dev]"
```
## Requirements
- Python 3.8+
- pandas >= 1.5.0
- numpy >= 1.21.0
- matplotlib >= 3.5.0
- seaborn >= 0.11.0
- scipy >= 1.7.0
- missingno >= 0.5.0
## Quick Start
```python
import edaflow
# Test the installation
print(edaflow.hello())
# Check null values in your dataset
import pandas as pd
df = pd.read_csv('your_data.csv')
# Analyze missing data with styled output
null_analysis = edaflow.check_null_columns(df, threshold=10)
print(null_analysis)
# Analyze categorical columns to identify data type issues
edaflow.analyze_categorical_columns(df, threshold=35)
# Convert appropriate object columns to numeric automatically
df_cleaned = edaflow.convert_to_numeric(df, threshold=35)
print("Data types after conversion:", df_cleaned.dtypes)
```
## Usage Examples
### Basic Usage
```python
import edaflow
# Verify installation
message = edaflow.hello()
print(message) # Output: "Hello from edaflow! Ready for exploratory data analysis."
```
### Missing Data Analysis with `check_null_columns`
The `check_null_columns` function provides a color-coded analysis of missing data in your DataFrame:
```python
import pandas as pd
import edaflow
# Create sample data with missing values
df = pd.DataFrame({
'customer_id': [1, 2, 3, 4, 5],
'name': ['Alice', 'Bob', None, 'Diana', 'Eve'],
'age': [25, None, 35, None, 45],
'email': [None, None, None, None, None], # All missing
'purchase_amount': [100.5, 250.0, 75.25, None, 320.0]
})
# Analyze missing data with default threshold (10%)
styled_result = edaflow.check_null_columns(df)
styled_result # Display in Jupyter notebook for color-coded styling
# Use custom threshold (20%) to change color coding sensitivity
styled_result = edaflow.check_null_columns(df, threshold=20)
styled_result
# Access underlying data if needed
data = styled_result.data
print(data)
```
**Color Coding:**
- 🔴 **Red**: > 20% missing (high concern)
- 🟡 **Yellow**: 10-20% missing (medium concern)
- 🟨 **Light Yellow**: 1-10% missing (low concern)
- ⬜ **Gray**: 0% missing (no issues)
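The banding above can be sketched in plain pandas. This is a minimal illustration of the thresholds described, not the actual styling logic inside `check_null_columns`, which may differ:

```python
import pandas as pd

df = pd.DataFrame({
    'customer_id': [1, 2, 3, 4, 5],          # 0% missing
    'age': [25, None, 35, None, 45],          # 40% missing
    'email': [None, None, None, None, None],  # 100% missing
})

# Percentage of missing values per column
null_pct = df.isnull().mean() * 100

def band(pct: float) -> str:
    """Map a missing-value percentage to the concern bands listed above."""
    if pct > 20:
        return 'red'
    if pct > 10:
        return 'yellow'
    if pct > 0:
        return 'light yellow'
    return 'gray'

bands = null_pct.map(band)
print(bands)
```

The same banding logic, with the cutoffs shifted by the `threshold` parameter, is what drives the color-coded table.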
### Categorical Data Analysis with `analyze_categorical_columns`
The `analyze_categorical_columns` function helps identify data type issues and provides insights into object-type columns:
```python
import pandas as pd
import edaflow
# Create sample data with mixed categorical types
df = pd.DataFrame({
'product_name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
'price_str': ['999', '25', '75', '450'], # Numbers stored as strings
'category': ['Electronics', 'Accessories', 'Accessories', 'Electronics'],
'rating': [4.5, 3.8, 4.2, 4.7], # Already numeric
'mixed_ids': ['001', '002', 'ABC', '004'], # Mixed format
'status': ['active', 'inactive', 'active', 'pending']
})
# Analyze categorical columns with default threshold (35%)
edaflow.analyze_categorical_columns(df)
# Use custom threshold (50%) to be more lenient about mixed data
edaflow.analyze_categorical_columns(df, threshold=50)
```
**Output Interpretation:**
- 🔴🔵 **Highlighted in Red/Blue**: Potentially numeric columns that might need conversion
- 🟡⚫ **Highlighted in Yellow/Black**: Shows unique values for potential numeric columns
- **Regular text**: Truly categorical columns with statistics
- **"not an object column"**: Already properly typed numeric columns
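The "potentially numeric" check can be approximated with `pd.to_numeric(errors='coerce')`: coerce each object column and measure how many values fail to parse. This is a sketch of the idea, not necessarily the exact rule `analyze_categorical_columns` applies:

```python
import pandas as pd

df = pd.DataFrame({
    'price_str': ['999', '25', '75', '450'],     # all parse -> numeric candidate
    'mixed_ids': ['001', '002', 'ABC', '004'],   # 25% fail -> still a candidate at 35%
    'status': ['active', 'inactive', 'active', 'pending'],  # none parse -> categorical
})
threshold = 35  # max % of non-numeric values to still flag a column

flags = {}
for col in df.select_dtypes(include='object'):
    # Values that cannot be parsed become NaN under errors='coerce'
    non_numeric_pct = pd.to_numeric(df[col], errors='coerce').isna().mean() * 100
    flags[col] = non_numeric_pct < threshold  # True -> flagged as potentially numeric

print(flags)
```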
### Data Type Conversion with `convert_to_numeric`
After analyzing your categorical columns, you can automatically convert appropriate columns to numeric:
```python
import pandas as pd
import edaflow
# Create sample data with string numbers
df = pd.DataFrame({
'product_name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
'price_str': ['999', '25', '75', '450'], # Should convert
'mixed_ids': ['001', '002', 'ABC', '004'], # Mixed data
'category': ['Electronics', 'Accessories', 'Electronics', 'Electronics']
})
# Convert appropriate columns to numeric (threshold=35% by default)
df_converted = edaflow.convert_to_numeric(df, threshold=35)
# Or modify the original DataFrame in place
edaflow.convert_to_numeric(df, threshold=35, inplace=True)
# Use a stricter threshold (only convert if <20% non-numeric values)
df_strict = edaflow.convert_to_numeric(df, threshold=20)
```
**Function Features:**
- ✅ **Smart Detection**: Only converts columns with few non-numeric values
- ✅ **Customizable Threshold**: Control conversion sensitivity
- ✅ **Safe Conversion**: Non-numeric values become NaN (not errors)
- ✅ **Inplace Option**: Modify original DataFrame or create new one
- ✅ **Detailed Output**: Shows exactly what was converted and why
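The safe-conversion behavior can be reproduced with pandas alone, as a rough stand-in for what `convert_to_numeric` does under the hood (the package's exact rules may differ):

```python
import pandas as pd

df = pd.DataFrame({
    'price_str': ['999', '25', '75', '450'],
    'mixed_ids': ['001', '002', 'ABC', '004'],
    'category': ['Electronics', 'Accessories', 'Electronics', 'Electronics'],
})
threshold = 35  # max % of unparseable values allowed for conversion

df_converted = df.copy()
for col in df.select_dtypes(include='object'):
    as_num = pd.to_numeric(df[col], errors='coerce')
    # Convert only when few values fail to parse; the failures become NaN
    if as_num.isna().mean() * 100 < threshold:
        df_converted[col] = as_num

print(df_converted.dtypes)
```

Here `price_str` converts cleanly, `mixed_ids` converts with `'ABC'` becoming NaN, and `category` is left untouched.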
### Categorical Data Visualization with `visualize_categorical_values`
After cleaning your data, explore categorical columns in detail to understand value distributions:
```python
import pandas as pd
import edaflow
# Example DataFrame with categorical data
df = pd.DataFrame({
'department': ['Sales', 'Marketing', 'Sales', 'HR', 'Marketing', 'Sales', 'IT'],
'status': ['Active', 'Inactive', 'Active', 'Pending', 'Active', 'Active', 'Inactive'],
'priority': ['High', 'Medium', 'High', 'Low', 'Medium', 'High', 'Low'],
'employee_id': [1001, 1002, 1003, 1004, 1005, 1006, 1007], # Numeric (ignored)
'salary': [50000, 60000, 55000, 45000, 58000, 62000, 70000] # Numeric (ignored)
})
# Visualize all categorical columns
edaflow.visualize_categorical_values(df)
```
**Advanced Usage Examples:**
```python
# Handle high-cardinality data (many unique values)
large_df = pd.DataFrame({
'product_id': [f'PROD_{i:04d}' for i in range(100)], # 100 unique values
'category': ['Electronics'] * 40 + ['Clothing'] * 35 + ['Books'] * 25,
'status': ['Available'] * 80 + ['Out of Stock'] * 15 + ['Discontinued'] * 5
})
# Limit display for high-cardinality columns
edaflow.visualize_categorical_values(large_df, max_unique_values=5)
```
```python
# DataFrame with missing values for comprehensive analysis
df_with_nulls = pd.DataFrame({
'region': ['North', 'South', None, 'East', 'West', 'North', None],
'customer_type': ['Premium', 'Standard', 'Premium', None, 'Standard', 'Premium', 'Standard'],
'transaction_id': [f'TXN_{i}' for i in range(7)], # Mostly unique (ID-like)
})
# Get detailed insights including missing value analysis
edaflow.visualize_categorical_values(df_with_nulls)
```
**Function Features:**
- 🎯 **Smart Column Detection**: Automatically finds categorical (object-type) columns
- 📊 **Value Distribution**: Shows counts and percentages for each unique value
- 🔍 **Missing Value Analysis**: Tracks and reports NaN/missing values
- ⚡ **High-Cardinality Handling**: Truncates display for columns with many unique values
- 💡 **Actionable Insights**: Identifies ID-like columns and provides data quality recommendations
- 🎨 **Color-Coded Output**: Easy-to-read formatted results with highlighting
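The core count-and-percentage view is essentially `value_counts` on each object column; a minimal sketch (without the package's formatting and insights):

```python
import pandas as pd

df = pd.DataFrame({
    'department': ['Sales', 'Marketing', 'Sales', 'HR', 'Marketing', 'Sales', 'IT'],
})

# Counts and percentages per unique value, including missing values
counts = df['department'].value_counts(dropna=False)
pcts = df['department'].value_counts(normalize=True, dropna=False) * 100

for value, n in counts.items():
    print(f"{value}: {n} ({pcts[value]:.1f}%)")
```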
### Column Type Classification with `display_column_types`
The `display_column_types` function provides a simple way to categorize DataFrame columns into categorical and numerical types:
```python
import pandas as pd
import edaflow
# Create sample data with mixed types
data = {
'name': ['Alice', 'Bob', 'Charlie'],
'age': [25, 30, 35],
'city': ['NYC', 'LA', 'Chicago'],
'salary': [50000, 60000, 70000],
'is_active': [True, False, True]
}
df = pd.DataFrame(data)
# Display column type classification
result = edaflow.display_column_types(df)
# Access the categorized column lists
categorical_cols = result['categorical'] # ['name', 'city']
numerical_cols = result['numerical'] # ['age', 'salary', 'is_active']
```
**Example Output:**
```
📊 Column Type Analysis
==================================================

📝 Categorical Columns (2 total):
   1. name (unique values: 3)
   2. city (unique values: 3)

🔢 Numerical Columns (3 total):
   1. age (dtype: int64)
   2. salary (dtype: int64)
   3. is_active (dtype: bool)

📈 Summary:
   Total columns: 5
   Categorical: 2 (40.0%)
   Numerical: 3 (60.0%)
```
**Function Features:**
- 🔍 **Simple Classification**: Separates columns into categorical (object dtype) and numerical (all other dtypes)
- 📊 **Detailed Information**: Shows unique value counts for categorical columns and data types for numerical columns
- 📈 **Summary Statistics**: Provides percentage breakdown of column types
- 🎯 **Return Values**: Returns dictionary with categorized column lists for programmatic use
- ⚡ **Fast Processing**: Efficient classification based on pandas data types
- 🛡️ **Error Handling**: Validates input and handles edge cases like empty DataFrames
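Because the split is object dtype vs. everything else, the classification itself reduces to a `select_dtypes` call. A sketch of the equivalent logic (the function adds validation and the printed report on top):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['NYC', 'LA', 'Chicago'],
    'is_active': [True, False, True],
})

# Object-dtype columns are treated as categorical; everything else is numerical
categorical = df.select_dtypes(include='object').columns.tolist()
numerical = [c for c in df.columns if c not in categorical]
result = {'categorical': categorical, 'numerical': numerical}

print(result)
```

Note that `bool` columns land in the numerical bucket, matching the example output above.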
### Data Imputation with `impute_numerical_median` and `impute_categorical_mode`
After analyzing your data, you often need to handle missing values. The edaflow package provides two specialized imputation functions for this purpose:
#### Numerical Imputation with `impute_numerical_median`
The `impute_numerical_median` function fills missing values in numerical columns using the median value:
```python
import pandas as pd
import edaflow
# Create sample data with missing numerical values
df = pd.DataFrame({
'age': [25, None, 35, None, 45],
'salary': [50000, 60000, None, 70000, None],
'score': [85.5, None, 92.0, 88.5, None],
'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve']
})
# Impute all numerical columns with median values
df_imputed = edaflow.impute_numerical_median(df)
# Impute specific columns only
df_imputed = edaflow.impute_numerical_median(df, columns=['age', 'salary'])
# Impute in place (modifies original DataFrame)
edaflow.impute_numerical_median(df, inplace=True)
```
**Function Features:**
- 🔢 **Smart Detection**: Automatically identifies numerical columns (int, float, etc.)
- 📊 **Median Imputation**: Uses median values, which are robust to outliers
- 🎯 **Selective Imputation**: Option to specify which columns to impute
- 🔄 **Inplace Option**: Modify original DataFrame or create new one
- 🛡️ **Safe Handling**: Gracefully handles edge cases like all-missing columns
- 📋 **Detailed Reporting**: Shows exactly what was imputed and summary statistics
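Median imputation itself is a one-liner in pandas; this sketch shows the core fill step without the package's reporting:

```python
import pandas as pd

df = pd.DataFrame({
    'age': [25, None, 35, None, 45],
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
})

df_imputed = df.copy()
# Fill each numerical column's NaNs with that column's median
num_cols = df_imputed.select_dtypes(include='number').columns
df_imputed[num_cols] = df_imputed[num_cols].fillna(df_imputed[num_cols].median())

print(df_imputed)
```

The median of `[25, 35, 45]` is 35, so both missing `age` values become 35.0.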
#### Categorical Imputation with `impute_categorical_mode`
The `impute_categorical_mode` function fills missing values in categorical columns using the mode (most frequent value):
```python
import pandas as pd
import edaflow
# Create sample data with missing categorical values
df = pd.DataFrame({
'category': ['A', 'B', 'A', None, 'A'],
'status': ['Active', None, 'Active', 'Inactive', None],
'priority': ['High', 'Medium', None, 'Low', 'High'],
'age': [25, 30, 35, 40, 45]
})
# Impute all categorical columns with mode values
df_imputed = edaflow.impute_categorical_mode(df)
# Impute specific columns only
df_imputed = edaflow.impute_categorical_mode(df, columns=['category', 'status'])
# Impute in place (modifies original DataFrame)
edaflow.impute_categorical_mode(df, inplace=True)
```
**Function Features:**
- 🔍 **Smart Detection**: Automatically identifies categorical (object) columns
- 🎯 **Mode Imputation**: Uses the most frequent value for each column
- ⚖️ **Tie Handling**: Gracefully handles mode ties (multiple values with the same frequency)
- 🔄 **Inplace Option**: Modify original DataFrame or create new one
- 🛡️ **Safe Handling**: Gracefully handles edge cases like all-missing columns
- 📋 **Detailed Reporting**: Shows exactly what was imputed and mode tie warnings
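The fill step can be sketched with `Series.mode()`; note that `mode()` returns *all* tied values sorted, so taking the first entry is one deterministic tie-breaking choice (the package's own tie handling may differ):

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'A', None, 'A'],
    'status': ['Active', None, 'Active', 'Inactive', None],
})

df_imputed = df.copy()
for col in df_imputed.select_dtypes(include='object'):
    modes = df_imputed[col].mode()  # excludes NaN; ties return multiple values
    if not modes.empty:             # skip all-missing columns
        df_imputed[col] = df_imputed[col].fillna(modes.iloc[0])

print(df_imputed)
```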
#### Complete Imputation Workflow Example
```python
import pandas as pd
import edaflow
# Sample data with both numerical and categorical missing values
df = pd.DataFrame({
'age': [25, None, 35, None, 45],
'salary': [50000, None, 70000, 80000, None],
'category': ['A', 'B', None, 'A', None],
'status': ['Active', None, 'Active', 'Inactive', None],
'score': [85.5, 92.0, None, 88.5, None]
})
print("Original DataFrame:")
print(df)
print("\n" + "="*50)
# Step 1: Impute numerical columns
print("STEP 1: Numerical Imputation")
df_step1 = edaflow.impute_numerical_median(df)
# Step 2: Impute categorical columns
print("\nSTEP 2: Categorical Imputation")
df_final = edaflow.impute_categorical_mode(df_step1)
print("\nFinal DataFrame (all missing values imputed):")
print(df_final)
# Verify no missing values remain
print(f"\nMissing values remaining: {df_final.isnull().sum().sum()}")
```
**Expected Output:**
```
🔢 Numerical Missing Value Imputation (Median)
=======================================================
📊 age - Imputed 2 values with median: 35.0
📊 salary - Imputed 2 values with median: 70000.0
📊 score - Imputed 2 values with median: 88.5

📋 Imputation Summary:
   Columns processed: 3
   Columns imputed: 3
   Total values imputed: 6

📝 Categorical Missing Value Imputation (Mode)
=======================================================
📊 category - Imputed 2 values with mode: 'A'
📊 status - Imputed 2 values with mode: 'Active'

📋 Imputation Summary:
   Columns processed: 2
   Columns imputed: 2
   Total values imputed: 4
```
### Complete EDA Workflow Example
```python
import pandas as pd
import edaflow
# Load your dataset
df = pd.read_csv('customer_data.csv')
print("=== EXPLORATORY DATA ANALYSIS WITH EDAFLOW ===")
print(f"Dataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
# Step 1: Check for missing data
print("\n1. MISSING DATA ANALYSIS")
print("-" * 40)
null_analysis = edaflow.check_null_columns(df, threshold=15)
null_analysis # Shows color-coded missing data summary
# Step 2: Analyze categorical columns for data type issues
print("\n2. CATEGORICAL DATA ANALYSIS")
print("-" * 40)
edaflow.analyze_categorical_columns(df, threshold=30)
# Step 3: Convert appropriate columns to numeric automatically
print("\n3. AUTOMATIC DATA TYPE CONVERSION")
print("-" * 40)
df_cleaned = edaflow.convert_to_numeric(df, threshold=30)
# Step 4: Visualize categorical column values in detail
print("\n4. CATEGORICAL VALUES EXPLORATION")
print("-" * 40)
edaflow.visualize_categorical_values(df_cleaned, max_unique_values=10)
# Step 5: Display column type classification
print("\n5. COLUMN TYPE CLASSIFICATION")
print("-" * 40)
column_types = edaflow.display_column_types(df_cleaned)
# Step 6: Handle missing values with imputation
print("\n6. MISSING VALUE IMPUTATION")
print("-" * 40)
# Impute numerical columns with median
df_numeric_imputed = edaflow.impute_numerical_median(df_cleaned)
# Impute categorical columns with mode
df_fully_imputed = edaflow.impute_categorical_mode(df_numeric_imputed)
# Step 7: Final data review
print("\n7. DATA CLEANING SUMMARY")
print("-" * 40)
print("Original data types:")
print(df.dtypes)
print("\nCleaned data types:")
print(df_fully_imputed.dtypes)
print(f"\nFinal dataset shape: {df_fully_imputed.shape}")
print(f"Missing values remaining: {df_fully_imputed.isnull().sum().sum()}")
# Now your data is ready for further analysis!
# You can proceed with:
# - Statistical analysis
# - Machine learning preprocessing
# - Visualization
# - Advanced EDA techniques
```
### Integration with Jupyter Notebooks
For the best experience, use these functions in Jupyter notebooks where:
- `check_null_columns()` displays beautiful color-coded tables
- `analyze_categorical_columns()` shows colored terminal output
- You can iterate quickly on data cleaning decisions
```python
# In Jupyter notebook cell
import pandas as pd
import edaflow
df = pd.read_csv('your_data.csv')
# This will display a nicely formatted, color-coded table
edaflow.check_null_columns(df)
```
```python
# Load your dataset
df = pd.read_csv('data.csv')

# Analyze categorical columns to identify potential issues
edaflow.analyze_categorical_columns(df, threshold=35)

# This will identify:
# - Object columns that might actually be numeric (need conversion)
# - Truly categorical columns with their unique values
# - Mixed data type issues
```
### Working with Data (Future Implementation)
```python
import pandas as pd
import edaflow
# Load your dataset
df = pd.read_csv('data.csv')
# Perform EDA workflow
# summary = edaflow.quick_summary(df)
# edaflow.plot_overview(df)
# clean_df = edaflow.clean_data(df)
```
## Project Structure
```
edaflow/
โโโ edaflow/
โ โโโ __init__.py
โ โโโ analysis/
โ โโโ visualization/
โ โโโ preprocessing/
โโโ tests/
โโโ docs/
โโโ examples/
โโโ setup.py
โโโ requirements.txt
โโโ README.md
โโโ LICENSE
```
## Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## Development
### Setup Development Environment
```bash
# Clone the repository
git clone https://github.com/evanlow/edaflow.git
cd edaflow
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest
# Run linting
flake8 edaflow/
black edaflow/
isort edaflow/
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Changelog
### v0.4.0 (Data Imputation Release)
- **NEW**: `impute_numerical_median()` function for numerical missing value imputation using median
- **NEW**: `impute_categorical_mode()` function for categorical missing value imputation using mode
- **NEW**: Complete 7-function EDA workflow: analyze → convert → visualize → classify → impute
- **NEW**: Smart column detection and validation for imputation functions
- **NEW**: Inplace imputation option with detailed reporting and error handling
- **NEW**: Comprehensive edge case handling (empty DataFrames, all missing values, mode ties)
- Enhanced testing coverage with 54 comprehensive tests achieving 93% coverage
### v0.3.1 (Feature Enhancement)
- **NEW**: `display_column_types()` function for column type classification
- **NEW**: Complete 5-function EDA workflow: analyze → convert → visualize → classify
- **ENHANCED**: Updated comprehensive examples with full 5-function workflow
- Enhanced testing coverage with 32 comprehensive tests covering all functions
### v0.3.0 (Major Feature Release)
- **NEW**: `convert_to_numeric()` function for automatic data type conversion
- **NEW**: `visualize_categorical_values()` function for detailed categorical data exploration
- **NEW**: Smart threshold-based conversion with detailed reporting
- **NEW**: Inplace conversion option for flexible DataFrame modification
- **NEW**: Safe conversion with NaN handling for invalid values
- **NEW**: High-cardinality handling and data quality insights
- Enhanced testing coverage with comprehensive tests
### v0.2.1 (Documentation Enhancement)
- **ENHANCED**: Comprehensive README with detailed usage examples
- **NEW**: Step-by-step examples for both `check_null_columns()` and `analyze_categorical_columns()`
- **NEW**: Complete EDA workflow example showing real-world usage
- **NEW**: Jupyter notebook integration examples
- **IMPROVED**: Color-coding explanations and output interpretation guides
### v0.2.0 (Feature Release)
- **NEW**: `analyze_categorical_columns()` function for categorical data analysis
- **NEW**: Smart detection of object columns that might be numeric
- **NEW**: Color-coded terminal output for better readability
- Enhanced testing coverage with 12 comprehensive tests
- Improved documentation with detailed usage examples
### v0.1.1 (Documentation Update)
- Updated README with improved acknowledgments
- Fixed GitHub repository URLs
- Enhanced PyPI package presentation
### v0.1.0 (Initial Release)
- Basic package structure
- Sample hello() function
- `check_null_columns()` function for missing data analysis
- Core dependencies setup
- Documentation framework
## Support
If you encounter any issues or have questions, please file an issue on the [GitHub repository](https://github.com/evanlow/edaflow/issues).
## Roadmap
- [ ] Core analysis modules
- [ ] Visualization utilities
- [ ] Data preprocessing tools
- [ ] Missing data handling
- [ ] Statistical testing suite
- [ ] Interactive dashboards
- [ ] CLI interface
- [ ] Documentation website
## Acknowledgments
edaflow was developed during the AI/ML course conducted by NTUC LearningHub. I am grateful for the privilege of working alongside my coursemates from Cohort 15. A special thanks to our awesome instructor, Ms. Isha Sehgal, who not only inspired us but also instilled the data science discipline that we now possess.
Raw data
{
"_id": null,
"home_page": "https://github.com/evanlow/edaflow",
"name": "edaflow",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "Evan Low <evan.low@illumetechnology.com>",
"keywords": "data-analysis, eda, exploratory-data-analysis, data-science, visualization",
"author": "Evan Low",
"author_email": "Evan Low <evan.low@illumetechnology.com>",
"download_url": "https://files.pythonhosted.org/packages/2e/f8/11c78b14311d1c323f5ea62f1f32675d221ad8971fc8132726fda18d607e/edaflow-0.4.0.tar.gz",
"platform": null,
"description": "# edaflow\r\n\r\nA Python package for streamlined exploratory data analysis workflows.\r\n\r\n## Description\r\n\r\n`edaflow` is designed to simplify and accelerate the exploratory data analysis (EDA) process by providing a collection of tools and utilities for data scientists and analysts. The package integrates popular data science libraries to create a cohesive workflow for data exploration, visualization, and preprocessing.\r\n\r\n## Features\r\n\r\n- **Missing Data Analysis**: Color-coded analysis of null values with customizable thresholds\r\n- **Categorical Data Insights**: Identify object columns that might be numeric, detect data type issues\r\n- **Automatic Data Type Conversion**: Smart conversion of object columns to numeric when appropriate\r\n- **Categorical Values Visualization**: Detailed exploration of categorical column values with insights\r\n- **Column Type Classification**: Simple categorization of DataFrame columns into categorical and numerical types\r\n- **Data Imputation**: Smart missing value imputation using median for numerical and mode for categorical columns\r\n- **Data Type Detection**: Smart analysis to flag potential data conversion needs\r\n- **Styled Output**: Beautiful, color-coded results for Jupyter notebooks and terminals\r\n- **Easy Integration**: Works seamlessly with pandas, numpy, and other popular libraries\r\n\r\n## Installation\r\n\r\n### From PyPI\r\n```bash\r\npip install edaflow\r\n```\r\n\r\n### From Source\r\n```bash\r\ngit clone https://github.com/evanlow/edaflow.git\r\ncd edaflow\r\npip install -e .\r\n```\r\n\r\n### Development Installation\r\n```bash\r\ngit clone https://github.com/evanlow/edaflow.git\r\ncd edaflow\r\npip install -e \".[dev]\"\r\n```\r\n\r\n## Requirements\r\n\r\n- Python 3.8+\r\n- pandas >= 1.5.0\r\n- numpy >= 1.21.0\r\n- matplotlib >= 3.5.0\r\n- seaborn >= 0.11.0\r\n- scipy >= 1.7.0\r\n- missingno >= 0.5.0\r\n\r\n## Quick Start\r\n\r\n```python\r\nimport edaflow\r\n\r\n# Test 
the installation\r\nprint(edaflow.hello())\r\n\r\n# Check null values in your dataset\r\nimport pandas as pd\r\ndf = pd.read_csv('your_data.csv')\r\n\r\n# Analyze missing data with styled output\r\nnull_analysis = edaflow.check_null_columns(df, threshold=10)\r\nprint(null_analysis)\r\n\r\n# Analyze categorical columns to identify data type issues\r\nedaflow.analyze_categorical_columns(df, threshold=35)\r\n\r\n# Convert appropriate object columns to numeric automatically\r\ndf_cleaned = edaflow.convert_to_numeric(df, threshold=35)\r\nprint(\"Data types after conversion:\", df_cleaned.dtypes)\r\n```\r\n\r\n## Usage Examples\r\n\r\n### Basic Usage\r\n```python\r\nimport edaflow\r\n\r\n# Verify installation\r\nmessage = edaflow.hello()\r\nprint(message) # Output: \"Hello from edaflow! Ready for exploratory data analysis.\"\r\n```\r\n\r\n### Missing Data Analysis with `check_null_columns`\r\n\r\nThe `check_null_columns` function provides a color-coded analysis of missing data in your DataFrame:\r\n\r\n```python\r\nimport pandas as pd\r\nimport edaflow\r\n\r\n# Create sample data with missing values\r\ndf = pd.DataFrame({\r\n 'customer_id': [1, 2, 3, 4, 5],\r\n 'name': ['Alice', 'Bob', None, 'Diana', 'Eve'],\r\n 'age': [25, None, 35, None, 45],\r\n 'email': [None, None, None, None, None], # All missing\r\n 'purchase_amount': [100.5, 250.0, 75.25, None, 320.0]\r\n})\r\n\r\n# Analyze missing data with default threshold (10%)\r\nstyled_result = edaflow.check_null_columns(df)\r\nstyled_result # Display in Jupyter notebook for color-coded styling\r\n\r\n# Use custom threshold (20%) to change color coding sensitivity\r\nstyled_result = edaflow.check_null_columns(df, threshold=20)\r\nstyled_result\r\n\r\n# Access underlying data if needed\r\ndata = styled_result.data\r\nprint(data)\r\n```\r\n\r\n**Color Coding:**\r\n- \ud83d\udd34 **Red**: > 20% missing (high concern)\r\n- \ud83d\udfe1 **Yellow**: 10-20% missing (medium concern) \r\n- \ud83d\udfe8 **Light Yellow**: 1-10% 
missing (low concern)\r\n- \u2b1c **Gray**: 0% missing (no issues)\r\n\r\n### Categorical Data Analysis with `analyze_categorical_columns`\r\n\r\nThe `analyze_categorical_columns` function helps identify data type issues and provides insights into object-type columns:\r\n\r\n```python\r\nimport pandas as pd\r\nimport edaflow\r\n\r\n# Create sample data with mixed categorical types\r\ndf = pd.DataFrame({\r\n 'product_name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],\r\n 'price_str': ['999', '25', '75', '450'], # Numbers stored as strings\r\n 'category': ['Electronics', 'Accessories', 'Accessories', 'Electronics'],\r\n 'rating': [4.5, 3.8, 4.2, 4.7], # Already numeric\r\n 'mixed_ids': ['001', '002', 'ABC', '004'], # Mixed format\r\n 'status': ['active', 'inactive', 'active', 'pending']\r\n})\r\n\r\n# Analyze categorical columns with default threshold (35%)\r\nedaflow.analyze_categorical_columns(df)\r\n\r\n# Use custom threshold (50%) to be more lenient about mixed data\r\nedaflow.analyze_categorical_columns(df, threshold=50)\r\n```\r\n\r\n**Output Interpretation:**\r\n- \ud83d\udd34\ud83d\udd35 **Highlighted in Red/Blue**: Potentially numeric columns that might need conversion\r\n- \ud83d\udfe1\u26ab **Highlighted in Yellow/Black**: Shows unique values for potential numeric columns\r\n- **Regular text**: Truly categorical columns with statistics\r\n- **\"not an object column\"**: Already properly typed numeric columns\r\n\r\n### Data Type Conversion with `convert_to_numeric`\r\n\r\nAfter analyzing your categorical columns, you can automatically convert appropriate columns to numeric:\r\n\r\n```python\r\nimport pandas as pd\r\nimport edaflow\r\n\r\n# Create sample data with string numbers\r\ndf = pd.DataFrame({\r\n 'product_name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],\r\n 'price_str': ['999', '25', '75', '450'], # Should convert\r\n 'mixed_ids': ['001', '002', 'ABC', '004'], # Mixed data\r\n 'category': ['Electronics', 'Accessories', 'Electronics', 
'Electronics']\r\n})\r\n\r\n# Convert appropriate columns to numeric (threshold=35% by default)\r\ndf_converted = edaflow.convert_to_numeric(df, threshold=35)\r\n\r\n# Or modify the original DataFrame in place\r\nedaflow.convert_to_numeric(df, threshold=35, inplace=True)\r\n\r\n# Use a stricter threshold (only convert if <20% non-numeric values)\r\ndf_strict = edaflow.convert_to_numeric(df, threshold=20)\r\n```\r\n\r\n**Function Features:**\r\n- \u2705 **Smart Detection**: Only converts columns with few non-numeric values\r\n- \u2705 **Customizable Threshold**: Control conversion sensitivity \r\n- \u2705 **Safe Conversion**: Non-numeric values become NaN (not errors)\r\n- \u2705 **Inplace Option**: Modify original DataFrame or create new one\r\n- \u2705 **Detailed Output**: Shows exactly what was converted and why\r\n\r\n### Categorical Data Visualization with `visualize_categorical_values`\r\n\r\nAfter cleaning your data, explore categorical columns in detail to understand value distributions:\r\n\r\n```python\r\nimport pandas as pd\r\nimport edaflow\r\n\r\n# Example DataFrame with categorical data\r\ndf = pd.DataFrame({\r\n 'department': ['Sales', 'Marketing', 'Sales', 'HR', 'Marketing', 'Sales', 'IT'],\r\n 'status': ['Active', 'Inactive', 'Active', 'Pending', 'Active', 'Active', 'Inactive'],\r\n 'priority': ['High', 'Medium', 'High', 'Low', 'Medium', 'High', 'Low'],\r\n 'employee_id': [1001, 1002, 1003, 1004, 1005, 1006, 1007], # Numeric (ignored)\r\n 'salary': [50000, 60000, 55000, 45000, 58000, 62000, 70000] # Numeric (ignored)\r\n})\r\n\r\n# Visualize all categorical columns\r\nedaflow.visualize_categorical_values(df)\r\n```\r\n\r\n**Advanced Usage Examples:**\r\n\r\n```python\r\n# Handle high-cardinality data (many unique values)\r\nlarge_df = pd.DataFrame({\r\n 'product_id': [f'PROD_{i:04d}' for i in range(100)], # 100 unique values\r\n 'category': ['Electronics'] * 40 + ['Clothing'] * 35 + ['Books'] * 25,\r\n 'status': ['Available'] * 80 + ['Out of Stock'] 
* 15 + ['Discontinued'] * 5\r\n})\r\n\r\n# Limit display for high-cardinality columns\r\nedaflow.visualize_categorical_values(large_df, max_unique_values=5)\r\n```\r\n\r\n```python\r\n# DataFrame with missing values for comprehensive analysis\r\ndf_with_nulls = pd.DataFrame({\r\n 'region': ['North', 'South', None, 'East', 'West', 'North', None],\r\n 'customer_type': ['Premium', 'Standard', 'Premium', None, 'Standard', 'Premium', 'Standard'],\r\n 'transaction_id': [f'TXN_{i}' for i in range(7)], # Mostly unique (ID-like)\r\n})\r\n\r\n# Get detailed insights including missing value analysis\r\nedaflow.visualize_categorical_values(df_with_nulls)\r\n```\r\n\r\n**Function Features:**\r\n- \ud83c\udfaf **Smart Column Detection**: Automatically finds categorical (object-type) columns\r\n- \ud83d\udcca **Value Distribution**: Shows counts and percentages for each unique value \r\n- \ud83d\udd0d **Missing Value Analysis**: Tracks and reports NaN/missing values\r\n- \u26a1 **High-Cardinality Handling**: Truncates display for columns with many unique values\r\n- \ud83d\udca1 **Actionable Insights**: Identifies ID-like columns and provides data quality recommendations\r\n- \ud83c\udfa8 **Color-Coded Output**: Easy-to-read formatted results with highlighting\r\n\r\n### Column Type Classification with `display_column_types`\r\n\r\nThe `display_column_types` function provides a simple way to categorize DataFrame columns into categorical and numerical types:\r\n\r\n```python\r\nimport pandas as pd\r\nimport edaflow\r\n\r\n# Create sample data with mixed types\r\ndata = {\r\n 'name': ['Alice', 'Bob', 'Charlie'],\r\n 'age': [25, 30, 35],\r\n 'city': ['NYC', 'LA', 'Chicago'],\r\n 'salary': [50000, 60000, 70000],\r\n 'is_active': [True, False, True]\r\n}\r\ndf = pd.DataFrame(data)\r\n\r\n# Display column type classification\r\nresult = edaflow.display_column_types(df)\r\n\r\n# Access the categorized column lists\r\ncategorical_cols = result['categorical'] # ['name', 
numerical_cols = result['numerical']      # ['age', 'salary', 'is_active']
```

**Example Output:**
```
📊 Column Type Analysis
==================================================

📝 Categorical Columns (2 total):
   1. name (unique values: 3)
   2. city (unique values: 3)

🔢 Numerical Columns (3 total):
   1. age (dtype: int64)
   2. salary (dtype: int64)
   3. is_active (dtype: bool)

📈 Summary:
   Total columns: 5
   Categorical: 2 (40.0%)
   Numerical: 3 (60.0%)
```

**Function Features:**
- 🔍 **Simple Classification**: Separates columns into categorical (object dtype) and numerical (all other dtypes)
- 📊 **Detailed Information**: Shows unique value counts for categorical columns and data types for numerical columns
- 📈 **Summary Statistics**: Provides a percentage breakdown of column types
- 🎯 **Return Values**: Returns a dictionary with categorized column lists for programmatic use
- ⚡ **Fast Processing**: Efficient classification based on pandas data types
- 🛡️ **Error Handling**: Validates input and handles edge cases such as empty DataFrames

### Data Imputation with `impute_numerical_median` and `impute_categorical_mode`

After analyzing your data, you often need to handle missing values.
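Before reaching for the imputation helpers, a quick check with plain pandas (no edaflow API involved) shows where the gaps are and how large they are:

```python
import pandas as pd

# Sample data with missing numerical values
df = pd.DataFrame({
    'age': [25, None, 35, None, 45],
    'salary': [50000, 60000, None, 70000, None],
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve']
})

# Missing-value count per column tells you what needs imputing
print(df.isnull().sum())  # age: 2, salary: 2, name: 0
```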
The edaflow package provides two specialized imputation functions for this purpose:

#### Numerical Imputation with `impute_numerical_median`

The `impute_numerical_median` function fills missing values in numerical columns using the median value:

```python
import pandas as pd
import edaflow

# Create sample data with missing numerical values
df = pd.DataFrame({
    'age': [25, None, 35, None, 45],
    'salary': [50000, 60000, None, 70000, None],
    'score': [85.5, None, 92.0, 88.5, None],
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve']
})

# Impute all numerical columns with median values
df_imputed = edaflow.impute_numerical_median(df)

# Impute specific columns only
df_imputed = edaflow.impute_numerical_median(df, columns=['age', 'salary'])

# Impute in place (modifies the original DataFrame)
edaflow.impute_numerical_median(df, inplace=True)
```

**Function Features:**
- 🔢 **Smart Detection**: Automatically identifies numerical columns (int, float, etc.)
- 📊 **Median Imputation**: Uses median values, which are robust to outliers
- 🎯 **Selective Imputation**: Option to specify which columns to impute
- 🔄 **Inplace Option**: Modify the original DataFrame or create a new one
- 🛡️ **Safe Handling**: Gracefully handles edge cases such as all-missing columns
- 📋 **Detailed Reporting**: Shows exactly what was imputed, with summary statistics

#### Categorical Imputation with `impute_categorical_mode`

The `impute_categorical_mode` function fills missing values in categorical columns using the mode (most frequent value):

```python
import pandas as pd
import edaflow

# Create sample data with missing categorical values
df = pd.DataFrame({
    'category': ['A', 'B', 'A', None, 'A'],
    'status': ['Active', None, 'Active', 'Inactive', None],
    'priority': ['High', 'Medium', None, 'Low', 'High'],
    'age': [25, 30, 35, 40, 45]
})

# Impute all categorical columns with mode values
df_imputed = edaflow.impute_categorical_mode(df)

# Impute specific columns only
df_imputed = edaflow.impute_categorical_mode(df, columns=['category', 'status'])

# Impute in place (modifies the original DataFrame)
edaflow.impute_categorical_mode(df, inplace=True)
```

**Function Features:**
- 📝 **Smart Detection**: Automatically identifies categorical (object) columns
- 🎯 **Mode Imputation**: Uses the most frequent value for each column
- ⚖️ **Tie Handling**: Gracefully handles mode ties (multiple values with the same frequency)
- 🔄 **Inplace Option**: Modify the original DataFrame or create a new one
- 🛡️ **Safe Handling**: Gracefully handles edge cases such as all-missing columns
- 📋 **Detailed Reporting**: Shows exactly what was imputed, with mode tie warnings

#### Complete Imputation Workflow Example

```python
import pandas as pd
import edaflow

# Sample data with both numerical and categorical missing values
df = pd.DataFrame({
    'age': [25, None, 35, None, 45],
    'salary': [50000, None, 70000, 80000, None],
    'category': ['A', 'B', None, 'A', None],
    'status': ['Active', None, 'Active', 'Inactive', None],
    'score': [85.5, 92.0, None, 88.5, None]
})

print("Original DataFrame:")
print(df)
print("\n" + "=" * 50)

# Step 1: Impute numerical columns
print("STEP 1: Numerical Imputation")
df_step1 = edaflow.impute_numerical_median(df)

# Step 2: Impute categorical columns
print("\nSTEP 2: Categorical Imputation")
df_final = edaflow.impute_categorical_mode(df_step1)

print("\nFinal DataFrame (all missing values imputed):")
print(df_final)

# Verify no missing values remain
print(f"\nMissing values remaining: {df_final.isnull().sum().sum()}")
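# To harden the check above, fail loudly if any missing values
# slipped through imputation (plain Python, not an edaflow feature):
assert df_final.isnull().sum().sum() == 0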
```

**Expected Output:**
```
🔢 Numerical Missing Value Imputation (Median)
=======================================================
🔄 age - Imputed 2 values with median: 35.0
🔄 salary - Imputed 2 values with median: 70000.0
🔄 score - Imputed 2 values with median: 88.5

📊 Imputation Summary:
   Columns processed: 3
   Columns imputed: 3
   Total values imputed: 6

📝 Categorical Missing Value Imputation (Mode)
=======================================================
🔄 category - Imputed 2 values with mode: 'A'
🔄 status - Imputed 2 values with mode: 'Active'

📊 Imputation Summary:
   Columns processed: 2
   Columns imputed: 2
   Total values imputed: 4
```

### Complete EDA Workflow Example

```python
import pandas as pd
import edaflow

# Load your dataset
df = pd.read_csv('customer_data.csv')

print("=== EXPLORATORY DATA ANALYSIS WITH EDAFLOW ===")
print(f"Dataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")

# Step 1: Check for missing data
print("\n1. MISSING DATA ANALYSIS")
print("-" * 40)
null_analysis = edaflow.check_null_columns(df, threshold=15)
null_analysis  # Shows a color-coded missing data summary

# Step 2: Analyze categorical columns for data type issues
print("\n2. CATEGORICAL DATA ANALYSIS")
print("-" * 40)
edaflow.analyze_categorical_columns(df, threshold=30)

# Step 3: Convert appropriate columns to numeric automatically
print("\n3. AUTOMATIC DATA TYPE CONVERSION")
print("-" * 40)
df_cleaned = edaflow.convert_to_numeric(df, threshold=30)

# Step 4: Visualize categorical column values in detail
print("\n4. CATEGORICAL VALUES EXPLORATION")
print("-" * 40)
edaflow.visualize_categorical_values(df_cleaned, max_unique_values=10)

# Step 5: Display column type classification
print("\n5. COLUMN TYPE CLASSIFICATION")
print("-" * 40)
column_types = edaflow.display_column_types(df_cleaned)

# Step 6: Handle missing values with imputation
print("\n6. MISSING VALUE IMPUTATION")
print("-" * 40)
# Impute numerical columns with median
df_numeric_imputed = edaflow.impute_numerical_median(df_cleaned)
# Impute categorical columns with mode
df_fully_imputed = edaflow.impute_categorical_mode(df_numeric_imputed)

# Step 7: Final data review
print("\n7. DATA CLEANING SUMMARY")
print("-" * 40)
print("Original data types:")
print(df.dtypes)
print("\nCleaned data types:")
print(df_fully_imputed.dtypes)
print(f"\nFinal dataset shape: {df_fully_imputed.shape}")
print(f"Missing values remaining: {df_fully_imputed.isnull().sum().sum()}")

# Your data is now ready for further analysis!
# You can proceed with:
# - Statistical analysis
# - Machine learning preprocessing
# - Visualization
# - Advanced EDA techniques
```

### Integration with Jupyter Notebooks

For the best experience, use these functions in Jupyter notebooks, where:
- `check_null_columns()` displays beautiful color-coded tables
- `analyze_categorical_columns()` shows colored terminal output
- You can iterate quickly on data cleaning decisions

```python
# In a Jupyter notebook cell
import pandas as pd
import edaflow

df = pd.read_csv('your_data.csv')

# This will display a nicely formatted, color-coded table
edaflow.check_null_columns(df)
```

Likewise, `analyze_categorical_columns()` prints its colored summary directly in the notebook:

```python
# Load your dataset
df = pd.read_csv('data.csv')

# Analyze categorical columns to identify potential issues
edaflow.analyze_categorical_columns(df, threshold=35)

# This will identify:
# - Object columns that might actually be numeric (need conversion)
# - Truly categorical columns with their unique values
# - Mixed data type issues
```

### Working with Data (Future Implementation)

```python
import pandas as pd
import edaflow

# Load your dataset
df = pd.read_csv('data.csv')

# Planned EDA workflow helpers (not yet implemented):
# summary = edaflow.quick_summary(df)
# edaflow.plot_overview(df)
# clean_df = edaflow.clean_data(df)
```

## Project Structure

```
edaflow/
├── edaflow/
│   ├── __init__.py
│   ├── analysis/
│   ├── visualization/
│   └── preprocessing/
├── tests/
├── docs/
├── examples/
├── setup.py
├── requirements.txt
├── README.md
└── LICENSE
```

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## Development

### Setup Development Environment

```bash
# Clone the repository
git clone https://github.com/evanlow/edaflow.git
cd edaflow

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Run linting and formatting
flake8 edaflow/
black edaflow/
isort edaflow/
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Changelog

### v0.4.0 (Data Imputation Release)
- **NEW**: `impute_numerical_median()` function for numerical missing value imputation using the median
- **NEW**: `impute_categorical_mode()` function for categorical missing value imputation using the mode
- **NEW**: Complete 7-function EDA workflow: analyze → convert → visualize → classify → impute
- **NEW**: Smart column detection and validation for imputation functions
- **NEW**: Inplace imputation option with detailed reporting and error handling
- **NEW**: Comprehensive edge case handling (empty DataFrames, all-missing values, mode ties)
- Enhanced testing coverage with 54 comprehensive tests achieving 93% coverage

### v0.3.1 (Feature Enhancement)
- **NEW**: `display_column_types()` function for column type classification
- **NEW**: Complete 5-function EDA workflow: analyze → convert → visualize → classify
- **ENHANCED**: Updated comprehensive examples with the full 5-function workflow
- Enhanced testing coverage with 32 comprehensive tests covering all functions

### v0.3.0 (Major Feature Release)
- **NEW**: `convert_to_numeric()` function for automatic data type conversion
- **NEW**: `visualize_categorical_values()` function for detailed categorical data exploration
- **NEW**: Smart threshold-based conversion with detailed reporting
- **NEW**: Inplace conversion option for flexible DataFrame modification
- **NEW**: Safe conversion with NaN handling for invalid values
- **NEW**: High-cardinality handling and data quality insights
- Enhanced testing coverage with comprehensive tests

### v0.2.1 (Documentation Enhancement)
- **ENHANCED**: Comprehensive README with detailed usage examples
- **NEW**: Step-by-step examples for both `check_null_columns()` and `analyze_categorical_columns()`
- **NEW**: Complete EDA workflow example showing real-world usage
- **NEW**: Jupyter notebook integration examples
- **IMPROVED**: Color-coding explanations and output interpretation guides

### v0.2.0 (Feature Release)
- **NEW**: `analyze_categorical_columns()` function for categorical data analysis
- **NEW**: Smart detection of object columns that might be numeric
- **NEW**: Color-coded terminal output for better readability
- Enhanced testing coverage with 12 comprehensive tests
- Improved documentation with detailed usage examples

### v0.1.1 (Documentation Update)
- Updated README with improved acknowledgments
- Fixed GitHub repository URLs
- Enhanced PyPI package presentation

### v0.1.0 (Initial Release)
- Basic package structure
- Sample `hello()` function
- `check_null_columns()` function for missing data analysis
- Core dependencies setup
- Documentation framework

## Support

If you encounter any issues or have questions, please file an issue on the [GitHub repository](https://github.com/evanlow/edaflow/issues).

## Roadmap

- [ ] Core analysis modules
- [ ] Visualization utilities
- [ ] Data preprocessing tools
- [ ] Missing data handling
- [ ] Statistical testing suite
- [ ] Interactive dashboards
- [ ] CLI interface
- [ ] Documentation website

## Acknowledgments

edaflow was developed during the AI/ML course conducted by NTUC LearningHub.
I am grateful for the privilege of working alongside my coursemates from Cohort 15. A special thanks to our awesome instructor, Ms. Isha Sehgal, who not only inspired us but also instilled the data science discipline that we now possess.