databroom


| Field | Value |
|-------|-------|
| Name | databroom |
| Version | 0.3.1 |
| Summary | A cross-language DataFrame cleaning assistant with interactive GUI and one-click code export |
| Upload time | 2025-07-30 06:36:37 |
| Requires Python | >=3.8 |
| License | MIT |
| Keywords | data-cleaning, pandas, streamlit, data-preprocessing, code-generation, gui, dataframe |
| Requirements | pandas, numpy, streamlit, unidecode, jinja2, pathlib2 |

# Databroom

A DataFrame cleaning tool with CLI, GUI, and code generation capabilities.

## Why Databroom?

**Manual pandas approach:**
```python
# 15+ lines of repetitive code
import pandas as pd
import unicodedata

df = pd.read_csv("messy_data.csv")
# Remove empty columns
df = df.loc[:, df.isnull().mean() < 0.9]
# Clean column names
df.columns = df.columns.str.lower().str.replace(' ', '_')
# Remove accents from text values
def clean_text(text):
    if pd.isna(text):
        return text
    normalized = unicodedata.normalize('NFKD', str(text))
    return ''.join(c for c in normalized if not unicodedata.combining(c))
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].apply(clean_text)
df.to_csv("clean_data.csv", index=False)
```

**Databroom approach:**
```bash
# Single command
databroom clean messy_data.csv --clean-all --output-file clean_data.csv
```

## Installation

```bash
pip install databroom
```

## Quick Start

### Command Line Interface

```bash
# Clean everything (recommended)
databroom clean data.csv --clean-all --output-file cleaned.csv

# Clean only columns
databroom clean data.csv --clean-columns --output-file cleaned.csv

# Clean with code generation
databroom clean data.csv --clean-all --output-code script.py

# Generate R code
databroom clean data.csv --clean-all --output-code script.R --lang r

# Launch interactive GUI
databroom gui
```

### Python API

```python
from databroom.core.broom import Broom

# Load and clean data
broom = Broom.from_csv('data.csv')
cleaned = broom.clean_all()  # Smart clean everything

# Or use specific operations
cleaned = broom.clean_columns().clean_rows()

# Get cleaned DataFrame
df = cleaned.get_df()
```

## Features

- **Smart Operations**: `--clean-all`, `--clean-columns`, `--clean-rows`
- **Advanced Options**: Fine-tune with `--no-snakecase`, `--empty-threshold`, etc.
- **Code Generation**: Export Python/pandas or R/tidyverse scripts
- **Interactive GUI**: Streamlit-based web interface
- **File Support**: CSV, Excel, JSON input/output

## Available Operations

| Operation | Description |
|-----------|-------------|
| `clean_all()` | Complete cleaning: columns + rows with all operations |
| `clean_columns()` | Clean column names: snake_case + remove accents + remove empty columns |
| `clean_rows()` | Clean row data: snake_case values + remove accents + remove empty rows |
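
To make the "snake_case + remove accents" step concrete, here is a minimal standard-library sketch of what such a normalization might look like. The helper names `strip_accents` and `to_snake_case` are hypothetical and this is not Databroom's own implementation; the exact rules the package applies may differ.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks after NFKD decomposition (e.g. 'é' -> 'e')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_snake_case(name: str) -> str:
    """Lowercase a label and join its words with single underscores."""
    name = strip_accents(name.strip())
    # Split camelCase boundaries: "userName" -> "user_Name"
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse any run of non-alphanumeric characters into one underscore
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


columns = ["Código Postal", "userName", "  Total (€) "]
print([to_snake_case(c) for c in columns])
# ['codigo_postal', 'user_name', 'total']
```

Symbols with no ASCII equivalent (such as `€`) are simply dropped in this sketch; a real tool might transliterate them instead.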

### Legacy operations (still supported)
- `remove_empty_cols()`, `remove_empty_rows()`
- `standardize_column_names()`, `normalize_column_names()`
- `normalize_values()`, `standardize_values()`

## CLI Parameters

```bash
# Smart Operations
--clean-all               # Clean everything
--clean-columns           # Clean column names only
--clean-rows              # Clean row data only

# Advanced Options
--no-snakecase            # Keep original text case
--no-remove-accents-vals  # Keep accents in values
--empty-threshold 0.8     # Custom missing value threshold

# Output
--output-file clean.csv   # Save cleaned data
--output-code script.py   # Generate reproducible code
--lang python             # Code language (python/r)
```
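
The effect of `--empty-threshold` can be illustrated with a small sketch. It assumes the threshold is the fraction of missing values at or above which a column is dropped, which matches the `df.isnull().mean() < 0.9` filter in the manual pandas snippet; whether Databroom treats the boundary as inclusive is not stated in this README. The functions `missing_fraction` and `drop_sparse_columns` are hypothetical helpers written with the standard library only.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def missing_fraction(rows: List[Row], column: str) -> float:
    """Share of rows where `column` is None or an empty string."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)


def drop_sparse_columns(rows: List[Row], threshold: float) -> List[Row]:
    """Keep only columns whose missing fraction is below `threshold`."""
    columns = list(rows[0]) if rows else []
    keep = [c for c in columns if missing_fraction(rows, c) < threshold]
    return [{c: row.get(c) for c in keep} for row in rows]


rows = [
    {"id": "1", "name": "Ana", "notes": None},
    {"id": "2", "name": None, "notes": None},
    {"id": "3", "name": "Luis", "notes": None},
    {"id": "4", "name": "Eva", "notes": None},
    {"id": "5", "name": "Tom", "notes": "ok"},
]

# "notes" is 80% missing: dropped at 0.7, kept at 0.9
print(list(drop_sparse_columns(rows, threshold=0.7)[0]))  # ['id', 'name']
print(list(drop_sparse_columns(rows, threshold=0.9)[0]))  # ['id', 'name', 'notes']
```

Under this reading, a lower threshold is stricter: more columns fail the test and are removed.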

## Examples

### Data Science Workflow
```bash
databroom clean survey.xlsx \
  --clean-all \
  --empty-threshold 0.7 \
  --output-file clean.csv \
  --output-code analysis.py
```

### R/Tidyverse Code Generation
```bash
databroom clean data.csv \
  --clean-all \
  --output-code analysis.R \
  --lang r
```

### Batch Processing
```bash
for file in *.csv; do
  databroom clean "$file" --clean-columns --output-file "clean_$file"
done
```

## GUI Interface

Launch the interactive web interface:

```bash
databroom gui
# Opens http://localhost:8501
```

Features:
- Drag & drop file upload
- Live preview of operations
- Interactive parameter tuning
- Real-time code generation
- One-click download

## Method Chaining

```python
from databroom.core.broom import Broom

result = (Broom.from_csv('messy_data.csv')
          .clean_columns(empty_threshold=0.8)
          .clean_rows(snakecase=False)
          .get_df())
```

## Code Generation

All operations automatically generate reproducible code:

```python
# Generated Python code
import pandas as pd
from databroom.core.broom import Broom

broom_instance = Broom.from_csv("data.csv")
broom_instance = broom_instance.clean_all()
df_cleaned = broom_instance.pipeline.df
```
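
One way to picture how chained operations can turn into a script is to record each step as it is requested and render the log afterwards. The sketch below is an illustration under that assumption, not Databroom's internals: `RecordingPipeline`, `record`, and `to_python` are hypothetical names, and since the package lists `jinja2` among its requirements, the real generator may well be template-based rather than built from plain strings.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, Dict[str, object]]


class RecordingPipeline:
    """Hypothetical stand-in that logs each cleaning step it is asked to run."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.steps: List[Step] = []

    def record(self, name: str, **params: object) -> "RecordingPipeline":
        self.steps.append((name, params))
        return self  # returning self is what makes chaining possible

    def to_python(self) -> str:
        """Render the recorded steps as a standalone script."""
        lines = [
            "from databroom.core.broom import Broom",
            "",
            f"broom_instance = Broom.from_csv({self.source!r})",
        ]
        for name, params in self.steps:
            args = ", ".join(f"{key}={value!r}" for key, value in params.items())
            lines.append(f"broom_instance = broom_instance.{name}({args})")
        lines.append("df_cleaned = broom_instance.get_df()")
        return "\n".join(lines)


pipeline = (RecordingPipeline("data.csv")
            .record("clean_columns", empty_threshold=0.8)
            .record("clean_rows", snakecase=False))
print(pipeline.to_python())
```

Because the log stores operation names and parameters rather than code, the same record could in principle be rendered to another target language, which is the idea behind the `--lang r` option.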

## License

MIT License - see LICENSE file for details.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "databroom",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Oliver Lozano <onlozanoo@gmail.com>",
    "keywords": "data-cleaning, pandas, streamlit, data-preprocessing, code-generation, gui, dataframe",
    "author": null,
    "author_email": "Oliver Lozano <onlozanoo@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/4c/4f/80591542c23f78906a2d2be1bc4cf119629e52c48f711eee8d99a8faab02/databroom-0.3.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A cross-language DataFrame cleaning assistant with interactive GUI and one-click code export",
    "version": "0.3.1",
    "project_urls": {
        "Changelog": "https://github.com/onlozanoo/databroom/releases",
        "Documentation": "https://github.com/onlozanoo/databroom/blob/main/README.md",
        "Homepage": "https://github.com/onlozanoo/databroom",
        "Issues": "https://github.com/onlozanoo/databroom/issues",
        "Repository": "https://github.com/onlozanoo/databroom"
    },
    "split_keywords": [
        "data-cleaning",
        " pandas",
        " streamlit",
        " data-preprocessing",
        " code-generation",
        " gui",
        " dataframe"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "473eceb778938ebb252f347825ed34b21eef6cf27f9b6d7887d684d0a6d4b9a4",
                "md5": "62c5365da6f86dfc2a7fa0a97d187115",
                "sha256": "eb1629060d796161d02b41af3ae2544cf76475044ff04c86fe5650658bca513a"
            },
            "downloads": -1,
            "filename": "databroom-0.3.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "62c5365da6f86dfc2a7fa0a97d187115",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 36859,
            "upload_time": "2025-07-30T06:36:36",
            "upload_time_iso_8601": "2025-07-30T06:36:36.729043Z",
            "url": "https://files.pythonhosted.org/packages/47/3e/ceb778938ebb252f347825ed34b21eef6cf27f9b6d7887d684d0a6d4b9a4/databroom-0.3.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "4c4f80591542c23f78906a2d2be1bc4cf119629e52c48f711eee8d99a8faab02",
                "md5": "75daf3584a45c39880442a081749b2a2",
                "sha256": "65c6cbf6441f88b37608e649ff0ee40bc2919ade0cfe10e57c0d5fdf9f862c04"
            },
            "downloads": -1,
            "filename": "databroom-0.3.1.tar.gz",
            "has_sig": false,
            "md5_digest": "75daf3584a45c39880442a081749b2a2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 37132,
            "upload_time": "2025-07-30T06:36:37",
            "upload_time_iso_8601": "2025-07-30T06:36:37.872983Z",
            "url": "https://files.pythonhosted.org/packages/4c/4f/80591542c23f78906a2d2be1bc4cf119629e52c48f711eee8d99a8faab02/databroom-0.3.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-30 06:36:37",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "onlozanoo",
    "github_project": "databroom",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.3.0"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.20.0"
                ]
            ]
        },
        {
            "name": "streamlit",
            "specs": [
                [
                    ">=",
                    "1.28.0"
                ]
            ]
        },
        {
            "name": "unidecode",
            "specs": [
                [
                    ">=",
                    "1.3.0"
                ]
            ]
        },
        {
            "name": "jinja2",
            "specs": [
                [
                    ">=",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "pathlib2",
            "specs": [
                [
                    ">=",
                    "2.3.0"
                ]
            ]
        }
    ],
    "lcname": "databroom"
}
        