# Databroom
A DataFrame cleaning tool with CLI, GUI, and code generation capabilities.
## Why Databroom?
**Manual pandas approach:**
```python
# 15+ lines of repetitive code
import pandas as pd
import unicodedata
df = pd.read_csv("messy_data.csv")
# Remove empty columns
df = df.loc[:, df.isnull().mean() < 0.9]
# Clean column names
df.columns = df.columns.str.lower().str.replace(' ', '_')
# Remove accents from text values
def clean_text(text):
    if pd.isna(text):
        return text
    return ''.join(c for c in unicodedata.normalize('NFKD', str(text))
                   if not unicodedata.combining(c))
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].apply(clean_text)
df.to_csv("clean_data.csv", index=False)
```
**Databroom approach:**
```bash
# Single command
databroom clean messy_data.csv --clean-all --output-file clean_data.csv
```
## Installation
```bash
pip install databroom
```
## Quick Start
### Command Line Interface
```bash
# Clean everything (recommended)
databroom clean data.csv --clean-all --output-file cleaned.csv
# Clean only columns
databroom clean data.csv --clean-columns --output-file cleaned.csv
# Clean with code generation
databroom clean data.csv --clean-all --output-code script.py
# Generate R code
databroom clean data.csv --clean-all --output-code script.R --lang r
# Launch interactive GUI
databroom gui
```
### Python API
```python
from databroom.core.broom import Broom
# Load and clean data
broom = Broom.from_csv('data.csv')
cleaned = broom.clean_all() # Smart clean everything
# Or use specific operations
cleaned = broom.clean_columns().clean_rows()
# Get cleaned DataFrame
df = cleaned.get_df()
```
## Features
- **Smart Operations**: `--clean-all`, `--clean-columns`, `--clean-rows`
- **Advanced Options**: Fine-tune with `--no-snakecase`, `--empty-threshold`, etc.
- **Code Generation**: Export Python/pandas or R/tidyverse scripts
- **Interactive GUI**: Streamlit-based web interface
- **File Support**: CSV, Excel, JSON input/output
## Available Operations
| Operation | Description |
|-----------|-------------|
| `clean_all()` | Complete cleaning: columns + rows with all operations |
| `clean_columns()` | Clean column names: snake_case + remove accents + remove empty |
| `clean_rows()` | Clean row data: snake_case + remove accents + remove empty |
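
To make the table more concrete, here is a rough standard-library sketch of what "snake_case + remove accents" could mean for a column name. The helpers `strip_accents` and `to_snake_case` are hypothetical and written only for illustration; they are not Databroom's implementation, and the library's actual rules may differ.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks after NFKD decomposition (e.g. 'é' -> 'e')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_snake_case(name: str) -> str:
    """Hypothetical helper: lowercase, accent-free, underscore-separated."""
    name = strip_accents(name.strip())
    # Insert an underscore at lower-to-upper boundaries: 'totalAmount' -> 'total_Amount'
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse any run of non-alphanumeric characters into a single underscore
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


columns = ["First Name", "Año de Ingreso", "totalAmount (USD)"]
cleaned = [to_snake_case(c) for c in columns]
print(cleaned)  # ['first_name', 'ano_de_ingreso', 'total_amount_usd']
```

Decomposing with NFKD before filtering is the same approach the manual pandas example at the top of this README uses for text values.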
### Legacy operations (still supported)
- `remove_empty_cols()`, `remove_empty_rows()`
- `standardize_column_names()`, `normalize_column_names()`
- `normalize_values()`, `standardize_values()`
## CLI Parameters
```bash
# Smart Operations
--clean-all # Clean everything
--clean-columns # Clean column names only
--clean-rows # Clean row data only
# Advanced Options
--no-snakecase # Keep original text case
--no-remove-accents-vals # Keep accents in values
--empty-threshold 0.8 # Custom missing value threshold
# Output
--output-file clean.csv # Save cleaned data
--output-code script.py # Generate reproducible code
--lang python # Code language (python/r)
```
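
This README does not spell out how `--empty-threshold` is applied. One plausible reading, mirroring the manual pandas example earlier in this README, is that a column is dropped once its fraction of missing values reaches the threshold. The sketch below illustrates that reading in plain pandas; the function name `drop_mostly_empty_columns`, the default of 0.9, and the exact comparison are assumptions, not Databroom's documented behavior.

```python
import pandas as pd


def drop_mostly_empty_columns(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Keep only columns whose missing-value fraction is below `threshold`."""
    missing_fraction = df.isna().mean()  # per-column share of NaN/None values
    return df.loc[:, missing_fraction < threshold]


df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "notes": [None, None, None, None, "ok"],   # 80% missing
        "unused": [None, None, None, None, None],  # 100% missing
    }
)

print(list(drop_mostly_empty_columns(df, threshold=0.9).columns))  # ['id', 'notes']
print(list(drop_mostly_empty_columns(df, threshold=0.8).columns))  # ['id']
```

Under this reading, lowering the threshold makes the cleaning more aggressive: a column that is 80% empty survives at 0.9 but is removed at 0.8.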
## Examples
### Data Science Workflow
```bash
databroom clean survey.xlsx \
  --clean-all \
  --empty-threshold 0.7 \
  --output-file clean.csv \
  --output-code analysis.py
```
### R/Tidyverse Code Generation
```bash
databroom clean data.csv \
  --clean-all \
  --output-code analysis.R \
  --lang r
```
### Batch Processing
```bash
for file in *.csv; do
  databroom clean "$file" --clean-columns --output-file "clean_$file"
done
```
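
The same batch run can be scripted from Python, which makes it easier to skip files produced by an earlier run (the shell loop above would pick up `clean_*.csv` again on a second pass). This is a minimal sketch: `build_commands` and `run_batch` are hypothetical helpers, and they use only the `databroom clean` flags shown in this README.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(folder: Path, prefix: str = "clean_") -> List[List[str]]:
    """Build one `databroom clean` command per CSV, skipping earlier outputs."""
    commands = []
    for path in sorted(folder.glob("*.csv")):
        if path.name.startswith(prefix):
            continue  # already an output of a previous run
        output = path.with_name(prefix + path.name)
        commands.append(
            ["databroom", "clean", str(path), "--clean-columns", "--output-file", str(output)]
        )
    return commands


def run_batch(folder: Path) -> None:
    """Run each command in turn; check=True raises if a command exits non-zero."""
    for command in build_commands(folder):
        subprocess.run(command, check=True)


# Dry run: print the commands without executing them.
for command in build_commands(Path(".")):
    print(" ".join(command))
```

Calling `run_batch(Path("."))` would execute the printed commands; it assumes the `databroom` executable is on `PATH`.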
## GUI Interface
Launch the interactive web interface:
```bash
databroom gui
# Opens http://localhost:8501
```
Features:
- Drag & drop file upload
- Live preview of operations
- Interactive parameter tuning
- Real-time code generation
- One-click download
## Method Chaining
```python
from databroom.core.broom import Broom
result = (Broom.from_csv('messy_data.csv')
          .clean_columns(empty_threshold=0.8)
          .clean_rows(snakecase=False)
          .get_df())
```
## Code Generation
All operations automatically generate reproducible code:
```python
# Generated Python code
import pandas as pd
from databroom.core.broom import Broom
broom_instance = Broom.from_csv("data.csv")
broom_instance = broom_instance.clean_all()
df_cleaned = broom_instance.pipeline.df
```
## License
MIT License - see LICENSE file for details.
Raw data
{
"_id": null,
"home_page": null,
"name": "databroom",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "Oliver Lozano <onlozanoo@gmail.com>",
"keywords": "data-cleaning, pandas, streamlit, data-preprocessing, code-generation, gui, dataframe",
"author": null,
"author_email": "Oliver Lozano <onlozanoo@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/4c/4f/80591542c23f78906a2d2be1bc4cf119629e52c48f711eee8d99a8faab02/databroom-0.3.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A cross-language DataFrame cleaning assistant with interactive GUI and one-click code export",
"version": "0.3.1",
"project_urls": {
"Changelog": "https://github.com/onlozanoo/databroom/releases",
"Documentation": "https://github.com/onlozanoo/databroom/blob/main/README.md",
"Homepage": "https://github.com/onlozanoo/databroom",
"Issues": "https://github.com/onlozanoo/databroom/issues",
"Repository": "https://github.com/onlozanoo/databroom"
},
"split_keywords": [
"data-cleaning",
" pandas",
" streamlit",
" data-preprocessing",
" code-generation",
" gui",
" dataframe"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "473eceb778938ebb252f347825ed34b21eef6cf27f9b6d7887d684d0a6d4b9a4",
"md5": "62c5365da6f86dfc2a7fa0a97d187115",
"sha256": "eb1629060d796161d02b41af3ae2544cf76475044ff04c86fe5650658bca513a"
},
"downloads": -1,
"filename": "databroom-0.3.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "62c5365da6f86dfc2a7fa0a97d187115",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 36859,
"upload_time": "2025-07-30T06:36:36",
"upload_time_iso_8601": "2025-07-30T06:36:36.729043Z",
"url": "https://files.pythonhosted.org/packages/47/3e/ceb778938ebb252f347825ed34b21eef6cf27f9b6d7887d684d0a6d4b9a4/databroom-0.3.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "4c4f80591542c23f78906a2d2be1bc4cf119629e52c48f711eee8d99a8faab02",
"md5": "75daf3584a45c39880442a081749b2a2",
"sha256": "65c6cbf6441f88b37608e649ff0ee40bc2919ade0cfe10e57c0d5fdf9f862c04"
},
"downloads": -1,
"filename": "databroom-0.3.1.tar.gz",
"has_sig": false,
"md5_digest": "75daf3584a45c39880442a081749b2a2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 37132,
"upload_time": "2025-07-30T06:36:37",
"upload_time_iso_8601": "2025-07-30T06:36:37.872983Z",
"url": "https://files.pythonhosted.org/packages/4c/4f/80591542c23f78906a2d2be1bc4cf119629e52c48f711eee8d99a8faab02/databroom-0.3.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-30 06:36:37",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "onlozanoo",
"github_project": "databroom",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "pandas",
"specs": [
[
">=",
"1.3.0"
]
]
},
{
"name": "numpy",
"specs": [
[
">=",
"1.20.0"
]
]
},
{
"name": "streamlit",
"specs": [
[
">=",
"1.28.0"
]
]
},
{
"name": "unidecode",
"specs": [
[
">=",
"1.3.0"
]
]
},
{
"name": "jinja2",
"specs": [
[
">=",
"3.0.0"
]
]
},
{
"name": "pathlib2",
"specs": [
[
">=",
"2.3.0"
]
]
}
],
"lcname": "databroom"
}