easydata-ds

- **Name**: easydata-ds
- **Version**: 0.1.0
- **Home page**: https://github.com/coleragone/easydata
- **Summary**: A Python library for data scientists to easily apply functions to datasets with a terminal UI
- **Upload time**: 2025-10-24 03:34:14
- **Author**: Cole Ragone
- **Requires Python**: >=3.7
- **License**: MIT
- **Keywords**: data science, pandas, terminal ui, data processing, decorator

# EasyData - Data Science Function Runner

EasyData is a Python library that makes it easy for data scientists to create reusable functions and run them on datasets through a beautiful terminal interface.

## Features

- 🎯 **Simple Decorator**: Wrap your data science functions with `@data_function`
- 🖥️ **Terminal UI**: Browse directories and select datasets interactively
- 📊 **Progress Tracking**: Built-in progress bars for long-running operations
- 📁 **Multiple Formats**: Support for CSV, Excel, JSON, Parquet, and TSV files
- 💾 **Easy Output**: Save results back to files in a single step
- 🔍 **File Preview**: See file information and sample data before processing

## Installation

### From PyPI
```bash
pip install easydata-ds
```

### From Source
```bash
git clone https://github.com/coleragone/easydata.git
cd easydata
pip install -e .
```

### Development Installation
```bash
git clone https://github.com/coleragone/easydata.git
cd easydata
pip install -e ".[dev]"
```

## Quick Start

1. **Create a Python script with decorated functions:**

```python
import pandas as pd
from easyData import data_function, run_data_functions

@data_function(
    description="Add True/False column based on condition",
    input_types=['csv', 'xlsx'],
    output_types=['csv'],
    progress_enabled=True
)
def tag_condition(data):
    """Tag rows where value > 100"""
    data['is_high_value'] = data['value'] > 100
    return data

if __name__ == "__main__":
    run_data_functions()
```

2. **Run your script:**

```bash
python your_script.py
```

3. **Use the terminal UI:**
   - Select your function from the list
   - Browse directories to find your dataset
   - Preview file information and sample data
   - Run the function with progress tracking
   - Save results to a new file
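
To try the steps above without a dataset at hand, a small CSV with the `value` column that `tag_condition` expects can be generated with the standard library. This is only a convenience sketch: the file name `sample_values.csv` and the `id` column are made up for illustration.

```python
import csv
from pathlib import Path


def write_sample_csv(path="sample_values.csv"):
    """Write a tiny CSV with a numeric `value` column and return its path."""
    # Hypothetical sample rows; only `value` is required by tag_condition.
    rows = [
        {"id": 1, "value": 50},
        {"id": 2, "value": 150},
        {"id": 3, "value": 100},
    ]
    target = Path(path)
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "value"])
        writer.writeheader()
        writer.writerows(rows)
    return target


if __name__ == "__main__":
    print(f"Wrote {write_sample_csv()}")
```

Because `tag_condition` uses a strict `> 100` comparison, only the row with `value` 150 should end up with `is_high_value` set to `True`.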

## Decorator Parameters

The `@data_function` decorator accepts several parameters:

- `description`: Human-readable description of what the function does
- `input_types`: List of supported input file types (e.g., `['csv', 'xlsx', 'json']`)
- `output_types`: List of supported output file types (e.g., `['csv', 'xlsx']`)
- `progress_enabled`: Whether to show progress bars (default: `True`)
- `batch_size`: Number of rows to process at once for progress tracking (default: `1000`)
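
One way a decorator with these parameters could work is to record the metadata in a registry that a runner reads later. The sketch below is a hypothetical, standard-library illustration and not EasyData's actual implementation: `REGISTRY`, `FunctionSpec`, and `register_function` are made-up names, and the defaults for the type lists are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical registry; EasyData's internals may differ.
REGISTRY: Dict[str, "FunctionSpec"] = {}


@dataclass
class FunctionSpec:
    """Metadata stored for one registered function."""
    func: Callable
    description: str = ""
    # Assumed defaults; the README only documents the last two defaults.
    input_types: List[str] = field(default_factory=lambda: ["csv"])
    output_types: List[str] = field(default_factory=lambda: ["csv"])
    progress_enabled: bool = True
    batch_size: int = 1000


def register_function(**options):
    """Decorator factory: store the function with its options, return it unchanged."""
    def decorator(func):
        REGISTRY[func.__name__] = FunctionSpec(func=func, **options)
        return func
    return decorator


@register_function(description="Double every number", input_types=["csv"], batch_size=500)
def double(values):
    return [v * 2 for v in values]
```

Returning the function unchanged keeps it callable as plain Python, while a UI can list `REGISTRY` entries by their `description`.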

## Example Functions

### Data Tagging
```python
from easyData import data_function

@data_function(
    description="Tag high-value customers",
    input_types=['csv', 'xlsx'],
    progress_enabled=True
)
def tag_high_value_customers(data):
    # Flag rows whose revenue exceeds 10,000
    data['is_high_value'] = data['revenue'] > 10000
    return data
```

### Data Cleaning
```python
from easyData import data_function

@data_function(
    description="Clean and standardize text data",
    input_types=['csv'],
    batch_size=500
)
def clean_text(data):
    # Lowercase and trim whitespace in every text (object-dtype) column
    text_cols = data.select_dtypes(include=['object']).columns
    for col in text_cols:
        data[col] = data[col].str.lower().str.strip()
    return data
```
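
The `batch_size=500` setting suggests that rows are handled in chunks so that a progress bar can advance once per chunk. How EasyData does this internally is not documented here; the following is a minimal sketch of the general idea, with `iter_batches` as a hypothetical helper.

```python
def iter_batches(total_rows, batch_size=1000):
    """Yield (start, stop) row ranges that cover total_rows in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total_rows, batch_size):
        # The last batch may be shorter than batch_size.
        yield start, min(start + batch_size, total_rows)


if __name__ == "__main__":
    total = 1200
    for start, stop in iter_batches(total, batch_size=500):
        # A runner could apply the function to rows[start:stop] here
        # and then advance its progress bar by (stop - start).
        print(f"processed rows {start}-{stop - 1} of {total}")
```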

### Statistical Analysis
```python
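from easyData import data_function

@data_function(
    description="Calculate summary statistics",
    input_types=['csv', 'xlsx', 'json'],
    progress_enabled=False
)
def calculate_stats(data):
    # By default, describe() summarizes numeric columns (count, mean, std, min, quartiles, max)
    return data.describe()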
```

## Supported File Formats

**Input Formats:**
- CSV (`.csv`)
- Excel (`.xlsx`, `.xls`)
- JSON (`.json`)
- Parquet (`.parquet`)
- TSV (`.tsv`)

**Output Formats:**
- CSV (`.csv`)
- Excel (`.xlsx`)
- JSON (`.json`)
- Parquet (`.parquet`)
- TSV (`.tsv`)
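
Each of the formats listed above has a matching pandas reader. The mapping below is a sketch of how a file extension could be resolved to one; `READERS`, `resolve_reader`, and `load_dataset` are hypothetical names, and EasyData's own loading code may differ.

```python
from pathlib import Path

# Extension -> (pandas reader name, extra keyword arguments).
READERS = {
    ".csv": ("read_csv", {}),
    ".tsv": ("read_csv", {"sep": "\t"}),
    ".xlsx": ("read_excel", {}),
    ".xls": ("read_excel", {}),
    ".json": ("read_json", {}),
    ".parquet": ("read_parquet", {}),
}


def resolve_reader(path):
    """Return the (reader name, kwargs) pair for a file, based on its suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return READERS[suffix]


def load_dataset(path):
    """Load a file into a DataFrame using the resolved pandas reader."""
    import pandas as pd  # imported lazily so resolve_reader works without pandas

    reader_name, kwargs = resolve_reader(path)
    return getattr(pd, reader_name)(path, **kwargs)
```

Note that pandas relies on optional packages for some formats: reading Excel files needs an engine such as `openpyxl`, and Parquet needs `pyarrow` or `fastparquet`.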

## Terminal UI Features

- **Function Selection**: Choose from available decorated functions
- **Directory Browsing**: Navigate through folders to find datasets
- **File Preview**: See file size, type, columns, and sample data
- **Progress Tracking**: Visual progress bars for long operations
- **Result Saving**: Save processed data to new files
- **Error Handling**: Clear error messages and recovery options
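
The File Preview item can be pictured as a small function that gathers size, type, column names, and a few rows. The version below is a sketch for CSV files only, using the standard library; `preview_csv` is a hypothetical helper and not part of EasyData.

```python
import csv
from pathlib import Path


def preview_csv(path, sample_rows=5):
    """Return size, type, column names, and the first few rows of a CSV file."""
    target = Path(path)
    with target.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        columns = next(reader, [])  # header row, or [] for an empty file
        # Read at most sample_rows data rows without loading the whole file.
        sample = [row for _, row in zip(range(sample_rows), reader)]
    return {
        "size_bytes": target.stat().st_size,
        "type": target.suffix.lstrip(".").lower(),
        "columns": columns,
        "sample": sample,
    }
```

Reading only the header and a handful of rows keeps the preview fast even for large files.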

## Requirements

- Python 3.7+
- pandas
- rich (for beautiful terminal UI)
- click (for command-line interface)
- tqdm (for progress bars)

## Publishing

To build and publish this package:

1. **Build the package:**
   ```bash
   python build.py
   ```

2. **Publish to PyPI:**
   ```bash
   python publish.py
   ```

3. **Test installation:**
   ```bash
   pip install easydata-ds
   ```

## Development

### Setup Development Environment
```bash
git clone https://github.com/coleragone/easydata.git
cd easydata
pip install -e ".[dev]"
```

### Run Tests
```bash
pytest
```

### Code Formatting
```bash
black easydata/
flake8 easydata/
```

## Contributing

This is an early-stage project, and contributions are welcome. You can help by:
- Adding new file format support
- Improving the terminal UI
- Adding more example functions
- Enhancing error handling
- Writing tests
- Improving documentation

## License

MIT License - feel free to use this in your projects!

## Changelog

### v0.1.0 (2024-01-XX)
- Initial release
- Basic decorator functionality
- Terminal UI for file browsing
- Support for CSV, Excel, JSON, Parquet, TSV files
- Progress tracking for long operations
- Command-line interface


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/coleragone/easydata",
    "name": "easydata-ds",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "Cole Ragone <coleragone@example.com>",
    "keywords": "data science, pandas, terminal ui, data processing, decorator",
    "author": "Cole Ragone",
    "author_email": "Cole Ragone <coleragone@example.com>",
    "download_url": "https://files.pythonhosted.org/packages/a7/b8/45dba388d85630acbf7fac773df9af9ed727ff27af40824f520f48e7da74/easydata_ds-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A Python library for data scientists to easily apply functions to datasets with a terminal UI",
    "version": "0.1.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/coleragone/easydata/issues",
        "Documentation": "https://github.com/coleragone/easydata#readme",
        "Homepage": "https://github.com/coleragone/easydata",
        "Repository": "https://github.com/coleragone/easydata"
    },
    "split_keywords": [
        "data science",
        " pandas",
        " terminal ui",
        " data processing",
        " decorator"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3a033fa815523d912246e77ebaf5e908cd28ff417e46241e104aa011742183e6",
                "md5": "ff5fd031c66d9f573f994bb6e25e7c48",
                "sha256": "69f95461f5e6c73be22a9c7be07a928d81cd8a44740d5274e5d5f42c5f455121"
            },
            "downloads": -1,
            "filename": "easydata_ds-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ff5fd031c66d9f573f994bb6e25e7c48",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 12210,
            "upload_time": "2025-10-24T03:34:12",
            "upload_time_iso_8601": "2025-10-24T03:34:12.872498Z",
            "url": "https://files.pythonhosted.org/packages/3a/03/3fa815523d912246e77ebaf5e908cd28ff417e46241e104aa011742183e6/easydata_ds-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a7b845dba388d85630acbf7fac773df9af9ed727ff27af40824f520f48e7da74",
                "md5": "587b20e86c441792217d3a771f0687b7",
                "sha256": "1f096806f7dc63071db009282830233473bdf4292d31804f3e2054ed0ee3c30d"
            },
            "downloads": -1,
            "filename": "easydata_ds-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "587b20e86c441792217d3a771f0687b7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 14372,
            "upload_time": "2025-10-24T03:34:14",
            "upload_time_iso_8601": "2025-10-24T03:34:14.228507Z",
            "url": "https://files.pythonhosted.org/packages/a7/b8/45dba388d85630acbf7fac773df9af9ed727ff27af40824f520f48e7da74/easydata_ds-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-24 03:34:14",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "coleragone",
    "github_project": "easydata",
    "github_not_found": true,
    "lcname": "easydata-ds"
}