dataprocessor_vb

Name: dataprocessor_vb
Version: 0.1.1 (PyPI)
Home page: None
Summary: A comprehensive data processing library.
Upload time: 2024-12-21 14:02:12
Maintainer: None
Docs URL: None
Author: Vicba
Requires Python: <4.0,>=3.12
License: None
Keywords: data, processing, cleaning, visualization, feature engineering
Requirements: No requirements were recorded.
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.
# Data Tools Package

A comprehensive library for data preprocessing in AI development, focusing on scalability, usability, and modular design.

## Features

- **Data Loading**: Efficiently load datasets in various formats.
- **Data Cleaning**: Handle missing values, outliers, and duplicates.
- **Feature Engineering**: Create new features using advanced techniques.
- **Categorical Processing**: One-hot and label encoding for categorical variables.
- **Scaling**: Normalize and standardize numerical features.
- **Outlier Handling**: Detect and remove outliers using IQR.
- **Text Processing**: Clean, tokenize, and vectorize text data.
- **Time Series Processing**: Create time-based features and resample data.
- **Image Processing**: Load, resize, normalize, and convert images.
- **Image Augmentation**: Apply transformations to increase the diversity of your training dataset.
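
The outlier handling listed above is based on the interquartile range. As a rough, library-independent sketch of Tukey's IQR rule in plain Python (a hypothetical helper, not this package's API):

```python
from statistics import quantiles

def remove_outliers_iqr(values, k=1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule, k=1.5 by default)."""
    q1, _, q3 = quantiles(values, n=4)  # first and third quartiles
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]

data = [10, 12, 11, 13, 12, 98]  # 98 is an obvious outlier
print(remove_outliers_iqr(data))  # -> [10, 12, 11, 13, 12]
```

Note that `statistics.quantiles` uses the exclusive method by default; libraries that compute quartiles differently will draw the fences at slightly different values.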

## Usage

```py
from dataprocessor import DataLoader, DataCleaner, FeatureEngineer, ImageProcessor, ImageAugmenter

# Example usage of the package
loader = DataLoader()
data = loader.load_csv("data.csv")

cleaner = DataCleaner()
cleaned_data = cleaner.clean(data)

# Image processing example
image = ImageProcessor.load_image("path/to/image.jpg")
resized_image = ImageProcessor.resize_image(image, (224, 224))
normalized_image = ImageProcessor.normalize_image(resized_image)

# Image augmentation example
augmented_image = ImageAugmenter.augment_image(normalized_image)

```
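
The example above does not exercise the categorical encoders. As a minimal plain-Python illustration of what one-hot encoding produces (a hypothetical `one_hot` helper for illustration, not this package's API):

```python
def one_hot(values):
    """Map each distinct category to a 0/1 indicator vector."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

cats, encoded = one_hot(["red", "green", "red", "blue"])
print(cats)     # -> ['blue', 'green', 'red']
print(encoded)  # -> [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Label encoding, by contrast, would map each category to a single integer index rather than an indicator vector.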

## Testing
```bash
poetry run pytest
```

# TODO

- Fix the file structure

# Package

[dataprocessor_vb on PyPI](https://pypi.org/project/dataprocessor_vb/)

1. Configure your PyPI credentials, if not already done:
```bash
poetry config pypi-token.pypi <your-api-token>
```

2. Build and publish the package:
```bash
poetry publish --build
```

3. Also make sure you add the token to your repository secrets under the repo settings on GitHub.

The version should probably be bumped manually; right now the patch number is incremented on every commit.
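
If the automatic bump is dropped, Poetry's standard `version` command can update `pyproject.toml` by hand (standard Poetry CLI, nothing specific to this repo):

```bash
# Bump the patch component (e.g. 0.1.1 -> 0.1.2)
poetry version patch

# Or set an explicit version
poetry version 0.2.0
```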
            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "dataprocessor_vb",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.12",
    "maintainer_email": null,
    "keywords": "data, processing, cleaning, visualization, feature engineering",
    "author": "Vicba",
    "author_email": "victor.barra@live.be",
    "download_url": "https://files.pythonhosted.org/packages/d6/04/df2534725b5491e62ce09ca4b1b8fc6a0a3ee3ec461e327db7093943ba8b/dataprocessor_vb-0.1.1.tar.gz",
    "platform": null,
    "description": "# Data Tools Package\n\nA comprehensive library for data preprocessing in AI development, focusing on scalability, usability, and modular design.\n\n## Features\n\n## Features\n\n- **Data Loading**: Efficiently load datasets in various formats.\n- **Data Cleaning**: Handle missing values, outliers, and duplicates.\n- **Feature Engineering**: Create new features using advanced techniques.\n- **Categorical Processing**: One-hot and label encoding for categorical variables.\n- **Scaling**: Normalize and standardize numerical features.\n- **Outlier Handling**: Detect and remove outliers using IQR.\n- **Text Processing**: Clean, tokenize, and vectorize text data.\n- **Time Series Processing**: Create time-based features and resample data.\n- **Image Processing**: Load, resize, normalize, and convert images.\n- **Image Augmentation**: Apply transformations to increase the diversity of your training dataset.\n\n## usage\n\n```py\nfrom dataprocessor import DataLoader, DataCleaner, FeatureEngineer, ImageProcessor, ImageAugmenter\n\n# Example usage of the package\nloader = DataLoader()\ndata = loader.load_csv(\"data.csv\")\n\ncleaner = DataCleaner()\ncleaned_data = cleaner.clean(data)\n\n# Image processing example\nimage = ImageProcessor.load_image(\"path/to/image.jpg\")\nresized_image = ImageProcessor.resize_image(image, (224, 224))\nnormalized_image = ImageProcessor.normalize_image(resized_image)\n\n# Image augmentation example\naugmented_image = ImageAugmenter.augment_image(normalized_image)\n\n```\n\n## testing\n```bash\npoetry run pytest\n```\n\n# TODO:\n- Fix file structure\n\n# Package\n\n[dataprocessor_vb pypi](https://pypi.org/project/dataprocessor_vb/)\n\n1. configure pypi credentials if not already done\n```bash\npoetry config pypi-token.pypi <your-api-token>\n```\n\n2. publish the package\n```bash\npoetry publish --build\n```\n\n3. make also sure you add token to secrets under your repo settings in github\n\nI think that the version should be updated manually, because now it updates the patch every commit.",
    "bugtrack_url": null,
    "license": null,
    "summary": "A comprehensive data processing library.",
    "version": "0.1.1",
    "project_urls": {
        "repository": "https://github.com/Vicba/data-preprocessing-package"
    },
    "split_keywords": [
        "data",
        " processing",
        " cleaning",
        " visualization",
        " feature engineering"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "838e52e682e0d0c10ca971efea3503c80034c30cbcbad9bebc6d8011e265a86a",
                "md5": "f8891061614444128f4a4bf75989e901",
                "sha256": "12b311487a80c0f71547d53acb46be8eb86440beb4a6cb34ed3a527921e57cec"
            },
            "downloads": -1,
            "filename": "dataprocessor_vb-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f8891061614444128f4a4bf75989e901",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.12",
            "size": 8592,
            "upload_time": "2024-12-21T14:02:10",
            "upload_time_iso_8601": "2024-12-21T14:02:10.653161Z",
            "url": "https://files.pythonhosted.org/packages/83/8e/52e682e0d0c10ca971efea3503c80034c30cbcbad9bebc6d8011e265a86a/dataprocessor_vb-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d604df2534725b5491e62ce09ca4b1b8fc6a0a3ee3ec461e327db7093943ba8b",
                "md5": "d55a694e12dc57f70a775e3960fce06b",
                "sha256": "71dd0a4153563127babbd5438a25470b0826a66673ec6861270a8212eab93da4"
            },
            "downloads": -1,
            "filename": "dataprocessor_vb-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "d55a694e12dc57f70a775e3960fce06b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.12",
            "size": 6183,
            "upload_time": "2024-12-21T14:02:12",
            "upload_time_iso_8601": "2024-12-21T14:02:12.972786Z",
            "url": "https://files.pythonhosted.org/packages/d6/04/df2534725b5491e62ce09ca4b1b8fc6a0a3ee3ec461e327db7093943ba8b/dataprocessor_vb-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-21 14:02:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Vicba",
    "github_project": "data-preprocessing-package",
    "github_not_found": true,
    "lcname": "dataprocessor_vb"
}
        