adcl


- Name: adcl
- Version: 0.1.7
- Home page: https://github.com/maykov-stepan/ADCL-Automatic-Data-Cleaning
- Summary: Data preprocessing and cleaning tools for data science projects
- Upload time: 2024-05-02 07:49:12
- Maintainer: None
- Docs URL: None
- Author: Maykov Stepan
- Requires Python: <3.8,>=3.7
- License: MIT
- Keywords: data cleaning, preprocessing, data science, machine learning
- Requirements: No requirements were recorded.
- Travis CI: No Travis.
- Coveralls test coverage: No coveralls.

# ADCL-Automatic-Data-Cleaning Project

## Overview
ADCL-Automatic-Data-Cleaning is a Python package for automated data cleaning. It draws in particular on deep learning techniques for the preprocessing tasks that data science and machine learning workflows depend on.

## Features
- **Data Preprocessing**: Standardize, normalize, and format your data for machine learning models.
- **Missing Value Imputation**: Implements various techniques for handling missing data in both cross-sectional and time-series datasets.
- **Outlier Detection**: Identifies and manages outliers using multiple strategies, improving the robustness of your models.
- **Encoding and Transformation**: Converts categorical data into a machine-readable format using various encoding techniques.
- **Time Series Handling**: Special functions for processing time-dependent data.
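
As a rough illustration of what "standardize" and "normalize" in the feature list usually mean, here is a minimal sketch using only the Python standard library. It is not ADCL's implementation; the function names `standardize` and `min_max_normalize` and the sample data are hypothetical and chosen only for this example.

```python
import statistics


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z_scores = standardize(heights_cm)       # centered on 0 with unit spread
scaled = min_max_normalize(heights_cm)   # smallest value -> 0.0, largest -> 1.0
```

Standardization suits models that assume roughly centered inputs, while min-max scaling is convenient when a bounded range is required.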

## Repository Structure
- **data_preprocessing/**: Contains the core library file `data_preprocessing.py` with all preprocessing functions.
- **examples/**: Includes `example_usage.ipynb`, a Jupyter notebook demonstrating how to use the preprocessing functions.
- **missing_values_imputation_test/**: Contains notebooks for testing missing value imputation across different data types.
- **outlier_detection_test/**: Contains notebooks for testing outlier detection across different data types.
- **LICENSE**: The project is open-sourced under the MIT license.

## Installation
To install ADCL directly from PyPI, run the following command:
```bash
pip install adcl
```
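
The package metadata for version 0.1.7 declares `requires_python` as `<3.8,>=3.7`, so `pip` will refuse to install it on other interpreters. The following sketch checks the running interpreter against that range before you attempt the install; the helper name `python_supported` is hypothetical and not part of ADCL.

```python
import sys


def python_supported(version_info=None):
    """Return True if the interpreter falls in the declared range >=3.7,<3.8."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return (3, 7) <= major_minor < (3, 8)


if not python_supported():
    print("adcl 0.1.7 declares Python >=3.7,<3.8; this interpreter is "
          "{}.{}".format(*sys.version_info[:2]))
```

If the check fails, one option is to create a dedicated Python 3.7 virtual environment for the package.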

## Usage
### Data Preprocessing
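Preprocess a dataset with `process_data`, which lives in `data_preprocessing.py` and is imported from the `adcl` package. For example:
```python
from adcl import process_data

# Path to the input CSV file
filepath = 'path_to_your_data.csv'
# Returns the train and test data along with the target column name and the date column
df_train, df_test, y_column_name, date_col = process_data(train_input=filepath)
```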

### Missing Value Handling
Handle missing values by choosing an imputation method from the library. An example for time series data, where `X_train` and `X_test` are your training and test feature sets:
```python
from adcl import missing_values_handling
# X_train / X_test: training and test feature sets; date_col comes from process_data
X_train_mis, X_test_mis = missing_values_handling(
    df_train=X_train, df_test=X_test, datetime_col=date_col, imputation_method='auto')
```
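
The function takes a train and a test set together, which is consistent with the common practice of learning the fill value on the training data only and reusing it on the test data. The sketch below illustrates that pattern with median imputation in plain Python. It is a swapped-in, simplified technique, not ADCL's `'auto'` method, and the helper names (`is_missing`, `fit_median`, `impute`) are hypothetical.

```python
import math
import statistics


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fit_median(train_column):
    """Median of the observed (non-missing) training values."""
    observed = [v for v in train_column if not is_missing(v)]
    return statistics.median(observed)


def impute(column, fill_value):
    """Replace every missing entry with the fill value learned on the training set."""
    return [fill_value if is_missing(v) else v for v in column]


train_col = [1.0, None, 3.0, 5.0]
test_col = [float("nan"), 4.0]

fill = fit_median(train_col)             # learned from training data only
train_filled = impute(train_col, fill)
test_filled = impute(test_col, fill)     # test data reuses the training median
```

Fitting on the training set alone keeps information from the test set out of the preprocessing step.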

### Outlier Detection
Detect outliers by choosing an appropriate method from the library. An example for time series data:
```python
from adcl import outlier_detection
X_train_out, X_test_out = outlier_detection(
    X_train=X_train, X_test=X_test, datetime_col=date_col,
    method='auto', nu=0.05, kernel='rbf', gamma='scale',
    n_neighbors=20, contamination='auto', n_estimators=100,
    encoding_dim=8, epochs=50, batch_size=32,
    window_size=20, dtw_window=None)
```
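
To make the train/test split in the call concrete, here is a minimal sketch of a much simpler detector, the interquartile range (IQR) rule, with bounds learned on training data and applied to test data. This is a stand-in technique for illustration only; it is not one of ADCL's methods, and the helper names (`quantile`, `fit_iqr_bounds`, `flag_outliers`) are hypothetical.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list, with 0 <= q <= 1."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def fit_iqr_bounds(train_values, k=1.5):
    """Learn [Q1 - k*IQR, Q3 + k*IQR] from the training values."""
    ordered = sorted(train_values)
    q1, q3 = quantile(ordered, 0.25), quantile(ordered, 0.75)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def flag_outliers(values, bounds):
    """Return True for every value that falls outside the learned bounds."""
    low, high = bounds
    return [not (low <= v <= high) for v in values]


train_values = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]
test_values = [12.5, 40.0, -3.0]

bounds = fit_iqr_bounds(train_values)
train_flags = flag_outliers(train_values, bounds)
test_flags = flag_outliers(test_values, bounds)   # judged against training bounds
```

The IQR rule looks at one column at a time and ignores time ordering, so it is only a baseline for comparison.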

### Categorical Variables Encoding
Encode categorical variables by choosing an appropriate encoding method from the library. An example usage:
```python
from adcl import encode_data
# Positional arguments must precede keyword arguments in Python, so the first
# three are passed positionally (assumed order: df_train, df_test, y_column_name).
X_train_enc, X_test_enc = encode_data(X_train, X_test, y_column_name,
                encoding_method='label', nu=0.05, kernel='rbf', gamma='scale',
                n_neighbors=20, contamination='auto', n_estimators=100,
                encoding_dim=8, epochs=50, batch_size=32)
```
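
For readers unfamiliar with label encoding (the `encoding_method='label'` option above), the sketch below shows the general idea in plain Python: each distinct training category gets an integer code, and the same mapping is reused for the test data. It is an illustration under stated assumptions, not ADCL's implementation. The helper names and the choice of `-1` for categories unseen during training are hypothetical.

```python
def fit_label_encoder(train_categories):
    """Map each distinct training category to an integer code, in sorted order."""
    distinct = sorted(set(train_categories))
    return {category: code for code, category in enumerate(distinct)}


def label_encode(categories, mapping, unknown=-1):
    """Encode categories with a fitted mapping; unseen categories get `unknown`."""
    return [mapping.get(category, unknown) for category in categories]


train_colors = ["red", "green", "blue", "green"]
test_colors = ["blue", "purple"]          # "purple" never appears in training

mapping = fit_label_encoder(train_colors)
train_encoded = label_encode(train_colors, mapping)
test_encoded = label_encode(test_colors, mapping)
```

Label codes impose an arbitrary order on the categories, which tree-based models tolerate better than linear ones.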

### Example Notebooks
For detailed examples, refer to the notebooks in the `examples/` directory, which walk through the package's functions step by step.

## Contributing
Contributions are welcome! If you have suggestions for improving the library, feel free to fork the repository and submit a pull request.

## License
This project is licensed under the MIT License - see the LICENSE file for details.

## Contact
For any queries or further information, please contact [steve19992@mail.ru](mailto:steve19992@mail.ru).

This README gives structured guidance on each part of the package so that users of all levels can integrate ADCL into their data cleaning and preprocessing workflows.

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/maykov-stepan/ADCL-Automatic-Data-Cleaning",
    "name": "adcl",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.8,>=3.7",
    "maintainer_email": null,
    "keywords": "data cleaning, preprocessing, data science, machine learning",
    "author": "Maykov Stepan",
    "author_email": "steve19992@mail.ru",
    "download_url": "https://files.pythonhosted.org/packages/ae/81/a3f52475a5f11e9c8350a83acb31a7666a05e8b01408d8de118d102f2f0f/adcl-0.1.7.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Data preprocessing and cleaning tools for data science projects",
    "version": "0.1.7",
    "project_urls": {
        "Homepage": "https://github.com/maykov-stepan/ADCL-Automatic-Data-Cleaning"
    },
    "split_keywords": [
        "data cleaning",
        " preprocessing",
        " data science",
        " machine learning"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "42d25ab106ee767eb3ce7cc5b27b2e74188cb1e510ff453ce8545cafb9a1caa1",
                "md5": "291a1f9ad7577452315afc2bcc38d3a0",
                "sha256": "f3b0c73db3fdcebcc875f1e4d207295cc5bcd4b17be840bed0b6e9747427004d"
            },
            "downloads": -1,
            "filename": "adcl-0.1.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "291a1f9ad7577452315afc2bcc38d3a0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.8,>=3.7",
            "size": 15171,
            "upload_time": "2024-05-02T07:49:11",
            "upload_time_iso_8601": "2024-05-02T07:49:11.055486Z",
            "url": "https://files.pythonhosted.org/packages/42/d2/5ab106ee767eb3ce7cc5b27b2e74188cb1e510ff453ce8545cafb9a1caa1/adcl-0.1.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ae81a3f52475a5f11e9c8350a83acb31a7666a05e8b01408d8de118d102f2f0f",
                "md5": "3d8270358e8bbf3f714d1935466a61c3",
                "sha256": "b98f09929b3657061260ad3d430b8ccba204084788966a39fe605d83b961657c"
            },
            "downloads": -1,
            "filename": "adcl-0.1.7.tar.gz",
            "has_sig": false,
            "md5_digest": "3d8270358e8bbf3f714d1935466a61c3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.8,>=3.7",
            "size": 16435,
            "upload_time": "2024-05-02T07:49:12",
            "upload_time_iso_8601": "2024-05-02T07:49:12.917252Z",
            "url": "https://files.pythonhosted.org/packages/ae/81/a3f52475a5f11e9c8350a83acb31a7666a05e8b01408d8de118d102f2f0f/adcl-0.1.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-02 07:49:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "maykov-stepan",
    "github_project": "ADCL-Automatic-Data-Cleaning",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "adcl"
}
        