automate-machinelearning


Nameautomate-machinelearning JSON
Version 0.0.5 PyPI version JSON
download
home_pagehttps://github.com/arvind444/auto_ml
SummaryA library package used to automate the machine learning on regression and classification problems.
upload_time2023-01-05 05:55:37
maintainer
docs_urlNone
authorArvindkumar S P
requires_python>=3.9
licenseMIT
keywords machine_learning correlation outlier_handling data scaling encoding
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # auto_ml
A library package used to automate the machine learning on regression and classification problems.

"""The function data_summary() is used to provide a basic details of the 
       given tabular dataset.
       
       data_summary(data) function contains one argument.
       
       data -> We need to pass the entire dataframe to this function. (type = dataframe)
       
       Example: data_summary(data = dataframe_name)"""
       
       """The function data_eda() is used to provide a basic exploratory data analysis of the 
       given tabular dataset using matplotlib and seaborn library.
       
       data_eda(data, target_column_name, ml_type) function contains three argument.
       
       data -> We need to pass the entire dataframe to this function. (type = dataframe)
       target_column_name -> We need to pass the column name of the target variable (y). (type = string)
       regression -> We need to pass the supervised learning type based on the target_column_name. (type = bool)
                   True -> target_column is numerical.
                   False -> target_column is discrete or categorical.
       
       Example: data_eda(data = dataframe_name, target_column_name = column_name, regression = True)
                data_eda(data = dataframe_name, target_column_name = column_name, regression = False)"""
                
        """The function train_data_handling() is used to handle null values and outliers in the dataset.
       This function is only used for training dataset. Don't use it for test dataset.
       It seperates the given data into numerical data, discrete data and categorical data.
       
       For numerical data, the null values are filled based on the skewness of the data.
       For discrete and categorical data the null values are filled randomly based on their unique values.
       
       The outliers in numerical data are cured by quantile or percentile method by passing the value between 0.10 to 0.90.
       
       train_data_handling(data, file_to_dump, regression, target_column_name, min_outlier_fill, max_outlier_fill)
       This function contains six arguments.
           
           data -> We need to pass the entire train dataframe. (type = dataframe)
           file_to_dump -> We need to pass the file name to dump the handling_na_values and handling_outliers values for 
                           future use. (type = str)
           regression -> We need to pass the supervised learning type based on the target_column_name. (type = bool)
                   True -> target_column is numerical.
                   False -> target_column is discrete or categorical.
           target_colum_name -> We need to pass the column name of the target variable (y). (type = string)
           min_outlier_fill -> We need to pass the quantile value to replace the minimum outlier. (type = float or int)
                               range = 0.10 to 0.50
           max_outlier_fill -> We need to pass the quantile value to replace the maximum outlier. (type = float or int)
                               range = 0.50 to 0.90
                               
            This function will return a dataframe which has no null values and outliers.
            
            Example: train_data_handling(data = dataframe, file_to_dump = "file name", regression = True, target_column_name = column_name)
                     train_data_handling(data = dataframe, file_to_dump = "file name", regression = False, target_column_name = column_name,
                     min_outlier_fill = 0.45, max_outlier_fill = 0.85)"""
        
        """The function test_data_handling() is used to handle the null values and outliers
       on the test data based on the file created by the function train_data_handling().
       
       test_data_handling(data, na_file: str, outlier_file: str, regression: bool, min_outlier_fill = 0.10, max_outlier_fill = 0.90)
           This function contains six arguments.
           
           data -> We need to pass the entire train dataframe. (type = dataframe)
           na_file -> We need to pass the file name to load the handling_na_values to fill null values in test data. (type = str)
           outlier_file -> We need to pass the file name to load the handling_outliers values to replace outliers in test data. (type = str)
           regression -> We need to pass the supervised learning type based on the target_column_name. (type = bool)
                   True -> target_column is numerical.
                   False -> target_column is discrete or categorical.
           min_outlier_fill -> We need to pass the quantile value to replace the minimum outlier. (type = float or int)
                               range = 0.10 to 0.50
           max_outlier_fill -> We need to pass the quantile value to replace the maximum outlier. (type = float or int)
                               range = 0.50 to 0.90
                               
            This function will return a dataframe which has no null values and outliers.
            
            Example: train_data_handling(data = dataframe, na_file = "file name", outlier_file = "file name", regression = True)
                     train_data_handling(data = dataframe, na_file = "file name", outlier_file = "file name", regression = False, target_column_name = column_name,
                     min_outlier_fill = 0.45, max_outlier_fill = 0.85)"""
                     
        """The function data_corr() is used to get the correlation and p_value
       using spearman method.Because it can see both linear and non-linear data correlation.
       This function will neglect datetime columns present in the given data.
       It produce a dataframe for correlation and p_value and stored in the given file name
       
       data_corr(data, target_column_name, corr_file_name) contains three arguments.
           
           data -> We need to pass entire train dataframe. (type = dataframe)
           target_column_name -> We need to provide the column name of the target variable (y). (type = str)
           corr_file_name -> We need to pass a name to save the generated correlation dataframe. (type = str)
           
           This function will return a dataframe which does not contain any datetime column in the given dataframe.
           
           Example: data_corr(data = dataframe, target_column_name = "column_name", corr_file_name = "file_name")"""
           
     """The function data_encode_and_scale() is used to encode the categorical data 
       and scale the dataframe using standard scaler.
       
       If the test data is given in a dataframe this fuction will return two dataframe.
       
       data_encode_and_scale(data, test_data, target_column_name: str, correlation_file_name: str, confidence_level: int, file_name: str)
                             contains six arguments.
                             
                data -> We need to pass the entire dataframe. (type = dataframe)
                test_data -> We need to pass the test dataframe if available or any python object. (type = dataframe)
                target_column_name -> We need to provide the column name of the target variable (y). (type = str)
                correlation_file_name -> We need to pass the file that is generated by the function train_data_handling(). (type = str)
                confidence_level -> We need to pass an integer between 0 to 100 to get the p_value to select the features from dataframe. (type = int)
                file_name -> We need to pass a file name to save the encoding and scaling method to use in future.
                
        Example : data_encode_and_scale(data = "dataframe", test_data = "dataframe", target_column_name = "column_name", correlation_file_name: "correlation_file", confidence_level = 94, file_name = "file_name")
                    It will return a two dataframe which are scaled train dataframe and scaled test dataframe"""
    
    
     """The function train_model() is used to train the machine learning model
       for the given data.
       
       train_model(data, file_name: str, target_column_name: str, regression: bool) contains four arguments
       
           data -> We need to provide entire dataframe. (type = dataframe)
           file_name -> We need to provide the file name to save the model for future use. (type = str)
           target_column_name -> We need to provide the name of the target column name. (type = str)
           regression -> We need to provide a boolean data type whether the ml model is regression or classification. (type = bool)
           
           The function train_model() will return a dataframe that contains the information about the trained model.
           
               Example: train_model(data = dataframe, file_name = "file_name", target_column_name = "column_name", regression = True)
                        train_model(data = dataframe, file_name = "file_name", target_column_name = "column_name", regression = False)"""
    
    
     
    """The function test_model() is used to predict the target variable for the test data.
    
       test_model(data, test_data, train_file_name: str, target_column_name: str) contains four arguments
       
           data -> We need to provide the entire dataframe on which the train_model() is built. (type = dataframe)
           test_data -> We need to provide the test dataframe to predict the target variable. (type = dataframe)
           train_file_name -> We need to provide the any one train file name which is generated by train_model() function. (type = str)
           target_column_name -> We need to provide the target column name of the given data. (type = str)
           
           The function test_model() will return an array of predicted value for the given test data.
           
               Example: test_model(data = dataframe, test_model = test_dataframe, train_file_name = "file_name", target_column_name = "column_name")"""
 

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/arvind444/auto_ml",
    "name": "automate-machinelearning",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": "",
    "keywords": "machine_learning,correlation,outlier_handling,data,scaling,encoding",
    "author": "Arvindkumar S P",
    "author_email": "arvindkumar1998ece@gmail.com",
    "download_url": "",
    "platform": null,
    "description": "# auto_ml\nA library package used to automate the machine learning on regression and classification problems.\n\n\"\"\"The function data_summary() is used to provide a basic details of the \n       given tabular dataset.\n       \n       data_summary(data) function contains one argument.\n       \n       data -> We need to pass the entire dataframe to this function. (type = dataframe)\n       \n       Example: data_summary(data = dataframe_name)\"\"\"\n       \n       \"\"\"The function data_eda() is used to provide a basic exploratory data analysis of the \n       given tabular dataset using matplotlib and seaborn library.\n       \n       data_eda(data, target_column_name, ml_type) function contains three argument.\n       \n       data -> We need to pass the entire dataframe to this function. (type = dataframe)\n       target_column_name -> We need to pass the column name of the target variable (y). (type = string)\n       regression -> We need to pass the supervised learning type based on the target_column_name. (type = bool)\n                   True -> target_column is numerical.\n                   False -> target_column is discrete or categorical.\n       \n       Example: data_eda(data = dataframe_name, target_column_name = column_name, regression = True)\n                data_eda(data = dataframe_name, target_column_name = column_name, regression = False)\"\"\"\n                \n        \"\"\"The function train_data_handling() is used to handle null values and outliers in the dataset.\n       This function is only used for training dataset. Don't use it for test dataset.\n       It seperates the given data into numerical data, discrete data and categorical data.\n       \n       For numerical data, the null values are filled based on the skewness of the data.\n       For discrete and categorical data the null values are filled randomly based on their unique values.\n       \n       The outliers in numerical data are cured by quantile or percentile method by passing the value between 0.10 to 0.90.\n       \n       train_data_handling(data, file_to_dump, regression, target_column_name, min_outlier_fill, max_outlier_fill)\n       This function contains six arguments.\n           \n           data -> We need to pass the entire train dataframe. (type = dataframe)\n           file_to_dump -> We need to pass the file name to dump the handling_na_values and handling_outliers values for \n                           future use. (type = str)\n           regression -> We need to pass the supervised learning type based on the target_column_name. (type = bool)\n                   True -> target_column is numerical.\n                   False -> target_column is discrete or categorical.\n           target_colum_name -> We need to pass the column name of the target variable (y). (type = string)\n           min_outlier_fill -> We need to pass the quantile value to replace the minimum outlier. (type = float or int)\n                               range = 0.10 to 0.50\n           max_outlier_fill -> We need to pass the quantile value to replace the maximum outlier. (type = float or int)\n                               range = 0.50 to 0.90\n                               \n            This function will return a dataframe which has no null values and outliers.\n            \n            Example: train_data_handling(data = dataframe, file_to_dump = \"file name\", regression = True, target_column_name = column_name)\n                     train_data_handling(data = dataframe, file_to_dump = \"file name\", regression = False, target_column_name = column_name,\n                     min_outlier_fill = 0.45, max_outlier_fill = 0.85)\"\"\"\n        \n        \"\"\"The function test_data_handling() is used to handle the null values and outliers\n       on the test data based on the file created by the function train_data_handling().\n       \n       test_data_handling(data, na_file: str, outlier_file: str, regression: bool, min_outlier_fill = 0.10, max_outlier_fill = 0.90)\n           This function contains six arguments.\n           \n           data -> We need to pass the entire train dataframe. (type = dataframe)\n           na_file -> We need to pass the file name to load the handling_na_values to fill null values in test data. (type = str)\n           outlier_file -> We need to pass the file name to load the handling_outliers values to replace outliers in test data. (type = str)\n           regression -> We need to pass the supervised learning type based on the target_column_name. (type = bool)\n                   True -> target_column is numerical.\n                   False -> target_column is discrete or categorical.\n           min_outlier_fill -> We need to pass the quantile value to replace the minimum outlier. (type = float or int)\n                               range = 0.10 to 0.50\n           max_outlier_fill -> We need to pass the quantile value to replace the maximum outlier. (type = float or int)\n                               range = 0.50 to 0.90\n                               \n            This function will return a dataframe which has no null values and outliers.\n            \n            Example: train_data_handling(data = dataframe, na_file = \"file name\", outlier_file = \"file name\", regression = True)\n                     train_data_handling(data = dataframe, na_file = \"file name\", outlier_file = \"file name\", regression = False, target_column_name = column_name,\n                     min_outlier_fill = 0.45, max_outlier_fill = 0.85)\"\"\"\n                     \n        \"\"\"The function data_corr() is used to get the correlation and p_value\n       using spearman method.Because it can see both linear and non-linear data correlation.\n       This function will neglect datetime columns present in the given data.\n       It produce a dataframe for correlation and p_value and stored in the given file name\n       \n       data_corr(data, target_column_name, corr_file_name) contains three arguments.\n           \n           data -> We need to pass entire train dataframe. (type = dataframe)\n           target_column_name -> We need to provide the column name of the target variable (y). (type = str)\n           corr_file_name -> We need to pass a name to save the generated correlation dataframe. (type = str)\n           \n           This function will return a dataframe which does not contain any datetime column in the given dataframe.\n           \n           Example: data_corr(data = dataframe, target_column_name = \"column_name\", corr_file_name = \"file_name\")\"\"\"\n           \n     \"\"\"The function data_encode_and_scale() is used to encode the categorical data \n       and scale the dataframe using standard scaler.\n       \n       If the test data is given in a dataframe this fuction will return two dataframe.\n       \n       data_encode_and_scale(data, test_data, target_column_name: str, correlation_file_name: str, confidence_level: int, file_name: str)\n                             contains six arguments.\n                             \n                data -> We need to pass the entire dataframe. (type = dataframe)\n                test_data -> We need to pass the test dataframe if available or any python object. (type = dataframe)\n                target_column_name -> We need to provide the column name of the target variable (y). (type = str)\n                correlation_file_name -> We need to pass the file that is generated by the function train_data_handling(). (type = str)\n                confidence_level -> We need to pass an integer between 0 to 100 to get the p_value to select the features from dataframe. (type = int)\n                file_name -> We need to pass a file name to save the encoding and scaling method to use in future.\n                \n        Example : data_encode_and_scale(data = \"dataframe\", test_data = \"dataframe\", target_column_name = \"column_name\", correlation_file_name: \"correlation_file\", confidence_level = 94, file_name = \"file_name\")\n                    It will return a two dataframe which are scaled train dataframe and scaled test dataframe\"\"\"\n    \n    \n     \"\"\"The function train_model() is used to train the machine learning model\n       for the given data.\n       \n       train_model(data, file_name: str, target_column_name: str, regression: bool) contains four arguments\n       \n           data -> We need to provide entire dataframe. (type = dataframe)\n           file_name -> We need to provide the file name to save the model for future use. (type = str)\n           target_column_name -> We need to provide the name of the target column name. (type = str)\n           regression -> We need to provide a boolean data type whether the ml model is regression or classification. (type = bool)\n           \n           The function train_model() will return a dataframe that contains the information about the trained model.\n           \n               Example: train_model(data = dataframe, file_name = \"file_name\", target_column_name = \"column_name\", regression = True)\n                        train_model(data = dataframe, file_name = \"file_name\", target_column_name = \"column_name\", regression = False)\"\"\"\n    \n    \n     \n    \"\"\"The function test_model() is used to predict the target variable for the test data.\n    \n       test_model(data, test_data, train_file_name: str, target_column_name: str) contains four arguments\n       \n           data -> We need to provide the entire dataframe on which the train_model() is built. (type = dataframe)\n           test_data -> We need to provide the test dataframe to predict the target variable. (type = dataframe)\n           train_file_name -> We need to provide the any one train file name which is generated by train_model() function. (type = str)\n           target_column_name -> We need to provide the target column name of the given data. (type = str)\n           \n           The function test_model() will return an array of predicted value for the given test data.\n           \n               Example: test_model(data = dataframe, test_model = test_dataframe, train_file_name = \"file_name\", target_column_name = \"column_name\")\"\"\"\n \n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A library package used to automate the machine learning on regression and classification problems.",
    "version": "0.0.5",
    "split_keywords": [
        "machine_learning",
        "correlation",
        "outlier_handling",
        "data",
        "scaling",
        "encoding"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a475091ff0980af268d7a7a4879bc673752d7423d3eb1e786f55b5e740457794",
                "md5": "92b8f14ea2dacb9fb2227b2a01a16fa1",
                "sha256": "4c651433c20802b3d33f58d7f543c3af9978ca26901ee21dc5d1654b5a749fa9"
            },
            "downloads": -1,
            "filename": "automate_machinelearning-0.0.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "92b8f14ea2dacb9fb2227b2a01a16fa1",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 12281,
            "upload_time": "2023-01-05T05:55:37",
            "upload_time_iso_8601": "2023-01-05T05:55:37.519918Z",
            "url": "https://files.pythonhosted.org/packages/a4/75/091ff0980af268d7a7a4879bc673752d7423d3eb1e786f55b5e740457794/automate_machinelearning-0.0.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-05 05:55:37",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "arvind444",
    "github_project": "auto_ml",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "automate-machinelearning"
}
        
Elapsed time: 0.02555s