featransform


| Field | Value |
|---|---|
| Name | featransform |
| Version | 0.9.16 |
| Home page | https://github.com/TsLu1s/Featransform |
| Summary | Featransform is an automated feature engineering framework for supervised machine learning |
| Upload time | 2024-04-19 22:16:24 |
| Maintainer | None |
| Docs URL | None |
| Author | Luís Santos |
| Requires Python | None |
| License | MIT |
| Keywords | data science, machine learning, data processing, feature engineering, feature selection, feature construction, feature optimization, automated feature engineering, automated machine learning, predictive modeling |
| Requirements | atlantic, catboost, pandas, numpy, scikit-learn, h2o, xgboost, optuna, statsmodels, umap-learn |
| Travis-CI | No |
| Coveralls test coverage | No |

<br>
<p align="center">
  <h2 align="center">Featransform - Automated Feature Engineering Framework for Supervised Machine Learning</h2>
</p>
<br>
  
## Framework Contextualization <a name = "ta"></a>

The `Featransform` project offers an integrated approach to automating feature engineering. It combines several pattern recognition techniques from Machine Learning: dimensionality reduction, anomaly detection, clustering, and datetime feature construction. The package applies an ensemble of methods for each technique and appends their outputs as engineered features alongside the original input features.
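
To make the idea of appending engineered features concrete, here is a minimal sketch built directly on scikit-learn. It is not the `Featransform` implementation: the function name `add_engineered_features`, the output column names, and the chosen estimators and hyperparameters are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest


def add_engineered_features(X: pd.DataFrame, random_state: int = 0) -> pd.DataFrame:
    """Append PCA, anomaly-score and cluster-label columns to a numeric frame.

    Hypothetical helper: names and settings are illustrative assumptions.
    """
    out = X.copy()

    # Dimensionality reduction: first two principal components of the inputs.
    components = PCA(n_components=2, random_state=random_state).fit_transform(X)
    out["pca_0"], out["pca_1"] = components[:, 0], components[:, 1]

    # Anomaly detection: higher score_samples values mean "more normal".
    forest = IsolationForest(n_estimators=100, random_state=random_state).fit(X)
    out["iforest_score"] = forest.score_samples(X)

    # Clustering: cluster membership stored as an integer label.
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=random_state).fit(X)
    out["kmeans_label"] = kmeans.labels_
    return out


rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["a", "b", "c", "d"])
X_aug = add_engineered_features(X)
print(X_aug.shape)  # (200, 8): 4 original + 2 PCA + 1 anomaly score + 1 cluster label
```

For brevity the sketch fits and transforms the same frame. In a train/test setting, the estimators would be fitted on the training data only and then reused on new data.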

To avoid feeding noisy data to predictive models, the engineered features are concatenated with the original features and then filtered by a backward wrapper feature selection, also known as backward elimination. This step iteratively removes features according to their evaluated relevance, keeping only the columns that are valuable for improving the performance of future models.
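
Backward elimination can be sketched in a few lines. The version below is a generic illustration, not the selection logic used inside `Featransform`: the choice of a random forest, mean absolute error as the metric, the 30% validation split, and the stopping rule are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split


def backward_eliminate(X, y, max_iters=10, random_state=0):
    """Drop the least important feature while validation error does not get worse."""
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.3, random_state=random_state
    )
    kept = list(X.columns)

    def fit_and_score(cols):
        model = RandomForestRegressor(n_estimators=50, random_state=random_state)
        model.fit(X_tr[cols], y_tr)
        return model, mean_absolute_error(y_va, model.predict(X_va[cols]))

    model, best_mae = fit_and_score(kept)
    for _ in range(max_iters):
        if len(kept) == 1:
            break
        # Candidate removal: the feature with the lowest importance.
        weakest = kept[int(np.argmin(model.feature_importances_))]
        candidate = [c for c in kept if c != weakest]
        cand_model, cand_mae = fit_and_score(candidate)
        if cand_mae > best_mae:
            break  # removing the feature hurt validation error, so stop
        kept, model, best_mae = candidate, cand_model, cand_mae
    return kept, best_mae


# Synthetic data: the target depends on "a" and "b"; "n1".."n3" are pure noise.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 5)), columns=["a", "b", "n1", "n2", "n3"])
y = 3 * X["a"] - 2 * X["b"] + rng.normal(scale=0.1, size=300)
kept, mae = backward_eliminate(X, y)
print(kept, round(mae, 3))
```

The `max_iters` argument plays a role loosely analogous to an iteration budget for the selection loop; how `Featransform` itself uses its iteration parameter is not shown here.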

The architecture comprises three main stages: data preprocessing, diverse feature engineering ensembles, and optimized feature selection validation.
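
As one small illustration of the feature engineering stage, datetime feature construction can be sketched with the pandas `.dt` accessor. The helper name `expand_datetime` and the set of extracted components are assumptions; the components `Featransform` actually generates may differ.

```python
import pandas as pd


def expand_datetime(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Replace a datetime column with numeric calendar components (illustrative)."""
    out = df.drop(columns=[column])
    stamps = pd.to_datetime(df[column])
    out[f"{column}_year"] = stamps.dt.year
    out[f"{column}_month"] = stamps.dt.month
    out[f"{column}_day"] = stamps.dt.day
    out[f"{column}_dayofweek"] = stamps.dt.dayofweek  # Monday = 0
    out[f"{column}_hour"] = stamps.dt.hour
    return out


df = pd.DataFrame(
    {"ts": ["2024-04-19 22:16:24", "2024-01-01 00:00:00"], "value": [1.0, 2.0]}
)
expanded = expand_datetime(df, "ts")
print(expanded)
```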

This project aims to provide the following capabilities:

* General applicability on tabular datasets: The feature engineering procedures apply to any data table used in a Supervised ML task and build on its input columns.
    
* Improvement of predictive results: `Featransform` aims to improve the predictive performance of subsequently applied Machine Learning models through added feature construction, increased pattern recognition, and optimization of existing input features.

* Continuous integration: After fitting on the training data, the created object can be saved and applied to future data with the same structure.
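
The save-and-reuse workflow in the last point can be sketched with the standard `pickle` module. The helpers `save_object` and `load_object` are hypothetical names, and the dictionary below is only a stand-in for a fitted object so that the snippet runs without the package installed.

```python
import pickle
import tempfile
from pathlib import Path


def save_object(obj, path) -> None:
    """Serialize any picklable object to disk."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_object(path):
    """Load a pickled object. Only unpickle files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a fitted pipeline object (assumption for illustration).
stand_in = {"fitted": True, "columns": ["a", "b", "pca_0"]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ft_eng.pkl"
    save_object(stand_in, path)
    restored = load_object(path)

print(restored == stand_in)
```

With a fitted `Featransform` object in place of the stand-in, the reloaded object would be applied to new data through its `transform` method, as in the usage example further down.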
   
#### Main Development Tools <a name = "pre1"></a>

Major frameworks used to build this project: 

* [Pandas](https://pandas.pydata.org/)
* [Sklearn](https://scikit-learn.org/stable/)
* [XGBoost](https://xgboost.readthedocs.io/en/stable/)
* [Optuna](https://optuna.org/)
    
## Where to get it <a name = "ta"></a>
    
A binary installer for the latest released version is available at the Python Package Index [(PyPI)](https://pypi.org/project/featransform/).   

The source code is currently hosted on GitHub at: https://github.com/TsLu1s/Featransform

## Installation  

To install this package from the PyPI repository, run the following command:

```
pip install featransform
```

# Usage Example
    
## Featransform - Automated Feature Engineering Pipeline

To apply the automated feature engineering `featransform` pipeline, first import the package. 
Next, load a dataset and set the `target` parameter of `fit_engineering` to the name of the column to be predicted.
You can customize the pipeline by adjusting the following `Featransform` parameters:
* `validation_split`: Fraction of the loaded dataset held out to evaluate the feature engineering methods (range: [0.05, 0.45]).
* `optimize_iters`: Number of iterations run for the backward feature selection optimization.
* `configs`: Nested dictionary containing the parameter configurations of every method. Feel free to customize each method as you see fit (a customization example is shown below).

Relevant Note:
* Although functional, the `Featransform` pipeline is not yet optimized for big data purposes.

```py
import pickle
import warnings

import pandas as pd
from sklearn.model_selection import train_test_split
from featransform.pipeline import Featransform, configurations

warnings.filterwarnings("ignore", category=Warning)  # -> For a clean console

# Dataframe loading example (replace the placeholder path with your CSV file)
data = pd.read_csv('csv_directory_path', encoding='latin', delimiter=',')

train, test = train_test_split(data, train_size=0.8)
train, test = train.reset_index(drop=True), test.reset_index(drop=True)  # -> Required

# Load and customize parameters
configs = configurations()
print(configs)  # Inspect the available methods and their default parameters

configs['Unsupervised']['Isolation_Forest']['n_estimators'] = 300
configs['Clustering']['KMeans']['n_clusters'] = 3
configs['DimensionalityReduction']['UMAP']['n_components'] = 6

# Fit data
ft = Featransform(validation_split = 0.30,  # validation_split:float, optimize_iters:int
                  optimize_iters = 10,
                  configs = configs)

ft.fit_engineering(X = train,               # X:pd.DataFrame, target:str="Target_Column"
                   target = "Target_Column_Name")

# Transform data
train = ft.transform(X=train)
test = ft.transform(X=test)

# Export Featransform metadata (the context manager closes the file after writing)
with open("ft_eng.pkl", "wb") as output:
    pickle.dump(ft, output)
```  
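
One practical caveat when editing a nested dictionary such as `configs`: assigning to a misspelled final key (for example `n_cluster` instead of `n_clusters`) silently adds a new entry instead of raising an error. The sketch below shows one way to guard against that, together with a range check for `validation_split`. The helpers `set_existing` and `check_validation_split` are hypothetical and not part of the package, and the dictionary values are placeholder defaults, not the ones returned by `configurations()`.

```python
from functools import reduce


def set_existing(configs: dict, path: tuple, value) -> None:
    """Set a nested key, raising KeyError if any key on the path is unknown."""
    *parents, leaf = path
    node = reduce(lambda d, key: d[key], parents, configs)  # KeyError on a bad parent
    if leaf not in node:
        raise KeyError(f"unknown parameter {leaf!r}; expected one of {sorted(node)}")
    node[leaf] = value


def check_validation_split(value: float) -> float:
    """Enforce the documented range [0.05, 0.45] before building the pipeline."""
    if not 0.05 <= value <= 0.45:
        raise ValueError(f"validation_split must be in [0.05, 0.45], got {value}")
    return value


# Stand-in with the same shape as the keys used above (values are placeholders).
configs = {
    "Unsupervised": {"Isolation_Forest": {"n_estimators": 100}},
    "Clustering": {"KMeans": {"n_clusters": 4}},
}

set_existing(configs, ("Clustering", "KMeans", "n_clusters"), 3)
print(configs["Clustering"]["KMeans"])

try:
    set_existing(configs, ("Clustering", "KMeans", "n_cluster"), 3)  # typo on purpose
except KeyError as err:
    print("rejected:", err)
```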

#### Further Implementations

Further automated and customizable applications of the feature engineering ensemble methods can be found here: [Featransform Examples](https://github.com/TsLu1s/Featransform/tree/main/examples)

## License

Distributed under the MIT License. See [LICENSE](https://github.com/TsLu1s/Featransform/blob/main/LICENSE) for more information.

## Contact 
 
[Luis Santos - LinkedIn](https://www.linkedin.com/in/lu%C3%ADsfssantos/)


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/TsLu1s/Featransform",
    "name": "featransform",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "data science, machine learning, data processing, feature engineering, feature selection, feature construction, feature optimization, automated feature engineering, automated machine learning, predictive modeling",
    "author": "Lu\u00eds Santos",
    "author_email": "luisf_ssantos@hotmail.com",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Featransform is an automated feature engineering framework for supervised machine learning",
    "version": "0.9.16",
    "project_urls": {
        "Homepage": "https://github.com/TsLu1s/Featransform"
    },
    "split_keywords": [
        "data science",
        " machine learning",
        " data processing",
        " feature engineering",
        " feature selection",
        " feature construction",
        " feature optimization",
        " automated feature engineering",
        " automated machine learning",
        " predictive modeling"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6bd01fc3744004f49d2055d91446e1f95d706aec7750c3c5ebed3b2527d000cd",
                "md5": "55f41ae98e2e0bb2aca87b4f70c809f4",
                "sha256": "37215a4eedb18054d957354065bde987483a7c4a9b6557a235cc7069df10d328"
            },
            "downloads": -1,
            "filename": "featransform-0.9.16-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "55f41ae98e2e0bb2aca87b4f70c809f4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 18589,
            "upload_time": "2024-04-19T22:16:24",
            "upload_time_iso_8601": "2024-04-19T22:16:24.943081Z",
            "url": "https://files.pythonhosted.org/packages/6b/d0/1fc3744004f49d2055d91446e1f95d706aec7750c3c5ebed3b2527d000cd/featransform-0.9.16-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-19 22:16:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "TsLu1s",
    "github_project": "Featransform",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "atlantic",
            "specs": [
                [
                    ">=",
                    "1.1.25"
                ]
            ]
        },
        {
            "name": "catboost",
            "specs": [
                [
                    ">=",
                    "1.2.2"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.2.0"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.19.5"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1.2.2"
                ]
            ]
        },
        {
            "name": "h2o",
            "specs": [
                [
                    ">=",
                    "3.44.0.1"
                ]
            ]
        },
        {
            "name": "xgboost",
            "specs": [
                [
                    ">=",
                    "1.7.3"
                ]
            ]
        },
        {
            "name": "optuna",
            "specs": [
                [
                    ">=",
                    "2.10.1"
                ]
            ]
        },
        {
            "name": "statsmodels",
            "specs": [
                [
                    ">=",
                    "0.13.2"
                ]
            ]
        },
        {
            "name": "umap-learn",
            "specs": [
                [
                    ">=",
                    "0.5.5"
                ]
            ]
        }
    ],
    "lcname": "featransform"
}
        