# Feature-engine
Feature-engine is a Python library with multiple transformers to engineer and select features for use in machine learning models.
Feature-engine's transformers follow Scikit-learn's API: `fit()` learns the transformation
parameters from the data, and `transform()` applies them to the data.
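To make the `fit()` / `transform()` contract concrete, here is a minimal pure-Python sketch of the pattern. `SketchMedianImputer` is a hypothetical, illustration-only class and is not part of Feature-engine; it only shows the idea of learning a parameter on one dataset and reusing it on another.

```python
from statistics import median


class SketchMedianImputer:
    """Hypothetical, illustration-only imputer; not Feature-engine's code."""

    def fit(self, rows):
        # Learn one parameter per column: the median of the observed values.
        columns = rows[0].keys()
        self.medians_ = {
            col: median(r[col] for r in rows if r[col] is not None)
            for col in columns
        }
        return self

    def transform(self, rows):
        # Fill missing values with the learned medians; inputs are not modified.
        return [
            {col: self.medians_[col] if val is None else val for col, val in r.items()}
            for r in rows
        ]


train = [{"age": 20}, {"age": None}, {"age": 40}, {"age": 30}]
test = [{"age": None}, {"age": 55}]

imputer = SketchMedianImputer().fit(train)  # parameters come from train only
print(imputer.medians_)
print(imputer.transform(test))  # the train median fills the gap in test
```

Learning the parameters on the training set and reusing them on new data is what keeps information from the test set out of the transformation.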
## Feature-engine features in the following resources
* [Feature Engineering for Machine Learning, Online Course](https://www.trainindata.com/p/feature-engineering-for-machine-learning)
* [Feature Selection for Machine Learning, Online Course](https://www.trainindata.com/p/feature-selection-for-machine-learning)
* [Feature Engineering for Time Series Forecasting, Online Course](https://www.trainindata.com/p/feature-engineering-for-forecasting)
* [Forecasting with Machine Learning, Online Course](https://www.trainindata.com/p/forecasting-with-machine-learning)
* [Python Feature Engineering Cookbook](https://www.packtpub.com/en-us/product/python-feature-engineering-cookbook-9781835883587)
* [Feature Selection in Machine Learning Book](https://leanpub.com/feature-selection-in-machine-learning)
## Blogs about Feature-engine
* [Feature-engine: A new open-source Python package for feature engineering](https://trainindata.medium.com/feature-engine-a-new-open-source-python-package-for-feature-engineering-29a0ab88ea7c)
* [Practical Code Implementations of Feature Engineering for Machine Learning with Python](https://towardsdatascience.com/practical-code-implementations-of-feature-engineering-for-machine-learning-with-python-f13b953d4bcd)
## Documentation
* [Documentation](https://feature-engine.trainindata.com)
## Pst! How did you find us?
We want to share Feature-engine with more people. It'd help us loads if you tell us
how you discovered us.
Then we'd know what we are doing right and which channels to use to share the love.
Please share your story by answering 1 quick question
[at this link](https://docs.google.com/forms/d/e/1FAIpQLSfxvgnJvuvPf2XgosakhXo5VNQafqRrjNXkoW5qDWqnuxZNSQ/viewform?usp=sf_link). 😃
## Feature-engine's transformers currently include functionality for:
* Missing Data Imputation
* Categorical Encoding
* Discretisation
* Outlier Capping or Removal
* Variable Transformation
* Variable Creation
* Variable Selection
* Datetime Features
* Time Series
* Preprocessing
* Scaling
* Scikit-learn Wrappers
### Imputation Methods
* MeanMedianImputer
* ArbitraryNumberImputer
* RandomSampleImputer
* EndTailImputer
* CategoricalImputer
* AddMissingIndicator
* DropMissingData
### Encoding Methods
* OneHotEncoder
* OrdinalEncoder
* CountFrequencyEncoder
* MeanEncoder
* WoEEncoder
* RareLabelEncoder
* DecisionTreeEncoder
* StringSimilarityEncoder
### Discretisation methods
* EqualFrequencyDiscretiser
* EqualWidthDiscretiser
* GeometricWidthDiscretiser
* DecisionTreeDiscretiser
* ArbitraryDiscretiser
### Outlier Handling methods
* Winsorizer
* ArbitraryOutlierCapper
* OutlierTrimmer
### Variable Transformation methods
* LogTransformer
* LogCpTransformer
* ReciprocalTransformer
* ArcsinTransformer
* PowerTransformer
* BoxCoxTransformer
* YeoJohnsonTransformer
### Variable Scaling methods
* MeanNormalizationScaler
### Variable Creation:
* MathFeatures
* RelativeFeatures
* CyclicalFeatures
* DecisionTreeFeatures
### Feature Selection:
* DropFeatures
* DropConstantFeatures
* DropDuplicateFeatures
* DropCorrelatedFeatures
* SmartCorrelatedSelection
* SelectByShuffling
* SelectBySingleFeaturePerformance
* SelectByTargetMeanPerformance
* RecursiveFeatureElimination
* RecursiveFeatureAddition
* DropHighPSIFeatures
* SelectByInformationValue
* ProbeFeatureSelection
* MRMR
### Datetime
* DatetimeFeatures
* DatetimeSubtraction
### Time Series
* LagFeatures
* WindowFeatures
* ExpandingWindowFeatures
### Pipelines
* Pipeline
* make_pipeline
### Preprocessing
* MatchCategories
* MatchVariables
### Wrappers:
* SklearnTransformerWrapper
## Installation
From PyPI using pip:
```
pip install feature_engine
```
From Anaconda:
```
conda install -c conda-forge feature_engine
```
Or simply clone it:
```
git clone https://github.com/feature-engine/feature_engine.git
```
## Example Usage
```python
>>> import pandas as pd
>>> from feature_engine.encoding import RareLabelEncoder
>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()
```
```
Out[1]:
A 10
B 10
C 2
D 1
Name: var_A, dtype: int64
```
```python
>>> rare_encoder = RareLabelEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()
```
```
Out[2]:
A 10
B 10
Rare 3
Name: var_A, dtype: int64
```
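The result above follows from simple arithmetic: with 23 rows, `C` (2/23 ≈ 0.087) and `D` (1/23 ≈ 0.043) fall below `tol=0.10`, so both are grouped into `Rare`. The sketch below reproduces that reasoning in plain Python. It assumes the encoder compares each category's relative frequency with `tol` and only groups labels when the variable has more than `n_categories` distinct values; `group_rare_labels` is a hypothetical helper, not Feature-engine's implementation.

```python
from collections import Counter


def group_rare_labels(values, tol=0.10, n_categories=3, replace_with="Rare"):
    """Hypothetical helper mirroring the assumed rare-label logic."""
    counts = Counter(values)
    # Assumption: variables with few distinct categories are left unchanged.
    if len(counts) <= n_categories:
        return list(values)
    total = len(values)
    # A label is frequent when its share of the rows is at least tol.
    frequent = {label for label, n in counts.items() if n / total >= tol}
    return [v if v in frequent else replace_with for v in values]


var_a = ["A"] * 10 + ["B"] * 10 + ["C"] * 2 + ["D"] * 1
encoded = group_rare_labels(var_a)
print(Counter(encoded))
```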
Find more examples in our [Jupyter Notebook Gallery](https://nbviewer.org/github/feature-engine/feature-engine-examples/tree/main/)
or in the [documentation](https://feature-engine.trainindata.com).
## Contribute
Details about how to contribute can be found on the [Contribute Page](https://feature-engine.trainindata.com/en/latest/contribute/index.html).
Briefly:
- Fork the repo
- Clone your fork to your local computer:
```
git clone https://github.com/<YOURUSERNAME>/feature_engine.git
```
- Navigate into the repo folder:
```
cd feature_engine
```
- Install Feature-engine as a developer:
```
pip install -e .
```
- Optional: Create and activate a virtual environment with any tool of choice
- Install Feature-engine dependencies:
```
pip install -r requirements.txt
```
and
```
pip install -r test_requirements.txt
```
- Create a feature branch with a meaningful name for your feature:
```
git checkout -b myfeaturebranch
```
- Develop your feature, tests and documentation
- Make sure the tests pass
- Make a PR
Thank you!!
### Documentation
Feature-engine documentation is built using [Sphinx](https://www.sphinx-doc.org) and is hosted on [Read the Docs](https://readthedocs.org/).
To build the documentation, first install its dependencies from the root directory:
```
pip install -r docs/requirements.txt
```
Now you can build the docs using:
```
sphinx-build -b html docs build
```
## License
The content of this repository is licensed under a [BSD 3-Clause license](https://github.com/feature-engine/feature_engine/blob/main/LICENSE.md).
## Sponsor us
[Sponsor us](https://github.com/sponsors/feature-engine) and further support our
mission to democratize machine learning and programming tools through open-source
software.
Raw data
{
"_id": null,
"home_page": "http://github.com/feature-engine/feature_engine",
"name": "feature-engine",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9.0",
"maintainer_email": null,
"keywords": null,
"author": "Soledad Galli",
"author_email": "solegalli@protonmail.com",
"download_url": "https://files.pythonhosted.org/packages/6e/fb/f00c5c3153d97faea382ee17ea26fc4cc6351e31bc9c0438da70c685faae/feature_engine-1.8.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "BSD 3 clause",
"summary": "Feature engineering and selection package with Scikit-learn's fit transform functionality",
"version": "1.8.2",
"project_urls": {
"Homepage": "http://github.com/feature-engine/feature_engine"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "266af947404b55d8008035895ce33d6b1326cfb2412478f31e4548c184fefab2",
"md5": "0607aa0aea9b5eea3323eef0d2200ccb",
"sha256": "2315b0625beec8a52801d048e937591ef36225ad5ef32e5475615a235a491dd0"
},
"downloads": -1,
"filename": "feature_engine-1.8.2-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "0607aa0aea9b5eea3323eef0d2200ccb",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.9.0",
"size": 374974,
"upload_time": "2024-11-03T09:08:45",
"upload_time_iso_8601": "2024-11-03T09:08:45.979347Z",
"url": "https://files.pythonhosted.org/packages/26/6a/f947404b55d8008035895ce33d6b1326cfb2412478f31e4548c184fefab2/feature_engine-1.8.2-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6efbf00c5c3153d97faea382ee17ea26fc4cc6351e31bc9c0438da70c685faae",
"md5": "2ed2dd8455ea09615c5e29b9fab68d79",
"sha256": "d51d3197b9245ec1c286f6562111788c1be927ba4f700d2064ea94f623acab01"
},
"downloads": -1,
"filename": "feature_engine-1.8.2.tar.gz",
"has_sig": false,
"md5_digest": "2ed2dd8455ea09615c5e29b9fab68d79",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9.0",
"size": 232768,
"upload_time": "2024-11-03T09:08:47",
"upload_time_iso_8601": "2024-11-03T09:08:47.734692Z",
"url": "https://files.pythonhosted.org/packages/6e/fb/f00c5c3153d97faea382ee17ea26fc4cc6351e31bc9c0438da70c685faae/feature_engine-1.8.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-03 09:08:47",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "feature-engine",
"github_project": "feature_engine",
"travis_ci": false,
"coveralls": true,
"github_actions": false,
"circle": true,
"requirements": [],
"test_requirements": [],
"tox": true,
"lcname": "feature-engine"
}