# autofeat

- **Name:** autofeat
- **Version:** 2.1.2
- **Summary:** Automatic Feature Engineering and Selection Linear Prediction Model
- **Home page:** https://github.com/cod3licious/autofeat
- **Author:** Franziska Horn
- **License:** MIT
- **Requires Python:** >=3.8.1,<3.13
- **Keywords:** automl, feature engineering, feature selection, linear model
- **Upload time:** 2023-07-28 17:28:40
# `autofeat` library
### Linear Prediction Models with Automated Feature Engineering and Selection

This library contains the `AutoFeatRegressor` and `AutoFeatClassifier` models, which have an interface similar to `scikit-learn` models:
- `fit()` function to fit the model parameters
- `predict()` function to predict the target variable given the input
- `predict_proba()` function to predict probabilities of the target variable given the input (classifier only)
- `score()` function to calculate the goodness of fit (R^2/accuracy)
- `fit_transform()` and `transform()` functions, which extend the given data by the additional features that were engineered and selected by the model

When calling the `fit()` function, internally the `fit_transform()` function is called, so if you're planning to call `transform()` on the same data anyway, just call `fit_transform()` right away. `transform()` is mostly useful if you've split your data into training and test data and did not call `fit_transform()` on your whole dataset. The `predict()` and `score()` functions can either be given data in the format of the original dataframe that was used when calling `fit()`/`fit_transform()`, or an already transformed dataframe.

In addition, the feature selection step on its own is available in the `FeatureSelector` model.

Furthermore (as of version 2.0.0), the `AutoFeatLight` model provides minimal feature selection (removing zero-variance and redundant features), feature engineering (simple products and ratios of features), and scaling (a power transform to make features more normally distributed).

The `AutoFeatRegressor`, `AutoFeatClassifier`, and `FeatureSelector` models need to be **fit on data without NaNs**, as they internally call the sklearn `LassoLarsCV` model, which cannot handle NaNs. When calling `transform()`, NaNs (but not `np.inf`) are okay.

The [autofeat examples notebook](https://github.com/cod3licious/autofeat/blob/master/autofeat_examples.ipynb) contains a simple usage example - try it out! :) Additional examples can be found in the autofeat benchmark notebooks for [regression](https://github.com/cod3licious/autofeat/blob/master/autofeat_benchmark_regression.ipynb) (which also contains the code to reproduce the results from the paper mentioned below) and [classification](https://github.com/cod3licious/autofeat/blob/master/autofeat_benchmark_classification.ipynb), as well as the testing scripts.

Please keep in mind that since the `AutoFeatRegressor` and `AutoFeatClassifier` models can generate very complex features, they might **overfit on noise** in the dataset, i.e., find new features that lead to good predictions on the training set but result in poor performance on new test samples. While this usually only happens for datasets with very few samples, we suggest you carefully inspect the features found by `autofeat` and use those that make sense to you to train your own models.

Depending on the number of `feateng_steps` (default: 2) and the number of input features, `autofeat` can generate a huge feature matrix (before selecting the most appropriate features from this large feature pool). By specifying in `feateng_cols` those columns that you expect to be most valuable for feature engineering, the number of generated features can be greatly reduced. Additionally, `transformations` can be limited to only those feature transformations that make sense for your data. Last but not least, you can subsample the data used for training the model to limit the memory requirements. After the model has been fit, you can call `transform()` on your whole dataset to generate only those few features that were selected during `fit()`/`fit_transform()`.


### Installation
You can either download the code from here and include the autofeat folder in your `$PYTHONPATH` or install (the library components only) via pip:

    $ pip install autofeat

The library requires Python 3! Other dependencies: `numpy`, `pandas`, `scikit-learn`, `sympy`, `joblib`, `pint` and `numba`.


### Paper
For further details on the model and implementation, please refer to the [paper](https://arxiv.org/abs/1901.07329) - and of course, if any of this code was helpful for your research, please consider citing it:
```
@inproceedings{horn2019autofeat,
  title={The autofeat Python Library for Automated Feature Engineering and Selection},
  author={Horn, Franziska and Pack, Robert and Rieger, Michael},
  booktitle={Joint European Conference on Machine Learning and Knowledge Discovery in Databases},
  pages={111--120},
  year={2019},
  organization={Springer}
}
```

If you don't like reading, you can also watch a video of my [talk at the PyData conference](https://www.youtube.com/watch?v=4-4pKPv9lJ4) about automated feature engineering and selection with `autofeat`.

The code is intended for research purposes.

If you have any questions, please don't hesitate to send me an [email](mailto:cod3licious@gmail.com), and if you find any bugs or want to contribute other improvements, pull requests are very welcome!

### Acknowledgments

This project was made possible thanks to support by [BASF](https://www.basf.com).

            
