:Name: strelok
:Version: 0.0.7
:Summary: Strelok is a simple Python package that provides FEAT, a feature engineering automation toolkit. With a focus on simplicity, it offers user definable pipelines to streamline the feature engineering process and improve the performance of machine learning models
:Upload time: 2023-08-11 05:25:36
:Author: Julius Riel
:Requires Python: >=3.6
:Keywords: strelok, feature selection, machine learning

===================================================
Strelok
===================================================

.. image:: https://badge.fury.io/py/strelok.svg
    :target: https://badge.fury.io/py/strelok


.. image:: https://static.pepy.tech/badge/strelok
   :target: https://pepy.tech/project/strelok

.. image:: https://static.pepy.tech/badge/strelok/week
   :target: https://pepy.tech/project/strelok

.. image:: https://static.pepy.tech/badge/strelok/month
   :target: https://pepy.tech/project/strelok

Overview
========

Strelok supports the feature engineering stage of machine learning projects. It helps generate new features, impute missing values, create interaction features, and run feature selection. The aim is a more streamlined feature engineering process, which can shorten development time and may improve model performance.

Installation
============

To install Strelok, you can use `pip`:

.. code-block:: bash

   pip install strelok

Features
========

The Strelok library offers the following key features:

1. Mathematical Transformations: Generate new features by applying various mathematical transformations, such as logarithmic, exponential, square root, and more.

2. Missing Value Imputation: Fill missing values in your dataset using strategies like mean, median, mode, constant, forward fill, backward fill, interpolation, or KNN imputation.

3. Interaction Feature Generation: Create new features by combining existing features through operations like multiplication, addition, subtraction, and division.

4. Feature Selection: Select the most relevant features from your dataset using methods like univariate selection, recursive feature elimination (RFE), L1 regularization (Lasso), random forest importance, or correlation-based selection.

Method Details
==============

Before diving into examples of how to use the Strelok library, let's understand some of the core classes and their input parameters.

Pipeline
~~~~~~~~
This is the main class you'll interact with when using Strelok. It orchestrates the entire feature engineering process.

- `target_col` (string): The name of the dataset column that `MathematicalTransformation` and `MissingValueImputation` operate on by default, that is, whenever a feature does not define its own `diff_col`.

Methods:

- `add_feature(feature)`: Adds a feature object (e.g., an instance of `MathematicalTransformationFeature`, `MissingValueImputation`, or `InteractionFeature`) to the pipeline. The pipeline will process these features in the order they were added.

- `set_feature_selector(selector, not_X_col, y_col)`: Sets a feature selection method for the pipeline. Only one feature selection method can be active at a time.

  - `selector` (FeatureSelection): An instance of the `FeatureSelection` class representing the feature selection method.
  - `not_X_col` (list of strings, optional): Column names that must not be used as input features for feature selection. Defaults to an empty list. If `y_col` is provided, it is automatically appended to `not_X_col`, so the target column is never offered to the selector as a candidate feature.
  - `y_col` (list of strings): Column names representing the target variable. Defaults to an empty list.

- `generate_features(data)`: Applies all added features and the feature selector (if any) to the provided dataframe `data`, and returns a new dataframe with the engineered features. 
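
The column handling described for `set_feature_selector` can be pictured with plain pandas. The sketch below is not Strelok's code: the helper name `candidate_columns` and the `row_id` column are hypothetical, and the function only mirrors the documented rule that `y_col` is added to the excluded columns.

.. code-block:: python

   import pandas as pd


   def candidate_columns(data, not_X_col=None, y_col=None):
       """Return the columns a selector would be allowed to choose from.

       Hypothetical helper for illustration only; not part of Strelok.
       """
       # The target columns are excluded together with the user-supplied ones.
       excluded = list(not_X_col or []) + list(y_col or [])
       return [col for col in data.columns if col not in excluded]


   df = pd.DataFrame({
       "feature1": [1, 2, 3, 10],
       "feature2": [2, 3, 4, 5],
       "row_id": [100, 101, 102, 103],
       "target": [0, 0, 0, 1],
   })

   print(candidate_columns(df, not_X_col=["row_id"], y_col=["target"]))

With both arguments left empty, every column of the dataframe remains a candidate.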



MathematicalTransformation
~~~~~~~~~~~~~~~~~~~~~~~~~~

This class performs mathematical transformations on a feature. 

- `name` (string): The name of the new feature. This name will be used to represent the transformed feature in the output dataframe.
- `transformation_type` (string): The type of mathematical transformation to apply. Supported values are:

  - 'logarithm': Applies the natural logarithm transformation. The input column should only contain positive numbers.
  - 'square_root': Applies the square root transformation. The input column should only contain non-negative numbers.
  - 'exponential': Applies the exponential transformation. Optional parameter 'power' (int or float) can be provided to specify the power of the transformation. Default is 1.
  - 'box_cox': Applies the Box-Cox transformation. The Box-Cox transformation requires the input column to contain positive values.
  - 'reciprocal': Applies the reciprocal transformation. The input column should contain non-zero values.
  - 'power': Applies the power transformation. Optional parameter 'power' (int or float) can be provided to specify the power of the transformation. Default is 2.
  - 'binning': Applies binning to the input column. Optional parameter 'num_bins' (int) can be provided to specify the number of bins. Default is 10.
  - 'standardization': Applies standardization to the input column. Optional parameters 'mean' (float) and 'std' (float) can be provided to specify the mean and standard deviation for standardization. By default, the mean and standard deviation are calculated from the input column.
  - 'rank': Computes the rank of the values in the input column.
  - 'difference': Computes the difference between the values in the input column and another feature specified by the 'other_feature' parameter.
  - 'relative_difference': Computes the relative difference between the values in the input column and a specified 'other_value'.
  - 'sin': Applies the sine transformation to the values in the input column.
  - 'cos': Applies the cosine transformation to the values in the input column.
  - 'mod_tan': Applies the modified tangent transformation to the values in the input column. It computes the tangent of the values as the ratio of the sine to the cosine of the values, with a small constant added to the denominator to prevent division by zero.

- `diff_col` (string, optional): The name of the existing column to be transformed. If not defined, the column defaults to the `target_col` in the `Pipeline`.
- `kwargs` (dictionary, optional): Additional parameters for specific transformation types.

In addition to the common inputs mentioned earlier, some mathematical transformations in the `MathematicalTransformation` class accept additional parameters, most of which are optional:

- 'exponential' transformation:

  - `power` (int or float, optional): The power of the exponential transformation. Default is 1.

- 'power' transformation:

  - `power` (int or float, optional): The power of the power transformation. Default is 2.

- 'binning' transformation:

  - `num_bins` (int, optional): The number of bins for binning. Default is 10.

- 'standardization' transformation:

  - `mean` (float, optional): The mean value for standardization. If not provided, the mean is calculated from the input column.
  - `std` (float, optional): The standard deviation for standardization. If not provided, the standard deviation is calculated from the input column.

- 'difference' transformation:

  - `other_feature` (string): The name of the other feature to compute the difference with.

- 'relative_difference' transformation:

  - `other_value` (float): The value to compute the relative difference with.
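
For intuition, several of the transformation types listed above can be reproduced with plain NumPy and pandas. The sketch below does not call Strelok and is not its implementation. The epsilon used for 'mod_tan' and the sample standard deviation used for 'standardization' are assumptions made for illustration.

.. code-block:: python

   import numpy as np
   import pandas as pd

   df = pd.DataFrame({"feature1": [1.0, 2.0, 3.0, 10.0]})
   col = df["feature1"]

   # 'logarithm': natural log, defined only for positive inputs.
   df["log"] = np.log(col)

   # 'standardization': subtract the mean, divide by the standard deviation.
   # Using the sample standard deviation (ddof=1) is an assumption here.
   df["standardized"] = (col - col.mean()) / col.std()

   # 'rank': position of each value in sorted order.
   df["rank"] = col.rank()

   # 'mod_tan': sin / (cos + epsilon); the epsilon value is an assumption.
   epsilon = 1e-8
   df["mod_tan"] = np.sin(col) / (np.cos(col) + epsilon)

   print(df)

Away from the zeros of the cosine, the 'mod_tan' column is numerically indistinguishable from `np.tan`. The added constant only matters where the plain tangent would divide by zero.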

MissingValueImputation
~~~~~~~~~~~~~~~~~~~~~~

This class imputes missing values in a feature.

- `name` (string): The name of the new feature. This name will be used to represent the imputed feature in the output dataframe.
- `imputation_strategy` (string): The imputation strategy. Supported values are:

  - 'mean': Replaces missing values with the mean value of the non-missing values in the column. Suitable for numeric columns.
  - 'median': Replaces missing values with the median value of the non-missing values in the column. Suitable for numeric columns.
  - 'mode': Replaces missing values with the most frequent value in the column. Suitable for both numeric and categorical columns.
  - 'constant': Replaces missing values with a constant value (0).
  - 'forward_fill': Fills missing values with the previous non-missing value in the column (forward fill).
  - 'backward_fill': Fills missing values with the next non-missing value in the column (backward fill).
  - 'interpolation': Performs linear interpolation to fill missing values.
  - 'knn': Performs K-nearest neighbors imputation using the specified number of neighbors.
  - 'multiple': Performs multiple imputation using an iterative imputer.
  - 'missing_indicator': Creates a binary indicator column that flags missing values.

- `diff_col` (string, optional): The name of the existing column to be transformed. If not defined, the column defaults to the `target_col` in the `Pipeline`.

In addition to the common inputs mentioned earlier, some imputation strategies in the `MissingValueImputation` class require additional parameters:

- `knn` strategy:

  - `n_neighbors` (int): The number of nearest neighbors to consider when performing K-nearest neighbors imputation.

- `multiple` strategy:

  - No additional inputs are required. The `max_iter` and `random_state` parameters are set to default values.
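
The simpler strategies above correspond to one-line pandas operations. The following sketch uses pandas directly rather than Strelok, so it shows only what each strategy does to a single missing entry. How Strelok handles edge cases, such as a missing first or last value, is not covered.

.. code-block:: python

   import numpy as np
   import pandas as pd

   s = pd.Series([1.0, np.nan, 3.0, 10.0], name="feature1")

   imputed = pd.DataFrame({
       "mean": s.fillna(s.mean()),          # mean of the non-missing values
       "median": s.fillna(s.median()),      # median of the non-missing values
       "constant": s.fillna(0),             # fixed replacement value
       "forward_fill": s.ffill(),           # previous non-missing value
       "backward_fill": s.bfill(),          # next non-missing value
       "interpolation": s.interpolate(method="linear"),
       "missing_indicator": s.isna().astype(int),
   })

   # Row 1 is the entry that was missing in the original series.
   print(imputed.iloc[1])

Comparing the columns side by side makes the trade-off visible: the mean is pulled upwards by the outlier 10, while the median, the fills, and the interpolation stay close to the neighboring values.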

InteractionFeature
~~~~~~~~~~~~~~~~~~
This class creates a new feature that is the interaction of two or more features.

- `name` (string): The name of the new feature. This name will be used to represent the interaction feature in the output dataframe.
- `interaction_type` (string): The type of interaction. Supported values are:

  - 'addition': Adds the values in the specified columns.
  - 'subtraction': Subtracts the values in the second column from the first. Only two columns are allowed in this case.
  - 'multiplication': Multiplies the values in the specified columns.
  - 'division': Divides the values in the first column by those in the second. Only two columns are allowed in this case, and the second column should not contain zero values.

- `columns` (list of strings): The names of the existing columns to be interacted. The list should contain at least two column names.
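
The four interaction types map onto ordinary column arithmetic. The sketch below is written with pandas only and does not use Strelok. The output column names are made up for the example.

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({
       "feature1": [1, 2, 3, 10],
       "feature2": [2, 3, 4, 5],
       "feature3": [1, 1, 1, 2],
   })
   columns = ["feature1", "feature2", "feature3"]

   # 'addition' and 'multiplication' extend to any number of columns.
   df["sum_all"] = df[columns].sum(axis=1)
   df["product_all"] = df[columns].prod(axis=1)

   # 'subtraction' and 'division' depend on operand order, hence two columns.
   df["f1_minus_f2"] = df["feature1"] - df["feature2"]
   df["f1_over_f2"] = df["feature1"] / df["feature2"]

   print(df)

A zero in the divisor column would produce `inf` or `NaN` in pandas, which is why the second column of a 'division' interaction should not contain zeros.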

FeatureSelection
~~~~~~~~~~~~~~~~

This class selects top 'k' features based on a selection method.

- `method` (string): The feature selection method. Supported values are:

  - 'univariate': Selects features based on statistical tests.
  - 'rfe': Selects features using recursive feature elimination.
  - 'lasso': Selects features based on L1 regularization using Lasso.
  - 'random_forest': Selects features based on their importance in a trained random forest model.
  - 'pearson_correlation': Selects features based on Pearson correlation with the target.
  - 'spearman_correlation': Selects features based on Spearman correlation with the target.
  - 'box_cox': Selects features based on Box-Cox transformation.

- `k` (integer): The number of features to select.

In addition to the common inputs mentioned earlier, some feature selection methods in the `FeatureSelection` class require additional parameters:

- Correlation methods ('pearson_correlation' and 'spearman_correlation'):

  - `correlation_threshold` (float): The threshold for selecting features based on their correlation with the target. Only features with a correlation above this threshold are selected, so `k` is not required.

- 'box_cox' method:

  - `box_cox_threshold` (float): The threshold for selecting features based on their skewness using Box-Cox transformation. Only features with a skewness above this threshold will be selected.
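
To make the difference between selecting the top `k` features and selecting by threshold concrete, here is a sketch that uses scikit-learn and pandas directly. It is not Strelok's implementation. The ANOVA F-test (`f_classif`) as the univariate score and the use of the absolute correlation value are assumptions, since the statistical test and the sign handling are not specified above.

.. code-block:: python

   import pandas as pd
   from sklearn.feature_selection import SelectKBest, f_classif

   df = pd.DataFrame({
       "feature1": [1, 2, 3, 10, 11, 12],
       "feature2": [2, 3, 4, 5, 6, 7],
       "noise": [5, 1, 4, 2, 6, 3],
       "target": [0, 0, 0, 1, 1, 1],
   })
   X = df.drop(columns=["target"])
   y = df["target"]

   # Univariate selection: score each column against the target, keep the top k.
   selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
   top_k = X.columns[selector.get_support()].tolist()

   # Correlation threshold: keep every column whose absolute Pearson
   # correlation with the target exceeds the threshold; k plays no role.
   correlation_threshold = 0.5
   corr = X.corrwith(y).abs()
   above_threshold = corr[corr > correlation_threshold].index.tolist()

   print(top_k)
   print(above_threshold)

On this toy data both approaches keep `feature1` and `feature2` and drop `noise`. On real data the threshold variant can return any number of columns, including none.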

Usage Examples
==============


Mathematical Transformations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   import pandas as pd
   from strelok import feat

   df = pd.DataFrame({'feature1': [1, 2, 3, 10], 'feature2': [2, 3, 4, 5], 'feature3': [1, 1, 1, 0], 'target': [0, 0, 0, 1]})
   pipeline = feat.Pipeline(target_col='feature1')

   # diff_col is optional; if it is left undefined, target_col is used instead.
   log_feature = feat.MathematicalTransformation(
       name='logarithm_of_feature2',
       transformation_type='logarithm',
       diff_col='feature2',
   )

   pipeline.add_feature(log_feature)

   df_new = pipeline.generate_features(data=df)

Missing Value Imputation
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   import pandas as pd
   import numpy as np
   from strelok import feat

   df = pd.DataFrame({'feature1': [1, np.nan, 3, 10], 'feature2': [2, 3, 4, 5], 'feature3': [1, 1, 1, 0], 'target': [0, 0, 0, 1]})

   pipeline = feat.Pipeline(target_col='feature1')

   pipeline.add_feature(feat.MissingValueImputationFeature(name='feature1', imputation_strategy='mean'))

   df_new = pipeline.generate_features(data=df)

Interaction Feature Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   import pandas as pd
   from strelok import feat

   df = pd.DataFrame({'feature1': [1, 2, 3, 10], 'feature2': [2, 3, 4, 5], 'feature3': [1, 1, 1, 0], 'target': [0, 0, 0, 1]})
   pipeline = feat.Pipeline(target_col='feature1')

   pipeline.add_feature(feat.InteractionFeature(method='add', columns=['feature1', 'feature2']))

   pipeline.generate_features(data=df)

Feature Selection
~~~~~~~~~~~~~~~~~

.. code-block:: python

   import pandas as pd
   from strelok import feat

   df = pd.DataFrame({'feature1': [1, 2, 3, 10], 'feature2': [2, 3, 4, 5], 'feature3': [1, 1, 1, 0], 'target': [0, 0, 0, 1]})
   pipeline = feat.Pipeline(target_col='feature1')

   pipeline.set_feature_selector(feat.FeatureSelection(method='univariate', k=2), not_X_col=[], y_col=['target'])

   pipeline.generate_features(data=df)

Complete example pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   import numpy as np  # needed for np.nan below
   import pandas as pd
   from strelok import feat

   df = pd.DataFrame({'feature1': [1, np.nan, 3, 4],
                      'feature2': [5, 6, 7, 8],
                      'target': [0, 1, 0, 1]})

   pipeline = feat.Pipeline(target_col='feature1')

   # Add features to the pipeline
   pipeline.add_feature(feat.MissingValueImputationFeature(name='feature1', imputation_strategy='mean'))
   pipeline.add_feature(feat.MathematicalTransformationFeature(name='squared', transformation_type='power', power=2))
   pipeline.add_feature(feat.InteractionFeature(method='add', columns=['feature1', 'feature2', 'squared']))
   pipeline.set_feature_selector(feat.FeatureSelection(method='univariate', k=3), not_X_col=[], y_col=['target'])

   # Generate features on the dataset
   processed_data = pipeline.generate_features(data=df)

   # Print the processed data
   print(processed_data)


            

Raw data
========

.. code-block:: json

   {
       "_id": null,
       "home_page": "",
       "name": "strelok",
       "maintainer": "",
       "docs_url": null,
       "requires_python": ">=3.6",
       "maintainer_email": "",
       "keywords": "strelok,feature,selection,machine learning",
       "author": "Julius Riel",
       "author_email": "julius.riel@icloud.com",
       "download_url": "https://files.pythonhosted.org/packages/8b/1e/63aecc024f448453b2795c2c1c950712fc7f3c982d65fb7477de9d65b6b7/strelok-0.0.7.tar.gz",
       "platform": null,
       "bugtrack_url": null,
       "license": "",
       "summary": "Strelok is a simple Python package that provides FEAT, a feature engineering automation toolkit. With a focus on simplicity, it offers user definable pipelines to streamline the feature engineering process and improve the performance of machine learning models",
       "version": "0.0.7",
       "project_urls": null,
       "split_keywords": [
           "strelok",
           "feature",
           "selection",
           "machine learning"
       ],
       "urls": [
           {
               "comment_text": "",
               "digests": {
                   "blake2b_256": "8b1e63aecc024f448453b2795c2c1c950712fc7f3c982d65fb7477de9d65b6b7",
                   "md5": "d2916184674ec5552a4fcc0e56d5b696",
                   "sha256": "db3cd1e434181e4eaabb03763f23b2d7a30163dff95f1d2bd5c3af58eea1957a"
               },
               "downloads": -1,
               "filename": "strelok-0.0.7.tar.gz",
               "has_sig": false,
               "md5_digest": "d2916184674ec5552a4fcc0e56d5b696",
               "packagetype": "sdist",
               "python_version": "source",
               "requires_python": ">=3.6",
               "size": 11470,
               "upload_time": "2023-08-11T05:25:36",
               "upload_time_iso_8601": "2023-08-11T05:25:36.279961Z",
               "url": "https://files.pythonhosted.org/packages/8b/1e/63aecc024f448453b2795c2c1c950712fc7f3c982d65fb7477de9d65b6b7/strelok-0.0.7.tar.gz",
               "yanked": false,
               "yanked_reason": null
           }
       ],
       "upload_time": "2023-08-11 05:25:36",
       "github": false,
       "gitlab": false,
       "bitbucket": false,
       "codeberg": false,
       "lcname": "strelok"
   }