create-vars

- Name: create-vars
- Version: 0.0.1
- Home page: https://github.com/ravennaro/create_vars
- Summary: Create variables in time
- Upload time: 2023-08-11 17:45:33
- Author: Ravenna Oliveira
- Requires Python: >=3.7
- License: Apache Software License 2.0
- Keywords: nbdev jupyter notebook python

# create_vars


``` python
from create_vars.vars import *
```

``` python
import pandas as pd
import random
import numpy as np
```

Create numeric and categorical variables over time windows.

## Install

``` sh
pip install create_vars
```

## How to use

### Creating a random DataFrame

Create a DataFrame with categorical and numeric variables:

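``` python
# Create the DataFrame (no seed is set, so the values differ on every run)
data = {
    # 100 client IDs drawn from 1..100 with replacement, so some IDs repeat
    'ID_cliente': [random.choice(list(range(1, 101))) for _ in range(100)],
    # Snapshot dates in YYYYMM format; 202207 appears twice in the list,
    # so it is drawn about twice as often as the other dates
    'Safra': [random.choice([202207, 202209, 202212, 202301, 202207, 202302, 202305, 202306]) for _ in range(100)],
    'Feat_cat': [random.choice(['A', 'B', 'C']) for _ in range(100)],
    'Feat_num1': np.random.randint(0, 100, size=100),  # integers in [0, 100)
    'Feat_num2': np.random.randint(0, 100, size=100)
}
df = pd.DataFrame(data)
```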

``` python
df.head()
```

<div>

|     | ID_cliente | Safra  | Feat_cat | Feat_num1 | Feat_num2 |
|-----|------------|--------|----------|-----------|-----------|
| 0   | 75         | 202207 | A        | 72        | 66        |
| 1   | 30         | 202209 | B        | 44        | 90        |
| 2   | 70         | 202301 | B        | 82        | 33        |
| 3   | 76         | 202302 | A        | 37        | 70        |
| 4   | 81         | 202305 | C        | 76        | 17        |

</div>

The generated DataFrame assigns client IDs at random, so the same
`ID_cliente` can appear on several dates. For example:

``` python
df.groupby('ID_cliente')['Safra'].value_counts().sort_values(ascending=False)
```

    ID_cliente  Safra 
    11          202207    2
    70          202301    2
    3           202305    1
    73          202209    1
    79          202207    1
                         ..
    35          202302    1
    34          202305    1
                202306    1
    33          202306    1
    99          202306    1
    Name: count, Length: 98, dtype: int64

`Safra` is the date on which each record's variables were computed. It
must be in YYYYMM or YYYYMMDD format. In this example, the dates are
distributed as follows:

``` python
df['Safra'].value_counts().sort_index()
```

    Safra
    202207    23
    202209    12
    202212     5
    202301    13
    202302    15
    202305    15
    202306    17
    Name: count, dtype: int64
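
Before time windows can be compared, values like these need to be read as dates. The sketch below shows one way to parse both accepted formats with `pandas.to_datetime`; `parse_safra` is a hypothetical helper, and mapping a YYYYMM value to the first day of its month is an assumption, not documented create_vars behaviour.

```python
import pandas as pd


def parse_safra(value) -> pd.Timestamp:
    """Parse a YYYYMM or YYYYMMDD value (int or str) into a Timestamp.

    Hypothetical helper: YYYYMM values are mapped to the first day of the month.
    """
    text = str(value)
    if len(text) == 6:
        return pd.to_datetime(text, format='%Y%m')
    if len(text) == 8:
        return pd.to_datetime(text, format='%Y%m%d')
    raise ValueError(f'expected YYYYMM or YYYYMMDD, got {value!r}')


print(parse_safra(202207))      # 2022-07-01 00:00:00
print(parse_safra('20230702'))  # 2023-07-02 00:00:00
```

Checking the string length first rejects values such as `2023` instead of guessing a format for them.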

### Reference safra

Create the reference date, `safra_ref`, in YYYYMMDD format:

``` python
# Reference date: 2 July 2023, in YYYYMMDD format (stored as a string)
df['safra_ref'] = '20230702'
```

The YYYYMM format can also be used.
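
Assuming the look-back windows are counted in calendar months from `safra_ref` (the README does not say so explicitly), the first day of each window for the reference date above can be worked out with `pandas.DateOffset`. This is an illustrative calculation, not a call into create_vars.

```python
import pandas as pd

ref = pd.Timestamp('2023-07-02')          # safra_ref = '20230702'
window = [-1, -2, -3, -6, -9, -12, -15]   # the windows used in the examples

# Window start = reference date shifted back |w| calendar months
starts = {f'{abs(w)}M': ref + pd.DateOffset(months=w) for w in window}
for name, start in starts.items():
    print(name, start.date())
# 1M 2023-06-02
# 2M 2023-05-02
# 3M 2023-04-02
# 6M 2023-01-02
# 9M 2022-10-02
# 12M 2022-07-02
# 15M 2022-04-02
```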

### Numeric variables

Entries are grouped by `ID_cliente` and `Safra`, and for numeric
variables the aggregations listed in `operations` (here ‘sum’, ‘mean’
and ‘count’) are computed. The variables to aggregate are listed in
`value_var`, and each aggregation is computed over every time window in
`window` relative to the reference date (`ref_time_var`, passed as
`ref_time` below). Each output column is named
`<variable>_<operation>_<n>M`, where `n` is the window length in months.

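``` python
id_cols = ['ID_cliente', 'Safra']           # columns that identify each entry
trns_time = 'Safra'                         # date of each entry (YYYYMM or YYYYMMDD)
ref_time = 'safra_ref'                      # reference date the windows are measured from
value_var = ['Feat_num1', 'Feat_num2']      # numeric variables to aggregate
window = [-1, -2, -3, -6, -9, -12, -15]     # look-back windows in months (-1 -> suffix _1M)
operations = ['sum', 'mean', 'count']       # aggregations applied in each window

df_vars_num = create_vars_in_time(df, id_cols, trns_time, ref_time, value_var, window, operations)
```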

``` python
df_vars_num.head()
```

<div>

|     | ID_cliente | Safra  | Feat_num1_sum_1M | Feat_num1_mean_1M | Feat_num1_count_1M | Feat_num1_sum_2M | Feat_num1_mean_2M | Feat_num1_count_2M | Feat_num1_sum_3M | Feat_num1_mean_3M | ... | Feat_num2_count_6M | Feat_num2_sum_9M | Feat_num2_mean_9M | Feat_num2_count_9M | Feat_num2_sum_12M | Feat_num2_mean_12M | Feat_num2_count_12M | Feat_num2_sum_15M | Feat_num2_mean_15M | Feat_num2_count_15M |
|-----|------------|--------|------------------|-------------------|--------------------|------------------|-------------------|--------------------|------------------|-------------------|-----|--------------------|------------------|-------------------|--------------------|-------------------|--------------------|---------------------|-------------------|--------------------|---------------------|
| 0   | 75         | 202207 | NaN              | NaN               | NaN                | NaN              | NaN               | NaN                | NaN              | NaN               | ... | NaN                | NaN              | NaN               | NaN                | NaN               | NaN                | NaN                 | 66                | 66.0               | 1                   |
| 1   | 30         | 202209 | NaN              | NaN               | NaN                | NaN              | NaN               | NaN                | NaN              | NaN               | ... | NaN                | NaN              | NaN               | NaN                | 90.0              | 90.0               | 1.0                 | 90                | 90.0               | 1                   |
| 2   | 70         | 202301 | NaN              | NaN               | NaN                | NaN              | NaN               | NaN                | NaN              | NaN               | ... | NaN                | 111.0            | 55.5              | 2.0                | 111.0             | 55.5               | 2.0                 | 111               | 55.5               | 2                   |
| 3   | 76         | 202302 | NaN              | NaN               | NaN                | NaN              | NaN               | NaN                | NaN              | NaN               | ... | 1.0                | 70.0             | 70.0              | 1.0                | 70.0              | 70.0               | 1.0                 | 70                | 70.0               | 1                   |
| 4   | 81         | 202305 | NaN              | NaN               | NaN                | NaN              | NaN               | NaN                | 76.0             | 76.0              | ... | 1.0                | 17.0             | 17.0              | 1.0                | 17.0              | 17.0               | 1.0                 | 17                | 17.0               | 1                   |

<p>5 rows × 44 columns</p>
</div>
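
The README does not spell out which entries fall inside each window. The rows shown above are consistent with this reading: an entry counts toward the `nM` window when its `Safra` (a YYYYMM value read as the first day of the month) lies between `safra_ref` moved back n months and `safra_ref` itself. Row 4, for instance (Safra 202305, read as 2023-05-01), has values from 3M onwards: that date is on or after 2023-04-02 but before 2023-05-02. The following is a minimal pandas sketch of that reading, not the library's implementation; `parse_safra` and `windowed_aggregates` are hypothetical names, and the inclusive bounds are an assumption.

```python
import pandas as pd


def parse_safra(values: pd.Series) -> pd.Series:
    """Parse YYYYMM / YYYYMMDD values (int or str) into Timestamps.

    YYYYMM values are mapped to the first day of the month (an assumption).
    """
    parsed = []
    for v in values.astype(str):
        fmt = '%Y%m' if len(v) == 6 else '%Y%m%d'
        parsed.append(pd.to_datetime(v, format=fmt))
    return pd.Series(parsed, index=values.index)


def windowed_aggregates(df, id_cols, trns_time, ref_time, value_var, window, operations):
    """Hypothetical re-implementation for illustration; not the create_vars code."""
    dates = parse_safra(df[trns_time])
    refs = parse_safra(df[ref_time])
    result = None
    # Longest window first: all windows end at the reference date, so the
    # longest one contains every entry that any shorter window contains
    for w in sorted(window):
        start = refs + pd.DateOffset(months=w)  # w is negative, e.g. -3
        in_window = (dates >= start) & (dates <= refs)
        agg = df[in_window].groupby(id_cols)[value_var].agg(operations)
        # Flatten ('Feat_num1', 'sum') into 'Feat_num1_sum_3M'
        agg.columns = [f'{var}_{op}_{abs(w)}M' for var, op in agg.columns]
        result = agg if result is None else result.join(agg, how='left')
    return result.reset_index()


# Usage on a tiny hand-made frame
df = pd.DataFrame({
    'ID_cliente': [1, 1, 2],
    'Safra': [202305, 202305, 202212],
    'Feat_num1': [10, 20, 5],
    'safra_ref': ['20230702'] * 3,
})
out = windowed_aggregates(df, ['ID_cliente', 'Safra'], 'Safra', 'safra_ref',
                          ['Feat_num1'], [-3, -9], ['sum', 'count'])
print(out)
```

Joining each shorter window onto the longest one leaves NaN wherever an `(ID_cliente, Safra)` pair has no entries in that window, which mirrors the NaN pattern in the table above. The column order differs from the library's output (this sketch groups columns by window rather than by variable).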

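### Categorical variables

For categorical variables, the operations are ‘nunique’ (the number of
distinct categories in the window) and `mode` (the most frequent
category). Note that `mode` is passed as a function, not as a string.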

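``` python
id_cols = ['ID_cliente', 'Safra']           # columns that identify each entry
trns_time = 'Safra'                         # date of each entry
ref_time = 'safra_ref'                      # reference date for the windows
value_var = ['Feat_cat']                    # categorical variable to aggregate
window = [-1, -2, -3, -6, -9, -12, -15]     # look-back windows in months
# 'nunique' is a string, while `mode` is passed as a callable; it is presumably
# made available by `from create_vars.vars import *` above
operations = ['nunique', mode]

df_vars_cat = create_vars_in_time(df, id_cols, trns_time, ref_time, value_var, window, operations)
```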

``` python
df_vars_cat.head()
```

<div>

|     | ID_cliente | Safra  | Feat_cat_nunique_1M | Feat_cat_mode_1M | Feat_cat_nunique_2M | Feat_cat_mode_2M | Feat_cat_nunique_3M | Feat_cat_mode_3M | Feat_cat_nunique_6M | Feat_cat_mode_6M | Feat_cat_nunique_9M | Feat_cat_mode_9M | Feat_cat_nunique_12M | Feat_cat_mode_12M | Feat_cat_nunique_15M | Feat_cat_mode_15M |
|-----|------------|--------|---------------------|------------------|---------------------|------------------|---------------------|------------------|---------------------|------------------|---------------------|------------------|----------------------|-------------------|----------------------|-------------------|
| 0   | 75         | 202207 | NaN                 | NaN              | NaN                 | NaN              | NaN                 | NaN              | NaN                 | NaN              | NaN                 | NaN              | NaN                  | NaN               | 1                    | A                 |
| 1   | 30         | 202209 | NaN                 | NaN              | NaN                 | NaN              | NaN                 | NaN              | NaN                 | NaN              | NaN                 | NaN              | 1.0                  | B                 | 1                    | B                 |
| 2   | 70         | 202301 | NaN                 | NaN              | NaN                 | NaN              | NaN                 | NaN              | NaN                 | NaN              | 2.0                 | A                | 2.0                  | A                 | 2                    | A                 |
| 3   | 76         | 202302 | NaN                 | NaN              | NaN                 | NaN              | NaN                 | NaN              | 1.0                 | A                | 1.0                 | A                | 1.0                  | A                 | 1                    | A                 |
| 4   | 81         | 202305 | NaN                 | NaN              | NaN                 | NaN              | 1.0                 | C                | 1.0                 | C                | 1.0                 | C                | 1.0                  | C                 | 1                    | C                 |

</div>
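
The README passes `mode` as a function but does not show where it is defined (presumably the wildcard import from `create_vars.vars`). To show what the two categorical operations compute for each `(ID_cliente, Safra)` group, the sketch below uses a hypothetical `mode` helper built on `pandas.Series.mode`, breaking ties by taking the smallest value; that tie rule is an assumption, not the library's documented behaviour.

```python
import pandas as pd


def mode(values: pd.Series):
    """Hypothetical stand-in for `mode`: most frequent value, ties -> smallest."""
    # Series.mode() returns every most-frequent value, sorted; keep the first
    return values.mode().iloc[0]


df = pd.DataFrame({
    'ID_cliente': [70, 70, 81],
    'Safra': [202301, 202301, 202305],
    'Feat_cat': ['B', 'A', 'C'],
})

# A list mixing a string and a callable; pandas labels the callable's
# result column with its __name__ ('mode')
agg = df.groupby(['ID_cliente', 'Safra'])['Feat_cat'].agg(['nunique', mode])
print(agg)
```

For client 70 in 202301 the two categories tie, so this helper returns 'A', the smaller of the two.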

            
