# create_vars
<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->
``` python
from create_vars.vars import *
```
``` python
import pandas as pd
import random
import numpy as np
```
Creating numerical and categorical variables
## Install
``` sh
pip install create_vars
```
## How to use
### Creating a random DataFrame
Create a DataFrame with categorical and numerical variables:
``` python
# Build the DataFrame
data = {
    'ID_cliente': [random.choice(list(range(1, 101))) for _ in range(100)],
    'Safra': [random.choice([202207, 202209, 202212, 202301, 202207, 202302, 202305, 202306]) for _ in range(100)],
    'Feat_cat': [random.choice(['A', 'B', 'C']) for _ in range(100)],
    'Feat_num1': np.random.randint(0, 100, size=100),
    'Feat_num2': np.random.randint(0, 100, size=100)
}
df = pd.DataFrame(data)
```
``` python
df.head()
```
<div>
| | ID_cliente | Safra | Feat_cat | Feat_num1 | Feat_num2 |
|-----|------------|--------|----------|-----------|-----------|
| 0 | 75 | 202207 | A | 72 | 66 |
| 1 | 30 | 202209 | B | 44 | 90 |
| 2 | 70 | 202301 | B | 82 | 33 |
| 3 | 76 | 202302 | A | 37 | 70 |
| 4 | 81 | 202305 | C | 76 | 17 |
</div>
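Because the values are drawn at random, the rows shown above will differ on every run. A minimal sketch for reproducible sample data, assuming fixed seeds are acceptable for your use case; the helper name `make_sample_df` and the seed value are illustrative and are not part of create_vars:

```python
import random

import numpy as np
import pandas as pd


def make_sample_df(seed: int = 42, n_rows: int = 100) -> pd.DataFrame:
    """Build the random example frame with fixed seeds (hypothetical helper)."""
    random.seed(seed)     # seeds the stdlib generator behind random.choice
    np.random.seed(seed)  # seeds NumPy's global generator behind randint
    # Same safra list as the README example (202207 appears twice there too)
    safras = [202207, 202209, 202212, 202301, 202207, 202302, 202305, 202306]
    return pd.DataFrame({
        'ID_cliente': [random.choice(list(range(1, 101))) for _ in range(n_rows)],
        'Safra': [random.choice(safras) for _ in range(n_rows)],
        'Feat_cat': [random.choice(['A', 'B', 'C']) for _ in range(n_rows)],
        'Feat_num1': np.random.randint(0, 100, size=n_rows),
        'Feat_num2': np.random.randint(0, 100, size=n_rows),
    })


df_a = make_sample_df()
df_b = make_sample_df()
print(df_a.equals(df_b))  # True: same seed, same data
```

Calling the helper with a different `seed` gives a different but equally repeatable frame.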
The generated DataFrame has random customer IDs, which can repeat across
different dates. For example:
``` python
df.groupby('ID_cliente')['Safra'].value_counts().sort_values(ascending=False)
```
ID_cliente Safra
11 202207 2
70 202301 2
3 202305 1
73 202209 1
79 202207 1
..
35 202302 1
34 202305 1
202306 1
33 202306 1
99 202306 1
Name: count, Length: 98, dtype: int64
The safra (reference period) corresponds to the date on which each
variable was computed. It must be in YYYYMM or YYYYMMDD format. In our
example, the dates are distributed as follows:
``` python
df['Safra'].value_counts().sort_index()
```
Safra
202207 23
202209 12
202212 5
202301 13
202302 15
202305 15
202306 17
Name: count, dtype: int64
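Since both YYYYMM and YYYYMMDD are accepted, it can help to normalize the values to timestamps before inspecting them. The sketch below uses pandas only. The helper `parse_safra` is hypothetical and is not necessarily how create_vars parses dates internally; it assumes a YYYYMM value refers to the first day of that month.

```python
import pandas as pd


def parse_safra(value) -> pd.Timestamp:
    """Parse a YYYYMM or YYYYMMDD value (int or str) into a Timestamp."""
    text = str(value)
    if len(text) == 6:
        return pd.to_datetime(text, format='%Y%m')    # first day of the month
    if len(text) == 8:
        return pd.to_datetime(text, format='%Y%m%d')
    raise ValueError(f'expected YYYYMM or YYYYMMDD, got {value!r}')


print(parse_safra(202207))      # 2022-07-01 00:00:00
print(parse_safra('20230702'))  # 2023-07-02 00:00:00
```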
### Reference safra
Create the reference safra. The example uses the YYYYMMDD format:
``` python
df['safra_ref'] = '20230702'
```
The YYYYMM format can also be used.
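The reference date is the anchor from which the look-back windows are measured. The sketch below shows one plausible reading, assuming each window is a number of months counted back from the reference date and that a YYYYMM safra maps to the first day of its month. This reading is inferred from, and consistent with, the example output in this README; it is not taken from the create_vars source. The names `cutoffs` and `windows_containing` are illustrative.

```python
import pandas as pd

ref_date = pd.to_datetime('20230702', format='%Y%m%d')
windows = [-1, -2, -3, -6, -9, -12, -15]  # months relative to ref_date

# Cutoff date for each look-back window (assumed semantics, see the text above)
cutoffs = {w: ref_date + pd.DateOffset(months=w) for w in windows}


def windows_containing(safra: int) -> list:
    """Return the windows whose [cutoff, ref_date] range contains the safra."""
    date = pd.to_datetime(str(safra), format='%Y%m')  # YYYYMM -> first of month
    return [w for w, cutoff in cutoffs.items() if cutoff <= date <= ref_date]


print(windows_containing(202305))  # [-3, -6, -9, -12, -15]
print(windows_containing(202207))  # [-15]
```

Under this reading, a record from 202305 only starts contributing at the 3-month window, because the 2-month cutoff falls on 2023-05-02, one day after the safra's assumed date.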
### Numerical variables
Using the customer's ID_cliente and the Safra, we group the entries and
compute variables such as ‘sum’, ‘mean’, and ‘count’, which are passed as
a list in ‘operations’ for numerical variables. The variables to
aggregate are listed in ‘value_var’ and are computed over the time windows
in ‘window’, relative to the reference date in ‘ref_time_var’ (passed as
`ref_time` in the example).
``` python
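id_cols = ['ID_cliente', 'Safra']        # grouping keys
trns_time = 'Safra'                      # column with each record's date
ref_time = 'safra_ref'                   # column with the reference date
value_var = ['Feat_num1', 'Feat_num2']   # numerical columns to aggregate
window = [-1, -2, -3, -6, -9, -12, -15]  # look-back windows (1M, 2M, ... in the output)
operations = ['sum', 'mean', 'count']    # aggregations computed per window

df_vars_num = create_vars_in_time(df, id_cols, trns_time, ref_time, value_var, window, operations)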
```
``` python
df_vars_num.head()
```
<div>
| | ID_cliente | Safra | Feat_num1_sum_1M | Feat_num1_mean_1M | Feat_num1_count_1M | Feat_num1_sum_2M | Feat_num1_mean_2M | Feat_num1_count_2M | Feat_num1_sum_3M | Feat_num1_mean_3M | ... | Feat_num2_count_6M | Feat_num2_sum_9M | Feat_num2_mean_9M | Feat_num2_count_9M | Feat_num2_sum_12M | Feat_num2_mean_12M | Feat_num2_count_12M | Feat_num2_sum_15M | Feat_num2_mean_15M | Feat_num2_count_15M |
|-----|------------|--------|------------------|-------------------|--------------------|------------------|-------------------|--------------------|------------------|-------------------|-----|--------------------|------------------|-------------------|--------------------|-------------------|--------------------|---------------------|-------------------|--------------------|---------------------|
| 0 | 75 | 202207 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 66 | 66.0 | 1 |
| 1 | 30 | 202209 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | 90.0 | 90.0 | 1.0 | 90 | 90.0 | 1 |
| 2 | 70 | 202301 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | 111.0 | 55.5 | 2.0 | 111.0 | 55.5 | 2.0 | 111 | 55.5 | 2 |
| 3 | 76 | 202302 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1.0 | 70.0 | 70.0 | 1.0 | 70.0 | 70.0 | 1.0 | 70 | 70.0 | 1 |
| 4 | 81 | 202305 | NaN | NaN | NaN | NaN | NaN | NaN | 76.0 | 76.0 | ... | 1.0 | 17.0 | 17.0 | 1.0 | 17.0 | 17.0 | 1.0 | 17 | 17.0 | 1 |
<p>5 rows × 44 columns</p>
</div>
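The output columns appear to follow the pattern `<variable>_<operation>_<months>M`, ordered by variable, then window, then operation. The sketch below rebuilds the expected names under that assumption, which is inferred from the table above rather than from the package's documentation:

```python
from itertools import product

value_var = ['Feat_num1', 'Feat_num2']
window = [-1, -2, -3, -6, -9, -12, -15]
operations = ['sum', 'mean', 'count']

# Naming pattern inferred from the table above: <variable>_<operation>_<months>M
expected_cols = [
    f'{var}_{op}_{abs(w)}M'
    for var, w, op in product(value_var, window, operations)
]

print(len(expected_cols))  # 42
print(expected_cols[:3])   # ['Feat_num1_sum_1M', 'Feat_num1_mean_1M', 'Feat_num1_count_1M']
```

The 42 generated names plus the two `id_cols` account for the 44 columns reported above.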
### Categorical variables
For categorical variables, the operations are ‘nunique’ and mode. Note
that in the example `mode` is passed as a bare name, not as a quoted string.
``` python
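id_cols = ['ID_cliente', 'Safra']        # grouping keys
trns_time = 'Safra'                      # column with each record's date
ref_time = 'safra_ref'                   # column with the reference date
value_var = ['Feat_cat']                 # categorical column to aggregate
window = [-1, -2, -3, -6, -9, -12, -15]  # look-back windows (1M, 2M, ... in the output)
operations = ['nunique', mode]           # distinct count and most frequent value

df_vars_cat = create_vars_in_time(df, id_cols, trns_time, ref_time, value_var, window, operations)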
```
``` python
df_vars_cat.head()
```
<div>
| | ID_cliente | Safra | Feat_cat_nunique_1M | Feat_cat_mode_1M | Feat_cat_nunique_2M | Feat_cat_mode_2M | Feat_cat_nunique_3M | Feat_cat_mode_3M | Feat_cat_nunique_6M | Feat_cat_mode_6M | Feat_cat_nunique_9M | Feat_cat_mode_9M | Feat_cat_nunique_12M | Feat_cat_mode_12M | Feat_cat_nunique_15M | Feat_cat_mode_15M |
|-----|------------|--------|---------------------|------------------|---------------------|------------------|---------------------|------------------|---------------------|------------------|---------------------|------------------|----------------------|-------------------|----------------------|-------------------|
| 0 | 75 | 202207 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | A |
| 1 | 30 | 202209 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | B | 1 | B |
| 2 | 70 | 202301 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.0 | A | 2.0 | A | 2 | A |
| 3 | 76 | 202302 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | A | 1.0 | A | 1.0 | A | 1 | A |
| 4 | 81 | 202305 | NaN | NaN | NaN | NaN | 1.0 | C | 1.0 | C | 1.0 | C | 1.0 | C | 1 | C |
</div>
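To see what the two categorical operations mean for a single group, here is a small pandas-only sketch. It does not use create_vars: `first_mode` is a hypothetical stand-in for the package's `mode`, and the toy rows are made up to mirror customer 70 in the table above (two records in 202301 with different categories).

```python
import pandas as pd

# Toy data (illustrative): customer 70 has two categories in the same safra
toy = pd.DataFrame({
    'ID_cliente': [70, 70, 76],
    'Safra': [202301, 202301, 202302],
    'Feat_cat': ['B', 'A', 'A'],
})


def first_mode(values: pd.Series):
    """Most frequent value; ties resolve to the first value in sorted order."""
    return values.mode().iloc[0]


summary = (
    toy.groupby(['ID_cliente', 'Safra'])['Feat_cat']
       .agg(['nunique', first_mode])
       .reset_index()
)
print(summary)
```

For the tied group this yields two distinct values with `A` as the mode, which matches the row for customer 70 above, although the package may resolve ties differently.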
Raw data
{
"_id": null,
"home_page": "https://github.com/ravennaro/create_vars",
"name": "create-vars",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "nbdev jupyter notebook python",
"author": "Ravenna Oliveira",
"author_email": "ravenna.rro@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/d0/4d/804a917751890b89fce8d5bfa1e0b32e72668f7016766a28e3cd67887b64/create_vars-0.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache Software License 2.0",
"summary": "Create variables in time",
"version": "0.0.1",
"project_urls": {
"Homepage": "https://github.com/ravennaro/create_vars"
},
"split_keywords": [
"nbdev",
"jupyter",
"notebook",
"python"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5b67b6479f8f0e277f8c56e7021241dca38d600085f6b937282041ab3a7bc192",
"md5": "fd1d0fa0121e60cf55ccf9af86f78f6a",
"sha256": "bdaa1f870834b318e7e3a23b6efe5f2a0b0680cef31e236ebec0d85ca0b0327a"
},
"downloads": -1,
"filename": "create_vars-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "fd1d0fa0121e60cf55ccf9af86f78f6a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 9227,
"upload_time": "2023-08-11T17:45:32",
"upload_time_iso_8601": "2023-08-11T17:45:32.245901Z",
"url": "https://files.pythonhosted.org/packages/5b/67/b6479f8f0e277f8c56e7021241dca38d600085f6b937282041ab3a7bc192/create_vars-0.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d04d804a917751890b89fce8d5bfa1e0b32e72668f7016766a28e3cd67887b64",
"md5": "9e45702f22c75fb52fef46f0692fcdc0",
"sha256": "cb783e85fde7ec1c10bdaa51f3bad240eb96b732eb0536057ebf77a16624ac0e"
},
"downloads": -1,
"filename": "create_vars-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "9e45702f22c75fb52fef46f0692fcdc0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 10430,
"upload_time": "2023-08-11T17:45:33",
"upload_time_iso_8601": "2023-08-11T17:45:33.500565Z",
"url": "https://files.pythonhosted.org/packages/d0/4d/804a917751890b89fce8d5bfa1e0b32e72668f7016766a28e3cd67887b64/create_vars-0.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-11 17:45:33",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ravennaro",
"github_project": "create_vars",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "create-vars"
}