# featsql
<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->
<span style="color: blue;">(EN)</span> There are three kinds of
functions that can be used to create features from numerical,
categorical and aggregated data. They are:
- For the SQLite database:
- `sqlite_create_query_num()`: Numerical variables such as mean, sum,
etc., within the user-provided window of n months.
- `sqlite_create_query_cat()`: Categorical variables such as mean,
sum, etc., within the user-provided window of n months.
- `sqlite_create_query_agregada()`: Numerical variables grouped by a
specific categorical variable value with mean, sum, etc., within the
user-provided window of n months.
- For the MySQL database:
- `mysql_create_query_num()`: Numerical variables such as mean, sum,
etc., within the user-provided window of n months.
- `mysql_create_query_cat()`: Categorical variables such as mean, sum,
etc., within the user-provided window of n months.
- `mysql_create_query_agregada()`: Numerical variables grouped by a
specific categorical variable value with mean, sum, etc., within the
user-provided window of n months.
- For the Snowflake database:
- `snow_create_query_num()`: Numerical variables such as mean, sum,
etc., within the user-provided window of n months.
- `snow_create_query_cat()`: Categorical variables such as mean, sum,
etc., within the user-provided window of n months.
- `snow_create_query_agregada()`: Numerical variables grouped by a
specific categorical variable value with mean, sum, etc., within the
user-provided window of n months.
Next, we have a detailed example for creating variables in SQLite and
MySQL databases, along with an example output for the Snowflake
database.
<span style="color: green;">(PT)</span> Existem três tipos de funções
que podem ser usadas para criar variáveis a partir de dados numéricos,
categóricos e agregados. São elas: - Para o banco SQLite:
+ ``sqlite_create_query_num()``: variáveis numéricas como média, soma e etc na janela de n meses fornecida pelo usuário.
+ ``sqlite_create_query_cat()``: variáveis categóricas como média, soma e etc na janela de n meses fornecida pelo usuário.
+ ``sqlite_create_query_agregada()``: variáveis numéricas agrupadas por uma valor específico de variável categórica com a média, soma e etc na janela de n meses fornecida pelo usuário.
- Para o banco MySQL:
- `mysql_create_query_num()`: variáveis numéricas como média, soma e
etc na janela de n meses fornecida pelo usuário.
- `mysql_create_query_cat()`: variáveis categóricas como média, soma e
etc na janela de n meses fornecida pelo usuário.
- `mysql_create_query_agregada()`: variáveis numéricas agrupadas por
uma valor específico de variável categórica com a média, soma e etc
na janela de n meses fornecida pelo usuário.
- Para o banco snowflake:
- `snow_create_query_num()`: variáveis numéricas como média, soma e
etc na janela de n meses fornecida pelo usuário.
- `snow_create_query_cat()`: variáveis categóricas como média, soma e
etc na janela de n meses fornecida pelo usuário.
- `snow_create_query_agregada()`: variáveis numéricas agrupadas por
uma valor específico de variável categórica com a média, soma e etc
na janela de n meses fornecida pelo usuário.
A seguir, temos um exemplo detalhado para a criação de variáveis no
banco sqlite e mysql. Além de um exemplo de output para o banco do tipo
snowflake.
## Install
<span style="color: blue;">(EN)</span> To install, simply use the
command line:
``` sh
pip install featsql
```
<span style="color: green;">(PT)</span> Para instalar, simplesmente use
o comando:
``` sh
pip install featsql
```
## Imports
``` python
from featsql.featsqlite import *
from featsql.featmysql import *
from featsql.featsnow import *
```
``` python
import pandas as pd
```
``` python
import mysql.connector
import sqlite3
pd.set_option('display.max_columns', None)
```
# Exemples
## 1. SQLITE
### Setting up the engine (Configurando a engine)
``` python
url_db = "../../data/mydatabase.db"
conn = sqlite3.connect(url_db)
cursor = conn.cursor()
```
### Initial view of the public (Visão inicial do público)
<span style="color: blue;">(EN)</span> First, let’s observe the format
of the spine table.
<span style="color: green;">(PT)</span> Primeiro vamos observar o
formato da tabela spine
``` python
df_spine = pd.read_sql("SELECT * FROM tb_spine", conn)
df_spine.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
| | ID | SAFRA_REF | Target |
|-----|-----|------------|--------|
| 0 | 4 | 2023-02-01 | 0 |
| 1 | 5 | 2023-02-01 | 0 |
| 2 | 6 | 2023-02-01 | 0 |
| 3 | 7 | 2023-02-01 | 0 |
| 4 | 10 | 2023-02-01 | 1 |
</div>
### Initial view of the variables table (Visão inicial da tabela de variáveis)
<span style="color: blue;">(EN)</span> The table tb_feat contains 4
variables, two numeric and two categorical. Note that there are more
unique IDs and dates available in this than in the spine table, as is
often the case.
<span style="color: green;">(PT)</span> A tabela tb_feat contém 4
variáveis, duas sendo numéricas e duas categórica. Perceba que existem
mais ID’s únicos e datas disponíveis nessa tabela do que na tabela
spine, caso que ocorre no dia a dia.
``` python
df_data = pd.read_sql("SELECT * FROM tb_feat", conn)
df_data.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
| | ID | SAFRA | FEAT_NUM1 | FEAT_NUM2 | FEAT_CAT1 | FEAT_CAT2 |
|-----|-----|------------|-----------|-----------|-----------|-----------|
| 0 | 1 | 2023-01-01 | -19 | -52 | A | B |
| 1 | 2 | 2023-01-01 | -7 | -33 | A | B |
| 2 | 3 | 2023-01-01 | 6 | 91 | C | C |
| 3 | 4 | 2023-01-01 | 74 | 52 | B | A |
| 4 | 5 | 2023-01-01 | 79 | 77 | B | B |
</div>
### Creation of numerical variables (Criação de variáveis numéricas)
<span style="color: blue;">(EN)</span> The function
sqlite_create_query_num() creates a query text for generating variables
with the sum, minimum, maximum, and average operations on the variables
listed in feat_num_lista and within the time window listed in
list_window.
<span style="color: green;">(PT)</span> A função
sqlite_create_query_num() cria um texto com a query para a criação de
variáveis com as operações soma, mínimo, máximo e média das variáveis
listadas em feat_num_lista e com a janela de tempo listada em
list_window.
``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
query_final_num_sqlite = sqlite_create_query_num(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra)
```
Complete query creation with no saved file.
``` python
df_num_sqlite = pd.read_sql(query_final_num_sqlite, conn)
df_num_sqlite.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
| | ID | SAFRA_REF | FEAT_NUM1_SUM_1M | FEAT_NUM1_MIN_1M | FEAT_NUM1_MAX_1M | FEAT_NUM1_AVG_1M | FEAT_NUM2_SUM_1M | FEAT_NUM2_MIN_1M | FEAT_NUM2_MAX_1M | FEAT_NUM2_AVG_1M | FEAT_NUM1_SUM_2M | FEAT_NUM1_MIN_2M | FEAT_NUM1_MAX_2M | FEAT_NUM1_AVG_2M | FEAT_NUM2_SUM_2M | FEAT_NUM2_MIN_2M | FEAT_NUM2_MAX_2M | FEAT_NUM2_AVG_2M | FEAT_NUM1_SUM_3M | FEAT_NUM1_MIN_3M | FEAT_NUM1_MAX_3M | FEAT_NUM1_AVG_3M | FEAT_NUM2_SUM_3M | FEAT_NUM2_MIN_3M | FEAT_NUM2_MAX_3M | FEAT_NUM2_AVG_3M |
|-----|-----|------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|
| 0 | 4 | 2023-02-01 | 74.0 | 74.0 | 74.0 | 74.0 | 52.0 | 52.0 | 52.0 | 52.0 | 74 | 74 | 74 | 74.0 | 52 | 52 | 52 | 52.0 | 74 | 74 | 74 | 74.0 | 52 | 52 | 52 | 52.0 |
| 1 | 5 | 2023-02-01 | 79.0 | 79.0 | 79.0 | 79.0 | 77.0 | 77.0 | 77.0 | 77.0 | 79 | 79 | 79 | 79.0 | 77 | 77 | 77 | 77.0 | 79 | 79 | 79 | 79.0 | 77 | 77 | 77 | 77.0 |
| 2 | 6 | 2023-02-01 | -13.0 | -13.0 | -13.0 | -13.0 | -45.0 | -45.0 | -45.0 | -45.0 | -13 | -13 | -13 | -13.0 | -45 | -45 | -45 | -45.0 | -13 | -13 | -13 | -13.0 | -45 | -45 | -45 | -45.0 |
| 3 | 7 | 2023-02-01 | -44.0 | -44.0 | -44.0 | -44.0 | 0.0 | 0.0 | 0.0 | 0.0 | -44 | -44 | -44 | -44.0 | 0 | 0 | 0 | 0.0 | -44 | -44 | -44 | -44.0 | 0 | 0 | 0 | 0.0 |
| 4 | 10 | 2023-02-01 | 11.0 | 11.0 | 11.0 | 11.0 | -39.0 | -39.0 | -39.0 | -39.0 | 11 | 11 | 11 | 11.0 | -39 | -39 | -39 | -39.0 | 11 | 11 | 11 | 11.0 | -39 | -39 | -39 | -39.0 |
</div>
<span style="color: blue;">(EN)</span> We can use the `nome_arquivo`
parameter with a name in the format ‘table.sql’ to save the creation
query in a file in the same folder where it is executed.
<span style="color: green;">(PT)</span> Podemos utilizar o parâmetro
`nome_arquivo` com um nome no formato “table.sql” para salvar query de
criação em um arquivo na mesma pasta que é executado.
``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
file_name = 'table.sql'
query_final_num_sqlite = sqlite_create_query_num(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name)
```
Complete query creation with table.sql saved file
<span style="color: blue;">(EN)</span> Besides saving the query to a
file, we can create a table directly in the database using the
`[`sqlite_create_query_num`](https://ravennaro.github.io/featsql/creation_sqlite.html#sqlite_create_query_num)`
function with the creation parameters. Here is an example:
<span style="color: green;">(PT)</span> Além de salvar a query em um
arquivo, podemos criar uma tabela diretamente no banco de dados a partir
da função
`[`sqlite_create_query_num`](https://ravennaro.github.io/featsql/creation_sqlite.html#sqlite_create_query_num)`
com os parâmetros de criação. A seguir um exemplo:
``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
status = True
table_name = 'nome_da_tabela_criada'
file_name = 'table.sql'
query_final_num_sqlite = sqlite_create_query_num(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name=file_name, status=status, table_name=table_name, conn=conn)
```
Complete query creation with table.sql saved file
<span style="color: blue;">(EN)</span> To verify the creation, let’s
check the database table using the name of the created table:
<span style="color: green;">(PT)</span> Para conferir a criação, vamos
consultar o banco de dados com o nome da tabela que foi criada:
``` python
query=f"""SELECT * from {table_name}"""
df = pd.read_sql(query, conn)
df.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
| | ID | SAFRA_REF | FEAT_NUM1_SUM_1M | FEAT_NUM1_MIN_1M | FEAT_NUM1_MAX_1M | FEAT_NUM1_AVG_1M | FEAT_NUM2_SUM_1M | FEAT_NUM2_MIN_1M | FEAT_NUM2_MAX_1M | FEAT_NUM2_AVG_1M | FEAT_NUM1_SUM_2M | FEAT_NUM1_MIN_2M | FEAT_NUM1_MAX_2M | FEAT_NUM1_AVG_2M | FEAT_NUM2_SUM_2M | FEAT_NUM2_MIN_2M | FEAT_NUM2_MAX_2M | FEAT_NUM2_AVG_2M | FEAT_NUM1_SUM_3M | FEAT_NUM1_MIN_3M | FEAT_NUM1_MAX_3M | FEAT_NUM1_AVG_3M | FEAT_NUM2_SUM_3M | FEAT_NUM2_MIN_3M | FEAT_NUM2_MAX_3M | FEAT_NUM2_AVG_3M |
|-----|-----|------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|
| 0 | 4 | 2023-02-01 | 74.0 | 74.0 | 74.0 | 74.0 | 52.0 | 52.0 | 52.0 | 52.0 | 74.0 | 74.0 | 74.0 | 74.0 | 52.0 | 52.0 | 52.0 | 52.0 | 74.0 | 74.0 | 74.0 | 74.0 | 52.0 | 52.0 | 52.0 | 52.0 |
| 1 | 5 | 2023-02-01 | 79.0 | 79.0 | 79.0 | 79.0 | 77.0 | 77.0 | 77.0 | 77.0 | 79.0 | 79.0 | 79.0 | 79.0 | 77.0 | 77.0 | 77.0 | 77.0 | 79.0 | 79.0 | 79.0 | 79.0 | 77.0 | 77.0 | 77.0 | 77.0 |
| 2 | 6 | 2023-02-01 | -13.0 | -13.0 | -13.0 | -13.0 | -45.0 | -45.0 | -45.0 | -45.0 | -13.0 | -13.0 | -13.0 | -13.0 | -45.0 | -45.0 | -45.0 | -45.0 | -13.0 | -13.0 | -13.0 | -13.0 | -45.0 | -45.0 | -45.0 | -45.0 |
| 3 | 7 | 2023-02-01 | -44.0 | -44.0 | -44.0 | -44.0 | 0.0 | 0.0 | 0.0 | 0.0 | -44.0 | -44.0 | -44.0 | -44.0 | 0.0 | 0.0 | 0.0 | 0.0 | -44.0 | -44.0 | -44.0 | -44.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 10 | 2023-02-01 | 11.0 | 11.0 | 11.0 | 11.0 | -39.0 | -39.0 | -39.0 | -39.0 | 11.0 | 11.0 | 11.0 | 11.0 | -39.0 | -39.0 | -39.0 | -39.0 | 11.0 | 11.0 | 11.0 | 11.0 | -39.0 | -39.0 | -39.0 | -39.0 |
</div>
### Creation of categorical variables (Criação de variáveis categóricas)
<span style="color: blue;">(EN)</span> The function
`sqlite_create_query_cat()` generates a query text for creating
variables with the mode of each variable listed in feat_num_lista within
the time window provided in list_window.
<span style="color: green;">(PT)</span> A função
`sqlite_create_query_cat()` cria um texto com a query para a criação de
variáveis com a moda de cada uma das variáveis listadas em
feat_num_lista na janela de tempo fornecida em list_window.
``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1', 'FEAT_CAT2']
list_window = [1, 3, 6]
query_final_cat_sqlite = sqlite_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra)
```
Complete query creation with no saved file.
``` python
df_cat_sqlite_sqlite = pd.read_sql(query_final_cat_sqlite, conn)
df_cat_sqlite_sqlite.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
| | ID | SAFRA_REF | FEAT_CAT1_MODA_1M | FEAT_CAT2_MODA_1M | FEAT_CAT1_MODA_3M | FEAT_CAT2_MODA_3M | FEAT_CAT1_MODA_6M | FEAT_CAT2_MODA_6M |
|-----|-----|------------|-------------------|-------------------|-------------------|-------------------|-------------------|-------------------|
| 0 | 4 | 2023-02-01 | B | A | B | A | B | A |
| 1 | 5 | 2023-02-01 | B | B | B | B | B | B |
| 2 | 6 | 2023-02-01 | C | B | C | B | C | B |
| 3 | 7 | 2023-02-01 | B | C | B | C | B | C |
| 4 | 10 | 2023-02-01 | A | A | A | A | A | A |
</div>
<span style="color: blue;">(EN)</span> The function
`sqlite_create_query_cat()` also allows saving a file with the created
query and creating the table directly in the database. Here’s an
example:
<span style="color: green;">(PT)</span> A função
`sqlite_create_query_cat()` também permite salvar um arquivo com a query
criada e criar a tabela diretamente no banco de dados. Segue um exemplo:
``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1', 'FEAT_CAT2']
list_window = [1, 3, 6]
status = True
table_name = 'nome_da_tabela_criada'
file_name = 'table.sql'
query_final_cat_sqlite = sqlite_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name=file_name, status=status, table_name=table_name, conn=conn)
```
Complete query creation with table.sql saved file
<span style="color: blue;">(EN)</span> To verify the creation, let’s
check the database table using the name of the created table:
<span style="color: green;">(PT)</span> Para conferir a criação, vamos
consultar o banco de dados com o nome da tabela que foi criada:
``` python
query=f"""SELECT * from {table_name}"""
df = pd.read_sql(query, conn)
df.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
| | ID | SAFRA_REF | FEAT_CAT1_MODA_1M | FEAT_CAT2_MODA_1M | FEAT_CAT1_MODA_3M | FEAT_CAT2_MODA_3M | FEAT_CAT1_MODA_6M | FEAT_CAT2_MODA_6M |
|-----|-----|------------|-------------------|-------------------|-------------------|-------------------|-------------------|-------------------|
| 0 | 4 | 2023-02-01 | B | A | B | A | B | A |
| 1 | 5 | 2023-02-01 | B | B | B | B | B | B |
| 2 | 6 | 2023-02-01 | C | B | C | B | C | B |
| 3 | 7 | 2023-02-01 | B | C | B | C | B | C |
| 4 | 10 | 2023-02-01 | A | A | A | A | A | A |
</div>
### Creation of aggregated variables (Criação de variáveis agregadas)
<span style="color: blue;">(EN)</span> We can create features by
aggregating on a specific value of a categorical variable and
calculating sum, avg, min, and max operations for user-defined numerical
variables. For example, for the variable FEAT_CAT1 having a value of A
or B, we calculate sum, avg, min, and max operations for the variables
FEAT_NUM1 and FEAT_NUM2. Here’s a use case:
<span style="color: green;">(PT)</span> Podemos criar variáveis
agregando por valor específico de variável categórica e calculando as
operações de sum, avg, min e max para as variáveis númericas definidas
pelo usuário. Por exemplo, para a variável `FEAT_CAT1` tendo valor igual
a `A` ou `B`, calculamos as operações de sum, avg, min e max para as
variáveis `FEAT_NUM1` e `FEAT_NUM2`. Segue um caso de uso:
``` python
tb_publico = 'tb_spine'
list_window = [3, 6]
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
list_aggregator_value = ['A', 'B']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'
query = sqlite_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, list_aggregator_value)
```
Complete query creation with no saved file.
``` python
df_sqlite_agregada = pd.read_sql(query, conn)
df_sqlite_agregada.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
| | ID | SAFRA_REF | SUM_FEAT_NUM1_FEAT_CAT1_A_3M | MAX_FEAT_NUM1_FEAT_CAT1_A_3M | MIN_FEAT_NUM1_FEAT_CAT1_A_3M | AVG_FEAT_NUM1_FEAT_CAT1_A_3M | SUM_FEAT_NUM2_FEAT_CAT1_A_3M | MAX_FEAT_NUM2_FEAT_CAT1_A_3M | MIN_FEAT_NUM2_FEAT_CAT1_A_3M | AVG_FEAT_NUM2_FEAT_CAT1_A_3M | SUM_FEAT_NUM1_FEAT_CAT1_A_6M | MAX_FEAT_NUM1_FEAT_CAT1_A_6M | MIN_FEAT_NUM1_FEAT_CAT1_A_6M | AVG_FEAT_NUM1_FEAT_CAT1_A_6M | SUM_FEAT_NUM2_FEAT_CAT1_A_6M | MAX_FEAT_NUM2_FEAT_CAT1_A_6M | MIN_FEAT_NUM2_FEAT_CAT1_A_6M | AVG_FEAT_NUM2_FEAT_CAT1_A_6M | SUM_FEAT_NUM1_FEAT_CAT1_B_3M | MAX_FEAT_NUM1_FEAT_CAT1_B_3M | MIN_FEAT_NUM1_FEAT_CAT1_B_3M | AVG_FEAT_NUM1_FEAT_CAT1_B_3M | SUM_FEAT_NUM2_FEAT_CAT1_B_3M | MAX_FEAT_NUM2_FEAT_CAT1_B_3M | MIN_FEAT_NUM2_FEAT_CAT1_B_3M | AVG_FEAT_NUM2_FEAT_CAT1_B_3M | SUM_FEAT_NUM1_FEAT_CAT1_B_6M | MAX_FEAT_NUM1_FEAT_CAT1_B_6M | MIN_FEAT_NUM1_FEAT_CAT1_B_6M | AVG_FEAT_NUM1_FEAT_CAT1_B_6M | SUM_FEAT_NUM2_FEAT_CAT1_B_6M | MAX_FEAT_NUM2_FEAT_CAT1_B_6M | MIN_FEAT_NUM2_FEAT_CAT1_B_6M | AVG_FEAT_NUM2_FEAT_CAT1_B_6M |
|-----|-----|------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| 0 | 4 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 74.0 | 74.0 | 74.0 | 74.0 | 52.0 | 52.0 | 52.0 | 52.0 | 74.0 | 74.0 | 74.0 | 74.0 | 52.0 | 52.0 | 52.0 | 52.0 |
| 1 | 5 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 79.0 | 79.0 | 79.0 | 79.0 | 77.0 | 77.0 | 77.0 | 77.0 | 79.0 | 79.0 | 79.0 | 79.0 | 77.0 | 77.0 | 77.0 | 77.0 |
| 2 | 6 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 7 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -44.0 | -44.0 | -44.0 | -44.0 | 0.0 | 0.0 | 0.0 | 0.0 | -44.0 | -44.0 | -44.0 | -44.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 10 | 2023-02-01 | 11.0 | 11.0 | 11.0 | 11.0 | -39.0 | -39.0 | -39.0 | -39.0 | 11.0 | 11.0 | 11.0 | 11.0 | -39.0 | -39.0 | -39.0 | -39.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
</div>
<span style="color: blue;">(EN)</span> Here’s an example of saving a
file with the created query and directly creating the table in the
database:
<span style="color: green;">(PT)</span> Segue um exemplo para salvar um
arquivo com a query criada e criar a tabela diretamente no banco de
dados:
``` python
tb_publico = 'tb_spine'
list_window = [3, 6]
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
list_aggregator_value = ['A', 'B']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'
status = True
file_name = 'nome_da_tabela_criada'
file_name = 'table.sql'
query = sqlite_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, list_aggregator_value, file_name=file_name, status=status, table_name=table_name, conn=conn)
```
Complete query creation with table.sql saved file
<span style="color: blue;">(EN)</span> To verify the creation, let’s
check the database table using the name of the created table:
<span style="color: green;">(PT)</span> Para conferir a criação, vamos
consultar o banco de dados com o nome da tabela que foi criada:
``` python
query=f"""SELECT * from {table_name}"""
df = pd.read_sql(query, conn)
df.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
| | ID | SAFRA_REF | SUM_FEAT_NUM1_FEAT_CAT1_A_3M | MAX_FEAT_NUM1_FEAT_CAT1_A_3M | MIN_FEAT_NUM1_FEAT_CAT1_A_3M | AVG_FEAT_NUM1_FEAT_CAT1_A_3M | SUM_FEAT_NUM2_FEAT_CAT1_A_3M | MAX_FEAT_NUM2_FEAT_CAT1_A_3M | MIN_FEAT_NUM2_FEAT_CAT1_A_3M | AVG_FEAT_NUM2_FEAT_CAT1_A_3M | SUM_FEAT_NUM1_FEAT_CAT1_A_6M | MAX_FEAT_NUM1_FEAT_CAT1_A_6M | MIN_FEAT_NUM1_FEAT_CAT1_A_6M | AVG_FEAT_NUM1_FEAT_CAT1_A_6M | SUM_FEAT_NUM2_FEAT_CAT1_A_6M | MAX_FEAT_NUM2_FEAT_CAT1_A_6M | MIN_FEAT_NUM2_FEAT_CAT1_A_6M | AVG_FEAT_NUM2_FEAT_CAT1_A_6M | SUM_FEAT_NUM1_FEAT_CAT1_B_3M | MAX_FEAT_NUM1_FEAT_CAT1_B_3M | MIN_FEAT_NUM1_FEAT_CAT1_B_3M | AVG_FEAT_NUM1_FEAT_CAT1_B_3M | SUM_FEAT_NUM2_FEAT_CAT1_B_3M | MAX_FEAT_NUM2_FEAT_CAT1_B_3M | MIN_FEAT_NUM2_FEAT_CAT1_B_3M | AVG_FEAT_NUM2_FEAT_CAT1_B_3M | SUM_FEAT_NUM1_FEAT_CAT1_B_6M | MAX_FEAT_NUM1_FEAT_CAT1_B_6M | MIN_FEAT_NUM1_FEAT_CAT1_B_6M | AVG_FEAT_NUM1_FEAT_CAT1_B_6M | SUM_FEAT_NUM2_FEAT_CAT1_B_6M | MAX_FEAT_NUM2_FEAT_CAT1_B_6M | MIN_FEAT_NUM2_FEAT_CAT1_B_6M | AVG_FEAT_NUM2_FEAT_CAT1_B_6M |
|-----|-----|------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| 0 | 4 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 74.0 | 74.0 | 74.0 | 74.0 | 52.0 | 52.0 | 52.0 | 52.0 | 74.0 | 74.0 | 74.0 | 74.0 | 52.0 | 52.0 | 52.0 | 52.0 |
| 1 | 5 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 79.0 | 79.0 | 79.0 | 79.0 | 77.0 | 77.0 | 77.0 | 77.0 | 79.0 | 79.0 | 79.0 | 79.0 | 77.0 | 77.0 | 77.0 | 77.0 |
| 2 | 6 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 7 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -44.0 | -44.0 | -44.0 | -44.0 | 0.0 | 0.0 | 0.0 | 0.0 | -44.0 | -44.0 | -44.0 | -44.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 10 | 2023-02-01 | 11.0 | 11.0 | 11.0 | 11.0 | -39.0 | -39.0 | -39.0 | -39.0 | 11.0 | 11.0 | 11.0 | 11.0 | -39.0 | -39.0 | -39.0 | -39.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
</div>
## 2. MySQL
### Setting up the connection (Configurando a conexão)
``` python
host = "localhost"
user = "sqluser"
password = "password"
database = "mydatabase"
# Conectar ao MySQL
connection = mysql.connector.connect(
host=host,
user=user,
password=password,
database=database
)
```
### Initial view of the public (Visão inicial do público)
<span style="color: blue;">(EN)</span> First, let’s observe the format
of the spine table.
<span style="color: green;">(PT)</span> Primeiro vamos observar o
formato da tabela spine
``` python
df_spine = pd.read_sql("SELECT * FROM tb_spine", connection)
df_spine.head()
```
/tmp/ipykernel_31659/1385309256.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
df_spine = pd.read_sql("SELECT * FROM tb_spine", connection)
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
| | ID | SAFRA_REF | Target |
|-----|-----|------------|--------|
| 0 | 4 | 2023-02-01 | 1 |
| 1 | 5 | 2023-02-01 | 0 |
| 2 | 6 | 2023-02-01 | 0 |
| 3 | 7 | 2023-02-01 | 0 |
| 4 | 10 | 2023-02-01 | 0 |
</div>
### Initial view of the variables table (Visão inicial da tabela de variáveis)
<span style="color: blue;">(EN)</span> The table tb_feat contains 4
variables, two numeric and two categorical. Note that there are more
unique IDs and dates available in this than in the spine table, as is
often the case.
<span style="color: green;">(PT)</span> A tabela tb_feat contém 4
variáveis, duas sendo numéricas e duas categórica. Perceba que existem
mais ID’s únicos e datas disponíveis nessa tabela do que na tabela
spine, caso que ocorre no dia a dia.
``` python
df_data = pd.read_sql("SELECT * FROM tb_feat", connection)
df_data.head()
```
/tmp/ipykernel_31659/548342132.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
df_data = pd.read_sql("SELECT * FROM tb_feat", connection)
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
| | ID | SAFRA | FEAT_NUM1 | FEAT_NUM2 | FEAT_CAT1 | FEAT_CAT2 |
|-----|-----|------------|-----------|-----------|-----------|-----------|
| 0 | 1 | 2023-01-01 | 73 | 23 | B | B |
| 1 | 3 | 2023-01-01 | 15 | 1 | B | B |
| 2 | 5 | 2023-01-01 | 75 | 71 | A | A |
| 3 | 7 | 2023-01-01 | 73 | 82 | B | C |
| 4 | 9 | 2023-01-01 | 61 | 8 | C | B |
</div>
### Creation of numerical variables (Criação de variáveis numéricas)
<span style="color: blue;">(EN)</span> The function
`mysql_create_query_num()` creates a query text for generating variables
with the sum, minimum, maximum, and average operations on the variables
listed in feat_num_lista and within the time window listed in
list_window.
<span style="color: green;">(PT)</span> A função
`mysql_create_query_num()` cria um texto com a query para a criação de
variáveis com as operações soma, mínimo, máximo e média das variáveis
listadas em feat_num_lista e com a janela de tempo listada em
list_window.
``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
query_final_num_mysql = mysql_create_query_num(tb_publico, tb_feat, list_window,feat_num_lista, id, safra_ref, safra)
```
Complete query creation with no saved file.
``` python
df_num_mysql = pd.read_sql(query_final_num_mysql, connection)
df_num_mysql.head()
```
/tmp/ipykernel_31659/3114881597.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
df_num_mysql = pd.read_sql(query_final_num_mysql, connection)
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
| | ID | SAFRA_REF | FEAT_NUM1_SUM_1M | FEAT_NUM1_MIN_1M | FEAT_NUM1_MAX_1M | FEAT_NUM1_AVG_1M | FEAT_NUM2_SUM_1M | FEAT_NUM2_MIN_1M | FEAT_NUM2_MAX_1M | FEAT_NUM2_AVG_1M | FEAT_NUM1_SUM_2M | FEAT_NUM1_MIN_2M | FEAT_NUM1_MAX_2M | FEAT_NUM1_AVG_2M | FEAT_NUM2_SUM_2M | FEAT_NUM2_MIN_2M | FEAT_NUM2_MAX_2M | FEAT_NUM2_AVG_2M | FEAT_NUM1_SUM_3M | FEAT_NUM1_MIN_3M | FEAT_NUM1_MAX_3M | FEAT_NUM1_AVG_3M | FEAT_NUM2_SUM_3M | FEAT_NUM2_MIN_3M | FEAT_NUM2_MAX_3M | FEAT_NUM2_AVG_3M |
|-----|-----|------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|
| 0 | 4 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 5 | 2023-02-01 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 |
| 2 | 6 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 7 | 2023-02-01 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 |
| 4 | 10 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
</div>
<span style="color: blue;">(EN)</span> We can use the `file_name`
parameter with a name in the format ‘table.sql’ to save the creation
query in a file in the same folder where it is executed.
<span style="color: green;">(PT)</span> Podemos utilizar o parâmetro
`file_name` com um nome no formato “table.sql” para salvar query de
criação em um arquivo na mesma pasta que é executado.
``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
status = True
table_name = 'nome_da_tabela_criada'
file_name = 'table.sql'
query_final_num_mysql = mysql_create_query_num(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name=file_name, status=status, table_name=table_name, conn=connection)
```
ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ']
-- Create table with the name nome_da_tabela_criada
-- Criar a tabela nome_da_' at line 1
<span style="color: blue;">(EN)</span> To verify the creation, let’s
check the database table using the name of the created table:
<span style="color: green;">(PT)</span> Para conferir a criação, vamos
consultar o banco de dados com o nome da tabela que foi criada:
``` python
query=f"""SELECT * from {table_name}"""
df = pd.read_sql(query, connection)
df.head()
```
/tmp/ipykernel_1661/3805114983.py:4: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
df = pd.read_sql(query, connection)
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
| | ID | SAFRA_REF | FEAT_NUM1_SUM_1M | FEAT_NUM1_MIN_1M | FEAT_NUM1_MAX_1M | FEAT_NUM1_AVG_1M | FEAT_NUM2_SUM_1M | FEAT_NUM2_MIN_1M | FEAT_NUM2_MAX_1M | FEAT_NUM2_AVG_1M | FEAT_NUM1_SUM_2M | FEAT_NUM1_MIN_2M | FEAT_NUM1_MAX_2M | FEAT_NUM1_AVG_2M | FEAT_NUM2_SUM_2M | FEAT_NUM2_MIN_2M | FEAT_NUM2_MAX_2M | FEAT_NUM2_AVG_2M | FEAT_NUM1_SUM_3M | FEAT_NUM1_MIN_3M | FEAT_NUM1_MAX_3M | FEAT_NUM1_AVG_3M | FEAT_NUM2_SUM_3M | FEAT_NUM2_MIN_3M | FEAT_NUM2_MAX_3M | FEAT_NUM2_AVG_3M |
|-----|-----|------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|
| 0 | 4 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 5 | 2023-02-01 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 |
| 2 | 6 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 7 | 2023-02-01 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 |
| 4 | 10 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
</div>
### Creation of categorical variables (Criação de variáveis categóricas)
<span style="color: blue;">(EN)</span> The function
`mysql_create_query_cat()` generates a query text for creating variables
with the mode of each variable listed in feat_num_lista within the time
window provided in list_window.
<span style="color: green;">(PT)</span> A função
`mysql_create_query_cat()` cria um texto com a query para a criação de
variáveis com a moda de cada uma das variáveis listadas em
feat_num_lista na janela de tempo fornecida em list_window.
``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1','FEAT_CAT2']
list_window = [1,2,3]
query_final_cat_mysql = mysql_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra)
```
Complete query creation with no saved file.
``` python
df_cat_sqlite_mysql = pd.read_sql(query_final_cat_mysql, connection)
df_cat_sqlite_mysql.head()
```
/tmp/ipykernel_1661/827211967.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
df_cat_sqlite_mysql = pd.read_sql(query_final_cat_mysql, connection)
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
| | ID | SAFRA_REF | FEAT_CAT1_MODA_1M | FEAT_CAT2_MODA_1M | FEAT_CAT1_MODA_2M | FEAT_CAT2_MODA_2M | FEAT_CAT1_MODA_3M | FEAT_CAT2_MODA_3M |
|-----|-----|------------|-------------------|-------------------|-------------------|-------------------|-------------------|-------------------|
| 0 | 4 | 2023-02-01 | None | None | None | None | None | None |
| 1 | 5 | 2023-02-01 | A | A | A | A | A | A |
| 2 | 6 | 2023-02-01 | None | None | None | None | None | None |
| 3 | 7 | 2023-02-01 | B | C | B | C | B | C |
| 4 | 10 | 2023-02-01 | None | None | None | None | None | None |
</div>
<span style="color: blue;">(EN)</span> The function
`sqlite_create_query_cat()` also allows saving a file with the created
query and creating the table directly in the database. Here’s an
example:
<span style="color: green;">(PT)</span> A função
`sqlite_create_query_cat()` também permite salvar um arquivo com a query
criada e criar a tabela diretamente no banco de dados. Segue um exemplo:
``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1', 'FEAT_CAT2']
list_window = [1, 3, 6]
status = True
table_name = 'nome_da_tabela_criada'
file_name = 'table.sql'
query_final_cat_mysql = mysql_create_query_cat(tb_publico, tb_feat, list_window,feat_num_lista, id, safra_ref, safra, file_name=file_name, status=status, table_name=table_name, conn=connection)
```
Complete query creation with table.sql saved file
<span style="color: blue;">(EN)</span> To verify the creation, let’s
check the database table using the name of the created table:
<span style="color: green;">(PT)</span> Para conferir a criação, vamos
consultar o banco de dados com o nome da tabela que foi criada:
``` python
query=f"""SELECT * from {table_name}"""
df = pd.read_sql(query, connection)
df.head()
```
/tmp/ipykernel_1661/3805114983.py:4: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
df = pd.read_sql(query, connection)
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
| | ID | SAFRA_REF | FEAT_CAT1_MODA_1M | FEAT_CAT2_MODA_1M | FEAT_CAT1_MODA_3M | FEAT_CAT2_MODA_3M | FEAT_CAT1_MODA_6M | FEAT_CAT2_MODA_6M |
|-----|-----|------------|-------------------|-------------------|-------------------|-------------------|-------------------|-------------------|
| 0 | 4 | 2023-02-01 | None | None | None | None | None | None |
| 1 | 5 | 2023-02-01 | A | A | A | A | A | A |
| 2 | 6 | 2023-02-01 | None | None | None | None | None | None |
| 3 | 7 | 2023-02-01 | B | C | B | C | B | C |
| 4 | 10 | 2023-02-01 | None | None | None | None | None | None |
</div>
### Creation of aggregated variables (Criação de variáveis agregadas)
<span style="color: blue;">(EN)</span> We can create features by
aggregating on a specific value of a categorical variable and
calculating sum, avg, min, and max operations for user-defined numerical
variables. For example, for the variable FEAT_CAT1 having a value of A
or B, we calculate sum, avg, min, and max operations for the variables
FEAT_NUM1 and FEAT_NUM2. Here’s a use case:
<span style="color: green;">(PT)</span> Podemos criar variáveis
agregando por valor específico de variável categórica e calculando as
operações de sum, avg, min e max para as variáveis númericas definidas
pelo usuário. Por exemplo, para a variável `FEAT_CAT1` tendo valor igual
a `A` ou `B`, calculamos as operações de sum, avg, min e max para as
variáveis `FEAT_NUM1` e `FEAT_NUM2`. Segue um caso de uso:
``` python
tb_publico = 'tb_spine'
list_window = [3, 6]
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
lista_valor_agragador = ['A', 'B']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'
query = mysql_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador)
```
Complete query creation with no saved file.
``` python
df_mysql_agregada = pd.read_sql(query, connection)
df_mysql_agregada.head()
```
/tmp/ipykernel_1661/789764728.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
df_mysql_agregada = pd.read_sql(query, connection)
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
| | ID | SAFRA_REF | SUM_FEAT_NUM1_FEAT_CAT1_A_3M | MAX_FEAT_NUM1_FEAT_CAT1_A_3M | MIN_FEAT_NUM1_FEAT_CAT1_A_3M | AVG_FEAT_NUM1_FEAT_CAT1_A_3M | SUM_FEAT_NUM2_FEAT_CAT1_A_3M | MAX_FEAT_NUM2_FEAT_CAT1_A_3M | MIN_FEAT_NUM2_FEAT_CAT1_A_3M | AVG_FEAT_NUM2_FEAT_CAT1_A_3M | SUM_FEAT_NUM1_FEAT_CAT1_A_6M | MAX_FEAT_NUM1_FEAT_CAT1_A_6M | MIN_FEAT_NUM1_FEAT_CAT1_A_6M | AVG_FEAT_NUM1_FEAT_CAT1_A_6M | SUM_FEAT_NUM2_FEAT_CAT1_A_6M | MAX_FEAT_NUM2_FEAT_CAT1_A_6M | MIN_FEAT_NUM2_FEAT_CAT1_A_6M | AVG_FEAT_NUM2_FEAT_CAT1_A_6M | SUM_FEAT_NUM1_FEAT_CAT1_B_3M | MAX_FEAT_NUM1_FEAT_CAT1_B_3M | MIN_FEAT_NUM1_FEAT_CAT1_B_3M | AVG_FEAT_NUM1_FEAT_CAT1_B_3M | SUM_FEAT_NUM2_FEAT_CAT1_B_3M | MAX_FEAT_NUM2_FEAT_CAT1_B_3M | MIN_FEAT_NUM2_FEAT_CAT1_B_3M | AVG_FEAT_NUM2_FEAT_CAT1_B_3M | SUM_FEAT_NUM1_FEAT_CAT1_B_6M | MAX_FEAT_NUM1_FEAT_CAT1_B_6M | MIN_FEAT_NUM1_FEAT_CAT1_B_6M | AVG_FEAT_NUM1_FEAT_CAT1_B_6M | SUM_FEAT_NUM2_FEAT_CAT1_B_6M | MAX_FEAT_NUM2_FEAT_CAT1_B_6M | MIN_FEAT_NUM2_FEAT_CAT1_B_6M | AVG_FEAT_NUM2_FEAT_CAT1_B_6M |
|-----|-----|------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| 0 | 4 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 5 | 2023-02-01 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 6 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 7 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 |
| 4 | 10 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
</div>
``` python
tb_publico = 'tb_spine'
list_window = [3, 6]
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT2'
lista_valor_agragador = ['B', 'C']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'
query = mysql_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador)
```
Complete query creation with no saved file.
``` python
df_mysql_agregada = pd.read_sql(query, connection)
df_mysql_agregada.head()
```
/tmp/ipykernel_1661/789764728.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
df_mysql_agregada = pd.read_sql(query, connection)
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
| | ID | SAFRA_REF | SUM_FEAT_NUM1_FEAT_CAT2_B_3M | MAX_FEAT_NUM1_FEAT_CAT2_B_3M | MIN_FEAT_NUM1_FEAT_CAT2_B_3M | AVG_FEAT_NUM1_FEAT_CAT2_B_3M | SUM_FEAT_NUM2_FEAT_CAT2_B_3M | MAX_FEAT_NUM2_FEAT_CAT2_B_3M | MIN_FEAT_NUM2_FEAT_CAT2_B_3M | AVG_FEAT_NUM2_FEAT_CAT2_B_3M | SUM_FEAT_NUM1_FEAT_CAT2_B_6M | MAX_FEAT_NUM1_FEAT_CAT2_B_6M | MIN_FEAT_NUM1_FEAT_CAT2_B_6M | AVG_FEAT_NUM1_FEAT_CAT2_B_6M | SUM_FEAT_NUM2_FEAT_CAT2_B_6M | MAX_FEAT_NUM2_FEAT_CAT2_B_6M | MIN_FEAT_NUM2_FEAT_CAT2_B_6M | AVG_FEAT_NUM2_FEAT_CAT2_B_6M | SUM_FEAT_NUM1_FEAT_CAT2_C_3M | MAX_FEAT_NUM1_FEAT_CAT2_C_3M | MIN_FEAT_NUM1_FEAT_CAT2_C_3M | AVG_FEAT_NUM1_FEAT_CAT2_C_3M | SUM_FEAT_NUM2_FEAT_CAT2_C_3M | MAX_FEAT_NUM2_FEAT_CAT2_C_3M | MIN_FEAT_NUM2_FEAT_CAT2_C_3M | AVG_FEAT_NUM2_FEAT_CAT2_C_3M | SUM_FEAT_NUM1_FEAT_CAT2_C_6M | MAX_FEAT_NUM1_FEAT_CAT2_C_6M | MIN_FEAT_NUM1_FEAT_CAT2_C_6M | AVG_FEAT_NUM1_FEAT_CAT2_C_6M | SUM_FEAT_NUM2_FEAT_CAT2_C_6M | MAX_FEAT_NUM2_FEAT_CAT2_C_6M | MIN_FEAT_NUM2_FEAT_CAT2_C_6M | AVG_FEAT_NUM2_FEAT_CAT2_C_6M |
|-----|-----|------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| 0 | 4 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 5 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 6 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 7 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 |
| 4 | 10 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
</div>
<span style="color: blue;">(EN)</span> Here’s an example of saving a
file with the created query and directly creating the table in the
database:
<span style="color: green;">(PT)</span> Segue um exemplo para salvar um
arquivo com a query criada e criar a tabela diretamente no banco de
dados:
``` python
tb_publico = 'tb_spine'
list_window = [3, 6]
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
lista_valor_agragador = ['A', 'B']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'
status = True
table_name = 'nome_da_tabela_criada'
file_name = 'table.sql'
query = mysql_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador, file_name=file_name, status=status, table_name=table_name, conn=connection)
```
Complete query creation with table.sql saved file
<span style="color: blue;">(EN)</span> To verify the creation, let’s
check the database table using the name of the created table:
<span style="color: green;">(PT)</span> Para conferir a criação, vamos
consultar o banco de dados com o nome da tabela que foi criada:
``` python
query=f"""SELECT * from {table_name}"""
df = pd.read_sql(query, connection)
df.head()
```
/tmp/ipykernel_1661/3805114983.py:4: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
df = pd.read_sql(query, connection)
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
| | ID | SAFRA_REF | SUM_FEAT_NUM1_FEAT_CAT1_A_3M | MAX_FEAT_NUM1_FEAT_CAT1_A_3M | MIN_FEAT_NUM1_FEAT_CAT1_A_3M | AVG_FEAT_NUM1_FEAT_CAT1_A_3M | SUM_FEAT_NUM2_FEAT_CAT1_A_3M | MAX_FEAT_NUM2_FEAT_CAT1_A_3M | MIN_FEAT_NUM2_FEAT_CAT1_A_3M | AVG_FEAT_NUM2_FEAT_CAT1_A_3M | SUM_FEAT_NUM1_FEAT_CAT1_A_6M | MAX_FEAT_NUM1_FEAT_CAT1_A_6M | MIN_FEAT_NUM1_FEAT_CAT1_A_6M | AVG_FEAT_NUM1_FEAT_CAT1_A_6M | SUM_FEAT_NUM2_FEAT_CAT1_A_6M | MAX_FEAT_NUM2_FEAT_CAT1_A_6M | MIN_FEAT_NUM2_FEAT_CAT1_A_6M | AVG_FEAT_NUM2_FEAT_CAT1_A_6M | SUM_FEAT_NUM1_FEAT_CAT1_B_3M | MAX_FEAT_NUM1_FEAT_CAT1_B_3M | MIN_FEAT_NUM1_FEAT_CAT1_B_3M | AVG_FEAT_NUM1_FEAT_CAT1_B_3M | SUM_FEAT_NUM2_FEAT_CAT1_B_3M | MAX_FEAT_NUM2_FEAT_CAT1_B_3M | MIN_FEAT_NUM2_FEAT_CAT1_B_3M | AVG_FEAT_NUM2_FEAT_CAT1_B_3M | SUM_FEAT_NUM1_FEAT_CAT1_B_6M | MAX_FEAT_NUM1_FEAT_CAT1_B_6M | MIN_FEAT_NUM1_FEAT_CAT1_B_6M | AVG_FEAT_NUM1_FEAT_CAT1_B_6M | SUM_FEAT_NUM2_FEAT_CAT1_B_6M | MAX_FEAT_NUM2_FEAT_CAT1_B_6M | MIN_FEAT_NUM2_FEAT_CAT1_B_6M | AVG_FEAT_NUM2_FEAT_CAT1_B_6M |
|-----|-----|------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| 0 | 4 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 5 | 2023-02-01 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 6 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 7 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 |
| 4 | 10 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
</div>
## 3. Snowflake
### Criação de variáveis numéricas
<span style="color: blue;">(EN)</span> The function
`snow_create_query_num()` creates a query text for generating variables
with the sum, minimum, maximum, and average operations on the variables
listed in feat_num_lista and within the time window listed in
list_window.
<span style="color: green;">(PT)</span> A função
`snow_create_query_num()` cria um texto com a query para a criação de
variáveis com as operações soma, mínimo, máximo e média das variáveis
listadas em feat_num_lista e com a janela de tempo listada em
list_window.
``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
query_final_num_snow = snow_create_query_num(tb_publico, tb_feat, list_window,feat_num_lista, id, safra_ref, safra)
```
Complete query creation with no saved file.
Podemos salvar a consulta da query em um arquivo também:
``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
file_name = 'table.sql'
query_final_num_snow = snow_create_query_num(tb_publico, tb_feat, list_window,feat_num_lista, id, safra_ref, safra, file_name)
```
Complete query creation with table.sql saved file
### Criação de variáveis categóricas
<span style="color: blue;">(EN)</span> The function
`mysql_create_query_cat()` generates a query text for creating variables
with the mode of each variable listed in feat_num_lista within the time
window provided in list_window.
<span style="color: green;">(PT)</span> A função
`mysql_create_query_cat()` cria um texto com a query para a criação de
variáveis com a moda de cada uma das variáveis listadas em
feat_num_lista na janela de tempo fornecida em list_window.
``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1','FEAT_CAT2']
list_window = [1,2,3]
query_final_cat_snow = snow_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra)
```
Complete query creation with no saved file.
``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1','FEAT_CAT2']
list_window = [1,2,3]
file_name = 'table.sql'
query_final_cat_snow = snow_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name)
```
Complete query creation with table.sql saved file
### Criação de variáveis agregadas
<span style="color: blue;">(EN)</span> We can create features by
aggregating on a specific value of a categorical variable and
calculating sum, avg, min, and max operations for user-defined numerical
variables. For example, for the variable FEAT_CAT1 having a value of A
or B, we calculate sum, avg, min, and max operations for the variables
FEAT_NUM1 and FEAT_NUM2. Here’s a use case:
<span style="color: green;">(PT)</span> Podemos criar variáveis
agregando por valor específico de variável categórica e calculando as
operações de sum, avg, min e max para as variáveis númericas definidas
pelo usuário. Por exemplo, para a variável `FEAT_CAT1` tendo valor igual
a `A` ou `B`, calculamos as operações de sum, avg, min e max para as
variáveis `FEAT_NUM1` e `FEAT_NUM2`. Segue um caso de uso:
``` python
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
lista_valor_agragador = ['B', 'C']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'
janelas = [1, 2, 3]
tb_publico = 'tb_spine'
query_final_cat_agre_snow = snow_create_query_agregada(tb_publico, tb_feat, janelas, lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador)
```
Complete query creation with no saved file.
<span style="color: blue;">(EN)</span> We can also save the query to a
file:
<span style="color: green;">(PT)</span> Podemos também salvar a consulta
em um arquivo:
``` python
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
lista_valor_agragador = ['B', 'C']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'
janelas = [1, 2, 3]
tb_publico = 'tb_spine'
file_name = 'table.sql'
query_final_cat_agre_snow = snow_create_query_agregada(tb_publico, tb_feat, janelas, lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador, file_name)
```
Complete query creation with table.sql saved file
Raw data
{
"_id": null,
"home_page": "https://github.com/ravennaro/featsql",
"name": "featsql",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "nbdev jupyter notebook python",
"author": "Ravenna Oliveria",
"author_email": "ravenna.rro@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/94/0d/8e81f47909c4a062d0b097229e215cc3c26be010d55d423aaee1f9cce2da/featsql-0.1.1.tar.gz",
"platform": null,
"description": "# featsql\n\n<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->\n\n<span style=\"color: blue;\">(EN)</span> There are three kinds of\nfunctions that can be used to create features from numerical,\ncategorical and aggregated data. They are:\n\n- For the SQLite database:\n\n - `sqlite_create_query_num()`: Numerical variables such as mean, sum,\n etc., within the user-provided window of n months.\n\n - `sqlite_create_query_cat()`: Categorical variables such as mean,\n sum, etc., within the user-provided window of n months.\n\n - `sqlite_create_query_agregada()`: Numerical variables grouped by a\n specific categorical variable value with mean, sum, etc., within the\n user-provided window of n months.\n\n- For the MySQL database:\n\n - `mysql_create_query_num()`: Numerical variables such as mean, sum,\n etc., within the user-provided window of n months.\n\n - `mysql_create_query_cat()`: Categorical variables such as mean, sum,\n etc., within the user-provided window of n months.\n\n - `mysql_create_query_agregada()`: Numerical variables grouped by a\n specific categorical variable value with mean, sum, etc., within the\n user-provided window of n months.\n\n- For the Snowflake database:\n\n - `snow_create_query_num()`: Numerical variables such as mean, sum,\n etc., within the user-provided window of n months.\n\n - `snow_create_query_cat()`: Categorical variables such as mean, sum,\n etc., within the user-provided window of n months.\n\n - `snow_create_query_agregada()`: Numerical variables grouped by a\n specific categorical variable value with mean, sum, etc., within the\n user-provided window of n months.\n\nNext, we have a detailed example for creating variables in SQLite and\nMySQL databases, along with an example output for the Snowflake\ndatabase.\n\n<span style=\"color: green;\">(PT)</span> Existem tr\u00eas tipos de fun\u00e7\u00f5es\nque podem ser usadas para criar vari\u00e1veis a partir de dados num\u00e9ricos,\ncateg\u00f3ricos e agregados. S\u00e3o elas: - Para o banco SQLite:\n\n + ``sqlite_create_query_num()``: vari\u00e1veis num\u00e9ricas como m\u00e9dia, soma e etc na janela de n meses fornecida pelo usu\u00e1rio.\n + ``sqlite_create_query_cat()``: vari\u00e1veis categ\u00f3ricas como m\u00e9dia, soma e etc na janela de n meses fornecida pelo usu\u00e1rio.\n + ``sqlite_create_query_agregada()``: vari\u00e1veis num\u00e9ricas agrupadas por uma valor espec\u00edfico de vari\u00e1vel categ\u00f3rica com a m\u00e9dia, soma e etc na janela de n meses fornecida pelo usu\u00e1rio.\n\n- Para o banco MySQL:\n\n - `mysql_create_query_num()`: vari\u00e1veis num\u00e9ricas como m\u00e9dia, soma e\n etc na janela de n meses fornecida pelo usu\u00e1rio.\n - `mysql_create_query_cat()`: vari\u00e1veis categ\u00f3ricas como m\u00e9dia, soma e\n etc na janela de n meses fornecida pelo usu\u00e1rio.\n - `mysql_create_query_agregada()`: vari\u00e1veis num\u00e9ricas agrupadas por\n uma valor espec\u00edfico de vari\u00e1vel categ\u00f3rica com a m\u00e9dia, soma e etc\n na janela de n meses fornecida pelo usu\u00e1rio.\n\n- Para o banco snowflake:\n\n - `snow_create_query_num()`: vari\u00e1veis num\u00e9ricas como m\u00e9dia, soma e\n etc na janela de n meses fornecida pelo usu\u00e1rio.\n\n - `snow_create_query_cat()`: vari\u00e1veis categ\u00f3ricas como m\u00e9dia, soma e\n etc na janela de n meses fornecida pelo usu\u00e1rio.\n\n - `snow_create_query_agregada()`: vari\u00e1veis num\u00e9ricas agrupadas por\n uma valor espec\u00edfico de vari\u00e1vel categ\u00f3rica com a m\u00e9dia, soma e etc\n na janela de n meses fornecida pelo usu\u00e1rio.\n\nA seguir, temos um exemplo detalhado para a cria\u00e7\u00e3o de vari\u00e1veis no\nbanco sqlite e mysql. Al\u00e9m de um exemplo de output para o banco do tipo\nsnowflake.\n\n## Install\n\n<span style=\"color: blue;\">(EN)</span> To install, simply use the\ncommand line:\n\n``` sh\npip install featsql\n```\n\n<span style=\"color: green;\">(PT)</span> Para instalar, simplesmente use\no comando:\n\n``` sh\npip install featsql\n```\n\n## Imports\n\n``` python\nfrom featsql.featsqlite import *\nfrom featsql.featmysql import *\nfrom featsql.featsnow import *\n```\n\n``` python\nimport pandas as pd\n```\n\n``` python\nimport mysql.connector\nimport sqlite3\npd.set_option('display.max_columns', None)\n```\n\n# Exemples\n\n## 1. SQLITE\n\n### Setting up the engine (Configurando a engine)\n\n``` python\nurl_db = \"../../data/mydatabase.db\" \nconn = sqlite3.connect(url_db)\ncursor = conn.cursor()\n```\n\n### Initial view of the public (Vis\u00e3o inicial do p\u00fablico)\n\n<span style=\"color: blue;\">(EN)</span> First, let\u2019s observe the format\nof the spine table.\n\n<span style=\"color: green;\">(PT)</span> Primeiro vamos observar o\nformato da tabela spine\n\n``` python\ndf_spine = pd.read_sql(\"SELECT * FROM tb_spine\", conn)\ndf_spine.head()\n```\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n .dataframe tbody tr th {\n vertical-align: top;\n }\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n| | ID | SAFRA_REF | Target |\n|-----|-----|------------|--------|\n| 0 | 4 | 2023-02-01 | 0 |\n| 1 | 5 | 2023-02-01 | 0 |\n| 2 | 6 | 2023-02-01 | 0 |\n| 3 | 7 | 2023-02-01 | 0 |\n| 4 | 10 | 2023-02-01 | 1 |\n\n</div>\n\n### Initial view of the variables table (Vis\u00e3o inicial da tabela de vari\u00e1veis)\n\n<span style=\"color: blue;\">(EN)</span> The table tb_feat contains 4\nvariables, two numeric and two categorical. Note that there are more\nunique IDs and dates available in this than in the spine table, as is\noften the case.\n\n<span style=\"color: green;\">(PT)</span> A tabela tb_feat cont\u00e9m 4\nvari\u00e1veis, duas sendo num\u00e9ricas e duas categ\u00f3rica. Perceba que existem\nmais ID\u2019s \u00fanicos e datas dispon\u00edveis nessa tabela do que na tabela\nspine, caso que ocorre no dia a dia.\n\n``` python\ndf_data = pd.read_sql(\"SELECT * FROM tb_feat\", conn)\ndf_data.head()\n```\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n .dataframe tbody tr th {\n vertical-align: top;\n }\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n| | ID | SAFRA | FEAT_NUM1 | FEAT_NUM2 | FEAT_CAT1 | FEAT_CAT2 |\n|-----|-----|------------|-----------|-----------|-----------|-----------|\n| 0 | 1 | 2023-01-01 | -19 | -52 | A | B |\n| 1 | 2 | 2023-01-01 | -7 | -33 | A | B |\n| 2 | 3 | 2023-01-01 | 6 | 91 | C | C |\n| 3 | 4 | 2023-01-01 | 74 | 52 | B | A |\n| 4 | 5 | 2023-01-01 | 79 | 77 | B | B |\n\n</div>\n\n### Creation of numerical variables (Cria\u00e7\u00e3o de vari\u00e1veis num\u00e9ricas)\n\n<span style=\"color: blue;\">(EN)</span> The function\nsqlite_create_query_num() creates a query text for generating variables\nwith the sum, minimum, maximum, and average operations on the variables\nlisted in feat_num_lista and within the time window listed in\nlist_window.\n\n<span style=\"color: green;\">(PT)</span> A fun\u00e7\u00e3o\nsqlite_create_query_num() cria um texto com a query para a cria\u00e7\u00e3o de\nvari\u00e1veis com as opera\u00e7\u00f5es soma, m\u00ednimo, m\u00e1ximo e m\u00e9dia das vari\u00e1veis\nlistadas em feat_num_lista e com a janela de tempo listada em\nlist_window.\n\n``` python\ntb_publico = 'tb_spine'\ntb_feat = 'tb_feat'\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\nsafra = 'SAFRA'\nfeat_num_lista = ['FEAT_NUM1','FEAT_NUM2']\nlist_window = [1,2,3]\nquery_final_num_sqlite = sqlite_create_query_num(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra)\n```\n\n Complete query creation with no saved file.\n\n``` python\ndf_num_sqlite = pd.read_sql(query_final_num_sqlite, conn)\ndf_num_sqlite.head()\n```\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n .dataframe tbody tr th {\n vertical-align: top;\n }\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n| | ID | SAFRA_REF | FEAT_NUM1_SUM_1M | FEAT_NUM1_MIN_1M | FEAT_NUM1_MAX_1M | FEAT_NUM1_AVG_1M | FEAT_NUM2_SUM_1M | FEAT_NUM2_MIN_1M | FEAT_NUM2_MAX_1M | FEAT_NUM2_AVG_1M | FEAT_NUM1_SUM_2M | FEAT_NUM1_MIN_2M | FEAT_NUM1_MAX_2M | FEAT_NUM1_AVG_2M | FEAT_NUM2_SUM_2M | FEAT_NUM2_MIN_2M | FEAT_NUM2_MAX_2M | FEAT_NUM2_AVG_2M | FEAT_NUM1_SUM_3M | FEAT_NUM1_MIN_3M | FEAT_NUM1_MAX_3M | FEAT_NUM1_AVG_3M | FEAT_NUM2_SUM_3M | FEAT_NUM2_MIN_3M | FEAT_NUM2_MAX_3M | FEAT_NUM2_AVG_3M |\n|-----|-----|------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|\n| 0 | 4 | 2023-02-01 | 74.0 | 74.0 | 74.0 | 74.0 | 52.0 | 52.0 | 52.0 | 52.0 | 74 | 74 | 74 | 74.0 | 52 | 52 | 52 | 52.0 | 74 | 74 | 74 | 74.0 | 52 | 52 | 52 | 52.0 |\n| 1 | 5 | 2023-02-01 | 79.0 | 79.0 | 79.0 | 79.0 | 77.0 | 77.0 | 77.0 | 77.0 | 79 | 79 | 79 | 79.0 | 77 | 77 | 77 | 77.0 | 79 | 79 | 79 | 79.0 | 77 | 77 | 77 | 77.0 |\n| 2 | 6 | 2023-02-01 | -13.0 | -13.0 | -13.0 | -13.0 | -45.0 | -45.0 | -45.0 | -45.0 | -13 | -13 | -13 | -13.0 | -45 | -45 | -45 | -45.0 | -13 | -13 | -13 | -13.0 | -45 | -45 | -45 | -45.0 |\n| 3 | 7 | 2023-02-01 | -44.0 | -44.0 | -44.0 | -44.0 | 0.0 | 0.0 | 0.0 | 0.0 | -44 | -44 | -44 | -44.0 | 0 | 0 | 0 | 0.0 | -44 | -44 | -44 | -44.0 | 0 | 0 | 0 | 0.0 |\n| 4 | 10 | 2023-02-01 | 11.0 | 11.0 | 11.0 | 11.0 | -39.0 | -39.0 | -39.0 | -39.0 | 11 | 11 | 11 | 11.0 | -39 | -39 | -39 | -39.0 | 11 | 11 | 11 | 11.0 | -39 | -39 | -39 | -39.0 |\n\n</div>\n\n<span style=\"color: blue;\">(EN)</span> We can use the `nome_arquivo`\nparameter with a name in the format \u2018table.sql\u2019 to save the creation\nquery in a file in the same folder where it is executed.\n\n<span style=\"color: green;\">(PT)</span> Podemos utilizar o par\u00e2metro\n`nome_arquivo` com um nome no formato \u201ctable.sql\u201d para salvar query de\ncria\u00e7\u00e3o em um arquivo na mesma pasta que \u00e9 executado.\n\n``` python\ntb_publico = 'tb_spine'\ntb_feat = 'tb_feat'\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\nsafra = 'SAFRA'\nfeat_num_lista = ['FEAT_NUM1','FEAT_NUM2']\nlist_window = [1,2,3]\nfile_name = 'table.sql'\n\nquery_final_num_sqlite = sqlite_create_query_num(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name)\n```\n\n Complete query creation with table.sql saved file\n\n<span style=\"color: blue;\">(EN)</span> Besides saving the query to a\nfile, we can create a table directly in the database using the\n`[`sqlite_create_query_num`](https://ravennaro.github.io/featsql/creation_sqlite.html#sqlite_create_query_num)`\nfunction with the creation parameters. Here is an example:\n\n<span style=\"color: green;\">(PT)</span> Al\u00e9m de salvar a query em um\narquivo, podemos criar uma tabela diretamente no banco de dados a partir\nda fun\u00e7\u00e3o\n`[`sqlite_create_query_num`](https://ravennaro.github.io/featsql/creation_sqlite.html#sqlite_create_query_num)`\ncom os par\u00e2metros de cria\u00e7\u00e3o. A seguir um exemplo:\n\n``` python\ntb_publico = 'tb_spine'\ntb_feat = 'tb_feat'\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\nsafra = 'SAFRA'\nfeat_num_lista = ['FEAT_NUM1','FEAT_NUM2']\nlist_window = [1,2,3]\nstatus = True\ntable_name = 'nome_da_tabela_criada'\nfile_name = 'table.sql'\n\nquery_final_num_sqlite = sqlite_create_query_num(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name=file_name, status=status, table_name=table_name, conn=conn)\n```\n\n Complete query creation with table.sql saved file\n\n<span style=\"color: blue;\">(EN)</span> To verify the creation, let\u2019s\ncheck the database table using the name of the created table:\n\n<span style=\"color: green;\">(PT)</span> Para conferir a cria\u00e7\u00e3o, vamos\nconsultar o banco de dados com o nome da tabela que foi criada:\n\n``` python\nquery=f\"\"\"SELECT * from {table_name}\"\"\"\ndf = pd.read_sql(query, conn)\ndf.head()\n```\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n .dataframe tbody tr th {\n vertical-align: top;\n }\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n| | ID | SAFRA_REF | FEAT_NUM1_SUM_1M | FEAT_NUM1_MIN_1M | FEAT_NUM1_MAX_1M | FEAT_NUM1_AVG_1M | FEAT_NUM2_SUM_1M | FEAT_NUM2_MIN_1M | FEAT_NUM2_MAX_1M | FEAT_NUM2_AVG_1M | FEAT_NUM1_SUM_2M | FEAT_NUM1_MIN_2M | FEAT_NUM1_MAX_2M | FEAT_NUM1_AVG_2M | FEAT_NUM2_SUM_2M | FEAT_NUM2_MIN_2M | FEAT_NUM2_MAX_2M | FEAT_NUM2_AVG_2M | FEAT_NUM1_SUM_3M | FEAT_NUM1_MIN_3M | FEAT_NUM1_MAX_3M | FEAT_NUM1_AVG_3M | FEAT_NUM2_SUM_3M | FEAT_NUM2_MIN_3M | FEAT_NUM2_MAX_3M | FEAT_NUM2_AVG_3M |\n|-----|-----|------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|\n| 0 | 4 | 2023-02-01 | 74.0 | 74.0 | 74.0 | 74.0 | 52.0 | 52.0 | 52.0 | 52.0 | 74.0 | 74.0 | 74.0 | 74.0 | 52.0 | 52.0 | 52.0 | 52.0 | 74.0 | 74.0 | 74.0 | 74.0 | 52.0 | 52.0 | 52.0 | 52.0 |\n| 1 | 5 | 2023-02-01 | 79.0 | 79.0 | 79.0 | 79.0 | 77.0 | 77.0 | 77.0 | 77.0 | 79.0 | 79.0 | 79.0 | 79.0 | 77.0 | 77.0 | 77.0 | 77.0 | 79.0 | 79.0 | 79.0 | 79.0 | 77.0 | 77.0 | 77.0 | 77.0 |\n| 2 | 6 | 2023-02-01 | -13.0 | -13.0 | -13.0 | -13.0 | -45.0 | -45.0 | -45.0 | -45.0 | -13.0 | -13.0 | -13.0 | -13.0 | -45.0 | -45.0 | -45.0 | -45.0 | -13.0 | -13.0 | -13.0 | -13.0 | -45.0 | -45.0 | -45.0 | -45.0 |\n| 3 | 7 | 2023-02-01 | -44.0 | -44.0 | -44.0 | -44.0 | 0.0 | 0.0 | 0.0 | 0.0 | -44.0 | -44.0 | -44.0 | -44.0 | 0.0 | 0.0 | 0.0 | 0.0 | -44.0 | -44.0 | -44.0 | -44.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n| 4 | 10 | 2023-02-01 | 11.0 | 11.0 | 11.0 | 11.0 | -39.0 | -39.0 | -39.0 | -39.0 | 11.0 | 11.0 | 11.0 | 11.0 | -39.0 | -39.0 | -39.0 | -39.0 | 11.0 | 11.0 | 11.0 | 11.0 | -39.0 | -39.0 | -39.0 | -39.0 |\n\n</div>\n\n### Creation of categorical variables (Cria\u00e7\u00e3o de vari\u00e1veis categ\u00f3ricas)\n\n<span style=\"color: blue;\">(EN)</span> The function\n`sqlite_create_query_cat()` generates a query text for creating\nvariables with the mode of each variable listed in feat_num_lista within\nthe time window provided in list_window.\n\n<span style=\"color: green;\">(PT)</span> A fun\u00e7\u00e3o\n`sqlite_create_query_cat()` cria um texto com a query para a cria\u00e7\u00e3o de\nvari\u00e1veis com a moda de cada uma das vari\u00e1veis listadas em\nfeat_num_lista na janela de tempo fornecida em list_window.\n\n``` python\ntb_publico = 'tb_spine'\ntb_feat = 'tb_feat'\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\nsafra = 'SAFRA'\nfeat_num_lista = ['FEAT_CAT1', 'FEAT_CAT2']\nlist_window = [1, 3, 6]\n\nquery_final_cat_sqlite = sqlite_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra)\n```\n\n Complete query creation with no saved file.\n\n``` python\ndf_cat_sqlite_sqlite = pd.read_sql(query_final_cat_sqlite, conn)\ndf_cat_sqlite_sqlite.head()\n```\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n .dataframe tbody tr th {\n vertical-align: top;\n }\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n| | ID | SAFRA_REF | FEAT_CAT1_MODA_1M | FEAT_CAT2_MODA_1M | FEAT_CAT1_MODA_3M | FEAT_CAT2_MODA_3M | FEAT_CAT1_MODA_6M | FEAT_CAT2_MODA_6M |\n|-----|-----|------------|-------------------|-------------------|-------------------|-------------------|-------------------|-------------------|\n| 0 | 4 | 2023-02-01 | B | A | B | A | B | A |\n| 1 | 5 | 2023-02-01 | B | B | B | B | B | B |\n| 2 | 6 | 2023-02-01 | C | B | C | B | C | B |\n| 3 | 7 | 2023-02-01 | B | C | B | C | B | C |\n| 4 | 10 | 2023-02-01 | A | A | A | A | A | A |\n\n</div>\n\n<span style=\"color: blue;\">(EN)</span> The function\n`sqlite_create_query_cat()` also allows saving a file with the created\nquery and creating the table directly in the database. Here\u2019s an\nexample:\n\n<span style=\"color: green;\">(PT)</span> A fun\u00e7\u00e3o\n`sqlite_create_query_cat()` tamb\u00e9m permite salvar um arquivo com a query\ncriada e criar a tabela diretamente no banco de dados. Segue um exemplo:\n\n``` python\ntb_publico = 'tb_spine'\ntb_feat = 'tb_feat'\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\nsafra = 'SAFRA'\nfeat_num_lista = ['FEAT_CAT1', 'FEAT_CAT2']\nlist_window = [1, 3, 6]\nstatus = True\ntable_name = 'nome_da_tabela_criada'\nfile_name = 'table.sql'\n\nquery_final_cat_sqlite = sqlite_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name=file_name, status=status, table_name=table_name, conn=conn)\n```\n\n Complete query creation with table.sql saved file\n\n<span style=\"color: blue;\">(EN)</span> To verify the creation, let\u2019s\ncheck the database table using the name of the created table:\n\n<span style=\"color: green;\">(PT)</span> Para conferir a cria\u00e7\u00e3o, vamos\nconsultar o banco de dados com o nome da tabela que foi criada:\n\n``` python\nquery=f\"\"\"SELECT * from {table_name}\"\"\"\ndf = pd.read_sql(query, conn)\ndf.head()\n```\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n .dataframe tbody tr th {\n vertical-align: top;\n }\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n| | ID | SAFRA_REF | FEAT_CAT1_MODA_1M | FEAT_CAT2_MODA_1M | FEAT_CAT1_MODA_3M | FEAT_CAT2_MODA_3M | FEAT_CAT1_MODA_6M | FEAT_CAT2_MODA_6M |\n|-----|-----|------------|-------------------|-------------------|-------------------|-------------------|-------------------|-------------------|\n| 0 | 4 | 2023-02-01 | B | A | B | A | B | A |\n| 1 | 5 | 2023-02-01 | B | B | B | B | B | B |\n| 2 | 6 | 2023-02-01 | C | B | C | B | C | B |\n| 3 | 7 | 2023-02-01 | B | C | B | C | B | C |\n| 4 | 10 | 2023-02-01 | A | A | A | A | A | A |\n\n</div>\n\n### Creation of aggregated variables (Cria\u00e7\u00e3o de vari\u00e1veis agregadas)\n\n<span style=\"color: blue;\">(EN)</span> We can create features by\naggregating on a specific value of a categorical variable and\ncalculating sum, avg, min, and max operations for user-defined numerical\nvariables. For example, for the variable FEAT_CAT1 having a value of A\nor B, we calculate sum, avg, min, and max operations for the variables\nFEAT_NUM1 and FEAT_NUM2. Here\u2019s a use case:\n\n<span style=\"color: green;\">(PT)</span> Podemos criar vari\u00e1veis\nagregando por valor espec\u00edfico de vari\u00e1vel categ\u00f3rica e calculando as\nopera\u00e7\u00f5es de sum, avg, min e max para as vari\u00e1veis n\u00famericas definidas\npelo usu\u00e1rio. Por exemplo, para a vari\u00e1vel `FEAT_CAT1` tendo valor igual\na `A` ou `B`, calculamos as opera\u00e7\u00f5es de sum, avg, min e max para as\nvari\u00e1veis `FEAT_NUM1` e `FEAT_NUM2`. Segue um caso de uso:\n\n``` python\ntb_publico = 'tb_spine'\nlist_window = [3, 6]\nlista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']\nfeat_cat = 'FEAT_CAT1'\nlist_aggregator_value = ['A', 'B']\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\ntb_feat = 'tb_feat'\nsafra = 'SAFRA'\n\nquery = sqlite_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, list_aggregator_value)\n```\n\n Complete query creation with no saved file.\n\n``` python\ndf_sqlite_agregada = pd.read_sql(query, conn)\ndf_sqlite_agregada.head()\n```\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n .dataframe tbody tr th {\n vertical-align: top;\n }\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n| | ID | SAFRA_REF | SUM_FEAT_NUM1_FEAT_CAT1_A_3M | MAX_FEAT_NUM1_FEAT_CAT1_A_3M | MIN_FEAT_NUM1_FEAT_CAT1_A_3M | AVG_FEAT_NUM1_FEAT_CAT1_A_3M | SUM_FEAT_NUM2_FEAT_CAT1_A_3M | MAX_FEAT_NUM2_FEAT_CAT1_A_3M | MIN_FEAT_NUM2_FEAT_CAT1_A_3M | AVG_FEAT_NUM2_FEAT_CAT1_A_3M | SUM_FEAT_NUM1_FEAT_CAT1_A_6M | MAX_FEAT_NUM1_FEAT_CAT1_A_6M | MIN_FEAT_NUM1_FEAT_CAT1_A_6M | AVG_FEAT_NUM1_FEAT_CAT1_A_6M | SUM_FEAT_NUM2_FEAT_CAT1_A_6M | MAX_FEAT_NUM2_FEAT_CAT1_A_6M | MIN_FEAT_NUM2_FEAT_CAT1_A_6M | AVG_FEAT_NUM2_FEAT_CAT1_A_6M | SUM_FEAT_NUM1_FEAT_CAT1_B_3M | MAX_FEAT_NUM1_FEAT_CAT1_B_3M | MIN_FEAT_NUM1_FEAT_CAT1_B_3M | AVG_FEAT_NUM1_FEAT_CAT1_B_3M | SUM_FEAT_NUM2_FEAT_CAT1_B_3M | MAX_FEAT_NUM2_FEAT_CAT1_B_3M | MIN_FEAT_NUM2_FEAT_CAT1_B_3M | AVG_FEAT_NUM2_FEAT_CAT1_B_3M | SUM_FEAT_NUM1_FEAT_CAT1_B_6M | MAX_FEAT_NUM1_FEAT_CAT1_B_6M | MIN_FEAT_NUM1_FEAT_CAT1_B_6M | AVG_FEAT_NUM1_FEAT_CAT1_B_6M | SUM_FEAT_NUM2_FEAT_CAT1_B_6M | MAX_FEAT_NUM2_FEAT_CAT1_B_6M | MIN_FEAT_NUM2_FEAT_CAT1_B_6M | AVG_FEAT_NUM2_FEAT_CAT1_B_6M |\n|-----|-----|------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|\n| 0 | 4 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 74.0 | 74.0 | 74.0 | 74.0 | 52.0 | 52.0 | 52.0 | 52.0 | 74.0 | 74.0 | 74.0 | 74.0 | 52.0 | 52.0 | 52.0 | 52.0 |\n| 1 | 5 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 79.0 | 79.0 | 79.0 | 79.0 | 77.0 | 77.0 | 77.0 | 77.0 | 79.0 | 79.0 | 79.0 | 79.0 | 77.0 | 77.0 | 77.0 | 77.0 |\n| 2 | 6 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n| 3 | 7 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -44.0 | -44.0 | -44.0 | -44.0 | 0.0 | 0.0 | 0.0 | 0.0 | -44.0 | -44.0 | -44.0 | -44.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n| 4 | 10 | 2023-02-01 | 11.0 | 11.0 | 11.0 | 11.0 | -39.0 | -39.0 | -39.0 | -39.0 | 11.0 | 11.0 | 11.0 | 11.0 | -39.0 | -39.0 | -39.0 | -39.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n\n</div>\n\n<span style=\"color: blue;\">(EN)</span> Here\u2019s an example of saving a\nfile with the created query and directly creating the table in the\ndatabase:\n\n<span style=\"color: green;\">(PT)</span> Segue um exemplo para salvar um\narquivo com a query criada e criar a tabela diretamente no banco de\ndados:\n\n``` python\ntb_publico = 'tb_spine'\nlist_window = [3, 6]\nlista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']\nfeat_cat = 'FEAT_CAT1'\nlist_aggregator_value = ['A', 'B']\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\ntb_feat = 'tb_feat'\nsafra = 'SAFRA'\nstatus = True\nfile_name = 'nome_da_tabela_criada'\nfile_name = 'table.sql'\n\nquery = sqlite_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, list_aggregator_value, file_name=file_name, status=status, table_name=table_name, conn=conn)\n```\n\n Complete query creation with table.sql saved file\n\n<span style=\"color: blue;\">(EN)</span> To verify the creation, let\u2019s\ncheck the database table using the name of the created table:\n\n<span style=\"color: green;\">(PT)</span> Para conferir a cria\u00e7\u00e3o, vamos\nconsultar o banco de dados com o nome da tabela que foi criada:\n\n``` python\nquery=f\"\"\"SELECT * from {table_name}\"\"\"\ndf = pd.read_sql(query, conn)\ndf.head()\n```\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n .dataframe tbody tr th {\n vertical-align: top;\n }\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n| | ID | SAFRA_REF | SUM_FEAT_NUM1_FEAT_CAT1_A_3M | MAX_FEAT_NUM1_FEAT_CAT1_A_3M | MIN_FEAT_NUM1_FEAT_CAT1_A_3M | AVG_FEAT_NUM1_FEAT_CAT1_A_3M | SUM_FEAT_NUM2_FEAT_CAT1_A_3M | MAX_FEAT_NUM2_FEAT_CAT1_A_3M | MIN_FEAT_NUM2_FEAT_CAT1_A_3M | AVG_FEAT_NUM2_FEAT_CAT1_A_3M | SUM_FEAT_NUM1_FEAT_CAT1_A_6M | MAX_FEAT_NUM1_FEAT_CAT1_A_6M | MIN_FEAT_NUM1_FEAT_CAT1_A_6M | AVG_FEAT_NUM1_FEAT_CAT1_A_6M | SUM_FEAT_NUM2_FEAT_CAT1_A_6M | MAX_FEAT_NUM2_FEAT_CAT1_A_6M | MIN_FEAT_NUM2_FEAT_CAT1_A_6M | AVG_FEAT_NUM2_FEAT_CAT1_A_6M | SUM_FEAT_NUM1_FEAT_CAT1_B_3M | MAX_FEAT_NUM1_FEAT_CAT1_B_3M | MIN_FEAT_NUM1_FEAT_CAT1_B_3M | AVG_FEAT_NUM1_FEAT_CAT1_B_3M | SUM_FEAT_NUM2_FEAT_CAT1_B_3M | MAX_FEAT_NUM2_FEAT_CAT1_B_3M | MIN_FEAT_NUM2_FEAT_CAT1_B_3M | AVG_FEAT_NUM2_FEAT_CAT1_B_3M | SUM_FEAT_NUM1_FEAT_CAT1_B_6M | MAX_FEAT_NUM1_FEAT_CAT1_B_6M | MIN_FEAT_NUM1_FEAT_CAT1_B_6M | AVG_FEAT_NUM1_FEAT_CAT1_B_6M | SUM_FEAT_NUM2_FEAT_CAT1_B_6M | MAX_FEAT_NUM2_FEAT_CAT1_B_6M | MIN_FEAT_NUM2_FEAT_CAT1_B_6M | AVG_FEAT_NUM2_FEAT_CAT1_B_6M |\n|-----|-----|------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|\n| 0 | 4 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 74.0 | 74.0 | 74.0 | 74.0 | 52.0 | 52.0 | 52.0 | 52.0 | 74.0 | 74.0 | 74.0 | 74.0 | 52.0 | 52.0 | 52.0 | 52.0 |\n| 1 | 5 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 79.0 | 79.0 | 79.0 | 79.0 | 77.0 | 77.0 | 77.0 | 77.0 | 79.0 | 79.0 | 79.0 | 79.0 | 77.0 | 77.0 | 77.0 | 77.0 |\n| 2 | 6 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n| 3 | 7 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -44.0 | -44.0 | -44.0 | -44.0 | 0.0 | 0.0 | 0.0 | 0.0 | -44.0 | -44.0 | -44.0 | -44.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n| 4 | 10 | 2023-02-01 | 11.0 | 11.0 | 11.0 | 11.0 | -39.0 | -39.0 | -39.0 | -39.0 | 11.0 | 11.0 | 11.0 | 11.0 | -39.0 | -39.0 | -39.0 | -39.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n\n</div>\n\n## 2. MySQL\n\n### Setting up the connection (Configurando a conex\u00e3o)\n\n``` python\nhost = \"localhost\"\nuser = \"sqluser\"\npassword = \"password\"\ndatabase = \"mydatabase\"\n\n# Conectar ao MySQL\nconnection = mysql.connector.connect(\n host=host,\n user=user,\n password=password,\n database=database\n)\n```\n\n### Initial view of the public (Vis\u00e3o inicial do p\u00fablico)\n\n<span style=\"color: blue;\">(EN)</span> First, let\u2019s observe the format\nof the spine table.\n\n<span style=\"color: green;\">(PT)</span> Primeiro vamos observar o\nformato da tabela spine\n\n``` python\ndf_spine = pd.read_sql(\"SELECT * FROM tb_spine\", connection)\ndf_spine.head()\n```\n\n /tmp/ipykernel_31659/1385309256.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.\n df_spine = pd.read_sql(\"SELECT * FROM tb_spine\", connection)\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n .dataframe tbody tr th {\n vertical-align: top;\n }\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n| | ID | SAFRA_REF | Target |\n|-----|-----|------------|--------|\n| 0 | 4 | 2023-02-01 | 1 |\n| 1 | 5 | 2023-02-01 | 0 |\n| 2 | 6 | 2023-02-01 | 0 |\n| 3 | 7 | 2023-02-01 | 0 |\n| 4 | 10 | 2023-02-01 | 0 |\n\n</div>\n\n### Initial view of the variables table (Vis\u00e3o inicial da tabela de vari\u00e1veis)\n\n<span style=\"color: blue;\">(EN)</span> The table tb_feat contains 4\nvariables, two numeric and two categorical. Note that there are more\nunique IDs and dates available in this than in the spine table, as is\noften the case.\n\n<span style=\"color: green;\">(PT)</span> A tabela tb_feat cont\u00e9m 4\nvari\u00e1veis, duas sendo num\u00e9ricas e duas categ\u00f3rica. Perceba que existem\nmais ID\u2019s \u00fanicos e datas dispon\u00edveis nessa tabela do que na tabela\nspine, caso que ocorre no dia a dia.\n\n``` python\ndf_data = pd.read_sql(\"SELECT * FROM tb_feat\", connection)\ndf_data.head()\n```\n\n /tmp/ipykernel_31659/548342132.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.\n df_data = pd.read_sql(\"SELECT * FROM tb_feat\", connection)\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n .dataframe tbody tr th {\n vertical-align: top;\n }\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n| | ID | SAFRA | FEAT_NUM1 | FEAT_NUM2 | FEAT_CAT1 | FEAT_CAT2 |\n|-----|-----|------------|-----------|-----------|-----------|-----------|\n| 0 | 1 | 2023-01-01 | 73 | 23 | B | B |\n| 1 | 3 | 2023-01-01 | 15 | 1 | B | B |\n| 2 | 5 | 2023-01-01 | 75 | 71 | A | A |\n| 3 | 7 | 2023-01-01 | 73 | 82 | B | C |\n| 4 | 9 | 2023-01-01 | 61 | 8 | C | B |\n\n</div>\n\n### Creation of numerical variables (Cria\u00e7\u00e3o de vari\u00e1veis num\u00e9ricas)\n\n<span style=\"color: blue;\">(EN)</span> The function\n`mysql_create_query_num()` creates a query text for generating variables\nwith the sum, minimum, maximum, and average operations on the variables\nlisted in feat_num_lista and within the time window listed in\nlist_window.\n\n<span style=\"color: green;\">(PT)</span> A fun\u00e7\u00e3o\n`mysql_create_query_num()` cria um texto com a query para a cria\u00e7\u00e3o de\nvari\u00e1veis com as opera\u00e7\u00f5es soma, m\u00ednimo, m\u00e1ximo e m\u00e9dia das vari\u00e1veis\nlistadas em feat_num_lista e com a janela de tempo listada em\nlist_window.\n\n``` python\ntb_publico = 'tb_spine'\ntb_feat = 'tb_feat'\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\nsafra = 'SAFRA'\nfeat_num_lista = ['FEAT_NUM1','FEAT_NUM2']\nlist_window = [1,2,3]\nquery_final_num_mysql = mysql_create_query_num(tb_publico, tb_feat, list_window,feat_num_lista, id, safra_ref, safra)\n```\n\n Complete query creation with no saved file.\n\n``` python\ndf_num_mysql = pd.read_sql(query_final_num_mysql, connection)\ndf_num_mysql.head()\n```\n\n /tmp/ipykernel_31659/3114881597.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.\n df_num_mysql = pd.read_sql(query_final_num_mysql, connection)\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n .dataframe tbody tr th {\n vertical-align: top;\n }\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n| | ID | SAFRA_REF | FEAT_NUM1_SUM_1M | FEAT_NUM1_MIN_1M | FEAT_NUM1_MAX_1M | FEAT_NUM1_AVG_1M | FEAT_NUM2_SUM_1M | FEAT_NUM2_MIN_1M | FEAT_NUM2_MAX_1M | FEAT_NUM2_AVG_1M | FEAT_NUM1_SUM_2M | FEAT_NUM1_MIN_2M | FEAT_NUM1_MAX_2M | FEAT_NUM1_AVG_2M | FEAT_NUM2_SUM_2M | FEAT_NUM2_MIN_2M | FEAT_NUM2_MAX_2M | FEAT_NUM2_AVG_2M | FEAT_NUM1_SUM_3M | FEAT_NUM1_MIN_3M | FEAT_NUM1_MAX_3M | FEAT_NUM1_AVG_3M | FEAT_NUM2_SUM_3M | FEAT_NUM2_MIN_3M | FEAT_NUM2_MAX_3M | FEAT_NUM2_AVG_3M |\n|-----|-----|------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|\n| 0 | 4 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n| 1 | 5 | 2023-02-01 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 |\n| 2 | 6 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n| 3 | 7 | 2023-02-01 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 |\n| 4 | 10 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n\n</div>\n\n<span style=\"color: blue;\">(EN)</span> We can use the `file_name`\nparameter with a name in the format \u2018table.sql\u2019 to save the creation\nquery in a file in the same folder where it is executed.\n\n<span style=\"color: green;\">(PT)</span> Podemos utilizar o par\u00e2metro\n`file_name` com um nome no formato \u201ctable.sql\u201d para salvar query de\ncria\u00e7\u00e3o em um arquivo na mesma pasta que \u00e9 executado.\n\n``` python\ntb_publico = 'tb_spine'\ntb_feat = 'tb_feat'\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\nsafra = 'SAFRA'\nfeat_num_lista = ['FEAT_NUM1','FEAT_NUM2']\nlist_window = [1,2,3]\nstatus = True\ntable_name = 'nome_da_tabela_criada'\nfile_name = 'table.sql'\n\nquery_final_num_mysql = mysql_create_query_num(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name=file_name, status=status, table_name=table_name, conn=connection)\n```\n\n ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ']\n -- Create table with the name nome_da_tabela_criada\n -- Criar a tabela nome_da_' at line 1\n\n<span style=\"color: blue;\">(EN)</span> To verify the creation, let\u2019s\ncheck the database table using the name of the created table:\n\n<span style=\"color: green;\">(PT)</span> Para conferir a cria\u00e7\u00e3o, vamos\nconsultar o banco de dados com o nome da tabela que foi criada:\n\n``` python\nquery=f\"\"\"SELECT * from {table_name}\"\"\"\ndf = pd.read_sql(query, connection)\ndf.head()\n```\n\n /tmp/ipykernel_1661/3805114983.py:4: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.\n df = pd.read_sql(query, connection)\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n .dataframe tbody tr th {\n vertical-align: top;\n }\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n| | ID | SAFRA_REF | FEAT_NUM1_SUM_1M | FEAT_NUM1_MIN_1M | FEAT_NUM1_MAX_1M | FEAT_NUM1_AVG_1M | FEAT_NUM2_SUM_1M | FEAT_NUM2_MIN_1M | FEAT_NUM2_MAX_1M | FEAT_NUM2_AVG_1M | FEAT_NUM1_SUM_2M | FEAT_NUM1_MIN_2M | FEAT_NUM1_MAX_2M | FEAT_NUM1_AVG_2M | FEAT_NUM2_SUM_2M | FEAT_NUM2_MIN_2M | FEAT_NUM2_MAX_2M | FEAT_NUM2_AVG_2M | FEAT_NUM1_SUM_3M | FEAT_NUM1_MIN_3M | FEAT_NUM1_MAX_3M | FEAT_NUM1_AVG_3M | FEAT_NUM2_SUM_3M | FEAT_NUM2_MIN_3M | FEAT_NUM2_MAX_3M | FEAT_NUM2_AVG_3M |\n|-----|-----|------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|\n| 0 | 4 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n| 1 | 5 | 2023-02-01 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 |\n| 2 | 6 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n| 3 | 7 | 2023-02-01 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 |\n| 4 | 10 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n\n</div>\n\n### Creation of categorical variables (Cria\u00e7\u00e3o de vari\u00e1veis categ\u00f3ricas)\n\n<span style=\"color: blue;\">(EN)</span> The function\n`mysql_create_query_cat()` generates a query text for creating variables\nwith the mode of each variable listed in feat_num_lista within the time\nwindow provided in list_window.\n\n<span style=\"color: green;\">(PT)</span> A fun\u00e7\u00e3o\n`mysql_create_query_cat()` cria um texto com a query para a cria\u00e7\u00e3o de\nvari\u00e1veis com a moda de cada uma das vari\u00e1veis listadas em\nfeat_num_lista na janela de tempo fornecida em list_window.\n\n``` python\ntb_publico = 'tb_spine'\ntb_feat = 'tb_feat'\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\nsafra = 'SAFRA'\nfeat_num_lista = ['FEAT_CAT1','FEAT_CAT2']\nlist_window = [1,2,3]\nquery_final_cat_mysql = mysql_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra)\n```\n\n Complete query creation with no saved file.\n\n``` python\ndf_cat_sqlite_mysql = pd.read_sql(query_final_cat_mysql, connection)\ndf_cat_sqlite_mysql.head()\n```\n\n /tmp/ipykernel_1661/827211967.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.\n df_cat_sqlite_mysql = pd.read_sql(query_final_cat_mysql, connection)\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n .dataframe tbody tr th {\n vertical-align: top;\n }\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n| | ID | SAFRA_REF | FEAT_CAT1_MODA_1M | FEAT_CAT2_MODA_1M | FEAT_CAT1_MODA_2M | FEAT_CAT2_MODA_2M | FEAT_CAT1_MODA_3M | FEAT_CAT2_MODA_3M |\n|-----|-----|------------|-------------------|-------------------|-------------------|-------------------|-------------------|-------------------|\n| 0 | 4 | 2023-02-01 | None | None | None | None | None | None |\n| 1 | 5 | 2023-02-01 | A | A | A | A | A | A |\n| 2 | 6 | 2023-02-01 | None | None | None | None | None | None |\n| 3 | 7 | 2023-02-01 | B | C | B | C | B | C |\n| 4 | 10 | 2023-02-01 | None | None | None | None | None | None |\n\n</div>\n\n<span style=\"color: blue;\">(EN)</span> The function\n`sqlite_create_query_cat()` also allows saving a file with the created\nquery and creating the table directly in the database. Here\u2019s an\nexample:\n\n<span style=\"color: green;\">(PT)</span> A fun\u00e7\u00e3o\n`sqlite_create_query_cat()` tamb\u00e9m permite salvar um arquivo com a query\ncriada e criar a tabela diretamente no banco de dados. Segue um exemplo:\n\n``` python\ntb_publico = 'tb_spine'\ntb_feat = 'tb_feat'\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\nsafra = 'SAFRA'\nfeat_num_lista = ['FEAT_CAT1', 'FEAT_CAT2']\nlist_window = [1, 3, 6]\nstatus = True\ntable_name = 'nome_da_tabela_criada'\nfile_name = 'table.sql'\n\nquery_final_cat_mysql = mysql_create_query_cat(tb_publico, tb_feat, list_window,feat_num_lista, id, safra_ref, safra, file_name=file_name, status=status, table_name=table_name, conn=connection)\n```\n\n Complete query creation with table.sql saved file\n\n<span style=\"color: blue;\">(EN)</span> To verify the creation, let\u2019s\ncheck the database table using the name of the created table:\n\n<span style=\"color: green;\">(PT)</span> Para conferir a cria\u00e7\u00e3o, vamos\nconsultar o banco de dados com o nome da tabela que foi criada:\n\n``` python\nquery=f\"\"\"SELECT * from {table_name}\"\"\"\ndf = pd.read_sql(query, connection)\ndf.head()\n```\n\n /tmp/ipykernel_1661/3805114983.py:4: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.\n df = pd.read_sql(query, connection)\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n .dataframe tbody tr th {\n vertical-align: top;\n }\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n| | ID | SAFRA_REF | FEAT_CAT1_MODA_1M | FEAT_CAT2_MODA_1M | FEAT_CAT1_MODA_3M | FEAT_CAT2_MODA_3M | FEAT_CAT1_MODA_6M | FEAT_CAT2_MODA_6M |\n|-----|-----|------------|-------------------|-------------------|-------------------|-------------------|-------------------|-------------------|\n| 0 | 4 | 2023-02-01 | None | None | None | None | None | None |\n| 1 | 5 | 2023-02-01 | A | A | A | A | A | A |\n| 2 | 6 | 2023-02-01 | None | None | None | None | None | None |\n| 3 | 7 | 2023-02-01 | B | C | B | C | B | C |\n| 4 | 10 | 2023-02-01 | None | None | None | None | None | None |\n\n</div>\n\n### Creation of aggregated variables (Cria\u00e7\u00e3o de vari\u00e1veis agregadas)\n\n<span style=\"color: blue;\">(EN)</span> We can create features by\naggregating on a specific value of a categorical variable and\ncalculating sum, avg, min, and max operations for user-defined numerical\nvariables. For example, for the variable FEAT_CAT1 having a value of A\nor B, we calculate sum, avg, min, and max operations for the variables\nFEAT_NUM1 and FEAT_NUM2. Here\u2019s a use case:\n\n<span style=\"color: green;\">(PT)</span> Podemos criar vari\u00e1veis\nagregando por valor espec\u00edfico de vari\u00e1vel categ\u00f3rica e calculando as\nopera\u00e7\u00f5es de sum, avg, min e max para as vari\u00e1veis n\u00famericas definidas\npelo usu\u00e1rio. Por exemplo, para a vari\u00e1vel `FEAT_CAT1` tendo valor igual\na `A` ou `B`, calculamos as opera\u00e7\u00f5es de sum, avg, min e max para as\nvari\u00e1veis `FEAT_NUM1` e `FEAT_NUM2`. Segue um caso de uso:\n\n``` python\ntb_publico = 'tb_spine'\nlist_window = [3, 6]\nlista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']\nfeat_cat = 'FEAT_CAT1'\nlista_valor_agragador = ['A', 'B']\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\ntb_feat = 'tb_feat'\nsafra = 'SAFRA'\n\nquery = mysql_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador)\n```\n\n Complete query creation with no saved file.\n\n``` python\ndf_mysql_agregada = pd.read_sql(query, connection)\ndf_mysql_agregada.head()\n```\n\n /tmp/ipykernel_1661/789764728.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.\n df_mysql_agregada = pd.read_sql(query, connection)\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n .dataframe tbody tr th {\n vertical-align: top;\n }\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n| | ID | SAFRA_REF | SUM_FEAT_NUM1_FEAT_CAT1_A_3M | MAX_FEAT_NUM1_FEAT_CAT1_A_3M | MIN_FEAT_NUM1_FEAT_CAT1_A_3M | AVG_FEAT_NUM1_FEAT_CAT1_A_3M | SUM_FEAT_NUM2_FEAT_CAT1_A_3M | MAX_FEAT_NUM2_FEAT_CAT1_A_3M | MIN_FEAT_NUM2_FEAT_CAT1_A_3M | AVG_FEAT_NUM2_FEAT_CAT1_A_3M | SUM_FEAT_NUM1_FEAT_CAT1_A_6M | MAX_FEAT_NUM1_FEAT_CAT1_A_6M | MIN_FEAT_NUM1_FEAT_CAT1_A_6M | AVG_FEAT_NUM1_FEAT_CAT1_A_6M | SUM_FEAT_NUM2_FEAT_CAT1_A_6M | MAX_FEAT_NUM2_FEAT_CAT1_A_6M | MIN_FEAT_NUM2_FEAT_CAT1_A_6M | AVG_FEAT_NUM2_FEAT_CAT1_A_6M | SUM_FEAT_NUM1_FEAT_CAT1_B_3M | MAX_FEAT_NUM1_FEAT_CAT1_B_3M | MIN_FEAT_NUM1_FEAT_CAT1_B_3M | AVG_FEAT_NUM1_FEAT_CAT1_B_3M | SUM_FEAT_NUM2_FEAT_CAT1_B_3M | MAX_FEAT_NUM2_FEAT_CAT1_B_3M | MIN_FEAT_NUM2_FEAT_CAT1_B_3M | AVG_FEAT_NUM2_FEAT_CAT1_B_3M | SUM_FEAT_NUM1_FEAT_CAT1_B_6M | MAX_FEAT_NUM1_FEAT_CAT1_B_6M | MIN_FEAT_NUM1_FEAT_CAT1_B_6M | AVG_FEAT_NUM1_FEAT_CAT1_B_6M | SUM_FEAT_NUM2_FEAT_CAT1_B_6M | MAX_FEAT_NUM2_FEAT_CAT1_B_6M | MIN_FEAT_NUM2_FEAT_CAT1_B_6M | AVG_FEAT_NUM2_FEAT_CAT1_B_6M |\n|-----|-----|------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|\n| 0 | 4 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n| 1 | 5 | 2023-02-01 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n| 2 | 6 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n| 3 | 7 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 |\n| 4 | 10 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n\n</div>\n\n``` python\ntb_publico = 'tb_spine'\nlist_window = [3, 6]\nlista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']\nfeat_cat = 'FEAT_CAT2'\nlista_valor_agragador = ['B', 'C']\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\ntb_feat = 'tb_feat'\nsafra = 'SAFRA'\n\nquery = mysql_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador)\n```\n\n Complete query creation with no saved file.\n\n``` python\ndf_mysql_agregada = pd.read_sql(query, connection)\ndf_mysql_agregada.head()\n```\n\n /tmp/ipykernel_1661/789764728.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.\n df_mysql_agregada = pd.read_sql(query, connection)\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n .dataframe tbody tr th {\n vertical-align: top;\n }\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n| | ID | SAFRA_REF | SUM_FEAT_NUM1_FEAT_CAT2_B_3M | MAX_FEAT_NUM1_FEAT_CAT2_B_3M | MIN_FEAT_NUM1_FEAT_CAT2_B_3M | AVG_FEAT_NUM1_FEAT_CAT2_B_3M | SUM_FEAT_NUM2_FEAT_CAT2_B_3M | MAX_FEAT_NUM2_FEAT_CAT2_B_3M | MIN_FEAT_NUM2_FEAT_CAT2_B_3M | AVG_FEAT_NUM2_FEAT_CAT2_B_3M | SUM_FEAT_NUM1_FEAT_CAT2_B_6M | MAX_FEAT_NUM1_FEAT_CAT2_B_6M | MIN_FEAT_NUM1_FEAT_CAT2_B_6M | AVG_FEAT_NUM1_FEAT_CAT2_B_6M | SUM_FEAT_NUM2_FEAT_CAT2_B_6M | MAX_FEAT_NUM2_FEAT_CAT2_B_6M | MIN_FEAT_NUM2_FEAT_CAT2_B_6M | AVG_FEAT_NUM2_FEAT_CAT2_B_6M | SUM_FEAT_NUM1_FEAT_CAT2_C_3M | MAX_FEAT_NUM1_FEAT_CAT2_C_3M | MIN_FEAT_NUM1_FEAT_CAT2_C_3M | AVG_FEAT_NUM1_FEAT_CAT2_C_3M | SUM_FEAT_NUM2_FEAT_CAT2_C_3M | MAX_FEAT_NUM2_FEAT_CAT2_C_3M | MIN_FEAT_NUM2_FEAT_CAT2_C_3M | AVG_FEAT_NUM2_FEAT_CAT2_C_3M | SUM_FEAT_NUM1_FEAT_CAT2_C_6M | MAX_FEAT_NUM1_FEAT_CAT2_C_6M | MIN_FEAT_NUM1_FEAT_CAT2_C_6M | AVG_FEAT_NUM1_FEAT_CAT2_C_6M | SUM_FEAT_NUM2_FEAT_CAT2_C_6M | MAX_FEAT_NUM2_FEAT_CAT2_C_6M | MIN_FEAT_NUM2_FEAT_CAT2_C_6M | AVG_FEAT_NUM2_FEAT_CAT2_C_6M |\n|-----|-----|------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|\n| 0 | 4 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n| 1 | 5 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n| 2 | 6 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n| 3 | 7 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 |\n| 4 | 10 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n\n</div>\n\n<span style=\"color: blue;\">(EN)</span> Here\u2019s an example of saving a\nfile with the created query and directly creating the table in the\ndatabase:\n\n<span style=\"color: green;\">(PT)</span> Segue um exemplo para salvar um\narquivo com a query criada e criar a tabela diretamente no banco de\ndados:\n\n``` python\ntb_publico = 'tb_spine'\nlist_window = [3, 6]\nlista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']\nfeat_cat = 'FEAT_CAT1'\nlista_valor_agragador = ['A', 'B']\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\ntb_feat = 'tb_feat'\nsafra = 'SAFRA'\nstatus = True\ntable_name = 'nome_da_tabela_criada'\nfile_name = 'table.sql'\n\nquery = mysql_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador, file_name=file_name, status=status, table_name=table_name, conn=connection)\n```\n\n Complete query creation with table.sql saved file\n\n<span style=\"color: blue;\">(EN)</span> To verify the creation, let\u2019s\ncheck the database table using the name of the created table:\n\n<span style=\"color: green;\">(PT)</span> Para conferir a cria\u00e7\u00e3o, vamos\nconsultar o banco de dados com o nome da tabela que foi criada:\n\n``` python\nquery=f\"\"\"SELECT * from {table_name}\"\"\"\ndf = pd.read_sql(query, connection)\ndf.head()\n```\n\n /tmp/ipykernel_1661/3805114983.py:4: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.\n df = pd.read_sql(query, connection)\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n .dataframe tbody tr th {\n vertical-align: top;\n }\n .dataframe thead th {\n text-align: right;\n }\n</style>\n\n| | ID | SAFRA_REF | SUM_FEAT_NUM1_FEAT_CAT1_A_3M | MAX_FEAT_NUM1_FEAT_CAT1_A_3M | MIN_FEAT_NUM1_FEAT_CAT1_A_3M | AVG_FEAT_NUM1_FEAT_CAT1_A_3M | SUM_FEAT_NUM2_FEAT_CAT1_A_3M | MAX_FEAT_NUM2_FEAT_CAT1_A_3M | MIN_FEAT_NUM2_FEAT_CAT1_A_3M | AVG_FEAT_NUM2_FEAT_CAT1_A_3M | SUM_FEAT_NUM1_FEAT_CAT1_A_6M | MAX_FEAT_NUM1_FEAT_CAT1_A_6M | MIN_FEAT_NUM1_FEAT_CAT1_A_6M | AVG_FEAT_NUM1_FEAT_CAT1_A_6M | SUM_FEAT_NUM2_FEAT_CAT1_A_6M | MAX_FEAT_NUM2_FEAT_CAT1_A_6M | MIN_FEAT_NUM2_FEAT_CAT1_A_6M | AVG_FEAT_NUM2_FEAT_CAT1_A_6M | SUM_FEAT_NUM1_FEAT_CAT1_B_3M | MAX_FEAT_NUM1_FEAT_CAT1_B_3M | MIN_FEAT_NUM1_FEAT_CAT1_B_3M | AVG_FEAT_NUM1_FEAT_CAT1_B_3M | SUM_FEAT_NUM2_FEAT_CAT1_B_3M | MAX_FEAT_NUM2_FEAT_CAT1_B_3M | MIN_FEAT_NUM2_FEAT_CAT1_B_3M | AVG_FEAT_NUM2_FEAT_CAT1_B_3M | SUM_FEAT_NUM1_FEAT_CAT1_B_6M | MAX_FEAT_NUM1_FEAT_CAT1_B_6M | MIN_FEAT_NUM1_FEAT_CAT1_B_6M | AVG_FEAT_NUM1_FEAT_CAT1_B_6M | SUM_FEAT_NUM2_FEAT_CAT1_B_6M | MAX_FEAT_NUM2_FEAT_CAT1_B_6M | MIN_FEAT_NUM2_FEAT_CAT1_B_6M | AVG_FEAT_NUM2_FEAT_CAT1_B_6M |\n|-----|-----|------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|\n| 0 | 4 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n| 1 | 5 | 2023-02-01 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 | 75.0 | 75.0 | 75.0 | 75.0 | 71.0 | 71.0 | 71.0 | 71.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n| 2 | 6 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n| 3 | 7 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 | 73.0 | 73.0 | 73.0 | 73.0 | 82.0 | 82.0 | 82.0 | 82.0 |\n| 4 | 10 | 2023-02-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |\n\n</div>\n\n## 3. Snowflake\n\n### Cria\u00e7\u00e3o de vari\u00e1veis num\u00e9ricas\n\n<span style=\"color: blue;\">(EN)</span> The function\n`snow_create_query_num()` creates a query text for generating variables\nwith the sum, minimum, maximum, and average operations on the variables\nlisted in feat_num_lista and within the time window listed in\nlist_window.\n\n<span style=\"color: green;\">(PT)</span> A fun\u00e7\u00e3o\n`snow_create_query_num()` cria um texto com a query para a cria\u00e7\u00e3o de\nvari\u00e1veis com as opera\u00e7\u00f5es soma, m\u00ednimo, m\u00e1ximo e m\u00e9dia das vari\u00e1veis\nlistadas em feat_num_lista e com a janela de tempo listada em\nlist_window.\n\n``` python\ntb_publico = 'tb_spine'\ntb_feat = 'tb_feat'\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\nsafra = 'SAFRA'\nfeat_num_lista = ['FEAT_NUM1','FEAT_NUM2']\nlist_window = [1,2,3]\n\nquery_final_num_snow = snow_create_query_num(tb_publico, tb_feat, list_window,feat_num_lista, id, safra_ref, safra)\n```\n\n Complete query creation with no saved file.\n\nPodemos salvar a consulta da query em um arquivo tamb\u00e9m:\n\n``` python\ntb_publico = 'tb_spine'\ntb_feat = 'tb_feat'\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\nsafra = 'SAFRA'\nfeat_num_lista = ['FEAT_NUM1','FEAT_NUM2']\nlist_window = [1,2,3]\nfile_name = 'table.sql'\n\nquery_final_num_snow = snow_create_query_num(tb_publico, tb_feat, list_window,feat_num_lista, id, safra_ref, safra, file_name)\n```\n\n Complete query creation with table.sql saved file\n\n### Cria\u00e7\u00e3o de vari\u00e1veis categ\u00f3ricas\n\n<span style=\"color: blue;\">(EN)</span> The function\n`mysql_create_query_cat()` generates a query text for creating variables\nwith the mode of each variable listed in feat_num_lista within the time\nwindow provided in list_window.\n\n<span style=\"color: green;\">(PT)</span> A fun\u00e7\u00e3o\n`mysql_create_query_cat()` cria um texto com a query para a cria\u00e7\u00e3o de\nvari\u00e1veis com a moda de cada uma das vari\u00e1veis listadas em\nfeat_num_lista na janela de tempo fornecida em list_window.\n\n``` python\ntb_publico = 'tb_spine'\ntb_feat = 'tb_feat'\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\nsafra = 'SAFRA'\nfeat_num_lista = ['FEAT_CAT1','FEAT_CAT2']\nlist_window = [1,2,3]\n\nquery_final_cat_snow = snow_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra)\n```\n\n Complete query creation with no saved file.\n\n``` python\ntb_publico = 'tb_spine'\ntb_feat = 'tb_feat'\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\nsafra = 'SAFRA'\nfeat_num_lista = ['FEAT_CAT1','FEAT_CAT2']\nlist_window = [1,2,3]\nfile_name = 'table.sql'\n\nquery_final_cat_snow = snow_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name)\n```\n\n Complete query creation with table.sql saved file\n\n### Cria\u00e7\u00e3o de vari\u00e1veis agregadas\n\n<span style=\"color: blue;\">(EN)</span> We can create features by\naggregating on a specific value of a categorical variable and\ncalculating sum, avg, min, and max operations for user-defined numerical\nvariables. For example, for the variable FEAT_CAT1 having a value of A\nor B, we calculate sum, avg, min, and max operations for the variables\nFEAT_NUM1 and FEAT_NUM2. Here\u2019s a use case:\n\n<span style=\"color: green;\">(PT)</span> Podemos criar vari\u00e1veis\nagregando por valor espec\u00edfico de vari\u00e1vel categ\u00f3rica e calculando as\nopera\u00e7\u00f5es de sum, avg, min e max para as vari\u00e1veis n\u00famericas definidas\npelo usu\u00e1rio. Por exemplo, para a vari\u00e1vel `FEAT_CAT1` tendo valor igual\na `A` ou `B`, calculamos as opera\u00e7\u00f5es de sum, avg, min e max para as\nvari\u00e1veis `FEAT_NUM1` e `FEAT_NUM2`. Segue um caso de uso:\n\n``` python\nlista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']\nfeat_cat = 'FEAT_CAT1'\nlista_valor_agragador = ['B', 'C']\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\ntb_feat = 'tb_feat'\nsafra = 'SAFRA'\njanelas = [1, 2, 3]\ntb_publico = 'tb_spine'\n\n\nquery_final_cat_agre_snow = snow_create_query_agregada(tb_publico, tb_feat, janelas, lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador)\n```\n\n Complete query creation with no saved file.\n\n<span style=\"color: blue;\">(EN)</span> We can also save the query to a\nfile:\n\n<span style=\"color: green;\">(PT)</span> Podemos tamb\u00e9m salvar a consulta\nem um arquivo:\n\n``` python\nlista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']\nfeat_cat = 'FEAT_CAT1'\nlista_valor_agragador = ['B', 'C']\nid = 'ID'\nsafra_ref = 'SAFRA_REF'\ntb_feat = 'tb_feat'\nsafra = 'SAFRA'\njanelas = [1, 2, 3]\ntb_publico = 'tb_spine'\nfile_name = 'table.sql'\n\nquery_final_cat_agre_snow = snow_create_query_agregada(tb_publico, tb_feat, janelas, lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador, file_name)\n```\n\n Complete query creation with table.sql saved file\n\n\n",
"bugtrack_url": null,
"license": "Apache Software License 2.0",
"summary": "Create features with sql",
"version": "0.1.1",
"project_urls": {
"Homepage": "https://github.com/ravennaro/featsql"
},
"split_keywords": [
"nbdev",
"jupyter",
"notebook",
"python"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "4fd0117c037676dd6626ff2ff843860fdb677a91a098335c09df3f3387b0704c",
"md5": "9d83728da6c3e10ed5793ac57abae49b",
"sha256": "9df93b161012dd1ec738e5e412484df593a48ce6e0f5a1555a41b24c2c5c7dcf"
},
"downloads": -1,
"filename": "featsql-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9d83728da6c3e10ed5793ac57abae49b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 22135,
"upload_time": "2024-06-16T23:49:06",
"upload_time_iso_8601": "2024-06-16T23:49:06.931179Z",
"url": "https://files.pythonhosted.org/packages/4f/d0/117c037676dd6626ff2ff843860fdb677a91a098335c09df3f3387b0704c/featsql-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "940d8e81f47909c4a062d0b097229e215cc3c26be010d55d423aaee1f9cce2da",
"md5": "421f41cf93e980833cc29037175e5bc0",
"sha256": "d83ece53fdc9004db9ffb42fd473368afb4a286ff9bcf811b7c104c410c05756"
},
"downloads": -1,
"filename": "featsql-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "421f41cf93e980833cc29037175e5bc0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 32590,
"upload_time": "2024-06-16T23:49:09",
"upload_time_iso_8601": "2024-06-16T23:49:09.266788Z",
"url": "https://files.pythonhosted.org/packages/94/0d/8e81f47909c4a062d0b097229e215cc3c26be010d55d423aaee1f9cce2da/featsql-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-06-16 23:49:09",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ravennaro",
"github_project": "featsql",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "featsql"
}