# ts2ml

| Field           | Value                                                                               |
|-----------------|-------------------------------------------------------------------------------------|
| Name            | ts2ml                                                                               |
| Version         | 1.0.1                                                                               |
| Home page       | https://github.com/joaopcnogueira/ts2ml                                             |
| Summary         | Tools to Transform a Time Series into Features and Target a.k.a Supervised Learning |
| Upload time     | 2023-06-03 14:11:28                                                                 |
| Author          | João Nogueira                                                                       |
| Requires Python | >=3.7                                                                               |
| License         | Apache Software License 2.0                                                         |
| Keywords        | nbdev jupyter notebook python                                                       |

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Install

``` sh
pip install ts2ml
```

## How to use

``` python
import pandas as pd
from ts2ml.core import add_missing_slots
from ts2ml.core import transform_ts_data_into_features_and_target
```

``` python
df = pd.DataFrame({
    'pickup_hour': ['2022-01-01 00:00:00', '2022-01-01 01:00:00', '2022-01-01 03:00:00', '2022-01-01 01:00:00', '2022-01-01 02:00:00', '2022-01-01 05:00:00'],
    'pickup_location_id': [1, 1, 1, 2, 2, 2],
    'rides': [2, 3, 1, 1, 2, 1]
})
df
```

<div>

|     | pickup_hour         | pickup_location_id | rides |
|-----|---------------------|--------------------|-------|
| 0   | 2022-01-01 00:00:00 | 1                  | 2     |
| 1   | 2022-01-01 01:00:00 | 1                  | 3     |
| 2   | 2022-01-01 03:00:00 | 1                  | 1     |
| 3   | 2022-01-01 01:00:00 | 2                  | 1     |
| 4   | 2022-01-01 02:00:00 | 2                  | 2     |
| 5   | 2022-01-01 05:00:00 | 2                  | 1     |

</div>

Let’s fill the missing time slots with zeros:

``` python
df = add_missing_slots(df, datetime_col='pickup_hour', entity_col='pickup_location_id', value_col='rides', freq='H')
df
```

    100%|██████████| 2/2 [00:00<00:00, 907.86it/s]

<div>

|     | pickup_hour         | pickup_location_id | rides |
|-----|---------------------|--------------------|-------|
| 0   | 2022-01-01 00:00:00 | 1                  | 2     |
| 1   | 2022-01-01 01:00:00 | 1                  | 3     |
| 2   | 2022-01-01 02:00:00 | 1                  | 0     |
| 3   | 2022-01-01 03:00:00 | 1                  | 1     |
| 4   | 2022-01-01 04:00:00 | 1                  | 0     |
| 5   | 2022-01-01 05:00:00 | 1                  | 0     |
| 6   | 2022-01-01 00:00:00 | 2                  | 0     |
| 7   | 2022-01-01 01:00:00 | 2                  | 1     |
| 8   | 2022-01-01 02:00:00 | 2                  | 2     |
| 9   | 2022-01-01 03:00:00 | 2                  | 0     |
| 10  | 2022-01-01 04:00:00 | 2                  | 0     |
| 11  | 2022-01-01 05:00:00 | 2                  | 1     |

</div>
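To make the idea concrete, here is a minimal plain-pandas sketch of zero-filling missing slots. It is not the code ts2ml uses internally; the helper name `fill_missing_slots_sketch` is hypothetical, and the sketch assumes every entity should cover the range from the earliest to the latest timestamp in the data.

``` python
import pandas as pd

def fill_missing_slots_sketch(df, datetime_col, entity_col, value_col, freq):
    """Hypothetical helper (not part of ts2ml): zero-fill missing time slots per entity."""
    df = df.copy()
    df[datetime_col] = pd.to_datetime(df[datetime_col])

    # All slots between the earliest and latest timestamp, shared by every entity
    full_range = pd.date_range(df[datetime_col].min(), df[datetime_col].max(), freq=freq)

    # Every (entity, slot) pair that should exist in the completed data
    full_index = pd.MultiIndex.from_product(
        [sorted(df[entity_col].unique()), full_range],
        names=[entity_col, datetime_col],
    )

    # Reindex onto the full grid; pairs absent from the input get the value 0
    filled = (
        df.set_index([entity_col, datetime_col])[value_col]
        .reindex(full_index, fill_value=0)
        .reset_index()
    )
    return filled[[datetime_col, entity_col, value_col]]

raw = pd.DataFrame({
    'pickup_hour': ['2022-01-01 00:00:00', '2022-01-01 01:00:00', '2022-01-01 03:00:00',
                    '2022-01-01 01:00:00', '2022-01-01 02:00:00', '2022-01-01 05:00:00'],
    'pickup_location_id': [1, 1, 1, 2, 2, 2],
    'rides': [2, 3, 1, 1, 2, 1],
})

# 'h' is the hourly alias; recent pandas versions prefer it over 'H'
filled = fill_missing_slots_sketch(raw, 'pickup_hour', 'pickup_location_id', 'rides', freq='h')
print(filled)
```

On this toy input the sketch yields the same 12 rows as the table above: six hourly slots for each of the two locations.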

Now, let’s build features and targets to predict the number of rides in
the next hour for each `pickup_location_id`, using the number of rides
in the previous 3 hours:

``` python
features, targets = transform_ts_data_into_features_and_target(
    df,
    n_features=3,
    datetime_col='pickup_hour', 
    entity_col='pickup_location_id', 
    value_col='rides',
    n_targets=1,
    step_size=1,
    step_name='hour'
)
```

    100%|██████████| 2/2 [00:00<00:00, 597.86it/s]

``` python
features
```

<div>

|     | rides_previous_3_hour | rides_previous_2_hour | rides_previous_1_hour | pickup_hour         | pickup_location_id |
|-----|-----------------------|-----------------------|-----------------------|---------------------|--------------------|
| 0   | 2.0                   | 3.0                   | 0.0                   | 2022-01-01 03:00:00 | 1                  |
| 1   | 3.0                   | 0.0                   | 1.0                   | 2022-01-01 04:00:00 | 1                  |
| 2   | 0.0                   | 1.0                   | 2.0                   | 2022-01-01 03:00:00 | 2                  |
| 3   | 1.0                   | 2.0                   | 0.0                   | 2022-01-01 04:00:00 | 2                  |

</div>

``` python
targets
```

<div>

|     | target_rides_next_hour |
|-----|------------------------|
| 0   | 1.0                    |
| 1   | 0.0                    |
| 2   | 0.0                    |
| 3   | 0.0                    |

</div>

``` python
Xy_df = pd.concat([features, targets], axis=1)
Xy_df
```

<div>

|     | rides_previous_3_hour | rides_previous_2_hour | rides_previous_1_hour | pickup_hour         | pickup_location_id | target_rides_next_hour |
|-----|-----------------------|-----------------------|-----------------------|---------------------|--------------------|------------------------|
| 0   | 2.0                   | 3.0                   | 0.0                   | 2022-01-01 03:00:00 | 1                  | 1.0                    |
| 1   | 3.0                   | 0.0                   | 1.0                   | 2022-01-01 04:00:00 | 1                  | 0.0                    |
| 2   | 0.0                   | 1.0                   | 2.0                   | 2022-01-01 03:00:00 | 2                  | 0.0                    |
| 3   | 1.0                   | 2.0                   | 0.0                   | 2022-01-01 04:00:00 | 2                  | 0.0                    |

</div>
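The windowing idea behind these tables can be sketched with NumPy for a single series. This is an illustration only, not ts2ml's implementation; the helper name `make_windows_sketch` is hypothetical, and it assumes the series is already sorted by time and has no gaps.

``` python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows_sketch(values, n_features, n_targets=1):
    """Hypothetical helper (not ts2ml's code): cut one series into feature/target windows."""
    values = np.asarray(values, dtype=float)
    # Each window holds n_features past values followed by n_targets future values
    windows = sliding_window_view(values, n_features + n_targets)
    return windows[:, :n_features], windows[:, n_features:]

# Zero-filled rides for pickup_location_id == 1, taken from the table above
rides_location_1 = [2, 3, 0, 1, 0, 0]

X, y = make_windows_sketch(rides_location_1, n_features=3)
print(X)
print(y.ravel())
```

The first two windows coincide with rows 0 and 1 of the tables above. The sketch emits every possible window (three here), so its row count differs from the library output shown and it should not be read as a replica of the library's behaviour.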

# Another Example

A monthly spaced time series:

``` python
import pandas as pd
import numpy as np

# Generate timestamp index with monthly frequency
date_rng = pd.date_range(start='1/1/2020', end='12/1/2022', freq='MS')

# Create list of city codes
cities = ['FOR', 'SP', 'RJ']

# Create dataframe with random monthly sales; each city covers one 12-month block
df = pd.DataFrame({
    'date': date_rng,
    'city': np.repeat(cities, len(date_rng)//len(cities)),
    'sales': np.random.randint(1000, 5000, size=len(date_rng))
})
df
```

<div>

|     | date       | city | sales |
|-----|------------|------|-------|
| 0   | 2020-01-01 | FOR  | 4944  |
| 1   | 2020-02-01 | FOR  | 3435  |
| 2   | 2020-03-01 | FOR  | 4543  |
| 3   | 2020-04-01 | FOR  | 3879  |
| 4   | 2020-05-01 | FOR  | 2601  |
| 5   | 2020-06-01 | FOR  | 2922  |
| 6   | 2020-07-01 | FOR  | 4542  |
| 7   | 2020-08-01 | FOR  | 1338  |
| 8   | 2020-09-01 | FOR  | 2938  |
| 9   | 2020-10-01 | FOR  | 2695  |
| 10  | 2020-11-01 | FOR  | 4065  |
| 11  | 2020-12-01 | FOR  | 3864  |
| 12  | 2021-01-01 | SP   | 2652  |
| 13  | 2021-02-01 | SP   | 2137  |
| 14  | 2021-03-01 | SP   | 2663  |
| 15  | 2021-04-01 | SP   | 1168  |
| 16  | 2021-05-01 | SP   | 4523  |
| 17  | 2021-06-01 | SP   | 4135  |
| 18  | 2021-07-01 | SP   | 3566  |
| 19  | 2021-08-01 | SP   | 2121  |
| 20  | 2021-09-01 | SP   | 1070  |
| 21  | 2021-10-01 | SP   | 1624  |
| 22  | 2021-11-01 | SP   | 3034  |
| 23  | 2021-12-01 | SP   | 4063  |
| 24  | 2022-01-01 | RJ   | 2297  |
| 25  | 2022-02-01 | RJ   | 3430  |
| 26  | 2022-03-01 | RJ   | 2903  |
| 27  | 2022-04-01 | RJ   | 4197  |
| 28  | 2022-05-01 | RJ   | 4141  |
| 29  | 2022-06-01 | RJ   | 2899  |
| 30  | 2022-07-01 | RJ   | 4529  |
| 31  | 2022-08-01 | RJ   | 3612  |
| 32  | 2022-09-01 | RJ   | 1856  |
| 33  | 2022-10-01 | RJ   | 4804  |
| 34  | 2022-11-01 | RJ   | 1764  |
| 35  | 2022-12-01 | RJ   | 4425  |

</div>

The city FOR only has data for 2020, SP only for 2021, and RJ only for
2022. Let’s also simulate more missing slots by dropping random rows.

``` python
# Generate random indices to drop
drop_indices = np.random.choice(df.index, size=int(len(df)*0.2), replace=False)

# Drop selected rows from dataframe
df = df.drop(drop_indices)
df.reset_index(drop=True, inplace=True)
df
```

<div>

|     | date       | city | sales |
|-----|------------|------|-------|
| 0   | 2020-01-01 | FOR  | 4944  |
| 1   | 2020-02-01 | FOR  | 3435  |
| 2   | 2020-03-01 | FOR  | 4543  |
| 3   | 2020-04-01 | FOR  | 3879  |
| 4   | 2020-05-01 | FOR  | 2601  |
| 5   | 2020-06-01 | FOR  | 2922  |
| 6   | 2020-07-01 | FOR  | 4542  |
| 7   | 2020-08-01 | FOR  | 1338  |
| 8   | 2020-09-01 | FOR  | 2938  |
| 9   | 2020-11-01 | FOR  | 4065  |
| 10  | 2020-12-01 | FOR  | 3864  |
| 11  | 2021-01-01 | SP   | 2652  |
| 12  | 2021-02-01 | SP   | 2137  |
| 13  | 2021-03-01 | SP   | 2663  |
| 14  | 2021-07-01 | SP   | 3566  |
| 15  | 2021-08-01 | SP   | 2121  |
| 16  | 2021-10-01 | SP   | 1624  |
| 17  | 2021-11-01 | SP   | 3034  |
| 18  | 2021-12-01 | SP   | 4063  |
| 19  | 2022-01-01 | RJ   | 2297  |
| 20  | 2022-02-01 | RJ   | 3430  |
| 21  | 2022-03-01 | RJ   | 2903  |
| 22  | 2022-04-01 | RJ   | 4197  |
| 23  | 2022-05-01 | RJ   | 4141  |
| 24  | 2022-06-01 | RJ   | 2899  |
| 25  | 2022-09-01 | RJ   | 1856  |
| 26  | 2022-10-01 | RJ   | 4804  |
| 27  | 2022-11-01 | RJ   | 1764  |
| 28  | 2022-12-01 | RJ   | 4425  |

</div>
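Because the rows are dropped at random, rerunning the cell gives different gaps each time. If reproducible gaps are wanted, one option is a seeded NumPy generator. The sketch below rebuilds a comparable frame under that assumption; the names `demo` and `demo_with_gaps` and the seed value are arbitrary choices for illustration.

``` python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # arbitrary seed, chosen only for illustration

dates = pd.date_range(start='2020-01-01', end='2022-12-01', freq='MS')
demo = pd.DataFrame({
    'date': dates,
    'city': np.repeat(['FOR', 'SP', 'RJ'], len(dates) // 3),
    'sales': rng.integers(1000, 5000, size=len(dates)),
})

# Pick 20% of the row labels without replacement, then drop those rows
drop_indices = rng.choice(demo.index.to_numpy(), size=int(len(demo) * 0.2), replace=False)
demo_with_gaps = demo.drop(drop_indices).reset_index(drop=True)

print(len(demo), len(demo_with_gaps))
```

With 36 monthly rows, `int(36 * 0.2)` drops 7 of them, leaving 29, the same row count as in the table above.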

Now let’s fill the missing slots with zeros. `add_missing_slots`
completes every city’s series over the full date range, so the result
has 108 rows (36 months for each of the 3 cities):

``` python
df_full = add_missing_slots(df, datetime_col='date', entity_col='city', value_col='sales', freq='MS')
df_full
```

    100%|██████████| 3/3 [00:00<00:00, 843.70it/s]

<div>

|     | date       | city | sales |
|-----|------------|------|-------|
| 0   | 2020-01-01 | FOR  | 4944  |
| 1   | 2020-02-01 | FOR  | 3435  |
| 2   | 2020-03-01 | FOR  | 4543  |
| 3   | 2020-04-01 | FOR  | 3879  |
| 4   | 2020-05-01 | FOR  | 2601  |
| ... | ...        | ...  | ...   |
| 103 | 2022-08-01 | RJ   | 0     |
| 104 | 2022-09-01 | RJ   | 1856  |
| 105 | 2022-10-01 | RJ   | 4804  |
| 106 | 2022-11-01 | RJ   | 1764  |
| 107 | 2022-12-01 | RJ   | 4425  |

<p>108 rows × 3 columns</p>
</div>

Let’s build a dataset for training a machine learning model to predict
the sales for the next month, for each city, based on the sales of the
previous 3 months.

``` python
features, targets = transform_ts_data_into_features_and_target(
    df_full,
    n_features=3,
    datetime_col='date',
    entity_col='city',
    value_col='sales',
    n_targets=1,
    step_size=1,
    step_name='month'
)
```

    100%|██████████| 3/3 [00:00<00:00, 205.58it/s]

``` python
pd.concat([features, targets], axis=1)
```

<div>

|     | sales_previous_3_month | sales_previous_2_month | sales_previous_1_month | date       | city | target_sales_next_month |
|-----|------------------------|------------------------|------------------------|------------|------|-------------------------|
| 0   | 4944.0                 | 3435.0                 | 4543.0                 | 2020-04-01 | FOR  | 3879.0                  |
| 1   | 3435.0                 | 4543.0                 | 3879.0                 | 2020-05-01 | FOR  | 2601.0                  |
| 2   | 4543.0                 | 3879.0                 | 2601.0                 | 2020-06-01 | FOR  | 2922.0                  |
| 3   | 3879.0                 | 2601.0                 | 2922.0                 | 2020-07-01 | FOR  | 4542.0                  |
| 4   | 2601.0                 | 2922.0                 | 4542.0                 | 2020-08-01 | FOR  | 1338.0                  |
| ... | ...                    | ...                    | ...                    | ...        | ...  | ...                     |
| 91  | 4197.0                 | 4141.0                 | 2899.0                 | 2022-07-01 | RJ   | 0.0                     |
| 92  | 4141.0                 | 2899.0                 | 0.0                    | 2022-08-01 | RJ   | 0.0                     |
| 93  | 2899.0                 | 0.0                    | 0.0                    | 2022-09-01 | RJ   | 1856.0                  |
| 94  | 0.0                    | 0.0                    | 1856.0                 | 2022-10-01 | RJ   | 4804.0                  |
| 95  | 0.0                    | 1856.0                 | 4804.0                 | 2022-11-01 | RJ   | 1764.0                  |

<p>96 rows × 6 columns</p>
</div>
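Before training, the combined frame has to be separated into a feature matrix and a target vector. The sketch below shows one possible way to do that with a time-based split. It uses a toy frame that copies the column names and the first five rows shown above; the cutoff date and the variable names are assumptions made for illustration.

``` python
import pandas as pd

# Toy frame reusing the column names and the first five rows shown above
Xy_demo = pd.DataFrame({
    'sales_previous_3_month': [4944.0, 3435.0, 4543.0, 3879.0, 2601.0],
    'sales_previous_2_month': [3435.0, 4543.0, 3879.0, 2601.0, 2922.0],
    'sales_previous_1_month': [4543.0, 3879.0, 2601.0, 2922.0, 4542.0],
    'date': pd.to_datetime(['2020-04-01', '2020-05-01', '2020-06-01',
                            '2020-07-01', '2020-08-01']),
    'city': ['FOR'] * 5,
    'target_sales_next_month': [3879.0, 2601.0, 2922.0, 4542.0, 1338.0],
})

# Lag columns are the model inputs; the target column is what we predict
feature_cols = [c for c in Xy_demo.columns if c.startswith('sales_previous_')]
target_col = 'target_sales_next_month'

# Split on time rather than at random, so evaluation only uses later months
cutoff = pd.Timestamp('2020-07-01')  # hypothetical cutoff, chosen for illustration
train = Xy_demo[Xy_demo['date'] < cutoff]
test = Xy_demo[Xy_demo['date'] >= cutoff]

X_train, y_train = train[feature_cols], train[target_col]
X_test, y_test = test[feature_cols], test[target_col]
print(X_train.shape, X_test.shape)
```

Splitting by date keeps future observations out of the training set, which is usually the safer default for time series.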

# Embedding in scikit-learn Pipelines

``` python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
```

``` python
add_missing_slots_transformer = FunctionTransformer(
    add_missing_slots, 
    kw_args={
        'datetime_col': 'date', 
        'entity_col': 'city', 
        'value_col': 'sales', 
        'freq': 'MS'
    }
)

transform_ts_data_into_features_and_target_transformer = FunctionTransformer(
    transform_ts_data_into_features_and_target, 
    kw_args={
        'n_features': 3, 
        'datetime_col': 'date', 
        'entity_col': 'city', 
        'value_col': 'sales', 
        'n_targets': 1, 
        'step_size': 1, 
        'step_name': 'month',
        'concat_Xy': True
    }
)
```

``` python
ts_data_to_features_and_target_pipeline = make_pipeline(
    add_missing_slots_transformer,
    transform_ts_data_into_features_and_target_transformer
)
ts_data_to_features_and_target_pipeline
```

    Pipeline(steps=[('functiontransformer-1',
                     FunctionTransformer(func=<function add_missing_slots at 0x11f8f49d0>,
                                         kw_args={'datetime_col': 'date',
                                                  'entity_col': 'city',
                                                  'freq': 'MS',
                                                  'value_col': 'sales'})),
                    ('functiontransformer-2',
                     FunctionTransformer(func=<function transform_ts_data_into_features_and_target at 0x11f925ca0>,
                                         kw_args={'concat_Xy': True,
                                                  'datetime_col': 'date',
                                                  'entity_col': 'city',
                                                  'n_features': 3, 'n_targets': 1,
                                                  'step_name': 'month',
                                                  'step_size': 1,
                                                  'value_col': 'sales'}))])

``` python
Xy_df = ts_data_to_features_and_target_pipeline.fit_transform(df)
Xy_df
```

    100%|██████████| 3/3 [00:00<00:00, 715.47it/s]
    100%|██████████| 3/3 [00:00<00:00, 184.12it/s]

<div>

|     | sales_previous_3_month | sales_previous_2_month | sales_previous_1_month | date       | city | target_sales_next_month |
|-----|------------------------|------------------------|------------------------|------------|------|-------------------------|
| 0   | 4944.0                 | 3435.0                 | 4543.0                 | 2020-04-01 | FOR  | 3879.0                  |
| 1   | 3435.0                 | 4543.0                 | 3879.0                 | 2020-05-01 | FOR  | 2601.0                  |
| 2   | 4543.0                 | 3879.0                 | 2601.0                 | 2020-06-01 | FOR  | 2922.0                  |
| 3   | 3879.0                 | 2601.0                 | 2922.0                 | 2020-07-01 | FOR  | 4542.0                  |
| 4   | 2601.0                 | 2922.0                 | 4542.0                 | 2020-08-01 | FOR  | 1338.0                  |
| ... | ...                    | ...                    | ...                    | ...        | ...  | ...                     |
| 91  | 4197.0                 | 4141.0                 | 2899.0                 | 2022-07-01 | RJ   | 0.0                     |
| 92  | 4141.0                 | 2899.0                 | 0.0                    | 2022-08-01 | RJ   | 0.0                     |
| 93  | 2899.0                 | 0.0                    | 0.0                    | 2022-09-01 | RJ   | 1856.0                  |
| 94  | 0.0                    | 0.0                    | 1856.0                 | 2022-10-01 | RJ   | 4804.0                  |
| 95  | 0.0                    | 1856.0                 | 4804.0                 | 2022-11-01 | RJ   | 1764.0                  |

<p>96 rows × 6 columns</p>
</div>
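The pipeline works because `FunctionTransformer` calls the wrapped function with the input data plus the entries of `kw_args` as keyword arguments. The sketch below shows that mechanism in isolation with two toy functions; `scale_column` and `add_flag` are hypothetical stand-ins and have nothing to do with ts2ml.

``` python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

def scale_column(df, col, factor):
    """Toy function for illustration only: multiply one column by a factor."""
    out = df.copy()
    out[col] = out[col] * factor
    return out

def add_flag(df, col, threshold):
    """Toy function for illustration only: flag rows above a threshold."""
    out = df.copy()
    out['is_large'] = out[col] > threshold
    return out

toy = pd.DataFrame({'sales': [1, 2, 3]})

# Each step receives the previous step's output as its first argument
toy_pipeline = make_pipeline(
    FunctionTransformer(scale_column, kw_args={'col': 'sales', 'factor': 10}),
    FunctionTransformer(add_flag, kw_args={'col': 'sales', 'threshold': 15}),
)

result = toy_pipeline.fit_transform(toy)
print(result)
```

Since both steps are stateless, `fit_transform` here simply chains the two function calls; nothing is learned from the data.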



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/joaopcnogueira/ts2ml",
    "name": "ts2ml",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "nbdev jupyter notebook python",
    "author": "Jo\u00e3o Nogueira",
    "author_email": "joaopcnogueira@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/56/c0/46aa04962b31d0eb5282df70a68fbde9a29a3b5f9bf893ddf9d96f3c75a2/ts2ml-1.0.1.tar.gz",
    "platform": null,
    "description": "# ts2ml\n\n<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->\n\n## Install\n\n``` sh\npip install ts2ml\n```\n\n## How to use\n\n``` python\nimport pandas as pd\nfrom ts2ml.core import add_missing_slots\nfrom ts2ml.core import transform_ts_data_into_features_and_target\n```\n\n``` python\ndf = pd.DataFrame({\n    'pickup_hour': ['2022-01-01 00:00:00', '2022-01-01 01:00:00', '2022-01-01 03:00:00', '2022-01-01 01:00:00', '2022-01-01 02:00:00', '2022-01-01 05:00:00'],\n    'pickup_location_id': [1, 1, 1, 2, 2, 2],\n    'rides': [2, 3, 1, 1, 2, 1]\n})\ndf\n```\n\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n&#10;    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n&#10;    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n\n|     | pickup_hour         | pickup_location_id | rides |\n|-----|---------------------|--------------------|-------|\n| 0   | 2022-01-01 00:00:00 | 1                  | 2     |\n| 1   | 2022-01-01 01:00:00 | 1                  | 3     |\n| 2   | 2022-01-01 03:00:00 | 1                  | 1     |\n| 3   | 2022-01-01 01:00:00 | 2                  | 1     |\n| 4   | 2022-01-01 02:00:00 | 2                  | 2     |\n| 5   | 2022-01-01 05:00:00 | 2                  | 1     |\n\n</div>\n\nLet\u2019s fill the missing slots with zeros\n\n``` python\ndf = add_missing_slots(df, datetime_col='pickup_hour', entity_col='pickup_location_id', value_col='rides', freq='H')\ndf\n```\n\n    100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 2/2 [00:00<00:00, 907.86it/s]\n\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n&#10;    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n&#10;    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n\n|     | pickup_hour         | pickup_location_id | rides 
|\n|-----|---------------------|--------------------|-------|\n| 0   | 2022-01-01 00:00:00 | 1                  | 2     |\n| 1   | 2022-01-01 01:00:00 | 1                  | 3     |\n| 2   | 2022-01-01 02:00:00 | 1                  | 0     |\n| 3   | 2022-01-01 03:00:00 | 1                  | 1     |\n| 4   | 2022-01-01 04:00:00 | 1                  | 0     |\n| 5   | 2022-01-01 05:00:00 | 1                  | 0     |\n| 6   | 2022-01-01 00:00:00 | 2                  | 0     |\n| 7   | 2022-01-01 01:00:00 | 2                  | 1     |\n| 8   | 2022-01-01 02:00:00 | 2                  | 2     |\n| 9   | 2022-01-01 03:00:00 | 2                  | 0     |\n| 10  | 2022-01-01 04:00:00 | 2                  | 0     |\n| 11  | 2022-01-01 05:00:00 | 2                  | 1     |\n\n</div>\n\nNow, let\u2019s build features and targets to predict the number of rides for\nthe next hour for each location_id, by using the historical number of\nrides for the last 3 hours\n\n``` python\nfeatures, targets = transform_ts_data_into_features_and_target(\n    df,\n    n_features=3,\n    datetime_col='pickup_hour', \n    entity_col='pickup_location_id', \n    value_col='rides',\n    n_targets=1,\n    step_size=1,\n    step_name='hour'\n)\n```\n\n    100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 2/2 [00:00<00:00, 597.86it/s]\n\n``` python\nfeatures\n```\n\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n&#10;    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n&#10;    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n\n|     | rides_previous_3_hour | rides_previous_2_hour | rides_previous_1_hour | pickup_hour         | pickup_location_id |\n|-----|-----------------------|-----------------------|-----------------------|---------------------|--------------------|\n| 0   | 2.0                   | 3.0                   | 0.0                   | 2022-01-01 03:00:00 | 1              
    |\n| 1   | 3.0                   | 0.0                   | 1.0                   | 2022-01-01 04:00:00 | 1                  |\n| 2   | 0.0                   | 1.0                   | 2.0                   | 2022-01-01 03:00:00 | 2                  |\n| 3   | 1.0                   | 2.0                   | 0.0                   | 2022-01-01 04:00:00 | 2                  |\n\n</div>\n\n``` python\ntargets\n```\n\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n&#10;    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n&#10;    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n\n|     | target_rides_next_hour |\n|-----|------------------------|\n| 0   | 1.0                    |\n| 1   | 0.0                    |\n| 2   | 0.0                    |\n| 3   | 0.0                    |\n\n</div>\n\n``` python\nXy_df = pd.concat([features, targets], axis=1)\nXy_df\n```\n\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n&#10;    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n&#10;    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n\n|     | rides_previous_3_hour | rides_previous_2_hour | rides_previous_1_hour | pickup_hour         | pickup_location_id | target_rides_next_hour |\n|-----|-----------------------|-----------------------|-----------------------|---------------------|--------------------|------------------------|\n| 0   | 2.0                   | 3.0                   | 0.0                   | 2022-01-01 03:00:00 | 1                  | 1.0                    |\n| 1   | 3.0                   | 0.0                   | 1.0                   | 2022-01-01 04:00:00 | 1                  | 0.0                    |\n| 2   | 0.0                   | 1.0                   | 2.0                   | 2022-01-01 03:00:00 | 2                  | 0.0                    |\n| 3   | 1.0                   | 
2.0                   | 0.0                   | 2022-01-01 04:00:00 | 2                  | 0.0                    |\n\n</div>\n\n# Another Example\n\nMonthly spaced time series\n\n``` python\nimport pandas as pd\nimport numpy as np\n\n# Generate timestamp index with monthly frequency\ndate_rng = pd.date_range(start='1/1/2020', end='12/1/2022', freq='MS')\n\n# Create list of city codes\ncities = ['FOR', 'SP', 'RJ']\n\n# Create dataframe with random monthly sales, assigning one year of data to each city\ndf = pd.DataFrame({\n    'date': date_rng,\n    'city': np.repeat(cities, len(date_rng)//len(cities)),\n    'sales': np.random.randint(1000, 5000, size=len(date_rng))\n})\ndf\n```\n\n<div>\n\n|     | date       | city | sales |\n|-----|------------|------|-------|\n| 0   | 2020-01-01 | FOR  | 4944  |\n| 1   | 2020-02-01 | FOR  | 3435  |\n| 2   | 2020-03-01 | FOR  | 4543  |\n| 3   | 2020-04-01 | FOR  | 3879  |\n| 4   | 2020-05-01 | FOR  | 2601  |\n| 5   | 2020-06-01 | FOR  | 2922  |\n| 6   | 2020-07-01 | FOR  | 4542  |\n| 7   | 2020-08-01 | FOR  | 1338  |\n| 8   | 2020-09-01 | FOR  | 2938  |\n| 9   | 2020-10-01 | FOR  | 2695  |\n| 10  | 2020-11-01 | FOR  | 4065  |\n| 11  | 2020-12-01 | FOR  | 3864  |\n| 12  | 2021-01-01 | SP   | 2652  |\n| 13  | 2021-02-01 | SP   | 2137  |\n| 14  | 2021-03-01 | SP   | 2663  |\n| 15  | 2021-04-01 | SP   | 1168  |\n| 16  | 2021-05-01 | SP   | 4523  |\n| 17  | 2021-06-01 | SP   | 4135  |\n| 18  | 2021-07-01 | SP   | 3566  |\n| 19  | 2021-08-01 | SP   | 2121  |\n| 20  | 2021-09-01 | SP   | 1070  |\n| 21  | 2021-10-01 | SP   | 1624  |\n| 22  | 2021-11-01 | SP   | 3034  |\n| 23  | 2021-12-01 | SP   | 4063  |\n| 24  | 2022-01-01 | RJ   | 2297  |\n| 25  | 2022-02-01 | RJ   | 3430  |\n| 26  | 2022-03-01 | RJ   | 2903 
 |\n| 27  | 2022-04-01 | RJ   | 4197  |\n| 28  | 2022-05-01 | RJ   | 4141  |\n| 29  | 2022-06-01 | RJ   | 2899  |\n| 30  | 2022-07-01 | RJ   | 4529  |\n| 31  | 2022-08-01 | RJ   | 3612  |\n| 32  | 2022-09-01 | RJ   | 1856  |\n| 33  | 2022-10-01 | RJ   | 4804  |\n| 34  | 2022-11-01 | RJ   | 1764  |\n| 35  | 2022-12-01 | RJ   | 4425  |\n\n</div>\n\nThe city FOR only has data for 2020, SP only for 2021, and RJ only for\n2022. Let\u2019s also simulate more missing slots by randomly dropping 20% of\nthe rows.\n\n``` python\n# Generate random indices to drop\ndrop_indices = np.random.choice(df.index, size=int(len(df)*0.2), replace=False)\n\n# Drop selected rows from dataframe\ndf = df.drop(drop_indices)\ndf.reset_index(drop=True, inplace=True)\ndf\n```\n\n<div>\n\n|     | date       | city | sales |\n|-----|------------|------|-------|\n| 0   | 2020-01-01 | FOR  | 4944  |\n| 1   | 2020-02-01 | FOR  | 3435  |\n| 2   | 2020-03-01 | FOR  | 4543  |\n| 3   | 2020-04-01 | FOR  | 3879  |\n| 4   | 2020-05-01 | FOR  | 2601  |\n| 5   | 2020-06-01 | FOR  | 2922  |\n| 6   | 2020-07-01 | FOR  | 4542  |\n| 7   | 2020-08-01 | FOR  | 1338  |\n| 8   | 2020-09-01 | FOR  | 2938  |\n| 9   | 2020-11-01 | FOR  | 4065  |\n| 10  | 2020-12-01 | FOR  | 3864  |\n| 11  | 2021-01-01 | SP   | 2652  |\n| 12  | 2021-02-01 | SP   | 2137  |\n| 13  | 2021-03-01 | SP   | 2663  |\n| 14  | 2021-07-01 | SP   | 3566  |\n| 15  | 2021-08-01 | SP   | 2121  |\n| 16  | 2021-10-01 | SP   | 1624  |\n| 17  | 2021-11-01 | SP   | 3034  |\n| 18  | 2021-12-01 | SP   | 4063  |\n| 19  | 2022-01-01 | RJ   | 2297  |\n| 20  | 2022-02-01 | RJ   | 3430  |\n| 21  | 2022-03-01 | RJ   | 2903  |\n| 22  | 2022-04-01 | RJ   | 4197  |\n| 23  | 2022-05-01 | RJ   | 4141  |\n| 24  | 2022-06-01 | RJ   | 2899  |\n| 
25  | 2022-09-01 | RJ   | 1856  |\n| 26  | 2022-10-01 | RJ   | 4804  |\n| 27  | 2022-11-01 | RJ   | 1764  |\n| 28  | 2022-12-01 | RJ   | 4425  |\n\n</div>\n\nNow let\u2019s fill the missing slots with zeros using `add_missing_slots`:\n\n``` python\ndf_full = add_missing_slots(df, datetime_col='date', entity_col='city', value_col='sales', freq='MS')\ndf_full\n```\n\n    100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00, 843.70it/s]\n\n<div>\n\n|     | date       | city | sales |\n|-----|------------|------|-------|\n| 0   | 2020-01-01 | FOR  | 4944  |\n| 1   | 2020-02-01 | FOR  | 3435  |\n| 2   | 2020-03-01 | FOR  | 4543  |\n| 3   | 2020-04-01 | FOR  | 3879  |\n| 4   | 2020-05-01 | FOR  | 2601  |\n| ... | ...        | ...  | ...   
|\n| 103 | 2022-08-01 | RJ   | 0     |\n| 104 | 2022-09-01 | RJ   | 1856  |\n| 105 | 2022-10-01 | RJ   | 4804  |\n| 106 | 2022-11-01 | RJ   | 1764  |\n| 107 | 2022-12-01 | RJ   | 4425  |\n\n<p>108 rows \u00d7 3 columns</p>\n</div>\n\nLet\u2019s build a dataset for training a machine learning model to predict\nthe sales for the next month, for each city, based on the sales of the\nprevious 3 months (`n_targets=1`, `n_features=3`).\n\n``` python\nfeatures, targets = transform_ts_data_into_features_and_target(\n    df_full,\n    n_features=3,\n    datetime_col='date',\n    entity_col='city',\n    value_col='sales',\n    n_targets=1,\n    step_size=1,\n    step_name='month'\n)\n```\n\n    100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00, 205.58it/s]\n\n``` python\npd.concat([features, targets], axis=1)\n```\n\n<div>\n\n|     | sales_previous_3_month | sales_previous_2_month | sales_previous_1_month | date       | city | target_sales_next_month |\n|-----|------------------------|------------------------|------------------------|------------|------|-------------------------|\n| 0   | 4944.0                 | 3435.0                 | 4543.0                 | 2020-04-01 | FOR  | 3879.0                  |\n| 1   | 3435.0                 | 4543.0                 | 3879.0                 | 2020-05-01 | FOR  | 2601.0                  |\n| 2   | 4543.0                 | 3879.0                 | 2601.0                 | 2020-06-01 | FOR  | 2922.0                  |\n| 3   | 3879.0                 | 2601.0                 | 2922.0                 | 2020-07-01 | FOR  | 4542.0                  |\n| 4   | 2601.0                 | 2922.0                 | 4542.0                 | 2020-08-01 | FOR  | 1338.0              
    |\n| ... | ...                    | ...                    | ...                    | ...        | ...  | ...                     |\n| 91  | 4197.0                 | 4141.0                 | 2899.0                 | 2022-07-01 | RJ   | 0.0                     |\n| 92  | 4141.0                 | 2899.0                 | 0.0                    | 2022-08-01 | RJ   | 0.0                     |\n| 93  | 2899.0                 | 0.0                    | 0.0                    | 2022-09-01 | RJ   | 1856.0                  |\n| 94  | 0.0                    | 0.0                    | 1856.0                 | 2022-10-01 | RJ   | 4804.0                  |\n| 95  | 0.0                    | 1856.0                 | 4804.0                 | 2022-11-01 | RJ   | 1764.0                  |\n\n<p>96 rows \u00d7 6 columns</p>\n</div>\n\n# Embedding on Sklearn Pipelines\n\n``` python\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import FunctionTransformer\n```\n\n``` python\nadd_missing_slots_transformer = FunctionTransformer(\n    add_missing_slots,\n    kw_args={\n        'datetime_col': 'date',\n        'entity_col': 'city',\n        'value_col': 'sales',\n        'freq': 'MS'\n    }\n)\n\ntransform_ts_data_into_features_and_target_transformer = FunctionTransformer(\n    transform_ts_data_into_features_and_target,\n    kw_args={\n        'n_features': 3,\n        'datetime_col': 'date',\n        'entity_col': 'city',\n        'value_col': 'sales',\n        'n_targets': 1,\n        'step_size': 1,\n        'step_name': 'month',\n        'concat_Xy': True\n    }\n)\n```\n\n``` python\nts_data_to_features_and_target_pipeline = make_pipeline(\n    add_missing_slots_transformer,\n    transform_ts_data_into_features_and_target_transformer\n)\nts_data_to_features_and_target_pipeline\n```\n\n    Pipeline(steps=[('functiontransformer-1',\n                     FunctionTransformer(func=<function add_missing_slots at 0x11f8f49d0>,\n                                         kw_args={'datetime_col': 'date',\n                                                  'entity_col': 'city',\n                                                  'freq': 'MS',\n                                                  'value_col': 'sales'})),\n                    ('functiontransformer-2',\n                     FunctionTransformer(func=<function transform_ts_data_into_features_and_target at 0x11f925ca0>,\n                                         kw_args={'concat_Xy': True,\n                                                  'datetime_col': 'date',\n                                                  'entity_col': 'city',\n                                                  'n_features': 3, 'n_targets': 1,\n                                                  'step_name': 'month',\n                                                  'step_size': 1,\n                                                  'value_col': 'sales'}))])\n\n``` python\nXy_df = ts_data_to_features_and_target_pipeline.fit_transform(df)\nXy_df\n```\n\n    100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00, 715.47it/s]\n    100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00, 184.12it/s]\n\n<div>\n\n|     | sales_previous_3_month | 
sales_previous_2_month | sales_previous_1_month | date       | city | target_sales_next_month |\n|-----|------------------------|------------------------|------------------------|------------|------|-------------------------|\n| 0   | 4944.0                 | 3435.0                 | 4543.0                 | 2020-04-01 | FOR  | 3879.0                  |\n| 1   | 3435.0                 | 4543.0                 | 3879.0                 | 2020-05-01 | FOR  | 2601.0                  |\n| 2   | 4543.0                 | 3879.0                 | 2601.0                 | 2020-06-01 | FOR  | 2922.0                  |\n| 3   | 3879.0                 | 2601.0                 | 2922.0                 | 2020-07-01 | FOR  | 4542.0                  |\n| 4   | 2601.0                 | 2922.0                 | 4542.0                 | 2020-08-01 | FOR  | 1338.0                  |\n| ... | ...                    | ...                    | ...                    | ...        | ...  | ...                     |\n| 91  | 4197.0                 | 4141.0                 | 2899.0                 | 2022-07-01 | RJ   | 0.0                     |\n| 92  | 4141.0                 | 2899.0                 | 0.0                    | 2022-08-01 | RJ   | 0.0                     |\n| 93  | 2899.0                 | 0.0                    | 0.0                    | 2022-09-01 | RJ   | 1856.0                  |\n| 94  | 0.0                    | 0.0                    | 1856.0                 | 2022-10-01 | RJ   | 4804.0                  |\n| 95  | 0.0                    | 1856.0                 | 4804.0                 | 2022-11-01 | RJ   | 1764.0                  |\n\n<p>96 rows \u00d7 6 columns</p>\n</div>\n\n\n",
    "bugtrack_url": null,
    "license": "Apache Software License 2.0",
    "summary": "Tools to Transform a Time Series into Features and Target a.k.a Supervised Learning",
    "version": "1.0.1",
    "project_urls": {
        "Homepage": "https://github.com/joaopcnogueira/ts2ml"
    },
    "split_keywords": [
        "nbdev",
        "jupyter",
        "notebook",
        "python"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "83a0290fa925a32633aaa7f2f080770dac0d04eeb63722cec89fc5b127c48845",
                "md5": "e0c80ae71009a1fd69649df4523e452f",
                "sha256": "90738dfa76b54b8abaed3b21d2ca9cae8e7c86859fc902868bfcffd6f409a3aa"
            },
            "downloads": -1,
            "filename": "ts2ml-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e0c80ae71009a1fd69649df4523e452f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 12901,
            "upload_time": "2023-06-03T14:11:26",
            "upload_time_iso_8601": "2023-06-03T14:11:26.440299Z",
            "url": "https://files.pythonhosted.org/packages/83/a0/290fa925a32633aaa7f2f080770dac0d04eeb63722cec89fc5b127c48845/ts2ml-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "56c046aa04962b31d0eb5282df70a68fbde9a29a3b5f9bf893ddf9d96f3c75a2",
                "md5": "f5463cd0a2186f9b0e3d55f846423255",
                "sha256": "e31eee6d3911b723b4f3d123fef10192e6ad721a3ff5bd8093716d8b151f7d14"
            },
            "downloads": -1,
            "filename": "ts2ml-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "f5463cd0a2186f9b0e3d55f846423255",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 17928,
            "upload_time": "2023-06-03T14:11:28",
            "upload_time_iso_8601": "2023-06-03T14:11:28.719625Z",
            "url": "https://files.pythonhosted.org/packages/56/c0/46aa04962b31d0eb5282df70a68fbde9a29a3b5f9bf893ddf9d96f3c75a2/ts2ml-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-03 14:11:28",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "joaopcnogueira",
    "github_project": "ts2ml",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "ts2ml"
}
        