# draco-ml

- Name: draco-ml
- Version: 0.3.0
- Summary: AutoML for Time Series.
- Homepage: https://github.com/sintel-dev/Draco
- Author: MIT Data To AI Lab
- License: MIT license
- Requires Python: >=3.6,<3.9
- Keywords: wind, machine learning, draco
- Upload time: 2023-07-31 15:36:00
            <p align="left">
<img width=15% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="DAI" />
<i>An open source project from Data to AI Lab at MIT.</i>
</p>

<p align="left">
<img width=20% src="https://dai.lids.mit.edu/wp-content/uploads/2019/03/GreenGuard.png" alt="Draco" />
</p>

<p align="left">
AutoML for Time Series.
</p>


[![PyPI Shield](https://img.shields.io/pypi/v/draco-ml.svg)](https://pypi.python.org/pypi/draco-ml)
[![Tests](https://github.com/sintel-dev/Draco/workflows/Run%20Tests/badge.svg)](https://github.com/sintel-dev/Draco/actions?query=workflow%3A%22Run+Tests%22+branch%3Amaster)
[![Downloads](https://pepy.tech/badge/draco-ml)](https://pepy.tech/project/draco-ml)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/sintel-dev/Draco/master?filepath=tutorials)
<!--
[![Coverage Status](https://codecov.io/gh/sintel-dev/Draco/branch/master/graph/badge.svg)](https://codecov.io/gh/sintel-dev/Draco)
-->

# Draco

- License: [MIT](https://github.com/sintel-dev/Draco/blob/master/LICENSE)
- Documentation: https://sintel-dev.github.io/Draco
- Homepage: https://github.com/sintel-dev/Draco

## Overview

The Draco project is a collection of end-to-end solutions for machine learning problems
commonly found in time series monitoring systems. Most tasks use sensor data
produced by such monitoring systems, and build on the foundational innovations in
machine learning automation developed at the Data to AI Lab at MIT.

The salient aspects of this project are:

* A set of ready-to-use, well-tested pipelines for different machine learning tasks,
  vetted across multiple publicly available datasets for each task.
* An easy interface to specify the task and pipeline, and to generate and summarize results.
* A production-ready, deployable pipeline.
* An easy interface to ``tune`` pipelines using the Bayesian Tuning and Bandits (BTB)
  library (a sketch follows this list).
* A community-oriented infrastructure to incorporate new pipelines.
* A robust continuous integration and testing infrastructure.
* A ``learning database`` recording all past outcomes: tasks, pipelines, and results.
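
The release notes further down (0.2.1) describe ``tune`` as returning a ``BTBSession`` that
selects templates and tunes their hyperparameters. The following is only a hedged sketch of
that interface; the exact ``tune`` signature is an assumption here, mirrored from ``fit``:

```python3
from draco.demo import load_demo
from draco.pipeline import DracoPipeline

target_times, readings = load_demo()
pipeline = DracoPipeline('lstm_with_unstack')

# Assumption: `tune` takes the same inputs as `fit` and returns a BTBSession,
# as described in the 0.2.1 release notes below.
session = pipeline.tune(target_times, readings)
session.run(10)               # run ten tuning iterations
print(session.best_proposal)  # best template and hyperparameters found so far
```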

## Resources

* [Data Format](DATA_FORMAT.md).
* [Draco folder structure](DATA_FORMAT.md#folder-structure).

# Install

## Requirements

**Draco** has been developed and runs on Python 3.6, 3.7 and 3.8.

Although not strictly required, using a [virtualenv](
https://virtualenv.pypa.io/en/latest/) is highly recommended to avoid interfering
with other software installed on the system where you run **Draco**.

## Download and Install

**Draco** can be installed locally using [pip](https://pip.pypa.io/en/stable/) with
the following command:

```bash
pip install draco-ml
```

This will pull and install the latest stable release from [PyPI](https://pypi.org/).

If you want to install from source or contribute to the project please read the
[Contributing Guide](https://sintel-dev.github.io/Draco/contributing.html#get-started).

# Data Format

The minimum input expected by the **Draco** system consists of the following two elements,
which need to be passed as `pandas.DataFrame` objects:

## Target Times

A table containing the specification of the problem that we are solving, which has three
columns:

* `turbine_id`: Unique identifier of the turbine that this label corresponds to.
* `cutoff_time`: Time associated with this target.
* `target`: The value that we want to predict. This can either be a numerical value or a
  categorical label. This column can be skipped when preparing data that will be used
  only to make predictions and not to fit any pipeline.

|    | turbine_id   | cutoff_time         |   target |
|----|--------------|---------------------|----------|
|  0 | T1           | 2001-01-02 00:00:00 |        0 |
|  1 | T1           | 2001-01-03 00:00:00 |        1 |
|  2 | T2           | 2001-01-04 00:00:00 |        0 |
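
As an illustration, a `target_times` table like the one above can be built directly with
pandas; the column names are the contract, and the values here are just example data:

```python3
import pandas as pd

target_times = pd.DataFrame({
    'turbine_id': ['T1', 'T1', 'T2'],
    'cutoff_time': pd.to_datetime(['2001-01-02', '2001-01-03', '2001-01-04']),
    'target': [0, 1, 0],
})
```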

## Readings

A table containing the signal data from the different sensors, with the following columns:

  * `turbine_id`: Unique identifier of the turbine that this reading comes from.
  * `signal_id`: Unique identifier of the signal that this reading comes from.
  * `timestamp`: Time at which the reading took place, as a datetime.
  * `value`: Numeric value of this reading, as a float.

|    | turbine_id   | signal_id   | timestamp           |   value |
|----|--------------|-------------|---------------------|---------|
|  0 | T1           | S1          | 2001-01-01 00:00:00 |       1 |
|  1 | T1           | S1          | 2001-01-01 12:00:00 |       2 |
|  2 | T1           | S1          | 2001-01-02 00:00:00 |       3 |
|  3 | T1           | S1          | 2001-01-02 12:00:00 |       4 |
|  4 | T1           | S1          | 2001-01-03 00:00:00 |       5 |
|  5 | T1           | S1          | 2001-01-03 12:00:00 |       6 |
|  6 | T1           | S2          | 2001-01-01 00:00:00 |       7 |
|  7 | T1           | S2          | 2001-01-01 12:00:00 |       8 |
|  8 | T1           | S2          | 2001-01-02 00:00:00 |       9 |
|  9 | T1           | S2          | 2001-01-02 12:00:00 |      10 |
| 10 | T1           | S2          | 2001-01-03 00:00:00 |      11 |
| 11 | T1           | S2          | 2001-01-03 12:00:00 |      12 |
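
The same applies to the `readings` table; a minimal example with two signals:

```python3
import pandas as pd

readings = pd.DataFrame({
    'turbine_id': ['T1', 'T1', 'T1', 'T1'],
    'signal_id': ['S1', 'S1', 'S2', 'S2'],
    'timestamp': pd.to_datetime(['2001-01-01 00:00:00', '2001-01-01 12:00:00',
                                 '2001-01-01 00:00:00', '2001-01-01 12:00:00']),
    'value': [1.0, 2.0, 7.0, 8.0],
})
```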

## Turbines

Optionally, a third table can be added containing metadata about the turbines.
The only requirement for this table is a `turbine_id` field; it can have
an arbitrary number of additional fields.

|    | turbine_id   | manufacturer   | ...   | ...   | ...   |
|----|--------------|----------------|-------|-------|-------|
|  0 | T1           | Siemens        | ...   | ...   | ...   |
|  1 | T2           | Siemens        | ...   | ...   | ...   |

## CSV Format

Apart from the in-memory data format explained above, which is limited by the memory
available on the system where it runs, **Draco** is also prepared to
load and work with data stored as a collection of CSV files, drastically increasing the amount
of data it can handle. Further details about this format can be found in the
[project documentation site](DATA_FORMAT.md#csv-format).
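
For reference, the in-memory tables can be written out with plain pandas. Note that the
actual folder structure and file naming Draco expects are defined in
[DATA_FORMAT.md](DATA_FORMAT.md#csv-format), so the paths below are placeholders only:

```python3
# Placeholder paths: the folder structure Draco expects is defined in
# DATA_FORMAT.md, not here.
target_times.to_csv('target_times.csv', index=False)
readings.to_csv('readings.csv', index=False)
```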

# Quickstart

In this example we will load some demo data and classify it using a **Draco Pipeline**.

## 1. Load and split the demo data

The first step is to load the demo data.

For this, we will import and call the `draco.demo.load_demo` function without any arguments:

```python3
from draco.demo import load_demo

target_times, readings = load_demo()
```

The returned objects are:

*  ``target_times``: A ``pandas.DataFrame`` with the ``target_times`` table data:

   ```
     turbine_id cutoff_time  target
   0       T001  2013-01-12       0
   1       T001  2013-01-13       0
   2       T001  2013-01-14       0
   3       T001  2013-01-15       1
   4       T001  2013-01-16       0
   ```

* ``readings``: A ``pandas.DataFrame`` containing the time series data in the format explained above.

   ```
     turbine_id signal_id  timestamp  value
   0       T001       S01 2013-01-10  323.0
   1       T001       S02 2013-01-10  320.0
   2       T001       S03 2013-01-10  284.0
   3       T001       S04 2013-01-10  348.0
   4       T001       S05 2013-01-10  273.0
   ```

Once we have loaded the `target_times`, and before training any machine learning
pipeline, we will split them into two partitions for training and testing.

In this case, we will split them using the [train_test_split function from scikit-learn](
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html),
but it can be done with any other suitable tool.

```python3
from sklearn.model_selection import train_test_split

train, test = train_test_split(target_times, test_size=0.25, random_state=0)
```
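
Since the demo target is a categorical label, the split can optionally be stratified so
that both partitions keep the same class proportions; this variant uses the same
scikit-learn function:

```python3
# Optional: stratify on the label to preserve class balance in both partitions.
train, test = train_test_split(
    target_times,
    test_size=0.25,
    random_state=0,
    stratify=target_times['target'],
)
```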

Notice how we are only splitting the `target_times` data and not the `readings`.
This is because the pipelines will later on take care of selecting the parts of the
`readings` table needed for the training based on the information found inside
the `train` and `test` inputs.

Additionally, if we want to calculate a goodness-of-fit score later on, we can separate the
testing target values from the `test` table by popping them from it:

```python3
test_targets = test.pop('target')
```

## 2. Exploring the available Pipelines

Once we have the data ready, we need to find a suitable pipeline.

The list of available Draco Pipelines can be obtained using the `draco.get_pipelines`
function.

```python3
from draco import get_pipelines

pipelines = get_pipelines()
```

The returned `pipelines` variable will be a `list` containing the names of all the pipelines
available in the Draco system:

```
['lstm',
 'lstm_with_unstack',
 'double_lstm',
 'double_lstm_with_unstack']
```

For the rest of this tutorial, we will select and use the pipeline
`lstm_with_unstack` as our template.

```python3
pipeline_name = 'lstm_with_unstack'
```

## 3. Fitting the Pipeline

Once we have loaded the data and selected the pipeline that we will use, we have to
fit it.

For this, we will create an instance of a `DracoPipeline` object passing the name
of the pipeline that we want to use:

```python3
from draco.pipeline import DracoPipeline

pipeline = DracoPipeline(pipeline_name)
```

And then we can directly fit it to our data by calling its `fit` method and passing in the
training `target_times` and the complete `readings` table:

```python3
pipeline.fit(train, readings)
```

## 4. Make predictions

After fitting the pipeline, we are ready to make predictions on new data by calling the
`pipeline.predict` method passing the testing `target_times` and, again, the complete
`readings` table.

```python3
predictions = pipeline.predict(test, readings)
```

## 5. Evaluate the goodness-of-fit

Finally, after making predictions we can evaluate how good the prediction was
using any suitable metric.

```python3
from sklearn.metrics import f1_score

f1_score(test_targets, predictions)
```
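
Any other suitable scikit-learn metric works the same way; for example, accuracy and a
confusion matrix:

```python3
from sklearn.metrics import accuracy_score, confusion_matrix

accuracy_score(test_targets, predictions)
confusion_matrix(test_targets, predictions)
```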

## What's next?

For more details about **Draco** and all its possibilities and features, please check the
[project documentation site](https://sintel-dev.github.io/Draco/).
Also, do not forget to have a look at the [tutorials](
https://github.com/sintel-dev/Draco/tree/master/tutorials)!


# History

## 0.3.0 - 2023-07-31

This release switches from ``MLPrimitives`` to ``ml-stars``.
Moreover, we remove all pipelines using deep feature synthesis.

* Update demo bucket - [Issue #76](https://github.com/sintel-dev/Draco/issues/76) by @sarahmish
* Remove ``dfs`` based pipelines - [Issue #73](https://github.com/sintel-dev/Draco/issues/73) by @sarahmish
* Move from ``MLPrimitives`` to ``ml-stars`` - [Issue #72](https://github.com/sintel-dev/Draco/issues/72) by @sarahmish


## 0.2.0 - 2022-04-12

This release features a reorganization and renaming of ``Draco`` pipelines. In addition,
we update some of the dependencies for general housekeeping.

* Update Draco dependencies - [Issue #66](https://github.com/signals-dev/Draco/issues/66) by @sarahmish
* Reorganize pipelines - [Issue #63](https://github.com/signals-dev/Draco/issues/63) by @sarahmish


## 0.1.0 - 2022-01-01

* First release on PyPI as ``draco-ml``


## Previous GreenGuard development

### 0.3.0 - 2021-01-22

This release increases the supported version of Python to `3.8` and also includes changes
to the installation requirements, where the ``pandas`` and ``scikit-optimize`` packages have
been updated to support higher versions. These changes come together with the newer versions
of ``MLBlocks`` and ``MLPrimitives``.

#### Internal Improvements

* Fix ``run_benchmark`` to properly generate the ``init_hyperparameters`` for the pipelines.
* New ``FPR`` metric.
* New ``roc_auc_score`` metric.
* Multiple benchmarking metrics allowed.
* Multiple ``tpr`` or ``threshold`` values allowed for the benchmark.

### 0.2.6 - 2020-10-23

* Fix ``mkdir`` when exporting the benchmark results to a ``csv`` file.
* Intermediate steps for the pipelines with demo notebooks for each pipeline.

#### Resolved Issues

* Issue #50: Expose partial outputs and executions in the ``GreenGuardPipeline``.

### 0.2.5 - 2020-10-09

With this release we include:

* `run_benchmark`: A function within the module `benchmark` that allows the user to evaluate
templates against problems with different window sizes and resample rules.
* `summarize_results`: A function that, given a `csv` file, generates an `xlsx` file with a summary
tab and a detailed tab containing the results from `run_benchmark`.

### 0.2.4 - 2020-09-25

* Fix dependency errors

### 0.2.3 - 2020-08-10

* Added benchmarking module.

### 0.2.2 - 2020-07-10

#### Internal Improvements

* Added github actions.

#### Resolved Issues

* Issue #27: Cache Splits pre-processed data on disk

### 0.2.1 - 2020-06-16

With this release we give the user the possibility to specify more than one template when
creating a GreenGuardPipeline. When its `tune` method is called, an instance of BTBSession
is returned, which is in charge of selecting the templates and tuning their hyperparameters
until the best pipeline is achieved.

#### Internal Improvements

* Resample by filename inside the `CSVLoader` to avoid oversampling of data that will not be used.
* Select targets now allows them to be equal.
* Fixed the csv filename format.
* Upgraded to BTB.

#### Bug Fixes

* Issue #33: Wrong default datetime format

#### Resolved Issues

* Issue #35: Select targets is too strict
* Issue #36: resample by filename inside csvloader
* Issue #39: Upgrade BTB
* Issue #41: Fix CSV filename format

### 0.2.0 - 2020-02-14

First stable release:

* efficient data loading and preprocessing
* initial collection of dfs and lstm based pipelines
* optimized pipeline tuning
* documentation and tutorials

### 0.1.0

* First release on PyPI

            
