wax-ml

- Name: wax-ml
- Version: 0.6.4
- Home page: https://github.com/eserie/wax-ml
- Summary: A Python library for machine-learning and feedback loops on streaming data
- Upload time: 2023-04-21 22:27:14
- Author: WAX-ML Authors
- License: Apache
- Keywords: time series, machine learning, optimization, optimal control, online learning, reinforcement learning

<div align="center">
<img src="https://github.com/eserie/wax-ml/blob/main/docs/_static/wax_logo.png" alt="logo" width="40%"></img>
</div>

# WAX-ML: A Python library for machine-learning and feedback loops on streaming data

![Continuous integration](https://github.com/eserie/wax-ml/actions/workflows/tests.yml/badge.svg)
[![Documentation Status](https://readthedocs.org/projects/wax-ml/badge/?version=latest)](https://wax-ml.readthedocs.io/en/latest/)
[![PyPI version](https://badge.fury.io/py/wax-ml.svg)](https://badge.fury.io/py/wax-ml)
[![Codecov](https://codecov.io/gh/eserie/wax-ml/branch/main/graph/badge.svg)](https://codecov.io/gh/eserie/wax-ml)
[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)



[**Quickstart**](#-quickstart-)
| [**Install guide**](#-installation-)
| [**Change logs**](https://wax-ml.readthedocs.io/en/latest/changelog.html)
| [**Reference docs**](https://wax-ml.readthedocs.io/en/latest/)

## Introduction

🌊 Wax is what you put on a surfboard to avoid slipping. It is an essential tool to go
surfing ... 🌊

WAX-ML is a research-oriented [Python](https://www.python.org/)  library
providing tools to design powerful machine learning algorithms and feedback loops
working on streaming data.

It strives to complement [JAX](https://jax.readthedocs.io/en/latest/)
with tools dedicated to time series.

WAX-ML makes JAX-based programs easy to use for end-users working
with
[pandas](https://pandas.pydata.org/) and [xarray](http://xarray.pydata.org/en/stable/)
for data manipulation.

WAX-ML provides a simple mechanism for implementing feedback loops, allows the implementation of
 reinforcement learning algorithms with functions, and makes them easy to integrate by
end-users working with the object-oriented reinforcement learning framework from the
[Gym](https://gym.openai.com/) library.

To learn more, you can read our [article on ArXiv](http://arxiv.org/abs/2106.06524)
or simply access the code in this repository.

## WAX-ML Goal

WAX-ML's goal is to expose "traditional" algorithms that are often difficult to find in the standard
Python ecosystem and are related to time series and, more generally, to streaming data.

It aims to make it easy to work with algorithms from a wide variety of computational domains such as
machine learning, online learning, reinforcement learning, optimal control, time-series analysis,
optimization, and statistical modeling.

For now, WAX-ML focuses on **time-series** algorithms as this is one of the areas of machine learning
that lacks the most dedicated tools.  Working with time series is notoriously difficult
and often requires very specific algorithms (statistical modeling, filtering, optimal control).


Even though some of the modern machine learning methods such as RNN, LSTM, or reinforcement learning
can do an excellent job on some specific time-series problems, most of the problems require using
more traditional algorithms such as linear and non-linear filters, FFT,
the eigendecomposition of matrices (e.g. [[7]](#references)),
principal component analysis (PCA) (e.g. [[8]](#references)), Riccati solvers for
optimal control and filtering, ...


By adopting a functional approach, inherited from JAX, WAX-ML aims to be an efficient tool to
combine modern machine learning approaches with more traditional ones.


Some work has been done in this direction in [[2] in References](#references) where transformer encoder
architectures are massively accelerated, with limited accuracy costs, by replacing the
self-attention sublayers with a standard, non-parameterized Fast Fourier Transform (FFT).


WAX-ML may also be useful for developing research ideas in areas such as online machine learning
(see [[1] in References](#references)) and development of control, reinforcement learning,
and online optimization methods.

## What does WAX-ML do?

In building WAX-ML, we have some pretty ambitious design and implementation goals.

To do things right, we decided to start small and to adopt an open-source design from the beginning.


For now, WAX-ML contains:
- transformation tools that we call "unroll" transformations allowing us to
  apply any transformation, possibly stateful, on sequential data.  It generalizes the RNN
  architecture to any stateful transformation allowing the implementation of any kind of "filter".

- a "stream" module, described in [🌊 Streaming Data 🌊](#-streaming-data-), permitting us to
  synchronize data streams with different time resolutions.

- some general pandas and xarray "accessors" permitting the application of any
  JAX-functions on pandas and xarray data containers:
  `DataFrame`, `Series`, `Dataset`, and `DataArray`.

- a ready-to-use exponential moving average filter that we expose through two APIs:
    - one for JAX users: as Haiku modules (`EWMA`, ... see the complete list in our
      [API documentation](https://wax-ml.readthedocs.io/en/latest/wax.modules.html)).
    - a second one for pandas and xarray users: a drop-in replacement for the pandas
      `ewm` accessor.

- a simple module `OnlineSupervisedLearner` to implement online learning algorithms
  for supervised machine learning problems.

- building blocks for designing feedback loops in reinforcement learning, including
  a module called `GymFeedback` that allows the implementation of feedback loops like the one
  introduced in the [Gym](https://gym.openai.com/) library and illustrated in this figure:

  <div align="center">
  <img src="docs/tikz/gymfeedback.png" alt="logo" width="60%"></img>
  </div>
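
The "unroll" idea from the first item above can be sketched in plain Python. This is a conceptual illustration only: `unroll` and `running_sum_step`, and their signatures, are hypothetical names chosen for this sketch, not the WAX-ML API (which builds on JAX and Haiku).

```python
from typing import Any, Callable, Iterable, List, Tuple

# Hypothetical convention: a step maps (state, x) to (new_state, output).
Step = Callable[[Any, Any], Tuple[Any, Any]]


def unroll(step: Step, init_state: Any, xs: Iterable[Any]) -> Tuple[Any, List[Any]]:
    """Apply a stateful step function along a sequence, threading the state."""
    state, outputs = init_state, []
    for x in xs:
        state, y = step(state, x)
        outputs.append(y)
    return state, outputs


def running_sum_step(state: float, x: float) -> Tuple[float, float]:
    """A toy "filter": the state is the sum of the inputs seen so far."""
    new_state = state + x
    return new_state, new_state


final_state, ys = unroll(running_sum_step, 0.0, [1.0, 2.0, 3.0])
print(ys)  # [1.0, 3.0, 6.0]
```

In JAX, this state-threading pattern is what `jax.lax.scan` expresses in a compilable form.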

### What is JAX?

JAX is a research-oriented computational system implemented in Python that leverages the
XLA optimization framework for machine learning computations.  It makes XLA usable with
the NumPy API and some functional primitives for just-in-time compilation,
differentiation, vectorization, and parallelization.  It allows building higher-level
transformations or "programs" in a functional programming approach.
See  [JAX's page](https://github.com/google/jax) for more details.


## Why use WAX-ML?

If you deal with time series and are a pandas or xarray user, but you want to use the impressive
tools of the JAX ecosystem, then WAX-ML might be the right tool for you,
as it implements pandas and
xarray accessors to apply JAX functions.

If you are a user of JAX, you may be interested in adding WAX-ML to your toolbox to address
time-series problems.

## Design

### Research oriented
WAX-ML is a research-oriented library.  It relies on
[JAX](https://jax.readthedocs.io/en/latest/notebooks/quickstart.html) and
[Haiku](https://github.com/deepmind/dm-haiku) functional programming paradigm to ease the
development of research ideas.

WAX-ML is a bit like [Flux](https://fluxml.ai/Flux.jl/stable/)
in the [Julia](https://julialang.org/) programming language.

### Functional programming
In WAX-ML, we pursue a functional programming approach inherited from JAX.

In this sense, WAX-ML is not a framework in the way most object-oriented libraries are.  Instead, we
implement "functions" that must be pure to exploit the JAX ecosystem.

### Haiku modules
We use the "module" mechanism proposed by the Haiku library to easily generate pure function pairs,
called `init` and `apply` in Haiku, to implement programs that require the management of
parameters and/or state variables.
You can see
[the Haiku module API](https://dm-haiku.readthedocs.io/en/latest/api.html#modules-parameters-and-state)
and
[Haiku transformation functions](https://dm-haiku.readthedocs.io/en/latest/api.html#haiku-transforms)
for more details.

In this way, we can recover all the advantages of
object-oriented programming while exposing them in a functional programming approach.
This eases the development of robust and reusable features and makes it possible to
develop "mini-languages" tailored to specific scientific domains.
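
To make the `init` / `apply` idea concrete without depending on Haiku, here is a plain-Python sketch of a pure function pair that manages its state explicitly. The names mirror Haiku's terminology, but the signatures are simplified assumptions for illustration and are not Haiku's actual API.

```python
from typing import Dict, Tuple

State = Dict[str, float]


def init() -> State:
    """Create the initial state; no hidden globals are involved."""
    return {"count": 0.0, "mean": 0.0}


def apply(state: State, x: float) -> Tuple[float, State]:
    """Pure update of a running mean: returns the output and a NEW state."""
    count = state["count"] + 1.0
    mean = state["mean"] + (x - state["mean"]) / count
    return mean, {"count": count, "mean": mean}


state = init()
for x in [2.0, 4.0, 6.0]:
    out, state = apply(state, x)
print(out)  # 4.0
```

Because `apply` never mutates its input, the same state can be replayed, copied, or passed through function transformations safely.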


### WAX-ML works with other libraries

We want existing machine learning libraries to work well together while trying to leverage their strength.
This is facilitated with a functional programming approach.

WAX-ML is not a framework but rather a set of tools that aims to complement the
[JAX ecosystem](https://moocaholic.medium.com/jax-a13e83f49897).


# Contents
* [🚀 Quickstart 🚀](#-quickstart-)
* [⏱ Synchronize streams ⏱](#-synchronize-streams-)
* [🌊 Streaming Data 🌊](#-streaming-data-)
* [Implemented modules](#implemented-modules)
* [♻ Feedback loops ♻](#-feedback-loops-)
* [Future plans](#future-plans)
* [⚒ Installation ⚒](#-installation-)
* [Disclaimer](#disclaimer)
* [Development](#development)
* [References](#references)
* [License](#license)
* [Citing WAX-ML](#citing-wax)
* [Reference documentation](#reference-documentation)


## 🚀 Quickstart 🚀

Jump right in using a notebook in your browser, connected to a Google Cloud GPU or
simply read our notebook in the
[documentation](https://wax-ml.readthedocs.io/en/latest/).

Here are some starter notebooks:
- 〰 Compute exponential moving averages with xarray and pandas accessors 〰 : [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/eserie/wax-ml/blob/main/docs/notebooks/01_demo_EWMA.ipynb),
  [Open in Documentation](https://wax-ml.readthedocs.io/en/latest/notebooks/01_demo_EWMA.html)
- ⏱ Synchronize data streams ⏱ : [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/eserie/wax-ml/blob/main/docs/notebooks/02_Synchronize_data_streams.ipynb),
  [Open in Documentation](https://wax-ml.readthedocs.io/en/latest/notebooks/02_Synchronize_data_streams.html)
- 🌡 Binning temperatures 🌡 : [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/eserie/wax-ml/blob/main/docs/notebooks/03_ohlc_temperature.ipynb),
  [Open in Documentation](https://wax-ml.readthedocs.io/en/latest/notebooks/03_ohlc_temperature.html)
- 🎛 The three steps workflow 🎛 : [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/eserie/wax-ml/blob/main/docs/notebooks/04_The_three_steps_workflow.ipynb),
  [Open in Documentation](https://wax-ml.readthedocs.io/en/latest/notebooks/04_The_three_steps_workflow.html)
- 🔭 Reconstructing the light curve of stars with LSTM 🔭: [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/eserie/wax-ml/blob/main/docs/notebooks/05_reconstructing_the_light_curve_of_stars.ipynb),
  [Open in Documentation](https://wax-ml.readthedocs.io/en/latest/notebooks/05_reconstructing_the_light_curve_of_stars.html)
- 🦎 Online linear regression with a non-stationary environment 🦎: [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/eserie/wax-ml/blob/main/docs/notebooks/06_Online_Linear_Regression.ipynb),
  [Open in Documentation](https://wax-ml.readthedocs.io/en/latest/notebooks/06_Online_Linear_Regression.html)


## ⏱ Synchronize streams ⏱

Physicists have proposed a solution to the synchronization problem called the Poincaré–Einstein
synchronization (see the [Poincaré–Einstein synchronization Wikipedia
page](https://en.wikipedia.org/wiki/Einstein_synchronisation)).  In WAX-ML we have implemented a similar
mechanism by defining a "local time", borrowing Henri Poincaré's terminology, to denote the
timestamps of the stream (the "local stream") in which the user wants to apply transformations and
unravel all other streams.  The other streams, which we call "secondary streams", are pushed
back into the local stream using embedding maps which specify how to convert timestamps from a
secondary stream into timestamps in the local stream.

This synchronization mechanism makes it possible to work with secondary streams having timestamps at
frequencies that can be lower or higher than that of the local stream. The data from these secondary streams
are represented in the "local stream" either with a forward-filling mechanism for lower
frequencies or with a buffering mechanism for higher frequencies.

Note that this simple synchronization scheme assumes that the different event streams have fixed
latencies.

We have implemented a "data tracing" mechanism to optimize access to out-of-sync streams.  This
mechanism works on in-memory data.  We perform a first pass on the data, without actually
accessing it, and determine the indices necessary to later access the data.  In doing so, we take care
not to let any "future" information pass through, and thus guarantee data processing that respects
causality.
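
The two-pass idea can be sketched as follows for the forward-filling case. This is an illustrative sketch under simple assumptions (integer timestamps, a sorted secondary stream); `trace_ffill_indices` is a hypothetical helper, not a WAX-ML function.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def trace_ffill_indices(
    local_times: Sequence[int], secondary_times: Sequence[int]
) -> List[Optional[int]]:
    """First pass: for each local timestamp, find the index of the most recent
    secondary timestamp that is <= it (None if nothing has been observed yet).
    Only timestamps are inspected; the data values are never touched."""
    indices: List[Optional[int]] = []
    for t in local_times:
        pos = bisect_right(secondary_times, t) - 1
        indices.append(pos if pos >= 0 else None)
    return indices


# Hourly local stream (hours 0..5) and a slower secondary stream (hours 2 and 4).
local_times = [0, 1, 2, 3, 4, 5]
secondary_times = [2, 4]
secondary_values = [10.0, 20.0]

indices = trace_ffill_indices(local_times, secondary_times)
# Second pass: gather the values with the precomputed indices.
aligned = [None if i is None else secondary_values[i] for i in indices]
print(indices)  # [None, None, 0, 0, 1, 1]
print(aligned)  # [None, None, 10.0, 10.0, 20.0, 20.0]
```

Since an index is only ever taken from timestamps at or before the local time, no value from the future can leak into the aligned stream.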

The buffering mechanism used in the case of higher frequencies works with a fixed buffer size
(see the WAX-ML module [`wax.modules.Buffer`](https://wax-ml.readthedocs.io/en/latest/_autosummary/wax.modules.buffer.html#module-wax.modules.buffer))
to allow the use of JAX / XLA optimizations and efficient processing.
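
A fixed-size buffer can be sketched in a few lines of plain Python. This only illustrates the idea of a constant-length state; `init_buffer` and `push` are hypothetical names and do not reproduce the `wax.modules.Buffer` implementation.

```python
from typing import Optional, Tuple

Buffer = Tuple[Optional[float], ...]


def init_buffer(size: int) -> Buffer:
    """A buffer of fixed length, padded with None until it fills up."""
    return (None,) * size


def push(buffer: Buffer, x: float) -> Buffer:
    """Drop the oldest entry and append the newest; the length never changes."""
    return buffer[1:] + (x,)


buffer = init_buffer(3)
for x in [1.0, 2.0, 3.0, 4.0]:
    buffer = push(buffer, x)
print(buffer)  # (2.0, 3.0, 4.0)
```

Keeping the length constant is what lets a compiler such as XLA work with static shapes.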

### Example

Let's illustrate with a small example how `wax.stream.Stream` synchronizes data streams.

Let's use the "air temperature" dataset with:
- An air temperature defined with an hourly resolution.
- A "fake" ground temperature defined with a daily resolution as the air temperature minus 10 degrees.


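```python
import xarray as xr
from wax.accessors import register_wax_accessors

register_wax_accessors()

# Assumed setup, following the description above: daily "ground" = air - 10.
dataset = xr.tutorial.open_dataset("air_temperature")
dataset["ground"] = dataset.air.resample(time="D").last().rename({"time": "day"}) - 10
```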

```python

from wax.modules import EWMA


def my_custom_function(dataset):
    return {
        "air_10": EWMA(1.0 / 10.0)(dataset["air"]),
        "air_100": EWMA(1.0 / 100.0)(dataset["air"]),
        "ground_100": EWMA(1.0 / 100.0)(dataset["ground"]),
    }
```

```python
results, state = dataset.wax.stream(
    local_time="time", ffills={"day": 1}, pbar=True
).apply(my_custom_function, format_dims=dataset.air.dims)
```

```python
_ = results.isel(lat=0, lon=0).drop_vars(["lat", "lon"]).to_pandas().plot(figsize=(12, 8))
```

<div align="center">
<img src="docs/_static/synchronize_data_streams.png" alt="logo" width="60%"></img>
</div>

## 🌊 Streaming Data 🌊

WAX-ML may complement the JAX ecosystem by adding support for **streaming data**.

To do this, WAX-ML implements a unique **data tracing** mechanism that prepares for fast
access to in-memory data and allows the use of JAX transformations such as
`jit`, `grad`, `vmap`, or `pmap`.

This mechanism is somewhat special in that it works with time-series data.

The `wax.stream.Stream` object implements this idea.  It uses Python generators to
**synchronize multiple data streams** with potentially different temporal
resolutions.

The `wax.stream.Stream` object works on in-memory data stored in
[`xarray.Dataset`](http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html).

To work with "real" streaming data, it should be possible to implement a buffer
mechanism running on any Python generator and to use the synchronization and data
tracing mechanisms implemented in WAX-ML to apply JAX transformations on batches of data
stored in memory. (See our WEP4 enhancement proposal)
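
As a rough illustration of what generator-based synchronization could look like on "real" streams, the sketch below forward-fills a secondary generator into a local one. The function `synchronize` and the `(timestamp, value)` event format are assumptions made for this example, not the `wax.stream.Stream` API.

```python
from typing import Iterator, Optional, Tuple

Event = Tuple[int, float]  # (timestamp, value)


def synchronize(
    local: Iterator[Event], secondary: Iterator[Event]
) -> Iterator[Tuple[int, float, Optional[float]]]:
    """Yield (time, local_value, last_secondary_value) for each local event,
    forward-filling the secondary stream without looking into the future."""
    last_value: Optional[float] = None
    pending = next(secondary, None)
    for t, x in local:
        # Consume every secondary event that happened at or before time t.
        while pending is not None and pending[0] <= t:
            last_value = pending[1]
            pending = next(secondary, None)
        yield t, x, last_value


local = iter([(0, 1.0), (1, 2.0), (2, 3.0), (3, 4.0)])
secondary = iter([(1, 10.0), (3, 30.0)])
rows = list(synchronize(local, secondary))
print(rows)  # [(0, 1.0, None), (1, 2.0, 10.0), (2, 3.0, 10.0), (3, 4.0, 30.0)]
```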

### ⌛ Adding support for time dtypes in JAX ⌛

At the moment `datetime64` and `string_` dtypes are not supported in JAX.

WAX-ML adds support for `datetime64` and `string_` NumPy dtypes in JAX.
To do so, WAX-ML implements:
- an encoding scheme for `datetime64` relying on pairs of 32-bit integers similar to `PRNGKey` in JAX.
- an encoding scheme for `string_` relying on `LabelEncoder` of [Scikit-learn](https://scikit-learn.org/stable/).

By providing these two encoding schemes, WAX-ML makes it easy to use JAX algorithms on data of these types.
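
The two encoding ideas can be illustrated in plain Python. The exact layout used by WAX-ML is not described here, so the (high, low) word split and the helper names below are assumptions made for this sketch; the label encoding follows the usual "sorted distinct values" convention.

```python
from typing import Dict, List, Tuple

MASK32 = 0xFFFFFFFF


def encode_int64(value: int) -> Tuple[int, int]:
    """Split a signed 64-bit integer into (high, low) unsigned 32-bit words."""
    unsigned = value & 0xFFFFFFFFFFFFFFFF  # two's-complement view
    return (unsigned >> 32) & MASK32, unsigned & MASK32


def decode_int64(high: int, low: int) -> int:
    """Inverse of encode_int64."""
    unsigned = (high << 32) | low
    return unsigned - (1 << 64) if unsigned >= (1 << 63) else unsigned


def label_encode(labels: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct string to an integer code (sorted order)."""
    classes = {name: code for code, name in enumerate(sorted(set(labels)))}
    return [classes[name] for name in labels], classes


# Nanoseconds since the Unix epoch: the integer behind a datetime64[ns] value.
ns = 1_682_115_434_000_000_000
high, low = encode_int64(ns)
roundtrip = decode_int64(high, low)

codes, classes = label_encode(["paris", "tokyo", "paris"])
print(roundtrip == ns)  # True
print(codes)  # [0, 1, 0]
```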

Currently, the types of time offsets supported by WAX-ML are quite limited and we
collaborate with the pandas, xarray, and [Astropy](https://www.astropy.org/) teams
to further develop the time manipulation tools in WAX-ML (see "WEP1" in `WEP.md`).

### pandas and xarray accessors

WAX-ML implements pandas and xarray accessors to ease the use of machine-learning
algorithms implemented as functions built with Haiku modules on high-level data APIs:
- pandas's `DataFrame` and `Series`
- xarray's `Dataset` and `DataArray`.

To load the accessors, run:
```python
from wax.accessors import register_wax_accessors
register_wax_accessors()
```

Then run the "one-liner" syntax:
```python
<data-container>.wax.stream(…).apply(…)
```

## Implemented modules

We have some modules (inherited from Haiku modules) ready to be used in `wax.modules`
(see our [API documentation](https://wax-ml.readthedocs.io/en/latest/wax.modules.html)).

They can be considered as "building blocks" that can be reused to build more advanced programs to run on streaming data.

### Fundamental modules

We have some "fundamental" modules that are specific to time series management:
- the `Buffer` module, which implements the buffering mechanism.
- the `UpdateOnEvent` module, which makes it possible to "freeze" the computation of a program and
  to update it on some events in the "local flow".
  To illustrate the use of this module, we show how it can be used to compute the opening,
  high, low, and closing quantities of temperatures recorded during a day,
  the binning process being reset at each day change.  We show an illustrative graph of the final result:


<div align="center">
<img src="docs/_static/trailing_ohlc.png" alt="logo" width="60%"></img>
</div>
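
The binning logic described above can be sketched in plain Python. This is only an illustration of "reset on an event" for a trailing open/high/low/close computation; `ohlc_step` is a hypothetical function and does not use the `UpdateOnEvent` API.

```python
from typing import Dict, List, Optional, Tuple

Bar = Dict[str, float]


def ohlc_step(state: Optional[Tuple[int, Bar]], day: int, temp: float) -> Tuple[int, Bar]:
    """Update the trailing open/high/low/close; reset when the day changes."""
    if state is None or state[0] != day:
        # Event: a new day starts, so the bin is re-initialized.
        return day, {"open": temp, "high": temp, "low": temp, "close": temp}
    _, bar = state
    return day, {
        "open": bar["open"],
        "high": max(bar["high"], temp),
        "low": min(bar["low"], temp),
        "close": temp,
    }


# (day, temperature) observations; the day index changes after three samples.
observations = [(0, 10.0), (0, 12.0), (0, 9.0), (1, 15.0), (1, 14.0)]
state = None
bars: List[Bar] = []
for day, temp in observations:
    state = ohlc_step(state, day, temp)
    bars.append(state[1])
print(bars[2])  # {'open': 10.0, 'high': 12.0, 'low': 9.0, 'close': 9.0}
print(bars[4])  # {'open': 15.0, 'high': 15.0, 'low': 14.0, 'close': 14.0}
```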

### pandas modules

We have a few more specific modules that aim to reproduce some of the logic that **pandas** users may be familiar with,
such as:
- `Lag` to implement a delay on the input data
- `Diff` to compute differences between values over time
- `PctChange` to compute the relative difference between values over time.
- `RollingMean` to compute the moving average over time.
- `EWMA`, `EWMVar`, `EWMCov`, to compute the exponential moving average, variance, covariance of the input data.

### Online learning and reinforcement learning modules

Finally, we implement domain-specific modules for online learning and reinforcement
learning such as `OnlineSupervisedLearner` and `GymFeedback` (see dedicated sections).

### accessors

For now, WAX-ML offers direct access to some modules through specific accessors for pandas and xarray users.
For instance, we have an implementation of the "exponential moving average" directly
accessible through the accessor `<data-container>.wax.ewm(...).mean()`, which provides a
drop-in replacement for the exponential moving average of pandas.

Let's show how it works on the "air temperature" dataset from `xarray.tutorial`:

```python
import xarray as xr
da = xr.tutorial.open_dataset("air_temperature")
dataframe = da.air.to_series().unstack(["lon", "lat"])
```

Pandas ewma:
```python
air_temp_ewma = dataframe.ewm(alpha=1.0 / 10.0).mean()
```

WAX-ML ewma:
```python
air_temp_ewma = dataframe.wax.ewm(alpha=1.0 / 10.0).mean()
```
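
For reference, the simplest recursive form of an exponential moving average is sketched below in plain Python. It corresponds to the recursion pandas uses with `ewm(alpha=..., adjust=False).mean()`; the calls above rely on the pandas default `adjust=True`, which weights the first samples differently. The helper `ewma` is written for this illustration and is not part of WAX-ML.

```python
from typing import List


def ewma(values: List[float], alpha: float) -> List[float]:
    """Recursive exponential moving average:
    y[0] = x[0] and y[t] = (1 - alpha) * y[t-1] + alpha * x[t]."""
    out: List[float] = []
    for x in values:
        out.append(x if not out else (1.0 - alpha) * out[-1] + alpha * x)
    return out


print(ewma([1.0, 2.0, 3.0], alpha=0.5))  # [1.0, 1.5, 2.25]
```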


### Apply a custom function to a Dataset

Now let's illustrate how WAX-ML accessors work on [xarray datasets](http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html).

```python
from wax.modules import EWMA


def my_custom_function(dataset):
    return {
        "air_10": EWMA(1.0 / 10.0)(dataset["air"]),
        "air_100": EWMA(1.0 / 100.0)(dataset["air"]),
    }


dataset = xr.tutorial.open_dataset("air_temperature")
output, state = dataset.wax.stream().apply(
    my_custom_function, format_dims=dataset.air.dims
)

_ = output.isel(lat=0, lon=0).drop_vars(["lat", "lon"]).to_pandas().plot(figsize=(12, 8))
```

<div align="center">
<img src="docs/_static/my_custom_function_on_dataset.png" alt="logo" width="60%"></img>
</div>

You can see our [Documentation](https://wax-ml.readthedocs.io/en/latest/) for examples with
EWMA or Binning on the air temperature dataset.


### ⚡ Performance on big dataframes ⚡

Check out our [Documentation](https://wax-ml.readthedocs.io/en/latest/) to
see how you can use our "3-step workflow" to speed things up!


### 🔥 Speed 🔥

WAX-ML algorithms are implemented in JAX, so they are fast!

Using JAX makes it possible to leverage hardware accelerators: programs are optimized for CPU, GPU, and TPU.

With WAX-ML, you can already compute an exponential moving average on a dataframe with 1 million rows with a 3x to 100x speedup
(depending on the data container you use and speed measurement methodology) compared to
pandas implementation.  (See our notebook in the
[Quick Start Documentation](https://wax-ml.readthedocs.io/en/latest/notebooks/04_The_three_steps_workflow.html)
or in
[Colaboratory](https://colab.research.google.com/github/eserie/wax-ml/blob/main/docs/notebooks/04_The_three_steps_workflow.ipynb)
).

## ♻ Feedback loops ♻

Feedback is a fundamental notion in time-series analysis and has a long history
(see the [Feedback Wikipedia page](https://en.wikipedia.org/wiki/Feedback) for instance).
So, we believe it is important to be able to implement feedback loops well in WAX-ML.


A fundamental piece in the implementation of feedback loops is the delay operator. We implement it
with the delay module `Lag` which is itself implemented with the `Buffer` module, a module
implementing the buffering mechanism.

The linear state-space models used to model linear time-invariant systems in signal theory are a
well-known place where feedback is used to implement, for instance, infinite impulse response
filters.  This is easy to do with the WAX-ML tools and will be implemented at
a later time.
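
To illustrate how a delay operator gives rise to an infinite impulse response, here is a plain-Python sketch of a first-order feedback filter built on a one-step delay. The helpers `delay_read`, `delay_write`, and `feedback_filter` are hypothetical and only mimic the roles of `Buffer` and `Lag`; they are not the WAX-ML modules.

```python
from typing import List, Tuple

Buffer = Tuple[float, ...]


def delay_read(buffer: Buffer) -> float:
    """Return the oldest stored value, i.e. the input delayed by len(buffer) steps."""
    return buffer[0]


def delay_write(buffer: Buffer, x: float) -> Buffer:
    """Push a new value; the buffer keeps a fixed length."""
    return buffer[1:] + (x,)


def feedback_filter(xs: List[float], a: float) -> List[float]:
    """First-order feedback filter y[t] = a * y[t-1] + x[t]."""
    buffer: Buffer = (0.0,)  # one-step delay, with y[-1] taken to be 0
    ys: List[float] = []
    for x in xs:
        y = a * delay_read(buffer) + x   # read the previous output
        buffer = delay_write(buffer, y)  # feed the new output back
        ys.append(y)
    return ys


# Impulse response: a single 1 followed by zeros decays geometrically.
print(feedback_filter([1.0, 0.0, 0.0, 0.0], a=0.5))  # [1.0, 0.5, 0.25, 0.125]
```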


Another example is control theory or reinforcement learning.
In a reinforcement learning setup, an agent and an environment interact through a feedback loop.
This generally results in non-trivial global dynamics.
In WAX-ML, we propose a simple module called
`GymFeedback` that allows the implementation of reinforcement learning experiments.
It is built from an agent and an environment, both possibly having parameters and state:

<div align="center">
<img src="docs/tikz/agent_env.png" alt="logo" width="60%"></img>
</div>

- The agent is in charge of generating an action from observations.
- The environment is in charge of calculating a reward associated with the agent's action and preparing
  the next observation from some "raw observations" and the agent's action, which it gives back to the
  agent.

A feedback instance `GymFeedback(agent, env)` is a function that processes the
"raw observations" and returns a reward as represented here:

<div align="center">
<img src="docs/tikz/gymfeedback.png" alt="logo" width="60%"></img>
</div>

Equivalently, we can describe the function `GymFeedback(agent, env)`,
after transformation by Haiku transformation, by a pair of pure functions
`init` and `apply` that we describe here:

<div align="center">
<img src="docs/tikz/gymfeedback_init_apply.png" alt="logo" width="100%"></img>
</div>
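
The loop described above can be sketched in plain Python as follows. The function `gym_feedback`, the toy `agent` and `env`, and their signatures are assumptions made for this illustration; they show the data flow only and are not the `GymFeedback` implementation.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures, for illustration only:
#   agent(agent_state, observation)         -> (agent_state, action)
#   env(env_state, action, raw_observation) -> (env_state, reward, observation)
Agent = Callable[[float, float], Tuple[float, float]]
Env = Callable[[float, float, float], Tuple[float, float, float]]


def gym_feedback(agent: Agent, env: Env, agent_state, env_state, raw_observations):
    """Run the agent/environment loop and collect one reward per raw observation."""
    rewards: List[float] = []
    observation = 0.0  # initial observation handed to the agent (assumption)
    for raw_obs in raw_observations:
        agent_state, action = agent(agent_state, observation)
        env_state, reward, observation = env(env_state, action, raw_obs)
        rewards.append(reward)
    return rewards, (agent_state, env_state)


def agent(state: float, observation: float) -> Tuple[float, float]:
    """Toy agent: predicts that the next raw observation equals the last one."""
    return state, observation


def env(state: float, action: float, raw_obs: float) -> Tuple[float, float, float]:
    """Toy environment: the reward is minus the squared prediction error."""
    reward = -((action - raw_obs) ** 2)
    return state, reward, raw_obs


rewards, _ = gym_feedback(agent, env, 0.0, 0.0, [1.0, 2.0, 4.0])
print(rewards)  # [-1.0, -1.0, -4.0]
```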

We have made concrete use of this feedback mechanism in this notebook where
we give an example of online linear regression in a non-stationary environment:
- 🦎: [online learning example ![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/eserie/wax-ml/blob/main/docs/notebooks/06_Online_Linear_Regression.ipynb),
  [Open in Documentation](https://wax-ml.readthedocs.io/en/latest/notebooks/06_Online_Linear_Regression.html) 🦎

Here is an illustrative plot of the final result of the study:

<div align="center">
<img src="docs/_static/online_linear_regression_regret.png" alt="logo" width="100%"></img>
</div>

- Left: The regret (cumulative sum of losses) first becomes concave, which means that the agent "learns something".
  Then, the regret curve has a bump at step 2000 where it becomes locally linear.
  It finally ends in a concave regime, which means that the agent has adapted to the new regime.
- Right: We see that the weights converge to the correct values in both regimes.
### Compatibility with other reinforcement learning frameworks

In addition, to ensure compatibility with other tools in the Gym ecosystem, we propose a
*transformation* mechanism to transform functions into standard stateful Python objects
following the Gym API for *agents* and *environments* implemented in
[deluca](https://github.com/google/deluca).  These wrappers are in the `wax.gym` package.

WAX-ML implements *callbacks* in the `wax.gym` package.  The callback API was inspired by
the one in [dask](https://github.com/dask/dask).

WAX-ML aims to provide tools for reinforcement learning that complement
existing ones such as [RLax](https://github.com/deepmind/rlax) or [deluca](https://github.com/google/deluca).

## Future plans

### Feedback loops and control theory

We would like to implement other types of feedback loops in WAX-ML.
For instance, those of the standard control theory toolboxes,
such as those implemented in the [SLICOT](http://slicot.org/) library.

Many algorithms in this space are absent from the Python ecosystem and
we aim to provide JAX-based implementations and expose them with a simple API.

An emblematic example in this field is the
[Kalman filter](https://fr.wikipedia.org/wiki/Filtre_de_Kalman),
a now-standard algorithm that dates back to the 1950s.
After 30 years of existence, the Python ecosystem has still not integrated
this algorithm into widely adopted libraries!
Some implementations can be found in Python libraries such as
[python-control](https://github.com/python-control/python-control),
[statsmodels](https://www.statsmodels.org/stable/index.html), and the
[SciPy Cookbook](https://scipy-cookbook.readthedocs.io/items/KalmanFiltering.html#).
Also, some machine learning libraries have closed but unresolved issues on this subject;
see [Scikit-learn#862 issue](https://github.com/scikit-learn/scikit-learn/pull/862)
or [River#355 issue](https://github.com/online-ml/river/pull/355).
Why has the Kalman filter not found its place in these libraries?
We think it may be because they have an object-oriented API, which makes
them very well suited to the specific problems of modern machine learning but,
on the other hand, prevents them from accommodating additional features such as Kalman filtering.
We think the functional approach of WAX-ML, inherited from JAX, could well
help to integrate a Kalman filter implementation in a machine learning ecosystem.
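
To illustrate why a functional style suits this algorithm, here is a minimal sketch of a scalar Kalman filter written as a pure step function with explicit state. It assumes a random-walk model with hand-picked noise variances; it is a textbook illustration, not a WAX-ML feature.

```python
from typing import Tuple


def kalman_step(
    state: Tuple[float, float], z: float, q: float = 1e-4, r: float = 0.1
) -> Tuple[float, float]:
    """One predict/update step of a scalar Kalman filter for a random-walk
    model: x[t] = x[t-1] + noise(q), z[t] = x[t] + noise(r).
    `state` is (estimate, variance); a new state is returned."""
    x, p = state
    p_pred = p + q                 # predict: the variance grows by q
    k = p_pred / (p_pred + r)      # Kalman gain
    x_new = x + k * (z - x)        # update with the innovation z - x
    p_new = (1.0 - k) * p_pred     # posterior variance
    return x_new, p_new


state = (0.0, 1.0)  # initial estimate and variance (arbitrary choices)
for z in [5.0] * 50:
    state = kalman_step(state, z)
estimate, variance = state
```

After 50 identical measurements, `estimate` is close to 5.0 and `variance` has shrunk well below its initial value.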

It turns out that Python code written with JAX is not very far from
[Fortran](https://fr.wikipedia.org/wiki/Fortran), a mathematical FORmula TRANslating
system.  It should therefore be quite easy and natural to reimplement standard
algorithms implemented in Fortran, such as those in the
[SLICOT](http://slicot.org/) library with JAX.
It seems that some questions about the integration of Fortran into
JAX have already been raised.
As noted in
[this discussion on JAX's Github page](https://github.com/google/jax/discussions/3950),
it might even be possible to simply wrap Fortran code in JAX.
This would avoid a painful rewriting process!


Along with the implementation of good old algorithms,
we would like to implement more recent ones from the online learning
literature that somehow revisits the filtering and control problems.
In particular, we would like to implement the online learning version of the
ARMA model developed in [[3]](#references)
and some online-learning versions of control theory algorithms,
an approach called "the non-stochastic control problem",
such as the linear quadratic regulator (see [[4]](#references)).

### Optimization


The JAX ecosystem already has a library dedicated to optimization:
[Optax](https://github.com/deepmind/optax), which we use in WAX-ML.
We could complement it by offering
other first-order algorithms such as the Alternating Direction Method of Multipliers
[(ADMM)](https://stanford.edu/~boyd/admm.html).
One can find "functional" implementations of proximal algorithms in libraries such
as
[proxmin](https://github.com/pmelchior/proxmin),
[ProximalOperators](https://kul-forbes.github.io/ProximalOperators.jl/latest/),
or [COSMO](https://github.com/oxfordcontrol/COSMO.jl),
which could provide good reference implementations to start the work.


Another type of work took place around automatic differentiation and optimization.
In [[5]](#references) the authors implement differentiable layers based on
convex optimization in the library
[cvxpylayers](https://github.com/cvxgrp/cvxpylayers).
They have implemented a JAX API but, at the moment, it cannot use
JAX's `jit` compilation
(see [this issue](https://github.com/cvxgrp/cvxpylayers/issues/103)).
We would be interested in helping to solve this issue.

Furthermore, in the recent paper [[9]](#references), the authors propose a new
efficient and modular implicit differentiation technique with a JAX-based implementation that should
lead to a new open-source optimization library in the JAX ecosystem.

### Other algorithms

The machine learning libraries [Scikit-learn](https://scikit-learn.org/stable/),
[River](https://github.com/online-ml/river), and
[numpy-ml](https://github.com/ddbourgin/numpy-ml) implement many "traditional" machine
learning algorithms that should provide an excellent basis for linking or reimplementing
in JAX.  WAX-ML could help to build a repository for JAX versions of these algorithms.

### Other APIs



As it did for the Gym API, WAX-ML could add support for other high-level object-oriented APIs like
Keras, Scikit-learn, River ...


### Collaborations

The WAX-ML team is open to discussion and collaboration with contributors from any field who are
interested in using WAX-ML for their problems on streaming data.  We are looking for use cases
around data streaming in audio processing, natural language processing, astrophysics, biology,
finance, engineering ...

We believe that good software design, especially in the scientific domain, requires practical use
cases and that the more diversified these use cases are, the more the developed functionalities
will be guaranteed to be well implemented.

By making this software public, we hope to find enthusiasts who aim to develop WAX-ML further!


## ⚒ Installation ⚒

You can install WAX-ML with the command:

```bash
pip install wax-ml
```

To install the latest version from source, you can use the command:

```bash
pip install "wax-ml[dev,complete] @ git+https://github.com/eserie/wax-ml.git"
```

## Disclaimer

WAX-ML is in its early stages of development and its features and API are very likely to
evolve.


## Development

You can contribute to WAX-ML by asking questions, proposing practical use cases, or by contributing to the code or the documentation.  You can have a look at our [Contributing
Guidelines](https://github.com/eserie/wax-ml/CONTRIBUTING.md) and [Developer
Documentation](https://wax-ml.readthedocs.io/en/latest/developer.html).

We maintain a list of "WAX-ML Enhancement Proposals" in the
[WEP.md](https://github.com/eserie/wax-ml/WEP.md) file.


## References

[1] [Google Princeton AI and Hazan Lab @ Princeton University](https://www.minregret.com/research/)

[2] [FNet: Mixing Tokens with Fourier Transforms, James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon](https://arxiv.org/abs/2105.03824)

[3] [Online Learning for Time Series Prediction, Oren Anava, Elad Hazan, Shie Mannor, Ohad Shamir](http://proceedings.mlr.press/v30/Anava13.html)

[4] [The Nonstochastic Control Problem, Elad Hazan, Sham M. Kakade, Karan Singh](https://arxiv.org/abs/1911.12178)

[5] [Differentiable Convex Optimization Layers, Akshay Agrawal, Brandon Amos, Shane Barratt, Stephen Boyd, Steven Diamond, Zico Kolter](https://arxiv.org/abs/1910.12430)

[6] [Machine learning accelerated computational fluid dynamics, Dmitrii Kochkov, Jamie A. Smith, Ayya Alieva, Qing Wang, Michael P. Brenner, Stephan Hoyer](https://arxiv.org/abs/2102.01010)

[7] [Large dimension forecasting models and random singular value spectra, Jean-Philippe Bouchaud, Laurent Laloux, M. Augusta Miceli, Marc Potters](https://arxiv.org/abs/physics/0512090)

[8] [A novel dynamic PCA algorithm for dynamic data modeling and process monitoring, Yining Dong, S. Joe Qin](https://www.sciencedirect.com/science/article/pii/S095915241730094X)

[9] [Efficient and Modular Implicit Differentiation, Mathieu Blondel, Quentin Berthet, Marco Cuturi, Roy Frostig, Stephan Hoyer, Felipe Llinares-López, Fabian Pedregosa, Jean-Philippe Vert](https://arxiv.org/abs/2105.15183)

## License

```
Copyright 2021 The WAX-ML Authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```

WAX-ML bundles portions of astropy, dask, deluca, haiku, jax, xarray.

astropy and dask are available under a "3-clause BSD" license:
- dask: `wax/gym/callbacks/callbacks.py`
- astropy: `CONTRIBUTING.md`

deluca, haiku, jax and xarray are available under an "Apache" license:
- deluca: `wax/gym/entity.py`
- haiku: `docs/notebooks/05_reconstructing_the_light_curve_of_stars.*`
- jax: `docs/conf.py`, `docs/developer.md`
- xarray: `wax/datasets/generate_temperature_data.py`

The full text of these licenses is included in the `licenses` directory.


## Citing WAX-ML

If you use WAX-ML, please cite our [paper](http://arxiv.org/abs/2106.06524) using the BibTeX entry:

```
@misc{sérié2021waxml,
      title={{WAX-ML}: {A} {P}ython library for machine learning and feedback loops on streaming data},
      author={Emmanuel Sérié},
      year={2021},
      eprint={2106.06524},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url = {http://arxiv.org/abs/2106.06524},
}
```


## Reference documentation

For details about the WAX-ML API, see the
[reference documentation](https://wax-ml.readthedocs.io/en/latest/).

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/eserie/wax-ml",
    "name": "wax-ml",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "time series,machine learning,optimization,optimal control,online learning,reinforcement learning",
    "author": "WAX-ML Authors",
    "author_email": "eserie@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/6b/5f/8f8eee7d231129480e0675b47041d70299bbde3c9b6d4d7ec0826695ee24/wax-ml-0.6.4.tar.gz",
    "platform": null,
    "description": "<div align=\"center\">\n<img src=\"https://github.com/eserie/wax-ml/blob/main/docs/_static/wax_logo.png\" alt=\"logo\" width=\"40%\"></img>\n</div>\n\n# WAX-ML: A Python library for machine-learning and feedback loops on streaming data\n\n![Continuous integration](https://github.com/eserie/wax-ml/actions/workflows/tests.yml/badge.svg)\n[![Documentation Status](https://readthedocs.org/projects/wax-ml/badge/?version=latest)](https://wax-ml.readthedocs.io/en/latest/)\n[![PyPI version](https://badge.fury.io/py/wax-ml.svg)](https://badge.fury.io/py/wax-ml)\n[![Codecov](https://codecov.io/gh/eserie/wax-ml/branch/main/graph/badge.svg)](https://codecov.io/gh/eserie/wax-ml)\n[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)\n\n\n\n[**Quickstart**](#quickstart-colab-in-the-cloud)\n| [**Install guide**](#installation)\n| [**Change logs**](https://wax-ml.readthedocs.io/en/latest/changelog.html)\n| [**Reference docs**](https://wax-ml.readthedocs.io/en/latest/)\n\n## Introduction\n\n\ud83c\udf0a Wax is what you put on a surfboard to avoid slipping. It is an essential tool to go\nsurfing ... 
\ud83c\udf0a\n\nWAX-ML is a research-oriented [Python](https://www.python.org/)  library\nproviding tools to design powerful machine learning algorithms and feedback loops\nworking on streaming data.\n\nIt strives to complement [JAX](https://jax.readthedocs.io/en/latest/)\nwith tools dedicated to time series.\n\nWAX-ML makes JAX-based programs easy to use for end-users working\nwith\n[pandas](https://pandas.pydata.org/) and [xarray](http://xarray.pydata.org/en/stable/)\nfor data manipulation.\n\nWAX-ML provides a simple mechanism for implementing feedback loops, allows the implementation of\n reinforcement learning algorithms with functions, and makes them easy to integrate by\nend-users working with the object-oriented reinforcement learning framework from the\n[Gym](https://gym.openai.com/) library.\n\nTo learn more, you can read our [article on ArXiv](http://arxiv.org/abs/2106.06524)\nor simply access the code in this repository.\n\n## WAX-ML Goal\n\nWAX-ML's goal is to expose \"traditional\" algorithms that are often difficult to find in standard\nPython ecosystem and are related to time-series and more generally to streaming data.\n\nIt aims to make it easy to work with algorithms from very various computational domains such as\nmachine learning, online learning, reinforcement learning, optimal control, time-series analysis,\noptimization, statistical modeling.\n\nFor now, WAX-ML focuses on **time-series** algorithms as this is one of the areas of machine learning\nthat lacks the most dedicated tools.  
Working with time series is notoriously known to be difficult\nand often requires very specific algorithms (statistical modeling, filtering, optimal control).\n\n\nEven though some of the modern machine learning methods such as RNN, LSTM, or reinforcement learning\ncan do an excellent job on some specific time-series problems, most of the problems require using\nmore traditional algorithms such as linear and non-linear filters, FFT,\nthe eigendecomposition of matrices (e.g. [[7]](#references)),\nprincipal component analysis (PCA) (e.g. [[8]](#references)), Riccati solvers for\noptimal control and filtering, ...\n\n\nBy adopting a functional approach, inherited from JAX, WAX-ML aims to be an efficient tool to\ncombine modern machine learning approaches with more traditional ones.\n\n\nSome work has been done in this direction in [[2] in References](#references) where transformer encoder\narchitectures are massively accelerated, with limited accuracy costs, by replacing the\nself-attention sublayers with a standard, non-parameterized Fast Fourier Transform (FFT).\n\n\nWAX-ML may also be useful for developing research ideas in areas such as online machine learning\n(see [[1] in References](#references)) and development of control, reinforcement learning,\nand online optimization methods.\n\n## What does WAX-ML do?\n\nWell, building WAX-ML, we have some pretty ambitious design and implementation goals.\n\nTo do things right, we decided to start small and in an open-source design from the beginning.\n\n\nFor now, WAX-ML contains:\n- transformation tools that we call \"unroll\" transformations allowing us to\n  apply any transformation, possibly stateful, on sequential data.  
It generalizes the RNN\n  architecture to any stateful transformation allowing the implementation of any kind of \"filter\".\n\n- a \"stream\" module, described in [\ud83c\udf0a Streaming Data \ud83c\udf0a](#-streaming-data-), permitting us to\n  synchronize data streams with different time resolutions.\n\n- some general pandas and xarray \"accessors\" permitting the application of any\n  JAX-functions on pandas and xarray data containers:\n  `DataFrame`, `Series`, `Dataset`, and `DataArray`.\n\n- ready-to-use exponential moving average filter that we exposed with two APIs:\n    - one for JAX users: as Haiku modules (`EWMA`, ... see the complete list in our\n    [API documentation](https://wax-ml.readthedocs.io/en/latest/wax.modules.html)\n    ).\n    - a second one for pandas and xarray users: with drop-in replacement of pandas\n      `ewm` accessor.\n\n- a simple module `OnlineSupervisedLearner` to implement online learning algorithms\n  for supervised machine learning problems.\n\n- building blocks for designing feedback loops in reinforcement learning, and have\n  provided a module called `GymFeedback` allowing the implementation of feedback loop as the\n  introduced in the library [Gym](https://gym.openai.com/), and illustrated this figure:\n\n  <div align=\"center\">\n  <img src=\"docs/tikz/gymfeedback.png\" alt=\"logo\" width=\"60%\"></img>\n  </div>\n\n### What is JAX?\n\nJAX is a research-oriented computational system implemented in Python that leverages the\nXLA optimization framework for machine learning computations.  It makes XLA usable with\nthe NumPy API and some functional primitives for just-in-time compilation,\ndifferentiation, vectorization, and parallelization.  
It allows building higher-level\ntransformations or \"programs\" in a functional programming approach.\nSee  [JAX's page](https://github.com/google/jax) for more details.\n\n\n## Why to use WAX-ML?\n\nIf you deal with time-series and are a pandas or xarray user, b\nut you want to use the impressive\ntools of the JAX ecosystem, then WAX-ML might be the right tool for you,\nas it implements pandas and\nxarray accessors to apply JAX functions.\n\nIf you are a user of JAX, you may be interested in adding WAX-ML to your toolbox to address\ntime-series problems.\n\n## Design\n\n### Research oriented\nWAX-ML is a research-oriented library.  It relies on\n[JAX](https://jax.readthedocs.io/en/latest/notebooks/quickstart.html) and\n[Haiku](https://github.com/deepmind/dm-haiku) functional programming paradigm to ease the\ndevelopment of research ideas.\n\nWAX-ML is a bit like [Flux](https://fluxml.ai/Flux.jl/stable/)\nin [Julia](https://julialang.org/) programming language.\n\n### Functional programming\nIn WAX-ML, we pursue a functional programming approach inherited from JAX.\n\nIn this sense, WAX-ML is not a framework, as most object-oriented libraries offer.  
Instead, we\nimplement \"functions\" that must be pure to exploit the JAX ecosystem.\n\n### Haiku modules\nWe use the \"module\" mechanism proposed by the Haiku library to easily generate pure function pairs,\ncalled `init` and `apply` in Haiku, to implement programs that require the management of\nparameters and/or state variables.\nYou can see\n[the Haiku module API](https://dm-haiku.readthedocs.io/en/latest/api.html#modules-parameters-and-state)\nand\n[Haiku transformation functions](https://dm-haiku.readthedocs.io/en/latest/api.html#haiku-transforms)\nfor more details.\n\nIn this way, we can recover all the advantages of\nobject-oriented programming but exposed in the functional programming approach.\nIt permits to ease the development of robust and reusable features and to\ndevelop \"mini-languages\" tailored to specific scientific domains.\n\n\n### WAX-ML works with other libraries\n\nWe want existing machine learning libraries to work well together while trying to leverage their strength.\nThis is facilitated with a functional programming approach.\n\nWAX-ML is not a framework but either a set of tools that aim to complement\n[JAX Ecosystem](https://moocaholic.medium.com/jax-a13e83f49897).\n\n\n# Contents\n* [\ud83d\ude80 Quickstart: Colab in the Cloud \ud83d\ude80](#-quicksart-colab-in-the-cloud-)\n* [\u23f1 Synchronize streams \u23f1](#-synchronize-streams-)\n* [\ud83c\udf0a Streaming Data \ud83c\udf0a](#-streaming-data-)\n* [Implemented modules](#-implemented-modules-)\n* [\u267b Feedback loops \u267b](#-feedback-loops-)\n* [Future plans](#future-plans)\n* [\u2692 Installation \u2692](#installation)\n* [Disclaimer](#disclaimer)\n* [Development](#development)\n* [References](#references)\n* [License](#license)\n* [Citing WAX-ML](#citing-wax)\n* [Reference documentation](#reference-documentation)\n\n\n## \ud83d\ude80 Quickstart \ud83d\ude80\n\nJump right in using a notebook in your browser, connected to a Google Cloud GPU or\nsimply read our notebook in 
the\n[documentation](https://wax-ml.readthedocs.io/en/latest/).\n\nHere are some starter notebooks:\n- \u3030 Compute exponential moving averages with xarray and pandas accessors \u3030 : [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/eserie/wax-ml/blob/main/docs/notebooks/01_demo_EWMA.ipynb),\n  [Open in Documentation](https://wax-ml.readthedocs.io/en/latest/notebooks/01_demo_EWMA.html)\n- \u23f1 Synchronize data streams \u23f1 : [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/eserie/wax-ml/blob/main/docs/notebooks/02_Synchronize_data_streams.ipynb),\n  [Open in Documentation](https://wax-ml.readthedocs.io/en/latest/notebooks/02_Synchronize_data_streams.html)\n- \ud83c\udf21 Binning temperatures \ud83c\udf21 : [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/eserie/wax-ml/blob/main/docs/notebooks/03_ohlc_temperature.ipynb),\n  [Open in Documentation](https://wax-ml.readthedocs.io/en/latest/notebooks/03_ohlc_temperature.html)\n- \ud83c\udf9b The three steps workflow \ud83c\udf9b : [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/eserie/wax-ml/blob/main/docs/notebooks/04_The_three_steps_workflow.ipynb),\n  [Open in Documentation](https://wax-ml.readthedocs.io/en/latest/notebooks/04_The_three_steps_workflow.html)\n- \ud83d\udd2d Reconstructing the light curve of stars with LSTM \ud83d\udd2d: [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/eserie/wax-ml/blob/main/docs/notebooks/05_reconstructing_the_light_curve_of_stars.ipynb),\n  [Open in Documentation](https://wax-ml.readthedocs.io/en/latest/notebooks/05_reconstructing_the_light_curve_of_stars.html)\n- \ud83e\udd8e Online linear regression with a non-stationary environment \ud83e\udd8e: [![Open in 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/eserie/wax-ml/blob/main/docs/notebooks/06_Online_Linear_Regression.ipynb),\n  [Open in Documentation](https://wax-ml.readthedocs.io/en/latest/notebooks/06_Online_Linear_Regression.html)\n\n\n## \u23f1 Synchronize streams \u23f1\n\nPhysicists have brought a solution to the synchronization problem called the Poincar\u00e9\u2013Einstein\nsynchronization (See [Poincar\u00e9-Einstein synchronization Wikipedia\npage](https://en.wikipedia.org/wiki/Einstein_synchronisation)).  In WAX-ML we have implemented a similar\nmechanism by defining a \"local time\", borrowing Henri Poincar\u00e9 terminology, to denominate the\ntimestamps of the stream (the \"local stream\") in which the user wants to apply transformations and\nunravel all other streams.  The other streams, which we have called \"secondary streams\", are pushed\nback in the local stream using embedding maps which specify how to convert timestamps from a\nsecondary stream into timestamps in the local stream.\n\nThis synchronization mechanism permits to work with secondary streams having timestamps at\nfrequencies that can be lower or higher than the local stream. The data from these secondary streams\nare represented in the \"local stream\" either with the use of a forward filling mechanism for lower\nfrequencies or with a buffering mechanism for higher frequencies.\n\nNote that this simple synchronization scheme assumes that the different event streams have fixed\nlatencies.\n\nWe have implemented a \"data tracing\" mechanism to optimize access to out-of-sync streams.  This\nmechanism works on in-memory data.  We perform the first pass on the data, without actually\naccessing it, and determine the indices necessary to later access the data. 
Doing so we are vigilant\nto not let any \"future\" information pass through and thus guaranty a data processing that respects\ncausality.\n\nThe buffering mechanism used in the case of higher frequencies works with a fixed buffer size\n(see the WAX-ML module [`wax.modules.Buffer`](https://wax-ml.readthedocs.io/en/latest/_autosummary/wax.modules.buffer.html#module-wax.modules.buffer)\nto allow the use of JAX / XLA optimizations and efficient processing.\n\n### Example\n\nLet's illustrate with a small example how `wax.stream.Stream` synchronizes data streams.\n\nLet's use the dataset \"air temperature\" with :\n- An air temperature is defined with hourly resolution.\n- A \"fake\" ground temperature is defined with a daily resolution as the air temperature minus 10 degrees.\n\n\n```python\n\nfrom wax.accessors import register_wax_accessors\n\nregister_wax_accessors()\n```\n\n```python\n\nfrom wax.modules import EWMA\n\n\ndef my_custom_function(dataset):\n    return {\n        \"air_10\": EWMA(1.0 / 10.0)(dataset[\"air\"]),\n        \"air_100\": EWMA(1.0 / 100.0)(dataset[\"air\"]),\n        \"ground_100\": EWMA(1.0 / 100.0)(dataset[\"ground\"]),\n    }\n```\n\n```python\nresults, state = dataset.wax.stream(\n    local_time=\"time\", ffills={\"day\": 1}, pbar=True\n).apply(my_custom_function, format_dims=dataset.air.dims)\n```\n\n```python\n_ = results.isel(lat=0, lon=0).drop([\"lat\", \"lon\"]).to_pandas().plot(figsize=(12, 8))\n```\n\n<div align=\"center\">\n<img src=\"docs/_static/synchronize_data_streams.png\" alt=\"logo\" width=\"60%\"></img>\n</div>\n\n## \ud83c\udf0a Streaming Data \ud83c\udf0a\n\nWAX-ML may complement JAX ecosystem by adding support for **streaming data**.\n\nTo do this, WAX-ML implements a unique **data tracing** mechanism that prepares for fast\naccess to in-memory data and allows the execution of JAX tractable functions such as\n`jit`, `grad`, `vmap`, or `pmap`.\n\nThis mechanism is somewhat special in that it works with time-series 
data.\n\nThe `wax.stream.Stream` object implements this idea.  It uses Python generators to\n**synchronize multiple streaming data streams** with potentially different temporal\nresolutions.\n\nThe `wax.stream.Stream` object works on in-memory data stored in\n[`xarray.Dataset`](http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html).\n\nTo work with \"real\" streaming data, it should be possible to implement a buffer\nmechanism running on any Python generator and to use the synchronization and data\ntracing mechanisms implemented in WAX-ML to apply JAX transformations on batches of data\nstored in memory. (See our WEP4 enhancement proposal)\n\n### \u231b Adding support for time dtypes in JAX \u231b\n\nAt the moment `datetime64` and `string_` dtypes are not supported in JAX.\n\nWAX-ML add support for `datetime64` and `string_` NumPy dtypes in JAX.\nTo do so, WAX-ML implements:\n- an encoding scheme for `datetime64` relying on pairs of 32-bit integers similar to `PRNGKey` in JAX.\n- an encoding scheme for `string_` relying on `LabelEncoder` of [Scikit-learn](https://scikit-learn.org/stable/).\n\nBy providing these two encoding schemes, WAX-ML makes it easy to use JAX algorithms on data of these types.\n\nCurrently, the types of time offsets supported by WAX-ML are quite limited and we\ncollaborate with the pandas, xarray, and [Astropy](https://www.astropy.org/) teams\nto further develop the time manipulation tools in WAX-ML (see \"WEP1\" in `WEP.md`).\n\n### pandas and xarray accessors\n\nWAX-ML implements pandas and xarray accessors to ease the usage of machine-learning\nalgorithms implemented with functions implemented with Haiku modules on high-level data APIs :\n- pandas's `DataFrame` and `Series`\n- xarray's `Dataset` and `DataArray`.\n\nTo load the accessors, run:\n```python\nfrom wax.accessors import register_wax_accessors\nregister_wax_accessors()\n```\n\nThen run the \"one-liner\" 
syntax:\n```python\n<data-container>.stream(\u2026).apply(\u2026)\n```\n\n## Implemented modules\n\nWe have some modules (inherited from Haiku modules) ready to be used in `wax.modules`\n(see our [api documentation](https://wax-ml.readthedocs.io/en/latest/wax.modules.html)).\n\nThey can be considered as \"building blocks\" that can be reused to build more advanced programs to run on streaming data.\n\n### Fundamental modules\n\nWe have some \"fundamental\" modules that are specific to time series management,\n- the `Buffer` module which implements the buffering mechanism\n- the `UpdateOnEvent` module which allows to \"freeze\" the computation of a program and\n  to update it on some events in the \"local flow\".\n  To illustrate the use of this module we show how it can be used to compute the opening,\n  high and closing quantities of temperatures recorded during a day,\n  the binning process being reset at each day change.  We show an illustrative graph of the final result:\n\n\n<div align=\"center\">\n<img src=\"docs/_static/trailing_ohlc.png\" alt=\"logo\" width=\"60%\"></img>\n</div>\n\n### pandas modules\n\nWe have a few more specific modules that aim to reproduce some of the logic that **pandas** users may be familiar with,\nsuch as:\n- `Lag` to implement a delay on the input data\n- `Diff` to compute differences between values over time\n- `PctChange` to compute the relative difference between values over time.\n- `RollingMean` to compute the moving average over time.\n- `EWMA`, `EWMVar`, `EWMCov`, to compute the exponential moving average, variance, covariance of the input data.\n\n### Online learning and reinforcement learning modules\n\nFinally, we implement domain-specific modules for online learning and reinforcement\nlearning such as `OnlineSupervisedLearner` and `GymFeedback` (see dedicated sections).\n\n### accessors\n\nFor now, WAX-ML offers direct access to some modules through specific accessors for pandas and xarray users.\nFor instance, we have 
an implementation of the \"exponential moving average\" directly\naccessible through the accessor `<data-container>.ewm(...).mean()` which provides a\ndrop-in replacement for the exponential moving average of pandas.\n\nFor now, WAX-ML offers direct access to some modules through specific accessors for pandas and xarray users.\n\nFor instance, you can see our implementation of the \"exponential moving average\".  This\nis a drop-in replacement for the exponential moving average of pandas.\n\nLet's show how it works on the \"air temperature\" dataset from `xarray.tutorials`:\n\n```python\nimport xarray as xr\nda = xr.tutorial.open_dataset(\"air_temperature\")\ndataframe = da.air.to_series().unstack([\"lon\", \"lat\"])\n```\n\nPandas ewma:\n```python\nair_temp_ewma = dataframe.ewm(alpha=1.0 / 10.0).mean()\n```\n\nWAX-ML ewma:\n```python\nair_temp_ewma = dataframe.wax.ewm(alpha=1.0 / 10.0).mean()\n```\n\n\n### Apply a custom function to a Dataset\n\nNow let's illustrate how WAX-ML accessors work on [xarray datasets](http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html).\n\n```python\nfrom wax.modules import EWMA\n\n\ndef my_custom_function(dataset):\n    return {\n        \"air_10\": EWMA(1.0 / 10.0)(dataset[\"air\"]),\n        \"air_100\": EWMA(1.0 / 100.0)(dataset[\"air\"]),\n    }\n\n\ndataset = xr.tutorial.open_dataset(\"air_temperature\")\noutput, state = dataset.wax.stream().apply(\n    my_custom_function, format_dims=dataset.air.dims\n)\n\n_ = output.isel(lat=0, lon=0).drop([\"lat\", \"lon\"]).to_pandas().plot(figsize=(12, 8))\n```\n\n<div align=\"center\">\n<img src=\"docs/_static/my_custom_function_on_dataset.png\" alt=\"logo\" width=\"60%\"></img>\n</div>\n\nYou can see our [Documentation](https://wax-ml.readthedocs.io/en/latest/) for examples with\nEWMA or Binning on the air temperature dataset.\n\n\n### \u26a1 Performance on big dataframes \u26a1\n\nCheck out our [Documentation](https://wax-ml.readthedocs.io/en/latest/) to\nsee how you can use 
our \"3-step workflow\" to speed things up!\n\n\n### \ud83d\udd25 Speed \ud83d\udd25\n\nWAX-ML algorithms are implemented in JAX, so they are fast!\n\nThe use of JAX allows for leveraging hardware accelerators that optimize programs for the CPU, GPU, and TPU.\n\nWith WAX-ML, you can already compute an exponential moving average on a dataframe with 1 million rows with a 3x to 100x speedup\n(depending on the data container you use and speed measurement methodology) compared to\npandas implementation.  (See our notebook in the\n[Quick Start Documentation](https://wax-ml.readthedocs.io/en/latest/notebooks/04_The_three_steps_workflow.html)\nor in\n[Colaboratory](https://colab.research.google.com/github/eserie/wax-ml/blob/main/docs/notebooks/04_The_three_steps_workflow.ipynb)\n).\n\n## \u267b Feedback loops \u267b\n\nFeedback is a fundamental notion in time-series analysis and has a wide history\n(see [Feedback Wikipedia page](https://en.wikipedia.org/wiki/Feedback)  for instance).\nSo, we believe it is important to be able to implement them well in WAX-ML.\n\n\nA fundamental piece in the implementation of feedback loops is the delay operator. We implement it\nwith the delay module `Lag` which is itself implemented with the `Buffer` module, a module\nimplementing the buffering mechanism.\n\nThe linear state-space models used to model linear time-invariant systems in signal theory are a\nwell-known place where feedbacks are used to implement for instance infinite impulse response\nfilters.  
This is easily implemented with the WAX-ML tools and will be implemented at\na later time.\n\n\nAnother example is control theory or reinforcement learning.\nIn reinforcement learning setup, an agent and an environment interact with a feedback loop.\nThis generally results in a non-trivial global dynamic.\nIn WAX-ML, we propose a simple module called\n`GymFeedBack` that allows the implementation of reinforcement learning experiments.\nThis is built from an agent and an environment, both possibly having parameters and state:\n\n<div align=\"center\">\n<img src=\"docs/tikz/agent_env.png\" alt=\"logo\" width=\"60%\"></img>\n</div>\n\n- The agent is in charge of generating an action from observations.\n- The environment is in charge of calculating a reward associated with the agent's action and preparing\n  the next observation from some \"raw observations\" and the agent's action, which it gives back to the\n  agent.\n\nA feedback instance `GymFeedback(agent, env)` is a function that processes the\n\"raw observations\" and returns a reward as represented here:\n\n<div align=\"center\">\n<img src=\"docs/tikz/gymfeedback.png\" alt=\"logo\" width=\"60%\"></img>\n</div>\n\nEquivalently, we can describe the function `GymFeedback(agent, env)`,\nafter transformation by Haiku transformation, by a pair of pure functions\n`init` and `apply` that we describe here:\n\n<div align=\"center\">\n<img src=\"docs/tikz/gymfeedback_init_apply.png\" alt=\"logo\" width=\"100%\"></img>\n</div>\n\nWe have made concrete use of this feedback mechanism in this notebook where\nwe give an example of online linear regression in a non-stationary environment:\n- \ud83e\udd8e: [online learning example ![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/eserie/wax-ml/blob/main/docs/notebooks/06_Online_Linear_Regression.ipynb),\n  [Open in Documentation](https://wax-ml.readthedocs.io/en/latest/notebooks/06_Online_Linear_Regression.html) 
\ud83e\udd8e\n\nHere is an illustrative plot of the final result of the study:\n\n<div align=\"center\">\n<img src=\"docs/_static/online_linear_regression_regret.png\" alt=\"logo\" width=\"100%\"></img>\n</div>\n\n- Left: The regret (cumulative sum of losses) first becomes concave, which means that the agent \"learns something\".\n  Then, the regret curve has a bump at step 2000 where it becomes locally linear.\n  It finally ends in a concave regime concave regime, which means that the agent has adapted to the new regime.\n- Right: We see that the weights converge to the correct values in both regimes\n\n### Compatibility with other reinforcement learning frameworks\n\nIn addition, to ensure compatibility with other tools in the Gym ecosystem, we propose a\n*transformation* mechanism to transform functions into standard stateful Python objects\nfollowing the Gym API for *agents* and *environments* implemented in\n[deluca](https://github.com/google/deluca).  These wrappers are in the `wax.gym` package.\n\nWAX-ML implements *callbacks* in the `wax.gym` package.  
The callback API was inspired by\nthe one in the one in [dask](https://github.com/dask/dask).\n\nWAX-ML should provide tools for reinforcement learning that should complement well those\nalready existing such as [RLax](https://github.com/deepmind/rlax) or [deluca](https://github.com/google/deluca).\n\n## Future plans\n\n### Feedback loops and control theory\n\nWe would like to implement other types of feedback loops in WAX-ML.\nFor instance, those of the standard control theory toolboxes,\nsuch as those implemented in the SLICOT [SLICOT](http://slicot.org/) library.\n\nMany algorithms in this space are absent from the Python ecosystem and\nwe aim to provide JAX-based implementations and expose them with a simple API.\n\nAn idiomatic example in this field is the\n[Kalman filter](https://fr.wikipedia.org/wiki/Filtre_de_Kalman),\na now-standard algorithm that dates back to the 1950s.\nAfter 30 years of existence, the Python ecosystem has still not integrated\nthis algorithm into widely adopted libraries!\nSome implementations can be found in Python libraries such as\n[python-control](https://github.com/python-control/python-control),\n[stats-models](https://www.statsmodels.org/stable/index.html),\n[SciPy Cookbook](https://scipy-cookbook.readthedocs.io/items/KalmanFiltering.html#).\nAlso, some machine learning libraries have some closed and non-solved issues on this subject\n, see [Scikit-learn#862 issue](https://github.com/scikit-learn/scikit-learn/pull/862)\nor [River#355 issue](https://github.com/online-ml/river/pull/355).\nWhy has the Kalman filter not found its place in these libraries?\nWe think it may be because they have an object-oriented API, which makes\nthem very well suited to the specific problems of modern machine learning but,\non the other hand, prevents them from accommodating additional features such as Kalman filtering.\nWe think the functional approach of WAX-ML, inherited from JAX, could well\nhelp to integrate a Kalman filter implementation in a 
machine learning ecosystem.\n\nIt turns out that Python code written with JAX is not very far from\n[Fortran](https://fr.wikipedia.org/wiki/Fortran), a mathematical FORmula TRANslating\nsystem.  It should therefore be quite easy and natural to reimplement standard\nalgorithms implemented in Fortran, such as those in the\n[SLICOT](http://slicot.org/) library with JAX.\nIt seems that some questions about the integration of Fortran into\nJAX has already been raised.\nAs noted in\n[this discussion on JAX's Github page](https://github.com/google/jax/discussions/3950),\nit might even be possible to simply wrap Fortran code in JAX.\nThis would avoid a painful rewriting process!\n\n\nAlong with the implementation of good old algorithms,\nwe would like to implement more recent ones from the online learning\nliterature that somehow revisits the filtering and control problems.\nIn particular, we would like to implement the online learning version of the\nARMA model developed in [[3]](#references)\nand some online-learning versions of control theory algorithms,\nan approach called \"the non-stochastic control problem\",\nsuch as the linear quadratic regulator (see [[4]](#references)).\n\n### Optimization\n\n\nThe JAX ecosystem already has a library dedicated to optimization:\n[Optax](https://github.com/deepmind/optax), which we use in WAX-ML.\nWe could complete it by offering\nother first-order algorithms such as the Alternating Direction Multiplier Method\n[(ADMM)](https://stanford.edu/~boyd/admm.html).\nOne can find \"functional\" implementations of proximal algorithms in libraries such\nas\n[proxmin](https://github.com/pmelchior/proxmin)),\n[ProximalOperators](https://kul-forbes.github.io/ProximalOperators.jl/latest/),\nor [COSMO](https://github.com/oxfordcontrol/COSMO.jl),\nwhich could give good reference implementations to start the work.\n\n\nAnother type of work took place around automatic differentiation and optimization.\nIn [[5]](#references) the authors implement 
differentiable layers based on\nconvex optimization in the library\n[cvxpylayers](https://github.com/cvxgrp/cvxpylayers).\nThey have implemented a JAX API, but it does not yet support the\n`jit` compilation of JAX\n(see [this issue](https://github.com/cvxgrp/cvxpylayers/issues/103)).\nWe would be interested in helping to solve this issue.\n\nFurthermore, in the recent paper [[9]](#references), the authors propose a new\nefficient and modular implicit differentiation technique with a JAX-based implementation that should\nlead to a new open-source optimization library in the JAX ecosystem.\n\n### Other algorithms\n\nThe machine learning libraries [Scikit-learn](https://scikit-learn.org/stable/),\n[River](https://github.com/online-ml/river), and\n[numpy-ml](https://github.com/ddbourgin/numpy-ml) implement many \"traditional\" machine\nlearning algorithms that should provide an excellent basis for linking or reimplementing\nin JAX.  WAX-ML could help to build a repository for JAX versions of these algorithms.\n\n### Other APIs\n\n\n\nAs it did for the Gym API, WAX-ML could add support for other high-level object-oriented APIs like\nKeras, Scikit-learn, River ...\n\n\n### Collaborations\n\nThe WAX-ML team is open to discussion and collaboration with contributors from any field who are\ninterested in using WAX-ML for their problems on streaming data.  
We are looking for use cases\naround data streaming in audio processing, natural language processing, astrophysics, biology,\nfinance, engineering ...\n\nWe believe that good software design, especially in the scientific domain, requires practical use\ncases and that the more diverse these use cases are, the better implemented the resulting\nfeatures will be.\n\nBy making this software public, we hope to find enthusiasts who aim to develop WAX-ML further!\n\n\n## \u2692 Installation \u2692\n\nYou can install WAX-ML with the command:\n\n```bash\npip install wax-ml\n```\n\nTo install the latest version from source, you can use the command:\n\n```bash\npip install \"wax-ml[dev,complete] @ git+https://github.com/eserie/wax-ml.git\"\n```\n\n## Disclaimer\n\nWAX-ML is in its early stages of development and its features and API are very likely to\nevolve.\n\n\n## Development\n\nYou can contribute to WAX-ML by asking questions, proposing practical use cases, or contributing to the code or the documentation.  You can have a look at our [Contributing\nGuidelines](https://github.com/eserie/wax-ml/CONTRIBUTING.md) and [Developer\nDocumentation](https://wax-ml.readthedocs.io/en/latest/developer.html).\n\nWe maintain a list of \"WAX-ML Enhancement Proposals\" in the\n[WEP.md](https://github.com/eserie/wax-ml/WEP.md) file.\n\n\n## References\n\n[1] [Google Princeton AI and Hazan Lab @ Princeton University](https://www.minregret.com/research/)\n\n[2] [FNet: Mixing Tokens with Fourier Transforms, James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon](https://arxiv.org/abs/2105.03824)\n\n[3] [Online Learning for Time Series Prediction, Oren Anava, Elad Hazan, Shie Mannor, Ohad Shamir](http://proceedings.mlr.press/v30/Anava13.html)\n\n[4] [The Nonstochastic Control Problem, Elad Hazan, Sham M. 
Kakade, Karan Singh](https://arxiv.org/abs/1911.12178)\n\n[5] [Differentiable Convex Optimization Layers, Akshay Agrawal, Brandon Amos, Shane Barratt, Stephen Boyd, Steven Diamond, Zico Kolter](https://arxiv.org/abs/1910.12430)\n\n[6] [Machine learning accelerated computational fluid dynamics, Dmitrii Kochkov, Jamie A. Smith, Ayya Alieva, Qing Wang, Michael P. Brenner, Stephan Hoyer](https://arxiv.org/abs/2102.01010)\n\n[7] [Large dimension forecasting models and random singular value spectra, Jean-Philippe Bouchaud, Laurent Laloux, M. Augusta Miceli, Marc Potters](https://arxiv.org/abs/physics/0512090)\n\n[8] [A novel dynamic PCA algorithm for dynamic data modeling and process monitoring, Yining Dong, S. Joe Qin](https://www.sciencedirect.com/science/article/pii/S095915241730094X)\n\n[9] [Efficient and Modular Implicit Differentiation, Mathieu Blondel, Quentin Berthet, Marco Cuturi, Roy Frostig, Stephan Hoyer, Felipe Llinares-L\u00f3pez, Fabian Pedregosa, Jean-Philippe Vert](https://arxiv.org/abs/2105.15183)\n\n## License\n\n```\nCopyright 2021 The WAX-ML Authors\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    https://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n```\n\nWAX-ML bundles portions of astropy, dask, deluca, haiku, jax, and xarray.\n\nastropy and dask are available under a \"3-clause BSD\" license:\n- dask: `wax/gym/callbacks/callbacks.py`\n- astropy: `CONTRIBUTING.md`\n\ndeluca, haiku, jax and xarray are available under an \"Apache\" license:\n- deluca: `wax/gym/entity.py`\n- haiku: 
`docs/notebooks/05_reconstructing_the_light_curve_of_stars.*`\n- jax: `docs/conf.py`, `docs/developer.md`\n- xarray: `wax/datasets/generate_temperature_data.py`\n\nThe full text of these licenses is included in the `licenses` directory.\n\n\n## Citing WAX-ML\n\nIf you use WAX-ML, please cite our [paper](http://arxiv.org/abs/2106.06524) using the BibTeX entry:\n\n```\n@misc{s\u00e9ri\u00e92021waxml,\n      title={{WAX-ML}: {A} {P}ython library for machine learning and feedback loops on streaming data},\n      author={Emmanuel S\u00e9ri\u00e9},\n      year={2021},\n      eprint={2106.06524},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url = {http://arxiv.org/abs/2106.06524},\n}\n```\n\n\n## Reference documentation\n\nFor details about the WAX-ML API, see the\n[reference documentation](https://wax-ml.readthedocs.io/en/latest/).\n",
    "bugtrack_url": null,
    "license": "Apache",
    "summary": "A Python library for machine-learning and feedback loops on streaming data",
    "version": "0.6.4",
    "split_keywords": [
        "time series",
        "machine learning",
        "optimization",
        "optimal control",
        "online learning",
        "reinforcement learning"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b69ab1e8f5d2728d1d2f447920a627d735bced5ad41124c24ce04d9f70d38c55",
                "md5": "897852043c2271842a81e7348bbc3f82",
                "sha256": "867953bffa1ccdc53a80a0f7d6ce2d10db9ceb4c9c309985e77dcd38daaf3b85"
            },
            "downloads": -1,
            "filename": "wax_ml-0.6.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "897852043c2271842a81e7348bbc3f82",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 181075,
            "upload_time": "2023-04-21T22:27:12",
            "upload_time_iso_8601": "2023-04-21T22:27:12.488429Z",
            "url": "https://files.pythonhosted.org/packages/b6/9a/b1e8f5d2728d1d2f447920a627d735bced5ad41124c24ce04d9f70d38c55/wax_ml-0.6.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6b5f8f8eee7d231129480e0675b47041d70299bbde3c9b6d4d7ec0826695ee24",
                "md5": "37368a782a42ead270aaa79557f242c2",
                "sha256": "82b717635ac9229b755f11a9a187e3a5ed0ba95ce4fd0536ee1614478d4036f4"
            },
            "downloads": -1,
            "filename": "wax-ml-0.6.4.tar.gz",
            "has_sig": false,
            "md5_digest": "37368a782a42ead270aaa79557f242c2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 132930,
            "upload_time": "2023-04-21T22:27:14",
            "upload_time_iso_8601": "2023-04-21T22:27:14.380226Z",
            "url": "https://files.pythonhosted.org/packages/6b/5f/8f8eee7d231129480e0675b47041d70299bbde3c9b6d4d7ec0826695ee24/wax-ml-0.6.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-04-21 22:27:14",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "eserie",
    "github_project": "wax-ml",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "wax-ml"
}
        