<p align="center">
<img src="https://github.com/Infineon/StreamGen/blob/main/docs/artwork/crystal_cascades_by_th3dutchzombi3_dgmp8d5-pre.jpg?raw=true"/>
</p>
<h1 align="center">
🌌 StreamGen
</h1>
<p align="center">
a 🐍 Python framework for generating streams of labelled data
</p>
<p align="center">
<a href="https://pypi.org/project/streamgen/"><img alt="PyPI - Version" src="https://img.shields.io/pypi/v/streamgen?label=%F0%9F%93%A6%20PyPi">
</a>
<a href="https://www.repostatus.org/#active"><img src="https://www.repostatus.org/badges/latest/active.svg" alt="Project Status: Active – The project has reached a stable, usable state and is being actively developed." /></a>
<a href="https://github.com/Infineon/StreamGen/actions/workflows/python-package-ubuntu.yaml"><img alt="🐍 Python package" src="https://github.com/Infineon/StreamGen/actions/workflows/python-package-ubuntu.yaml/badge.svg"></a>
<img alt="Static Badge" src="https://img.shields.io/badge/Coverage-88%25-yellow?logo=codecov">
</p>
<p align="center">
<a href="https://www.python.org/"><img alt="Python" src="https://img.shields.io/badge/Python-3.11|3.12-yellow?logo=python"></a>
<a href="https://python-poetry.org/"><img alt="Poetry" src="https://img.shields.io/badge/Poetry-1.8.2-blue?logo=Poetry"></a>
<a href="https://joss.theoj.org/papers/4b6bac90bd1eb54700f8afb9f32caebe"><img src="https://joss.theoj.org/papers/4b6bac90bd1eb54700f8afb9f32caebe/status.svg"></a>
</p>
<p align="center">
<a href="https://github.com/astral-sh/ruff"><img alt="Ruff" src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json"></a>
<a href="https://github.com/beartype/beartype"><img alt="Beartype" src="https://raw.githubusercontent.com/beartype/beartype-assets/main/badge/bear-ified.svg"></a>
</p>
<p align="center">
<a href="https://github.com/Infineon/StreamGen?tab=readme-ov-file#%EF%B8%8F-motivation">⚗️ Motivation</a> •
<a href="https://github.com/Infineon/StreamGen?tab=readme-ov-file#-idea">💡 Idea</a> •
<a href="https://github.com/Infineon/StreamGen?tab=readme-ov-file#-installation">📦 Installation</a> •
<a href="https://github.com/Infineon/StreamGen?tab=readme-ov-file#-examples">👀 Examples</a> •
<a href="https://github.com/Infineon/StreamGen?tab=readme-ov-file#-documentation">📖 Documentation</a> •
<a href="https://github.com/Infineon/StreamGen?tab=readme-ov-file#-acknowledgement">🙏 Acknowledgement</a>
</p>
---
## ⚗️ Motivation
Most machine learning systems rely on *stationary, labeled, balanced and large-scale* datasets.
**Incremental learning** (IL), also referred to as **lifelong learning** (LL) or **continual learning** (CL), extends the traditional paradigm to work in dynamic and evolving environments.
This requires such systems to acquire and preserve knowledge continually.
Existing CL frameworks like [avalanche](https://github.com/ContinualAI/avalanche)[^1] or [continuum](https://github.com/Continvvm/continuum)[^2] construct data streams by *splitting* large datasets into multiple *experiences*, which has a few disadvantages:
- results in unrealistic scenarios
- offers limited insight into distributions and their evolution
- not extendable to scenarios with fewer constraints on the stream properties
To answer different research questions in the field of CL, researchers need knowledge and control over:
- class distributions
- novelties and outliers
- complexity and evolution of the background domain
- semantics of the unlabeled parts of a domain
- class dependencies
- class composition (for multi-label modelling)
A more economical alternative to collecting and labelling streams with desired properties is the **generation** of synthetic streams[^6].
Notable efforts in that direction include augmentation-based dataset generation like [ImageNet-C](https://github.com/hendrycks/robustness)[^3] and simulation-based approaches like [EndlessCLSim](https://arxiv.org/abs/2106.02585)[^4], where semantically labeled street-view images are generated by a game engine that procedurally generates the city environment and simulates drift by modifying parameters (like the weather and illumination conditions) over time.
<details>
<summary>ImageNet-C [3]</summary>
<img src="https://github.com/Infineon/StreamGen/blob/main/docs/images/imagenet-c.png?raw=true">
</details>
<details>
<summary>EndlessCLSim [4]</summary>
<img src="https://github.com/Infineon/StreamGen/blob/main/docs/images/endless_cl_sim.png?raw=true">
<img src="https://github.com/Infineon/StreamGen/blob/main/docs/images/endless_cl_sim_model.png?raw=true">
</details>
This project builds on these ideas and presents a general framework for generating streams of labeled samples.
## 💡 Idea
This section introduces the main ideas and building blocks of the `streamgen` framework.
### 🎲 Building complex Distributions through random Transformations
Only a limited number of distributions can be sampled from directly (e.g., a Gaussian distribution).
Instead of generating samples directly from a distribution, researchers often work with collected sets of samples.
A common practice to increase the variability of such datasets is the use of **stochastic transformations** in a sequential augmentation pipeline:
```python
import numpy as np
from torchvision.transforms import v2

transforms = v2.Compose([
    v2.RandomResizedCrop(size=(224, 224), antialias=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    # ...
])

# `generating_data` and `dataset` are placeholders for your own loop condition and data
while generating_data:
    # option 1 - sample from a dataset (pick a random index)
    sample = dataset[np.random.randint(len(dataset))]
    # option 2 - sample from a distribution
    sample = np.random.randn(...)

    augmented_sample = transforms(sample)
```
Combined with an initial sampler that draws either from a dataset or directly from a distribution, these chained transformations can represent complex distributions.
<details>
<summary>Function Composition Details </summary>
Two (or more) functions f: X → X, g: X → X having the same domain and codomain are often called **transformations**. One can form chains of transformations composed together, such as f ∘ f ∘ g ∘ f (which is the same as f(f(g(f(x)))) given some input x). Such chains have the algebraic structure of a **monoid**, called a transformation monoid or (much more seldom) a composition monoid. [^7]
A lot of programming languages offer native support for such transformation monoids.
Julia uses `|>` or `∘` for function chaining:
```julia
distribution = sample |> filter |> augment
distribution = augment ∘ filter ∘ sample
```
R uses the pipe operator `%>%` (provided by the magrittr package):
```R
distribution <- sample %>%
  filter() %>%
  augment()
```
In Python, you can use `functools.reduce` to compose such chains:
```python
from functools import reduce
from typing import Any, Callable

def compose(*funcs: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Compose functions right-to-left: compose(f, g, h)(x) == f(g(h(x)))."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs)

# the rightmost function is applied first, matching `augment ∘ filter ∘ sample`
distribution = compose(augment, filter, sample)
```
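
The monoid structure mentioned above can be checked on a toy example. The following is a minimal sketch, not part of `streamgen`: the helper names `identity`, `double`, `increment` and `square` are made up for illustration, and `compose` is a variant of the helper above that also accepts zero functions.

```python
from functools import reduce
from typing import Any, Callable

Transform = Callable[[Any], Any]


def identity(x: Any) -> Any:
    """Neutral element of the composition monoid."""
    return x


def compose(*funcs: Transform) -> Transform:
    """Compose right-to-left: compose(f, g, h)(x) == f(g(h(x)))."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs, identity)


# hypothetical toy transformations on numbers
def double(x):
    return 2 * x


def increment(x):
    return x + 1


def square(x):
    return x * x


# associativity: the grouping of the compositions does not matter
left = compose(compose(double, increment), square)
right = compose(double, compose(increment, square))
assert left(3) == right(3)

# identity: composing with the neutral element changes nothing
assert compose(double, identity)(3) == double(3)
assert compose()(7) == 7
```

Associativity is what makes it safe to store a chain as a flat sequence of transformations and apply them one after another.
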
> 🤚 StreamGen is not trying to implement general (and optimized) function composition in Python. Instead, it offers a very opinionated implementation that is tailored to the data generation use case.
</details>
### 🌳 Sampling Trees
One shortcoming of this approach is that it can only generate samples from a single distribution, so different class distributions cannot be represented.
One solution to this problem is the use of a [tree](https://en.wikipedia.org/wiki/Tree_(data_structure)) (or other directed acyclic graph (DAG)) data structure to store the transformations.
- samples are transformed during the traversal of the tree from the root to the leaves.
- each path through the tree represents its own class-conditional distribution.
- each branching point represents a categorical distribution which determines the path to take for a sample during the tree traversal.
<p align="center">
<img src="https://github.com/Infineon/StreamGen/blob/main/docs/images/sampling_tree.png?raw=true"/>
</p>
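
The traversal described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the `streamgen` API: the nested-dict layout, the key names (`transforms`, `branches`, `p`) and the functions `base_signal`, `add_ramp`, `add_step` and `sample_from_tree` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def base_signal():
    """Root sampler: Gaussian noise of length 64."""
    return rng.normal(0.0, 0.1, size=64)


def add_ramp(x):
    """Class-specific transformation: add a linear ramp."""
    return x + np.linspace(0.0, 1.0, x.size)


def add_step(x):
    """Class-specific transformation: add a step in the second half."""
    y = x.copy()
    y[x.size // 2:] += 1.0
    return y


# hypothetical tree layout: every node holds transformations and optional branches
tree = {
    "transforms": [],  # applied to every sample before branching
    "branches": {
        "ramp": {"p": 0.7, "transforms": [add_ramp]},
        "step": {"p": 0.3, "transforms": [add_step]},
    },
}


def sample_from_tree(tree):
    """Traverse from the root to a leaf and return (sample, label)."""
    x = base_signal()
    node, label = tree, None
    while True:
        for transform in node.get("transforms", []):
            x = transform(x)
        branches = node.get("branches")
        if not branches:
            return x, label
        # a branching point is a categorical distribution over its children
        names = list(branches)
        probs = [branches[name]["p"] for name in names]
        label = names[rng.choice(len(names), p=probs)]
        node = branches[label]


x, label = sample_from_tree(tree)
```

In this sketch the label is simply the name of the last branch taken, so each root-to-leaf path yields samples from its own class-conditional distribution.
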
### ⚙️ Parameter Schedules
If we want to model evolving distributions (streams), we either need to change the **parameters** of the stochastic transformations or the **topology** of the tree over time.
<p align="center">
<img src="https://github.com/Infineon/StreamGen/blob/main/docs/images/parameter_schedule.png?raw=true"/>
</p>
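
One way to picture a parameter schedule is as a value that steps through a predefined sequence while samples are being generated. The sketch below assumes a hypothetical `ScheduledParameter` helper and a noise transformation whose standard deviation grows over time; it does not reproduce the `streamgen` classes.

```python
import numpy as np


class ScheduledParameter:
    """Hypothetical helper: holds a value and steps through a schedule."""

    def __init__(self, schedule):
        self._values = iter(schedule)
        self.value = next(self._values)

    def update(self):
        # keep the last value once the schedule is exhausted
        self.value = next(self._values, self.value)
        return self.value


rng = np.random.default_rng(0)
noise_std = ScheduledParameter(np.linspace(0.1, 1.0, num=5))


def noisy_sample():
    """Stochastic transformation whose parameter is read at call time."""
    return rng.normal(0.0, noise_std.value, size=1000)


stds = []
for _ in range(5):
    stds.append(float(noisy_sample().std()))
    noise_std.update()  # advance the stream by one time step
```

Because the transformation reads the parameter on every call, updating the schedule changes the generated distribution without rebuilding the pipeline.
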
Currently, `streamgen` does not support scheduling topological changes (like adding branches and nodes), but by **unrolling** these changes over time into one static tree, topological changes can be modelled purely with branch probabilities.
<p align="center">
<img src="https://github.com/Infineon/StreamGen/blob/main/docs/images/data_drifts_by_topology_changes.png?raw=true"/>
</p>
<p align="center">
<img src="https://github.com/Infineon/StreamGen/blob/main/docs/images/unrolled_static_tree.png?raw=true"/>
</p>
> 💡 The directed acyclic graph above is no longer a tree because certain branches are *merged*. Because these merges are very convenient in certain scenarios, `streamgen` supports the definition of such graphs by copying the paths below the merge to every branch before the merge. For an example of this, have a look at `examples/time series classification/04-multi-label-generation.ipynb`.
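
The unrolling idea can be illustrated with a static pair of branches whose probabilities are scheduled: a branch that "does not exist yet" starts with probability 0 and is faded in over time. This is a minimal sketch with made-up names (`branches`, `branch_probabilities`), not the `streamgen` API.

```python
import numpy as np

rng = np.random.default_rng(0)
branches = ["old_class", "new_class"]  # both branches exist in the static tree


def branch_probabilities(step, num_steps):
    """Linearly move probability mass from the first to the second branch."""
    p_new = step / (num_steps - 1)
    return np.array([1.0 - p_new, p_new])


num_steps = 5
history = []
for step in range(num_steps):
    p = branch_probabilities(step, num_steps)
    picks = rng.choice(branches, size=200, p=p)
    # fraction of samples routed through the "new" branch at this step
    history.append(float(np.mean(picks == "new_class")))
```

At the first step no sample takes the new branch, and at the last step every sample does, which mimics adding one branch and removing another without touching the tree topology.
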
### 📈 Data Drift Scenarios
The proposed tree structure can model all three common data drift scenarios by scheduling the parameters of the transformations at specific nodes.
#### 📉 Covariate shift
<p align="center">
<img src="https://github.com/Infineon/StreamGen/blob/main/docs/images/covariate_shift.png?raw=true"/>
</p>
#### 📊 Prior probability shift
<p align="center">
<img src="https://github.com/Infineon/StreamGen/blob/main/docs/images/prior_probability_shift.png?raw=true"/>
</p>
#### 🏷️ Concept shift
<p align="center">
<img src="https://github.com/Infineon/StreamGen/blob/main/docs/images/concept_shift.png?raw=true"/>
</p>
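
To make the three scenarios concrete, the toy generator below exposes one knob per drift type. It is a sketch under simplifying assumptions and not how `streamgen` implements drifts: the function `generate` and its arguments `background_mean`, `p_class1` and `flip_labels` are hypothetical, and each sample has one class-dependent feature plus one class-independent background feature.

```python
import numpy as np

rng = np.random.default_rng(0)


def generate(n, background_mean=0.0, p_class1=0.5, flip_labels=False):
    """Draw n labelled 2-D samples from a two-branch generator.

    background_mean -> shifts a class-independent feature (covariate shift)
    p_class1        -> probability of taking branch 1 (prior probability shift)
    flip_labels     -> swaps the labels attached to the branches (concept shift)
    """
    branch = (rng.random(n) < p_class1).astype(int)
    class_feature = rng.normal(loc=2.0 * branch, scale=0.5)
    background = rng.normal(loc=background_mean, scale=1.0, size=n)
    x = np.column_stack([class_feature, background])
    y = 1 - branch if flip_labels else branch
    return x, y


x_ref, y_ref = generate(2000)                       # reference distribution
x_cov, y_cov = generate(2000, background_mean=3.0)  # inputs move, labelling rule unchanged
x_pri, y_pri = generate(2000, p_class1=0.9)         # class frequencies change
x_con, y_con = generate(2000, flip_labels=True)     # same inputs, different labels
```

Scheduling any of these three arguments over time, rather than switching them once, would turn the static comparison into a gradual drift.
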
## 📦 Installation
The graph visualizations require [Graphviz](https://www.graphviz.org/download/) to be installed on your system. Depending on your operating system and package manager, you might try one of the following options:
- Ubuntu: `sudo apt-get install graphviz`
- Windows: `choco install graphviz`
- macOS: `brew install graphviz`
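
If you are unsure whether Graphviz ended up on your `PATH`, a quick check from Python could look like the sketch below. It only uses the standard library; the helper name `graphviz_available` is made up, and the check assumes that the Graphviz `dot` executable is what the visualizations need.

```python
import shutil


def graphviz_available() -> bool:
    """Return True if the Graphviz `dot` executable is found on PATH."""
    return shutil.which("dot") is not None


if not graphviz_available():
    print("Graphviz `dot` was not found on PATH; graph visualizations may fail.")
```
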
The basic version of the package can be installed from [PyPI](https://pypi.org/project/streamgen/) with:
```sh
pip install streamgen
```
`streamgen` provides a few (pip) extras:
| extras group | needed for | additional dependencies |
| ------------ | -------------------------------------------------------------------------- | ---------------------------- |
| **examples** | running the example notebooks with their application specific dependencies | `perlin-numpy`, `polars` |
| **cl** | continual learning frameworks | `continuum` |
| **all** | shortcut for installing every extra | * |
To install the package with specific extras execute:
```sh
pip install streamgen[<name_of_extra>]
```
> 🧑‍💻 To install a development environment (needed only if you want to work on the package rather than just use it), `cd` into the project's root directory and call:
```bash
poetry install --sync --compile --all-extras
```
## 👀 Examples
There are example notebooks 🪐📓 showcasing and explaining `streamgen` features:
+ 📈 time series
  + [🎲 sampling from static distributions](https://github.com/Infineon/StreamGen/blob/main/examples/time%20series%20classification/01-static-distributions.ipynb)
  + [🌌 creating data streams](https://github.com/Infineon/StreamGen/blob/main/examples/time%20series%20classification/02-data-streams.ipynb)
  + [📊 data drift scenarios](https://github.com/Infineon/StreamGen/blob/main/examples/time%20series%20classification/03-drift-scenarios.ipynb)
  + [🏷️ multi-label generation](https://github.com/Infineon/StreamGen/blob/main/examples/time%20series%20classification/04-multi-label-generation.ipynb)
+ 🖼️ analog wafer map streams based on the [wm811k dataset](https://www.kaggle.com/datasets/qingyi/wm811k-wafer-map)[^5] in [🌐 wafer map generation](https://github.com/Infineon/StreamGen/blob/main/examples/wafer_map_generation.ipynb)
Here is a preview of what we will create in the time series examples:
<p align="center">
<img src="https://github.com/Infineon/StreamGen/blob/main/docs/videos/time_series_tree_svg.gif?raw=true"/>
</p>
## 📖 Documentation
The [documentation](https://infineon.github.io/StreamGen/) is hosted on GitHub Pages.
To locally build and view it, call `poe docs_local`.
## 🙏 Acknowledgement
Made with ❤️ and ☕ by Laurenz Farthofer.
This work was funded by the Austrian Research Promotion Agency (FFG, Project No. 905107).
Special thanks to Benjamin Steinwender, Marius Birkenbach and Nikolaus Neugebauer for their valuable feedback.
I want to thank Infineon and KAI for letting me work on and publish this project.
Finally, I want to thank my university supervisors Thomas Pock and Marc Masana for their guidance.
---
## 🖼️ ©️ Banner Artwork Attribution
<a rel="license" href="http://creativecommons.org/licenses/by-nc-nd/3.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-nd/3.0/88x31.png" /></a><br />The art in the banner of this README is licensed under a [Creative Commons Attribution-NonCommercial-No Derivatives Works 3.0 License](https://creativecommons.org/licenses/by-nc-nd/3.0/). It was made by [th3dutchzombi3](https://www.deviantart.com/th3dutchzombi3). Check out his beautiful artwork ❤️
---
## 📄 References
[^1]: V. Lomonaco et al., “Avalanche: an End-to-End Library for Continual Learning,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA: IEEE, Jun. 2021, pp. 3595–3605. doi: 10.1109/CVPRW53098.2021.00399.
[^2]: A. Douillard and T. Lesort, “Continuum: Simple Management of Complex Continual Learning Scenarios.” arXiv, Feb. 11, 2021. doi: 10.48550/arXiv.2102.06253.
[^3]: D. Hendrycks and T. Dietterich, “Benchmarking Neural Network Robustness to Common Corruptions and Perturbations.” arXiv, Mar. 28, 2019. doi: 10.48550/arXiv.1903.12261.
[^4]: T. Hess, M. Mundt, I. Pliushch, and V. Ramesh, “A Procedural World Generation Framework for Systematic Evaluation of Continual Learning.” arXiv, Dec. 13, 2021. doi: 10.48550/arXiv.2106.02585.
[^5]: Wu, Ming-Ju, Jyh-Shing R. Jang, and Jui-Long Chen. “Wafer Map Failure Pattern Recognition and Similarity Ranking for Large-Scale Data Sets.” IEEE Transactions on Semiconductor Manufacturing 28, no. 1 (February 2015): 1–12.
[^6]: J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, and G. Zhang, “Learning under Concept Drift: A Review,” IEEE Trans. Knowl. Data Eng., pp. 1–1, 2018, doi: 10.1109/TKDE.2018.2876857.
[^7]: “Function composition,” Wikipedia. Feb. 16, 2024. Accessed: Apr. 17, 2024. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Function_composition&oldid=1207989326