data-aug-4-tsc


Name: data-aug-4-tsc
Version: 0.0.1
Summary: Data Augmentation for Time series Data: A Review
Upload time: 2025-08-14 08:31:57
Requires Python: >=3.10
License: GPL-3.0-only
Keywords: data-science, machine-learning, data-mining, time-series, classification, time-series-analysis, time-series-classification, time-series-augmentation, data-augmentation

> ⚠️ **Alert:** If you are using this code with **Keras v3**, make sure you are using **Keras ≥ 3.6.0**.
> Earlier versions of Keras v3 do not honor `trainable=False`, which will result in **training hand-crafted filters** in **LITEMV** unexpectedly.

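The version constraint in the alert above can be checked before running any experiment. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `keras_version_is_safe` and `installed_keras_is_safe` are hypothetical, and pre-release suffixes such as `rc1` are simply ignored.

```python
from importlib import metadata


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '3.6.0' (or '3.6.0rc1') into a tuple of at least three integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    if not parts:
        return ()
    # Pad so that '3.6' compares equal to '3.6.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def keras_version_is_safe(version_text: str) -> bool:
    """Return True for Keras 2.x or for Keras >= 3.6.0 (hypothetical helper)."""
    version = parse_version(version_text)
    if not version:
        return False
    if version[0] < 3:
        return True  # the alert only concerns Keras v3
    return version >= (3, 6, 0)


def installed_keras_is_safe() -> bool:
    """Check the Keras version installed in the current environment."""
    try:
        return keras_version_is_safe(metadata.version("keras"))
    except metadata.PackageNotFoundError:
        return False
```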
| Overview        |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **CI/CD**       | [![github-actions-main](https://github.com/MSD-IRIMAS/Data-Augmentation-4-TSC/actions/workflows/pytest.yml/badge.svg?branch=main&logo=github&label=build%20(main))](https://github.com/MSD-IRIMAS/Data-Augmentation-4-TSC/actions/workflows/pytest.yml) [![github-actions-tests](https://github.com/MSD-IRIMAS/Data-Augmentation-4-TSC/actions/workflows/pre-commit.yml/badge.svg?logo=github&label=build%20(tests))](https://github.com/MSD-IRIMAS/Data-Augmentation-4-TSC/actions/workflows/pre-commit.yml)  |
| **Code**        | [![pypi](https://img.shields.io/pypi/v/data-aug-4-tsc?logo=pypi&color=blue)](https://pypi.org/project/data-aug-4-tsc/) [![python-versions](https://img.shields.io/pypi/pyversions/data-aug-4-tsc?logo=python)](https://www.python.org/) [![!black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![license](https://img.shields.io/badge/license-GPL3.0-green)](https://github.com/MSD-IRIMAS/Data-Augmentation-4-TSC/blob/main/LICENSE) |
| **Community**   | [![website](https://img.shields.io/static/v1?label=Website&message=msd-irimas.github.io&color=blue&logo=githubpages)](https://msd-irimas.github.io/msd-irimas.github.io/) |


# Re-framing Time Series Augmentation Through the Lens of Generative Models

Authors: [Ali Ismail-Fawaz](https://hadifawaz1999.github.io/)<sup>1</sup>, [Maxime Devanne](https://maxime-devanne.com/)<sup>1</sup>, [Stefano Berretti](https://www.micc.unifi.it/berretti/)<sup>2</sup>, [Jonathan Weber](https://www.jonathan-weber.eu/)<sup>1</sup> and [Germain Forestier](https://germain-forestier.info/)<sup>1,3</sup>

<sup>1</sup> IRIMAS, Université de Haute-Alsace, France<br>
<sup>2</sup> MICC, University of Florence, Italy<br>
<sup>3</sup> DSAI, Monash University, Australia

This repository contains the source code for the article "[Re-framing Time Series Augmentation Through the Lens of Generative Models](#)", accepted at the [10th Workshop on Advanced Analytics and Learning on Temporal Data (AALTD 2025)](https://ecml-aaltd.github.io/aaltd2025/), held in conjunction with the [2025 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2025)](https://ecmlpkdd.org/2025/).
In this article, we benchmark 22 data augmentation techniques on 131 time series classification datasets from the [UCR archive](https://www.cs.ucr.edu/%7Eeamonn/time_series_data_2018/).

<img id="img-overview" src="https://raw.githubusercontent.com/MSD-IRIMAS/Data-Augmentation-4-TSC/refs/heads/main/static/summary-methods.png" class="interpolation-image" style="width: 100%; height: 100%; border: none;"> </img>

## Abstract

Time series classification is widely used in many fields, but it often suffers from a lack of labeled data. To address this, researchers commonly apply data augmentation techniques that generate synthetic samples through transformations such as jittering, warping, or resampling. However, with an increasing number of available augmentation methods, it becomes difficult to choose the most suitable one for a given task. In many cases, this choice is based on intuition or visual inspection. Assessing the impact of this choice on classification accuracy requires training models, which is time-consuming and depends on the dataset. In this work, we adopt a generative model perspective and evaluate augmentation methods prior to training any classifier, using metrics that quantify both fidelity and diversity of the generated samples. We benchmark 22 augmentation techniques on 131 public datasets using eight metrics. Our results provide a practical and efficient way to compare augmentation methods without relying solely on classifier performance.
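
To make the idea of scoring generated samples before training a classifier concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not the repository's implementation: `jitter`, `fidelity_score` and `diversity_score` are hypothetical names, the two scores are simple distance-based proxies rather than the eight metrics of the article, and they operate on raw values instead of learned features.

```python
import math
import random


def jitter(series, sigma=0.05, rng=None):
    """Jittering: add independent Gaussian noise to every time step."""
    rng = rng or random.Random()
    return [value + rng.gauss(0.0, sigma) for value in series]


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fidelity_score(real, generated):
    """Mean distance from each generated series to its nearest real series.

    Lower values mean the generated samples stay closer to the real data.
    """
    return sum(min(euclidean(g, r) for r in real) for g in generated) / len(generated)


def diversity_score(generated):
    """Mean pairwise distance among generated series (higher = more varied)."""
    if len(generated) < 2:
        raise ValueError("diversity needs at least two generated series")
    pairs = [
        (i, j)
        for i in range(len(generated))
        for j in range(i + 1, len(generated))
    ]
    return sum(euclidean(generated[i], generated[j]) for i, j in pairs) / len(pairs)


# Toy data: three sine waves with different phases stand in for a real dataset.
rng = random.Random(0)
real = [[math.sin(0.3 * t + phase) for t in range(50)] for phase in (0.0, 0.5, 1.0)]
mild = [jitter(series, sigma=0.01, rng=rng) for series in real]
strong = [jitter(series, sigma=0.5, rng=rng) for series in real]

print("fidelity (mild, strong):", fidelity_score(real, mild), fidelity_score(real, strong))
print("diversity (mild, strong):", diversity_score(mild), diversity_score(strong))
```

Stronger jitter moves samples further from the real data (worse fidelity proxy), which is the kind of trade-off such metrics are meant to expose without training a classifier.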

## Data

In this work, we use 131 datasets of the UCR archive, taken from the [original repository](https://www.cs.ucr.edu/%7Eeamonn/time_series_data_2018/) and the [newly added datasets](https://link.springer.com/content/pdf/10.1007/s10618-024-01022-1.pdf).

However, you do not need to download them manually, as our code loads the datasets from the [Time Series Classification webpage](https://timeseriesclassification.com/) using [aeon-toolkit](https://aeon-toolkit.org/).
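
As an illustration of loading a dataset through aeon, the sketch below wraps aeon's `load_classification` function. The wrapper `load_ucr_split` is a hypothetical helper, and the repository's own loading code may differ; calling it requires aeon to be installed and network access for the first download.

```python
VALID_SPLITS = ("train", "test")


def load_ucr_split(dataset_name: str, split: str = "train"):
    """Load one split of a UCR dataset via aeon (hypothetical helper)."""
    if split not in VALID_SPLITS:
        raise ValueError(f"split must be one of {VALID_SPLITS}, got {split!r}")
    # Imported lazily so this helper can be defined without aeon installed.
    from aeon.datasets import load_classification

    # aeon downloads the dataset if it is not already available locally.
    X, y = load_classification(dataset_name, split=split)
    return X, y


# Example (requires aeon and network access):
# X_train, y_train = load_ucr_split("Adiac", "train")
# print(X_train.shape, len(set(y_train)))
```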

## Docker

This repository supports Docker. To build the Docker image from the [dockerfile](dockerfile), run the following command (assuming Docker and the NVIDIA Container Toolkit for GPU support are installed):
```bash
docker build --build-arg USER_ID=$(id -u) --build-arg GROUP_ID=$(id -g) -t data-augmentation-review-image .
```
After the image has been successfully built, you can create the docker container using the following command:
```bash
docker run --gpus all -it --name data-augmentation-review-container -v "$(pwd):/home/myuser/code" --user $(id -u):$(id -g) data-augmentation-review-image bash
```

The `-v` option mounts the current directory at `/home/myuser/code/` inside the Docker container, so the code is available there. The `--gpus all` option gives the container access to GPU acceleration.

## Requirements

If you do not want to use Docker, install the project with the following commands:
```bash
python3 -m venv ./data-augmentation-review-venv
source ./data-augmentation-review-venv/bin/activate
pip install --upgrade pip
pip install -e ".[dev]"
```

Make sure you have [`jq`](https://jqlang.org/) installed on your system. This project supports `python>=3.10` only.

You can see the list of dependencies and their required versions in the [pyproject.toml](pyproject.toml) file.


## Running the code on a single experiment

To run a single experiment (one dataset, one augmentation method, one model), first open a terminal inside your Docker container if you are not already in one:
```bash
docker exec -it data-augmentation-review-container bash
```
Then, for example, run Amplitude Warping on the Adiac dataset with the following command:
```bash
python3 main.py task=generate_data dataset_name=Adiac generate_data.method=AW
```
The code uses [hydra](https://hydra.cc/docs/intro/) for parameter configuration; see the [hydra configuration file](config/config_hydra.yaml) for a detailed view of the parameters of our experiments.
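
The command-line arguments above are `key=value` overrides, where a dotted key such as `generate_data.method` addresses a nested entry of the configuration. The sketch below only illustrates that mapping with a hypothetical `parse_overrides` function; it is not how Hydra is implemented, and Hydra additionally handles types, defaults and config composition.

```python
def parse_overrides(overrides):
    """Map ['a.b=c', ...] to a nested dict {'a': {'b': 'c'}} (illustration only)."""
    config = {}
    for item in overrides:
        key, separator, value = item.partition("=")
        if not separator or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        # Walk (and create) the parent dictionaries named by the dotted key.
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


overrides = ["task=generate_data", "dataset_name=Adiac", "generate_data.method=AW"]
print(parse_overrides(overrides))
```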

## Running the whole benchmark

To run all the experiments and reproduce the results of our article, run the following for the data generation experiments:
```bash
chmod +x run_generate_data.sh
nohup ./run_generate_data.sh &
```
and the following for training the feature extractor:
```bash
chmod +x run_train_feature_extractor.sh
nohup ./run_train_feature_extractor.sh &
```
and the following for evaluation of the generations:
```bash
chmod +x run_evaluate_generation.sh
nohup ./run_evaluate_generation.sh &
```
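
If you prefer to chain the three stages from a single process, the sketch below runs the scripts one after another and stops at the first failure. It rests on assumptions: that the stages should run in the order listed above and from the repository root, which this README does not state explicitly. `run_benchmark` is a hypothetical helper, not part of the repository.

```python
import subprocess

# Script names taken from the commands above; the ordering is an assumption.
STAGES = (
    "run_generate_data.sh",
    "run_train_feature_extractor.sh",
    "run_evaluate_generation.sh",
)


def run_benchmark(stages=STAGES, runner=subprocess.run):
    """Run each stage in order and stop at the first non-zero exit code."""
    completed = []
    for script in stages:
        # Invoking through bash avoids the need for a prior `chmod +x`.
        result = runner(["bash", script])
        if result.returncode != 0:
            raise RuntimeError(f"{script} exited with code {result.returncode}")
        completed.append(script)
    return completed


# Example (run from the repository root):
# run_benchmark()
```

The `runner` parameter exists only so the function can be exercised without launching the real scripts.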

## Cite this work

If you use this work, please cite the following:
```bibtex
@inproceedings{ismail-fawaz2025Data-Aug-4-TSC,
  author = {Ismail-Fawaz, Ali and Devanne, Maxime and Berretti, Stefano and Weber, Jonathan and Forestier, Germain},
  title = {Re-framing Time Series Augmentation Through the Lens of Generative Models},
  booktitle = {ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data},
  city = {Porto},
  country = {Portugal},
  year = {2025}
}
```

## Acknowledgments

This work was supported by the ANR DELEGATION project (grant ANR-21-CE23-0014) of the French Agence Nationale de la Recherche. The authors would like to acknowledge the High Performance Computing Center of the University of Strasbourg for supporting this work by providing scientific support and access to computing resources. Part of the computing resources were funded by the Equipex Equip@Meso project (Programme Investissements d’Avenir) and the CPER Alsacalcul/Big Data. The authors would also like to thank the creators and providers of the UCR Archive.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "data-aug-4-tsc",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "Ali Ismail-Fawaz <ali-el-hadi.ismail-fawaz@uha.fr>",
    "keywords": "data-science, machine-learning, data-mining, time-series, classification, time-series-analysis, time-series-classification, time-series-augmentation, data-augmentation",
    "author": null,
    "author_email": "Ali Ismail-Fawaz <ali-el-hadi.ismail-fawaz@uha.fr>, Maxime Devanne <maxime.devanne@uha.fr>, Stefano Berretti <stefano.berretti@unifi.it>, Jonathan Weber <jonathan.weber@uha.fr>, Germain Forestier <germain.forestier@uha.fr>",
    "download_url": "https://files.pythonhosted.org/packages/58/31/cce816c6e890085d6bb48047397943915e6345c46941f4968c6ee922c327/data_aug_4_tsc-0.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL-3.0-only",
    "summary": "Data Augmentation for Time series Data: A Review",
    "version": "0.0.1",
    "project_urls": {
        "Download": "https://pypi.org/project/data-aug-4-tsc/#files",
        "Homepage": "https://github.com/MSD-IRIMAS/Data-Augmentation-4-TSC/",
        "Repository": "https://github.com/MSD-IRIMAS/Data-Augmentation-4-TSC"
    },
    "split_keywords": [
        "data-science",
        " machine-learning",
        " data-mining",
        " time-series",
        " classification",
        " time-series-analysis",
        " time-series-classification",
        " time-series-augmentation",
        " data-augmentation"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "cc4d227ae3aae43c655b70d9fd1b33006ae403aeb915b063dad86fe7fe4918ae",
                "md5": "e815c371692eadf2e8db9779bb5c4fa3",
                "sha256": "129ef9837151f185aa94b4d00e4f0537e77e80a3b5a8db18259d75cd99edd5a4"
            },
            "downloads": -1,
            "filename": "data_aug_4_tsc-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e815c371692eadf2e8db9779bb5c4fa3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 17546,
            "upload_time": "2025-08-14T08:31:56",
            "upload_time_iso_8601": "2025-08-14T08:31:56.580509Z",
            "url": "https://files.pythonhosted.org/packages/cc/4d/227ae3aae43c655b70d9fd1b33006ae403aeb915b063dad86fe7fe4918ae/data_aug_4_tsc-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5831cce816c6e890085d6bb48047397943915e6345c46941f4968c6ee922c327",
                "md5": "f88302d90fc46c50e8329ce07065ef42",
                "sha256": "39ab5e2a24e740f1f975d924badc6334441f81f7d16625ea2503310ebd696f74"
            },
            "downloads": -1,
            "filename": "data_aug_4_tsc-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "f88302d90fc46c50e8329ce07065ef42",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 17979,
            "upload_time": "2025-08-14T08:31:57",
            "upload_time_iso_8601": "2025-08-14T08:31:57.942900Z",
            "url": "https://files.pythonhosted.org/packages/58/31/cce816c6e890085d6bb48047397943915e6345c46941f4968c6ee922c327/data_aug_4_tsc-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-14 08:31:57",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "MSD-IRIMAS",
    "github_project": "Data-Augmentation-4-TSC",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "data-aug-4-tsc"
}
        