convst


Name: convst
Version: 0.3.0
home_page: https://github.com/baraline/convst
Summary: The Random Dilation Shapelet Transform algorithm and associated works
upload_time: 2023-06-13 09:28:18
maintainer:
docs_url: None
author: Antoine Guillaume
requires_python: >=3.8,<3.11
license: BSD 2-Clause License Copyright (c) 2021, Antoine Guillaume All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
keywords: data-science machine-learning data-mining time-series shapelets time-series-classification
VCS:
bugtrack_url:
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.
# This package is moving to the aeon-toolkit.
Starting from v0.3.0, this package will no longer receive new features; bug fixes will still be included if issues are raised.
You can already find an updated version of RDST in the aeon package at https://github.com/aeon-toolkit/ . Further speed improvements to RDST are planned, and these improvements will only be implemented in aeon.
All the functionalities of this package will be ported to aeon when I have some time. For now, only the transformer for univariate and multivariate series of equal length has been implemented.

# Readme
Welcome to the convst repository. It contains the implementation of the `Random Dilated Shapelet Transform (RDST)` along with other works in the same area.
This work was supported by the following organisations:

<p float="center">
  <img src="https://raw.githubusercontent.com/baraline/convst/main/docs/_static/img/logo-UO-2022.png" width="32%" />
  <img src="https://raw.githubusercontent.com/baraline/convst/main/docs/_static/img/logo-lifo.png" width="32%" /> 
  <img src="https://raw.githubusercontent.com/baraline/convst/main/docs/_static/img/Logo_Worldline_-_2021(1).png" width="32%" />
</p>

## Status

| Overview | |
|---|---|
| **Compatibility** | [![!python-versions](https://img.shields.io/pypi/pyversions/convst)](https://www.python.org/) |
| **CI/CD** |  [![!pypi](https://img.shields.io/pypi/v/convst?color=orange)](https://pypi.org/project/convst/)  ![docs](https://img.shields.io/readthedocs/convst) ![build](https://github.com/baraline/convst/actions/workflows/test.yml/badge.svg)| 
| **Code Quality** |  ![lines](https://img.shields.io/tokei/lines/github/baraline/convst) [![CodeFactor](https://www.codefactor.io/repository/github/baraline/convst/badge/main)](https://www.codefactor.io/repository/github/baraline/convst/overview/main) |
| **Downloads**| [![Downloads](https://pepy.tech/badge/convst)](https://pepy.tech/project/convst) |



<p float="center">
  <img src="https://raw.githubusercontent.com/baraline/convst/main/docs/_static/img/cd_ensemble.png" width="100%" />
</p>

## Installation

The recommended way to install the latest stable version is with pip: `pip install convst`. To install the package from source, download the latest version from GitHub and run `python setup.py install`. This should install the package and automatically resolve the dependencies using `pip`.

We recommend doing this in a new virtual environment using anaconda to avoid any conflict with an existing installation. If you wish to install dependencies individually, you can see dependencies in the `setup.py` file.

An optional dependency that can help speed up numba, which is used in our implementation, is the Intel vector math library (SVML). When using conda it can be installed by running `conda install -c numba icc_rt`. I didn't test the behavior with AMD processors, but I suspect it won't work.

If you are using RDST in specific settings such as an HPC cluster and are getting errors, take a look at [issue #24](https://github.com/baraline/convst/issues/24). You may need to change the numba compilation settings to disable function caching (see [this example](https://github.com/baraline/convst/blob/main/examples/Changing_numba_options.py)). This should be fixed as of v0.3.0.
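If caching errors persist, a different workaround is to point numba's on-disk cache at a writable location. The sketch below is not the approach from the linked example and does not use any convst API: it relies only on numba's documented `NUMBA_CACHE_DIR` environment variable. It assumes the errors come from an unwritable cache directory, and the directory name is hypothetical.

```python
import os
import tempfile

# Assumption: the caching errors are caused by a cache location that is not
# writable on the cluster. The directory name below is purely illustrative.
cache_dir = os.path.join(tempfile.gettempdir(), "numba_cache_convst")
os.makedirs(cache_dir, exist_ok=True)

# Set the variable before numba (and therefore convst) is imported, so that
# numba picks it up when it reads its configuration.
os.environ["NUMBA_CACHE_DIR"] = cache_dir

# Import convst only after the variable is set, for example:
# from convst.classifiers import R_DST_Ridge
```

On a shared cluster you would typically replace the temporary directory with a per-user or per-job scratch path.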


## Tutorial
Here is a minimal example that runs the `RDST` algorithm on any dataset of the UCR archive, using the aeon API to fetch datasets:

```python

from convst.classifiers import R_DST_Ridge
from convst.utils.dataset_utils import load_UCR_UEA_dataset_split

X_train, X_test, y_train, y_test, _ = load_UCR_UEA_dataset_split('GunPoint')

# First run may be slow due to numba compilations on the first call. 
# Run a small dataset like GunPoint if this is the first time you call RDST on your system.
# You can change n_shapelets to 1 to make this process faster. The n_jobs parameter can
# also be changed to increase speed once numba compilations are done.

rdst = R_DST_Ridge(n_shapelets=10_000, n_jobs=1).fit(X_train, y_train)
print("Accuracy Score for RDST : {}".format(rdst.score(X_test, y_test)))
```
If you want a more powerful model, you can use R_DST_Ensemble as follows (note that additional Numba compilation might be needed here):
```python

from convst.classifiers import R_DST_Ensemble

rdst_e = R_DST_Ensemble(
  n_shapelets_per_estimator=10_000,
  n_jobs=1
).fit(X_train, y_train)
print("Accuracy Score for RDST : {}".format(rdst_e.score(X_test, y_test)))

```
You can obtain faster results by using more jobs, and even faster ones, at the expense of some accuracy, with the `prime_dilations` option:

```python
rdst_e = R_DST_Ensemble(
  n_shapelets_per_estimator=10_000,
  prime_dilations=True,
  n_jobs=-1
).fit(X_train, y_train)

print("Accuracy Score for RDST : {}".format(rdst_e.score(X_test, y_test)))
```
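For intuition about what dilation means and what the `prime_dilations` option restricts, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the package's implementation: the helper names (`dilated_distance_vector`, `candidate_dilations`, `is_prime`) are hypothetical, the Manhattan distance is chosen for simplicity, and treating 1 as an allowed dilation when restricting to primes is an assumption.

```python
import numpy as np


def dilated_distance_vector(x, shapelet, dilation):
    """Distance between a shapelet and every dilated subsequence of x.

    With dilation d, the shapelet is compared to x[i], x[i + d], x[i + 2d], ...
    (Manhattan distance is an assumption made for this sketch.)
    """
    x = np.asarray(x, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    length = shapelet.shape[0]
    span = (length - 1) * dilation  # gap between first and last sampled point
    n_positions = x.shape[0] - span
    if n_positions <= 0:
        raise ValueError("dilated shapelet is longer than the series")
    # Row i holds the indices i, i + d, ..., i + (length - 1) * d
    idx = np.arange(n_positions)[:, None] + dilation * np.arange(length)
    return np.abs(x[idx] - shapelet).sum(axis=1)


def is_prime(n):
    """Trial-division primality check, sufficient for small dilations."""
    if n < 2:
        return False
    return all(n % k for k in range(2, int(n ** 0.5) + 1))


def candidate_dilations(series_length, shapelet_length, primes_only=False):
    """Dilations for which the dilated shapelet still fits in the series."""
    max_dilation = (series_length - 1) // (shapelet_length - 1)
    dilations = list(range(1, max_dilation + 1))
    if primes_only:
        # Assumption: 1 is kept alongside the prime dilations.
        dilations = [d for d in dilations if d == 1 or is_prime(d)]
    return dilations


x = np.arange(10.0)
shapelet = np.array([0.0, 2.0, 4.0])
dists = dilated_distance_vector(x, shapelet, dilation=2)
# Two simple features of the distance vector: best match and its location.
features = (dists.min(), int(dists.argmin()))
```

Restricting the candidate set to primes leaves fewer distinct dilations to sample from, which is one plausible reason for the speed and accuracy trade-off described above.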
You can also visualize a shapelet with the visualization tool, which produces a figure such as:

![Example of shapelet visualization](https://raw.githubusercontent.com/baraline/convst/main/docs/_static/img/shp_vis.png)

To learn more about all the interpretability tools, check the documentation on Read the Docs (https://convst.readthedocs.io/).

## Supported inputs
RDST supports the following types of time series:
- Univariate and same length
- Univariate and variable length
- Multivariate and same length
- Multivariate and variable length

We use the standard scikit-learn interface and expect as input a 3D numpy array of shape `(n_samples, n_features, n_timestamps)`. For variable length input, we expect a (python) list of numpy arrays, or a numpy array with object dtype.
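The input formats described above can be built with plain NumPy, as in the following sketch. The data is random and purely illustrative. The assumption that each variable-length series is a 2D array of shape `(n_features, n_timestamps_i)` is ours; the text only specifies a list of numpy arrays or an object-dtype array.

```python
import numpy as np

rng = np.random.default_rng(0)

# Univariate, same length: start from a 2D (n_samples, n_timestamps) array
# and add a feature axis to obtain (n_samples, 1, n_timestamps).
X_2d = rng.normal(size=(5, 100))
X_univariate = X_2d[:, np.newaxis, :]

# Multivariate, same length: (n_samples, n_features, n_timestamps).
X_multivariate = rng.normal(size=(5, 3, 100))

# Variable length: a Python list of arrays, one per sample.
# Assumption: each element has shape (n_features, n_timestamps_i).
lengths = [80, 100, 120]
X_variable = [rng.normal(size=(3, n)) for n in lengths]

# The same data as a NumPy array with object dtype. Filling element by
# element avoids NumPy trying to broadcast the ragged arrays.
X_variable_obj = np.empty(len(X_variable), dtype=object)
for i, series in enumerate(X_variable):
    X_variable_obj[i] = series
```

Any of these arrays could then be passed as `X_train` to `fit` in the tutorial examples, together with a matching label vector.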

## Reproducing the paper results

Multiple scripts are available under the `PaperScripts` folder. It contains the exact scripts used to generate our results, notably the `test_models.py` file, which produced the CSV results available in the `Results` folder of the archive.

## Contributing, Citing and Contact

If you are experiencing bugs in the RDST implementation, or would like to contribute in any way, please create an issue or pull request in this repository.
For other questions, or to contact me, you can email me at antoine.guillaume45@gmail.com

If you use our algorithm or publication in any work, please cite the following paper (ArXiv version https://arxiv.org/abs/2109.13514):

```bibtex
@InProceedings{10.1007/978-3-031-09037-0_53,
author="Guillaume, Antoine
and Vrain, Christel
and Elloumi, Wael",
title="Random Dilated Shapelet Transform: A New Approach for Time Series Shapelets",
booktitle="Pattern Recognition and Artificial Intelligence",
year="2022",
publisher="Springer International Publishing",
address="Cham",
pages="653--664",
abstract="Shapelet-based algorithms are widely used for time series classification because of their ease of interpretation, but they are currently outperformed by recent state-of-the-art approaches. We present a new formulation of time series shapelets including the notion of dilation, and we introduce a new shapelet feature to enhance their discriminative power for classification. Experiments performed on 112 datasets show that our method improves on the state-of-the-art shapelet algorithm, and achieves comparable accuracy to recent state-of-the-art approaches, without sacrificing neither scalability, nor interpretability.",
isbn="978-3-031-09037-0"
}
```

To cite the RDST Ensemble method, you can cite the PhD thesis in which it is presented (soon to be available; the citation format may change):
```bibtex
@phdthesis{Guillaume2023,
  author="Guillaume, Antoine", 
  title="Time series classification with Shapelets: Application to predictive maintenance on event logs",
  school="University of Orléans",
  year="2023",
  url="https://www.theses.fr/s265104"
}
```

## TODO for release 1.0:

- [ ] Finish Numpy docs in all python files
- [ ] Update documentation and examples
- [X] Enhance interface for interpretability tools
- [X] Add the Generalised version of RDST
- [ ] Continue unit tests and code coverage/quality

## Citations

Here are the code-related citations that were not made in the paper:

[1]: [The Scikit-learn development team, "Scikit-learn: Machine Learning in Python", Journal of Machine Learning Research 2011](https://scikit-learn.org/stable/)

[2]: [The NumPy development team, "Array programming with NumPy", Nature 2020](https://numpy.org/)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/baraline/convst",
    "name": "convst",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8,<3.11",
    "maintainer_email": "",
    "keywords": "data-science,machine-learning,data-mining,time-series,shapelets,time-series-classification",
    "author": "Antoine Guillaume",
    "author_email": "Antoine Guillaume <antoine.guillaume45@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/e0/1a/08923a9e5f2fb9d407996dedf88ee39641867ceb34ebe406ad2f2f39221e/convst-0.3.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD 2-Clause License  Copyright (c) 2021, Antoine Guillaume All rights reserved.  Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ",
    "summary": "The Random Dilation Shapelet Transform algorithm and associated works",
    "version": "0.3.0",
    "project_urls": {
        "Download": "https://pypi.org/project/convst/#files",
        "Homepage": "https://github.com/baraline/convst",
        "documentation": "https://convst.readthedocs.io/",
        "download": "https://pypi.org/project/convst/#files",
        "repository": "https://github.com/baraline/convst"
    },
    "split_keywords": [
        "data-science",
        "machine-learning",
        "data-mining",
        "time-series",
        "shapelets",
        "time-series-classification"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8a5c0371a1d1961c2f38596a83cad4ca3b7dbf736c842e1ded5743051ef7b5a4",
                "md5": "5007e223a5bdf899b69bbbb9693842be",
                "sha256": "ff0eb67fcad797cfeb09c917439531c059c2c16dfcb6b3705fdbc29bc572b1f1"
            },
            "downloads": -1,
            "filename": "convst-0.3.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5007e223a5bdf899b69bbbb9693842be",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<3.11",
            "size": 55936,
            "upload_time": "2023-06-13T09:28:16",
            "upload_time_iso_8601": "2023-06-13T09:28:16.660833Z",
            "url": "https://files.pythonhosted.org/packages/8a/5c/0371a1d1961c2f38596a83cad4ca3b7dbf736c842e1ded5743051ef7b5a4/convst-0.3.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e01a08923a9e5f2fb9d407996dedf88ee39641867ceb34ebe406ad2f2f39221e",
                "md5": "ea91b951fc60c9a26dc6b018c748d758",
                "sha256": "a4f37d8a4eecb92b6b25c07349d8c097ba85979306bd7d4493ba2143d1b25c74"
            },
            "downloads": -1,
            "filename": "convst-0.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "ea91b951fc60c9a26dc6b018c748d758",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<3.11",
            "size": 46000,
            "upload_time": "2023-06-13T09:28:18",
            "upload_time_iso_8601": "2023-06-13T09:28:18.036696Z",
            "url": "https://files.pythonhosted.org/packages/e0/1a/08923a9e5f2fb9d407996dedf88ee39641867ceb34ebe406ad2f2f39221e/convst-0.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-13 09:28:18",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "baraline",
    "github_project": "convst",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "convst"
}
        