GPUDTW

Name: GPUDTW
Version: 0.13
Home page: https://github.com/qianlikzf/GPUDTW
Summary: GPU-accelerated dynamic time warping (DTW)
Upload time: 2024-03-08 14:52:01
Author: Wang Zihao
License: GNU GENERAL PUBLIC LICENSE Version 3
Keywords: dtw, gpu, opencl, cuda
Requirements: No requirements were recorded.
# Project description

Dynamic Time Warping (DTW) is a mathematical technique used to compare two temporal sequences that do not align perfectly. Detailed introductions and explanations of the algorithm can be found [here](https://builtin.com/data-science/dynamic-time-warping). For long sequences, DTW is computationally intensive and can require significant processing time. With GPU acceleration, however, DTW can be computed hundreds of times faster. Using this package, the DTW distances of millions of one-dimensional sequences can be calculated in just a few seconds.

This package is a library that enables the use of GPUs for accelerated dynamic time warping (DTW), supporting both CUDA and OpenCL. This means you can reap the benefits of GPU acceleration on common NVIDIA graphics cards, as well as use AMD or Intel hardware for compute acceleration.

This software has been optimized for GPU memory. If your arrays exceed the available GPU memory, the software automatically splits the data into smaller chunks and processes them separately. As long as your host machine's memory can hold your data, you do not need to worry about the GPU's memory capacity, even if your GPU has only 512 MB of memory.
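
The chunking behaviour described above can be sketched as follows; this is a simplified illustration with a hypothetical `process_in_chunks` helper and memory budget, not the library's actual implementation:

~~~python
import numpy

def process_in_chunks(data, worker, max_bytes=64 * 1024 * 1024):
    """Split `data` along its first axis so each chunk fits a memory
    budget, run `worker` on every chunk, and stitch the results back
    together. Illustrative only; GPUDTW's internals may differ."""
    row_bytes = data[0].nbytes
    rows_per_chunk = max(1, max_bytes // row_bytes)
    results = [worker(data[i:i + rows_per_chunk])
               for i in range(0, len(data), rows_per_chunk)]
    return numpy.concatenate(results, axis=0)

# Doubling each row chunk by chunk gives the same result as doubling
# the whole array at once, which is the point of transparent chunking.
x = numpy.random.random((1000, 256)).astype(numpy.float32)
out = process_in_chunks(x, lambda c: c * 2, max_bytes=256 * 1024)
~~~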

The software has been tested on a variety of GPUs, from the older GeForce 9800 GT to the modern RTX 4090, and both work effectively. AMD's high-end and low-end graphics cards, as well as Intel's integrated graphics (IGP), can also use the OpenCL functionality of this software.

# Hardware specifications

This software is compatible with a wide range of modern PC hardware, including portable laptops, and can take advantage of acceleration provided by the CPU's integrated graphics.

# Software requirements

To use this package, first install the CORRECT graphics card drivers, because some drivers shipped with Windows do not support OpenCL. You can use the tool GPU-Z to check whether your graphics card supports OpenCL. On AMD graphics cards, it is recommended to install the Adrenalin driver rather than the PRO driver.

This software requires either the pycuda or the pyopencl Python module. Install whichever matches your acceleration needs; it is not necessary to install both. The installation commands are:
~~~bash
pip install pycuda
pip install pyopencl
~~~
Note: You may need to install the CUDA Toolkit before installing pycuda. Although pycuda and pyopencl are the core dependencies of this package, they are not declared as hard requirements in its setup, in order to reduce installation difficulty and give users more options.
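
Since neither module is a hard dependency, code that uses GPUDTW typically probes for whichever backend is importable, in the same spirit as the try/except blocks in **test.py** (the `backend` variable below is just for illustration):

~~~python
# Probe for an available GPU backend without making either module a
# hard requirement. `backend` is a name used only in this sketch.
backend = None
try:
    import pycuda  # noqa: F401  (needs the CUDA Toolkit installed)
    backend = "cuda"
except ImportError:
    try:
        import pyopencl  # noqa: F401  (needs working OpenCL drivers)
        backend = "opencl"
    except ImportError:
        pass

print("GPU backend available:", backend)
~~~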

Then, you can choose from the following two options:

## CUDA
When using the CUDA acceleration feature of this software, you need to install the CUDA Toolkit. Any version compatible with your graphics card will work, even the older 6.5 release.

## OpenCL
Using OpenCL is simpler: it only requires the correct graphics card drivers.

The OpenCL.dll version 2.0.4.0 bundled with Windows 10 is not compatible with the latest pyopencl module. To resolve this, locate the replacement OpenCL.dll (version 3.0.3.0) shipped in the module's installation directory, typically "lib/site-packages/GPUDTW", and manually copy it into the system32 subdirectory of your Windows directory, overwriting the existing 2.0.4.0 version.

If you have multiple graphics cards, or a CPU with integrated graphics, you may be prompted to select which device to use for accelerated computing. Choose the stronger graphics card based on your knowledge of your hardware's performance. On laptops with an 11th-generation or newer Intel processor, selecting the Intel IGP may yield unexpectedly good results.
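
To skip the interactive prompt entirely under pyopencl, you can pin the device beforehand with the `PYOPENCL_CTX` environment variable, the same variable shown commented out in **test.py** (the value `"0"` below is only an example index):

~~~python
import os

# Select OpenCL platform/device 0 non-interactively. pyopencl reads
# this variable when creating a context, so set it before importing
# or calling any pyopencl-based code. Replace "0" with the index of
# your preferred GPU.
os.environ["PYOPENCL_CTX"] = "0"
~~~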

# How to use

Prepare your data as 2D numpy arrays: the first dimension is the number of sequences, and the second dimension is the length of the vectors to be compared. Your data consists of two arrays, one as the source and the other as the target. The vector length of the source data must equal that of the target data. If they are not equal, you need to align them first; there are many alignment methods, such as upsampling and downsampling.
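
For example, one simple alignment method is linear resampling with `numpy.interp` (a generic technique, not a function provided by GPUDTW; `resample` below is a hypothetical helper):

~~~python
import numpy

def resample(seq, new_len):
    """Linearly resample a 1-D sequence to `new_len` points."""
    old_x = numpy.linspace(0.0, 1.0, len(seq))
    new_x = numpy.linspace(0.0, 1.0, new_len)
    return numpy.interp(new_x, old_x, seq).astype(numpy.float32)

# Stretch an 800-point source sequence to match 1212-point targets.
source = numpy.random.random(800).astype(numpy.float32)
aligned = resample(source, 1212)
~~~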

Then, simply call the **cuda_dtw** or **opencl_dtw** function to compute the DTW Euclidean distance between the two datasets. 

The returned data is a 2D array whose first dimension is the number of source vectors and whose second dimension is the number of target vectors. For instance, the element at position (2, 3) is the DTW Euclidean distance between source vector 2 and target vector 3 (using numpy's 0-based indexing).

Additionally, a CPU-based computation function is provided to validate the results obtained from the GPU. This CPU function uses the numba module for parallel acceleration, so you will need to install it with `pip install numba`.
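
For intuition, and as an independent cross-check, the classic single-pair DTW recurrence with a squared-difference local cost can be written in a few lines of plain Python (this is a textbook sketch, not GPUDTW's own `cpu_dtw`/`dtw_1D_jit2` implementation, which may differ in detail):

~~~python
import numpy

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW: fill the cumulative-cost matrix
    with squared differences; the final square root yields a
    Euclidean-style distance."""
    n, m = len(a), len(b)
    D = numpy.full((n + 1, m + 1), numpy.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Best of insertion, deletion, and match moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return numpy.sqrt(D[n, m])
~~~

Identical sequences have distance 0, and a warped copy of a sequence (e.g. one with a repeated sample) can also reach distance 0, which is exactly what distinguishes DTW from a plain Euclidean distance.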

Examples are available in the unit test script **test.py**:

~~~python
from __future__ import absolute_import
from __future__ import print_function

import os
import time

import numpy

# Each GPU backend is optional; import whichever is available.
# A failed import may raise more than ImportError (e.g. driver
# errors from pycuda/pyopencl), hence the broad except clauses.
try:
    from GPUDTW import cuda_dtw
except Exception:
    pass

try:
    from GPUDTW import opencl_dtw
except Exception:
    pass

try:
    from GPUDTW import cpu_dtw, dtw_1D_jit2
except Exception:
    pass

if __name__ == '__main__':
    # 3 source vectors and 1312 target vectors, each of length 1212.
    S = numpy.random.random((3, 1212)).astype(numpy.float32)
    T = numpy.random.random((1312, 1212)).astype(numpy.float32)

    t0 = time.time()
    ret_cpu = cpu_dtw(S, T, dtw_1D_jit2)
    print("cpu time:", time.time() - t0)

    if 'cuda_dtw' in locals():
        t0 = time.time()
        ret_cuda = cuda_dtw(S, T)
        print("cuda time:", time.time() - t0)
        cuda_verify = numpy.abs(ret_cuda - ret_cpu)
        print("Maximum deviation of CUDA from CPU:", cuda_verify.max())

    # Uncomment to pin the OpenCL device non-interactively:
    # os.environ['PYOPENCL_CTX'] = '0'
    if 'opencl_dtw' in locals():
        t0 = time.time()
        ret_opencl = opencl_dtw(S, T)
        print("OpenCL time:", time.time() - t0)
        opencl_verify = numpy.abs(ret_opencl - ret_cpu)
        print("Maximum deviation of OpenCL from CPU:", opencl_verify.max())
~~~

# Troubleshooting tips

1. If the program does not work properly or crashes, check that your graphics card driver version is correct and that it matches your CUDA Toolkit version. For NVIDIA-related background, see this [link](https://developer.nvidia.com/cuda-gpus).

2. The vector length cannot be excessively large, typically not more than about 2500, due to the limited size of the GPU's high-speed local memory (on-chip memory). If the vectors are too long, the computation buffer no longer fits in local memory, and the program reports an error and exits. The buffer could instead be placed in graphics memory (global memory), but global memory access is significantly slower than local memory, and the advantage of parallel computing would be much less apparent. In practice, DTW is usually applied to time-series data, and series longer than 1000 points are rare, so this limitation is shelved for now. If you do hit this special case, please leave a detailed message on GitHub explaining your work background; if it is convincing, a future version may support computation in global memory.

# Copyright

 Copyright (C) 2024 Wuhan University of Technology

 Authors: Wang Zihao <qianlkzf@outlook.com> 
  
 This program is free software: you can redistribute it and/or modify  
 it under the terms of the GNU General Public License as published by  
 the Free Software Foundation, either version 3 of the License, or  
 (at your option) any later version.  
  
 This program is distributed in the hope that it will be useful,  
 but WITHOUT ANY WARRANTY; without even the implied warranty of  
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the  
 GNU General Public License for more details.  
  
 You should have received a copy of the GNU General Public License  
 along with this program.  If not, see <https://www.gnu.org/licenses/>.  
 
## License  
  
This project is licensed under the [GNU General Public License v3.0](LICENSE).  
  
## GitHub Repository  
  
You can find the source code and more information about this project on GitHub at:  
  
[<img src="https://img.shields.io/badge/GitHub-Repo-blue?logo=github">](https://github.com/qianlikzf/GPUDTW)  
  
Or visit directly:  
  
[https://github.com/qianlikzf/GPUDTW](https://github.com/qianlikzf/GPUDTW)


            
