<div align="center">
<img width="500px" src="https://raw.githubusercontent.com/KernelTuner/kernel_tuner/master/doc/images/KernelTuner-logo.png"/>
</div>
---
Create optimized GPU applications in any mainstream GPU
programming language (CUDA, HIP, OpenCL, OpenACC, OpenMP).
What Kernel Tuner does:
- Works as an external tool to benchmark and optimize GPU kernels in isolation
- Can be used directly on existing kernel code without extensive changes
- Can be used with applications in any host programming language
- Blazing fast search space construction
- More than 20 [optimization algorithms](https://kerneltuner.github.io/kernel_tuner/stable/optimization.html) to speed up tuning
- Energy measurements and optimizations [(power capping, clock frequency tuning)](https://arxiv.org/abs/2211.07260)
- ... and much more, for example [caching](https://kerneltuner.github.io/kernel_tuner/stable/cache_files.html), [output verification](https://kerneltuner.github.io/kernel_tuner/stable/correctness.html), [tuning host and device code](https://kerneltuner.github.io/kernel_tuner/stable/hostcode.html), and [user-defined metrics](https://kerneltuner.github.io/kernel_tuner/stable/metrics.html); see [the full documentation](https://kerneltuner.github.io/kernel_tuner/stable/index.html).
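
To make the idea of a search space concrete: each tunable parameter contributes a list of candidate values, and every valid combination is one kernel configuration to benchmark. The sketch below enumerates such a space in plain Python and filters it with a restriction. It is only an illustration of the concept, not Kernel Tuner's own (much faster) implementation; the helper `build_search_space`, the parameter `tile_size_x`, and the 1024-element limit are assumptions made up for this example.

```python
from itertools import product


def build_search_space(tune_params, restriction=None):
    """Enumerate all parameter combinations, keeping those that pass `restriction`."""
    names = list(tune_params)
    configs = []
    for values in product(*(tune_params[name] for name in names)):
        config = dict(zip(names, values))
        if restriction is None or restriction(config):
            configs.append(config)
    return configs


# Hypothetical tunable parameters, for illustration only
tune_params = {
    "block_size_x": [32, 64, 128, 256, 512],
    "tile_size_x": [1, 2, 4],
}

# Hypothetical restriction: cap the number of elements handled per thread block
space = build_search_space(
    tune_params,
    restriction=lambda p: p["block_size_x"] * p["tile_size_x"] <= 1024,
)
print(f"{len(space)} of {5 * 3} configurations satisfy the restriction")
```

In Kernel Tuner itself, restrictions are passed through the `restrictions` argument of `tune_kernel`; the documentation describes the supported forms.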
## Installation
- First, make sure you have your [CUDA](https://kerneltuner.github.io/kernel_tuner/stable/install.html#cuda-and-pycuda), [OpenCL](https://kerneltuner.github.io/kernel_tuner/stable/install.html#opencl-and-pyopencl), or [HIP](https://kerneltuner.github.io/kernel_tuner/stable/install.html#hip-and-hip-python) compiler installed
- Then type: `pip install kernel_tuner[cuda]`, `pip install kernel_tuner[opencl]`, or `pip install kernel_tuner[hip]`
- or why not all of them: `pip install kernel_tuner[cuda,opencl,hip]`

More information on installation, including for other languages, is available in the [installation guide](http://kerneltuner.github.io/kernel_tuner/stable/install.html).
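
After installing, it can be handy to confirm what actually ended up in the environment. The following is a minimal sketch that uses only the Python standard library; the function names are made up for this example, and the module names checked (`pycuda`, `pyopencl`, `hip`) are assumptions about what the optional extras provide, so adjust them to your setup.

```python
from importlib import metadata, util


def installed_version(package="kernel_tuner"):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def available_backends(modules=("pycuda", "pyopencl", "hip")):
    """Report which backend Python modules can be found, without importing them."""
    return {name: util.find_spec(name) is not None for name in modules}


print("kernel_tuner version:", installed_version())
print("backend modules found:", available_backends())
```

A `None` version means the package is not visible to the current interpreter, which usually points to a different virtual environment than the one `pip` installed into.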
## Example
```python
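import numpy as np
from kernel_tuner import tune_kernel

# CUDA kernel source; block_size_x is a tunable parameter that Kernel Tuner
# inserts as a compile-time constant
kernel_string = """
__global__ void vector_add(float *c, float *a, float *b, int n) {
    int i = blockIdx.x * block_size_x + threadIdx.x;
    if (i<n) {
        c[i] = a[i] + b[i];
    }
}
"""

# problem size: number of elements in each vector
n = np.int32(10000000)

# input vectors and output buffer
a = np.random.randn(n).astype(np.float32)
b = np.random.randn(n).astype(np.float32)
c = np.zeros_like(a)

# kernel arguments, in the order of the kernel signature
args = [c, a, b, n]

# values to try for each tunable parameter
tune_params = {"block_size_x": [32, 64, 128, 256, 512]}

# benchmark every configuration of vector_add
tune_kernel("vector_add", kernel_string, n, args, tune_params)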
```
More [examples here](https://kerneltuner.github.io/kernel_tuner/stable/examples.html).
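
`tune_kernel` returns the benchmarked configurations as a list of dictionaries, together with a dictionary describing the environment. The sketch below shows one way such results could be post-processed: picking the fastest configuration and deriving a throughput figure. The helper names and the sample numbers are made up for illustration, and the code assumes each result carries a `time` entry in milliseconds.

```python
def best_configuration(results, objective="time"):
    """Return the benchmarked configuration with the lowest `objective` value."""
    return min(results, key=lambda r: r[objective])


def gflops(result, n):
    """Throughput of a vector add over n elements (one addition each); time in ms."""
    return (n / 1e9) / (result["time"] / 1e3)


# Made-up sample data, shaped like the list of results from a tuning run
sample_results = [
    {"block_size_x": 32, "time": 0.52},
    {"block_size_x": 128, "time": 0.40},
    {"block_size_x": 512, "time": 0.45},
]

best = best_configuration(sample_results)
print(best["block_size_x"], f"{gflops(best, 10_000_000):.1f} GFLOP/s")
```

With a real run, capture the return value as `results, env = tune_kernel(...)` and pass `results` to the helper in place of the sample data.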
## Resources
- [Full documentation](https://kerneltuner.github.io/kernel_tuner/stable/)
- Guides:
  - [Getting Started](https://kerneltuner.github.io/kernel_tuner/stable/quickstart.html)
  - [Convolution](https://kerneltuner.github.io/kernel_tuner/stable/convolution.html)
  - [Diffusion](https://kerneltuner.github.io/kernel_tuner/stable/diffusion.html)
  - [Matrix Multiplication](https://kerneltuner.github.io/kernel_tuner/stable/matrix_multiplication.html)
- Features & Use cases:
  - [Full list of examples](https://kerneltuner.github.io/kernel_tuner/stable/examples.html)
  - [Output verification](https://kerneltuner.github.io/kernel_tuner/stable/correctness.html)
  - [Test GPU code from Python](https://github.com/KernelTuner/kernel_tuner/blob/master/examples/cuda/test_vector_add.py)
  - [Tune code in both host and device code](https://kerneltuner.github.io/kernel_tuner/stable/hostcode.html)
  - [Optimization algorithms](https://kerneltuner.github.io/kernel_tuner/stable/optimization.html)
  - [Mixed-precision & Accuracy tuning](https://github.com/KernelTuner/kernel_tuner/blob/master/examples/cuda/accuracy.py)
  - [Custom metrics & tuning objectives](https://kerneltuner.github.io/kernel_tuner/stable/metrics.html)
- **Kernel Tuner Tutorial** slides [[PDF]](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/slides/2022_SURF/SURF22-Kernel-Tuner-Tutorial.pdf), hands-on:
  - Vector add example [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/00_Kernel_Tuner_Introduction.ipynb)] [[Colab](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/00_Kernel_Tuner_Introduction.ipynb)]
  - Tuning thread block dimensions [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/01_Kernel_Tuner_Getting_Started.ipynb)] [[Colab](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/01_Kernel_Tuner_Getting_Started.ipynb)]
  - Search space restrictions & output verification [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/02_Kernel_Tuner_Intermediate.ipynb)] [[Colab](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/02_Kernel_Tuner_Intermediate.ipynb)]
  - Visualization & search space optimization [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/03_Kernel_Tuner_Advanced.ipynb)] [[Colab](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/03_Kernel_Tuner_Advanced.ipynb)]
- **Energy Efficient GPU Computing** tutorial slides [[PDF]](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/slides/2023_Supercomputing/SC23.pdf), hands-on:
  - Kernel Tuner for GPU energy measurements [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/energy/00_Kernel_Tuner_Introduction.ipynb)] [[Colab](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/energy/00_Kernel_Tuner_Introduction.ipynb)]
  - Code optimizations for energy [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/energy/01_Code_Optimizations_for_Energy.ipynb)] [[Colab](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/energy/01_Code_Optimizations_for_Energy.ipynb)]
  - Mixed precision and accuracy tuning [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/energy/02_Mixed_precision_programming.ipynb)] [[Colab](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/energy/02_Mixed_precision_programming.ipynb)]
  - Optimizing for time vs. for energy [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/energy/03_energy_efficient_computing.ipynb)] [[Colab](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/energy/03_energy_efficient_computing.ipynb)]
## Kernel Tuner ecosystem
<img width="250px" src="https://raw.githubusercontent.com/KernelTuner/kernel_tuner/master/doc/images/kernel_launcher.png"/><br />C++ magic to integrate auto-tuned kernels into C++ applications

<img width="250px" src="https://raw.githubusercontent.com/KernelTuner/kernel_tuner/master/doc/images/kernel_float.png"/><br />C++ data types for mixed-precision CUDA kernel programming

<img width="275px" src="https://raw.githubusercontent.com/KernelTuner/kernel_tuner/master/doc/images/kernel_dashboard.png"/><br />Monitor, analyze, and visualize auto-tuning runs
## Communication & Contribution
- GitHub [Issues](https://github.com/KernelTuner/kernel_tuner/issues): Bug reports, install issues, feature requests, work in progress
- GitHub [Discussion group](https://github.com/orgs/KernelTuner/discussions): General questions, Q&A, thoughts

Contributions are welcome! For feature requests, bug reports, or usage problems, please feel free to create an issue.
For more extensive contributions, check the [contribution guide](http://kerneltuner.github.io/kernel_tuner/stable/contributing.html).
## Citation
If you use Kernel Tuner in research or research software, please cite the most relevant among the [publications on Kernel
Tuner](https://kerneltuner.github.io/kernel_tuner/stable/#citation). To refer to the project as a whole, please cite:
```latex
@article{kerneltuner,
  author = {Ben van Werkhoven},
  title = {Kernel Tuner: A search-optimizing GPU code auto-tuner},
  journal = {Future Generation Computer Systems},
  year = {2019},
  volume = {90},
  pages = {347-358},
  url = {https://www.sciencedirect.com/science/article/pii/S0167739X18313359},
  doi = {https://doi.org/10.1016/j.future.2018.08.004}
}
```
## Raw data

```json
{
  "_id": null,
  "home_page": null,
  "name": "kernel-tuner",
  "maintainer": null,
  "docs_url": null,
  "requires_python": "<4,>=3.10",
  "maintainer_email": null,
  "keywords": "auto-tuning, gpu, computing, pycuda, cuda, pyopencl, opencl",
  "author": "Ben van Werkhoven",
  "author_email": "b.vanwerkhoven@esciencecenter.nl",
  "download_url": "https://files.pythonhosted.org/packages/0e/23/623c2e19cbe757ab0a323fc3221e60bb3bf4e75448729f477a92d85bcba9/kernel_tuner-1.2.0.tar.gz",
  "platform": null,
  "bugtrack_url": null,
  "license": "Apache-2.0",
  "summary": "An easy to use CUDA/OpenCL kernel tuner in Python",
  "version": "1.2.0",
  "project_urls": {
    "Documentation": "https://KernelTuner.github.io/kernel_tuner/",
    "Homepage": "https://KernelTuner.github.io/kernel_tuner/",
    "Repository": "https://github.com/KernelTuner/kernel_tuner",
    "changelog": "https://github.com/KernelTuner/kernel_tuner/blob/master/CHANGELOG.md",
    "issues": "https://github.com/KernelTuner/kernel_tuner/issues"
  },
  "split_keywords": [
    "auto-tuning",
    " gpu",
    " computing",
    " pycuda",
    " cuda",
    " pyopencl",
    " opencl"
  ],
  "urls": [
    {
      "comment_text": null,
      "digests": {
        "blake2b_256": "18f1b108aede496b91858fcd57c7656298a1dffd64fb69f55de2f65e9c651db2",
        "md5": "f1db356c09e51f24c628acfa00f21afb",
        "sha256": "91cd7ce0ccd0904e600af1e92b20394c0a832af90c6bf01248f41606328d560f"
      },
      "downloads": -1,
      "filename": "kernel_tuner-1.2.0-py3-none-any.whl",
      "has_sig": false,
      "md5_digest": "f1db356c09e51f24c628acfa00f21afb",
      "packagetype": "bdist_wheel",
      "python_version": "py3",
      "requires_python": "<4,>=3.10",
      "size": 159228,
      "upload_time": "2025-07-17T08:24:47",
      "upload_time_iso_8601": "2025-07-17T08:24:47.105213Z",
      "url": "https://files.pythonhosted.org/packages/18/f1/b108aede496b91858fcd57c7656298a1dffd64fb69f55de2f65e9c651db2/kernel_tuner-1.2.0-py3-none-any.whl",
      "yanked": false,
      "yanked_reason": null
    },
    {
      "comment_text": null,
      "digests": {
        "blake2b_256": "0e23623c2e19cbe757ab0a323fc3221e60bb3bf4e75448729f477a92d85bcba9",
        "md5": "76fc6cdb6de4de5c1d25424dad008a77",
        "sha256": "7d868c36561c090bfc7d83df578edf7b9ead05bee2147e849b56dc8139473cca"
      },
      "downloads": -1,
      "filename": "kernel_tuner-1.2.0.tar.gz",
      "has_sig": false,
      "md5_digest": "76fc6cdb6de4de5c1d25424dad008a77",
      "packagetype": "sdist",
      "python_version": "source",
      "requires_python": "<4,>=3.10",
      "size": 163564,
      "upload_time": "2025-07-17T08:24:48",
      "upload_time_iso_8601": "2025-07-17T08:24:48.738394Z",
      "url": "https://files.pythonhosted.org/packages/0e/23/623c2e19cbe757ab0a323fc3221e60bb3bf4e75448729f477a92d85bcba9/kernel_tuner-1.2.0.tar.gz",
      "yanked": false,
      "yanked_reason": null
    }
  ],
  "upload_time": "2025-07-17 08:24:48",
  "github": true,
  "gitlab": false,
  "bitbucket": false,
  "codeberg": false,
  "github_user": "KernelTuner",
  "github_project": "kernel_tuner",
  "travis_ci": false,
  "coveralls": true,
  "github_actions": true,
  "lcname": "kernel-tuner"
}
```