Name | mlstm-kernels |
Version | 2.0.1 |
home_page | None |
Summary | A library providing fast and efficient mLSTM kernels for the xLSTM. |
upload_time | 2025-07-29 05:34:25 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.11 |
license | NXAI COMMUNITY LICENSE AGREEMENT
Preamble 1
We are proud to present the NXAI xLSTM 7B model and software, demonstrating the strength of next-generation RNN-based large language models, delivering high-quality performance and fast inference speeds. While xLSTM 7B is freely available for open research and development, we believe that organizations significantly benefiting from our technology should contribute back. Our goal is to support research, small and medium-sized enterprises (SMEs), and open innovation, while ensuring that large enterprises who incorporate xLSTM 7B into commercial products or services fairly compensate the creators for their research and development efforts.
Linz, December 12, 2024.
Preamble 2
The NXAI COMMUNITY LICENSE AGREEMENT is based on the META LLAMA 3 COMMUNITY LICENSE AGREEMENT and contains some modifications, especially Section 2, “Additional Commercial Terms” is different.
“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the NXAI Materials set forth herein.
“Documentation” means the specifications, manuals and documentation accompanying NXAI Materials distributed by NXAI at https://github.com/NX-AI/.
“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
“NXAI Materials” means, collectively, NXAI’s proprietary large language models, algorithms and any Software, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and all other work of NXAI in the field of neural networks, Documentation (and any portion thereof) made available under this Agreement.
“NXAI” or “we” means NXAI GmbH, Linz, Austria.
By using or distributing any portion or element of the NXAI Materials, you agree to be bound by this Agreement.
1. License Rights and Redistribution.
a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under NXAI’s intellectual property embodied in the NXAI Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the NXAI Materials.
b. Redistribution and Use.
i. If you distribute or make available the NXAI Materials (or any derivative works thereof), or a product or service that uses any of them, including another AI model, you shall (A) provide a copy of this Agreement with any such NXAI Materials; and (B) prominently display “Built with technology from NXAI” on a related website, user interface, blogpost, about page, or product documentation.
ii. If you receive NXAI Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.
iii. You must retain in all copies of the NXAI Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “This product includes materials developed at NXAI that are licensed under the NXAI Community License, Copyright © NXAI GmbH, All Rights Reserved.”
2. Additional Commercial Terms. If (a) the Licensee, on a consolidated basis (including parent, subsidiaries, and affiliates), exceeds the annual revenue of one hundred million Euros (€100,000,000), and (b) the Licensee incorporates NXAI Material, in whole or in part, into a Commercial Product or Service, then the Licensee must obtain a commercial license from NXAI, which NXAI may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until NXAI otherwise expressly grants you such rights
3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE NXAI MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND NXAI DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE NXAI MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE NXAI MATERIALS AND ANY OUTPUT AND RESULTS.
4. Limitation of Liability. IN NO EVENT WILL NXAI OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF NXAI OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
5. Intellectual Property.
a. No trademark licenses are granted under this Agreement, and in connection with the NXAI Materials, neither NXAI nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the NXAI Materials or as set forth in this Section 5(a). NXAI hereby grants you a license to use “NXAI” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. All goodwill arising out of your use of the Mark will insure to the benefit of NXAI.
b. Subject to NXAI’s ownership of NXAI Materials and derivatives made by or for NXAI, with respect to any derivative works and modifications of the NXAI Materials that are made by you, as between you and NXAI, you are and will be the owner of such derivative works and modifications.
c. If you institute litigation or other proceedings against NXAI or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the NXAI Materials or models released by NXAI outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless NXAI from and against any claim by any third party arising out of or related to your use or distribution of the NXAI Materials.
6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the NXAI Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. NXAI may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the NXAI Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.
7. Governing Law and Jurisdiction. This Agreement shall be governed by and construed in accordance with the laws of the Republic of Austria, without regard to its conflict of laws principles. The courts located in Linz, Austria shall have exclusive jurisdiction over any disputes arising out of or in connection with this Agreement.
====================================================================================================
This product includes software licensed under the MIT License:
MIT License
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files
(the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software,
and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
====================================================================================================
This product includes software licensed under the BSD-3-Clause License.
BSD 3-Clause License
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
keywords | mlstm, xlstm, lstm, transformer, machine learning, deep learning, state space models |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# Tiled Flash Linear Attention - mLSTM Kernels
<img src="./res/Figure_1-7.svg" width="350px" alt="xLSTM Figure 1"> <img src="./res/Figure 2 - paper.svg" width="400px" alt="xLSTM Figure 2">
>Paper: [https://arxiv.org/abs/2503.14376](https://arxiv.org/abs/2503.14376)
>
>Authors: Maximilian Beck, Korbinian Pöppel, Phillip Lippe, Sepp Hochreiter
## About
This library provides fast and efficient Triton kernels for mLSTM training and inference.
The chunkwise-parallel mLSTM kernels are built on Tiled Flash Linear Attention (TFLA).
This repository also contains an easy-to-extend library for runtime benchmarks, which we use to benchmark our mLSTM kernels as well as full mLSTM Hugging Face models.
## mLSTM Kernel Library Overview
At its core, the mLSTM kernel library contains several implementations of the mLSTM in JAX and PyTorch, as well as kernels in Triton,
which form the three top-level modules of the `mlstm_kernels` library:
- `jax`: Contains JAX native implementations of the mLSTM, as well as JAX Triton integrations.
- `torch`: Contains PyTorch native implementations of the mLSTM, as well as the Triton integrations for PyTorch. It also contains the configurable PyTorch backend module for simple integration of the mLSTM kernels into your models (see below for further details).
- `triton`: Contains the Triton kernels for the mLSTM, as well as kernel launch parameter heuristics.
The `utils` module contains code for unit tests, additional analysis (such as the transfer behavior analysis from the TFLA paper), and the benchmark library, which is discussed in detail below.
Each of the three top-level modules contains three types of implementations and kernels for the mLSTM:
- `chunkwise`: Chunkwise kernels that process chunks of the sequence in parallel. These include the TFLA kernels.
- `parallel`: Parallel kernels that process a sequence in parallel (like attention). The runtime of these kernels scales quadratically with sequence length.
- `recurrent`: Recurrent step kernels for text generation during inference.
## Benchmark of TFLA mLSTM kernels
Runtime comparison of mLSTM chunkwise kernels against other baselines on an NVIDIA H100 GPU with a constant number of tokens.
This means that as we increase the sequence length on the x-axis, we proportionally decrease the batch size to keep the overall number of tokens constant. This is the same setup as, for example, in FlashAttention 3.
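The snippet below is a minimal sketch of this constant-token setup; the token budget of `2**17` is a hypothetical value chosen for illustration, not the one used in the paper.

```python
# Constant-token benchmark setup: shrink the batch size as the sequence
# length grows so that batch_size * seq_len stays fixed.
TOKEN_BUDGET = 2**17  # hypothetical budget; the paper may use a different value

for seq_len in (512, 1024, 2048, 4096, 8192):
    batch_size = TOKEN_BUDGET // seq_len
    print(f"seq_len={seq_len:5d}  batch_size={batch_size:4d}  tokens={batch_size * seq_len}")
```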

**Left**: Forward pass
**Right**: Forward and backward pass
### Kernel description
We benchmark the two mLSTM versions: mLSTM with exponential input gate (mLSTMexp) and mLSTM with sigmoid input gate (mLSTMsig).
- **mLSTMexp (limit chunk)**: mLSTMexp kernel with limited chunk size (`chunk_size=64`).
- **mLSTMexp (TFLA XL chunk)**: mLSTMexp TFLA kernel with unlimited chunk size (in this benchmark `chunk_size=128`).
- **mLSTMsig (TFLA XL chunk)**: mLSTMsig TFLA kernel with unlimited chunk size (in this benchmark `chunk_size=128`).
> In the following, `limit_chunk` refers to chunkwise kernels that are limited in chunk size, and `xl_chunk` refers to TFLA kernels.
For more details we refer to the TFLA paper.
## Installation
You can find the conda environment files in the `envs/` folder. We recommend using the latest file, i.e. `environment_pt251cu124.yaml`.
Then you can install the mLSTM kernels via pip (`pip install mlstm_kernels`) or by cloning the repository.
## How to use and integrate our mLSTM kernels
In this library we provide PyTorch, JAX, and Triton implementations of the mLSTM.
For the Triton kernels, we provide wrappers in PyTorch and JAX.
There are three options to use our implementations and kernels:
### Option 1 (Recommended): Use via backend module
This is the recommended option if you want to use our mLSTM kernels in your own (language) model.
The backend module is implemented in `mlstm_kernels/torch/backend_module.py` and provides a configurable wrapper around all our mLSTM implementations and kernels.
>Note: This is also how these kernels are integrated in our official implementation of the xLSTM 7B model (see [xLSTM 7B model.py](https://github.com/NX-AI/xlstm/blob/main/xlstm/xlstm_large/model.py)).
It allows you to switch between training and inference mode and automatically selects the respective kernels.
For example, the following code snippet configures the `mLSTMBackend` to use our TFLA mLSTMexp kernel:
```python
import torch

# the backend classes live in mlstm_kernels/torch/backend_module.py
from mlstm_kernels.torch.backend_module import mLSTMBackend, mLSTMBackendConfig

# we use the mLSTMexp TFLA kernel
# we also configure to use the triton step kernel for inference
mlstm_backend_config = mLSTMBackendConfig(
    chunkwise_kernel="chunkwise--triton_xl_chunk",
    sequence_kernel="native_sequence__triton",
    step_kernel="triton",
    chunk_size=256,
    return_last_states=False,
)
mlstm_backend = mLSTMBackend(mlstm_backend_config)

# run the backend
DEVICE = torch.device("cuda")
DTYPE = torch.bfloat16
B = 2       # batch size
S = 512     # sequence length
DHQK = 128  # query/key head dimension
DHHV = 256  # value head dimension
NH = 4      # number of heads

# create input tensors
torch.manual_seed(1)
matQ = torch.randn((B, NH, S, DHQK), dtype=DTYPE, device=DEVICE)
matK = torch.randn((B, NH, S, DHQK), dtype=DTYPE, device=DEVICE)
matV = torch.randn((B, NH, S, DHHV), dtype=DTYPE, device=DEVICE)
vecI = torch.randn((B, NH, S), dtype=DTYPE, device=DEVICE)        # input gate preactivations
vecF = 3.0 + torch.randn((B, NH, S), dtype=DTYPE, device=DEVICE)  # forget gate preactivations

matH = mlstm_backend(q=matQ, k=matK, v=matV, i=vecI, f=vecF)
```
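As a hedged sketch of the inference-oriented configuration: the exact return structure when `return_last_states=True` is an assumption here (check `mlstm_kernels/torch/backend_module.py`), so treat this as illustrative only.

```python
# Sketch only: assumes that with return_last_states=True the backend also
# returns the final recurrent states alongside the outputs; verify against
# mlstm_kernels/torch/backend_module.py before relying on this.
config_with_states = mLSTMBackendConfig(
    chunkwise_kernel="chunkwise--triton_xl_chunk",
    sequence_kernel="native_sequence__triton",
    step_kernel="triton",
    chunk_size=256,
    return_last_states=True,
)
backend_with_states = mLSTMBackend(config_with_states)
out = backend_with_states(q=matQ, k=matK, v=matV, i=vecI, f=vecF)  # outputs (+ last states)
```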
**Quickstart**: Have a look at the demo notebook `demo/integrate_mlstm_via_backend_module_option1.ipynb`.
### Option 2: Direct import
If you want to use a specific kernel directly, you can import it from the respective module.
The following code snippet imports the TFLA mLSTMexp kernel and runs a forward pass.
```python
import torch
# directly import mLSTMexp TFLA kernel
from mlstm_kernels.torch.chunkwise.triton_xl_chunk import mlstm_chunkwise__xl_chunk
# run the kernel
DEVICE = torch.device("cuda")
DTYPE = torch.bfloat16
B = 2
S = 512
DHQK = 128
DHHV = 256
NH = 4
torch.manual_seed(1)
matQ = torch.randn((B, NH, S, DHQK), dtype=DTYPE, device=DEVICE)
matK = torch.randn((B, NH, S, DHQK), dtype=DTYPE, device=DEVICE)
matV = torch.randn((B, NH, S, DHHV), dtype=DTYPE, device=DEVICE)
vecI = torch.randn((B, NH, S), dtype=DTYPE, device=DEVICE)
vecF = 3.0 + torch.randn((B, NH, S), dtype=DTYPE, device=DEVICE)
matH1 = mlstm_chunkwise__xl_chunk(
    q=matQ, k=matK, v=matV, i=vecI, f=vecF, return_last_states=False, chunk_size=256
)
```
### Option 3: Select the kernel via the kernel specifier
You can also get a specific kernel function via its kernel specifier.
First, display all available kernels via `get_available_mlstm_kernels()`.
This displays all kernels that can be used for training and that share a similar function signature, so they can be used interchangeably.
```python
# display all available mlstm chunkwise and parallel kernels
from mlstm_kernels.torch import get_available_mlstm_kernels
get_available_mlstm_kernels()
```
```
['chunkwise--native_autograd',
 'chunkwise--native_custbw',
 'chunkwise--triton_limit_chunk',
 'chunkwise--triton_xl_chunk',
 'chunkwise--triton_xl_chunk_siging',
 'parallel--native_autograd',
 'parallel--native_custbw',
 'parallel--native_stablef_autograd',
 'parallel--native_stablef_custbw',
 'parallel--triton_limit_headdim',
 'parallel--native_siging_autograd',
 'parallel--native_siging_custbw']
```
Then select a kernel via `get_mlstm_kernel()`:
```python
# select the kernel
from mlstm_kernels.torch import get_mlstm_kernel
mlstm_chunkwise_xl_chunk = get_mlstm_kernel("chunkwise--triton_xl_chunk")
matH2 = mlstm_chunkwise_xl_chunk(
    q=matQ, k=matK, v=matV, i=vecI, f=vecF, return_last_states=False, chunk_size=256
)
torch.allclose(matH1, matH2, atol=1e-3, rtol=1e-3) # True
```
**Quickstart for Options 2 and 3**: Have a look at the demo notebook `demo/integrate_mlstm_via_direct_import_option2and3.ipynb`.
### Using the JAX wrappers
The JAX module `mlstm_kernels.jax` mirrors the PyTorch module `mlstm_kernels.torch` and can be used in the same way as the PyTorch kernels in Option 2.
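The following is a minimal sketch under the assumption that the mirrored JAX path `mlstm_kernels.jax.chunkwise.triton_xl_chunk` exposes the same `mlstm_chunkwise__xl_chunk` function with the same keyword arguments as its PyTorch counterpart; verify the import path before relying on it.

```python
import jax
import jax.numpy as jnp

# assumed mirrored import path of the PyTorch kernel
# mlstm_kernels.torch.chunkwise.triton_xl_chunk.mlstm_chunkwise__xl_chunk
from mlstm_kernels.jax.chunkwise.triton_xl_chunk import mlstm_chunkwise__xl_chunk

B, NH, S, DHQK, DHHV = 2, 4, 512, 128, 256
kq, kk, kv, ki, kf = jax.random.split(jax.random.PRNGKey(1), 5)

matQ = jax.random.normal(kq, (B, NH, S, DHQK), dtype=jnp.bfloat16)
matK = jax.random.normal(kk, (B, NH, S, DHQK), dtype=jnp.bfloat16)
matV = jax.random.normal(kv, (B, NH, S, DHHV), dtype=jnp.bfloat16)
vecI = jax.random.normal(ki, (B, NH, S), dtype=jnp.bfloat16)
vecF = 3.0 + jax.random.normal(kf, (B, NH, S), dtype=jnp.bfloat16)

matH = mlstm_chunkwise__xl_chunk(
    q=matQ, k=matK, v=matV, i=vecI, f=vecF, return_last_states=False, chunk_size=256
)
```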
<!-- We also aim provide a backend module for Flax soon. -->
## Benchmark Library
The module `mlstm_kernels.utils.benchmark` contains a configurable benchmark library for benchmarking the runtime and GPU memory usage of kernels or models.
We use this library for all our benchmarks in the TFLA paper and the xLSTM 7B paper.
### Overview
**Step 1:** To begin, please have a look at `mlstm_kernels/utils/benchmark/benchmarks/interface.py`.
At the core of the benchmark library is the `BenchmarkInterface` dataclass, the abstract base class that every new benchmark should inherit from.
The `BenchmarkInterface` dataclass holds generic benchmark parameters, defines the `setup_benchmark` function that must be overridden for every specific benchmark, and defines `benchmark_fn`, the function that is actually benchmarked.
To run a benchmark, `BenchmarkInterface` provides the method `run_benchmark`.
The `BenchmarkCreator` defines the benchmark collection, i.e. the collection of benchmarks that can be run and configured together via a single config.
To create a new benchmark collection with several benchmarks, one has to implement a new `BenchmarkCreator`.
This is a function that takes as input a `KernelSpec` dataclass (containing the specification for the benchmark class) and a parameter dict with overrides. It then creates and returns the specified benchmark.
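To make this concrete, here is a hedged sketch of a custom benchmark; the extra `n` field and the way `benchmark_fn` is assigned inside `setup_benchmark` are illustrative assumptions, and the authoritative field names are in `benchmarks/interface.py`.

```python
from dataclasses import dataclass

import torch

from mlstm_kernels.utils.benchmark.benchmarks.interface import BenchmarkInterface


@dataclass
class MatmulBenchmark(BenchmarkInterface):
    """Toy benchmark that times a bfloat16 matmul (illustrative only)."""

    n: int = 4096  # illustrative parameter, not part of the real interface

    def setup_benchmark(self) -> None:
        # allocate the inputs once so that only the matmul itself is timed
        a = torch.randn(self.n, self.n, dtype=torch.bfloat16, device="cuda")
        b = torch.randn(self.n, self.n, dtype=torch.bfloat16, device="cuda")

        def benchmark_fn() -> None:
            torch.matmul(a, b)

        # assumption: setup_benchmark assigns the callable that gets benchmarked
        self.benchmark_fn = benchmark_fn
```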
**Step 2:** Next, have a look at `mlstm_kernels/utils/benchmark/param_handling.py` in order to understand how the benchmarks are configured through a unified config.
We use the dataclass `KernelSpec` to provide a unified interface to our kernel benchmarks. The `kernel_name` must be a unique specifier within a benchmark collection. The `additional_params` field contains parameters that are overridden in the respective `BenchmarkInterface` class.
One level above is the `BenchmarkConfig` dataclass. This config class enables configuring sweeps over multiple `KernelSpec` dataclasses.
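As a hedged illustration of how these pieces fit together (the constructor arguments shown, in particular the `kernel_specs` field of `BenchmarkConfig`, are assumptions based on the field names mentioned above; check `param_handling.py` for the exact signatures):

```python
from mlstm_kernels.utils.benchmark.param_handling import BenchmarkConfig, KernelSpec

# one KernelSpec per kernel to benchmark; additional_params overrides
# fields of the corresponding BenchmarkInterface
kernel_specs = [
    KernelSpec(
        kernel_name="chunkwise--triton_xl_chunk",
        additional_params={"chunk_size": 128},
    ),
    KernelSpec(
        kernel_name="chunkwise--triton_limit_chunk",
        additional_params={"chunk_size": 64},
    ),
]

# assumed field name: a BenchmarkConfig sweeps over multiple KernelSpecs
benchmark_config = BenchmarkConfig(kernel_specs=kernel_specs)
```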
**Step 3:** Finally, have a look at `mlstm_kernels/utils/benchmark/run_benchmark.py` and a corresponding benchmark script, e.g. `scripts/run_training_kernel_benchmarks.py`.
The "benchmark loops" are implemented in `run_benchmark.py`. These take as input a `BenchmarkConfig` and a `BenchmarkCreator` and run every benchmark member specified in the kernel specs with every parameter combination.
The `run_and_record_benchmarks()` function executes these loops and records the results to disk as .csv files and plots.
Finally, in our case we create scripts that collect several configured benchmarks, which we can then run via different arguments; see e.g. `scripts/run_training_kernel_benchmarks.py`.
You should now be able to understand the structure of our benchmark suites, i.e. collections of benchmarks that are run together.
In this repository we create several benchmark suites, for example the kernel benchmarks for the TFLA paper or the model benchmarks for the xLSTM 7B paper.
These are implemented in `mlstm_kernels/utils/benchmark/benchmarks/training_kernel_benchmarks.py` and `mlstm_kernels/utils/benchmark/benchmarks/huggingface_model_benchmark.py`, respectively.
**Quickstart:** For a quick start please have a look at the demo notebook: `demo/kernel_speed_benchmark.ipynb`.
### Running kernel benchmarks
The following command runs the mLSTM kernels from the figure above.
Note that you need a GPU with large memory in order to fit the long sequences and the large embedding dimension of 4096 of a 7B model.
``` bash
PYTHONPATH=. python scripts/run_training_kernel_benchmarks.py --consttoken_benchmark mlstm_triton --folder_suffix "mlstm_bench" --num_heads 16 --half_qkdim 1
```
It will create a new subfolder in `outputs_kernel_benchmarks/` that contains the results.
## Running the unit tests
The unit tests cross-check the different kernel implementations for numerical deviations across different dtypes.
You can run all of them with the following command:
```bash
pytest -s tests/torch
# make sure you are in a JAX GPU environment
pytest -s tests/jax
```
The `-s` flag disables log capturing, so you see the results directly on the command line.
Each test logs its outputs to a new folder in the `test_outputs/` directory, named with the current timestamp.
Note: The JAX tests were only run on NVIDIA H100 GPUs.
## Citation
Please cite our papers if you use this codebase, or otherwise find our work valuable:
```
@article{beck:25tfla,
  title = {{Tiled Flash Linear Attention}: More Efficient Linear RNN and xLSTM Kernels},
  author = {Maximilian Beck and Korbinian Pöppel and Phillip Lippe and Sepp Hochreiter},
  year = {2025},
  volume = {2503.14376},
  journal = {arXiv},
  primaryclass = {cs.LG},
  url = {https://arxiv.org/abs/2503.14376}
}

@article{beck:25xlstm7b,
  title = {{xLSTM 7B}: A Recurrent LLM for Fast and Efficient Inference},
  author = {Maximilian Beck and Korbinian Pöppel and Phillip Lippe and Richard Kurle and Patrick M. Blies and Günter Klambauer and Sebastian Böck and Sepp Hochreiter},
  year = {2025},
  volume = {2503.13427},
  journal = {arXiv},
  primaryclass = {cs.LG},
  url = {https://arxiv.org/abs/2503.13427}
}

@inproceedings{beck:24xlstm,
  title = {xLSTM: Extended Long Short-Term Memory},
  author = {Maximilian Beck and Korbinian Pöppel and Markus Spanring and Andreas Auer and Oleksandra Prudnikova and Michael Kopp and Günter Klambauer and Johannes Brandstetter and Sepp Hochreiter},
  booktitle = {Thirty-eighth Conference on Neural Information Processing Systems},
  year = {2024},
  url = {https://arxiv.org/abs/2405.04517},
}
```
Raw data
{
"_id": null,
"home_page": null,
"name": "mlstm-kernels",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": null,
"keywords": "mLSTM, xLSTM, LSTM, Transformer, Machine Learning, Deep Learning, State Space Models",
"author": null,
"author_email": "Maximilian Beck <beck@ml.jku.at>, Korbinian Poeppel <poeppel@ml.jku.at>, Phillip Lippe <phillip.lippe@gmail.com>, Sebastian Boeck <sebastian.boeck@nx-ai.com>",
"download_url": "https://files.pythonhosted.org/packages/85/85/e40077464ed57e46cec32a0f988f6bcd986fd1a3cd85055f02688e9df715/mlstm_kernels-2.0.1.tar.gz",
"platform": null,
"description": "# Tiled Flash Linear Attention - mLSTM Kernels\n\n<img src=\"./res/Figure_1-7.svg\" width=\"350px\" alt=\"xLSTM Figure 1\"> <img src=\"./res/Figure 2 - paper.svg\" width=\"400px\" alt=\"xLSTM Figure 1\">\n\n>Paper: [https://arxiv.org/abs/2503.14376](https://arxiv.org/abs/2503.14376)\n>\n>Authors: Maximilian Beck, Korbinian P\u00f6ppel, Phillip Lippe, Sepp Hochreiter\n\n\n## About\nThis library provides fast and efficient mLSTM training and inference Triton kernels.\nThe chunkwise-parallel mLSTM Kernels are built on Tiled Flash Linear Attention (TFLA).\n\nThis repository also contains an easy to extend library for any kind of runtime benchmarks, which we use to benchmark our mLSTM kernels, as well as full mLSTM Huggingface models.\n\n## mLSTM Kernel Library Overview\n\nAt its core the mLSTM Kernel library contains several implementations of the mLSTM in JAX, PyTorch as well as kernels in Triton,\nwhich build three toplevel modules within the `mlstm_kernels` library:\n\n- `jax`: Contains JAX native implementations of the mLSTM, as well as JAX Triton integrations.\n- `torch`: Contains PyTorch native implementations of the mLSTM, as well the Triton integrations for PyTorch. It also contains the configurable PyTorch backend module for simple integration of the mLSTM kernels into your models (see below for further details).\n- `triton`: Contains the Triton kernels for the mLSTM, as well as kernel launch parameter heuristics.\n\nThe `utils` module contains code for unit tests, additional analysis (such as the transfer behavior analysis from the TFLA paper) or the benchmark library, which is discussed in detail below.\n\nEach of the three toplevel modules, contains three different types of implementations and kernels for the mLSTM:\n\n- `chunkwise`: Chunkwise kernels, that process chunks of the sequence in parallel. These include the TFLA kernels.\n- `parallel`: Parallel kernels that process a sequence in parallel (like Attention). Overall the runtime of these kernels scales quadratically with sequence length.\n- `recurrent`: Recurrent step kernels for text generation during inference.\n\n## Benchmark of TFLA mLSTM kernels\n\nRuntime comparison of mLSTM chunkwise kernels against other baselines on a NVIDA H100 GPU with a constant number of tokens.\nThis means that as we increase the sequence length on the x-axis we proportionally decrease the batch size to keep the overall number of tokens constant. This is the same setup as for example in FlashAttention 3.\n\n\n\n**Left**: Forward pass\n**Right**: Forward and backward pass\n\n### Kernel description\n\nWe benchmark the two mLSTM versions: mLSTM with exponential input gate (mLSTMexp) and mLSTM with sigmoid input gate (mLSTMsig)\n\n- **mLSTMexp (limit chunk)**: mLSTMexp kernel with limited chunk size (`chunk_size=64`).\n- **mLSTMexp (TFLA XL chunk)**: mLSTMexp TFLA kernel with unlimited chunk size (in this benchmark `chunk_size=128`)\n- **mLSTMsig (TFLA XL chunk)**: mLSTMsig TFLA kernel with unlimited chunk size (in this benchmark `chunk_size=128`)\n\n> In the following `limit_chunk` means chunkwise kernels that are limited in chunk_size and `xl_chunk` means TFLA kernels.\n\nFor more details we refer to the TFLA paper.\n\n\n## Installation\n\nYou can find the conda environment file in the `envs/` folder. We recommend to use the latest file, i.e. 
`environment_pt251cu124.yaml`\n\nThen you can install the mLSTM kernels via pip: `pip install mlstm_kernels`\nor by cloning the repository.\n\n\n## How to use and integrate our mLSTM kernels\n\nIn this library we proivide PyTorch, JAX and Triton implementations of the mLSTM.\nFor the Triton kernels, we provide wrappers in PyTorch and JAX.\n\nThere are two options to use our implementations and kernels:\n\n### Option 1 (Recommended): Use via backend module\nThis is the recommended option, if you want to use our mLSTM kernels in your own (language) model.\nThe backend module is implemented in `mlstm_kernels/torch/backend_module.py` and provides a configurable wrapper around all our mLSTM implementations and kernels.\n\n>Note: This is also how these kernels are implemented in our official implementation for the xLSTM 7B model (see [xLSTM 7B model.py](https://github.com/NX-AI/xlstm/blob/main/xlstm/xlstm_large/model.py))\n\nIt allows to switch between training and inference mode and automatically selects the respective kernels.\n\nFor example the following code snippet configures the `mLSTMBackend` to use our TFLA mLSTMexp kernel:\n\n```python\n# we use the mLSTMexp TFLA kernel\n# we also configure to use the triton step kernel for inference\nmlstm_backend_config = mLSTMBackendConfig(\n chunkwise_kernel=\"chunkwise--triton_xl_chunk\",\n sequence_kernel=\"native_sequence__triton\",\n step_kernel=\"triton\",\n chunk_size=256,\n return_last_states=False,\n)\n\nmlstm_backend = mLSTMBackend(mlstm_backend_config)\n\n# run the backend\nDEVICE = torch.device(\"cuda\")\nDTYPE = torch.bfloat16\nB = 2\nS = 512\nDHQK = 128\nDHHV = 256\nNH = 4\n\n# create input tensors\ntorch.manual_seed(1)\nmatQ = torch.randn((B, NH, S, DHQK), dtype=DTYPE, device=DEVICE)\nmatK = torch.randn((B, NH, S, DHQK), dtype=DTYPE, device=DEVICE)\nmatV = torch.randn((B, NH, S, DHHV), dtype=DTYPE, device=DEVICE)\nvecI = torch.randn((B, NH, S), dtype=DTYPE, device=DEVICE)\nvecF = 3.0 + torch.randn((B, NH, S), dtype=DTYPE, device=DEVICE)\n\nmatH = mlstm_backend(q=matQ, k=matK, v=matV, i=vecI, f=vecF)\n```\n\n**Quickstart**: Have a look at the demo notebook `demo/integrate_mlstm_via_backend_module_option1.ipynb`.\n\n\n### Option 2: Direct import\n\nIf you directly want to use a specific kernel you can directly import the kernel from the respective module.\nThe following code snippet import the TFLA mLSTMexp kernel and runs a forward pass.\n\n```python\nimport torch\n# directly import mLSTMexp TFLA kernel\nfrom mlstm_kernels.torch.chunkwise.triton_xl_chunk import mlstm_chunkwise__xl_chunk\n\n# run the kernel\nDEVICE = torch.device(\"cuda\")\nDTYPE = torch.bfloat16\nB = 2\nS = 512\nDHQK = 128\nDHHV = 256\nNH = 4\n\ntorch.manual_seed(1)\nmatQ = torch.randn((B, NH, S, DHQK), dtype=DTYPE, device=DEVICE)\nmatK = torch.randn((B, NH, S, DHQK), dtype=DTYPE, device=DEVICE)\nmatV = torch.randn((B, NH, S, DHHV), dtype=DTYPE, device=DEVICE)\nvecI = torch.randn((B, NH, S), dtype=DTYPE, device=DEVICE)\nvecF = 3.0 + torch.randn((B, NH, S), dtype=DTYPE, device=DEVICE)\n\nmatH1 = mlstm_chunkwise__xl_chunk(\n q=matQ, k=matK, v=matV, i=vecI, f=vecF, return_last_states=False, chunk_size=256\n)\n```\n\n### Option 3: Select the kernel via the kernel specifier\n\nYou can also get a specific kernel function via its kernel specifier.\n\nFirst, display all available kernels via `get_available_mlstm_kernels()`.\nThis displays all kernels that can be used for training and that have a similar function signature such that they can be used 
interchangably.\n\n```python\n# display all available mlstm chunkwise and parallel kernels\nfrom mlstm_kernels.torch import get_available_mlstm_kernels\n\nget_available_mlstm_kernels()\n```\n```\n['chunkwise--native_autograd',\n 'chunkwise--native_custbw',\n 'chunkwise--triton_limit_chunk',\n 'chunkwise--triton_xl_chunk',\n 'chunkwise--triton_xl_chunk_siging',\n 'parallel--native_autograd',\n 'parallel--native_custbw',\n 'parallel--native_stablef_autograd',\n 'parallel--native_stablef_custbw',\n 'parallel--triton_limit_headdim',\n 'parallel--native_siging_autograd',\n 'parallel--native_siging_custbw']\n```\n\nThen select a kernel via `get_mlstm_kernel()`:\n\n```python\n# select the kernel\nfrom mlstm_kernels.torch import get_mlstm_kernel\n\nmlstm_chunkwise_xl_chunk = get_mlstm_kernel(\"chunkwise--triton_xl_chunk\")\n\nmatH2 = mlstm_chunkwise_xl_chunk(\n q=matQ, k=matK, v=matV, i=vecI, f=vecF, return_last_states=False, chunk_size=256\n)\n\ntorch.allclose(matH1, matH2, atol=1e-3, rtol=1e-3) # True\n```\n\n**Quickstart for option 2 and 3**: Have a look at the demo notebook `demo/integrate_mlstm_via_direct_import_option2and3.ipynb`.\n\n\n\n### Using the JAX wrappers\n\nThe JAX module `mlstm_kernels.jax` mirrors the PyTorch module `mlstm_kernels.torch` and can be used in the way as the PyTorch kernels with option 2.\n\n<!-- We also aim provide a backend module for Flax soon. -->\n\n## Benchmark Library\n\nThe module `mlstm_kernels.utils.benchmark` contains a configurable benchmark library for benchmarking the runtime and GPU memory usage of kernels or models.\nWe use this library for all our benchmarks in the TFLA paper and the xLSTM 7B paper.\n\n### Overview\n\n**Step 1:** To begin please have a look at `mlstm_kernels/utils/benchmark/benchmarks/interface.py`\n\nAt the core of the benchmark library, there is the `BenchmarkInterface` dataclass, which is the abstract base class that every new benchmark should inherit from.\nThe `BenchmarkInterface` dataclass holds generic benchmark parameters, defines the `setup_benchmark` function that must be overridden for every specific benchmark and also defines the function to benchmark `benchmark_fn`, which is the function that is benchmarked.\nTo run the benchmark the `BenchmarkInterface` has the method `run_benchmark`.\n\nThe `BenchmarkCreator` defines the benchmark collection, i.e. the collection of benchmarks that can be run and configured together via a single config.\nTo create a new benchmark collection, with several benchmarks one has to implement a new `BenchmarkCreator`.\nThis is a function that takes as input a `KernelSpec` dataclass (containing the specification for the benchmark class) and a parameter dict with overrides. It then creates and returns the specified benchmark.\n\n**Step 2:** Next have a look at `mlstm_kernels/utils/benchmark/param_handling.py` in order to understand how the benchmarks are configured through a unified config.\n\nWe use the dataclass `KernelSpec` to provide a unified interface to our kernel benchmarks. The `kernel_name` must be a unique specifier within a benchmark collection. The `additional_params` field are parameters that are overriden in the respective `BenchmarkInterface` class.\n\nOne level above is the `BenchmarkConfig` dataclass. This config class enables to configure sweeps over multiple `KernelSpec` dataclasses.\n\n**Step 3:** Finally, have a look at `mlstm_kernels/utils/benchmark/run_benchmark.py` and a corresponding benchmark script, e.g. 
`scripts/run_training_kernel_benchmarks.py`.\n\nThe \"benchmark loops\" are implemented in `run_benchmark.py`. These take as input a `BenchmarkConfig` and a `BenchmarkCreator` and run every benchmark member specified in the kernel specs with every parameter combination.\n\nThe `run_and_record_benchmarks()` functions executes these loops, and records the results to disk via .csv files and plots.\n\nFinally, in our case we create scripts that collect several configured benchmarks, which we can then run via different arguments, see for e.g. `scripts/run_training_kernel_benchmarks.py`.\n\nYou should now be able to understand the structure of our benchmark suites, i.e. collections of benchmarks that are run together.\nIn this repository we create several benchmark suites, for example the kernel benchmarks for the TFLA paper or the model benchmarks for the xLSTM 7B paper.\nThese are implemented in `mlstm_kernels/utils/benchmark/benchmarks/training_kernel_benchmarks.py` and `mlstm_kernels/utils/benchmark/benchmarks/huggingface_model_benchmark.py`, respectively.\n\n**Quickstart:** For a quick start please have a look at the demo notebook: `demo/kernel_speed_benchmark.ipynb`.\n\n### Running kernel benchmarks\n\nThe following command runs the mLSTM kernels from the figure above.\nNote that you need a large GPU memory in order to fit the long sequences and large embedding dimension of 4096 for a 7B model.\n\n``` bash\nPYTHONPATH=. python scripts/run_training_kernel_benchmarks.py --consttoken_benchmark mlstm_triton --folder_suffix \"mlstm_bench\" --num_heads 16 --half_qkdim 1\n```\n\nIt will create a new subfolder in `outputs_kernel_benchmarks/` that contains the results.\n\n## Running the unit tests\n\nThe unit tests cross-check the different kernel implementations on numerical deviations for different dtypes.\nYou can run all of them with the following command:\n\n```bash\npytest -s tests/torch\n# make sure you are in a JAX GPU environment\npytest -s tests/jax\n```\n\nThe `-s` disables the log capturing so you see the results directly on the command line.\nEach test will log the outputs to a new folder with the timestamp as name in the `test_outputs/` directory.\n\nNote: The the JAX tests were only tested on NVIDIA H100 GPUs.\n\n## Citation\n\nPlease cite our papers if you use this codebase, or otherwise find our work valuable:\n\n```\n@article{beck:25tfla,\n title = {{Tiled Flash Linear Attention}: More Efficient Linear RNN and xLSTM Kernels},\n author = {Maximilian Beck and Korbinian P\u00f6ppel and Phillip Lippe and Sepp Hochreiter},\n year = {2025},\n volume = {2503.14376},\n journal = {arXiv},\n primaryclass = {cs.LG},\n url = {https://arxiv.org/abs/2503.14376}\n}\n\n@article{beck:25xlstm7b,\n title = {{xLSTM 7B}: A Recurrent LLM for Fast and Efficient Inference},\n author = {Maximilian Beck and Korbinian P\u00f6ppel and Phillip Lippe and Richard Kurle and Patrick M. 
Blies and G\u00fcnter Klambauer and Sebastian B\u00f6ck and Sepp Hochreiter},\n year = {2025},\n volume = {2503.13427},\n journal = {arXiv},\n primaryclass = {cs.LG},\n url = {https://arxiv.org/abs/2503.13427}\n}\n\n@inproceedings{beck:24xlstm,\n title={xLSTM: Extended Long Short-Term Memory},\n author={Maximilian Beck and Korbinian P\u00f6ppel and Markus Spanring and Andreas Auer and Oleksandra Prudnikova and Michael Kopp and G\u00fcnter Klambauer and Johannes Brandstetter and Sepp Hochreiter},\n booktitle = {Thirty-eighth Conference on Neural Information Processing Systems},\n year={2024},\n url={https://arxiv.org/abs/2405.04517},\n}\n```\n",
"bugtrack_url": null,
"license": "NXAI COMMUNITY LICENSE AGREEMENT\n \n Preamble 1\n We are proud to present the NXAI xLSTM 7B model and software, demonstrating the strength of next-generation RNN-based large language models, delivering high-quality performance and fast inference speeds. While xLSTM 7B is freely available for open research and development, we believe that organizations significantly benefiting from our technology should contribute back. Our goal is to support research, small and medium-sized enterprises (SMEs), and open innovation, while ensuring that large enterprises who incorporate xLSTM 7B into commercial products or services fairly compensate the creators for their research and development efforts.\n Linz, December 12, 2024.\n \n Preamble 2\n The NXAI COMMUNITY LICENSE AGREEMENT is based on the META LLAMA 3 COMMUNITY LICENSE AGREEMENT and contains some modifications, especially Section 2, \u201cAdditional Commercial Terms\u201d is different.\n \n \u201cAgreement\u201d means the terms and conditions for use, reproduction, distribution and modification of the NXAI Materials set forth herein.\n \u201cDocumentation\u201d means the specifications, manuals and documentation accompanying NXAI Materials distributed by NXAI at https://github.com/NX-AI/.\n \u201cLicensee\u201d or \u201cyou\u201d means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity\u2019s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.\n \u201cNXAI Materials\u201d means, collectively, NXAI\u2019s proprietary large language models, algorithms and any Software, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and all other work of NXAI in the field of neural networks, Documentation (and any portion thereof) made available under this Agreement.\n \u201cNXAI\u201d or \u201cwe\u201d means NXAI GmbH, Linz, Austria.\n \n By using or distributing any portion or element of the NXAI Materials, you agree to be bound by this Agreement.\n \n 1. License Rights and Redistribution.\n \n a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under NXAI\u2019s intellectual property embodied in the NXAI Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the NXAI Materials.\n \n b. Redistribution and Use.\n \n i. If you distribute or make available the NXAI Materials (or any derivative works thereof), or a product or service that uses any of them, including another AI model, you shall (A) provide a copy of this Agreement with any such NXAI Materials; and (B) prominently display \u201cBuilt with technology from NXAI\u201d on a related website, user interface, blogpost, about page, or product documentation.\n \n ii. If you receive NXAI Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.\n \n iii. 
You must retain in all copies of the NXAI Materials that you distribute the following attribution notice within a \u201cNotice\u201d text file distributed as a part of such copies: \u201cThis product includes materials developed at NXAI that are licensed under the NXAI Community License, Copyright \u00a9 NXAI GmbH, All Rights Reserved.\u201d\n \n 2. Additional Commercial Terms. If (a) the Licensee, on a consolidated basis (including parent, subsidiaries, and affiliates), exceeds the annual revenue of one hundred million Euros (\u20ac100,000,000), and (b) the Licensee incorporates NXAI Material, in whole or in part, into a Commercial Product or Service, then the Licensee must obtain a commercial license from NXAI, which NXAI may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until NXAI otherwise expressly grants you such rights\n \n 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE NXAI MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN \u201cAS IS\u201d BASIS, WITHOUT WARRANTIES OF ANY KIND, AND NXAI DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE NXAI MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE NXAI MATERIALS AND ANY OUTPUT AND RESULTS.\n \n 4. Limitation of Liability. IN NO EVENT WILL NXAI OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF NXAI OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.\n \n 5. Intellectual Property.\n \n a. No trademark licenses are granted under this Agreement, and in connection with the NXAI Materials, neither NXAI nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the NXAI Materials or as set forth in this Section 5(a). NXAI hereby grants you a license to use \u201cNXAI\u201d (the \u201cMark\u201d) solely as required to comply with the last sentence of Section 1.b.i. All goodwill arising out of your use of the Mark will insure to the benefit of NXAI.\n \n b. Subject to NXAI\u2019s ownership of NXAI Materials and derivatives made by or for NXAI, with respect to any derivative works and modifications of the NXAI Materials that are made by you, as between you and NXAI, you are and will be the owner of such derivative works and modifications.\n \n c. If you institute litigation or other proceedings against NXAI or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the NXAI Materials or models released by NXAI outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless NXAI from and against any claim by any third party arising out of or related to your use or distribution of the NXAI Materials.\n \n 6. Term and Termination. 
The term of this Agreement will commence upon your acceptance of this Agreement or access to the NXAI Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. NXAI may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the NXAI Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.\n \n 7. Governing Law and Jurisdiction. This Agreement shall be governed by and construed in accordance with the laws of the Republic of Austria, without regard to its conflict of laws principles. The courts located in Linz, Austria shall have exclusive jurisdiction over any disputes arising out of or in connection with this Agreement.\n \n ====================================================================================================\n \n This product includes software licensed under the MIT License:\n \n MIT License\n \n Permission is hereby granted, free of charge, to any person obtaining\n a copy of this software and associated documentation files\n (the \"Software\"), to deal in the Software without restriction,\n including without limitation the rights to use, copy, modify, merge,\n publish, distribute, sublicense, and/or sell copies of the Software,\n and to permit persons to whom the Software is furnished to do so,\n subject to the following conditions:\n \n The above copyright notice and this permission notice shall be\n included in all copies or substantial portions of the Software.\n \n THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\n EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF\n MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.\n IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY\n CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,\n TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE\n SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n \n ====================================================================================================\n \n This product includes software licensed under the BSD-3-Clause License.\n \n BSD 3-Clause License\n \n Redistribution and use in source and binary forms, with or without\n modification, are permitted provided that the following conditions are met:\n \n * Redistributions of source code must retain the above copyright notice, this\n list of conditions and the following disclaimer.\n \n * Redistributions in binary form must reproduce the above copyright notice,\n this list of conditions and the following disclaimer in the documentation\n and/or other materials provided with the distribution.\n \n * Neither the name of the copyright holder nor the names of its\n contributors may be used to endorse or promote products derived from\n this software without specific prior written permission.\n \n THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\"\n AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE\n IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE\n DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE\n FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL\n DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR\n SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER\n CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,\n OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE\n OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.\n ",
"summary": "A library providing fast and efficient mLSTM kernels for the xLSTM.",
"version": "2.0.1",
"project_urls": null,
"split_keywords": [
"mlstm",
" xlstm",
" lstm",
" transformer",
" machine learning",
" deep learning",
" state space models"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "0879dd51593fa07994b42d6a2692b0dd438fcfbe4ea5f4e28106e43b35aa41b2",
"md5": "2e4a8c812a1091364716869d95dc5558",
"sha256": "453f04c36c5ac64479c425164f3da5df3f83960c1d4ec350e36940a519d6df93"
},
"downloads": -1,
"filename": "mlstm_kernels-2.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2e4a8c812a1091364716869d95dc5558",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 349286,
"upload_time": "2025-07-29T05:34:23",
"upload_time_iso_8601": "2025-07-29T05:34:23.649939Z",
"url": "https://files.pythonhosted.org/packages/08/79/dd51593fa07994b42d6a2692b0dd438fcfbe4ea5f4e28106e43b35aa41b2/mlstm_kernels-2.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "8585e40077464ed57e46cec32a0f988f6bcd986fd1a3cd85055f02688e9df715",
"md5": "1b2c37723001a3894220f6fbe6989eaf",
"sha256": "683c10f5b5108ab21db60ee43a79333fa06757781a80c2c4de7aef2e74c192b4"
},
"downloads": -1,
"filename": "mlstm_kernels-2.0.1.tar.gz",
"has_sig": false,
"md5_digest": "1b2c37723001a3894220f6fbe6989eaf",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 200316,
"upload_time": "2025-07-29T05:34:25",
"upload_time_iso_8601": "2025-07-29T05:34:25.573167Z",
"url": "https://files.pythonhosted.org/packages/85/85/e40077464ed57e46cec32a0f988f6bcd986fd1a3cd85055f02688e9df715/mlstm_kernels-2.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-29 05:34:25",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "mlstm-kernels"
}