# da4ml: Distributed Arithmetic for Machine Learning
[License: LGPL v3](https://www.gnu.org/licenses/lgpl-3.0)
[Documentation](https://calad0i.github.io/da4ml/)
[PyPI](https://badge.fury.io/py/da4ml)
[arXiv:2507.04535](https://arxiv.org/abs/2507.04535)
da4ml is a library for implementing distributed arithmetic (DA) based algorithms for ultra-low-latency machine learning (ML) applications on FPGAs. It has two major components:
 - A fast and performant constant-matrix-vector multiplication (CMVM) optimizer that implements these
 operations as efficient adder trees. Common subexpression elimination (CSE) with graph-based
 pre-optimization is performed to reduce the firmware footprint and improve performance (a conceptual
 sketch follows this list).
 - A low-level symbolic tracing framework for generating combinational or fully pipelined logic as HDL or
 HLS code. For fully pipelined networks, da4ml can generate the firmware for the whole network standalone.
 Alternatively, da4ml can be used as a plugin in hls4ml to optimize the CMVM operations in the network.
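
The CMVM idea is easiest to see on a toy example. The sketch below is plain NumPy, not da4ml's API; the matrix and its shift-and-add decomposition are made up for illustration. Each constant multiplication becomes shifts and adds, and shared subexpressions are computed once and reused across outputs.

```python
import numpy as np

# Hypothetical constant matrix and input vector (not from da4ml).
C = np.array([[5, 3],
              [10, 3]])
x = np.array([7, 9])

# Naive CMVM: four constant multiplications.
y_ref = C @ x

# Distributed-arithmetic view: replace constant multiplies with shifts and adds.
#   3*x1 = (x1 << 1) + x1        <- common subexpression, computed once (CSE)
#   5*x0 = (x0 << 2) + x0
#  10*x0 = (5*x0) << 1           <- reuses the 5*x0 term instead of a new multiplier
t = (x[1] << 1) + x[1]   # 3*x1, shared by both outputs
u = (x[0] << 2) + x[0]   # 5*x0
y0 = u + t               # 5*x0 + 3*x1
y1 = (u << 1) + t        # 10*x0 + 3*x1

assert np.array_equal(y_ref, np.array([y0, y1]))
```

On an FPGA, each shared term corresponds to a single node in the generated adder tree rather than a dedicated multiplier.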
Key Features
------------
- **Optimized Algorithms**: Compared to hls4ml's latency strategy, da4ml's CMVM implementation uses no DSPs and up to 50% fewer LUTs.
- **Fast code generation**: da4ml can generate HDL for a fully pipelined network in seconds. For the same models, high-level synthesis tools such as Vivado/Vitis HLS can take up to days to generate the HDL code.
- **Low-level symbolic tracing**: As long as an operation can be expressed as a combination of the supported low-level operations, adding it is straightforward: the operation is simply "replayed" on the symbolic tensors provided. In most cases, supporting a new operation or layer takes just a few lines of NumPy-style code.
- **Automatic model conversion**: da4ml can automatically convert models trained with [HGQ2](https://github.com/calad0i/hgq2).
- **Bit-accurate Simulation**: All operations in da4ml are bit-accurate, meaning the generated HDL code produces exactly the same output as the original model. da4ml's computation is converted to a RISC-like, instruction-set-level intermediate representation, the distributed arithmetic instruction set (DAIS), which can be easily simulated in multiple ways.
- **hls4ml integration**: da4ml can be used as a plugin in hls4ml to optimize the CMVM operations in the network by setting `strategy='distributed_arithmetic'` for Dense, EinsumDense, or Conv1D/2D layers.
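
A minimal configuration sketch for the hls4ml route is shown below. The model, layer name, and output directory are placeholders, and the per-layer config keys follow hls4ml's usual conventions, which may differ slightly between versions; only the `'distributed_arithmetic'` strategy value comes from the feature description above.

```python
import hls4ml
from tensorflow import keras

# Hypothetical toy model; replace with your own quantized/trained model.
model = keras.Sequential([
    keras.layers.Dense(16, activation='relu', name='dense_0', input_shape=(8,)),
])

# Per-layer granularity so the strategy can be set on individual layers.
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
config['LayerName']['dense_0']['Strategy'] = 'distributed_arithmetic'

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='hls_prj',
)
hls_model.compile()  # builds the C simulation for bit-accurate checks
```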
Installation
------------
```bash
pip install da4ml
```
Getting Started
---------------
See the [Getting Started](https://calad0i.github.io/da4ml/getting_started.html) guide for a quick introduction to using da4ml.