da4ml

Name: da4ml
Version: 0.1.0
Summary: Digital Arithmetic for Machine Learning
Upload time: 2025-02-07 23:58:22
Requires Python: >=3.10
License: GNU Lesser General Public License v3 (LGPLv3)
Keywords: CMVM, distributed arithmetic, hls4ml, MCM, subexpression elimination
Repository: https://github.com/calad0i/da4ml

# da4ml: Distributed Arithmetic for Machine Learning

This project performs Constant Matrix-Vector Multiplication (CMVM) with Distributed Arithmetic (DA) for Machine Learning (ML) on Field Programmable Gate Arrays (FPGAs).
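
The core idea of DA here is that multiplication by a known constant decomposes into shifts and additions of the input, so no general-purpose multipliers are needed. A toy illustration:

```python
def times_5(x: int) -> int:
    # 5 = 0b101, so 5 * x = (x << 2) + x: one shift and one add
    return (x << 2) + x
```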

CMVM optimization is performed through greedy common subexpression elimination (CSE) of two-term subexpressions, with optional Delay Constraints (DC). The optimization runs in JIT-compiled Python (Numba), and the resulting list of optimized operations is emitted as traced Python code.
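
As a hand-written sketch of the idea (not the library's actual generated code), two-term CSE factors a shared pair out of several outputs so it is computed only once:

```python
# CMVM with a constant kernel of shape (3, 2): [[1, 1], [1, 1], [1, -1]].
# y0 = x0 + x1 + x2 and y1 = x0 + x1 - x2 share the pair (x0 + x1).
def cmvm_after_cse(x0, x1, x2):
    t = x0 + x1   # common two-term subexpression, computed once
    y0 = t + x2
    y1 = t - x2
    return y0, y1
```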

At the moment, the project only generates Vitis HLS C++ code for the FPGA implementation of the optimized CMVM kernel; HDL code generation is planned for the future. Currently, the main way to use this repository is through the `distributed_arithmetic` strategy in the [`hls4ml`](https://github.com/fastmachinelearning/hls4ml/) project.


## Installation

The project is available on PyPI and can be installed with pip:

```bash
pip install da4ml
```

Note that `numba>=0.60.0` is required for the project to work, and the project does not work with `python<3.10`. If the project fails to compile, try upgrading `numba` and `llvmlite` to the latest versions.

## `hls4ml`

The main use of this project is through the `distributed_arithmetic` strategy in `hls4ml`:

```python
model_hls = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config={
        'Model': {
            ...
            'Strategy': 'distributed_arithmetic',
        },
        ...
    },
    ...
)
```
Currently, `Dense`/`Conv1D`/`Conv2D` layers are supported for both the `io_parallel` and `io_stream` dataflows. Note, however, that distributed arithmetic implies `reuse_factor=1`, as the whole kernel is implemented in combinational logic.
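
If only some layers should use distributed arithmetic, hls4ml's per-layer configuration can express that as well. A sketch, where `fc1` is a hypothetical layer name:

```python
hls_config = {
    'Model': {'Precision': 'ap_fixed<16,6>', 'ReuseFactor': 1, 'Strategy': 'Latency'},
    # Override the strategy for a single (hypothetical) Dense layer.
    'LayerName': {'fc1': {'Strategy': 'distributed_arithmetic'}},
}
```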

### Notice

Currently, only the `da4ml-v2` branch of `hls4ml` supports the `distributed_arithmetic` strategy. The `da4ml-v2` branch is not yet merged into the `main` branch of `hls4ml`, so you need to install it from the GitHub repository.
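
For example, one way to install that branch directly from GitHub (the exact command may change once the branch is merged into `main`):

```bash
pip install git+https://github.com/fastmachinelearning/hls4ml.git@da4ml-v2
```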

## Direct Usage

If you want to use the library directly, you can call the `da4ml.api.fn_from_kernel` function, which creates a Python function from a 2D constant kernel `float[n_inp, n_out]`, along with its corresponding generated code. The function signature is:

```python
def fn_from_kernel(
    kernel: np.ndarray,
    signs: list[bool],
    bits: list[int],
    int_bits: list[int],
    symmetrics: list[bool],
    depths: list[int] | None = None,
    n_beams: int = 1,
    dc: int | None = None,
    n_inp_max: int = -1,
    n_out_max: int = -1,
    codegen_backend: PyCodegenBackend = PyCodegenBackend(),
    signed_balanced_reduction: bool = True,
) -> tuple[Callable[[list[T]], list[T]], str]:
    """Compile a CMVM operation, with the constant kernel, into a function with only accumulation/subtraction/shift operations.

    Parameters
    ----------
    kernel : np.ndarray
        The kernel to compile. Must be of shape (n_inp, n_out).
    signs : list[bool]
        Whether each input is signed. Must be of length n_inp.
    bits : list[int]
        The bitwidth of the inputs. Must be of length n_inp.
    int_bits : list[int]
        The number of integer bits in the inputs (incl. sign bit!). Must be of length n_inp.
    symmetrics : list[bool]
        Whether each input is symmetrically quantized. Must be of length n_inp.
    depths : list[int]|None, optional
        The depth associated with each input. Must be of length n_inp. Defaults to [0]*n_inp.
    n_beams : int, optional
        Number of beams to use in beam search. Defaults to 1. (Currently disabled!)
    dc : int | None, optional
        Delay constraint. Not implemented yet. Defaults to None.
    n_inp_max : int, optional
        Number of inputs to process in one block. Defaults to -1 (no limit). Decrease to speed up the optimization at the cost of a less optimal result.
    n_out_max : int, optional
        Number of outputs to process in one block. Defaults to -1 (no limit). Decrease to speed up the optimization at the cost of a less optimal result.
    codegen_backend : PyCodegenBackend, optional
        The codegen backend to be used. Defaults to PyCodegenBackend().
    signed_balanced_reduction : bool, optional
        Whether the reduction tree should isolate the plus and minus terms. Set to False to improve latency. Defaults to True.

    Returns
    -------
    tuple[Callable[[list[T]], list[T]], str]
        fn : Callable[[list[T]], list[T]]
            The compiled Python function. It takes a list of inputs and returns a list of outputs using only accumulation/subtraction/power-of-2 (shift) operations.
        fn_str : str
            The code of the compiled function, depending on the codegen_backend used.
    """
```
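
A minimal usage sketch, assuming signed, symmetrically quantized 8-bit inputs with 4 integer bits (incl. sign); the kernel values are illustrative:

```python
import numpy as np

from da4ml.api import fn_from_kernel

# Illustrative constant kernel of shape (n_inp, n_out) = (3, 2).
kernel = np.array([[1.0, -0.5], [0.25, 2.0], [-1.0, 0.75]])
n_inp = kernel.shape[0]

fn, fn_str = fn_from_kernel(
    kernel,
    signs=[True] * n_inp,       # signed inputs (assumption)
    bits=[8] * n_inp,           # 8-bit inputs (assumption)
    int_bits=[4] * n_inp,       # 4 integer bits, incl. sign (assumption)
    symmetrics=[True] * n_inp,  # symmetric quantization (assumption)
)

print(fn([1.0, 2.0, 3.0]))  # outputs computed with only adds, subs, and shifts
print(fn_str)               # generated code for the chosen codegen backend
```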