# libucxx-cu12

**Name:** libucxx-cu12
**Version:** 0.46.0
**Summary:** Python Bindings for the Unified Communication X library (UCX)
**Author:** NVIDIA Corporation
**License:** BSD-3-Clause
**Homepage:** https://github.com/rapidsai/ucxx
**Upload time:** 2025-10-09 13:40:43
**Requirements:** No requirements were recorded.
# UCXX

UCXX is an object-oriented C++ interface for UCX, with native support for Python bindings.

## Building

### Environment setup

Before starting, it is necessary to have the required dependencies installed. The simplest way to get started is to install [Miniforge](https://github.com/conda-forge/miniforge), then create and activate an environment with the provided development file, for CUDA 13.x:

```
$ conda env create -n ucxx -f conda/environments/all_cuda-130_arch-x86_64.yaml
```

And then activate the newly created environment:

```
$ conda activate ucxx
```

#### Faster conda dependency resolution

The procedure above should complete without issues, but it may be slower than necessary. One way to speed up dependency resolution is to install [mamba](https://mamba.readthedocs.io/en/latest/) before creating the new environment. After installing Miniforge, mamba can be installed with:

```
$ conda install -c conda-forge mamba
```

After that, one can proceed as before, simply replacing `conda` with `mamba` in the environment creation command:

```
$ mamba env create -n ucxx -f conda/environments/all_cuda-130_arch-x86_64.yaml
$ conda activate ucxx
```

### Convenience Script

For convenience, we provide the `./build.sh` script. By default, it builds and installs both the C++ and Python libraries. For a detailed description of the available options, please check `./build.sh --help`.

Building C++ and Python libraries manually is also possible, see instructions on building [C++](#c) and [Python](#python).

Additionally, there is a `./build_and_run.sh` script that calls `./build.sh` to build everything and then runs the C++ and Python tests plus a few benchmarks, as shown below. Similarly, details on its available options can be queried with `./build_and_run.sh`.
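For instance, a typical from-scratch workflow using these two scripts might look like the following (default options only; consult the `--help` output for anything beyond this):

```
# Build and install both the C++ and Python libraries with default options
$ ./build.sh

# Build everything, then run the C++ and Python tests and a few benchmarks
$ ./build_and_run.sh
```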

### C++

To build and install the C++ library to `${CONDA_PREFIX}`, with both Python and RMM support, as well as all tests and benchmarks (with CUDA support), run:

```
mkdir cpp/build
cd cpp/build
cmake .. -DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX} \
      -DBUILD_TESTS=ON \
      -DBUILD_BENCHMARKS=ON \
      -DCMAKE_BUILD_TYPE=Release \
      -DUCXX_ENABLE_PYTHON=ON \
      -DUCXX_ENABLE_RMM=ON \
      -DUCXX_BENCHMARKS_ENABLE_CUDA=ON
make -j install
```
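As a quick sanity check after installation, the shared library and headers should now be present under the install prefix. The exact file names below are assumptions about the default install layout; adjust the paths if your install differs:

```
# Assumed default locations of the installed library and headers
$ ls ${CONDA_PREFIX}/lib/libucxx.so
$ ls ${CONDA_PREFIX}/include/ucxx
```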

### Python

```
cd python
python setup.py install
```
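Note that `setup.py install` is deprecated in recent setuptools releases; a `pip` install of the same directory is the usual modern equivalent, assuming the package metadata in `python/` supports a standard PEP 517 build:

```
# Alternative to setup.py install, assuming a PEP 517-compatible layout
cd python
python -m pip install .
```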

## Running benchmarks

### C++

Currently there is one C++ benchmark with comprehensive options. It can be found under `cpp/build/benchmarks/ucxx_perftest`; the `-h` argument prints a full list of options.

The benchmark is composed of two processes: a server and a client. The server must not specify an IP address or hostname and will bind to all available interfaces, whereas the client must specify the IP address or hostname where the server can be reached.

#### Basic Usage

Below is an example of running a server first, followed by a client connecting to it on `localhost` (the same as `127.0.0.1`). Both processes specify the message size in bytes (`-s 1000000000`), the number of iterations to perform (`-n 10`), and the progress mode (`-P polling`).

```
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -s 1000000000 -n 10 -P polling &
$ ./benchmarks/ucxx_perftest -s 1000000000 -n 10 -P polling localhost
```

#### CUDA Memory Support

When built with `UCXX_BENCHMARKS_ENABLE_CUDA=ON`, the benchmark supports multiple CUDA memory types using the `-m` flag:

```
# Server with CUDA device memory
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda -s 1048576 -n 10 &

# Client with CUDA device memory
$ ./benchmarks/ucxx_perftest -m cuda -s 1048576 -n 10 127.0.0.1

# Server with CUDA managed memory (unified memory)
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda-managed -s 1048576 -n 10 &

# Client with CUDA managed memory
$ ./benchmarks/ucxx_perftest -m cuda-managed -s 1048576 -n 10 127.0.0.1

# Server with CUDA async memory (with streams)
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda-async -s 1048576 -n 10 &

# Client with CUDA async memory
$ ./benchmarks/ucxx_perftest -m cuda-async -s 1048576 -n 10 127.0.0.1
```

**Available Memory Types:**
- `host` - Standard host memory allocation (default)
- `cuda` - CUDA device memory allocation
- `cuda-managed` - CUDA unified/managed memory allocation
- `cuda-async` - CUDA device memory with asynchronous operations

**Requirements for CUDA Support:**
- UCXX compiled with `UCXX_BENCHMARKS_ENABLE_CUDA=ON` (if building benchmarks)
- CUDA runtime available
- UCX configured with CUDA transport support (a quick check is shown after this list)
- Compatible CUDA devices on both endpoints
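One way to verify that the local UCX build exposes CUDA transports is `ucx_info`, a diagnostic tool that ships with UCX. Listing the available transports and devices and filtering for CUDA entries should show transports such as `cuda_copy` when CUDA support is present:

```
# List UCX transports/devices and keep only CUDA-related entries
$ ucx_info -d | grep -i cuda
```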

It is recommended to use `UCX_TCP_CM_REUSEADDR=y` when binding to interfaces with TCP support, to avoid waiting for the socket's `TIME_WAIT` state to expire, which often takes 60 seconds after the server has terminated.
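Rather than prefixing every command as in the examples above, the variable can be exported once for the shell session:

```
# Export once instead of prefixing each server invocation
$ export UCX_TCP_CM_REUSEADDR=y
```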

### Python

Benchmarks are available for both the Python "core" (synchronous) API and the "high-level" (asynchronous) API.

#### Synchronous

```
# Thread progress without delayed notification NumPy transfer, 100 iterations
# of single buffer with 100 bytes
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-core \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 100

# Blocking progress without delayed notification RMM transfer between GPUs 0
# and 3, 100 iterations of a single buffer with 100 bytes
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-core \
    --object_type rmm \
    --server-dev 0 \
    --client-dev 3 \
    --n-iter 100 \
    --n-bytes 100 \
    --progress-mode blocking
```

#### Asynchronous

```
# NumPy transfer, 100 iterations of 8 buffers (using multi-buffer interface)
# each with 100 bytes
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 100 \
    --n-buffers 8

# RMM transfer between GPUs 0 and 3, 100 iterations of 2 buffers (using
# multi-buffer interface) each with 1 MiB
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type rmm \
    --server-dev 0 \
    --client-dev 3 \
    --n-iter 100 \
    --n-bytes 1MiB \
    --n-buffers 2

# Polling progress mode without delayed notification NumPy transfer,
# 100 iterations of single buffer with 1 MiB
UCXPY_ENABLE_DELAYED_SUBMISSION=0 \
    python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 1MiB \
    --progress-mode polling
```

## Logging

Logging is independently available for both C++ and Python APIs. Since the Python interface uses the C++ backend, C++ logging can be enabled when running Python code as well.

### C++

The C++ interface reuses the UCX logger, providing the same log levels, and can be enabled via the `UCXX_LOG_LEVEL` environment variable. Note that this does not enable UCX logging itself; `UCX_LOG_LEVEL` must still be set separately for UCX logging. A few examples are below:

```
# Request trace log level
UCXX_LOG_LEVEL=TRACE_REQ

# Debug log level
UCXX_LOG_LEVEL=DEBUG
```
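For example, the request-trace level can be combined with one of the benchmark invocations from earlier, reusing the flags from the Basic Usage section with a smaller message size:

```
# Run the C++ benchmark client with UCXX request tracing enabled
$ UCXX_LOG_LEVEL=TRACE_REQ ./benchmarks/ucxx_perftest -s 100 -n 10 -P polling localhost
```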

### Python

The UCXX Python interface uses Python's built-in `logging` library. Currently only the `INFO` and `DEBUG` levels are used; they can be enabled via the `UCXPY_LOG_LEVEL` environment variable. A few examples are below:

```
# Enable Python info log level
UCXPY_LOG_LEVEL=INFO

# Enable Python debug log level, UCXX request trace log level and UCX data log level
UCXPY_LOG_LEVEL=DEBUG UCXX_LOG_LEVEL=TRACE_REQ UCX_LOG_LEVEL=DATA
```
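For example, applying the debug level to one of the Python benchmark runs from the previous section:

```
# Run the Python benchmark with Python-side debug logging enabled
$ UCXPY_LOG_LEVEL=DEBUG python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type numpy \
    --n-iter 10 \
    --n-bytes 100
```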

            
