kvcached

Name: kvcached
Version: 0.0.1
Summary: A KV cache management system that supports on-demand KV cache allocation for LLMs with GPU virtual memory
Author: The kvcached team
Homepage: https://github.com/ovg-project/kvcached
Requires-Python: <3.13,>=3.9
Keywords: llm, kv-cache, gpu-memory, gpu-sharing, machine-learning, pytorch, vllm, sglang, cuda, memory-management
Requirements: setuptools, wheel, posix_ipc
Upload time: 2025-07-25 20:40:58
# kvcached

kvcached is a new KV cache management system that supports on-demand KV cache allocation. It implements the concept of GPU virtual memory, allowing applications to reserve virtual address space without immediately committing physical memory. Physical memory is then automatically allocated and mapped as needed at runtime. This capability allows multiple LLMs to run concurrently on a single GPU or a group of GPUs (TP) and flexibly share the GPU memory, significantly improving GPU utilization and reducing memory fragmentation.
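The reserve-then-commit idea can be illustrated with a minimal sketch (this is a conceptual model, not the kvcached API): a large virtual region is reserved up front at no physical cost, and physical pages are mapped only for the ranges actually written.

```python
# Conceptual sketch of GPU virtual memory for a KV cache (not kvcached's API).
# The page size below is a hypothetical granularity chosen for illustration.
PAGE = 2 * 1024 * 1024  # 2 MiB per physical page

class VirtualKVCache:
    def __init__(self, virtual_bytes: int):
        self.virtual_bytes = virtual_bytes  # reserved address space; no memory committed yet
        self.committed = set()              # indices of pages backed by physical memory

    def write(self, offset: int, nbytes: int) -> None:
        # Lazily commit physical pages only for the range actually touched.
        first, last = offset // PAGE, (offset + nbytes - 1) // PAGE
        for page in range(first, last + 1):
            self.committed.add(page)

    def physical_bytes(self) -> int:
        return len(self.committed) * PAGE

cache = VirtualKVCache(40 * 1024**3)  # reserve 40 GiB of virtual space for "free"
cache.write(0, 4096)                  # touching one page commits just that page
```

Because unwritten ranges cost nothing, several models can each reserve a large cache while the GPU's physical memory is shared among only the pages in active use.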

kvcached is compatible with popular LLM serving engines, including SGLang and vLLM.

## kvcached Installation

### Prerequisites

* Python (tested with 3.11)
* PyTorch (tested with 2.6.0 and 2.7.0)

Installing kvcached is as simple as:

```bash
pip install kvcached --no-build-isolation
```

Note that `--no-build-isolation` is required for kvcached to find the right PyTorch in the current virtual environment. If PyTorch is reinstalled or upgraded, kvcached must be reinstalled as well.

kvcached now supports both **SGLang** and **vLLM**. While we are in the process of upstreaming the integration interfaces, we provide temporary support via code patches.

To apply a patch:

```bash
cd engine_integration/scripts
git apply [patch_name] [engine_code_path]
```

## All-in-One Installation (Recommended for Development)

You can install everything (SGLang+vLLM+kvcached) by running the following commands:

```bash
cd engine_integration/scripts
./setup.sh all
```

This script will download the specified versions of SGLang and vLLM, create separate venv environments (using `uv`), compile the code, apply the necessary patches, and install kvcached.

## Testing

kvcached can be enabled or disabled with `export ENABLE_KVCACHED=true` or `export ENABLE_KVCACHED=false`. To verify a successful installation and benchmark the performance of SGLang/vLLM with kvcached, run:

```bash
cd engine_integration/benchmark
./start_server.sh [sgl|vllm]
# Wait until LLM server is ready
./start_client.sh [sgl|vllm]
```

The benchmark scripts automatically set `ENABLE_KVCACHED=true`. Please refer to each script for instructions on how to run SGLang/vLLM with kvcached.
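For reference, a launcher can gate kvcached on this variable before starting the engine. The helper below is a minimal sketch of that check (it is not part of the kvcached package):

```python
import os

# Hypothetical helper: treat ENABLE_KVCACHED=true (case-insensitive) as enabled,
# and anything else -- including an unset variable -- as disabled.
def kvcached_enabled(env=os.environ) -> bool:
    return env.get("ENABLE_KVCACHED", "false").lower() == "true"
```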

## Memory monitoring and control via kvcached CLI

kvcached includes a built-in CLI tool that allows you to monitor GPU memory usage and manage memory limits across different applications. A command, `kvctl`, is installed along with the kvcached package:

```bash
kvctl
```

Once inside the CLI, type `help` to view all supported commands:

```
kvcached> help
Available commands:
  list [ipc ...]               List IPC segments and usage
  limit <ipc> <size>           Set absolute limit (e.g. 512M, 2G)
  limit-percent <ipc> <pct>    Set limit as percentage of total GPU RAM
  watch [-n sec] [ipc ...]     Continuously display usage table
  kvtop [ipc ...] [--refresh r]  Launch curses kvtop UI (q to quit)
  !<shell cmd>                 Run command in system shell
  help                         Show this help message
  delete <ipc>                 Delete IPC segment and its limit entry
  exit | quit                  Exit the shell

kvcached>
```

Use the `kvtop` command for real-time visualization of memory usage:

<!-- KVCache memory monitor (muted colours) -->
<pre>
<span style="color:#009ACD; font-weight:bold;">KVCache Memory Usage</span>

<span style="color:#009ACD;">IPC: SGLANG</span>
<span style="color:#009ACD;">[</span><span style="color:#B7A800;">==</span><span style="color:#009E8F;">##################</span><span style="color:#666666;">----------------------------------------</span><span style="color:#009ACD;">]</span>
Prealloc: 792.0&nbsp;MB | Used: 11.2&nbsp;GB / 39.9&nbsp;GB (30.1%) | Free: 27.9&nbsp;GB

<span style="color:#009ACD;">IPC: VLLM</span>
<span style="color:#009ACD;">[</span><span style="color:#B7A800;">==</span><span style="color:#009E8F;">#######</span><span style="color:#666666;">--------------------------------------------------- </span><span style="color:#009ACD;">]</span>
Prealloc: 768.0&nbsp;MB | Used: 3.6&nbsp;GB / 37.4&nbsp;GB (11.7%) | Free: 33.0&nbsp;GB

<span style="color:#009ACD;">GPU Memory Usage</span>
<span style="color:#009ACD;">[</span><span style="color:#B7A800;">########################################</span><span style="color:#666666;">--------------------</span><span style="color:#009ACD;">]</span>
Used: 52.9&nbsp;GB / 79.2&nbsp;GB (66.8%) | Free: 26.3&nbsp;GB

<span style="color:#555555;">Press 'q' to quit</span>
</pre>

## Contributing

We are grateful for and open to contributions and collaborations of any kind.

We use pre-commit to ensure a consistent coding style. You can set it up with:

```
pip install pre-commit
pre-commit install
```

Before pushing your code, please run the following command and make sure all checks pass:

```
pre-commit run --all-files
```

## Contacts

Feel free to contact us for contributions and collaborations.

```
Jiarong Xing (jxing@rice.edu)
Yifan Qiao (yifanqiao@berkeley.edu)
Shan Yu (shanyu1@g.ucla.edu)
```

            
