superlaser

Name	superlaser
Version	0.0.6
home_page
Summary	An MLOps library for LLM deployment w/ the vLLM engine on RunPod's infra.
upload_time	2024-03-02 20:54:35
maintainer
docs_url	None
author	InquestGeronimo
requires_python
license	Apache-2.0 license
keywords	inference server llm nlp mlops deployment vllm runpod cicd
VCS
bugtrack_url
requirements	No requirements were recorded.

            
<div align="center">
    <img width="400" height="350" src="/img/laser.webp">
</div>

<h1 align="center">
  <em>SuperLaser</em>
</h1>

<h5 align="center">
  ⚠️<em>Not yet ready for primetime</em> ⚠️
</h5>

**SuperLaser** provides a suite of tools and scripts for deploying LLMs onto [RunPod's](https://github.com/runpod) pod and serverless infrastructure. Deployments run a containerized [vLLM](https://github.com/vllm-project/vllm) engine at runtime for memory-efficient, high-performance inference.

# Features <img align="center" width="30" height="29" src="https://media.giphy.com/media/v1.Y2lkPTc5MGI3NjExOTBqaWNrcGxnaTdzMGRzNTN0bGI2d3A4YWkxajhsb2F5MW84Z2dxaCZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/26tOZ42Mg6pbTUPHW/giphy.gif">

- **Scalable Deployment**: Easily scale your LLM inference tasks with vLLM and RunPod serverless capabilities.
- **Cost-Effective**: Optimize resource and hardware usage with tensor parallelism and other GPU assets.
- **Uses OpenAI's API**: Use the SuperLaser client with chat, non-chat, and streaming options.

# Install <img align="center" width="30" height="29" src="https://media.giphy.com/media/sULKEgDMX8LcI/giphy.gif">

```bash
pip install superlaser
```

Before you begin, ensure you have:

- A RunPod account.

# RunPod Config <img align="right" width="75" height="75" src="./img/runpod-logo.png">

The first step is to obtain an API key from RunPod. In your account's console, go to the `Settings` section and click `API Keys`.

After obtaining a key, set it as an environment variable:

```bash
export RUNPOD_API_KEY=<YOUR-API-KEY>
```
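
The examples below read the key with `os.environ.get`, which returns `None` when the variable is unset, so a missing key would only surface later as a failed request. A minimal sketch of an early check follows; the helper name `require_api_key` is hypothetical and not part of SuperLaser.

```py
import os


def require_api_key(env=os.environ, name="RUNPOD_API_KEY"):
    """Return the API key from the environment, or fail early with a clear message."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the examples.")
    return key
```

Calling `require_api_key()` in place of `os.environ.get("RUNPOD_API_KEY")` would stop the script before any request is sent if the variable is missing.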

### Configure Template

Before spinning up a serverless endpoint, let's first configure a template that we'll pass into the endpoint during staging. The template lets you select a serverless or pod asset, your Docker image name, and the disk space for the container and the volume.

Configure your template with the following attributes:

```py
import os
from superlaser import RunpodHandler as runpod

api_key = os.environ.get("RUNPOD_API_KEY")

template_data = runpod.set_template(
    serverless="true",                                      
    template_name="superlaser-inf",                         # Give a name to your template
    container_image="runpod/worker-vllm:0.3.1-cuda12.1.0",  # Docker image stub
    model_name="mistralai/Mistral-7B-v0.1",                 # Hugging Face model stub
    max_model_length=340,                                   # Maximum number of tokens for the engine to handle per request.
    container_disk=15,                                      
    volume_disk=15,
)
```

### Create Template on RunPod

```py
template = runpod(api_key, data=template_data)
print(template().text)
```
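
The response body printed above is text, so the template ID has to be read out of it before the next step. The sketch below assumes the body is JSON and that the ID is stored under a key named `id` somewhere in the payload; both are assumptions about RunPod's response, and `find_id` is a hypothetical helper, not a SuperLaser function.

```py
import json


def find_id(payload):
    """Depth-first search for the first value stored under an "id" key."""
    if isinstance(payload, dict):
        if "id" in payload:
            return payload["id"]
        children = payload.values()
    elif isinstance(payload, list):
        children = payload
    else:
        return None
    for child in children:
        found = find_id(child)
        if found is not None:
            return found
    return None


# Hypothetical response body, shaped only for illustration.
sample = '{"data": {"saveTemplate": {"id": "abc123", "name": "superlaser-inf"}}}'
template_id = find_id(json.loads(sample))
print(template_id)
```

With a real response, the returned text would take the place of `sample`. Searching by key rather than by a fixed path keeps the sketch independent of the exact nesting, which is not documented here.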

### Configure Endpoint

After your template is created, the call returns a data dictionary that includes your template ID. Pass this template ID when configuring the serverless endpoint in the section below:

```py
endpoint_data = runpod.set_endpoint(
    gpu_ids="AMPERE_24", # options for gpuIds are "AMPERE_16,AMPERE_24,AMPERE_48,AMPERE_80,ADA_24"
    idle_timeout=5,
    name="vllm_endpoint",
    scaler_type="QUEUE_DELAY",
    scaler_value=1,
    template_id="template-id",
    workers_max=1,
    workers_min=0,
)
```
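
A typo in `gpu_ids` would otherwise only show up after the request reaches RunPod. The sketch below checks the value locally against the identifiers listed in the comment above. It assumes that several identifiers may be combined in one comma-separated string, which the source does not confirm; `check_gpu_ids` is a hypothetical helper.

```py
# GPU identifiers taken from the comment in the endpoint configuration above.
KNOWN_GPU_IDS = {"AMPERE_16", "AMPERE_24", "AMPERE_48", "AMPERE_80", "ADA_24"}


def check_gpu_ids(gpu_ids):
    """Split a comma-separated string and reject identifiers outside KNOWN_GPU_IDS."""
    requested = [item.strip() for item in gpu_ids.split(",") if item.strip()]
    unknown = [item for item in requested if item not in KNOWN_GPU_IDS]
    if not requested or unknown:
        raise ValueError(f"Unsupported gpu_ids: {gpu_ids!r}")
    # Return a normalized string with surrounding whitespace removed.
    return ",".join(requested)
```

For example, `check_gpu_ids("AMPERE_24")` returns the string unchanged, while a misspelled identifier raises `ValueError` before anything is sent.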

### Start Endpoint on RunPod

```py
endpoint = runpod(api_key, data=endpoint_data)
print(endpoint().text)
```

# Call Endpoint <img align="right" width="225" height="75" src="./img/vllm-logo.png">

After your endpoint is staged, it will return a dictionary with your endpoint ID. Pass this endpoint ID to the `OpenAI` client and start making API requests!

```py
from openai import OpenAI

endpoint_id = "your-endpoint-id"

client = OpenAI(
    api_key=api_key,
    base_url=f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1",
)
```

#### Chat w/ Streaming

```py
stream = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "To be or not to be"}],
    temperature=0,
    max_tokens=100,
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
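
Printing each delta works for a terminal, but a caller often needs the full reply as one string. A minimal sketch, assuming the chunk shape used in the loop above (`chunk.choices[0].delta.content`); `collect_chat_stream` and `fake_chunk` are hypothetical names, and the stand-in chunks exist only so the sketch runs without a live endpoint.

```py
from types import SimpleNamespace


def collect_chat_stream(stream):
    """Join the text deltas of a chat completion stream into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object with the same attribute path as a stream chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Stand-in chunks, including one with no content, as a stream may produce.
demo = [fake_chunk("To be, "), fake_chunk(None), fake_chunk("or not to be")]
print(collect_chat_stream(demo))
```

Passing the `stream` object from the example above instead of `demo` would return the model's reply as a single string.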

#### Completion w/ Streaming

```py
stream = client.completions.create(
    model="meta-llama/Llama-2-7b-hf",
    prompt="To be or not to be",
    temperature=0,
    max_tokens=100,
    stream=True,
)

for response in stream:
    print(response.choices[0].text or "", end="", flush=True)
```
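
The same completions call can also be made without streaming, in which case the text arrives in a single response object. A minimal sketch, assuming the `client` configured earlier; `complete_once` is a hypothetical wrapper, not part of SuperLaser or the `openai` package.

```py
def complete_once(client, prompt, model, max_tokens=100):
    """Request a single non-streaming completion and return its text."""
    response = client.completions.create(
        model=model,
        prompt=prompt,
        temperature=0,
        max_tokens=max_tokens,
    )
    # Without stream=True the full text is available on the first choice.
    return response.choices[0].text
```

For example, `print(complete_once(client, "To be or not to be", "meta-llama/Llama-2-7b-hf"))` would print the whole completion at once.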

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "superlaser",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "inference,server,LLM,NLP,MLOps,deployment,vllm,runpod,cicd",
    "author": "InquestGeronimo",
    "author_email": "rcostanl@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/f2/b7/1d55b8ec9d802199d8b25397002b60155e0b453cc64b8d3f5b8056ddcae5/superlaser-0.0.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0 license",
    "summary": "An MLOps library for LLM deployment w/ the vLLM engine on RunPod's infra.",
    "version": "0.0.6",
    "project_urls": null,
    "split_keywords": [
        "inference",
        "server",
        "llm",
        "nlp",
        "mlops",
        "deployment",
        "vllm",
        "runpod",
        "cicd"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fef22fed79e68cf12918f12ea761227bd1a4161a2c541a5d0795e01af0431b6b",
                "md5": "12f30a32c161a40e4f7719d45b8297c7",
                "sha256": "4defd824787c03c8c146b13736999a9ebadecca3b79c4e33411d2b7c4b014e0f"
            },
            "downloads": -1,
            "filename": "superlaser-0.0.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "12f30a32c161a40e4f7719d45b8297c7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 12011,
            "upload_time": "2024-03-02T20:54:33",
            "upload_time_iso_8601": "2024-03-02T20:54:33.492188Z",
            "url": "https://files.pythonhosted.org/packages/fe/f2/2fed79e68cf12918f12ea761227bd1a4161a2c541a5d0795e01af0431b6b/superlaser-0.0.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f2b71d55b8ec9d802199d8b25397002b60155e0b453cc64b8d3f5b8056ddcae5",
                "md5": "6d0ce9414cacf5c9cdfa5bc6c76dcaca",
                "sha256": "4812ab3cc423903021853ef1bd8373a54cc68078b4bb085d8bb29daed12ebd26"
            },
            "downloads": -1,
            "filename": "superlaser-0.0.6.tar.gz",
            "has_sig": false,
            "md5_digest": "6d0ce9414cacf5c9cdfa5bc6c76dcaca",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 12431,
            "upload_time": "2024-03-02T20:54:35",
            "upload_time_iso_8601": "2024-03-02T20:54:35.315139Z",
            "url": "https://files.pythonhosted.org/packages/f2/b7/1d55b8ec9d802199d8b25397002b60155e0b453cc64b8d3f5b8056ddcae5/superlaser-0.0.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-02 20:54:35",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "superlaser"
}
