flockserve

Name: flockserve
Version: 0.1.3
Home page: https://github.com/jdooodle/flockserve
Summary: Open-source Sky Computing Inference Endpoint
Upload time: 2024-02-08 06:31:19
Author: Antti Puurula, Anil Gurbuz
Requires Python: >=3.7, <3.12
License: Apache Software License 2.0
Keywords: flockserve
# FlockServe

## Open-source Sky Computing Inference Endpoint

### Overview

FlockServe is an open-source library for deploying production-ready AI inference endpoints. Similar to 
closed-source commercial inference endpoints, FlockServe adds the capabilities of autoscaling, load balancing, 
and monitoring to an inference engine server, turning these into production-ready solutions for serving AI 
predictions at dynamic request rates and high volumes. 

FlockServe uses SkyPilot as the node provisioner: it takes a SkyPilot task file describing a single inference engine server and autoscales that server based on request volume. Because it builds on SkyPilot, an inference endpoint developed with FlockServe works natively across multiple clouds and can be migrated between providers simply by changing the SkyPilot configuration.
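
A SkyPilot task file for a single inference engine server typically declares the resources to provision and the command that starts the server. The sketch below is a hypothetical example for a vLLM server; the cloud, accelerator, model, and port are placeholders, not the contents of the example files shipped with FlockServe:
```
# Hypothetical SkyPilot task sketch for a single vLLM server.
# Cloud, accelerator, model, and port are placeholder values.
resources:
  cloud: aws
  accelerators: A10G:1
  ports: 8080

setup: |
  pip install vllm

run: |
  python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-7B-Instruct-v0.2 --port 8080
```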

Any inference engine server can be used; example SkyPilot task files are provided for vLLM and TGI. Both the OpenAI-compatible and /generate APIs exposed by these engines are supported. FlockServe runs on FastAPI and uvicorn, so requests are processed fully asynchronously.

FlockServe has a modular design, so different autoscaling and load-balancing strategies can be plugged in. The default autoscaler uses a running mean estimate of request queue lengths, which works well for LLM workloads. The default load balancer uses least-connection load balancing, which is also well suited to serving LLMs.
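
To make the two defaults concrete, here is a short illustrative sketch of the underlying ideas; it is not FlockServe's actual implementation. Requests are routed to the worker with the fewest in-flight requests, and scaling decisions compare a running mean of queue length against the configured thresholds:
```
# Illustrative sketch only, not FlockServe's internal code.
from collections import deque
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    in_flight: int = 0  # requests currently being served by this worker


def pick_worker(workers):
    # Least-connection load balancing: send the request to the least busy worker.
    return min(workers, key=lambda w: w.in_flight)


class QueueMeanAutoscaler:
    # Scale up or down when the running mean of queue length crosses thresholds.
    def __init__(self, scale_up=7.0, scale_down=4.0, window=600):
        self.samples = deque(maxlen=window)  # recent queue-length observations
        self.scale_up = scale_up
        self.scale_down = scale_down

    def observe(self, queue_length):
        self.samples.append(queue_length)

    def decision(self):
        mean = sum(self.samples) / max(len(self.samples), 1)
        if mean > self.scale_up:
            return "scale_up"
        if mean < self.scale_down:
            return "scale_down"
        return "hold"
```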

### Features
 
- **Scalability:** Easily scale your inference endpoint with cloud resources as demand changes.
- **SkyPilot Integration:** Leverages SkyPilot for sky computing, free from vendor lock-in.
- **Flexible Model Support:** Supports any inference engine, such as vLLM and TGI, and the models supported by those engines.
- **RESTful API:** Simple and intuitive API for interacting with the inference endpoint.
- **Monitoring and Logging:** Monitor the performance and logs of deployed models for effective debugging and optimization.

### Getting Started

#### Prerequisites

- Python >= 3.7, < 3.12
- Docker (if using containerized deployment)

#### Installation

You can install FlockServe from PyPI with pip:
```
pip install flockserve
```

#### Usage

Running from command line:
```
flockserve --skypilot_task serving_tgi_cpu_openai.yaml
```
From Python:
```
from flockserve import FlockServe
fs = FlockServe(skypilot_task="serving_tgi_cpu_generate.yaml")
fs.run()
```
The only mandatory argument is `skypilot_task`. The available arguments are:

| Argument                | Default Value          | Description                                                          |
|-------------------------|------------------------|----------------------------------------------------------------------|
| `skypilot_task`         | *Required*             | The path to a YAML file defining the SkyPilot task.                  |
| `worker_capacity`       | `30`                   | Maximum number of tasks a worker can handle concurrently.            |
| `worker_name_prefix`    | `'skypilot-worker'`    | Prefix for naming workers.                                           |
| `host`                  | `'0.0.0.0'`            | The host IP address to bind the server.                              |
| `port`                  | `-1`                   | The port number to listen on. If negative, the port is read from the SkyPilot task. |
| `worker_ready_path`     | `"/health"`            | Path to check worker readiness.                                      |
| `min_workers`           | `1`                    | Minimum number of workers to maintain.                               |
| `max_workers`           | `2`                    | Maximum number of workers allowed.                                   |
| `autoscale_up`          | `7`                    | Load threshold to trigger scaling up of workers.                     |
| `autoscale_down`        | `4`                    | Load threshold to trigger scaling down of workers.                   |
| `queue_tracking_window` | `600`                  | Time window in seconds to track queue length for autoscaling.        |
| `node_control_key`      | `None`                 | Secret key for node management operations.                           |
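
For example, several of these arguments can be passed together to the Python constructor. The values below are arbitrary, and the call assumes the keyword arguments match the table above:
```
from flockserve import FlockServe

# Arbitrary example values; keyword arguments as listed in the table above.
fs = FlockServe(
    skypilot_task="serving_tgi_cpu_openai.yaml",
    min_workers=1,
    max_workers=4,
    autoscale_up=7,
    autoscale_down=4,
    worker_capacity=30,
)
fs.run()
```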

Once FlockServe is started, it prints the output from SkyPilot and periodically reports FlockServe metrics:
```
INFO:flockserve.flockserve:Workers: 1, Workers Ready: 0, Worker Load: 0, QLRM: 0.0
```
Once "Workers Ready" is greater than 0, you can send requests:
```
curl -X POST -H "Content-Type: application/json" 0.0.0.0:3000/v1/chat/completions -d "@server_test_tgi_openai.json"
```
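
The contents of server_test_tgi_openai.json are not shown here. For illustration, an OpenAI-style chat completions request body typically looks roughly like this (field values are placeholders):
```
{
  "model": "tgi",
  "messages": [
    {"role": "user", "content": "Write a haiku about the sky."}
  ],
  "max_tokens": 64
}
```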

### Acknowledgments

FlockServe was developed at [JDoodle](https://www.jdoodle.com/), the AI-powered online platform for coding. 

            
