# Spirit-GPU
- [Spirit-GPU](#spirit-gpu)
- [Install](#install)
- [Handler Mode (Default)](#handler-mode-default)
- [API](#api)
- [Builder](#builder)
- [Proxy Mode](#proxy-mode)
- [Usage](#usage)
- [Authentication](#authentication)
- [Policy](#policy)
- [Logging](#logging)
## Install
```
pip install spirit-gpu
```
## Handler Mode (Default)
`spirit-gpu` executes user-defined handlers in response to synchronous or asynchronous requests. Please refer to the [API](#api) documentation for details on how to invoke them.
```python
from spirit_gpu import start
from spirit_gpu.env import Env
from typing import Dict, Any


def handler(request: Dict[str, Any], env: Env):
    """
    request: Dict[str, Any], from the client HTTP request body.
    request["input"]: Required.
    request["webhook"]: Optional string for asynchronous requests.

    The returned object is serialized into JSON and sent to the client,
    in this case: '{"output": "hello"}'.
    """
    return {"output": "hello"}


def gen_handler(request: Dict[str, Any], env: Env):
    """
    Yielded outputs are appended to an array, serialized into JSON,
    and sent to the client, in this case: [0, 1, 2, 3, 4].
    """
    for i in range(5):
        yield i


async def async_handler(request: Dict[str, Any], env: Env):
    """
    The returned object is serialized into JSON and sent to the client.
    """
    return {"output": "hello"}


async def async_gen_handler(request: Dict[str, Any], env: Env):
    """
    Yielded outputs are appended to an array, serialized into JSON,
    and sent to the client.
    """
    for i in range(10):
        yield i


def concurrency_modifier(current_allowed_concurrency: int) -> int:
    """
    Adjusts the allowed concurrency level based on the current state.
    For example, if the current allowed concurrency is 3 and resources are
    sufficient, it can be increased to 5, allowing 5 tasks to run concurrently.
    """
    allowed_concurrency = ...
    return allowed_concurrency


"""
Register the handler with start().
Handlers can be synchronous, asynchronous, generators, or asynchronous generators.
"""
start({
    "handler": async_handler, "concurrency_modifier": concurrency_modifier
})
```
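As a concrete illustration, a `concurrency_modifier` could scale the limit up or down based on a resource check of your own. This is only a sketch; the "spare capacity" flag below is a hypothetical placeholder for whatever signal fits your workload (free GPU memory, queue depth, etc.).

```python
def concurrency_modifier(current_allowed_concurrency: int) -> int:
    # Hypothetical resource signal; replace with your own check,
    # e.g. free GPU memory or queue depth.
    has_spare_capacity = True
    if has_spare_capacity:
        # Allow one more task to run concurrently.
        return current_allowed_concurrency + 1
    # Scale back down, but never below one concurrent task.
    return max(1, current_allowed_concurrency - 1)
```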
### API
Please read the [API](https://github.com/datastone-spirit/spirit-gpu/blob/main/API.md) or [中文 API](https://github.com/datastone-spirit/spirit-gpu/blob/main/API.zh.md) documentation to learn how to use the spirit-gpu serverless APIs and other important policies.
### Builder
The `spirit-gpu-builder` allows you to quickly generate templates and skeleton code for `spirit-gpu` using OpenAPI or JSON schema definitions. Built on `datamodel-code-generator`, this tool simplifies the setup for serverless functions.
> `spirit-gpu-builder` is installed when you install `spirit-gpu >= 0.0.6`.
```
usage: spirit-gpu-builder [-h] [-i INPUT_FILE]
[--input-type {auto,openapi,jsonschema,json,yaml,dict,csv,graphql}]
[-o OUTPUT_DIR]
[--data-type {pydantic_v2.BaseModel,dataclasses.dataclass}]
[--handler-type {sync,async,sync_generator,async_generator}]
[--model-only]
Generate spirit-gpu skeleton code from an OpenAPI or JSON schema, built on top of `datamodel-code-generator`.
```
Options:
- `-h, --help`: Show this help message and exit.
- `-i INPUT_FILE, --input-file INPUT_FILE`: Path to the input file. Supported types: ['auto', 'openapi', 'jsonschema', 'json', 'yaml', 'dict', 'csv', 'graphql']. If not provided, the builder looks for a default file in the current directory: 'api.yaml', 'api.yml', or 'api.json'.
- `--input-type {auto,openapi,jsonschema,json,yaml,dict,csv,graphql}`: Specify the type of the input file. Default: 'auto'.
- `-o OUTPUT_DIR, --output-dir OUTPUT_DIR`: Path to the output directory. Defaults to the current directory.
- `--data-type {pydantic_v2.BaseModel,dataclasses.dataclass}`: Type of data model to generate. Default: 'pydantic_v2.BaseModel'.
- `--handler-type {sync,async,sync_generator,async_generator}`: Type of handler to generate. Default: 'sync'.
- `--model-only`: Only generate the model file and skip the template repo and main-file generation. Useful when updating the API file.
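For example, to generate an asynchronous handler skeleton from an OpenAPI file in the current directory, you might run the following (the file name and flag values are illustrative; adjust them to your project):

```
spirit-gpu-builder -i api.yaml --input-type openapi --handler-type async -o .
```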
The input file describes the `input` part of the request body sent to your spirit-gpu serverless endpoint. It can be a JSON example, a JSON schema, or an OpenAPI file.
**Examples**
The following OpenAPI definition describes the expected `input` for a serverless spirit-gpu function:
```yaml
openapi: 3.1.0
components:
  schemas:
    RequestInput:
      type: object
      required:
        - audio
      properties:
        audio:
          type: string
          description: URL to the audio file.
          nullable: false
        model:
          type: string
          description: Identifier for the model to be used.
          default: null
          nullable: true
```
Your request body to `spirit-gpu`:
```json
{
  "input": {
    "audio": "http://your-audio.wav",
    "model": "base"
  },
  "webhook": "xxx"
}
```
Generated Python model file:
```python
class RequestInput(BaseModel):
    audio: str = Field(..., description='URL to the audio file.')
    model: Optional[str] = Field(
        None, description='Identifier for the model to be used.'
    )
```
If using OpenAPI, ensure the main schema in your YAML file is named `RequestInput` to allow automatic code generation. The generated handler skeleton looks like this:
```python
def get_request_input(request: Dict[str, Any]) -> RequestInput:
    return RequestInput(**request["input"])


def handler_impl(request_input: RequestInput, request: Dict[str, Any], env: Env):
    """
    Your handler implementation goes here.
    """
    pass


def handler(request: Dict[str, Any], env: Env):
    request_input = get_request_input(request)
    return handler_impl(request_input, request, env)
```
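For illustration, a minimal `handler_impl` that just echoes the validated fields could look like the sketch below. It builds on the generated skeleton and imports above; the actual inference step is left as a placeholder.

```python
def handler_impl(request_input: RequestInput, request: Dict[str, Any], env: Env):
    # Access validated fields from the generated model.
    audio_url = request_input.audio
    model_name = request_input.model or "base"
    # Placeholder for your actual inference logic,
    # e.g. downloading and transcribing audio_url.
    return {"audio": audio_url, "model": model_name}
```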
The generated project layout looks like this:
```
├── Dockerfile
├── LICENSE
├── README.md
├── api.json
├── requirements.txt
├── scripts
│   ├── build.sh
│   └── start.sh
└── src
    ├── build.py
    ├── main.py
    └── spirit_generated_model.py
```
## Proxy Mode
Proxy mode allows users to easily use `spirit-gpu` to access open-source LLM servers such as [vllm](https://github.com/vllm-project/vllm) or [ollama](https://github.com/ollama/ollama/tree/main).
Proxy mode turns our serverless service into a reverse proxy, much like a large Nginx instance. Users send their requests to our URL `https://api-serverless.datastone.cn` along with an authentication token consisting of a serverless ID and an API key, and we proxy these requests to the user's local LLM server defined in the Dockerfile.
> Note: Start your `vllm`, `ollama`, or any other LLM server alongside the `spirit-gpu` process in your Dockerfile, as sketched below. Currently, we only support proxying requests that are compatible with the ollama APIs (`/api/*`) and the OpenAI APIs (`/v1/chat/*`, `/v1/*`).
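A minimal start script illustrating this pattern is shown below. The commands, paths, and server choice are examples only; adapt them to your image and LLM server.

```
#!/bin/sh
# Start the local LLM server in the background (example: ollama).
ollama serve &
# Start the spirit-gpu process (example entrypoint).
python src/main.py
```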
### Usage
```python
import aiohttp
from spirit_gpu import start, logger

# ollama base url
base_url = "http://localhost:11434"


def check_start():
    return True


async def check_start_async():
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(base_url) as response:
                text = await response.text()
                logger.info(f"response.status: {response.status}, text: {text}")
                return True
    except Exception:
        return False


start({
    "mode": "proxy",
    "base_url": base_url,
    "check_start": check_start_async,
})
```
- `mode`: Set to "proxy", meaning `spirit-gpu` will receive user requests, proxy them to the local server, and return the responses.
- `check_start`: Used to verify if the local server has started successfully.
- `base_url`: The URL of the local server.
### Authentication
In proxy mode, include your authentication token in the HTTP header. For example, to make a streaming generate request to the `ollama` API through spirit-gpu, call:
```
curl -H "Auth: {your_serverless_id}-{your_api_key}" https://api-serverless.datastone.cn/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}'
```
`spirit-gpu` will proxy the request to your local server at `http://localhost:11434` as shown below. Ensure that your LLM server is included in the Dockerfile and is started.
```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}'
```
Currently, we support proxying requests for both `ollama` APIs and `OpenAI` APIs.
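For OpenAI-compatible APIs, the same header applies. Below is a sketch using the `requests` library against a chat-completions endpoint; it assumes your local server exposes `/v1/chat/completions`, and the ID/key placeholders match the curl example above.

```python
import requests

# Placeholders: substitute your serverless ID and API key.
headers = {"Auth": "{your_serverless_id}-{your_api_key}"}
payload = {
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
}
resp = requests.post(
    "https://api-serverless.datastone.cn/v1/chat/completions",
    headers=headers,
    json=payload,
)
print(resp.status_code, resp.text)
```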
### Policy
- Response Timeout: We wait for a response from the worker for up to 3 minutes. If your worker fails to respond within this timeframe, a response with a status code of 408 (Request Timeout) will be returned.
- Response Transfer Duration: The entire response transfer must be completed within 3 minutes. This means that once the worker begins sending the response, the stream should finish within this period.
> Note: Please carefully design your Dockerfile to ensure that the LLM server starts and loads models promptly.
## Logging
We provide a tool to log information. The default logging level is "INFO"; call `logger.set_level(logging.DEBUG)` to change it.
> Please make sure you update to the `latest` version to use this feature.
```python
from spirit_gpu import start, logger
from spirit_gpu.env import Env
from typing import Dict, Any


def handler(request: Dict[str, Any], env: Env):
    """
    request: Dict[str, Any], from the client HTTP request body.
    request["input"]: Required.
    request["webhook"]: Optional string for asynchronous requests.

    request["meta"]["requestID"] is added automatically if it does not
    already exist in your request.
    """
    request_id = request["meta"]["requestID"]
    logger.info("start to handle", request_id=request_id, caller=True)
    return {"output": "hello"}


start({"handler": handler})
```