# Potassium
![Potassium (1)](https://user-images.githubusercontent.com/44653944/222016748-ca2c6905-8fd5-4ee5-a68e-7aed48f23436.png)
[Potassium](https://github.com/bananaml/potassium) is an open source web framework, built to tackle the unique challenges of serving custom models in production.
The goal of this project is to:
- Provide a familiar web framework similar to Flask/FastAPI
- Bake in best practices for handling large, GPU-bound ML models
- Provide a set of primitives common in ML serving, such as:
  - POST request handlers
  - Websocket / streaming connections
  - Async handlers w/ webhooks
- Maintain a standard interface, to allow the code and models to compile to specialized hardware (ideally on [Banana Serverless GPUs](https://banana.dev) 😉)
### Stability Notes:
Potassium follows Semantic Versioning: major versions imply breaking changes, and v0 implies potential instability even between minor/patch versions. Be sure to lock your versions, as we're still in v0!
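For example, pinning an exact version on install (the version shown is illustrative; pin whichever one you've tested against):
```bash
pip3 install "potassium==0.5.0"
```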
---
## Quickstart: Serving a Huggingface BERT model
The fastest way to get up and running is to use the [Banana CLI](https://github.com/bananaml/banana-cli), which downloads and runs your first model.
[Here's a demo video](https://www.loom.com/share/86d4e7b0801549b9ab2f7a1acce772aa)
1. Install the CLI with pip
```bash
pip3 install banana-cli
```
2. Create a new project directory with
```bash
banana init my-app
cd my-app
```
This downloads boilerplate for your potassium app, and automatically installs potassium into the venv.
3. Start the dev server
```bash
. ./venv/bin/activate
python3 app.py
```
4. Call your API (from a separate terminal)
```bash
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "Hello I am a [MASK] model."}' http://localhost:8000/
```
---
## Or do it yourself:
1. Install the potassium package
```bash
pip3 install potassium
```
2. Create a Python file called `app.py` containing:
```python
from potassium import Potassium, Request, Response
from transformers import pipeline
import torch

app = Potassium("my_app")

# @app.init runs at startup, and initializes the app's context
@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)

    context = {
        "model": model,
        "hello": "world"
    }

    return context

# @app.handler is an HTTP POST handler running for every call
@app.handler()
def handler(context: dict, request: Request) -> Response:
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)

    return Response(
        json={"outputs": outputs},
        status=200
    )

if __name__ == "__main__":
    app.serve()
```
This runs a Huggingface BERT model. For this example, you'll also need to install transformers and torch:
```bash
pip3 install transformers torch
```
3. Start the server with:
```bash
python3 app.py
```
4. Test the running server with:
```bash
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "Hello I am a [MASK] model."}' http://localhost:8000
```
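If everything is wired up, the response wraps the pipeline output under `"outputs"`. For a fill-mask pipeline, the shape looks roughly like this (the score, token, and completion shown are illustrative):
```json
{
  "outputs": [
    {
      "score": 0.1,
      "token": 2535,
      "token_str": "role",
      "sequence": "hello i am a role model."
    }
  ]
}
```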
---
# Documentation
## potassium.Potassium
```python
from potassium import Potassium
app = Potassium("server")
```
This instantiates your HTTP app, similar to popular frameworks like [Flask](https://flask.palletsprojects.com/en/2.2.x/).
---
## @app.init
```python
@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)

    return {
        "model": model
    }
```
The `@app.init` decorated function runs once on server startup, and is used to load any reusable, heavy objects such as:
- Your AI model, loaded to GPU
- Tokenizers
- Precalculated embeddings
The return value is a dictionary that is saved as the app's `context`, and is passed to the handler functions on every call.
There may only be one `@app.init` function.
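As a hedged sketch, here's an `init()` that loads several such objects into the context at once (the `AutoTokenizer` call is standard transformers usage; the embeddings file and its path are hypothetical):
```python
import torch
from transformers import AutoTokenizer, pipeline

@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    return {
        # model loaded to GPU when one is available
        "model": pipeline('fill-mask', model='bert-base-uncased', device=device),
        # tokenizer reused across all handler calls
        "tokenizer": AutoTokenizer.from_pretrained('bert-base-uncased'),
        # hypothetical precalculated embeddings, loaded once from disk
        "embeddings": torch.load("embeddings.pt"),
    }
```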
---
## @app.handler()
```python
@app.handler("/")
def handler(context: dict, request: Request) -> Response:
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)

    return Response(
        json={"outputs": outputs},
        status=200
    )
```
The `@app.handler` decorated function runs for every HTTP call, and is used to run inference or training workloads against your model(s).
You may configure as many `@app.handler` functions as you'd like, with unique API routes.
The context dict passed in is a mutable reference, so you can modify it in-place to persist objects between warm handlers.
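As a minimal sketch of both points (the `/classify` and `/stats` routes and the call counter are hypothetical, not part of Potassium):
```python
@app.handler("/classify")
def classify(context: dict, request: Request) -> Response:
    # a second route, served by the same app
    outputs = context.get("model")(request.json.get("prompt"))
    return Response(json={"outputs": outputs}, status=200)

@app.handler("/stats")
def stats(context: dict, request: Request) -> Response:
    # mutating context in place persists between warm calls
    context["calls"] = context.get("calls", 0) + 1
    return Response(json={"calls": context["calls"]}, status=200)
```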
---
## @app.background(path="/background")
```python
# send_webhook is Potassium's helper for POSTing results onward
from potassium import send_webhook

@app.background("/background")
def handler(context: dict, request: Request):
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)

    send_webhook(url="http://localhost:8001", json={"outputs": outputs})

    return
```
The `@app.background()` decorated function runs a non-blocking job in the background, for tasks where results aren't expected to return client-side. It's on you to forward the data to wherever you please. Potassium supplies a `send_webhook()` helper function for POSTing data onward to a URL, or you may add your own custom upload/pipeline code.
When invoked, the server immediately returns a `{"success": true}` message.
You may configure as many `@app.background` functions as you'd like, with unique API routes.
The context dict passed in is a mutable reference, so you can modify it in-place to persist objects between warm handlers.
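Calling a background handler looks like any other POST; the server acks immediately while the job keeps running:
```bash
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "Hello I am a [MASK] model."}' http://localhost:8000/background
# returns {"success": true} right away; results arrive later via your webhook
```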
---
## app.serve()
`app.serve` runs the server, and is a blocking operation.
---
## Pre-warming your app
Potassium comes with a built-in endpoint for those cases where you want to "warm up" your app to better control the timing of your inference calls. You don't *need* to call it, since your inference call requires `init()` to have run once on server startup anyway, but this gives you a bit more control.
Once your model is warm (i.e., cold boot finished), this endpoint returns a 200. If a cold boot is required, the `init()` function is first called while the server starts up, and then a 200 is returned from this endpoint.
You don't need any extra code to enable it; it comes out of the box, and you can call it at `/_k/warmup` as either a GET or POST request.
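For example:
```bash
# returns 200 once init() has finished (running it first on a cold boot)
curl http://localhost:8000/_k/warmup
```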
---
# Store
Potassium includes a key-value storage primitive, to help users persist data between calls.
Example usage: your own Redis backend (encouraged)
```python
from potassium.store import Store, RedisConfig

store = Store(
    backend="redis",
    config=RedisConfig(
        host="localhost",
        port=6379
    )
)

# in one handler
store.set("key", "value", ttl=60)

# in another handler
value = store.get("key")
```
Example usage: using local storage
- Note: not encouraged on Banana serverless or multi-replica environments, as data is stored only on the single replica
```python
from potassium.store import Store

store = Store(
    backend="local"
)

# in one handler
store.set("key", "value", ttl=60)

# in another handler
value = store.get("key")
```
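Putting it together, a hedged sketch of passing data between two handlers via the store (the routes and key are hypothetical):
```python
from potassium.store import Store

store = Store(backend="local")

@app.handler("/start")
def start(context: dict, request: Request) -> Response:
    # stash a value for a later call, expiring after 60 seconds
    store.set("last_prompt", request.json.get("prompt"), ttl=60)
    return Response(json={"saved": True}, status=200)

@app.handler("/recall")
def recall(context: dict, request: Request) -> Response:
    # read it back from a different handler
    return Response(json={"last_prompt": store.get("last_prompt")}, status=200)
```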