# Potassium
![Potassium (1)](https://user-images.githubusercontent.com/44653944/222016748-ca2c6905-8fd5-4ee5-a68e-7aed48f23436.png)
[Potassium](https://github.com/bananaml/potassium) is an open source web framework, built to tackle the unique challenges of serving custom models in production.
The goal of this project is to:
- Provide a familiar web framework similar to Flask/FastAPI
- Bake in best practices for handling large, GPU-bound ML models
- Provide a set of primitives common in ML serving, such as:
  - POST request handlers
  - Websocket / streaming connections
  - Async handlers w/ webhooks
- Maintain a standard interface, to allow the code and models to compile to specialized hardware (ideally on [Banana Serverless GPUs](https://banana.dev) 😉)
### Stability Notes:
Potassium follows Semantic Versioning: major versions imply breaking changes, and v0 implies potential instability even between minor/patch versions. Be sure to lock your versions, as we're still in v0!
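For example, pinning an exact version on install (the version shown is illustrative; pin whichever one you've tested against):
```bash
pip3 install "potassium==0.5.0"
```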
---
## Quickstart: Serving a Huggingface BERT model
The fastest way to get up and running is to use the [Banana CLI](https://github.com/bananaml/banana-cli), which downloads and runs your first model.
[Here's a demo video](https://www.loom.com/share/86d4e7b0801549b9ab2f7a1acce772aa)
1. Install the CLI with pip
```bash
pip3 install banana-cli
```
2. Create a new project directory with
```bash
banana init my-app
cd my-app
```
This downloads boilerplate for your potassium app, and automatically installs potassium into the venv.
3. Start the dev server
```bash
. ./venv/bin/activate
python3 app.py
```
4. Call your API (from a separate terminal)
```bash
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "Hello I am a [MASK] model."}' http://localhost:8000/
```
---
## Or do it yourself:
1. Install the potassium package
```bash
pip3 install potassium
```
2. Create a Python file called `app.py` containing:
```python
from potassium import Potassium, Request, Response
from transformers import pipeline
import torch

app = Potassium("my_app")

# @app.init runs at startup, and initializes the app's context
@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)

    context = {
        "model": model,
        "hello": "world"
    }

    return context

# @app.handler is an HTTP POST handler running for every call
@app.handler()
def handler(context: dict, request: Request) -> Response:
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)

    return Response(
        json={"outputs": outputs},
        status=200
    )

if __name__ == "__main__":
    app.serve()
```
This runs a Huggingface BERT model. For this example, you'll also need to install transformers and torch:
```bash
pip3 install transformers torch
```
3. Start the server with:
```bash
python3 app.py
```
4. Test the running server with:
```bash
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "Hello I am a [MASK] model."}' http://localhost:8000
```
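If everything is wired up, the response wraps the pipeline output under `"outputs"`. For a fill-mask pipeline, the shape looks roughly like this (the score, token, and completion shown are illustrative):
```json
{
  "outputs": [
    {
      "score": 0.1,
      "token": 2535,
      "token_str": "role",
      "sequence": "hello i am a role model."
    }
  ]
}
```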
---
# Documentation
## potassium.Potassium
```python
from potassium import Potassium
app = Potassium("server")
```
This instantiates your HTTP app, similar to popular frameworks like [Flask](https://flask.palletsprojects.com/en/2.2.x/).
---
## @app.init
```python
@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)

    return {
        "model": model
    }
```
The `@app.init` decorated function runs once on server startup, and is used to load any reusable, heavy objects such as:
- Your AI model, loaded to GPU
- Tokenizers
- Precalculated embeddings
The return value is a dictionary that is saved as the app's `context`, and is passed to the handler functions on every call.
There may only be one `@app.init` function.
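As a hedged sketch, here's an `init()` that loads several such objects into the context at once (the `AutoTokenizer` call is standard transformers usage; the embeddings file and its path are hypothetical):
```python
import torch
from transformers import AutoTokenizer, pipeline

@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    return {
        # model loaded to GPU when one is available
        "model": pipeline('fill-mask', model='bert-base-uncased', device=device),
        # tokenizer reused across all handler calls
        "tokenizer": AutoTokenizer.from_pretrained('bert-base-uncased'),
        # hypothetical precalculated embeddings, loaded once from disk
        "embeddings": torch.load("embeddings.pt"),
    }
```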
---
## @app.handler()
```python
@app.handler("/")
def handler(context: dict, request: Request) -> Response:
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)

    return Response(
        json={"outputs": outputs},
        status=200
    )
```
The `@app.handler` decorated function runs for every HTTP call, and is used to run inference or training workloads against your model(s).
You may configure as many `@app.handler` functions as you'd like, with unique API routes.
The context dict passed in is a mutable reference, so you can modify it in-place to persist objects between warm handlers.
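As a minimal sketch of both points (the `/classify` and `/stats` routes and the call counter are hypothetical, not part of Potassium):
```python
@app.handler("/classify")
def classify(context: dict, request: Request) -> Response:
    # a second route, served by the same app
    outputs = context.get("model")(request.json.get("prompt"))
    return Response(json={"outputs": outputs}, status=200)

@app.handler("/stats")
def stats(context: dict, request: Request) -> Response:
    # mutating context in place persists between warm calls
    context["calls"] = context.get("calls", 0) + 1
    return Response(json={"calls": context["calls"]}, status=200)
```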
---
## @app.background(path="/background")
```python
# send_webhook is Potassium's helper for POSTing results onward
from potassium import send_webhook

@app.background("/background")
def handler(context: dict, request: Request):
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)

    send_webhook(url="http://localhost:8001", json={"outputs": outputs})

    return
```
The `@app.background()` decorated function runs a non-blocking job in the background, for tasks where results aren't expected to return client-side. It's on you to forward the data to wherever you please. Potassium supplies a `send_webhook()` helper function for POSTing data onward to a URL, or you may add your own custom upload/pipeline code.
When invoked, the server immediately returns a `{"success": true}` message.
You may configure as many `@app.background` functions as you'd like, with unique API routes.
The context dict passed in is a mutable reference, so you can modify it in-place to persist objects between warm handlers.
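Calling a background handler looks like any other POST; the server acks immediately while the job keeps running:
```bash
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "Hello I am a [MASK] model."}' http://localhost:8000/background
# returns {"success": true} right away; results arrive later via your webhook
```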
---
## app.serve()
`app.serve` runs the server, and is a blocking operation.
---
## Pre-warming your app
Potassium comes with a built-in endpoint for those cases where you want to "warm up" your app to better control the timing of your inference calls. You don't *need* to call it, since your inference call requires `init()` to have run once on server startup anyway, but this gives you a bit more control.
Once your model is warm (i.e., cold boot finished), this endpoint returns a 200. If a cold boot is required, the `init()` function is first called while the server starts up, and then a 200 is returned from this endpoint.
You don't need any extra code to enable it; it comes out of the box, and you can call it at `/_k/warmup` as either a GET or POST request.
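For example:
```bash
# returns 200 once init() has finished (running it first on a cold boot)
curl http://localhost:8000/_k/warmup
```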
---
# Store
Potassium includes a key-value storage primitive, to help users persist data between calls.
Example usage: your own Redis backend (encouraged)
```python
from potassium.store import Store, RedisConfig

store = Store(
    backend="redis",
    config=RedisConfig(
        host="localhost",
        port=6379
    )
)

# in one handler
store.set("key", "value", ttl=60)

# in another handler
value = store.get("key")
```
Example usage: using local storage
- Note: not encouraged on Banana serverless or multi-replica environments, as data is stored only on the single replica
```python
from potassium.store import Store

store = Store(
    backend="local"
)

# in one handler
store.set("key", "value", ttl=60)

# in another handler
value = store.get("key")
```
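Putting it together, a hedged sketch of passing data between two handlers via the store (the routes and key are hypothetical):
```python
from potassium.store import Store

store = Store(backend="local")

@app.handler("/start")
def start(context: dict, request: Request) -> Response:
    # stash a value for a later call, expiring after 60 seconds
    store.set("last_prompt", request.json.get("prompt"), ttl=60)
    return Response(json={"saved": True}, status=200)

@app.handler("/recall")
def recall(context: dict, request: Request) -> Response:
    # read it back from a different handler
    return Response(json={"last_prompt": store.get("last_prompt")}, status=200)
```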