# FastServe
Machine Learning Serving focused on GenAI & LLMs with simplicity as the top priority.
> [![img_tag](https://img.youtube.com/vi/GfcmyfPB9qY/0.jpg)](https://www.youtube.com/watch?v=GfcmyfPB9qY)
>
> YouTube: How to serve your own GPT-like LLM in 1 minute with FastServe
## Installation
**Stable:**
```shell
pip install FastServeAI
```
**Latest:**
```shell
pip install git+https://github.com/aniketmaurya/fastserve.git@main
```
## Run locally
```bash
python -m fastserve
```
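Once the server is up, you can exercise it over plain HTTP. Below is a minimal sketch with `requests`; the default port 8000, the POST route, and the payload field are assumptions here, so check the interactive docs that FastAPI serves at [http://localhost:8000/docs](http://localhost:8000/docs) for the live schema.
```python
import requests

# Hypothetical route and payload shape; confirm both at http://localhost:8000/docs.
response = requests.post("http://localhost:8000/endpoint", json={"request": "hello"})
print(response.json())
```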
## Usage/Examples
### Serve LLMs with Llama-cpp
```python
from fastserve.models import ServeLlamaCpp
model_path = "openhermes-2-mistral-7b.Q5_K_M.gguf"
serve = ServeLlamaCpp(model_path=model_path)
serve.run_server()
```
or, run `python -m fastserve.models --model llama-cpp --model_path openhermes-2-mistral-7b.Q5_K_M.gguf` from terminal.
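As with the default server, you can hit the llama-cpp endpoint directly over HTTP. In this sketch the `prompt` and `temperature` field names are assumptions to verify against the generated docs.
```python
import requests

# Hypothetical payload for the llama-cpp server; field names are assumptions.
payload = {"prompt": "Write a haiku about servers.", "temperature": 0.7}
response = requests.post("http://localhost:8000/endpoint", json=payload)
print(response.json())
```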
### Serve vLLM
```python
from fastserve.models import ServeVLLM
app = ServeVLLM("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
app.run_server()
```
You can use the FastServe client, which automatically applies the chat template for you:
```python
from fastserve.client import vLLMClient
from rich import print
client = vLLMClient("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
response = client.chat("Write a python function to resize image to 224x224", keep_context=True)
# print(client.context)
print(response["outputs"][0]["text"])
```
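Because `keep_context=True` retains the running conversation in `client.context`, a follow-up call can build on the previous turn:
```python
# The earlier exchange is kept client-side, so the model can resolve
# "that function" from the previous request.
follow_up = client.chat("Now make that function keep the aspect ratio.", keep_context=True)
print(follow_up["outputs"][0]["text"])
```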
### Serve SDXL Turbo
```python
from fastserve.models import ServeSDXLTurbo
serve = ServeSDXLTurbo(device="cuda", batch_size=2, timeout=1)
serve.run_server()
```
or, run `python -m fastserve.models --model sdxl-turbo --batch_size 2 --timeout 1` from terminal.
This application comes with a UI, available at [http://localhost:8000/ui](http://localhost:8000/ui).
<img src="https://raw.githubusercontent.com/aniketmaurya/fastserve/main/assets/sdxl.jpg" width=400 style="border: 1px solid #F2F3F5;">
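You can also request images programmatically. The sketch below is a guess at the contract: it assumes a `prompt` field and raw image bytes in the response, so verify both against the UI at `/ui` or the generated docs at `/docs` before relying on it.
```python
import requests

# Hypothetical request/response shape for the SDXL Turbo server.
response = requests.post("http://localhost:8000/endpoint", json={"prompt": "a watercolor fox"})
with open("generated.jpg", "wb") as f:
    f.write(response.content)  # assumes the server returns the image bytes directly
```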
### Face Detection
```python
from fastserve.models import FaceDetection
serve = FaceDetection(batch_size=2, timeout=1)
serve.run_server()
```
or, run `python -m fastserve.models --model face-detection --batch_size 2 --timeout 1` from terminal.
### Image Classification
```python
from fastserve.models import ServeImageClassification
app = ServeImageClassification("resnet18", timeout=1, batch_size=4)
app.run_server()
```
or, run `python -m fastserve.models --model image-classification --model_name resnet18 --batch_size 4 --timeout 1` from terminal.
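Calling an image endpoint from Python typically means uploading a file. Here is a sketch under the assumption that the server accepts a multipart upload (the field name `image` is a guess; the `/docs` page shows the real parameter). The same pattern would apply to the face-detection server above.
```python
import requests

# Hypothetical multipart upload; the field name "image" is an assumption.
with open("cat.jpg", "rb") as f:
    response = requests.post("http://localhost:8000/endpoint", files={"image": f})
print(response.json())  # expected: predicted label(s) for the image
```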
### Serve Custom Model
To serve a custom model, implement the `handle` method of `FastServe`; it processes a batch of inputs and returns the responses as a list.
```python
from typing import List

from fastserve import FastServe
from fastserve.utils import BaseRequest  # FastServe's request wrapper; the import path may differ across versions


class MyModelServing(FastServe):
    def __init__(self):
        super().__init__(batch_size=2, timeout=0.1)
        self.model = create_model(...)  # placeholder: load or build your model here

    def handle(self, batch: List[BaseRequest]) -> List[float]:
        # Unpack the raw payloads, run the model once on the whole batch,
        # and return one result per input.
        inputs = [b.request for b in batch]
        response = self.model(inputs)
        return response


app = MyModelServing()
app.run_server()
```
Run the script above from the terminal, and it will launch a FastAPI server for your custom model.
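A quick way to smoke-test the custom server is to post a payload whose `request` field mirrors the `BaseRequest` model used in `handle` above (the route is an assumption, so confirm it at `/docs`).
```python
import requests

# FastServe groups concurrent requests into batches (batch_size=2, timeout=0.1)
# before they reach handle(); a single request still works as a batch of one.
response = requests.post("http://localhost:8000/endpoint", json={"request": 4.2})
print(response.json())
```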
## Deploy
### Lightning AI Studio ⚡️
```shell
python -m fastserve.deploy.lightning --filename main.py \
--user LIGHTNING_USERNAME \
--teamspace LIGHTNING_TEAMSPACE \
--machine "CPU" # T4, A10G or A10G_X_4
```
## Contribute
**Install in editable mode:**
```shell
git clone https://github.com/aniketmaurya/fastserve.git
cd fastserve
pip install -e .
```
**Create a new branch:**
```shell
git checkout -b <new-branch>
```
**Make your changes, commit and [create a PR](https://github.com/aniketmaurya/fastserve/compare).**