| Field | Value |
|---|---|
| Name | litserve |
| Version | 0.2.15 |
| home_page | None |
| Summary | Lightweight AI server. |
| upload_time | 2025-07-31 11:46:58 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.9 |
| license | Apache-2.0 |
| keywords | ai, deep learning, pytorch |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
<div align='center'>
<h2>
The easiest way to deploy agents, MCP servers, RAG, pipelines, any model.
<br/>
No MLOps. No YAML.
</h2>
<img alt="Lightning" src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/ls_banner2.png" width="800px" style="max-width: 100%;">
</div>
Most serving engines serve one model with rigid abstractions. LitServe lets you serve any model (vision, audio, text) and build full AI systems - agents, chatbots, MCP servers, RAG, pipelines - with full control, batching, multi-GPU, streaming, custom logic, multi-model support, and zero YAML.
Self-host or deploy in one click to [Lightning AI](https://lightning.ai/).
<div align='center'>
<pre>
✅ Build full AI systems ✅ 2× faster than FastAPI ✅ Agents, RAG, pipelines, more
✅ Custom logic + control ✅ Any PyTorch model ✅ Self-host or managed
✅ Multi-GPU autoscaling ✅ Batching + streaming ✅ BYO model or vLLM
✅ No MLOps glue code ✅ Easy setup in Python ✅ Serverless support
</pre>
<div align='center'>
[Downloads](https://pepy.tech/projects/litserve)
[Discord](https://discord.gg/WajDThKAur)
[Codecov](https://codecov.io/gh/Lightning-AI/litserve)
[License](https://github.com/Lightning-AI/litserve/blob/main/LICENSE)
</div>
</div>
<div align="center">
<div style="text-align: center;">
<a target="_blank" href="#quick-start" style="margin: 0 10px;">Quick start</a> •
<a target="_blank" href="#featured-examples" style="margin: 0 10px;">Examples</a> •
<a target="_blank" href="#features" style="margin: 0 10px;">Features</a> •
<a target="_blank" href="#performance" style="margin: 0 10px;">Performance</a> •
<a target="_blank" href="#host-anywhere" style="margin: 0 10px;">Hosting</a> •
<a target="_blank" href="https://lightning.ai/docs/litserve" style="margin: 0 10px;">Docs</a>
</div>
</div>
<div align="center">
<a target="_blank" href="https://lightning.ai/docs/litserve/home/get-started">
<img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/get-started-badge.svg" height="36px" alt="Get started"/>
</a>
</div>
# Quick start
Install LitServe via pip ([more options](https://lightning.ai/docs/litserve/home/install)):
```bash
pip install litserve
```
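To verify the install (a quick sanity check; assumes the package exposes `__version__`, which recent releases do):
```bash
python -c "import litserve; print(litserve.__version__)"
```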
[Example 1](#inference-pipeline-example): Toy inference pipeline with multiple models.
[Example 2](#agent-example): Minimal agent to fetch the news (with the OpenAI API).
(See the [featured examples](#featured-examples) for more advanced use cases.)
### Inference pipeline example
```python
import litserve as ls

# define the API to include any number of models, DBs, etc...
class InferencePipeline(ls.LitAPI):
    def setup(self, device):
        self.model1 = lambda x: x**2
        self.model2 = lambda x: x**3

    def predict(self, request):
        x = request["input"]
        # perform calculations using both models
        a = self.model1(x)
        b = self.model2(x)
        c = a + b
        return {"output": c}

if __name__ == "__main__":
    # 12+ features like batching, streaming, etc...
    server = ls.LitServer(InferencePipeline(max_batch_size=1), accelerator="auto")
    server.run(port=8000)
```
Deploy for free to [Lightning cloud](#host-anywhere) (or self-host anywhere):
```bash
# Deploy for free with autoscaling, monitoring, etc...
lightning deploy server.py --cloud
# Or run locally (self host anywhere)
lightning deploy server.py
# python server.py
```
Test the server by simulating an HTTP request (run this in any terminal):
```bash
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"input": 4.0}'
```
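The same request from Python, for reference (a minimal client sketch using the `requests` library; `/predict` is the default route, as in the curl call above):
```python
import requests

# call the default /predict route exposed by LitServer
response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"input": 4.0},
)
print(response.json())  # {"output": 80.0}, since 4**2 + 4**3 = 80
```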
### Agent example
```python
import os, re, requests, openai
import litserve as ls

class NewsAgent(ls.LitAPI):
    def setup(self, device):
        # read the key from the environment instead of hard-coding it
        self.openai_client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    def predict(self, request):
        website_url = request.get("website_url", "https://text.npr.org/")
        website_text = re.sub(r'<[^>]+>', ' ', requests.get(website_url).text)

        # ask the LLM to tell you about the news
        llm_response = self.openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": f"Based on this, what is the latest: {website_text}"}],
        )
        output = llm_response.choices[0].message.content.strip()
        return {"output": output}

if __name__ == "__main__":
    server = ls.LitServer(NewsAgent())
    server.run(port=8000)
```
Test it:
```bash
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"website_url": "https://text.npr.org/"}'
```
# Key benefits
A few key benefits:
- **Deploy any pipeline or model**: Agents, pipelines, RAG, chatbots, image models, video, speech, text, etc...
- **No MLOps glue:** LitAPI lets you build full AI systems (multi-model, agent, RAG) in one place ([more](https://lightning.ai/docs/litserve/api-reference/litapi)).
- **Instant setup:** Connect models, DBs, and data in a few lines with `setup()` ([more](https://lightning.ai/docs/litserve/api-reference/litapi#setup)).
- **Optimized:** autoscaling, GPU support, and fast inference included ([more](https://lightning.ai/docs/litserve/api-reference/litserver)).
- **Deploy anywhere:** self-host or one-click deploy with Lightning ([more](https://lightning.ai/docs/litserve/features/deploy-on-cloud)).
- **FastAPI for AI:** Built on FastAPI but optimized for AI - 2× faster with AI-specific multi-worker handling ([more](#performance)).
- **Expert-friendly:** Use vLLM, or build your own with full control over batching, caching, and logic (see the batching sketch below; [more](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api)).
> ⚠️ Not a vLLM or Ollama alternative out of the box. LitServe gives you lower-level flexibility to build what they do (and more) if you need it.
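To make the batching and custom-logic control above concrete, here is a minimal sketch (hook names follow the LitServe docs; placing `max_batch_size`/`batch_timeout` on `LitAPI` matches the quick-start snippet above, though parameter placement has moved between versions):

```python
import litserve as ls

class BatchedSquare(ls.LitAPI):
    def setup(self, device):
        # stand-in "model" that squares a whole batch at once
        self.model = lambda xs: [x**2 for x in xs]

    def decode_request(self, request):
        # runs once per incoming request
        return request["input"]

    def predict(self, x):
        # with max_batch_size > 1, x is a list of decoded inputs
        return self.model(x)

    def encode_response(self, output):
        # runs once per item after LitServe unbatches the outputs
        return {"output": output}

if __name__ == "__main__":
    # group up to 4 concurrent requests into one predict() call,
    # waiting at most 50 ms to fill a batch
    server = ls.LitServer(BatchedSquare(max_batch_size=4, batch_timeout=0.05))
    server.run(port=8000)
```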
# Featured examples
Here are examples of inference pipelines for common model types and use cases.
<pre>
<strong>Toy model:</strong> <a target="_blank" href="#quick-start">Hello world</a>
<strong>LLMs:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-llama-3-2-vision-with-litserve">Llama 3.2</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/openai-fault-tolerant-proxy-server">LLM Proxy server</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-ai-agent-with-tool-use">Agent with tool use</a>
<strong>RAG:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api">vLLM RAG (Llama 3.2)</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-1-rag-api">RAG API (LlamaIndex)</a>
<strong>NLP:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-any-hugging-face-model-instantly">Hugging Face</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-hugging-face-bert-model">BERT</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-text-embedding-api-with-litserve">Text embedding API</a>
<strong>Multimodal:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-open-ai-clip-with-litserve">OpenAI Clip</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-multi-modal-llm-with-minicpm">MiniCPM</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-phi3-5-vision-api-with-litserve">Phi-3.5 Vision Instruct</a>, <a target="_blank" href="https://lightning.ai/bhimrajyadav/studios/deploy-and-chat-with-qwen2-vl-using-litserve">Qwen2-VL</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-multi-modal-llm-with-pixtral">Pixtral</a>
<strong>Audio:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-open-ai-s-whisper-model">Whisper</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-an-music-generation-api-with-meta-s-audio-craft">AudioCraft</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-an-audio-generation-api">StableAudio</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-noise-cancellation-api-with-deepfilternet">Noise cancellation (DeepFilterNet)</a>
<strong>Vision:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-private-api-for-stable-diffusion-2">Stable diffusion 2</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-an-image-generation-api-with-auraflow">AuraFlow</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-an-image-generation-api-with-flux">Flux</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-super-resolution-image-api-with-aura-sr">Image Super Resolution (Aura SR)</a>,
<a target="_blank" href="https://lightning.ai/bhimrajyadav/studios/deploy-background-removal-api-with-litserve">Background Removal</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-controlled-image-generation-api-controlnet">Control Stable Diffusion (ControlNet)</a>
<strong>Speech:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-voice-clone-api-coqui-xtts-v2-model">Text-speech (XTTS V2)</a>, <a target="_blank" href="https://lightning.ai/bhimrajyadav/studios/deploy-a-speech-generation-api-using-parler-tts-powered-by-litserve">Parler-TTS</a>
<strong>Classical ML:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-random-forest-with-litserve">Random forest</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-xgboost-with-litserve">XGBoost</a>
<strong>Miscellaneous:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-an-media-conversion-api-with-ffmpeg">Media conversion API (ffmpeg)</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-both-pytorch-and-tensorflow-in-a-single-api">PyTorch + TensorFlow in one API</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/openai-fault-tolerant-proxy-server">LLM proxy server</a>
</pre>
[Browse 100+ community-built templates](https://lightning.ai/studios?section=serving)
# Host anywhere
Self-host with full control, or deploy with [Lightning AI](https://lightning.ai/) in seconds with autoscaling, security, and 99.995% uptime.
**Free tier included. No setup required. Run on your cloud.**
```bash
lightning deploy server.py --cloud
```
https://github.com/user-attachments/assets/ff83dab9-0c9f-4453-8dcb-fb9526726344
# Features
<div align='center'>
| [Feature](https://lightning.ai/docs/litserve/features) | Self Managed | [Fully Managed on Lightning](https://lightning.ai/deploy) |
|----------------------------------------------------------------------|-----------------------------------|------------------------------------|
| Docker-first deployment | ✅ DIY | ✅ One-click deploy |
| Cost | ✅ Free (DIY) | ✅ Generous [free tier](https://lightning.ai/pricing) with pay as you go |
| Full control | ✅ | ✅ |
| Use any engine (vLLM, etc.) | ✅ | ✅ vLLM, Ollama, LitServe, etc. |
| Own VPC | ✅ (manual setup) | ✅ Connect your own VPC |
| [(2x)+ faster than plain FastAPI](#performance) | ✅ | ✅ |
| [Bring your own model](https://lightning.ai/docs/litserve/features/full-control) | ✅ | ✅ |
| [Build compound systems (1+ models)](https://lightning.ai/docs/litserve/home) | ✅ | ✅ |
| [GPU autoscaling](https://lightning.ai/docs/litserve/features/gpu-inference) | ✅ | ✅ |
| [Batching](https://lightning.ai/docs/litserve/features/batching) | ✅ | ✅ |
| [Streaming](https://lightning.ai/docs/litserve/features/streaming) (sketch below) | ✅ | ✅ |
| [Worker autoscaling](https://lightning.ai/docs/litserve/features/autoscaling) | ✅ | ✅ |
| [Serve all models (LLMs, vision, etc.)](https://lightning.ai/docs/litserve/examples) | ✅ | ✅ |
| [Supports PyTorch, JAX, TF, etc...](https://lightning.ai/docs/litserve/features/full-control) | ✅ | ✅ |
| [OpenAPI compliant](https://www.openapis.org/) | ✅ | ✅ |
| [OpenAI compatibility](https://lightning.ai/docs/litserve/features/open-ai-spec) | ✅ | ✅ |
| [MCP server support](https://lightning.ai/docs/litserve/features/mcp) | ✅ | ✅ |
| [Asynchronous](https://lightning.ai/docs/litserve/features/async-concurrency) | ✅ | ✅ |
| [Authentication](https://lightning.ai/docs/litserve/features/authentication) | ❌ DIY | ✅ Token, password, custom |
| GPUs | ❌ DIY | ✅ 8+ GPU types, H100s from $1.75 |
| Load balancing | ❌ | ✅ Built-in |
| Scale to zero (serverless) | ❌ | ✅ No machine runs when idle |
| Autoscale up on demand | ❌ | ✅ Auto scale up/down |
| Multi-node inference | ❌ | ✅ Distribute across nodes |
| Use AWS/GCP credits | ❌ | ✅ Use existing cloud commits |
| Versioning | ❌ | ✅ Make and roll back releases |
| Enterprise-grade uptime (99.95%) | ❌ | ✅ SLA-backed |
| SOC2 / HIPAA compliance | ❌ | ✅ Certified & secure |
| Observability | ❌ | ✅ Built-in, connect 3rd party tools|
| CI/CD ready | ❌ | ✅ Lightning SDK |
| 24/7 enterprise support | ❌ | ✅ Dedicated support |
| Cost controls & audit logs | ❌ | ✅ Budgets, breakdowns, logs |
| Debug on GPUs | ❌ | ✅ Studio integration |
| [20+ features](https://lightning.ai/docs/litserve/features) | - | - |
</div>
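To make the Streaming row above concrete, here is a minimal sketch of the streaming pattern from the LitServe docs: `predict` yields chunks and `encode_response` re-yields them (passing `stream=True` to `LitServer` is assumed here and may differ by version):

```python
import litserve as ls

class StreamingEcho(ls.LitAPI):
    def setup(self, device):
        # stand-in for a model that produces output piece by piece
        self.model = lambda i, x: i * x

    def decode_request(self, request):
        return request["input"]

    def predict(self, x):
        # yield partial results instead of returning once
        for i in range(5):
            yield self.model(i, x)

    def encode_response(self, output_stream):
        # encode each chunk as it is produced
        for output in output_stream:
            yield {"output": output}

if __name__ == "__main__":
    server = ls.LitServer(StreamingEcho(), stream=True)
    server.run(port=8000)
```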
# Performance
LitServe is designed for AI workloads. Specialized multi-worker handling delivers a minimum **2x speedup over FastAPI**.
Additional features like batching and GPU autoscaling can drive performance well beyond 2x, scaling efficiently to handle more simultaneous requests than FastAPI and TorchServe.
Reproduce the full benchmarks [here](https://lightning.ai/docs/litserve/home/benchmarks) (higher is better).
<div align="center">
<img alt="LitServe" src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/ls_charts_v6.png" width="1000px" style="max-width: 100%;">
</div>
These results are for image and text classification ML tasks. The performance relationships hold for other ML tasks (embedding, LLM serving, audio, segmentation, object detection, summarization, etc.).
***💡 Note on LLM serving:*** For high-performance LLM serving (like Ollama/vLLM), integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm), or build your custom vLLM-like server with LitServe. Optimizations like kv-caching, which can be done with LitServe, are needed to maximize LLM performance.
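A hedged sketch of the vLLM-inside-LitServe pattern mentioned above, using vLLM's offline `LLM`/`SamplingParams` API (the model id is illustrative and assumes you have access to the weights):

```python
import litserve as ls
from vllm import LLM, SamplingParams  # assumes vLLM is installed

class VLLMServer(ls.LitAPI):
    def setup(self, device):
        # load the model with vLLM's offline engine; vLLM manages kv-caching
        self.llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")
        self.params = SamplingParams(max_tokens=256)

    def predict(self, request):
        outputs = self.llm.generate([request["prompt"]], self.params)
        return {"output": outputs[0].outputs[0].text}

if __name__ == "__main__":
    server = ls.LitServer(VLLMServer())
    server.run(port=8000)
```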
# Community
LitServe is a [community project accepting contributions](https://lightning.ai/docs/litserve/community). Let's make the world's most advanced AI inference engine.
💬 [Get help on Discord](https://discord.com/invite/XncpTy7DSt)
📋 [License: Apache 2.0](https://github.com/Lightning-AI/litserve/blob/main/LICENSE)
Raw data
{
"_id": null,
"home_page": null,
"name": "litserve",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "AI, deep learning, pytorch",
"author": null,
"author_email": "\"Lightning-AI et al.\" <community@lightning.ai>",
"download_url": "https://files.pythonhosted.org/packages/82/c8/2335f64352e7209c170eb28de5379ab4d0659a4d99b882c99bc10009b07c/litserve-0.2.15.tar.gz",
"platform": null,
"description": "<div align='center'>\n\n<h2>\n The easiest way to deploy agents, MCP servers, RAG, pipelines, any model. \n <br/>\n No MLOps. No YAML.\n</h2> \n\n<img alt=\"Lightning\" src=\"https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/ls_banner2.png\" width=\"800px\" style=\"max-width: 100%;\">\n\n \n</div>\n\nMost serving engines serve one model with rigid abstractions. LitServe lets you serve any model (vision, audio, text) and build full AI systems - agents, chatbots, MCP servers, RAG, pipelines - with full control, batching, multi-GPU, streaming, custom logic, multi-model support, and zero YAML. \n\nSelf host or deploy in one-click to [Lightning AI](https://lightning.ai/).\n\n \n\n<div align='center'>\n \n<pre>\n\u2705 Build full AI systems \u2705 2\u00d7 faster than FastAPI \u2705 Agents, RAG, pipelines, more\n\u2705 Custom logic + control \u2705 Any PyTorch model \u2705 Self-host or managed \n\u2705 Multi-GPU autoscaling \u2705 Batching + streaming \u2705 BYO model or vLLM \n\u2705 No MLOps glue code \u2705 Easy setup in Python \u2705 Serverless support \n\n</pre>\n\n<div align='center'>\n\n[](https://pepy.tech/projects/litserve)\n[](https://discord.gg/WajDThKAur)\n\n[](https://codecov.io/gh/Lightning-AI/litserve)\n[](https://github.com/Lightning-AI/litserve/blob/main/LICENSE)\n\n</div>\n</div>\n<div align=\"center\">\n <div style=\"text-align: center;\">\n <a target=\"_blank\" href=\"#quick-start\" style=\"margin: 0 10px;\">Quick start</a> \u2022\n <a target=\"_blank\" href=\"#featured-examples\" style=\"margin: 0 10px;\">Examples</a> \u2022\n <a target=\"_blank\" href=\"#features\" style=\"margin: 0 10px;\">Features</a> \u2022\n <a target=\"_blank\" href=\"#performance\" style=\"margin: 0 10px;\">Performance</a> \u2022\n <a target=\"_blank\" href=\"#host-anywhere\" style=\"margin: 0 10px;\">Hosting</a> \u2022\n <a target=\"_blank\" href=\"https://lightning.ai/docs/litserve\" style=\"margin: 0 10px;\">Docs</a>\n </div>\n</div>\n\n \n\n<div align=\"center\">\n<a target=\"_blank\" href=\"https://lightning.ai/docs/litserve/home/get-started\">\n <img src=\"https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/get-started-badge.svg\" height=\"36px\" alt=\"Get started\"/>\n</a>\n</div>\n\n \n\n# Quick start\n\nInstall LitServe via pip ([more options](https://lightning.ai/docs/litserve/home/install)):\n\n```bash\npip install litserve\n```\n\n[Example 1](#inference-pipeline-example): Toy inference pipeline with multiple models. \n[Example 2](#agent-example): Minimal agent to fetch the news (with OpenAI API). 
\n([Advanced examples](#featured-examples)): \n\n### Inference pipeline example \n\n```python\nimport litserve as ls\n\n# define the api to include any number of models, dbs, etc...\nclass InferencePipeline(ls.LitAPI):\n def setup(self, device):\n self.model1 = lambda x: x**2\n self.model2 = lambda x: x**3\n\n def predict(self, request):\n x = request[\"input\"] \n # perform calculations using both models\n a = self.model1(x)\n b = self.model2(x)\n c = a + b\n return {\"output\": c}\n\nif __name__ == \"__main__\":\n # 12+ features like batching, streaming, etc...\n server = ls.LitServer(InferencePipeline(max_batch_size=1), accelerator=\"auto\")\n server.run(port=8000)\n```\n\nDeploy for free to [Lightning cloud](#hosting-options) (or self host anywhere):\n\n```bash\n# Deploy for free with autoscaling, monitoring, etc...\nlightning deploy server.py --cloud\n\n# Or run locally (self host anywhere)\nlightning deploy server.py\n# python server.py\n```\n\nTest the server: Simulate an http request (run this on any terminal):\n```bash\ncurl -X POST http://127.0.0.1:8000/predict -H \"Content-Type: application/json\" -d '{\"input\": 4.0}'\n```\n\n### Agent example\n\n```python\nimport re, requests, openai\nimport litserve as ls\n\nclass NewsAgent(ls.LitAPI):\n def setup(self, device):\n self.openai_client = openai.OpenAI(api_key=\"OPENAI_API_KEY\")\n\n def predict(self, request):\n website_url = request.get(\"website_url\", \"https://text.npr.org/\")\n website_text = re.sub(r'<[^>]+>', ' ', requests.get(website_url).text)\n\n # ask the LLM to tell you about the news\n llm_response = self.openai_client.chat.completions.create(\n model=\"gpt-3.5-turbo\", \n messages=[{\"role\": \"user\", \"content\": f\"Based on this, what is the latest: {website_text}\"}],\n )\n output = llm_response.choices[0].message.content.strip()\n return {\"output\": output}\n\nif __name__ == \"__main__\":\n server = ls.LitServer(NewsAgent())\n server.run(port=8000)\n```\nTest it:\n```bash\ncurl -X POST http://127.0.0.1:8000/predict -H \"Content-Type: application/json\" -d '{\"website_url\": \"https://text.npr.org/\"}'\n```\n\n \n\n# Key benefits \n\nA few key benefits:\n\n- **Deploy any pipeline or model**: Agents, pipelines, RAG, chatbots, image models, video, speech, text, etc...\n- **No MLOps glue:** LitAPI lets you build full AI systems (multi-model, agent, RAG) in one place ([more](https://lightning.ai/docs/litserve/api-reference/litapi)). \n- **Instant setup:** Connect models, DBs, and data in a few lines with `setup()` ([more](https://lightning.ai/docs/litserve/api-reference/litapi#setup)). \n- **Optimized:** autoscaling, GPU support, and fast inference included ([more](https://lightning.ai/docs/litserve/api-reference/litserver)). \n- **Deploy anywhere:** self-host or one-click deploy with Lightning ([more](https://lightning.ai/docs/litserve/features/deploy-on-cloud)).\n- **FastAPI for AI:** Built on FastAPI but optimized for AI - 2\u00d7 faster with AI-specific multi-worker handling ([more]((#performance))). \n- **Expert-friendly:** Use vLLM, or build your own with full control over batching, caching, and logic ([more](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api)). \n\n> \u26a0\ufe0f Not a vLLM or Ollama alternative out of the box. LitServe gives you lower-level flexibility to build what they do (and more) if you need it.\n\n \n\n# Featured examples \nHere are examples of inference pipelines for common model types and use cases. 
\n \n<pre>\n<strong>Toy model:</strong> <a target=\"_blank\" href=\"#define-a-server\">Hello world</a>\n<strong>LLMs:</strong> <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-llama-3-2-vision-with-litserve\">Llama 3.2</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/openai-fault-tolerant-proxy-server\">LLM Proxy server</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-ai-agent-with-tool-use\">Agent with tool use</a>\n<strong>RAG:</strong> <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api\">vLLM RAG (Llama 3.2)</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-1-rag-api\">RAG API (LlamaIndex)</a>\n<strong>NLP:</strong> <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-any-hugging-face-model-instantly\">Hugging face</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-a-hugging-face-bert-model\">BERT</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-text-embedding-api-with-litserve\">Text embedding API</a>\n<strong>Multimodal:</strong> <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-open-ai-clip-with-litserve\">OpenAI Clip</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-a-multi-modal-llm-with-minicpm\">MiniCPM</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-phi3-5-vision-api-with-litserve\">Phi-3.5 Vision Instruct</a>, <a target=\"_blank\" href=\"https://lightning.ai/bhimrajyadav/studios/deploy-and-chat-with-qwen2-vl-using-litserve\">Qwen2-VL</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-a-multi-modal-llm-with-pixtral\">Pixtral</a>\n<strong>Audio:</strong> <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-open-ai-s-whisper-model\">Whisper</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-an-music-generation-api-with-meta-s-audio-craft\">AudioCraft</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-an-audio-generation-api\">StableAudio</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-a-noise-cancellation-api-with-deepfilternet\">Noise cancellation (DeepFilterNet)</a>\n<strong>Vision:</strong> <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-a-private-api-for-stable-diffusion-2\">Stable diffusion 2</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-an-image-generation-api-with-auraflow\">AuraFlow</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-an-image-generation-api-with-flux\">Flux</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-a-super-resolution-image-api-with-aura-sr\">Image Super Resolution (Aura SR)</a>,\n <a target=\"_blank\" href=\"https://lightning.ai/bhimrajyadav/studios/deploy-background-removal-api-with-litserve\">Background Removal</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-a-controlled-image-generation-api-controlnet\">Control Stable Diffusion (ControlNet)</a>\n<strong>Speech:</strong> <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-a-voice-clone-api-coqui-xtts-v2-model\">Text-speech (XTTS V2)</a>, <a target=\"_blank\" 
href=\"https://lightning.ai/bhimrajyadav/studios/deploy-a-speech-generation-api-using-parler-tts-powered-by-litserve\">Parler-TTS</a>\n<strong>Classical ML:</strong> <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-random-forest-with-litserve\">Random forest</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-xgboost-with-litserve\">XGBoost</a>\n<strong>Miscellaneous:</strong> <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-an-media-conversion-api-with-ffmpeg\">Media conversion API (ffmpeg)</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/deploy-both-pytorch-and-tensorflow-in-a-single-api\">PyTorch + TensorFlow in one API</a>, <a target=\"_blank\" href=\"https://lightning.ai/lightning-ai/studios/openai-fault-tolerant-proxy-server\">LLM proxy server</a>\n</pre>\n</pre>\n\n[Browse 100+ community-built templates](https://lightning.ai/studios?section=serving)\n\n \n\n# Host anywhere\n\nSelf-host with full control, or deploy with [Lightning AI](https://lightning.ai/) in seconds with autoscaling, security, and 99.995% uptime. \n**Free tier included. No setup required. Run on your cloud** \n\n```bash\nlightning deploy server.py --cloud\n```\n\nhttps://github.com/user-attachments/assets/ff83dab9-0c9f-4453-8dcb-fb9526726344\n\n \n\n# Features\n\n<div align='center'>\n\n| [Feature](https://lightning.ai/docs/litserve/features) | Self Managed | [Fully Managed on Lightning](https://lightning.ai/deploy) |\n|----------------------------------------------------------------------|-----------------------------------|------------------------------------|\n| Docker-first deployment | \u2705 DIY | \u2705 One-click deploy |\n| Cost | \u2705 Free (DIY) | \u2705 Generous [free tier](https://lightning.ai/pricing) with pay as you go |\n| Full control | \u2705 | \u2705 |\n| Use any engine (vLLM, etc.) | \u2705 | \u2705 vLLM, Ollama, LitServe, etc. 
|\n| Own VPC | \u2705 (manual setup) | \u2705 Connect your own VPC |\n| [(2x)+ faster than plain FastAPI](#performance) | \u2705 | \u2705 |\n| [Bring your own model](https://lightning.ai/docs/litserve/features/full-control) | \u2705 | \u2705 |\n| [Build compound systems (1+ models)](https://lightning.ai/docs/litserve/home) | \u2705 | \u2705 |\n| [GPU autoscaling](https://lightning.ai/docs/litserve/features/gpu-inference) | \u2705 | \u2705 |\n| [Batching](https://lightning.ai/docs/litserve/features/batching) | \u2705 | \u2705 |\n| [Streaming](https://lightning.ai/docs/litserve/features/streaming) | \u2705 | \u2705 |\n| [Worker autoscaling](https://lightning.ai/docs/litserve/features/autoscaling) | \u2705 | \u2705 |\n| [Serve all models: (LLMs, vision, etc.)](https://lightning.ai/docs/litserve/examples) | \u2705 | \u2705 |\n| [Supports PyTorch, JAX, TF, etc...](https://lightning.ai/docs/litserve/features/full-control) | \u2705 | \u2705 |\n| [OpenAPI compliant](https://www.openapis.org/) | \u2705 | \u2705 |\n| [Open AI compatibility](https://lightning.ai/docs/litserve/features/open-ai-spec) | \u2705 | \u2705 |\n| [MCP server support](https://lightning.ai/docs/litserve/features/mcp) | \u2705 | \u2705 |\n| [Asynchronous](https://lightning.ai/docs/litserve/features/async-concurrency) | \u2705 | \u2705 |\n| [Authentication](https://lightning.ai/docs/litserve/features/authentication) | \u274c DIY | \u2705 Token, password, custom |\n| GPUs | \u274c DIY | \u2705 8+ GPU types, H100s from $1.75 |\n| Load balancing | \u274c | \u2705 Built-in |\n| Scale to zero (serverless) | \u274c | \u2705 No machine runs when idle |\n| Autoscale up on demand | \u274c | \u2705 Auto scale up/down |\n| Multi-node inference | \u274c | \u2705 Distribute across nodes |\n| Use AWS/GCP credits | \u274c | \u2705 Use existing cloud commits |\n| Versioning | \u274c | \u2705 Make and roll back releases |\n| Enterprise-grade uptime (99.95%) | \u274c | \u2705 SLA-backed |\n| SOC2 / HIPAA compliance | \u274c | \u2705 Certified & secure |\n| Observability | \u274c | \u2705 Built-in, connect 3rd party tools|\n| CI/CD ready | \u274c | \u2705 Lightning SDK |\n| 24/7 enterprise support | \u274c | \u2705 Dedicated support |\n| Cost controls & audit logs | \u274c | \u2705 Budgets, breakdowns, logs |\n| Debug on GPUs | \u274c | \u2705 Studio integration |\n| [20+ features](https://lightning.ai/docs/litserve/features) | - | - |\n\n</div>\n\n \n\n# Performance \nLitServe is designed for AI workloads. Specialized multi-worker handling delivers a minimum **2x speedup over FastAPI**. \n\nAdditional features like batching and GPU autoscaling can drive performance well beyond 2x, scaling efficiently to handle more simultaneous requests than FastAPI and TorchServe.\n \nReproduce the full benchmarks [here](https://lightning.ai/docs/litserve/home/benchmarks) (higher is better). \n\n<div align=\"center\">\n <img alt=\"LitServe\" src=\"https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/ls_charts_v6.png\" width=\"1000px\" style=\"max-width: 100%;\">\n</div> \n\nThese results are for image and text classification ML tasks. The performance relationships hold for other ML tasks (embedding, LLM serving, audio, segmentation, object detection, summarization etc...). 
\n \n***\ud83d\udca1 Note on LLM serving:*** For high-performance LLM serving (like Ollama/vLLM), integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm), or build your custom vLLM-like server with LitServe. Optimizations like kv-caching, which can be done with LitServe, are needed to maximize LLM performance.\n\n \n\n\n# Community\nLitServe is a [community project accepting contributions](https://lightning.ai/docs/litserve/community) - Let's make the world's most advanced AI inference engine.\n\n\ud83d\udcac [Get help on Discord](https://discord.com/invite/XncpTy7DSt) \n\ud83d\udccb [License: Apache 2.0](https://github.com/Lightning-AI/litserve/blob/main/LICENSE) \n",
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Lightweight AI server.",
"version": "0.2.15",
"project_urls": {
"Bug Tracker": "https://github.com/Lightning-AI/litserve/issues",
"Documentation": "https://lightning-ai.github.io/litserve/",
"Download": "https://github.com/Lightning-AI/litserve",
"Homepage": "https://github.com/Lightning-AI/litserve",
"Source Code": "https://github.com/Lightning-AI/litserve"
},
"split_keywords": [
"ai",
" deep learning",
" pytorch"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "f4fdc37d1eb94e124b9d6f51ee2791317d481dcf67aaa100b4854a5ac96daa10",
"md5": "6719ccccc3d3bc54bb5528ca25ebaf93",
"sha256": "67e5357382ddd123ac667a90a70925cd88d8d29814854634d578d5b27f0f29d2"
},
"downloads": -1,
"filename": "litserve-0.2.15-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6719ccccc3d3bc54bb5528ca25ebaf93",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 91166,
"upload_time": "2025-07-31T11:46:57",
"upload_time_iso_8601": "2025-07-31T11:46:57.102906Z",
"url": "https://files.pythonhosted.org/packages/f4/fd/c37d1eb94e124b9d6f51ee2791317d481dcf67aaa100b4854a5ac96daa10/litserve-0.2.15-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "82c82335f64352e7209c170eb28de5379ab4d0659a4d99b882c99bc10009b07c",
"md5": "ec9e379b6dd9acb26b0807997c2bf9d2",
"sha256": "d8baebe1fed9a5a890098748796a7c4c0f0f2aa4508f8c2718c13fab83b761cb"
},
"downloads": -1,
"filename": "litserve-0.2.15.tar.gz",
"has_sig": false,
"md5_digest": "ec9e379b6dd9acb26b0807997c2bf9d2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 208230,
"upload_time": "2025-07-31T11:46:58",
"upload_time_iso_8601": "2025-07-31T11:46:58.193776Z",
"url": "https://files.pythonhosted.org/packages/82/c8/2335f64352e7209c170eb28de5379ab4d0659a4d99b882c99bc10009b07c/litserve-0.2.15.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-31 11:46:58",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Lightning-AI",
"github_project": "litserve",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "litserve"
}