# Welcome to Paka
<img src="https://raw.githubusercontent.com/jjleng/paka/main/docs/img/paka.svg" alt="Paka logo" width="100" height="100">
Get your LLM applications to the cloud with ease. Paka handles failure recovery, autoscaling, and monitoring, freeing you to concentrate on crafting your applications.
## 🚀 Bring LLM models to the cloud in minutes
💰 Cut costs by roughly 50% with spot instances, backed by on-demand instances for reliable service quality.
| Model | Parameters | Quantization | GPU | On-Demand | Spot | AWS Node (us-west-2) |
| ---------- | ---------- | ------------ | ------------ | --------- | ------- | ---------------------|
| Llama 3 | 70B | BF16 | A10G x 8 | $16.2880 | $4.8169 | g5.48xlarge |
| Llama 3 | 70B | GPTQ 4bit | T4 x 4 | $3.9120 | $1.6790 | g4dn.12xlarge |
| Llama 3 | 8B | BF16 | L4 x 1 | $0.8048 | $0.1100 | g6.xlarge |
| Llama 2 | 7B | GPTQ 4bit | T4 x 1 | $0.526 | $0.2584 | g4dn.xlarge |
| Mistral | 7B | BF16 | T4 x 1 | $0.526 | $0.2584 | g4dn.xlarge |
| Phi3 Mini | 3.8B | BF16 | T4 x 1 | $0.526 | $0.2584 | g4dn.xlarge |
> Note: Prices are for the us-west-2 region, in USD per hour. Spot prices change frequently.
> See [Launch Templates](https://github.com/jjleng/paka/tree/main/examples/templates) for more details.
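The ~50% saving quoted above can be checked directly against the table. For instance, for a g4dn.xlarge node:

```python
# Prices from the table above: g4dn.xlarge, us-west-2, USD per hour.
on_demand = 0.526
spot = 0.2584

# Fractional saving from running on spot instead of on-demand.
savings = (on_demand - spot) / on_demand
print(f"Spot saves about {savings:.0%} versus on-demand")  # about 51%
```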
## 🏃 Effortlessly Launch RAG Applications
You only need to take care of the application code. Build your RAG application with your favorite languages (Python, TypeScript) and frameworks (LangChain, LlamaIndex), and let Paka handle the rest.
### Support for Vector Store
- A fast vector store (Qdrant) for storing embeddings.
- Tunable for performance and cost.
### Serverless Deployment
- Deploy your application as a serverless container.
- Autoscaling and monitoring built-in.
## 📈 Monitoring
Paka comes with built-in support for monitoring and tracing. Metrics are collected via Prometheus. Users can also enable Prometheus Alertmanager for alerting.
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/jjleng/paka/main/docs/img/tokens_per_sec.png" max-width="1000"/>
</div>
## ⚙️ Architecture
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/jjleng/paka/main/docs/img/architecture.png" max-width="1000"/>
</div>
## 📜 Roadmap
- [x] (Multi-cloud) AWS support
- [x] (Backend) vLLM
- [x] (Backend) llama.cpp
- [x] (Platform) Windows support
- [x] (Accelerator) Nvidia GPU support
- [ ] (Multi-cloud) GCP support
- [ ] (Backend) TGI
- [ ] (Accelerator) AMD GPU support
- [ ] (Accelerator) Inferentia support
## 🎬 Getting Started
### Dependencies
- Docker daemon and CLI
- AWS CLI
```bash
# Ensure your AWS credentials are correctly configured.
aws configure
```
### Install Paka
```bash
pip install paka
```
### Provisioning the cluster
Create a `cluster.yaml` file with the following content:
```yaml
version: "1.2"
aws:
  cluster:
    name: my-awesome-cluster
    region: us-west-2
    namespace: default
    nodeType: t3a.medium
    minNodes: 2
    maxNodes: 4
  prometheus:
    enabled: true
  modelGroups:
    - name: llama2-7b-chat
      nodeType: g4dn.xlarge
      isPublic: true
      minInstances: 1
      maxInstances: 1
      runtime:
        image: vllm/vllm-openai:v0.4.2
      model:
        hfRepoId: TheBloke/Llama-2-7B-Chat-GPTQ
        useModelStore: false
      gpu:
        enabled: true
        diskSize: 50
```
Bring up the cluster with the following command:
```bash
paka cluster up -f cluster.yaml
```
### Code up the application
Use your favorite language and framework to build the application. Here is an example of a Python application using Langchain:
[invoice_extraction](https://github.com/jjleng/paka/tree/main/examples/invoice_extraction)
With Paka, you can effortlessly build your source code and deploy it as a serverless function, no Dockerfile needed. Just ensure the following:
- **Procfile**: Defines the entrypoint for your application. See [Procfile](https://github.com/jjleng/paka/blob/main/examples/invoice_extraction/Procfile).
- **.cnignore file**: Excludes any files that shouldn't be included in the build. See [.cnignore](https://github.com/jjleng/paka/blob/main/examples/invoice_extraction/.cnignore).
- **runtime.txt**: Pins the version of the runtime your application uses. See [runtime.txt](https://github.com/jjleng/paka/blob/main/examples/invoice_extraction/runtime.txt).
- **requirements.txt or package.json**: Lists all necessary packages for your application.
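As a sketch, a minimal `Procfile` for a Python app whose HTTP server lives in `main.py` might look like this (the `serve` process name here is hypothetical and must match the `--entrypoint` flag used at deploy time):

```
serve: python main.py
```

and a `runtime.txt` pinning the Python version (the exact version string format depends on the buildpack in use):

```
python-3.11
```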
### Deploy the App
```bash
paka function deploy --name invoice-extraction --source . --entrypoint serve
```
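Once the model group from the cluster config is up, the vLLM runtime serves an OpenAI-compatible chat API. A minimal sketch of the request payload is shown below; the endpoint URL is hypothetical and depends on your cluster's ingress:

```python
import json

# Hypothetical endpoint; the actual URL depends on your cluster's domain.
url = "http://llama2-7b-chat.<your-cluster-domain>/v1/chat/completions"

# vLLM's OpenAI-compatible server accepts standard chat-completion payloads.
payload = {
    "model": "llama2-7b-chat",
    "messages": [{"role": "user", "content": "Summarize this invoice."}],
    "max_tokens": 128,
}
body = json.dumps(payload)

# Send with the HTTP client of your choice, e.g.:
#   requests.post(url, data=body, headers={"Content-Type": "application/json"})
```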
## 📖 Documentation
- [Quick Start](./docs/quick_start.md)
- [FAQ](./docs/faq.md)
- [cluster_config.yaml](./docs/cluster_config.md)
## Contributing
- Make your code changes
- Run `make check-all`
- Open a PR