<!---
Copyright (c) 2022-present, FriendliAI Inc. All rights reserved.
-->
<p align="center">
<img src="https://docs.periflow.ai/img/logo.svg" width="80%" alt="system">
</p>
<h2><p align="center">Supercharge Generative AI Serving 🚀</p></h2>
<p align="center">
<a href="https://github.com/friendliai/periflow-client/actions/workflows/ci.yaml">
<img alt="CI Status" src="https://github.com/friendliai/periflow-client/actions/workflows/ci.yaml/badge.svg">
</a>
<a href="https://pypi.org/project/periflow-client/">
<img alt="Python Version" src="https://img.shields.io/pypi/pyversions/periflow-client?logo=Python&logoColor=white">
</a>
<a href="https://pypi.org/project/periflow-client/">
<img alt="PyPi Package Version" src="https://img.shields.io/pypi/v/periflow-client?logo=PyPI&logoColor=white">
</a>
<a href="https://docs.periflow.ai/">
<img alt="Documentation" src="https://img.shields.io/badge/read-doc-blue?logo=ReadMe&logoColor=white">
</a>
<a href="https://github.com/friendliai/periflow-client/blob/main/LICENSE">
<img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-green.svg?logo=Apache">
</a>
</p>
PeriFlow is the fastest engine for serving generative AI models such as GPT-3. With PeriFlow, a company can significantly reduce the cost and environmental impact of running its generative AI models. Users can run PeriFlow in a container on infrastructure they manage, or use the PeriFlow cloud service to reduce the overhead of running generative AI models themselves.
As these models enable smarter, more productive services, many companies are competitively investing to build sophisticated generative AI models, with parameter sizes ranging from a few billion to even a few trillion. OpenAI, for example, is well-known for its large language models such as ChatGPT and GPT-4. There are many others as well, including LLaMA, OPT, BLOOM, Gopher, PaLM, BlenderBot3, Codex, and CodeGen, to name a few. Companies are increasingly building their own generative AI models or adapting existing models to their needs.
Utilizing such large generative AI models, however, is no simple task. Even after users finish training their models, they still face big challenges in handling inference. Serving costs are likely to be high because the models require GPUs, and the costs increase steeply as traffic grows. Serving at this scale can also have environmental consequences. Lastly, operating the serving infrastructure can be burdensome for users who want to focus solely on training their models.
This is where PeriFlow, an engine for generative AI, comes to the rescue. Not only does PeriFlow provide a huge speedup in serving generative AI models, but it also lets users run the engine on a variety of cloud or on-premise GPU resources. In particular, the PeriFlow cloud service automatically scales with traffic and handles any faults or performance problems. Just imagine the time and cost that could be saved by leaving all the tasks of serving a generative AI model in the hands of PeriFlow.
# ☁️ PeriFlow Cloud
## High performance
Users can use PeriFlow to reduce serving costs and environmental consequences significantly. They can serve much higher traffic with the same number of GPUs, or serve the same amount of traffic with notably fewer GPUs. PeriFlow can serve 10x more throughput at the same level of latency. This gain comes from PeriFlow's innovative patented technology, which speeds up the execution of generative AI models.
## Diverse model and options support
PeriFlow supports various language model architectures, embedding choices, and decoding options such as greedy decoding, top-k, top-p, and beam search. PeriFlow will support diffusion models as well in the near future, so stay tuned!
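To make the decoding options above concrete, here is a minimal, self-contained sketch of greedy decoding and top-k / top-p (nucleus) sampling over a single step's logits. It illustrates the general techniques only and is not PeriFlow's implementation; the function names (`softmax`, `filter_top_k_top_p`, `sample`, `greedy`) are hypothetical and not part of the PeriFlow API.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    """Keep the top_k most likely tokens, then the smallest prefix whose
    cumulative probability reaches top_p. Returns {token_id: renormalized prob}."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    kept, cumulative = [], 0.0
    for token_id, p in ranked:
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token_id: p / total for token_id, p in kept}


def sample(logits, top_k=0, top_p=1.0, temperature=1.0, rng=random):
    """Draw one token id from the filtered, renormalized distribution."""
    candidates = filter_top_k_top_p(softmax(logits, temperature), top_k, top_p)
    ids = list(candidates)
    return rng.choices(ids, weights=[candidates[i] for i in ids], k=1)[0]


def greedy(logits):
    """Greedy decoding: always pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])
```

With `top_k=1`, `sample` always returns the same token as `greedy`. Lower `top_p` or `temperature` values narrow the candidate set and make the output more deterministic.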
Users can use PeriFlow in a container and run it by themselves, or they can use our cloud service. The cloud service supports the following features.
## Effortless deployment
PeriFlow cloud provides an easy serving experience with a Command Line Interface (CLI) and a web interface. With just a few clicks, users can deploy their models to the infrastructure that they desire. Users can move their serving between different clouds such as Azure, AWS, and GCP, and still have the same seamless experience.
## Automatic load and fault management
PeriFlow cloud monitors the resources in use, along with the requests sent to and the responses returned from the currently deployed model, giving users a more stable model serving experience. When the number of requests sent to the deployed model increases, PeriFlow cloud automatically assigns more resources (GPU VMs) to the model, and it reduces resource usage when requests drop off. Furthermore, if a resource malfunctions, PeriFlow cloud recovers it based on the monitoring results.
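The scale-up, scale-down, and recovery behavior described above can be pictured with a small sketch. This is only an illustration of the general idea, not PeriFlow cloud's actual policy: the functions, the per-replica capacity figure, and the replica limits are all hypothetical.

```python
import math


def desired_replicas(requests_per_sec, capacity_per_replica,
                     min_replicas=1, max_replicas=8):
    """Replicas needed for the observed load, clamped to the allowed range.
    All parameters are illustrative assumptions, not PeriFlow settings."""
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_replica)
    return max(min_replicas, min(max_replicas, needed))


def reconcile(replica_health, requests_per_sec, capacity_per_replica):
    """One monitoring tick: return (unhealthy replicas to recover, replica delta).

    replica_health maps a replica name to True (healthy) or False.
    A positive delta means scale up; a negative delta means scale down.
    """
    unhealthy = [name for name, healthy in replica_health.items() if not healthy]
    healthy_count = len(replica_health) - len(unhealthy)
    target = desired_replicas(requests_per_sec, capacity_per_replica)
    return unhealthy, target - healthy_count
```

For example, with two replicas of which one is unhealthy, a load of 45 requests per second, and an assumed capacity of 10 requests per second per replica, `reconcile` reports one replica to recover and four more to add.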
# 🕹️ PeriFlow Client
Check out [PeriFlow Client Docs](https://docs.periflow.ai/) to learn more.
## Installation
```sh
pip install periflow-client
```
If you have a Hugging Face checkpoint and want to convert it to a PeriFlow-compatible format, you need to install the package with the necessary machine learning library (`mllib`) dependencies. In this case, install the package with the following command:
```sh
pip install "periflow-client[mllib]"
```
## Examples
This example shows how to create a deployment and send a completion API request to it with the Python SDK.
```python
import periflow as pf

# Set up PeriFlow context.
pf.init(
    api_key="YOUR_PERIFLOW_API_KEY",
    project_name="my-project",
)

# Create a deployment in the GCP asia-northeast3 region with one A100 GPU.
deployment = pf.Deployment.create(
    checkpoint_id="YOUR_CHECKPOINT_ID",
    name="my-deployment",
    cloud="gcp",
    region="asia-northeast3",
    gpu_type="a100",
    num_gpus=1,
)
```
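A script that creates a deployment may need to wait until it reports the "Healthy" status before sending requests. The sketch below is a generic polling helper, not part of the PeriFlow SDK: `wait_until_healthy` and its parameters are hypothetical, and `get_status` stands for whatever callable you use to look up the deployment's current status string.

```python
import time


def wait_until_healthy(get_status, timeout_sec=600.0, poll_interval_sec=10.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "Healthy" or the timeout expires.

    get_status is a caller-supplied function (an assumption of this sketch);
    how the status is fetched from PeriFlow is not shown here.
    """
    deadline = clock() + timeout_sec
    while True:
        status = get_status()
        if status == "Healthy":
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"deployment still {status!r} after {timeout_sec} seconds"
            )
        sleep(poll_interval_sec)
```

The `sleep` and `clock` parameters exist only so the helper can be exercised in tests without real waiting.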
When the deployment reaches the "Healthy" status and is ready to process inference requests, you can generate a completion with:
```python
# Generate a completion by sending an inference request to the deployment created above.
api = pf.Completion(deployment_id=deployment.deployment_id)
completion = api.create(
    options=pf.V1CompletionOptions(
        prompt="Python is a popular language for",
        max_tokens=100,
        top_p=0.8,
        temperature=0.5,
        no_repeat_ngram=3,
    )
)
print(completion.choices[0].text)
"""
>>> Example Output:
web development. It is also used for a variety of other applications.
Python can be used to create desktop applications, web applications and mobile applications as well.
Python is one of the most popular languages for data science.
Data scientists use Python to analyze data.
The Python ecosystem is very diverse.
There are many libraries that can help you with your Python projects.
You can also find many Python tutorials online.
"""
```
You can also do the same with the CLI.
```sh
# Switch CLI context to target project
pf project switch my-project
# Create a deployment
pf deployment create \
  --checkpoint-id $YOUR_CHECKPOINT_ID \
  --name my-deployment \
  --cloud gcp \
  --region asia-northeast3 \
  --gpu-type a100 \
  --num-gpus 1 \
  --config-file config.yaml
```
When the deployment is ready, you can send a request with `curl`.
```sh
# Send an inference request to the deployment.
curl -X POST https://gcp-asia-northeast3.periflow.ai/$DEPLOYMENT_ID/v1/completions \
  -d '{"prompt": "Python is a popular language for", "max_tokens": 100, "top_p": 0.8, "temperature": 0.5, "no_repeat_ngram": 3}'
```
The response looks like this:
```txt
{
  "choices": [
    {
      "index": 0,
      "seed": 18337142367832222086,
      "text": " web development. It is also used for a variety of other applications.\nPython can be used to create desktop applications, web applications and mobile applications as well.\nPython is one of the most popular languages for data science.\nData scientists use Python to analyze data.\nThe Python ecosystem is very diverse.\nThere are many libraries that can help you with your Python projects.\nYou can also find many Python tutorials online.",
      "tokens": [3644,8300,290,3992,2478,13,198,37906,318,6768,973,284,...]
    }
  ]
}
```
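The same HTTP request can be built from Python with only the standard library. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the URL layout and response fields are taken from the `curl` example and sample response above, and the `Content-Type: application/json` header is an assumption that the `curl` example does not set. Any authentication the endpoint requires is not shown.

```python
import json
import urllib.request


def build_completion_request(base_url, deployment_id, prompt, **options):
    """Build (but do not send) a POST request to the completions endpoint.

    Hypothetical helper; extra keyword arguments such as max_tokens or top_p
    are passed through as JSON fields, mirroring the curl example.
    """
    url = f"{base_url.rstrip('/')}/{deployment_id}/v1/completions"
    body = json.dumps({"prompt": prompt, **options}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, see lead-in
        method="POST",
    )


def first_completion_text(response_body):
    """Extract choices[0].text from a JSON response body (str or bytes)."""
    payload = json.loads(response_body)
    return payload["choices"][0]["text"]
```

To send the request, pass it to `urllib.request.urlopen` and hand the bytes returned by `read()` to `first_completion_text`.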
Raw data
{
"_id": null,
"home_page": "https://friendli.ai/periflow",
"name": "periflow-client",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.8",
"maintainer_email": null,
"keywords": "generative-ai, serving, llm",
"author": "PeriFlow teams",
"author_email": "eng@friendli.ai",
"download_url": "https://files.pythonhosted.org/packages/aa/6d/5944531654bd4bf0847a7f689b2be3330b024ce6d639a855aaa4c75f3bd8/periflow_client-0.2.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Client of PeriFlow, the fastest generative AI serving available.",
"version": "0.2.2",
"project_urls": {
"Documentation": "https://docs.periflow.ai/",
"Homepage": "https://friendli.ai/periflow",
"Repository": "https://github.com/friendliai/periflow-client"
},
"split_keywords": [
"generative-ai",
" serving",
" llm"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "415192102a3f15ec5c2f1ddf8dd230001bbbb294baec28e89e168d8048220f46",
"md5": "6fb487986dab3fcea990ed45666cb7b9",
"sha256": "f3953be5f24f8484ab687dba2d27386d7dbbae67fd451fd408ff951cf23b99b5"
},
"downloads": -1,
"filename": "periflow_client-0.2.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6fb487986dab3fcea990ed45666cb7b9",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.8",
"size": 209796,
"upload_time": "2024-04-06T16:05:44",
"upload_time_iso_8601": "2024-04-06T16:05:44.285661Z",
"url": "https://files.pythonhosted.org/packages/41/51/92102a3f15ec5c2f1ddf8dd230001bbbb294baec28e89e168d8048220f46/periflow_client-0.2.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "aa6d5944531654bd4bf0847a7f689b2be3330b024ce6d639a855aaa4c75f3bd8",
"md5": "2cb3da49b4c85c1ae0052d029bb546d2",
"sha256": "6cf09d2933e9a3761a8916d1fa05e85b0867c77253b56ee3aa4ad64e59dbd6ed"
},
"downloads": -1,
"filename": "periflow_client-0.2.2.tar.gz",
"has_sig": false,
"md5_digest": "2cb3da49b4c85c1ae0052d029bb546d2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.8",
"size": 133543,
"upload_time": "2024-04-06T16:05:46",
"upload_time_iso_8601": "2024-04-06T16:05:46.805427Z",
"url": "https://files.pythonhosted.org/packages/aa/6d/5944531654bd4bf0847a7f689b2be3330b024ce6d639a855aaa4c75f3bd8/periflow_client-0.2.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-06 16:05:46",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "friendliai",
"github_project": "periflow-client",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "periflow-client"
}