sagify

Name	sagify
Version	0.25.4
home_page	https://github.com/Kenza-AI/sagify
Summary	Machine Learning Training, Tuning and Deployment on AWS
upload_time	2024-03-10 12:33:03
maintainer
docs_url	None
author	Pavlos Mitsoulis Ntompos, Ioakeim (Joakim) Lazakis, Dionysis Varelas
requires_python	~=3.7
license	MIT
keywords	sagify llm multi-modal-model foundation-model llm-production machine-learning machine-learning-deploy machine-learning-production deep-learning-production deep-learning cli aws sagemaker tensorflow mxnet scikit-learn artificial-intelligence keras cntk deploy

![Sagify](docs/sagify@2x.png)

<p align="center">
    <em>LLMs and Machine Learning done easily.</em>
</p>
<p align="center">
<a href="https://github.com/kenza-ai/sagify/actions?query=workflow%3ACI" target="_blank">
    <img src="https://github.com/kenza-ai/sagify/workflows/CI/badge.svg" alt="Test">
</a>
</p>

# sagify

Sagify provides a simplified interface to manage machine learning workflows on [AWS SageMaker](https://aws.amazon.com/sagemaker/), helping you focus on building ML models rather than infrastructure. Its modular architecture includes an LLM Gateway module to provide a unified interface for leveraging both open source and proprietary large language models. The LLM Gateway gives access to various LLMs through a simple API, letting you easily incorporate them into your workflows.

For a detailed reference, see the Sagify documentation: [Read the Docs](https://Kenza-AI.github.io/sagify/)

## Installation

### Prerequisites

sagify requires the following:

1. Python (3.7, 3.8, 3.9, 3.10, 3.11)
2. [Docker](https://www.docker.com/) installed and running
3. Configured [awscli](https://pypi.python.org/pypi/awscli)

### Install sagify

At the command line:

    pip install sagify


## Getting started - LLM Deployment with no code

1. Configure your AWS account by following the instructions in the [Configure AWS Account](#configure-aws-account) section.

2. Run the following command:

```sh
sagify cloud foundation-model-deploy --model-id model-txt2img-stabilityai-stable-diffusion-v2-1-base --model-version 1.* -n 1 -e ml.p3.2xlarge --aws-region us-east-1 --aws-profile sagemaker-dev
```
        
You can replace the values for the EC2 instance type (`-e`), the AWS region, and the AWS profile with your preferred ones.

Once the Stable Diffusion model is deployed, you can use the generated code snippet to query it. Enjoy!

## Backend Platforms

### OpenAI

The following models are offered for chat completions:

| Model Name | URL |
|:------------:|:-----:|
|gpt-4|https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo|
|gpt-4-32k|https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo|
|gpt-3.5-turbo|https://platform.openai.com/docs/models/gpt-3-5-turbo|

For image creation you can rely on the following models:

| Model Name | URL |
|:------------:|:-----:|
|dall-e-3|https://platform.openai.com/docs/models/dall-e|
|dall-e-2|https://platform.openai.com/docs/models/dall-e|

And for embeddings:

| Model Name | URL |
|:------------:|:-----:|
|text-embedding-3-large|https://platform.openai.com/docs/models/embeddings|
|text-embedding-3-small|https://platform.openai.com/docs/models/embeddings|
|text-embedding-ada-002|https://platform.openai.com/docs/models/embeddings|

To list every supported OpenAI model, run `sagify llm models --all --provider openai`. To list only chat completions models, run `sagify llm models --chat-completions --provider openai`. For image creation and embeddings models, run `sagify llm models --image-creations --provider openai` and `sagify llm models --embeddings --provider openai`, respectively.

### Open-Source

The following open-source models are offered for chat completions:

| Model Name | URL |
|:------------:|:-----:|
|llama-2-7b|https://huggingface.co/meta-llama/Llama-2-7b|
|llama-2-13b|https://huggingface.co/meta-llama/Llama-2-13b|
|llama-2-70b|https://huggingface.co/meta-llama/Llama-2-70b|

For image creation you can rely on the following open-source models:

| Model Name | URL |
|:------------:|:-----:|
|stabilityai-stable-diffusion-v2|https://huggingface.co/stabilityai/stable-diffusion-2|
|stabilityai-stable-diffusion-v2-1-base|https://huggingface.co/stabilityai/stable-diffusion-2-1-base|
|stabilityai-stable-diffusion-v2-fp16|https://huggingface.co/stabilityai/stable-diffusion-2/tree/fp16|

And for embeddings:

| Model Name | URL |
|:------------:|:-----:|
|bge-large-en|https://huggingface.co/BAAI/bge-large-en|
|bge-base-en|https://huggingface.co/BAAI/bge-base-en|
|gte-large|https://huggingface.co/thenlper/gte-large|
|gte-base|https://huggingface.co/thenlper/gte-base|
|e5-large-v2|https://huggingface.co/intfloat/e5-large-v2|
|bge-small-en|https://huggingface.co/BAAI/bge-small-en|
|e5-base-v2|https://huggingface.co/intfloat/e5-base-v2|
|multilingual-e5-large|https://huggingface.co/intfloat/multilingual-e5-large|
|e5-large|https://huggingface.co/intfloat/e5-large|
|gte-small|https://huggingface.co/thenlper/gte-small|
|e5-base|https://huggingface.co/intfloat/e5-base|
|e5-small-v2|https://huggingface.co/intfloat/e5-small-v2|
|multilingual-e5-base|https://huggingface.co/intfloat/multilingual-e5-base|
|all-MiniLM-L6-v2|https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2|

All of these open-source models are supported on AWS SageMaker. To list them, run `sagify llm models --all --provider sagemaker`. To list only chat completions models, run `sagify llm models --chat-completions --provider sagemaker`. For image creation and embeddings models, run `sagify llm models --image-creations --provider sagemaker` and `sagify llm models --embeddings --provider sagemaker`, respectively.

## Set up OpenAI

You need to define the following env variables before you start the LLM Gateway server:

- `OPENAI_API_KEY`: Your OpenAI API key. Example: `export OPENAI_API_KEY=...`.
- `OPENAI_CHAT_COMPLETIONS_MODEL`: Must be one of the values listed [here](https://platform.openai.com/docs/models/gpt-3-5-turbo) or [here](https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo).
- `OPENAI_EMBEDDINGS_MODEL`: Must be one of the values listed [here](https://platform.openai.com/docs/models/embeddings).
- `OPENAI_IMAGE_CREATION_MODEL`: Must be one of the values listed [here](https://platform.openai.com/docs/models/dall-e).

## Set up open-source LLMs

The first step is to deploy the LLM model(s). You can deploy all backend services (chat completions, image creations, embeddings) or only some of them.

If you want to deploy all of them, then run `sagify llm start --all`. This command will deploy all backend services (chat completions, image creations, embeddings) with the following configuration:

```json
{
    "chat_completions": {
        "model": "llama-2-7b",
        "instance_type": "ml.g5.2xlarge",
        "num_instances": 1
    },
    "image_creations": {
        "model": "stabilityai-stable-diffusion-v2-1-base",
        "instance_type": "ml.p3.2xlarge",
        "num_instances": 1
    },
    "embeddings": {
        "model": "gte-small",
        "instance_type": "ml.g5.2xlarge",
        "num_instances": 1
    }
}
```

You can change this configuration by supplying your own config file: `sagify llm start --all --config YOUR_CONFIG_FILE.json`.
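As a minimal sketch of producing such a config file, the Python snippet below starts from the default values shown above, applies per-service overrides, and writes strict JSON (no trailing commas). The helper name `write_config` and the output file name are hypothetical and not part of sagify; the snippet only assumes that the config file uses the same keys as the default configuration.

```python
import json
from pathlib import Path

# Defaults copied from the configuration printed in this README.
DEFAULT_CONFIG = {
    "chat_completions": {
        "model": "llama-2-7b",
        "instance_type": "ml.g5.2xlarge",
        "num_instances": 1,
    },
    "image_creations": {
        "model": "stabilityai-stable-diffusion-v2-1-base",
        "instance_type": "ml.p3.2xlarge",
        "num_instances": 1,
    },
    "embeddings": {
        "model": "gte-small",
        "instance_type": "ml.g5.2xlarge",
        "num_instances": 1,
    },
}


def write_config(path, overrides):
    """Merge per-service overrides into the defaults and write them as JSON.

    `write_config` is a hypothetical helper, not a sagify function.
    """
    # Copy each service's settings so the defaults are never mutated.
    config = {service: dict(settings) for service, settings in DEFAULT_CONFIG.items()}
    for service, settings in overrides.items():
        if service not in config:
            raise KeyError(f"unknown backend service: {service}")
        config[service].update(settings)
    Path(path).write_text(json.dumps(config, indent=4) + "\n")
    return config
```

For example, `write_config("my_llm_config.json", {"embeddings": {"model": "bge-small-en"}})` would swap the embeddings model for another one from the supported list while keeping the other defaults; the resulting file can then be passed via `--config`.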

It takes 15 to 30 minutes to deploy all the backend services as Sagemaker endpoints.

The deployed model names, which are the Sagemaker endpoint names, are printed out and stored in the hidden file `.sagify_llm_infra.json`. You can also access them from the AWS Sagemaker web console.

## Deploy FastAPI LLM Gateway - Docker

Once you have set up your backend platform, you can deploy the FastAPI LLM Gateway locally. 

If you use the AWS SageMaker platform, define the following environment variables before you start the LLM Gateway server:

- `AWS_ACCESS_KEY_ID`: It can be the same one you use locally for Sagify. It should have access to Sagemaker and S3. Example: `export AWS_ACCESS_KEY_ID=...`.
- `AWS_SECRET_ACCESS_KEY`: It can be the same one you use locally for Sagify. It should have access to Sagemaker and S3. Example: `export AWS_SECRET_ACCESS_KEY=...`.
- `AWS_REGION_NAME`: AWS region where the LLM backend services (Sagemaker endpoints) are deployed.
- `S3_BUCKET_NAME`: Name of the S3 bucket where the images created by the image creation backend service are stored.
- `IMAGE_URL_TTL_IN_SECONDS`: TTL in seconds of the temporary url to the created images. Default value: 3600.
- `SM_CHAT_COMPLETIONS_MODEL`: The Sagemaker endpoint name where the chat completions model is deployed.
- `SM_EMBEDDINGS_MODEL`: The Sagemaker endpoint name where the embeddings model is deployed.
- `SM_IMAGE_CREATION_MODEL`: The Sagemaker endpoint name where the image creation model is deployed.

If you use the OpenAI platform, define the following environment variables before you start the LLM Gateway server:

- `OPENAI_API_KEY`: Your OpenAI API key. Example: `export OPENAI_API_KEY=...`.
- `OPENAI_CHAT_COMPLETIONS_MODEL`: Must be one of the values listed [here](https://platform.openai.com/docs/models/gpt-3-5-turbo) or [here](https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo).
- `OPENAI_EMBEDDINGS_MODEL`: Must be one of the values listed [here](https://platform.openai.com/docs/models/embeddings).
- `OPENAI_IMAGE_CREATION_MODEL`: Must be one of the values listed [here](https://platform.openai.com/docs/models/dall-e).
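Before starting the gateway, it can help to check that the variables listed above are actually set. The following Python sketch does that with the standard library only. The variable names come from the lists above; the helper `missing_vars` is hypothetical, and treating `IMAGE_URL_TTL_IN_SECONDS` as optional is an assumption based on it having a default value.

```python
import os

# Variable names taken from the lists above. IMAGE_URL_TTL_IN_SECONDS is
# left out on the assumption that its default (3600) makes it optional.
SAGEMAKER_VARS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION_NAME",
    "S3_BUCKET_NAME",
    "SM_CHAT_COMPLETIONS_MODEL",
    "SM_EMBEDDINGS_MODEL",
    "SM_IMAGE_CREATION_MODEL",
]
OPENAI_VARS = [
    "OPENAI_API_KEY",
    "OPENAI_CHAT_COMPLETIONS_MODEL",
    "OPENAI_EMBEDDINGS_MODEL",
    "OPENAI_IMAGE_CREATION_MODEL",
]


def missing_vars(names, environ=None):
    """Return the names that are unset or empty (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]
```

Calling `missing_vars(SAGEMAKER_VARS)` or `missing_vars(OPENAI_VARS)` returns an empty list when everything needed for that platform is exported.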

Now, you can run the command `sagify llm gateway --image sagify-llm-gateway:v0.1.0 --start-local` to start the LLM Gateway locally. You can change the name of the image via the `--image` argument.

This command will output the Docker container id. You can stop the container by executing `docker stop <CONTAINER_ID>`.

**Examples**

(*Remember to export first all the environment variables you need*)

To build the Docker image and then run it:

```sh
sagify llm gateway --image sagify-llm-gateway:v0.1.0 --start-local
```

To only build the image:

```sh
sagify llm gateway --image sagify-llm-gateway:v0.1.0
```

If you want to support both platforms (OpenAI and AWS Sagemaker), then pass all the env variables for both platforms.

## Deploy FastAPI LLM Gateway - AWS Fargate

To deploy the LLM Gateway to AWS Fargate, follow these general steps:

1. Containerize the FastAPI LLM Gateway: See previous section.
2. Push Docker image to Amazon ECR.
3. Define Task Definition: Define a task definition that describes how to run your containerized FastAPI application on Fargate. Specify the Docker image, container port, CPU and memory requirements, and environment variables.
4. Create ECS Service: Create a Fargate service using the task definition. Configure the desired number of tasks, networking options, load balancing, and auto-scaling settings.
5. Set Environment Variables: Ensure that your FastAPI application retrieves the environment variables correctly at runtime.

Here's an example CloudFormation template that deploys a FastAPI service to AWS Fargate and sets the gateway's environment variables:

```yaml
Resources:
  MyFargateTaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: my-fargate-task
      ContainerDefinitions:
        - Name: fastapi-container
          Image: <YOUR_ECR_REPOSITORY_URI>
          Memory: 512
          PortMappings:
            - ContainerPort: 80
          Environment:
            - Name: AWS_ACCESS_KEY_ID
              Value: "value1"
            - Name: AWS_SECRET_ACCESS_KEY
              Value: "value2"
            - Name: AWS_REGION_NAME
              Value: "value3"
            - Name: S3_BUCKET_NAME
              Value: "value4"
            - Name: IMAGE_URL_TTL_IN_SECONDS
              Value: "value5"
            - Name: SM_CHAT_COMPLETIONS_MODEL
              Value: "value6"
            - Name: SM_EMBEDDINGS_MODEL
              Value: "value7"
            - Name: SM_IMAGE_CREATION_MODEL
              Value: "value8"
            - Name: OPENAI_CHAT_COMPLETIONS_MODEL
              Value: "value9"
            - Name: OPENAI_EMBEDDINGS_MODEL
              Value: "value10"
            - Name: OPENAI_IMAGE_CREATION_MODEL
              Value: "value11"

  MyFargateService:
    Type: AWS::ECS::Service
    Properties:
      Cluster: default
      TaskDefinition: !Ref MyFargateTaskDefinition
      DesiredCount: 2
      LaunchType: FARGATE
      NetworkConfiguration:
        AwsvpcConfiguration:
          Subnets:
            - <YOUR_SUBNET_ID>
          SecurityGroups:
            - <YOUR_SECURITY_GROUP_ID>
```

## LLM Gateway API

Once the LLM Gateway is deployed, you can access it on `HOST_NAME/docs`.

### Completions

```shell
curl --location --request POST 'HOST_NAME/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data-raw '{
    "provider": "sagemaker",
    "messages": [
      {
        "role": "system",
        "content": "you are a cook"
      },
      {
        "role": "user",
        "content": "what is the recipe of mayonnaise"
      }
    ],
    "temperature": 0,
    "max_tokens": 600,
    "top_p": 0.9,
    "seed": 32
}'
```

> Example responses

> 200 Response

```json
{
    "id": "chatcmpl-8167b99c-f22b-4e04-8e26-4ca06d58dc86",
    "object": "chat.completion",
    "created": 1708765682,
    "provider": "sagemaker",
    "model": "meta-textgeneration-llama-2-7b-f-2024-02-24-08-49-32-123",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": " Ah, a fellow foodie! Mayonnaise is a classic condiment that's easy to make and can elevate any dish. Here's my trusty recipe for homemade mayonnaise:\n\nIngredients:\n\n* 2 egg yolks\n* 1/2 cup (120 ml) neutral-tasting oil, such as canola or grapeseed\n* 1 tablespoon lemon juice or vinegar\n* Salt and pepper to taste\n\nInstructions:\n\n1. In a small bowl, whisk together the egg yolks and lemon juice or vinegar until well combined.\n2. Slowly pour in the oil while continuously whisking the mixture. You can do this by hand with a whisk or use an electric mixer on low speed.\n3. Continue whisking until the mixture thickens and emulsifies, which should take about 5-7 minutes. You'll know it's ready when it reaches a thick, creamy consistency.\n4. Taste and adjust the seasoning as needed. You can add more salt, pepper, or lemon juice to taste.\n5. Transfer the mayonnaise to a jar or airtight container and store it in the fridge for up to 1 week.\n\nThat's it! Homemade mayonnaise is a great way to control the ingredients and flavor, and it's also a fun kitchen experiment. Enjoy!"
            }
        }
    ]
}
```
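The same request can be sent from Python. The sketch below uses only the standard library and mirrors the request and response shapes shown above. The function names are hypothetical, and the base URL you pass in depends on where your gateway is reachable; nothing here is part of sagify itself.

```python
import json
import urllib.request


def build_chat_request(base_url, messages, provider="sagemaker", **params):
    """Build a POST request for the gateway's /v1/chat/completions endpoint."""
    # Extra keyword arguments (temperature, max_tokens, top_p, seed, ...)
    # are passed through as top-level fields, as in the curl example.
    payload = {"provider": provider, "messages": messages, **params}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_reply(response_body):
    """Extract the assistant's text from a parsed response body."""
    return response_body["choices"][0]["message"]["content"]


def chat(base_url, messages, **params):
    """Send the request and return the first reply (performs a network call)."""
    request = build_chat_request(base_url, messages, **params)
    with urllib.request.urlopen(request) as response:
        return first_reply(json.load(response))
```

With the gateway running, `chat(BASE_URL, [{"role": "user", "content": "what is the recipe of mayonnaise"}], temperature=0, max_tokens=600)` would return the generated text, where `BASE_URL` stands for your own gateway address.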

### Embeddings

```shell
curl --location --request POST 'HOST_NAME/v1/embeddings' \
--header 'Content-Type: application/json' \
--data-raw '{
  "provider": "sagemaker",
  "input": [
    "The mayonnaise was delicious"
  ]
}'
```

> Example responses

> 200 Response

```json
{
    "data": [
        {
            "object": "embedding",
            "embedding": [
                -0.04274585098028183,
                0.021814687177538872,
                -0.004705613013356924,
                ...
                -0.07548460364341736,
                0.036427777260541916,
                0.016453085467219353,
                0.004641987383365631,
                -0.0072729517705738544,
                0.02343473769724369,
                -0.002924458822235465,
                0.0339619480073452,
                0.005262510851025581,
                -0.06709178537130356,
                -0.015170316211879253,
                -0.04612169787287712,
                -0.012380547821521759,
                -0.006663458421826363,
                -0.0573800653219223,
                0.007938326336443424,
                0.03486081212759018,
                0.021514462307095528
            ],
            "index": 0
        }
    ],
    "provider": "sagemaker",
    "model": "hf-sentencesimilarity-gte-small-2024-02-24-09-24-27-341",
    "object": "list"
}
```
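A common next step is to compare embedding vectors. The Python sketch below pulls the vectors out of a parsed response like the one above and computes cosine similarity between two of them. It assumes that the gateway returns one `data` entry per input string, ordered by `index`; the function names are hypothetical.

```python
import math


def embeddings_by_index(response_body):
    """Return the embedding vectors from a parsed response, ordered by index."""
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Given a response for two input strings, `cosine_similarity(*embeddings_by_index(body))` yields a score between -1 and 1, with higher values indicating more similar texts.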

### Image Generations

```shell
curl --location --request POST 'HOST_NAME/v1/images/generations' \
--header 'Content-Type: application/json' \
--data-raw '{
  "provider": "sagemaker",
  "prompt": "A baby sea otter",
  "n": 1,
  "width": 512,
  "height": 512,
  "seed": 32,
  "response_format": "url"
}'
```

> Example responses

> 200 Response

```json
{
    "provider": "sagemaker",
    "model": "stable-diffusion-v2-1-base-2024-02-24-11-43-32-177",
    "created": 1708775601,
    "data": [
        {
            "url": "https://your-bucket.s3.amazonaws.com/31cedd17-ccd7-4cba-8dea-cb7e8b915782.png?AWSAccessKeyId=AKIAUKEQBDHITP26MLXH&Signature=%2Fd1J%2FUjOWbRnP5cwtkSzYUVoEoo%3D&Expires=1708779204"
        }
    ]
}
```
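The returned URL is temporary (see `IMAGE_URL_TTL_IN_SECONDS`), so a client may want to check how long it remains valid before downloading the image. The sketch below assumes the URL carries an `Expires` query parameter holding a Unix timestamp, as in the sample response above; URLs signed in a different style would need different parsing. The function name is hypothetical.

```python
import time
from urllib.parse import parse_qs, urlparse


def seconds_until_expiry(url, now=None):
    """Return how many seconds remain before a temporary image URL expires.

    Assumes an `Expires` query parameter with a Unix timestamp, as in the
    sample response. A negative result means the URL has already expired.
    """
    query = parse_qs(urlparse(url).query)
    expires = int(query["Expires"][0])
    now = time.time() if now is None else now
    return expires - now
```

For the sample response, the difference between `Expires` and `created` is roughly the default TTL of 3600 seconds.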

## Talk with the team

Email: pavlos@sagify.ai

## Why did we build this

We realized that there is not a single LLM to rule them all!

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Kenza-AI/sagify",
    "name": "sagify",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "~=3.7",
    "maintainer_email": "",
    "keywords": "sagify llm multi-modal-model foundation-model llm-production machine-learning machine-learning-deploy machine-learning-production deep-learning-production deep-learning cli aws sagemaker tensorflow mxnet scikit-learn artificial-intelligence keras cntk deploy",
    "author": "Pavlos Mitsoulis Ntompos, Ioakeim (Joakim) Lazakis, Dionysis Varelas",
    "author_email": "p.mitsoulis@gmail.com, ilazakis@gmail.com, dionvarelas@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/e5/0f/aa334f351818c6e633a907275391e5b7c513ff79d40bce4111571fed450d/sagify-0.25.4.tar.gz",
    "platform": null,
    "description": "![Sagify](docs/sagify@2x.png)\n\n<p align=\"center\">\n    <em>LLMs and Machine Learning done easily.</em>\n</p>\n<p align=\"center\">\n<a href=\"https://github.com/kenza-ai/sagify/actions?query=workflow%3ACI\" target=\"_blank\">\n    <img src=\"https://github.com/kenza-ai/sagify/workflows/CI/badge.svg\" alt=\"Test\">\n</a>\n</p>\n\n# sagify\n\nSagify provides a simplified interface to manage machine learning workflows on [AWS SageMaker](https://aws.amazon.com/sagemaker/), helping you focus on building ML models rather than infrastructure. Its modular architecture includes an LLM Gateway module to provide a unified interface for leveraging both open source and proprietary large language models. The LLM Gateway gives access to various LLMs through a simple API, letting you easily incorporate them into your workflows.\n\nFor detailed reference to Sagify please go to: [Read the Docs](https://Kenza-AI.github.io/sagify/)\n\n## Installation\n\n### Prerequisites\n\nsagify requires the following:\n\n1. Python (3.7, 3.8, 3.9, 3.10, 3.11)\n2. [Docker](https://www.docker.com/) installed and running\n3. Configured [awscli](https://pypi.python.org/pypi/awscli)\n\n### Install sagify\n\nAt the command line:\n\n    pip install sagify\n\n\n## Getting started -  LLM Deployment with no code\n                \n1. Make sure to configure your AWS account by following the instructions at section [Configure AWS Account](#configure-aws-account)\n  \n2. Finally, run the following command:\n\n```sh\nsagify cloud foundation-model-deploy --model-id model-txt2img-stabilityai-stable-diffusion-v2-1-base --model-version 1.* -n 1 -e ml.p3.2xlarge --aws-region us-east-1 --aws-profile sagemaker-dev\n```\n        \nYou can change the values for ec2 type (-e), aws region and aws profile with your preferred ones.\n\nOnce the Stable Diffusion model is deployed, you can use the generated code snippet to query it. 
Enjoy!\n\n## Backend Platforms\n\n### OpenAI\n\nThe following models are offered for chat completions:\n\n| Model Name | URL |\n|:------------:|:-----:|\n|gpt-4|https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo|\n|gpt-4-32k|https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo|\n|gpt-3.5-turbo|https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo|\n\nFor image creation you can rely on the following models:\n\n| Model Name | URL |\n|:------------:|:-----:|\n|dall-e-3|https://platform.openai.com/docs/models/dall-e|\n|dall-e-2|https://platform.openai.com/docs/models/dall-e|\n\nAnd for embeddings:\n\n| Model Name | URL |\n|:------------:|:-----:|\n|text-embedding-3-large|https://platform.openai.com/docs/models/embeddings|\n|text-embedding-3-small|https://platform.openai.com/docs/models/embeddings|\n|text-embedding-ada-002|https://platform.openai.com/docs/models/embeddings|\n\nAll these lists of supported models on Openai can be retrieved by running the command `sagify llm models --all --provider openai`. If you want to focus only on chat completions models, then run `sagify llm models --chat-completions --provider openai`. 
For image creations and embeddings, `sagify llm models --image-creations --provider openai` and `sagify llm models --embeddings --provider openai`, respectively.\n\n### Open-Source\n\nThe following open-source models are offered for chat completions:\n\n| Model Name | URL |\n|:------------:|:-----:|\n|llama-2-7b|https://huggingface.co/meta-llama/Llama-2-7b|\n|llama-2-13b|https://huggingface.co/meta-llama/Llama-2-13b|\n|llama-2-70b|https://huggingface.co/meta-llama/Llama-2-70b|\n\nFor image creation you can rely on the following open-source models:\n\n| Model Name | URL |\n|:------------:|:-----:|\n|stabilityai-stable-diffusion-v2|https://huggingface.co/stabilityai/stable-diffusion-2|\n|stabilityai-stable-diffusion-v2-1-base|https://huggingface.co/stabilityai/stable-diffusion-2-1-base|\n|stabilityai-stable-diffusion-v2-fp16|https://huggingface.co/stabilityai/stable-diffusion-2/tree/fp16|\n\nAnd for embeddings:\n\n| Model Name | URL |\n|:------------:|:-----:|\n|bge-large-en|https://huggingface.co/BAAI/bge-large-en|\n|bge-base-en|https://huggingface.co/BAAI/bge-base-en|\n|gte-large|https://huggingface.co/thenlper/gte-large|\n|gte-base|https://huggingface.co/thenlper/gte-base|\n|e5-large-v2|https://huggingface.co/intfloat/e5-large-v2|\n|bge-small-en|https://huggingface.co/BAAI/bge-small-en|\n|e5-base-v2|https://huggingface.co/intfloat/e5-base-v2|\n|multilingual-e5-large|https://huggingface.co/intfloat/multilingual-e5-large|\n|e5-large|https://huggingface.co/intfloat/e5-large|\n|gte-small|https://huggingface.co/thenlper/gte-small|\n|e5-base|https://huggingface.co/intfloat/e5-base|\n|e5-small-v2|https://huggingface.co/intfloat/e5-small-v2|\n|multilingual-e5-base|https://huggingface.co/intfloat/multilingual-e5-base|\n|all-MiniLM-L6-v2|https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2|\n\nAll these lists of supported open-source models are supported on AWS Sagemaker and can be retrieved by running the command `sagify llm models --all --provider sagemaker`. 
If you want to focus only on chat completions models, then run `sagify llm models --chat-completions --provider sagemaker`. For image creations and embeddings, `sagify llm models --image-creations --provider sagemaker` and `sagify llm models --embeddings --provider sagemaker`, respectively.\n\n## Set up OpenAI\n\nYou need to define the following env variables before you start the LLM Gateway server:\n\n- `OPENAI_API_KEY`: Your OpenAI API key. Example: `export OPENAI_API_KEY=...`.\n- `OPENAI_CHAT_COMPLETIONS_MODEL`: It should have one of values [here](https://platform.openai.com/docs/models/gpt-3-5-turbo) or [here](https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo).\n- `OPENAI_EMBEDDINGS_MODEL`: It should have one of values [here](https://platform.openai.com/docs/models/embeddings).\n- `OPENAI_IMAGE_CREATION_MODEL`: It should have one of values [here](https://platform.openai.com/docs/models/dall-e).\n\n## Set up open-source LLMs\n\nFirst step is to deploy the LLM model(s). You can choose to deploy all backend services (chat completions, image creations, embeddings) or some of them. \n\nIf you want to deploy all of them, then run `sagify llm start --all`. 
This command will deploy all backend services (chat completions, image creations, embeddings) with the following configuration:\n\n```json\n{\n    \"chat_completions\": {\n        \"model\": \"llama-2-7b\",\n        \"instance_type\": \"ml.g5.2xlarge\",\n        \"num_instances\": 1,\n    },\n    \"image_creations\": {\n        \"model\": \"stabilityai-stable-diffusion-v2-1-base\",\n        \"instance_type\": \"ml.p3.2xlarge\",\n        \"num_instances\": 1,\n    },\n    \"embeddings\": {\n        \"model\": \"gte-small\",\n        \"instance_type\": \"ml.g5.2xlarge\",\n        \"num_instances\": 1,\n    },\n}\n```\n\nYou can change this configuration by suppling your own config file, then you can run `sagify llm start -all --config YOUR_CONFIG_FILE.json`.\n\nIt takes 15 to 30 minutes to deploy all the backend services as Sagemaker endpoints.\n\nThe deployed model names, which are the Sagemaker endpoint names, are printed out and stored in the hidden file `.sagify_llm_infra.json`. You can also access them from the AWS Sagemaker web console.\n\n## Deploy FastAPI LLM Gateway - Docker\n\nOnce you have set up your backend platform, you can deploy the FastAPI LLM Gateway locally. \n\nIn case of using the AWS Sagemaker platform, you need to define the following env variables before you start the LLM Gateway server:\n\n- `AWS_ACCESS_KEY_ID`: It can be the same one you use locally for Sagify. It should have access to Sagemaker and S3. Example: `export AWS_ACCESS_KEY_ID=...`.\n- `AWS_SECRET_ACCESS_KEY`:  It can be the same one you use locally for Sagify. It should have access to Sagemaker and S3. Example: `export AWS_ACCESS_KEY_ID=...`.\n- `AWS_REGION_NAME`: AWS region where the LLM backend services (Sagemaker endpoints) are deployed.\n- `S3_BUCKET_NAME`: S3 bucket name where the created images by the image creation backend service are stored.\n- `IMAGE_URL_TTL_IN_SECONDS`: TTL in seconds of the temporary url to the created images. 
Default value: 3600.\n- `SM_CHAT_COMPLETIONS_MODEL`: The Sagemaker endpoint name where the chat completions model is deployed.\n- `SM_EMBEDDINGS_MODEL`: The Sagemaker endpoint name where the embeddings model is deployed.\n- `SM_IMAGE_CREATION_MODEL`: The Sagemaker endpoint name where the image creation model is deployed.\n\nIn case of using the OpenAI platform, you need to define the following env variables before you start the LLM Gateway server:\n\n- `OPENAI_API_KEY`: Your OpenAI API key. Example: `export OPENAI_API_KEY=...`.\n- `OPENAI_CHAT_COMPLETIONS_MODEL`: It should have one of values [here](https://platform.openai.com/docs/models/gpt-3-5-turbo) or [here](https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo).\n- `OPENAI_EMBEDDINGS_MODEL`: It should have one of values [here](https://platform.openai.com/docs/models/embeddings).\n- `OPENAI_IMAGE_CREATION_MODEL`: It should have one of values [here](https://platform.openai.com/docs/models/dall-e).\n\nNow, you can run the command `sagify llm gateway --image sagify-llm-gateway:v0.1.0 --start-local` to start the LLM Gateway locally. You can change the name of the image via the `--image` argument.\n\nThis command will output the Docker container id. You can stop the container by executing `docker stop <CONTAINER_ID>`.\n\n**Examples**\n\n(*Remember to export first all the environment variables you need*)\n\nIn the case you want to create a docker image and then run it\n```{bash}\nsagify llm gateway --image sagify-llm-gateway:v0.1.0 --start-local\n ```\n\n If you want to use just build the image\n ```{bash}\n sagify llm gateway --image sagify-llm-gateway:v0.1.0\n ```\n\nIf you want to support both platforms (OpenAI and AWS Sagemaker), then pass all the env variables for both platforms.\n\n## Deploy FastAPI LLM Gateway - AWS Fargate\n\nIn case you want to deploy the LLM Gateway to AWS Fargate, then you can follow these general steps:\n\n1. Containerize the FastAPI LLM Gateway: See previous section.\n2. 
Push Docker image to Amazon ECR.\n3. Define Task Definition: Define a task definition that describes how to run your containerized FastAPI application on Fargate. Specify the Docker image, container port, CPU and memory requirements, and environment variables.\n4. Create ECS Service: Create a Fargate service using the task definition. Configure the desired number of tasks, networking options, load balancing, and auto-scaling settings.\n4. Set Environment Variables: Ensure that your FastAPI application retrieves the environment variables correctly at runtime.\n\nHere's an example CloudFormation template to deploy a FastAPI service to AWS Fargate with 5 environment variables:\n\n```yaml\nResources:\n  MyFargateTaskDefinition:\n    Type: AWS::ECS::TaskDefinition\n    Properties:\n      Family: my-fargate-task\n      ContainerDefinitions:\n        - Name: fastapi-container\n          Image: <YOUR_ECR_REPOSITORY_URI>\n          Memory: 512\n          PortMappings:\n            - ContainerPort: 80\n          Environment:\n            - Name: AWS_ACCESS_KEY_ID\n              Value: \"value1\"\n            - Name: AWS_SECRET_ACCESS_KEY\n              Value: \"value2\"\n            - Name: AWS_REGION_NAME\n              Value: \"value3\"\n            - Name: S3_BUCKET_NAME\n              Value: \"value4\"\n            - Name: IMAGE_URL_TTL_IN_SECONDS\n              Value: \"value5\"\n            - Name: SM_CHAT_COMPLETIONS_MODEL\n              Value: \"value6\"\n            - Name: SM_EMBEDDINGS_MODEL\n              Value: \"value7\"\n            - Name: SM_IMAGE_CREATION_MODEL\n              Value: \"value8\"\n            - Name: OPENAI_CHAT_COMPLETIONS_MODEL\n              Value: \"value9\"\n            - Name: OPENAI_EMBEDDINGS_MODEL\n              Value: \"value10\"\n            - Name: OPENAI_IMAGE_CREATION_MODEL\n              Value: \"value11\"\n\n  MyFargateService:\n    Type: AWS::ECS::Service\n    Properties:\n      Cluster: default\n      TaskDefinition: !Ref 
MyFargateTaskDefinition\n      DesiredCount: 2\n      LaunchType: FARGATE\n      NetworkConfiguration:\n        AwsvpcConfiguration:\n          Subnets:\n            - <YOUR_SUBNET_ID>\n          SecurityGroups:\n            - <YOUR_SECURITY_GROUP_ID>\n```\n\n## LLM Gateway API\n\nOnce the LLM Gateway is deployed, you can access it on `HOST_NAME/docs`.\n\n### Completions\n\n```shell\ncurl --location --request POST '/v1/chat/completions' \\\n--header 'Content-Type: application/json' \\\n--data-raw '{\n    \"provider\": \"sagemaker\",\n     \"messages\": [\n      {\n        \"role\": \"system\",\n        \"content\": \"you are a cook\"\n      },\n      {\n        \"role\": \"user\",\n        \"content\": \"what is the recipe of mayonnaise\"\n      }\n    ],\n    \"temperature\": 0,\n    \"max_tokens\": 600,\n    \"top_p\": 0.9,\n    \"seed\": 32\n}'\n```\n\n> Example responses\n\n> 200 Response\n\n```json\n{\n    \"id\": \"chatcmpl-8167b99c-f22b-4e04-8e26-4ca06d58dc86\",\n    \"object\": \"chat.completion\",\n    \"created\": 1708765682,\n    \"provider\": \"sagemaker\",\n    \"model\": \"meta-textgeneration-llama-2-7b-f-2024-02-24-08-49-32-123\",\n    \"choices\": [\n        {\n            \"index\": 0,\n            \"message\": {\n                \"role\": \"assistant\",\n                \"content\": \" Ah, a fellow foodie! Mayonnaise is a classic condiment that's easy to make and can elevate any dish. Here's my trusty recipe for homemade mayonnaise:\\n\\nIngredients:\\n\\n* 2 egg yolks\\n* 1/2 cup (120 ml) neutral-tasting oil, such as canola or grapeseed\\n* 1 tablespoon lemon juice or vinegar\\n* Salt and pepper to taste\\n\\nInstructions:\\n\\n1. In a small bowl, whisk together the egg yolks and lemon juice or vinegar until well combined.\\n2. Slowly pour in the oil while continuously whisking the mixture. You can do this by hand with a whisk or use an electric mixer on low speed.\\n3. 
Continue whisking until the mixture thickens and emulsifies, which should take about 5-7 minutes. You'll know it's ready when it reaches a thick, creamy consistency.\\n4. Taste and adjust the seasoning as needed. You can add more salt, pepper, or lemon juice to taste.\\n5. Transfer the mayonnaise to a jar or airtight container and store it in the fridge for up to 1 week.\\n\\nThat's it! Homemade mayonnaise is a great way to control the ingredients and flavor, and it's also a fun kitchen experiment. Enjoy!\"\n            }\n        }\n    ]\n}\n```\n\n### Embeddings\n\n```shell\ncurl --location --request POST 'HOST_NAME/v1/embeddings' \\\n--header 'Content-Type: application/json' \\\n--data-raw '{\n  \"provider\": \"sagemaker\",\n  \"input\": [\n    \"The mayonnaise was delicious\"\n  ]\n}'\n```\n\n> Example responses\n\n> 200 Response\n\n```json\n{\n    \"data\": [\n        {\n            \"object\": \"embedding\",\n            \"embedding\": [\n                -0.04274585098028183,\n                0.021814687177538872,\n                -0.004705613013356924,\n                ...\n                -0.07548460364341736,\n                0.036427777260541916,\n                0.016453085467219353,\n                0.004641987383365631,\n                -0.0072729517705738544,\n                0.02343473769724369,\n                -0.002924458822235465,\n                0.0339619480073452,\n                0.005262510851025581,\n                -0.06709178537130356,\n                -0.015170316211879253,\n                -0.04612169787287712,\n                -0.012380547821521759,\n                -0.006663458421826363,\n                -0.0573800653219223,\n                0.007938326336443424,\n                0.03486081212759018,\n                0.021514462307095528\n            ],\n            \"index\": 0\n        }\n    ],\n    \"provider\": \"sagemaker\",\n    \"model\": \"hf-sentencesimilarity-gte-small-2024-02-24-09-24-27-341\",\n    \"object\": 
\"list\"\n}\n```\n\n### Image Generations\n\n```shell\ncurl --location --request POST 'HOST_NAME/v1/images/generations' \\\n--header 'Content-Type: application/json' \\\n--data-raw '{\n  \"provider\": \"sagemaker\",\n  \"prompt\": \"A baby sea otter\",\n  \"n\": 1,\n  \"width\": 512,\n  \"height\": 512,\n  \"seed\": 32,\n  \"response_format\": \"url\"\n}'\n```\n\n> Example responses\n\n> 200 Response\n\n```json\n{\n    \"provider\": \"sagemaker\",\n    \"model\": \"stable-diffusion-v2-1-base-2024-02-24-11-43-32-177\",\n    \"created\": 1708775601,\n    \"data\": [\n        {\n            \"url\": \"https://your-bucket.s3.amazonaws.com/31cedd17-ccd7-4cba-8dea-cb7e8b915782.png?AWSAccessKeyId=AKIAUKEQBDHITP26MLXH&Signature=%2Fd1J%2FUjOWbRnP5cwtkSzYUVoEoo%3D&Expires=1708779204\"\n        }\n    ]\n}\n```\n\n## Talk with the team\n\nEmail: pavlos@sagify.ai\n\n## Why did we build this\n\nWe realized that there is not a single LLM to rule them all!\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Machine Learning Training, Tuning and Deployment on AWS",
    "version": "0.25.4",
    "project_urls": {
        "Homepage": "https://github.com/Kenza-AI/sagify"
    },
    "split_keywords": [
        "sagify",
        "llm",
        "multi-modal-model",
        "foundation-model",
        "llm-production",
        "machine-learning",
        "machine-learning-deploy",
        "machine-learning-production",
        "deep-learning-production",
        "deep-learning",
        "cli",
        "aws",
        "sagemaker",
        "tensorflow",
        "mxnet",
        "scikit-learn",
        "artificial-intelligence",
        "keras",
        "cntk",
        "deploy"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "18d871a47a60741045a83d794407f118b825acceed0ad04a9624931911184e49",
                "md5": "8682e8217507cf48d99a9976a725df01",
                "sha256": "a7eabbd89c463bc1faa165891ba35ddee534cd8b0d8b50260bbd9b1703d7944c"
            },
            "downloads": -1,
            "filename": "sagify-0.25.4-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8682e8217507cf48d99a9976a725df01",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": "~=3.7",
            "size": 77441,
            "upload_time": "2024-03-10T12:33:00",
            "upload_time_iso_8601": "2024-03-10T12:33:00.178356Z",
            "url": "https://files.pythonhosted.org/packages/18/d8/71a47a60741045a83d794407f118b825acceed0ad04a9624931911184e49/sagify-0.25.4-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e50faa334f351818c6e633a907275391e5b7c513ff79d40bce4111571fed450d",
                "md5": "7a52d53d682fd2efc0045b01aebd57d0",
                "sha256": "7030fb62a74ad3db7176ce274d3e572dbe38fd449d2878f7d73b07794a9318cc"
            },
            "downloads": -1,
            "filename": "sagify-0.25.4.tar.gz",
            "has_sig": false,
            "md5_digest": "7a52d53d682fd2efc0045b01aebd57d0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "~=3.7",
            "size": 60700,
            "upload_time": "2024-03-10T12:33:03",
            "upload_time_iso_8601": "2024-03-10T12:33:03.015390Z",
            "url": "https://files.pythonhosted.org/packages/e5/0f/aa334f351818c6e633a907275391e5b7c513ff79d40bce4111571fed450d/sagify-0.25.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-10 12:33:03",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Kenza-AI",
    "github_project": "sagify",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "requirements": [],
    "tox": true,
    "lcname": "sagify"
}

Pavlos Mitsoulis Ntompos, Ioakeim (Joakim) Lazakis, Dionysis Varelas