fmbench

| Field | Value |
|:------|:------|
| Name | fmbench |
| Version | 1.0.31 |
| Home page | https://github.com/aws-samples/foundation-model-benchmarking-tool |
| Summary | Benchmark performance of **any model** deployed on **Amazon SageMaker** or available on **Amazon Bedrock** or deployed by you on an AWS service of choice (such as Amazon EKS or Amazon EC2) a.k.a **Bring your own endpoint** |
| Upload time | 2024-04-22 01:59:37 |
| Author | Amit Arora |
| Requires Python | <4.0,>=3.11 |
| License | MIT |
| Keywords | benchmarking, sagemaker, bedrock, bring your own endpoint, generative-ai, foundation-models |
# Foundation Model benchmarking tool (FMBench)

<h1 align="center">
        <img src="https://github.com/aws-samples/foundation-model-benchmarking-tool/blob/main/img/fmbt-small.png?raw=true"></img>
    </h1>
    <p align="center">
        <p align="center">Benchmark any Foundation Model (FM) on any AWS service [Amazon SageMaker, Amazon Bedrock, Amazon EKS, Bring your own endpoint etc.]
        <br>
    </p>
<h4 align="center"><a href="https://aws.amazon.com/sagemaker/" target="_blank">Amazon SageMaker</a> | <a href="https://aws.amazon.com/bedrock/" target="_blank">Amazon Bedrock</a></h4>
<h4 align="center">
    <a href="https://pypi.org/project/fmbench/" target="_blank">
        <img src="https://img.shields.io/pypi/v/fmbench.svg" alt="PyPI Version">
    </a>    
</h4>

A key challenge with FMs is benchmarking their performance in terms of inference latency, throughput, and cost, so as to determine which model, running on which combination of hardware and serving stack, provides the best price-performance for a given workload.

Stated as a **business problem**, the ask is: “_What is the dollar cost per transaction for a given generative AI workload that serves a given number of users while keeping the response time under a target threshold?_”

But to really answer this question, we need to answer an **engineering question** (an optimization problem, actually) that corresponds to the business problem: “_What is the minimum number of instances N, of the most cost-optimal instance type T, needed to serve a workload W while keeping the average transaction latency under L seconds?_”

*W := {R transactions per minute, average prompt token length P, average generation token length G}*
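
To make the arithmetic behind this question concrete, here is a minimal back-of-the-envelope sketch of the cost-per-transaction calculation; the instance count, hourly price, and transaction rate below are made-up illustrative numbers, not `FMBench` output.

```{.bash}
# Illustrative numbers only: suppose the optimal answer turns out to be
# N = 2 instances of type T = ml.g5.2xlarge at ~$1.52/hour each, serving
# R = 60 transactions per minute within the latency target L.
instances=2
price_per_hour=1.52
transactions_per_minute=60

# dollar cost per transaction = (N * hourly price) / (transactions served per hour)
echo "scale=6; ($instances * $price_per_hour) / ($transactions_per_minute * 60)" | bc
# -> .000844 dollars per transaction
```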

This Foundation Model benchmarking tool (a.k.a. `FMBench`) answers the engineering question above, and thereby the original business question about how to get the best price-performance for a given workload. Here is one of the plots generated by `FMBench` to help answer it (_the numbers on the y-axis, transactions per minute and latency, have been removed from the image below; you can find them in the actual plot generated when you run `FMBench`_).

![business question](https://github.com/aws-samples/foundation-model-benchmarking-tool/blob/main/img/business_summary.png?raw=true)

## Models benchmarked

Configuration files for the following models are available in the [configs](./src/fmbench/configs) folder of this repo (a quick way to list them locally is sketched after the table).

| Model    | SageMaker g4dn/g5/p3 | SageMaker Inf2 | SageMaker P4 | SageMaker P5 | Bedrock On-demand throughput | Bedrock provisioned throughput |
|:------------------|:-----------------|:----------------|:--------------|:--------------|:------------------------------|:--------------------------------|
| **Anthropic Claude-3 Sonnet** | | |  | | ✅ | ✅  | 
| **Anthropic Claude-3 Haiku**  | | |  | | ✅ |   |
| **Mistral-7b-instruct** |✅ | |✅  |✅ | ✅ |   |
| **Mistral-7b-AWQ** || | |✅ | |   |
| **Mixtral-8x7b-instruct**  | | |  | | ✅ |   |
| **Llama3-8b instruct**  |✅ ||✅  | |  |   |
| **Llama3-70b instruct**  |✅ ||✅  | | |   |
| **Llama2-13b chat**  |✅ |✅ |✅  | | ✅  |   |
| **Llama2-70b chat**  |✅ |✅ |✅  | | ✅  |   |
| **Amazon Titan text lite**  | | |  | | ✅ |   |
| **Amazon Titan text express**  | | |  | | ✅ |   |
| **Cohere Command text**  | | |  | | ✅ |   |
| **Cohere Command light text**  | | |  | | ✅ |   |
| **AI21 J2 Mid**  | | |  | | ✅ |   |
| **AI21 J2 Ultra** | | |  | | ✅ |   |
| **distilbert-base-uncased**  |  ✅ | |  | ||   |
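
To browse these configuration files locally, one simple option (assuming only `git` and a POSIX shell on your machine) is to clone the repo and list the `configs` folder referenced above:

```{.bash}
# clone the repo and list the bundled configuration files
git clone https://github.com/aws-samples/foundation-model-benchmarking-tool.git
ls foundation-model-benchmarking-tool/src/fmbench/configs
```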

## New in this release

### v1.0.31

1. Meta Llama3 benchmarking on Amazon SageMaker.

### v1.0.29

1. Support for Amazon Bedrock. Benchmark models available on Bedrock, both on-demand throughput and provisioned throughput.

### v1.0.28

1. Support for HuggingFace datasets as well as bring-your-own datasets; more details [here](https://github.com/aws-samples/foundation-model-benchmarking-tool?tab=readme-ov-file#bring-your-own-dataset--endpoint).

1. Support for external endpoints. No longer limited to Amazon SageMaker endpoints; more details [here](https://github.com/aws-samples/foundation-model-benchmarking-tool?tab=readme-ov-file#bring-your-own-dataset--endpoint).

1. Bring your own `Amazon SageMaker` endpoints. If you have an already-deployed SageMaker endpoint, you can now test it with `FMBench`.

1. Added config files for [`Mistral-7B-Instruct`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), [`Mistral-7B-Instruct-v0.2-AWQ`](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-AWQ), `huggingface-tc-distilbert-base-uncased` (from SageMaker JumpStart), `meta-textgenerationneuron-llama-2-70b-f` (on AWS Inferentia2).

## Key Features

1. Benchmark any model on any serving stack as long as it can be deployed on Amazon SageMaker.

1. Bring your own script for model deployment if the model is not natively available via Amazon SageMaker JumpStart. 

1. Bring your own tokenizer for your model, configure any inference container parameters you need.

1. Auto-generated reports comparing and contrasting different serving options.

## Installation

1. Launch the AWS CloudFormation template included in this repository using one of the buttons from the table below (or from the AWS CLI, as sketched after the table). The CloudFormation template creates the following resources in your AWS account: Amazon S3 buckets, an AWS IAM role, and an Amazon SageMaker Notebook instance with this repository cloned. A read S3 bucket contains all the files (configuration files, datasets) required to run `FMBench`, and a write S3 bucket holds the metrics and reports generated by `FMBench`. The CloudFormation stack takes about 5 minutes to create.

   |AWS Region                |     Link        |
   |:------------------------:|:-----------:|
   |us-east-1 (N. Virginia)    | [<img src="https://github.com/aws-samples/foundation-model-benchmarking-tool/blob/main/img/ML-FMBT-cloudformation-launch-stack.png?raw=true">](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?stackName=fmbench&templateURL=https://aws-blogs-artifacts-public.s3.amazonaws.com/artifacts/ML-FMBT/template.yml) |
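
   If you prefer the CLI to the console button, the same template can be launched with the standard `aws cloudformation create-stack` command. This is a sketch rather than a tested recipe: the stack creates an IAM role, so IAM capabilities must be acknowledged, and the exact capability flag required is an assumption here.

   ```{.bash}
   # launch the same template from the AWS CLI in us-east-1 (the region shown in the table above)
   aws cloudformation create-stack \
       --stack-name fmbench \
       --region us-east-1 \
       --template-url https://aws-blogs-artifacts-public.s3.amazonaws.com/artifacts/ML-FMBT/template.yml \
       --capabilities CAPABILITY_NAMED_IAM
   ```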

1. Once the CloudFormation stack is created, navigate to SageMaker Notebooks and open the `fmbench-notebook`.

1. On the `fmbench-notebook` open a Terminal and run the following commands.

    ```{.bash}
    conda create --name fmbench_python311 -y python=3.11 ipykernel
    source activate fmbench_python311;
    pip install -U fmbench
    ```
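
    A quick sanity check that the package landed in the new environment (these are standard `pip` commands, nothing `FMBench`-specific):

    ```{.bash}
    # confirm the installed package and its version
    pip show fmbench
    ```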

## Steps to run

1. Now you are ready to run `fmbench` with the following command line. We will use a sample config file placed in the read S3 bucket by the CloudFormation stack for a quick first run.
    
    1. We benchmark performance of the `Llama2-7b` model on the `ml.g5.xlarge` and `ml.g5.2xlarge` instance types, using the `huggingface-pytorch-tgi-inference` inference container. This test takes about 30 minutes to complete and costs about $0.20.
    
    1. It uses a simple rule of thumb of 750 words per 1,000 tokens; to get a more accurate token count, use the `Llama2 tokenizer` (instructions are provided in the next section). ***For more accurate token throughput results, it is strongly recommended that you use a tokenizer specific to the model you are testing rather than the default tokenizer. See the instructions later in this document on how to use a custom tokenizer***.

        ```{.bash}
        # look up the current AWS account id, which is part of the read bucket's name
        account=`aws sts get-caller-identity | jq .Account | tr -d '"'`
        # run FMBench with the sample config file staged in the read bucket by the CloudFormation stack
        fmbench --config-file s3://sagemaker-fmbench-read-${account}/configs/config-llama2-7b-g5-quick.yml
        ```

1. The generated reports and metrics are available in the `sagemaker-fmbench-write-<replace_w_your_aws_account_id>` bucket. The metrics and report files are also downloaded locally into the `results` directory (created by `FMBench`), and the benchmarking report is available as a markdown file called `report.md` in that directory. You can view the rendered Markdown report in the SageMaker notebook itself or download the metrics and report files to your machine for offline analysis.
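
    To pull everything down for offline analysis, you can sync the write bucket to a local folder; the bucket name below simply follows the naming pattern described above, so substitute your own account id if you construct it by hand.

    ```{.bash}
    # copy all generated metrics and reports from the write bucket to a local folder
    account=`aws sts get-caller-identity | jq .Account | tr -d '"'`
    aws s3 sync s3://sagemaker-fmbench-write-${account}/ ./fmbench-results/
    ```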

## License

[MIT-0](https://github.com/aws-samples/foundation-model-benchmarking-tool/blob/main/LICENSE)

## Documentation

The official documentation is available in the [GitHub repo](https://github.com/aws-samples/foundation-model-benchmarking-tool).

            
