fmbt

Name	fmbt
Version	1.0.7
home_page
Summary	Benchmark performance of any model on any supported instance type on Amazon SageMaker.
upload_time	2024-02-12 03:57:36
maintainer
docs_url	None
author	Amit Arora
requires_python	>=3.11,<4.0
license
keywords
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# Foundation Model benchmarking tool (FMBT) built using Amazon SageMaker

![Foundation Model Benchmarking Tool](https://github.com/aws-samples/foundation-model-benchmarking-tool/blob/main/img/fmbt-small.png?raw=true)

A key challenge with foundation models (FMs) is benchmarking their performance in terms of inference latency, throughput and cost, so as to determine which model, running on which combination of hardware and serving stack, provides the best price-performance for a given workload.

Stated as a **business problem**, the ask is “_*What is the dollar cost per transaction for a given generative AI workload that serves a given number of users while keeping the response time under a target threshold?*_”

But to really answer this question, we need to answer an **engineering question** (an optimization problem, actually) corresponding to this business problem: “_*What is the minimum number of instances N, of the most cost-optimal instance type T, that are needed to serve a workload W while keeping the average transaction latency under L seconds?*_”

*W := {R transactions per minute, average prompt token length P, average generation token length G}*
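The engineering question can be illustrated with a small sizing calculation. The sketch below is not part of `FMBT`. It assumes you already have, for each instance type, a measured per-instance throughput and average latency for workload W. The `InstanceBenchmark` class, the `cheapest_option` function, the instance type names and all numbers are hypothetical and only show how N and the cost per transaction follow from such measurements.

```python
import math
from dataclasses import dataclass


@dataclass
class InstanceBenchmark:
    """Hypothetical benchmark result for one instance type (illustrative only)."""
    instance_type: str
    price_per_hour: float           # USD per instance-hour (made-up value)
    transactions_per_minute: float  # sustained per instance for workload W
    avg_latency_s: float            # average latency at that throughput


def cheapest_option(results, required_tpm, max_latency_s):
    """Return (instance_type, N, cost_per_transaction) with the lowest hourly cost.

    Returns None when no instance type meets the latency target.
    """
    best = None
    for r in results:
        if r.avg_latency_s > max_latency_s:
            continue  # exceeds the latency target L
        n = math.ceil(required_tpm / r.transactions_per_minute)
        hourly_cost = n * r.price_per_hour
        cost_per_txn = hourly_cost / (required_tpm * 60)
        if best is None or hourly_cost < best[3]:
            best = (r.instance_type, n, cost_per_txn, hourly_cost)
    return None if best is None else best[:3]


# Made-up measurements for three hypothetical instance types.
results = [
    InstanceBenchmark("type-a", 1.0, 30.0, 2.0),
    InstanceBenchmark("type-b", 5.0, 200.0, 1.5),
    InstanceBenchmark("type-c", 0.5, 50.0, 9.0),
]
choice = cheapest_option(results, required_tpm=100, max_latency_s=5.0)
print(choice)
```

With these made-up inputs, the instance type with the highest per-instance throughput is not the cheapest choice, which is why both N and T have to be searched together.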

This foundation model benchmarking tool (a.k.a. `FMBT`) is a tool to answer the above engineering question and thus answer the original business question about how to get the best price performance for a given workload. Here is one of the plots generated by `FMBT` to help answer the above question (_the numbers on the y-axis, transactions per minute and latency have been removed from the image below, you can find them in the actual plot generated on running `FMBT`_).

![business question](https://github.com/aws-samples/foundation-model-benchmarking-tool/blob/main/img/business_summary.png?raw=true)

## Description

`FMBT` is a Python package for running performance benchmarks for **any model** on **any supported instance type** (`g5`, `p4d`, `Inf2`). `FMBT` deploys models on Amazon SageMaker and uses the endpoint to send inference requests and measure metrics such as inference latency, error rate and transactions per second for different combinations of instance type, inference container and settings such as tensor parallelism. Because `FMBT` works for any model, it can be used to test not only _third-party models_ available on SageMaker and _open-source models_ but also _proprietary models_ trained by enterprises on their own data.

>In a production system you may choose to deploy models outside of SageMaker such as on EC2 or EKS but even in those scenarios the benchmarking results from this tool can be used as a guide for determining the optimal instance type and serving stack (inference containers, configuration).

`FMBT` can be run on any AWS platform where we can run Python, such as Amazon EC2, Amazon SageMaker, or even the AWS CloudShell. It is important to run this tool on an AWS platform so that internet round trip time does not get included in the end-to-end response time latency.

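`FMBT` follows a four-stage workflow: create a configuration file, deploy the model on SageMaker, run inference against the deployed endpoint(s), and create a benchmarking report. In more detail: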

1. Create a dataset of different prompt sizes and select one or more such datasets for running the tests.
    1. Currently `FMBT` supports datasets from [LongBench](https://github.com/THUDM/LongBench) and filters individual items from the dataset based on their size in tokens (for example, prompts of fewer than 500 tokens, between 500 and 1000 tokens, and so on). Alternatively, you can download the folder from [this link](https://huggingface.co/datasets/THUDM/LongBench/resolve/main/data.zip) to load the data.

1. Deploy **any model** that is deployable on SageMaker on **any supported instance type** (`g5`, `p4d`, `Inf2`).
    1. Models can be either available via SageMaker JumpStart (list available [here](https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html)) or not available via JumpStart but still deployable on SageMaker through the low-level boto3 (Python) SDK (Bring Your Own Script).
    1. Model deployment is completely configurable in terms of the inference container to use, environment variables to set, `setting.properties` file to provide (for inference containers such as DJL that use it) and instance type to use.

1. Benchmark FM performance in terms of inference latency, transactions per minute and dollar cost per transaction for any FM that can be deployed on SageMaker.
    1. Tests are run for each combination of dataset and configured concurrency level, i.e. the number of transactions (inference requests) sent to the endpoint in parallel. For example, run multiple datasets with prompt sizes between 3000 and 4000 tokens at concurrency levels of 1, 2, 4, 6, 8 etc. to test how many transactions of what token length the endpoint can handle while still maintaining an acceptable level of inference latency.

1. Generate a report that compares and contrasts the performance of the model over different test configurations and store the report in an Amazon S3 bucket.
    1. The report is generated in the [Markdown](https://en.wikipedia.org/wiki/Markdown) format and consists of plots, tables and text that highlight the key results and provide an overall recommendation on the best combination of instance type and serving stack to use for the model under test for a dataset of interest.
    1. The report is created as an artifact of reproducible research so that anyone having access to the model, instance type and serving stack can run the code and recreate the same results and report.

1. Use the provided [configuration files](https://github.com/aws-samples/jumpstart-models-benchmarking-test-harness/tree/main/configs) as a reference for benchmarking new models and instance types.
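The concurrency testing described in the benchmarking step can be sketched as follows. This is a minimal illustration, not `FMBT`'s actual implementation. `fake_invoke` is a hypothetical stand-in for a call to a deployed endpoint (it only sleeps), and `timed_call` and `run_level` are names invented for this example. The idea is to send the same prompts at several concurrency levels and record average latency and transactions per minute for each level.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fake_invoke(prompt: str) -> str:
    """Hypothetical stand-in for an endpoint call; sleeps to simulate inference."""
    time.sleep(0.01)
    return prompt.upper()


def timed_call(invoke, prompt):
    """Invoke once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    invoke(prompt)
    return time.perf_counter() - start


def run_level(invoke, prompts, concurrency):
    """Send all prompts with at most `concurrency` requests in flight."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda p: timed_call(invoke, p), prompts))
    elapsed = time.perf_counter() - start
    return {
        "concurrency": concurrency,
        "avg_latency_s": mean(latencies),
        "transactions_per_minute": len(prompts) / elapsed * 60,
    }


prompts = [f"prompt {i}" for i in range(8)]
summary = [run_level(fake_invoke, prompts, c) for c in (1, 2, 4)]
for row in summary:
    print(row)
```

Against a real endpoint, latency typically rises as concurrency increases, and the highest level that still meets the latency target indicates how much load a single endpoint can carry.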

## Getting started

`FMBT` is available as a Python package on [PyPI](https://pypi.org/project/fmbt) and is run as a command line tool once it is installed. All data, including metrics, reports and results, is stored in an Amazon S3 bucket.

### Prerequisites

Follow the prerequisites below to set up your environment before running the code:

1. **Python 3.11**: Set up a Python 3.11 virtual environment, activate it and install `FMBT`.
    ```{.bash}
    python -m venv .fmbt
    source .fmbt/bin/activate
    pip install fmbt==1.0.7
    ```

1. **S3 buckets for test data, scripts, and results**: Create two buckets within your AWS account:

    * _Read bucket_: This bucket contains `tokenizer files`, `prompt template`, `source data` and `deployment scripts` stored in a directory structure as shown below. `FMBT` needs to have read access to this bucket.

        ```{.bash}
        s3://<read-bucket-name>
            ├── source_data/
            ├── source_data/<source-data-file-name>.json
            ├── prompt_template/
            ├── prompt_template/prompt_template.txt
            ├── scripts/
            ├── scripts/<deployment-script-name>.py
            ├── tokenizer/
            ├── tokenizer/tokenizer.json
            ├── tokenizer/config.json
        ```

        * The details of the bucket structure are as follows:

            1. **Source Data Directory**: Create a `source_data` directory that stores the dataset you want to benchmark with. `FMBT` uses `Q&A` datasets from the [`LongBench dataset`](https://github.com/THUDM/LongBench) or alternatively from [this link](https://huggingface.co/datasets/THUDM/LongBench/resolve/main/data.zip). _Support for bring your own dataset will be added soon_.

                * Download the different files specified in the [LongBench dataset](https://github.com/THUDM/LongBench) into the `source_data` directory. The following is a good list to get started with:

                    * `2wikimqa`
                    * `hotpotqa`
                    * `narrativeqa`
                    * `triviaqa`

                    Store these files in the `source_data` directory.

            1. **Prompt Template Directory**: Create a `prompt_template` directory that contains a `prompt_template.txt` file. This `.txt` file contains the prompt template that your specific model supports. `FMBT` already supports the [prompt template](src/fmbt/prompt_template/prompt_template.txt) compatible with `Llama` models.

            1. **Scripts Directory**: `FMBT` also supports a `bring your own script (BYOS)` mode for deploying models that are not natively available via SageMaker JumpStart, i.e. anything not included in [this](https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html) list. Here are the steps to use BYOS.

                1. Create a Python script to deploy your model on a SageMaker endpoint. This script needs to have a `deploy` function that [`1_deploy_model.ipynb`](./1_deploy_model.ipynb) can invoke. See [`p4d_hf_tgi.py`](./scripts/p4d_hf_tgi.py) for reference.

                1. Place your deployment script in the `scripts` directory in your ***read bucket***. If your script deploys a model directly from HuggingFace and needs to have access to a HuggingFace auth token, then create a file called `hf_token.txt` and put the auth token in that file. The [`.gitignore`](.gitignore) file in this repo has rules to not commit the `hf_token.txt` to the repo. Today, `FMBT` provides inference scripts for:

                    * [All SageMaker Jumpstart Models](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models.html)
                    * [Text-Generation-Inference (TGI) container supported models](https://huggingface.co/text-generation-inference)
                    * [Deep Java Library DeepSpeed container supported models](https://docs.djl.ai/docs/serving/serving/docs/lmi/configurations_large_model_inference_containers.html)

                    Deployment scripts for the options above are available in the [scripts](https://github.com/aws-samples/foundation-model-benchmarking-tool/tree/s3_metrics/scripts) directory. You can use these as a reference for creating your own deployment scripts.

            1. **Tokenizer Directory**: Place the `tokenizer.json`, `config.json` and any other files required for your model's tokenizer in the `tokenizer` directory. The tokenizer for your model should be compatible with the [`tokenizers`](https://pypi.org/project/tokenizers/) package. `FMBT` uses `AutoTokenizer.from_pretrained` to load the tokenizer.
                >As an example, to use the `Llama 2 Tokenizer` for counting prompt and generation tokens for the `Llama 2` family of models: accept the license via the [meta approval form](https://ai.meta.com/resources/models-and-libraries/llama-downloads/), download the `tokenizer.json` and `config.json` files from the [Hugging Face website](https://huggingface.co/meta-llama/Llama-2-7b/tree/main) and place them in the `tokenizer` directory.

    * _Write bucket_: All prompt payloads, model endpoint information and metrics generated by `FMBT` are stored in this bucket. `FMBT` requires write permissions to store the results in this bucket. No directory structure needs to be pre-created in this bucket; everything is created by `FMBT` at runtime.

        ```{.bash}
        s3://<write-bucket-name>
            ├── <test-name>
            ├── <test-name>/data
            ├── <test-name>/data/metrics
            ├── <test-name>/data/models
            ├── <test-name>/data/prompts
        ```

### Steps to run

1. `pip install` the `FMBT` package from PyPi.

1. Create a config file using one of the config files available [here](https://github.com/aws-samples/foundation-model-benchmarking-tool/tree/main/src/fmbt/configs) as a starting point.
    1. The configuration file is a YAML file containing configuration for all steps of the benchmarking process. It is recommended to create a copy of an existing config file and tweak it as necessary to create a new one for your experiment.

1. Create the read and write buckets as mentioned in the prerequisites section. Specify the respective directories for your read and write buckets in the config file.

1. Run the `FMBT` tool from the command line.

    ```{.bash}
    # the config file path can be an S3 path, an HTTPS URL,
    # or a path to a file on the local filesystem
    fmbt --config-file /path/to/config/file
    ```

1. Depending upon the experiments in the config file, the `FMBT` run may take a few minutes to several hours. Once the run completes, you can find the report and metrics in the write S3 bucket set in the [config file](https://github.com/aws-samples/foundation-model-benchmarking-tool/blob/main/src/fmbt/configs/config-mistral-7b-tgi-g5.yml#L12). The report is generated as a markdown file called `report.md` and is available in the metrics directory in the write S3 bucket.
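The `--config-file` argument in the run step accepts an S3 path, an HTTPS URL or a local file path. The sketch below shows one way such a value could be classified before it is loaded. It is an illustration only and does not reflect how `FMBT` resolves the path internally. The function names `classify_config_path` and `split_s3_uri` and the example bucket and file names are hypothetical.

```python
from urllib.parse import urlparse


def classify_config_path(path: str) -> str:
    """Return 's3', 'url' or 'local' depending on the scheme of the path."""
    scheme = urlparse(path).scheme.lower()
    if scheme == "s3":
        return "s3"
    if scheme in ("http", "https"):
        return "url"
    return "local"  # no recognised scheme: treat as a filesystem path


def split_s3_uri(uri: str):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical locations, used only to exercise the helpers.
for candidate in (
    "s3://my-read-bucket/configs/config.yml",
    "https://example.com/configs/config.yml",
    "/path/to/config/file",
):
    print(candidate, "->", classify_config_path(candidate))
```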

## Results

Here is a screenshot of the `report.md` file generated by `FMBT`.
![Report](https://github.com/aws-samples/foundation-model-benchmarking-tool/blob/main/img/results.gif?raw=true)

## Building the `FMBT` Python package

The following steps describe how to build the `FMBT` Python package.

1. Clone the `FMBT` repo from GitHub.

1. Make any code changes as needed.

1. Install [`poetry`](https://pypi.org/project/poetry/).

```{.bash}
pip install poetry
```

1. Change directory to the `FMBT` repo directory and run `poetry build`.

```{.bash}
poetry build
```

1. The `.whl` file is generated in the `dist` folder. Install the `.whl` in your current Python environment.

```{.bash}
pip install dist/fmbt-X.Y.Z-py3-none-any.whl
```

1. Run `FMBT` as usual through the `fmbt` CLI command.

## Pending enhancements

The following enhancements are identified as next steps for `FMBT`.

1. [**Highest priority**] Convert `FMBT` to a Python package and publish on [PyPi](https://pypi.org/).

1. Containerize `FMBT` and provide instructions for running the container on EC2.

1. Add code to determine the cost of running an entire experiment and include it in the final report. This would only include the cost of running the SageMaker endpoints based on hourly public pricing (the cost of running this code on a notebook or an EC2 instance is trivial in comparison and can be ignored).

1. Support for a custom token counter. Currently only the Llama tokenizer is supported but we want to allow users to bring their own token counting logic for different models.

1. Support for different payload formats that might be needed for different inference containers. Currently the HF TGI container and the DJL DeepSpeed container on SageMaker both use the same format, but in future other containers might need a different payload format.

1. Emit live metrics so that they can be monitored through Grafana via a live dashboard.

1. Allow users to publish their experiment configs and results by doing a POST to an AWS Lambda that writes results to a common S3 bucket that can serve as storage for a simple website.

1. Create a leaderboard of model benchmarks.

View the [ISSUES](https://github.com/aws-samples/foundation-model-benchmarking-tool/issues) on GitHub and add any that you think would be a beneficial iteration to this benchmarking harness.
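To make the custom token counter item in the list above more concrete, here is one possible shape for a pluggable counter, together with the kind of token-length bucketing described in the workflow. This is a hypothetical sketch and not an existing `FMBT` interface. `TokenCounter`, `WhitespaceTokenCounter` and `bucket_by_length` are names invented for this example, and the whitespace counter is a toy substitute for a real model tokenizer.

```python
from typing import Protocol


class TokenCounter(Protocol):
    """Hypothetical interface: anything with a count(text) -> int method."""

    def count(self, text: str) -> int: ...


class WhitespaceTokenCounter:
    """Toy counter: one token per whitespace-separated word."""

    def count(self, text: str) -> int:
        return len(text.split())


def bucket_by_length(prompts, counter, edges):
    """Group prompts into [lo, hi) token-length ranges defined by `edges`."""
    buckets = {(lo, hi): [] for lo, hi in zip(edges, edges[1:])}
    for prompt in prompts:
        n = counter.count(prompt)
        for (lo, hi), items in buckets.items():
            if lo <= n < hi:
                items.append(prompt)
                break  # each prompt belongs to at most one range
    return buckets


prompts = ["a b c", "a b c d e f", "a"]
buckets = bucket_by_length(prompts, WhitespaceTokenCounter(), edges=[0, 2, 5, 10])
print({k: len(v) for k, v in buckets.items()})
```

Because `bucket_by_length` depends only on the `count` method, a counter backed by a model-specific tokenizer could be swapped in without changing the bucketing logic.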

## Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License

This library is licensed under the MIT-0 License. See the [LICENSE](./LICENSE) file.

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "fmbt",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.11,<4.0",
    "maintainer_email": "",
    "keywords": "",
    "author": "Amit Arora",
    "author_email": "aroraai@amazon.com",
    "download_url": "https://files.pythonhosted.org/packages/eb/5c/eeb31461fc072a3fe5a2aa0999b0ddef3b66929bb98a80da0342dbee3426/fmbt-1.0.7.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Benchmark performance of **any model** on **any supported instance type** on Amazon SageMaker.",
    "version": "1.0.7",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f4a22cdbf8764e067ecf5a5a175e4d97c505d94af4a24a254082a0e046a8d4c7",
                "md5": "6ad198b351433ad290de0fa629399b69",
                "sha256": "b6cef418fc4b7767ca110affff2b7468026848ea3cfeaa93212dba5cbee9fab8"
            },
            "downloads": -1,
            "filename": "fmbt-1.0.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6ad198b351433ad290de0fa629399b69",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11,<4.0",
            "size": 59451,
            "upload_time": "2024-02-12T03:57:34",
            "upload_time_iso_8601": "2024-02-12T03:57:34.477641Z",
            "url": "https://files.pythonhosted.org/packages/f4/a2/2cdbf8764e067ecf5a5a175e4d97c505d94af4a24a254082a0e046a8d4c7/fmbt-1.0.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "eb5ceeb31461fc072a3fe5a2aa0999b0ddef3b66929bb98a80da0342dbee3426",
                "md5": "2d405fe858a24f0c3438588cd453304a",
                "sha256": "d004d4ddef507c379140f43fa787bd9746465185c2a487cf53777d53930b8ca4"
            },
            "downloads": -1,
            "filename": "fmbt-1.0.7.tar.gz",
            "has_sig": false,
            "md5_digest": "2d405fe858a24f0c3438588cd453304a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11,<4.0",
            "size": 49551,
            "upload_time": "2024-02-12T03:57:36",
            "upload_time_iso_8601": "2024-02-12T03:57:36.328465Z",
            "url": "https://files.pythonhosted.org/packages/eb/5c/eeb31461fc072a3fe5a2aa0999b0ddef3b66929bb98a80da0342dbee3426/fmbt-1.0.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-12 03:57:36",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "fmbt"
}

Amit Arora