benchllama


- **Name**: benchllama
- **Version**: 0.2.7
- **Home page**: https://github.com/srikanth235/benchllama/
- **Summary**: Benchmark your local LLMs.
- **Upload time**: 2024-02-22 11:33:19
- **Author**: Srikanth Srungarapu
- **Requires Python**: >=3.9,<4.0
- **License**: MIT
- **Keywords**: benchmark, llama, ollama, llms, openai, chatgpt, text-generation

            <div align="center">
  <h1><b>🧮 Benchllama</b></h1>
  <p>
    <strong>An open-source tool to benchmark your local LLMs.</strong>
  </p>
  <img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"/>
  <img src="https://img.shields.io/pypi/v/benchllama" alt="PyPI"/>
  <img src="https://img.shields.io/pypi/pyversions/benchllama.svg" alt="Supported Versions"/>
  <img src="https://img.shields.io/pypi/dm/benchllama.svg" alt="GitHub: Downloads"/>
  <a href="https://discord.gg/wykDxGyUHA"  style="text-decoration: none; outline: none">
  <img src="https://dcbadge.vercel.app/api/server/vAcVQ7XhR2?style=flat&compact=true" alt="Discord"/>
  </a>
  <p align="center">
    <img src="https://raw.githubusercontent.com/srikanth235/benchllama/master/media/benchllama.gif" width="760"/>
  </p>
</div>

# 🔑 Key points

Benchllama benchmarks your local LLMs. Currently, it <u>only supports benchmarking models served via Ollama</u>. By default, it pulls the [`bigcode/humanevalpack`](https://huggingface.co/datasets/bigcode/humanevalpack) dataset from Hugging Face. It has out-of-the-box support for evaluating code autocompletion models (pass the `--eval` flag to trigger this), currently for the following languages: Python, JavaScript, Java, Go, and C++. You can also bring your own dataset by passing its path to the `--dataset` flag (see this [example](https://github.com/srikanth235/benchllama/tree/master/examples) for help creating one).

# 📜 Background

With the explosion of open-source LLMs and of tools for customizing them, such as [Modelfiles](https://github.com/ollama/ollama/blob/main/docs/modelfile.md), [Mergekit](https://github.com/arcee-ai/mergekit), and [LoRA](https://github.com/microsoft/LoRA), it can be daunting for end users to choose the right LLM. In our experience running local LLMs, the two metrics that matter most are performance and response quality. We created a simple CLI tool that helps users pick the right LLM by evaluating models on both.
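
As an illustration of the performance side, here is a minimal sketch of how generation throughput can be derived from an Ollama `/api/generate` response, which reports `eval_count` (tokens generated) and `eval_duration` (in nanoseconds). The helper name `tokens_per_second` and the sample values are hypothetical, and this is not necessarily how Benchllama itself measures performance.

```python
def tokens_per_second(response: dict) -> float:
    """Generation throughput from an Ollama /api/generate response body.

    Hypothetical helper for illustration; not part of Benchllama.
    """
    eval_count = response["eval_count"]  # number of tokens generated
    eval_duration_ns = response["eval_duration"]  # generation time in nanoseconds
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    # Convert nanoseconds to seconds while dividing.
    return eval_count / eval_duration_ns * 1e9


# Illustrative values only, not a real measurement.
sample = {"eval_count": 290, "eval_duration": 4_709_213_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")
```

Comparing this number across models on the same prompts and hardware gives a rough speed ranking to weigh against response quality.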

Given our experience with coding LLMs, we felt it would be useful to add out-of-the-box support for calculating `pass@k` for autocompletion models. If you are into coding LLMs, please check out our related project, **Privy** ([github repo](https://github.com/srikanth235/privy), [vscode link](https://marketplace.visualstudio.com/items?itemName=privy.privy-vscode), [openvsx link](https://open-vsx.org/extension/Privy/privy-vscode)).
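
For readers unfamiliar with `pass@k`, the sketch below shows the unbiased estimator commonly used with HumanEval-style benchmarks: for a task with `n` generated completions of which `c` pass the tests, `pass@k = 1 - C(n - c, k) / C(n, k)`. Whether Benchllama uses exactly this estimator is an assumption, and the function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: 1 - C(n - c, k) / C(n, k).

    n: completions generated, c: completions that passed, k: sample budget.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # comb(n - c, k) is 0 when fewer than k completions failed,
    # so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, given one (n, c) pair per task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Three completions for a task, one of which passes.
print(pass_at_k(n=3, c=1, k=1), pass_at_k(n=3, c=1, k=2))
```

The estimator is only defined for `k <= n`, which is why the values passed to `--k` should not exceed `--num-completions`.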

# ✨ Features

- **Evaluate**: Evaluate the performance of your models on various tasks, such as code generation.
- **Clean**: Clean up temporary files generated by Benchllama.

# 🚀 Installation

```console
$ pip install benchllama
```

# ⚙️ Usage

```console
$ benchllama [OPTIONS] COMMAND [ARGS]...
```

**Options**:

- `--install-completion`: Install completion for the current shell.
- `--show-completion`: Show completion for the current shell, to copy it or customize the installation.
- `--help`: Show this message and exit.

**Commands**:

- `evaluate`
- `clean`

## `benchllama evaluate`

**Usage**:

```console
$ benchllama evaluate [OPTIONS]
```

**Options**:

- `--models TEXT`: Names of models that need to be evaluated. [required]
- `--provider-url TEXT`: The endpoint of the model provider. [default: http://localhost:11434]
- `--dataset FILE`: By default, bigcode/humanevalpack from Hugging Face will be used. If you want to use your own dataset, specify the path here.
- `--languages [python|js|java|go|cpp]`: List of languages to evaluate from bigcode/humanevalpack. Ignore this if you are bringing your own data. [default: Language.python]
- `--num-completions INTEGER`: Number of completions to be generated for each task. [default: 3]
- `--no-eval / --eval`: Whether to evaluate the generated completions. [default: no-eval]
- `--k INTEGER`: The k values for calculating pass@k. They shouldn't exceed the value of `--num-completions`. [default: 1, 2]
- `--samples INTEGER`: Number of dataset samples to evaluate. By default, all the samples get processed. [default: -1]
- `--output PATH`: Output directory [default: /tmp]
- `--help`: Show this message and exit.
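
The options above can be combined into a single invocation. The following is a minimal sketch of a helper that assembles such a command line and checks the `k` constraint up front. It assumes that multi-valued options (`--models`, `--languages`, `--k`) are passed by repeating the flag; the helper `build_evaluate_command` and the model name `codellama:7b` are hypothetical.

```python
import shlex


def build_evaluate_command(models, languages=("python",), num_completions=3,
                           ks=(1, 2), evaluate=False, samples=-1, output="/tmp"):
    """Assemble argv for `benchllama evaluate` (hypothetical helper)."""
    if not models:
        raise ValueError("--models is required")
    if any(k > num_completions for k in ks):
        raise ValueError("each k must not exceed num_completions")
    argv = ["benchllama", "evaluate"]
    # Assumption: multi-valued options are passed by repeating the flag.
    for model in models:
        argv += ["--models", model]
    for language in languages:
        argv += ["--languages", language]
    argv += ["--num-completions", str(num_completions)]
    for k in ks:
        argv += ["--k", str(k)]
    argv.append("--eval" if evaluate else "--no-eval")
    if samples >= 0:
        # Omit the flag to keep the default of processing every sample.
        argv += ["--samples", str(samples)]
    argv += ["--output", output]
    return argv


cmd = build_evaluate_command(["codellama:7b"], languages=["python", "js"], evaluate=True)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the model is available in your local Ollama instance.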

## `benchllama clean`

**Usage**:

```console
$ benchllama clean [OPTIONS]
```

**Options**:

- `--run-id TEXT`: Run id
- `--output PATH`: Output directory [default: /tmp]
- `--help`: Show this message and exit.

            
