bigcodebench

Name: bigcodebench
Version: 0.1.7
Summary: Evaluation package for BigCodeBench
Home page: https://github.com/bigcode-project/bigcodebench
Upload time: 2024-06-27 23:39:47
Requires Python: >=3.8
License: Apache-2.0
# BigCodeBench
<center>
<img src="https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/asset/bigcodebench_banner.svg?raw=true" alt="BigCodeBench">
</center>

<p align="center">
    <a href="https://pypi.org/project/bigcodebench/"><img src="https://img.shields.io/pypi/v/bigcodebench?color=g"></a>
    <a href="https://hub.docker.com/r/bigcodebench/bigcodebench-evaluate" title="Docker-Eval"><img src="https://img.shields.io/docker/image-size/bigcodebench/bigcodebench-evaluate"></a>
    <a href="https://hub.docker.com/r/bigcodebench/bigcodebench-generate" title="Docker-Gen"><img src="https://img.shields.io/docker/image-size/bigcodebench/bigcodebench-generate"></a>
    <a href="https://github.com/bigcodebench/bigcodebench/blob/master/LICENSE"><img src="https://img.shields.io/pypi/l/bigcodebench"></a>
</p>

<p align="center">
    <a href="#-about">🌸About</a> β€’
    <a href="#-quick-start">πŸ”₯Quick Start</a> β€’
    <a href="#-llm-generated-code">πŸ’»LLM code</a> β€’
    <a href="#-failure-inspection">πŸ”Failure inspection</a> β€’
    <a href="#-full-script">πŸš€Full Script</a> β€’
    <a href="#-result-analysis">πŸ“ŠResult Analysis</a> β€’
    <a href="#-known-issues">🐞Known issues</a> β€’
    <a href="#-citation">πŸ“œCitation</a> β€’
    <a href="#-acknowledgement">πŸ™Acknowledgement</a>
</p>

## 🌸 About

### BigCodeBench

BigCodeBench is an **_easy-to-use_** benchmark for code generation with **_practical_** and **_challenging_** programming tasks. It aims to evaluate the true programming capabilities of large language models (LLMs) in a more realistic setting. The benchmark is designed for HumanEval-like function-level code generation tasks, but with much more complex instructions and diverse function calls.
To facilitate the evaluation of LLMs on BigCodeBench, we provide this Python package `bigcodebench` that includes the dataset, generation scripts, and evaluation scripts. The package is built on top of the [EvalPlus](https://github.com/evalplus/evalplus) framework, which is a flexible and extensible evaluation framework for code generation tasks.

### Why BigCodeBench?

BigCodeBench focuses on evaluating LLM4Code with *diverse function calls* and *complex instructions*, offering:

* ✨ **Precise evaluation & ranking**: See [our leaderboard](https://huggingface.co/spaces/bigcode/bigcodebench-leaderboard) for the latest LLM rankings before & after rigorous evaluation.
* ✨ **Pre-generated samples**: BigCodeBench accelerates code intelligence research by open-sourcing [LLM-generated samples](#-LLM-generated-code) for various models -- no need to re-run the expensive benchmarks!

### Main Differences from EvalPlus

We inherit the design of the EvalPlus framework, a flexible and extensible evaluation framework for code generation tasks. However, BigCodeBench differs in the following ways:
* Execution Environment: The execution environment in BigCodeBench is less restricted than in EvalPlus, in order to support tasks with diverse library dependencies.
* Test Evaluation: BigCodeBench relies on `unittest` for evaluating the generated code, which is better suited to the BigCodeBench test harness.

## 🔥 Quick Start

> [!Tip]
>
> BigCodeBench ❀️ [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness)!
> BigCodeBench will be integrated into bigcode-evaluation-harness, so you can also run it there!

To get started, please first set up the environment:

```bash
# Install to use bigcodebench.evaluate
pip install bigcodebench --upgrade
# If you want to run the evaluation locally, you need to install the requirements
pip install -I -r https://raw.githubusercontent.com/bigcode-project/bigcodebench/main/Requirements/requirements-eval.txt

# Install to use bigcodebench.generate
# We strongly recommend installing the generation dependencies in a separate environment
pip install bigcodebench[generate] --upgrade
```
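For example, a minimal sketch of such a separate environment using `venv` (the environment name is just an illustration):

```bash
# Hypothetical setup: keep the generation dependencies in their own virtual environment
python -m venv bcb-generate        # environment name is arbitrary
source bcb-generate/bin/activate
pip install "bigcodebench[generate]" --upgrade
```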

<details><summary>⏬ Install nightly version <i>:: click to expand ::</i></summary>
<div>

```bash
# Install to use bigcodebench.evaluate
pip install "git+https://github.com/bigcode-project/bigcodebench.git" --upgrade
```

</div>
</details>

<details><summary>⏬ Using BigCodeBench as a local repo? <i>:: click to expand ::</i></summary>
<div>

```bash
git clone https://github.com/bigcode-project/bigcodebench.git
cd bigcodebench
export PYTHONPATH=$PYTHONPATH:$(pwd)
# Install to use bigcodebench.evaluate
pip install -e .
# Install to use bigcodebench.generate
pip install -e .[generate]
```

</div>
</details>

### Code Generation

We suggest using `flash-attn` for generating code samples.
```bash
pip install -U flash-attn
```

To generate code samples from a model, you can use the following command:
```bash
bigcodebench.generate \
    --model [model_name] \
    --subset [complete|instruct] \
    --greedy \
    --bs [bs] \
    --temperature [temp] \
    --n_samples [n_samples] \
    --resume \
    --backend [vllm|hf|openai|mistral|anthropic|google] \
    --tp [gpu_number] \
    [--trust_remote_code] \
    [--base_url [base_url]]
```
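For instance, a filled-in invocation for greedy, single-sample generation might look like the following; the model name, backend, and GPU count are illustrative placeholders rather than recommendations:

```bash
# Hypothetical example: greedy decoding on the instruct subset with vLLM on one GPU
bigcodebench.generate \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --subset instruct \
    --greedy \
    --bs 1 \
    --temperature 0.0 \
    --n_samples 1 \
    --resume \
    --backend vllm \
    --tp 1
```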
The generated code samples will be stored in a file named `[model_name]--bigcodebench-[instruct|complete]--[backend]-[temp]-[n_samples].jsonl`. Alternatively, you can use the following command to utilize our pre-built docker images for generating code samples:
```bash
# If you are using GPUs
docker run --gpus '"device=$CUDA_VISIBLE_DEVICES"' -v $(pwd):/app -t bigcodebench/bigcodebench-generate:latest \
    --model [model_name] \
    --subset [complete|instruct] \
    [--greedy] \
    --bs [bs] \
    --temperature [temp] \
    --n_samples [n_samples] \
    --resume \
    --backend [vllm|hf|openai|mistral|anthropic|google] \
    --tp [gpu_number]

# ...Or if you are using CPUs
docker run -v $(pwd):/app -t bigcodebench/bigcodebench-generate:latest \
    --model [model_name] \
    --subset [complete|instruct] \
    [--greedy] \
    --bs [bs] \
    --temperature [temp] \
    --n_samples [n_samples] \
    --resume \
    --backend [vllm|hf|openai|mistral|anthropic|google]
```
```bash
# If you wish to use gated or private HuggingFace models and datasets
docker run -e HUGGING_FACE_HUB_TOKEN=$token -v $(pwd):/app -t bigcodebench/bigcodebench-generate:latest # omit other arguments

# Similarly, to use other backends that require authentication
docker run -e OPENAI_API_KEY=$OPENAI_API_KEY -v $(pwd):/app -t bigcodebench/bigcodebench-generate:latest # omit other arguments
docker run -e GOOGLE_API_KEY=$GOOGLE_API_KEY -v $(pwd):/app -t bigcodebench/bigcodebench-generate:latest # omit other arguments
docker run -e ANTHROPIC_KEY=$ANTHROPIC_KEY -v $(pwd):/app -t bigcodebench/bigcodebench-generate:latest # omit other arguments
```
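Putting the pieces together, a complete docker-based generation command for a gated Hugging Face model could look like the following sketch (model name and device index are placeholders):

```bash
# Hypothetical end-to-end example: docker generation with a gated Hugging Face model on GPU 0
docker run --gpus '"device=0"' \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    -v $(pwd):/app -t bigcodebench/bigcodebench-generate:latest \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --subset complete \
    --greedy \
    --bs 1 \
    --temperature 0.0 \
    --n_samples 1 \
    --resume \
    --backend vllm \
    --tp 1
```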
You can then run the built container as shown above.
<details><summary>🤔 Structure of `problem`? <i>:: click to expand ::</i></summary>
<div>

* `task_id` is the identifier string for the task
* `entry_point` is the name of the function
* `complete_prompt` is the prompt for BigCodeBench-Complete
* `instruct_prompt` is the prompt for BigCodeBench-Instruct
* `canonical_solution` is the ground-truth implementation
* `test` is the `unittest.TestCase` class

</div>
</details>

> [!Note]
>
> **Expected Schema of `[model_name]--bigcodebench-[task]--[backend]-[temp]-[n_samples].jsonl`**
>
> 1. `task_id`: Task ID; the task IDs are the keys of `get_bigcodebench()`
> 2. `solution` (optional): Self-contained solution (usually including the prompt)
>    * Example: `{"task_id": "BigCodeBench/?", "solution": "def f():\n    return 1"}`
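As a quick, optional sanity check against this schema, you can peek at the generated JSONL, for example with `jq` (the file name below is a placeholder):

```bash
# Hypothetical sanity check (requires jq): confirm each record parses and print the first few task IDs
jq -r '.task_id' samples.jsonl | head -n 3
```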

### Code Post-processing

LLM-generated text may not be compilable code, since it can include natural-language lines or incomplete extra code.
We provide a tool named `bigcodebench.sanitize` to clean up the code:

```bash
# 💡 If you want to get the calibrated results:
bigcodebench.sanitize --samples samples.jsonl --calibrate
# Sanitized code will be produced to `samples-sanitized-calibrated.jsonl`

# 💡 If you want to get the original results:
bigcodebench.sanitize --samples samples.jsonl
# Sanitized code will be produced to `samples-sanitized.jsonl`

# 💡 If you are storing code in directories:
bigcodebench.sanitize --samples /path/to/vicuna-[??]b_temp_[??]
# Sanitized code will be produced to `/path/to/vicuna-[??]b_temp_[??]-sanitized`
```

If you want to use the pre-built docker images for post-processing, you can use the following command:

```bash
# Change the entrypoint to bigcodebench.sanitize in any pre-built docker image, like bigcodebench/bigcodebench-evaluate:latest
docker run -it --entrypoint bigcodebench.sanitize -v $(pwd):/app bigcodebench/bigcodebench-evaluate:latest --samples samples.jsonl
```

<details><summary>🔎 Checking the compatibility of post-processed code <i>:: click to expand ::</i></summary>
<div>

To double-check the post-processing results, you can use `bigcodebench.syncheck` to check the code validity before and after sanitization, which will print erroneous code snippets and why they are wrong:

```bash
# 💡 If you are storing code in jsonl:
bigcodebench.syncheck --samples samples.jsonl

# 💡 If you are storing code in directories:
bigcodebench.syncheck --samples /path/to/vicuna-[??]b_temp_[??]

# 💡 Or change the entrypoint to bigcodebench.syncheck in any pre-built docker image, e.g. bigcodebench/bigcodebench-evaluate:latest:
docker run -it --entrypoint bigcodebench.syncheck -v $(pwd):/app bigcodebench/bigcodebench-evaluate:latest --samples samples.jsonl
```

</div>
</details>


### Code Evaluation

We strongly recommend using a sandbox such as [Docker](https://docs.docker.com/get-docker/):

```bash
# Mount the current directory to the container
# If you want to change the RAM address space limit (in MB, 128 GB by default): `--max-as-limit XXX`
# If you want to change the RAM data segment limit (in MB, 4 GB by default): `--max-data-limit XXX`
# If you want to change the RAM stack limit (in MB, 4 MB by default): `--max-stack-limit XXX`
docker run -v $(pwd):/app bigcodebench/bigcodebench-evaluate:latest --subset [complete|instruct] --samples samples-sanitized-calibrated.jsonl

# If you only want to check the ground truths
docker run -v $(pwd):/app bigcodebench/bigcodebench-evaluate:latest --subset [complete|instruct] --samples samples-sanitized-calibrated.jsonl --check-gt-only
```
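For example, to raise the address-space limit to 256 GB (these flags take values in MB; the number is only an illustration):

```bash
# Hypothetical example: evaluate with a 256 GB address-space limit (values are in MB)
docker run -v $(pwd):/app bigcodebench/bigcodebench-evaluate:latest \
    --subset complete \
    --samples samples-sanitized-calibrated.jsonl \
    --max-as-limit 262144
```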

...Or if you want to try it locally regardless of the risks ⚠️:

First, install the dependencies for BigCodeBench:

```bash
pip install -r https://raw.githubusercontent.com/bigcode-project/bigcodebench/main/Requirements/requirements-eval.txt
```

Then, run the evaluation:

```bash
# ...Or locally ⚠️
bigcodebench.evaluate --subset [complete|instruct] --samples samples-sanitized-calibrated.jsonl
# ...If you really don't want to check the ground truths
bigcodebench.evaluate --subset [complete|instruct] --samples samples-sanitized-calibrated.jsonl --no-gt
```

> [!Tip]
>
> Do you use a very slow machine?
>
> LLM solutions are regarded as **failed** on timeout (and OOM etc.).
> Specifically, we set the dynamic timeout based on the ground-truth solution's runtime.
>
> Additionally, you are **NOT** encouraged to over-stress your test bed while running the evaluation.
> For example, using `--parallel 64` on a 4-core machine, or doing other heavy work during evaluation, are bad ideas...

<details><summary>⌨️ More command-line flags <i>:: click to expand ::</i></summary>
<div>

* `--parallel`: by default half of the cores

</div>
</details>

The output should look like the following (below is a GPT-4 greedy-decoding example):

```
Asserting the groundtruth...
Expected outputs computed in 1200.0 seconds
Reading samples...
1140it [00:00, 1901.64it/s]
Evaluating samples...
100%|██████████████████████████████████████████| 1140/1140 [19:53<00:00, 6.75it/s]
BigCodeBench-Instruct-calibrated
Groundtruth pass rate: 1.000
pass@1: 0.568
```

- The "k" includes `[1, 5, 10]` where k values `<=` the sample size will be used
- A cache file named like `samples_eval_results.json` will be cached. Remove it to re-run the evaluation
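For reference, the reported pass@k presumably follows the standard unbiased estimator used by HumanEval and EvalPlus, where $n$ is the number of samples per task and $c$ the number that pass all tests:

$$
\text{pass@}k \;=\; \mathbb{E}_{\text{tasks}}\!\left[\,1 - \frac{\binom{n-c}{k}}{\binom{n}{k}}\,\right]
$$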

<details><summary>🤔 How long would it take? <i>:: click to expand ::</i></summary>
<div>

If you do greedy decoding, where there is only one sample per task, the evaluation should take just a few minutes on an Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz (2 sockets, 18 cores per socket). However, if you have multiple samples per task, the evaluation will take longer.
Here are some tips to speed up the evaluation:

* Use `--parallel $(nproc)`
* Use our pre-evaluated results (see [LLM-generated code](#-LLM-generated-code))

</div>
</details>

## 🔍 Failure Inspection

You can inspect the failed samples by using the following command:

```bash
bigcodebench.inspect --eval-results sample-sanitized-calibrated_eval_results.json --in-place
```

## 🚀 Full Script

We provide a sample script to run the full pipeline:

```bash
bash run.sh
```
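The exact contents of `run.sh` live in the repository; as a rough illustration, such a pipeline might chain the earlier steps like this sketch (model, backend, and file names are placeholders):

```bash
# Hypothetical pipeline sketch: generate -> sanitize (calibrated) -> evaluate
MODEL=meta-llama/Meta-Llama-3-8B-Instruct   # placeholder model name
SUBSET=instruct

bigcodebench.generate --model $MODEL --subset $SUBSET --greedy --bs 1 \
    --temperature 0.0 --n_samples 1 --resume --backend vllm --tp 1

# The generated file follows the naming scheme described above; adjust the name accordingly
SAMPLES=samples.jsonl
bigcodebench.sanitize --samples $SAMPLES --calibrate
bigcodebench.evaluate --subset $SUBSET --samples samples-sanitized-calibrated.jsonl
```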

## 📊 Result Analysis

We provide a script to replicate analyses such as Elo Rating and Task Solve Rate, which help you further understand model performance.

To run the analysis, you need to put all the `samples_eval_results.json` files in a `results` folder, which is in the same directory as the script:

```bash
cd analysis
python get_results.py
```
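For instance, a sketch of collecting the result files into that layout (file locations are placeholders):

```bash
# Hypothetical: copy all evaluation result files into analysis/results, then run the analysis
mkdir -p analysis/results
cp ./*_eval_results.json analysis/results/
cd analysis
python get_results.py
```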

## 💻 LLM-generated Code

We share pre-generated code samples from LLMs we have [evaluated](https://huggingface.co/spaces/bigcode/bigcodebench-leaderboard):
*  See the attachment of our [v0.1.5](https://github.com/bigcode-project/bigcodebench/releases/tag/v0.1.5). We include both `sanitized_samples.zip` and `sanitized_samples_calibrated.zip` for your convenience.

## 🐞 Known Issues

- [ ] Due to flakiness in the evaluation, the execution results may vary slightly (~0.2%) between runs. We are working on improving the evaluation stability.

- [ ] You may get errors like `ImportError: /usr/local/lib/python3.10/site-packages/matplotlib/_c_internal_utils.cpython-310-x86_64-linux-gnu.so: failed to map segment from shared object` when running the evaluation. This is caused by the memory limit of the Docker container; increasing the container's memory limit should resolve the issue.

- [ ] We are aware that some users need a proxy to access the internet. We are working on a subset of the tasks that do not require internet access to evaluate the code.

## 📜 Citation

```bibtex
@article{zhuo2024bigcodebench,
    title={BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions}, 
    author={Terry Yue Zhuo and Minh Chien Vu and Jenny Chim and Han Hu and Wenhao Yu and Ratnadira Widyasari and Imam Nur Bani Yusuf and Haolan Zhan and Junda He and Indraneil Paul and Simon Brunner and Chen Gong and Thong Hoang and Armel Randy Zebaze and Xiaoheng Hong and Wen-Ding Li and Jean Kaddour and Ming Xu and Zhihan Zhang and Prateek Yadav and Naman Jain and Alex Gu and Zhoujun Cheng and Jiawei Liu and Qian Liu and Zijian Wang and David Lo and Binyuan Hui and Niklas Muennighoff and Daniel Fried and Xiaoning Du and Harm de Vries and Leandro Von Werra},
    journal={arXiv preprint arXiv:2406.15877},
    year={2024}
}
```

## 🙏 Acknowledgement

- [EvalPlus](https://github.com/evalplus/evalplus)

            
