codebleu

Name	codebleu
Version	0.7.0
home_page	None
Summary	Unofficial CodeBLEU implementation that supports Linux, MacOS and Windows available on PyPI.
upload_time	2024-05-30 10:32:09
maintainer	None
docs_url	None
author	None
requires_python	>=3.9
license	MIT License
keywords	codebleu code bleu nlp natural language processing programming evaluate evaluation code generation metrics
requirements	No requirements were recorded.

# CodeBLEU
[![Publish](https://github.com/k4black/codebleu/actions/workflows/publish.yml/badge.svg)](https://github.com/k4black/codebleu/actions/workflows/publish.yml)
[![Test](https://github.com/k4black/codebleu/actions/workflows/test.yml/badge.svg?event=push)](https://github.com/k4black/codebleu/actions/workflows/test.yml)
[![codecov](https://codecov.io/gh/k4black/codebleu/branch/main/graph/badge.svg?token=60BIFPWRCE)](https://codecov.io/gh/k4black/codebleu)
[![PyPI version](https://badge.fury.io/py/codebleu.svg)](https://badge.fury.io/py/codebleu)


This repository contains an unofficial `CodeBLEU` implementation that supports `Linux`, `MacOS` (incl. M-series) and `Windows`. It is available through `PyPI` and the `evaluate` library.

Available for: `Python`, `C`, `C#`, `C++`, `Java`, `JavaScript`, `PHP`, `Go`, `Ruby`, `Rust`.

---

The code is based on the original [CodeXGLUE/CodeBLEU](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/evaluator/CodeBLEU) and the updated version by [XLCoST/CodeBLEU](https://github.com/reddy-lab-code-research/XLCoST/tree/main/code/translation/evaluator/CodeBLEU). It has been refactored, tested, and built for macOS and Windows, with multiple improvements made to enhance usability.

## Metric Description

> An ideal evaluation metric should consider the grammatical correctness and the logic correctness.
> We propose weighted n-gram match and syntactic AST match to measure grammatical correctness, and introduce semantic data-flow match to calculate logic correctness.
> ![CodeBLEU](CodeBLEU.jpg)  
[from [CodeXGLUE](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/evaluator/CodeBLEU) repo]

In a nutshell, `CodeBLEU` is a weighted combination of `n-gram match (BLEU)`, `weighted n-gram match (BLEU-weighted)`, `AST match` and `data-flow match` scores.

The metric has shown higher correlation with human evaluation than `BLEU` and `accuracy` metrics.
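
As a rough illustration of the weighted combination described above, the sketch below sums four component scores using the default weights. The helper name `combine_scores` is hypothetical and not part of the `codebleu` package; the input values are the component scores printed in the usage example of this README.

```python
def combine_scores(ngram, weighted_ngram, syntax, dataflow,
                   weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four component scores (illustrative helper only)."""
    components = (ngram, weighted_ngram, syntax, dataflow)
    return sum(w * s for w, s in zip(weights, components))


# Component scores taken from the usage example in this README.
score = combine_scores(0.1041, 0.1109, 1.0, 1.0)
print(score)  # approximately 0.55375
```

Changing `weights` shifts the emphasis between surface-level n-gram overlap and the structural (AST and data-flow) components.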


## Installation

This library depends on tree-sitter grammars compiled to native `so` files, so it is platform dependent.  
Currently available for `Linux` (manylinux), `MacOS` and `Windows` with Python 3.9+.

The metric is available as a [pip package](https://pypi.org/project/codebleu/) and can be installed as follows:
```bash
pip install codebleu
```
or directly from the git repo (requires an internet connection to download tree-sitter):
```bash
pip install git+https://github.com/k4black/codebleu.git
```

You also have to install the tree-sitter grammar for each language you need (e.g. python, rust, etc.):
```bash
pip install tree-sitter-python
```
Or you can install all languages:
```bash
pip install codebleu[all]
```

Note: At the moment (May 2024), precompiled languages are NOT available for arm64 (M1) MacOS, so you have to install and build the tree-sitter languages manually, for example:
```bash
pip install git+https://github.com/tree-sitter/tree-sitter-python.git
```
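
To see which grammar packages are already importable before running the metric, something like the following may help. It is a minimal sketch that assumes each grammar is exposed as a module named `tree_sitter_<lang>` (as `tree-sitter-python` does with `tree_sitter_python`); the helper `missing_grammars` is hypothetical and not part of `codebleu`.

```python
import importlib.util


def missing_grammars(langs):
    """Return the assumed grammar module names that cannot be imported."""
    missing = []
    for lang in langs:
        # Assumption: the grammar for `lang` installs a module tree_sitter_<lang>.
        module_name = f"tree_sitter_{lang}"
        if importlib.util.find_spec(module_name) is None:
            missing.append(module_name)
    return missing


# Lists whichever of these grammars are not installed in the current environment.
print(missing_grammars(["python", "rust"]))
```

An empty list means every requested grammar module was found; any names returned point to the packages that still need to be installed.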


## Usage 

```python
from codebleu import calc_codebleu

prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

result = calc_codebleu([reference], [prediction], lang="python", weights=(0.25, 0.25, 0.25, 0.25), tokenizer=None)
print(result)
# {
#   'codebleu': 0.5537, 
#   'ngram_match_score': 0.1041, 
#   'weighted_ngram_match_score': 0.1109, 
#   'syntax_match_score': 1.0, 
#   'dataflow_match_score': 1.0
# }
```
where `calc_codebleu` takes the following arguments:
- `references` (`list[str]` or `list[list[str]]`): reference code
- `predictions` (`list[str]`): predicted code
- `lang` (`str`): code language, see `codebleu.AVAILABLE_LANGS` for available languages (python, c_sharp, c, cpp, javascript, java, php, go, ruby and rust at the moment)
- `weights` (`tuple[float,float,float,float]`): weights of the `ngram_match`, `weighted_ngram_match`, `syntax_match`, and `dataflow_match` respectively, defaults to `(0.25, 0.25, 0.25, 0.25)`
- `tokenizer` (`callable`): function that splits a code string into tokens, defaults to `s.split()`

and returns a `dict[str, float]` with the following fields:
- `codebleu`: the final `CodeBLEU` score
- `ngram_match_score`: `ngram_match` score (BLEU)
- `weighted_ngram_match_score`: `weighted_ngram_match` score (BLEU-weighted)
- `syntax_match_score`: `syntax_match` score (AST match)
- `dataflow_match_score`: `dataflow_match` score
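
Because the default tokenizer is a plain whitespace split, code that is not pre-spaced (e.g. `add(a, b)`) ends up with tokens such as `add(a,`. The sketch below shows one possible custom tokenizer; `simple_code_tokenizer` is a hypothetical name, and the regular expression is only a rough approximation of real lexing, not the tokenizer used by the original paper.

```python
import re


def simple_code_tokenizer(code: str) -> list[str]:
    """Split code into word-like tokens and single punctuation characters."""
    # \w+ matches identifiers, keywords and numbers; [^\w\s] matches any
    # other single non-whitespace character (operators, brackets, ...).
    return re.findall(r"\w+|[^\w\s]", code)


tokens = simple_code_tokenizer("def add(a, b):\n    return a + b")
print(tokens)
# ['def', 'add', '(', 'a', ',', 'b', ')', ':', 'return', 'a', '+', 'b']
```

Passing `tokenizer=simple_code_tokenizer` to `calc_codebleu` should then replace the default split, so that `add(a, b)` and `add ( a , b )` produce the same tokens.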

Alternatively, you can use `k4black/codebleu` from HuggingFace Spaces (the `codebleu` package is required):
```python
import evaluate
metric = evaluate.load("k4black/codebleu")

prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

# `compute` accepts keyword arguments only
result = metric.compute(predictions=[prediction], references=[reference], lang="python", weights=(0.25, 0.25, 0.25, 0.25))
```

Feel free to check the HF Space with an online example: [k4black/codebleu](https://huggingface.co/spaces/k4black/codebleu)


## Contributing

Contributions are welcome!  
If you have any questions, suggestions, or bug reports, please open an issue on GitHub.

Make your own fork and clone it:
```bash
git clone https://github.com/k4black/codebleu
```

For development, you need to install the library with the `all` (precompiled languages) and `test` extras:  
(requires an internet connection to download tree-sitter)
```bash
python -m pip install -e .[all,test]
python -m pip install -e .\[all,test\]  # for macOS (zsh requires the brackets to be escaped)
```

For testing just run pytest:
```bash
python -m pytest
```

To perform a style check, run:
```bash
python -m isort codebleu --check
python -m black codebleu --check
python -m ruff codebleu
python -m mypy codebleu
```


## License

This project is licensed under the terms of the MIT license.


## Citation

The official [CodeBLEU paper](https://arxiv.org/abs/2009.10297) can be cited as follows:
```bibtex
@misc{ren2020codebleu,
      title={CodeBLEU: a Method for Automatic Evaluation of Code Synthesis}, 
      author={Shuo Ren and Daya Guo and Shuai Lu and Long Zhou and Shujie Liu and Duyu Tang and Neel Sundaresan and Ming Zhou and Ambrosio Blanco and Shuai Ma},
      year={2020},
      eprint={2009.10297},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}
```

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "codebleu",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "codebleu, code, bleu, nlp, natural language processing, programming, evaluate, evaluation, code generation, metrics",
    "author": null,
    "author_email": "Konstantin Chernyshev <kdchernyshev+github@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/8f/45/87298d89e206d88ce83c5d1c36d3d5bcd4eb0fcd64cfcd36adf58b844093/codebleu-0.7.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "Unofficial CodeBLEU implementation that supports Linux, MacOS and Windows available on PyPI.",
    "version": "0.7.0",
    "project_urls": {
        "homepage": "https://github.com/k4black/codebleu"
    },
    "split_keywords": [
        "codebleu",
        " code",
        " bleu",
        " nlp",
        " natural language processing",
        " programming",
        " evaluate",
        " evaluation",
        " code generation",
        " metrics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a744888354a4cc9376e3ddc26ed5ed765e2ede0dcdba37ea7b71411b9dab90d2",
                "md5": "3363809c700c2a34529612509e91ab5f",
                "sha256": "e664e21cf407a355a726fd5af05e0021d8a162f28999281d907d4e7b36c54873"
            },
            "downloads": -1,
            "filename": "codebleu-0.7.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3363809c700c2a34529612509e91ab5f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 31656,
            "upload_time": "2024-05-30T10:31:58",
            "upload_time_iso_8601": "2024-05-30T10:31:58.717527Z",
            "url": "https://files.pythonhosted.org/packages/a7/44/888354a4cc9376e3ddc26ed5ed765e2ede0dcdba37ea7b71411b9dab90d2/codebleu-0.7.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8f4587298d89e206d88ce83c5d1c36d3d5bcd4eb0fcd64cfcd36adf58b844093",
                "md5": "41e9e4fadd85c7e4233b396764b3b817",
                "sha256": "6f379d5cd1663e1d248ee79d9b23d043c6bc6b60bd03fb106472bae77d2285c3"
            },
            "downloads": -1,
            "filename": "codebleu-0.7.0.tar.gz",
            "has_sig": false,
            "md5_digest": "41e9e4fadd85c7e4233b396764b3b817",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 177961,
            "upload_time": "2024-05-30T10:32:09",
            "upload_time_iso_8601": "2024-05-30T10:32:09.033292Z",
            "url": "https://files.pythonhosted.org/packages/8f/45/87298d89e206d88ce83c5d1c36d3d5bcd4eb0fcd64cfcd36adf58b844093/codebleu-0.7.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-30 10:32:09",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "k4black",
    "github_project": "codebleu",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "codebleu"
}
