mteval

Name	mteval
Version	0.0.2
home_page	https://github.com/achimr/mteval
Summary	Library to automate machine translation evaluation
upload_time	2023-04-27 16:51:01
maintainer
docs_url	None
author	Achim Ruopp
requires_python	>=3.7
license	Apache Software License 2.0
keywords	nbdev jupyter notebook python
requirements	No requirements were recorded.

mteval
================

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

<div>

[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/polyglottech/mteval/blob/main/nbs/index.ipynb)

</div>

## Introduction

This library enables easy, automated machine translation evaluation
using the evaluation tools
[sacreBLEU](https://github.com/mjpost/sacrebleu) and
[COMET](https://github.com/Unbabel/COMET). While the evaluation tools
readily provide command line access, they lack dataset handling and
translation of datasets with major online machine translation services.
This is provided by this `mteval` library along with code that logs
evaluation results and enables easier automation for multiple datasets
and MT systems from Python.

## Install

### Installing the library from PyPI

``` sh
pip install mteval
```

### Setting up Cloud authentication and parameters in the environment

This library currently supports the cloud translation services Amazon
Translate, DeepL, Google Translate, and Microsoft Translator. To
authenticate with the services and configure them, set the
following environment variables:

    export GOOGLE_APPLICATION_CREDENTIALS='/path/to/google/credentials/file.json'
    export GOOGLE_PROJECT_ID=''
    export MS_SUBSCRIPTION_KEY=''
    export MS_REGION=''
    export AWS_DEFAULT_REGION=''
    export AWS_ACCESS_KEY_ID=''
    export AWS_SECRET_ACCESS_KEY=''
    export DEEPL_API_KEY=''
    export MMT_API_KEY=''
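Before calling a translation service, it can help to confirm that its variables are actually set. The following is a minimal sketch using only the standard library. The variable names come from the list above, but the helper `missing_env_vars` and the grouping by prefix are assumptions made for illustration and are not part of `mteval`.

```python
import os

# Variable names are taken from the export list above. Grouping them by
# prefix is an assumption of this sketch, not something mteval defines.
REQUIRED_VARS = {
    "Google": ["GOOGLE_APPLICATION_CREDENTIALS", "GOOGLE_PROJECT_ID"],
    "Microsoft": ["MS_SUBSCRIPTION_KEY", "MS_REGION"],
    "Amazon": ["AWS_DEFAULT_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "DeepL": ["DEEPL_API_KEY"],
    # Listed under its prefix because the text does not name the service.
    "MMT": ["MMT_API_KEY"],
}


def missing_env_vars(environ=None):
    """Return {group: [names]} for variables that are unset or empty."""
    env = os.environ if environ is None else environ
    missing = {}
    for group, names in REQUIRED_VARS.items():
        unset = [name for name in names if not env.get(name)]
        if unset:
            missing[group] = unset
    return missing
```

Calling `missing_env_vars()` with no arguments inspects the current process environment. An empty dictionary means every listed variable has a non-empty value.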

#### How to obtain subscription credentials

- [Amazon
  Translate](https://docs.aws.amazon.com/translate/latest/dg/setting-up.html)
- [DeepL](https://www.deepl.com/docs-api/api-access/authentication/)
- [Google Translate](https://cloud.google.com/translate/docs/setup)
- [Microsoft
  Translator](https://learn.microsoft.com/en-us/azure/cognitive-services/translator/how-to-create-translator-resource)

You can set the environment variables by adding the `export` statements
above to your `.bashrc` file on Linux or, in a Jupyter notebook, by adding
them to the kernel configuration file
[kernel.json](https://jupyter-client.readthedocs.io/en/stable/kernels.html#kernel-specs).

This library has only been tested on Linux, not on Windows or macOS.

### On Google Colab: Loading the environment from a .env file

[Google Colab](https://research.google.com/colaboratory/faq.html), which
is a hosted cloud solution for Jupyter notebooks with GPU runtimes,
doesn’t support persistent environment variables. Instead, store the
environment variables in a `.env` file on Google Drive and load them
each time a notebook that uses `mteval` starts.

``` python
import os
running_in_colab = 'google.colab' in str(get_ipython())
if running_in_colab:
    from google.colab import drive
    drive.mount('/content/drive')
    homedir = "/content/drive/MyDrive"
else:
    homedir = os.getenv('HOME')
```
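The check above calls `get_ipython()`, which is only defined inside IPython or Jupyter. As a hedged alternative for code that may also run as a plain script, the sketch below looks for the `google.colab` package among the loaded modules. It assumes the Colab runtime imports that package at startup, which is an assumption about Colab rather than something `mteval` documents; the `modules` parameter exists only to make the function easy to test.

```python
import sys


def running_in_colab(modules=None):
    """Return True when the google.colab package is among the loaded modules."""
    loaded = sys.modules if modules is None else modules
    return "google.colab" in loaded
```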

Run the following cell to install `mteval` from PyPI:

``` python
!pip install mteval
```

Run the following cell to install `mteval` from the GitHub repository:

``` python
!pip install git+https://github.com/polyglottech/mteval.git
```

``` python
from dotenv import load_dotenv

if running_in_colab:
    # Colab doesn't have a mechanism to set environment variables other than python-dotenv
    env_file = homedir+'/secrets/.env'
    # Read the KEY=value pairs from the file into os.environ
    load_dotenv(env_file)
```
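A `.env` file is a plain-text list of `KEY=value` lines. To illustrate the format without extra dependencies, the sketch below writes a sample file with placeholder values and reads it back with a small standard-library parser. The function `load_env_file` is a hypothetical stand-in written for this example, not the `python-dotenv` implementation, and it handles only the simplest cases (comments, optional `export` prefix, optional quotes).

```python
import os
import tempfile


def load_env_file(path, environ=None):
    """Minimal stand-in for a .env loader: read KEY=value lines into environ."""
    env = os.environ if environ is None else environ
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            # Tolerate shell-style "export KEY=value" lines
            if line.startswith("export "):
                line = line[len("export "):]
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and single or double quotes
            env[key.strip()] = value.strip().strip("'\"")
    return env


# Demonstration with placeholder values in a temporary file
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write("# placeholder values only\n")
        handle.write("MS_REGION='example-region'\n")
        handle.write("export DEEPL_API_KEY=dummy-key\n")
    loaded = load_env_file(env_path, environ={})

print(loaded)
```

Passing an explicit dictionary as `environ` keeps the demonstration from touching the real process environment. In a notebook, `python-dotenv` remains the more complete choice.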

Also make sure to store the Google Cloud credentials JSON file on Google
Drive, e.g. in the `/content/drive/MyDrive/secrets/` folder.

## How to use

This is a short example of how to translate a few sentences and
score the machine translations with BLEU against human reference
translations. See the [reference
documentation](https://polyglottech.github.io/mteval/) for a complete
list of functions.

``` python
from mteval.microsoftmt import *
from mteval.bleu import *
import json
```

``` python
sources = ["Puissiez-vous passer une semaine intéressante et enrichissante avec nous.",
           "Honorables sénateurs, je connais, bien entendu, les références du ministre de l'Environnement et je pense que c'est une personne admirable.",
           "Il est certain que le renforcement des forces de maintien de la paix et l'envoi d'autres casques bleus ne suffiront pas, compte tenu du mauvais fonctionnement des structures de contrôle et de commandement là-bas."]
references = ["May you have an interesting and useful week with us.",
              "Honourable senators, I am, of course, familiar with the credentials of the Minister of the Environment and consider him an admirable person.",
              "Surely, strengthening and adding more peacekeepers is not sufficient when we know the command and control structures are not working."]

hypotheses = []
msmt = microsofttranslate()
for source in sources:
    translation = msmt.translate_text("fr","en",source)
    print(translation)
    hypotheses.append(translation)

score = json.loads(measure_bleu(hypotheses,references,"en"))
print(score)
```

The source texts and references are from the [Canadian Hansard
corpus](https://www.isi.edu/division3/natural-language/download/hansard/).
For a real-world evaluation, the test set should contain at least
100-200 segments.
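A simple guard can catch undersized or misaligned test sets before any translation requests are sent. The sketch below is an illustration only: `check_test_set` and its `min_segments` parameter are hypothetical names and are not part of `mteval`. The default of 100 reflects the lower bound mentioned above.

```python
def check_test_set(sources, references, min_segments=100):
    """Validate that a test set is parallel; return True if it is large enough."""
    if len(sources) != len(references):
        raise ValueError(
            f"{len(sources)} source segments but {len(references)} references"
        )
    # Empty segments on either side would distort segment-aligned scoring
    empty = [
        index
        for index, (source, reference) in enumerate(zip(sources, references))
        if not source.strip() or not reference.strip()
    ]
    if empty:
        raise ValueError(f"Empty segments at positions {empty}")
    return len(sources) >= min_segments
```

With the three-sentence example above, `check_test_set(sources, references)` returns `False`, signalling that the set is fine for a demonstration but too small for a meaningful score.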
