asian-mtl

- Name: asian-mtl
- Version: 0.1.2
- Summary: Seamlessly translate East Asian texts with deep learning models.
- Home page: https://github.com/EasierMTL/asian_mtl
- Author: Joseph Chen
- Requires Python: >=3.8,<4.0
- Keywords: nlp, translation
- Upload time: 2022-12-02 05:57:10
# `asian_mtl`

This repository contains the code and documentation for the machine translation models used for EasierMTL's API.

Improved version of the models in the original repository: [EasierMTL/chinese-translation-app](https://github.com/EasierMTL/chinese-translation-app/tree/main/server/chinese_translation_api)

## Supported Translators

All translators support dynamic quantization! [Our benchmarks](#benchmarks) indicate that quantization roughly doubles inference speed while losing less than 1% BLEU.

- `ChineseToEnglishTranslator()`
- `EnglishToChineseTranslator()`

## Getting Started

```bash
pip install asian-mtl
```

Here's a simple example:

```python
from asian_mtl.models.base import ChineseToEnglishTranslator

translator = ChineseToEnglishTranslator()
# Quantize for better CPU production performance!
translator.quantize()

prediction = translator.predict("我爱ECSE484.")
print(prediction)
# prediction will be:
# "I love ECSE 484."
```

And you're good to go!

If you are contributing, run:

```bash
# https://stackoverflow.com/questions/59882884/vscode-doesnt-show-poetry-virtualenvs-in-select-interpreter-option

poetry config virtualenvs.in-project true

# shows the name of the current environment
poetry env list

poetry install
```

## Usage

When using the quantized models in this repository, make sure to set `torch.set_num_threads(1)`. This is not set under the hood because it could interfere with user setups in an invasive way.

Not doing so will make the quantized models slower than their vanilla counterparts.
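As a sketch, the threading setup and quantization can be wrapped in a small startup helper. The helper name below is illustrative and not part of the `asian_mtl` API; only `torch.set_num_threads(1)` itself comes from the advice above:

```python
def configure_quantized_inference():
    """Pin PyTorch to a single intra-op thread before quantized inference.

    Quantized models in this repo are slower than their vanilla
    counterparts unless this is set. `torch` is imported lazily so the
    sketch stays self-contained.
    """
    import torch

    torch.set_num_threads(1)
    return torch.get_num_threads()
```

Call it once at process startup, before `translator.quantize()` and any `predict()` calls.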

## Evaluation

See [`scripts`](./scripts) for evaluation scripts.

To run an evaluation:

```bash
# Run the evaluation CLI with the Helsinki config
python ./scripts/evaluation/eval.py -c ./scripts/evaluation/configs/helsinki.yaml
```

Edit the config [`helsinki.yaml`](./scripts/evaluation/configs/helsinki.yaml) to toggle quantization or otherwise adapt the evaluation to your use case.
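The exact schema is defined by the evaluation scripts; as a rough sketch only, a config of this kind typically names the model and a quantization flag. Every key below is an assumption, not the contents of the actual `helsinki.yaml`:

```yaml
# Illustrative only -- the real helsinki.yaml may use different keys.
model_name: Helsinki-NLP/opus-mt-zh-en
quantize: true
n_samples: 100
```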

### Benchmarks

Here are some basic benchmarks of models in this repository:

| Model                      | Quantized? | N   | BLEU  | Runtime |
| -------------------------- | ---------- | --- | ----- | ------- |
| Helsinki-NLP/opus-mt-zh-en | No         | 100 | 0.319 | 27s     |
|                            | Yes        | 100 | 0.306 | 13.5s   |
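As a quick sanity check, the 2x speedup claim can be reproduced from the table's own numbers:

```python
# Figures taken from the benchmark table above (N = 100 sentences).
baseline_runtime_s = 27.0
quantized_runtime_s = 13.5
baseline_bleu = 0.319
quantized_bleu = 0.306

speedup = baseline_runtime_s / quantized_runtime_s    # 2.0x faster
bleu_drop = round(baseline_bleu - quantized_bleu, 3)  # 0.013 absolute

print(f"speedup: {speedup:.1f}x, BLEU drop: {bleu_drop}")
```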

The benchmarks described in the [docs](./docs/evaluation/EVALUATION_REG.md) are a little out-of-date.
