* Name: bulk-translate
* Version: 0.25.2
* Home page: https://github.com/nicolay-r/bulk-translate
* Summary: A tiny, no-strings-attached Python package for translating massive CSV/JSONL files with optionally pre-annotated object spans
* Upload time: 2025-02-22 12:48:57
* Author: Nicolay Rusnachenko
* Requires Python: >=3.6
* License: MIT License
* Keywords: natural language processing, machine translation, translation

# bulk-translate 0.25.2
![](https://img.shields.io/badge/Python-3.9-brightgreen.svg)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nicolay-r/bulk-translate/blob/master/bulk_translate_demo.ipynb)
[![twitter](https://img.shields.io/twitter/url/https/shields.io.svg?style=social)](https://x.com/nicolayr_/status/1871218031709323461)
[![PyPI downloads](https://img.shields.io/pypi/dm/bulk-translate.svg)](https://pypistats.org/packages/bulk-translate)

<p align="center">
    <img src="logo.png"/>
</p>
<p align="center">
  <a href="https://github.com/nicolay-r/nlp-thirdgate?tab=readme-ov-file#text-translation"><b>Third-party providers hosting</b>↗️</a>
</p>

A tiny, no-strings-attached Python package for translating massive `CSV`/`JSONL` files, 
with native support for pre-annotated **fixed spans** that the translator leaves unchanged.

## Description
  
<details>
<summary>
  
### 📘 More on spans
</summary>

<p align="center">
    <img src="example.png"  width="600"/>
</p>

</details>
<details>
<summary>

### 📘 `bulk-translate` features
</summary>

The out-of-the-box features of `bulk-translate` are:
* ✅ Support for `spans`, which can be kept as annotations or optionally translated.
* ✅ Native implementation of two translation modes:
  - `fast-mode`: uses extra characters to group all text parts into a single batch, which is split back into parts after translation.
  - `accurate`: translates each text part individually.
* ✅ No strings attached: you're free to adopt any LM / LLM backend.
  - `googletrans` is supported by default.
 
</details>
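
The difference between the two translation modes can be illustrated with a minimal sketch. This is not the package's implementation: `Span`, `translate_accurate`, `translate_fast`, the newline separator, and the `fake_translate` stand-in are all hypothetical names chosen for illustration, and a real backend may not preserve the separator the way this stub does.

```python
from typing import Callable, List, Union


class Span:
    """Hypothetical stand-in for a pre-annotated fixed span."""

    def __init__(self, value: str):
        self.value = value


Part = Union[str, Span]
Translator = Callable[[str], str]


def translate_accurate(parts: List[Part], translate: Translator) -> List[Part]:
    # One translator call per plain-text part; spans pass through untouched.
    return [p if isinstance(p, Span) else translate(p) for p in parts]


def translate_fast(parts: List[Part], translate: Translator, sep: str = "\n") -> List[Part]:
    # Join all plain-text parts into one batch, translate once, split back.
    texts = [p for p in parts if isinstance(p, str)]
    translated = translate(sep.join(texts)).split(sep)
    if len(translated) != len(texts):
        # The separator did not survive translation (assumed failure case):
        # fall back to translating each part individually.
        return translate_accurate(parts, translate)
    restored = iter(translated)
    return [p if isinstance(p, Span) else next(restored) for p in parts]


def fake_translate(text: str) -> str:
    # Stub translator used only to keep the sketch self-contained.
    return text.upper()


parts = ["hello ", Span("Alice"), " and ", Span("Bob")]
fast = translate_fast(parts, fake_translate)
accurate = translate_accurate(parts, fake_translate)
```

The fast variant issues a single translator call per text, at the cost of relying on the separator surviving translation; the accurate variant issues one call per text part.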

## Installation

From PyPI: 
```bash
pip install bulk-translate
```

or the latest version from this repository:
```bash
pip install git+https://github.com/nicolay-r/bulk-translate
```

## Usage

### API

### 👉 [Follow this notebook tutorial at `nlp-thirdgate`](https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/translate_texts_with_spans_via_googletrans.ipynb)


### Command Line / Shell

> **NOTE:** Spans are supported only in the JSON-lines format.
 
> **NOTE:** Requires the `source_iter` package to be installed.

The following command translates the [`test.tsv` example data](/test/data/test.tsv), in which annotated entities are enclosed in square brackets:

```bash
python -m bulk_translate.translate \
    --src "test/data/test.tsv" \
    --schema '{"translated":"{text}"}' \
    --adapter "dynamic:models/googletrans_310a.py:GoogleTranslateModel" \
    --output "test-translated.jsonl" \
    --batch-size 10 \
    %%m \
    --src "auto" \
    --dest "ru"
```
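
Once the command finishes, the output file can be inspected with a few lines of Python. The sketch below assumes that each line of `test-translated.jsonl` is a JSON object and that the translation is stored under the `translated` key named in `--schema`; the actual record layout may differ. `iter_translations` is a hypothetical helper, not part of the package.

```python
import json
from typing import Iterable, Iterator


def iter_translations(lines: Iterable[str], field: str = "translated") -> Iterator[str]:
    """Yield the value of `field` from every JSON-lines record that has it."""
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines, which JSON-lines files sometimes end with.
            continue
        record = json.loads(line)
        if field in record:
            yield record[field]
```

To use it on the file produced above, open the file with `open("test-translated.jsonl", encoding="utf-8")` and pass the file object to `iter_translations`.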

## Powered by

The pipeline construction components were taken from AREkit [[github]](https://github.com/nicolay-r/AREkit)

<p float="left">
<a href="https://github.com/nicolay-r/AREkit"><img src="https://github.com/nicolay-r/ARElight/assets/14871187/01232f7a-970f-416c-b7a4-1cda48506afe"/></a>
</p>

            
