# bulk-translate 0.25.1
![](https://img.shields.io/badge/Python-3.9-brightgreen.svg)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nicolay-r/bulk-translate/blob/master/bulk_translate_demo.ipynb)
[![twitter](https://img.shields.io/twitter/url/https/shields.io.svg?style=social)](https://x.com/nicolayr_/status/1871218031709323461)
[![PyPI downloads](https://img.shields.io/pypi/dm/bulk-translate.svg)](https://pypistats.org/packages/bulk-translate)
<p align="center">
<img src="logo.png"/>
</p>
A tiny, no-strings Python package for translating massive `CSV`/`JSONL` files, with
native support for pre-annotated **fixed spans** that the translator leaves unchanged.
## Description
<details>
<summary>
### 📘 More on spans
</summary>
<p align="center">
<img src="example.png" width="600"/>
</p>
</details>
<details>
<summary>
### 📘 `bulk-translate` features
</summary>
The out-of-the-box features of `bulk-translate` are:
* ✅ Support for `spans`, which can be kept as annotations or optionally translated.
* ✅ Native implementation of two translation modes:
  - `fast-mode`: uses extra separator characters to group all text parts into a single batch, which is split back into parts after translation.
  - `accurate`: translates each text part individually.
* ✅ No strings: you're free to adopt any LM / LLM backend.
  - Supports `googletrans` by default.
</details>
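The difference between the two modes listed above can be illustrated with a minimal sketch. This is not the package's implementation: `translate_fast`, `translate_accurate`, the `SEP` separator, and the `fake_translate` stand-in backend are all hypothetical names chosen for illustration, and the fallback to per-part translation is an assumption about how a mismatch could be handled.

```python
from typing import Callable, List

# Hypothetical separator; the character(s) used by the package may differ.
SEP = "\n"


def translate_accurate(parts: List[str], translate: Callable[[str], str]) -> List[str]:
    """Translate every text part with its own backend call."""
    return [translate(part) for part in parts]


def translate_fast(parts: List[str], translate: Callable[[str], str], sep: str = SEP) -> List[str]:
    """Join all parts into one batch, translate once, then split the result back."""
    translated = translate(sep.join(parts))
    pieces = translated.split(sep)
    # Assumption: if the backend altered the separators, the batch cannot be
    # deconstructed reliably, so fall back to translating part by part.
    if len(pieces) != len(parts):
        return translate_accurate(parts, translate)
    return pieces


calls = []


def fake_translate(text: str) -> str:
    """Stand-in backend that upper-cases the text and records each call."""
    calls.append(text)
    return text.upper()


parts = ["hello", "big world"]
fast_result = translate_fast(parts, fake_translate)
fast_calls = len(calls)
accurate_result = translate_accurate(parts, fake_translate)
accurate_calls = len(calls) - fast_calls
```

With this stand-in backend, the fast variant issues one call for the whole batch while the accurate variant issues one call per part, which is the trade-off between the two modes.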
## Installation
From PyPI:
```bash
pip install bulk-translate
```
or the latest version from GitHub:
```bash
pip install git+https://github.com/nicolay-r/bulk-translate
```
## Usage
### API
Please take a look at the [**related Wiki page**](https://github.com/nicolay-r/bulk-translate/wiki).
### Command Line / Shell
> **NOTE:** Spans are supported only in the JSON-lines format.
> **NOTE:** Requires the `source_iter` package to be installed.
For the following [`test.tsv` example data](/test/data/test.tsv) with annotated entities enclosed in square brackets:
```bash
python -m bulk_translate.translate \
    --src "test/data/test.tsv" \
    --prompt "{text}" \
    --adapter "dynamic:models/googletrans_310a.py:GoogleTranslateModel" \
    --output "test-translated.jsonl" \
    %%m \
    --src "auto" \
    --dest "ru"
```
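To make the square-bracket convention for annotated entities more concrete, here is a minimal sketch of how bracketed spans could be separated from the surrounding text and kept fixed during translation. It does not use the `bulk-translate` API: `split_spans`, `translate_keeping_spans`, and `SPAN_PATTERN` are hypothetical names, and the sample sentence is made up rather than taken from `test.tsv`.

```python
import re
from typing import Callable, List, Union

# Assumption: a span is any non-nested text enclosed in square brackets.
SPAN_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_spans(text: str) -> List[Union[str, List[str]]]:
    """Split text into plain strings and spans (spans are single-item lists)."""
    parts: List[Union[str, List[str]]] = []
    pos = 0
    for match in SPAN_PATTERN.finditer(text):
        plain = text[pos:match.start()]
        if plain:
            parts.append(plain)
        parts.append([match.group(1)])
        pos = match.end()
    tail = text[pos:]
    if tail:
        parts.append(tail)
    return parts


def translate_keeping_spans(text: str, translate: Callable[[str], str]) -> str:
    """Translate only the plain parts; bracketed spans are copied through as-is."""
    out = []
    for part in split_spans(text):
        if isinstance(part, list):
            out.append("[" + part[0] + "]")
        else:
            out.append(translate(part))
    return "".join(out)


sample = "[Alice] met [Bob] today"
parts = split_spans(sample)
# str.upper stands in for a real translation backend here.
result = translate_keeping_spans(sample, str.upper)
```

Representing spans as single-item lists is only one way to tell them apart from plain text; any tagged structure would serve the same purpose.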
## Powered by
The pipeline construction components were taken from AREkit [[github]](https://github.com/nicolay-r/AREkit)
<p float="left">
<a href="https://github.com/nicolay-r/AREkit"><img src="https://github.com/nicolay-r/ARElight/assets/14871187/01232f7a-970f-416c-b7a4-1cda48506afe"/></a>
</p>
Raw data
{
"_id": null,
"home_page": "https://github.com/nicolay-r/bulk-translate",
"name": "bulk-translate",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": "natural language processing, machine translation, translation",
"author": "Nicolay Rusnachenko",
"author_email": "rusnicolay@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/c3/7a/8dce2b8ec579ed1da4cfd677b995f4b1106fd6f4e774abdd16c7fdb056c0/bulk_translate-0.25.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "A tiny Python no-string package for performing translation of a massive CSV/JSONL files with optionally pre-annotated object spans",
"version": "0.25.1",
"project_urls": {
"Homepage": "https://github.com/nicolay-r/bulk-translate"
},
"split_keywords": [
"natural language processing",
" machine translation",
" translation"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "055d0e5aaabfb219897b5112808dab565f56a7f9887ed643f175c75c6b0133e1",
"md5": "ea44fdb6366a3021d224c255713787b8",
"sha256": "b30956d6d117bbc525bb3d10e4687270f1c02b9ab3921edbad1fe020961b2b6a"
},
"downloads": -1,
"filename": "bulk_translate-0.25.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ea44fdb6366a3021d224c255713787b8",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 14710,
"upload_time": "2025-01-18T16:13:36",
"upload_time_iso_8601": "2025-01-18T16:13:36.488470Z",
"url": "https://files.pythonhosted.org/packages/05/5d/0e5aaabfb219897b5112808dab565f56a7f9887ed643f175c75c6b0133e1/bulk_translate-0.25.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c37a8dce2b8ec579ed1da4cfd677b995f4b1106fd6f4e774abdd16c7fdb056c0",
"md5": "d8df43348e29ee38197a044d2fe3c385",
"sha256": "8311792ec69a7f62068d03ce9ef8329801ef6f4c20b3755cff0517ec3b0535db"
},
"downloads": -1,
"filename": "bulk_translate-0.25.1.tar.gz",
"has_sig": false,
"md5_digest": "d8df43348e29ee38197a044d2fe3c385",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 12527,
"upload_time": "2025-01-18T16:13:37",
"upload_time_iso_8601": "2025-01-18T16:13:37.690673Z",
"url": "https://files.pythonhosted.org/packages/c3/7a/8dce2b8ec579ed1da4cfd677b995f4b1106fd6f4e774abdd16c7fdb056c0/bulk_translate-0.25.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-18 16:13:37",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "nicolay-r",
"github_project": "bulk-translate",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "bulk-translate"
}