# bulk-translate 0.25.2

[Open in Colab](https://colab.research.google.com/github/nicolay-r/bulk-translate/blob/master/bulk_translate_demo.ipynb)
[Announcement on X](https://x.com/nicolayr_/status/1871218031709323461)
[PyPI downloads](https://pypistats.org/packages/bulk-translate)
<p align="center">
<img src="logo.png"/>
</p>
<p align="center">
<a href="https://github.com/nicolay-r/nlp-thirdgate?tab=readme-ov-file#text-translation"><b>Third-party providers hosting</b>↗️</a>
</p>
A tiny, no-strings-attached Python package for translating massive `CSV`/`JSONL` files, with
native support for pre-annotated **fixed spans** that stay invariant under translation.
## Description
<details>
<summary>
### 📘 More on spans
</summary>
<p align="center">
<img src="example.png" width="600"/>
</p>
</details>
<details>
<summary>
### 📘 `bulk-translate` features
</summary>
The out-of-the-box features of `bulk-translate` are:
* ✅ Support for `spans` for annotation / optional translation.
* ✅ Native implementation of two translation modes:
  - `fast-mode`: uses extra characters to group all the text parts into a single batch, which is split back into parts after translation.
  - `accurate`: translates each text part individually.
* ✅ No strings: you're free to adopt any LM / LLM backend.
  - Supports `googletrans` by default.
</details>
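
The difference between the two modes listed above can be illustrated with a minimal sketch. It is not the package's actual implementation: `translate_fast`, `translate_accurate`, and the `SEPARATOR` value are hypothetical names chosen for illustration, and `translate` stands for any backend callable that maps a string to a string.

```python
from typing import Callable, List

# Hypothetical separator: the actual "extra chars" used by bulk-translate may differ.
SEPARATOR = "\n"


def translate_accurate(parts: List[str], translate: Callable[[str], str]) -> List[str]:
    """Accurate mode sketch: one backend call per text part."""
    return [translate(part) for part in parts]


def translate_fast(parts: List[str], translate: Callable[[str], str],
                   sep: str = SEPARATOR) -> List[str]:
    """Fast mode sketch: one backend call for the whole batch, split afterwards."""
    translated = translate(sep.join(parts))
    pieces = translated.split(sep)
    # If the backend altered the separators, the pieces no longer line up
    # with the inputs, so fall back to part-by-part translation.
    if len(pieces) != len(parts):
        return translate_accurate(parts, translate)
    return pieces
```

Under these assumptions, the fast variant trades robustness for fewer backend calls: it only works when the backend leaves the separator characters untouched, which is why the sketch falls back to the accurate variant when the part count does not match.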
## Installation
From PyPI:
```bash
pip install bulk-translate
```
or the latest version from GitHub:
```bash
pip install git+https://github.com/nicolay-r/bulk-translate
```
## Usage
### API
### 👉 [Follow this notebook tutorial at `nlp-thirdgate`](https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/translate_texts_with_spans_via_googletrans.ipynb)
## Command Line / Shell
> **NOTE:** Spans are supported only in the JSON-lines format.

> **NOTE:** Requires the `source_iter` package to be installed.
For the following [`test.tsv` example data](/test/data/test.tsv) with annotated entities enclosed in square brackets:
```bash
python -m bulk_translate.translate \
--src "test/data/test.tsv" \
--schema '{"translated":"{text}"}' \
--adapter "dynamic:models/googletrans_310a.py:GoogleTranslateModel" \
--output "test-translated.jsonl" \
--batch-size 10 \
%%m \
--src "auto" \
--dest "ru"
```
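
To make the idea of entities "enclosed in square brackets" concrete, here is a rough sketch of how such a text could be split into translatable chunks and fixed spans. It is an illustration only and does not reproduce the package's own parser: `SPAN_PATTERN`, `split_spans`, and `translate_keeping_spans` are hypothetical names, and `translate` is any string-to-string callable.

```python
import re
from typing import Callable, List, Tuple

# Assumption: a span is any bracket-free text wrapped in square brackets, e.g. "[Rome]".
SPAN_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_spans(text: str) -> List[Tuple[str, bool]]:
    """Split text into (chunk, is_span) pairs, preserving the original order."""
    chunks = []
    pos = 0
    for match in SPAN_PATTERN.finditer(text):
        if match.start() > pos:
            chunks.append((text[pos:match.start()], False))
        chunks.append((match.group(1), True))
        pos = match.end()
    if pos < len(text):
        chunks.append((text[pos:], False))
    return chunks


def translate_keeping_spans(text: str, translate: Callable[[str], str]) -> str:
    """Translate only the plain chunks; bracketed spans are copied through unchanged."""
    return "".join(
        f"[{chunk}]" if is_span else translate(chunk)
        for chunk, is_span in split_spans(text)
    )
```

With a stand-in backend such as `str.upper`, `translate_keeping_spans("meet [Anna] in [Rome] today", str.upper)` leaves `[Anna]` and `[Rome]` as they are and changes only the surrounding text.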
## Powered by
The pipeline construction components were taken from AREkit [[github]](https://github.com/nicolay-r/AREkit)
<p float="left">
<a href="https://github.com/nicolay-r/AREkit"><img src="https://github.com/nicolay-r/ARElight/assets/14871187/01232f7a-970f-416c-b7a4-1cda48506afe"/></a>
</p>
Raw data
{
"_id": null,
"home_page": "https://github.com/nicolay-r/bulk-translate",
"name": "bulk-translate",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": "natural language processing, machine translation, translation",
"author": "Nicolay Rusnachenko",
"author_email": "rusnicolay@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/85/66/fd8e25ee54f5dc883c50bafdc107b46d8e081e4260808584f72c31136b81/bulk_translate-0.25.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "A tiny Python no-string package for performing translation of a massive CSV/JSONL files with optionally pre-annotated object spans",
"version": "0.25.2",
"project_urls": {
"Homepage": "https://github.com/nicolay-r/bulk-translate"
},
"split_keywords": [
"natural language processing",
" machine translation",
" translation"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b84513bfef33da4ea2b9cd34694d24870228c2cbc5769872e41ff86573fed9e9",
"md5": "5d89bab503644af406a26a8e9100f7bd",
"sha256": "a1168f4a5c431caa4b452fb38ac63747f17b8f7fc135c6c6e72330fea52b189f"
},
"downloads": -1,
"filename": "bulk_translate-0.25.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5d89bab503644af406a26a8e9100f7bd",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 15225,
"upload_time": "2025-02-22T12:48:55",
"upload_time_iso_8601": "2025-02-22T12:48:55.619324Z",
"url": "https://files.pythonhosted.org/packages/b8/45/13bfef33da4ea2b9cd34694d24870228c2cbc5769872e41ff86573fed9e9/bulk_translate-0.25.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8566fd8e25ee54f5dc883c50bafdc107b46d8e081e4260808584f72c31136b81",
"md5": "72541847b2bbad798f25197120210454",
"sha256": "c8ed141e543dbdc73031071985c08dca133f50865025f41a6168f43d5523f501"
},
"downloads": -1,
"filename": "bulk_translate-0.25.2.tar.gz",
"has_sig": false,
"md5_digest": "72541847b2bbad798f25197120210454",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 13251,
"upload_time": "2025-02-22T12:48:57",
"upload_time_iso_8601": "2025-02-22T12:48:57.444965Z",
"url": "https://files.pythonhosted.org/packages/85/66/fd8e25ee54f5dc883c50bafdc107b46d8e081e4260808584f72c31136b81/bulk_translate-0.25.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-22 12:48:57",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "nicolay-r",
"github_project": "bulk-translate",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "bulk-translate"
}