bulk-translate

* Name: bulk-translate
* Version: 0.25.1
* Home page: https://github.com/nicolay-r/bulk-translate
* Summary: A tiny, no-strings Python package for translating massive CSV/JSONL files with optionally pre-annotated object spans
* Author: Nicolay Rusnachenko
* License: MIT License
* Requires Python: >=3.6
* Keywords: natural language processing, machine translation, translation
* Upload time: 2025-01-18 16:13:37

# bulk-translate 0.25.1
![](https://img.shields.io/badge/Python-3.9-brightgreen.svg)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nicolay-r/bulk-translate/blob/master/bulk_translate_demo.ipynb)
[![twitter](https://img.shields.io/twitter/url/https/shields.io.svg?style=social)](https://x.com/nicolayr_/status/1871218031709323461)
[![PyPI downloads](https://img.shields.io/pypi/dm/bulk-translate.svg)](https://pypistats.org/packages/bulk-translate)

<p align="center">
    <img src="logo.png"/>
</p>

A tiny, no-strings Python package for translating massive `CSV`/`JSONL` files, with native support for pre-annotated **fixed spans** that stay invariant during translation.

## Description
  
<details>
<summary>
  
### 📘 More on spans
</summary>

<p align="center">
    <img src="example.png"  width="600"/>
</p>

</details>
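
The idea of fixed spans can be illustrated with a small sketch. It assumes, as in the `test.tsv` example in the Usage section, that spans are marked with square brackets, and it takes an arbitrary `translate` callable in place of a real backend. `split_into_parts` and `translate_with_spans` are hypothetical names, not part of the `bulk-translate` API.

```python
import re
from typing import Callable, List, Tuple

# Assumption: spans are enclosed in square brackets (no nesting), as in test.tsv.
SPAN_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_into_parts(text: str) -> List[Tuple[str, bool]]:
    """Split text into ordered (chunk, is_span) pairs."""
    parts: List[Tuple[str, bool]] = []
    last = 0
    for match in SPAN_PATTERN.finditer(text):
        if match.start() > last:
            # Plain text before the span.
            parts.append((text[last:match.start()], False))
        # The span content, without its brackets.
        parts.append((match.group(1), True))
        last = match.end()
    if last < len(text):
        # Trailing plain text after the last span.
        parts.append((text[last:], False))
    return parts


def translate_with_spans(text: str,
                         translate: Callable[[str], str],
                         translate_spans: bool = False) -> str:
    """Translate plain parts; keep spans fixed unless translate_spans is set."""
    out: List[str] = []
    for chunk, is_span in split_into_parts(text):
        if is_span:
            span = translate(chunk) if translate_spans else chunk
            out.append(f"[{span}]")
        else:
            out.append(translate(chunk))
    return "".join(out)


# Toy "translator" for illustration: upper-cases the text.
print(translate_with_spans("hello [Paris] world", str.upper))
```

With the toy translator, the bracketed span `Paris` passes through unchanged while the surrounding text is "translated"; setting `translate_spans=True` models the optional span translation mentioned in the feature list.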
<details>
<summary>

### 📘 `bulk-translate` features
</summary>

Out-of-the-box features of `bulk-translate`:
* ✅ Support for `spans` used for annotation, with optional span translation.
* ✅ Native implementation of two translation modes:
  - `fast-mode`: uses extra characters to group all text parts into a single batch, which is then split back into parts after translation.
  - `accurate`: translates each text part individually.
* ✅ No strings attached: you're free to adopt any LM / LLM backend.
  - `googletrans` is supported by default.
 
</details>
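
The difference between the two translation modes can be sketched as follows. This is an illustration, not the package's implementation: the separator string, the function names, and the fallback used when the separator does not survive translation are all assumptions.

```python
from typing import Callable, List

# Hypothetical separator; assumed to pass through the translator unchanged.
SEPARATOR = "\n|\n"


def translate_accurate(parts: List[str],
                       translate: Callable[[str], str]) -> List[str]:
    """Accurate mode: one translator call per text part."""
    return [translate(part) for part in parts]


def translate_fast(parts: List[str],
                   translate: Callable[[str], str],
                   separator: str = SEPARATOR) -> List[str]:
    """Fast mode: join all parts into one batch, translate once, split back."""
    translated = translate(separator.join(parts)).split(separator)
    if len(translated) != len(parts):
        # The translator altered the separator, so the batch cannot be
        # deconstructed reliably; fall back to per-part translation.
        return translate_accurate(parts, translate)
    return translated


print(translate_fast(["hello", "world"], str.upper))
```

Fast mode trades one backend call for the whole batch against the risk that the translator rewrites the grouping characters; accurate mode costs one call per part but never needs to split the output.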

## Installation

From PyPI: 
```bash
pip install bulk-translate
```

Or install the latest version from GitHub:
```bash
pip install git+https://github.com/nicolay-r/bulk-translate
```

## Usage

### API

See the [**related Wiki page**](https://github.com/nicolay-r/bulk-translate/wiki) for details.

### Command Line / Shell 

> **NOTE:** Spans are supported only in the JSON Lines format.
 
> **NOTE:** Requires the `source_iter` package to be installed.

To translate the [`test.tsv` example data](/test/data/test.tsv), in which annotated entities are enclosed in square brackets, run:

```bash
python -m bulk_translate.translate \
    --src "test/data/test.tsv" \
    --prompt "{text}" \
    --adapter "dynamic:models/googletrans_310a.py:GoogleTranslateModel" \
    --output "test-translated.jsonl" \
    %%m \
    --src "auto" \
    --dest "ru"
```
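The `--adapter` value appears to follow a `dynamic:<path to .py file>:<ClassName>` pattern. The sketch below shows one way such a string could be resolved to a class with the standard `importlib` machinery. The format is inferred from the example above only, `load_adapter_class` is a hypothetical helper, and this is not the package's actual loader.

```python
import importlib.util
from pathlib import Path
from typing import Type


def load_adapter_class(spec: str) -> Type:
    """Resolve 'dynamic:<file.py>:<ClassName>' to a class (hypothetical format)."""
    prefix, _, rest = spec.partition(":")
    if prefix != "dynamic":
        raise ValueError(f"unsupported adapter prefix: {prefix!r}")
    # Split on the LAST colon so Windows paths such as C:\... stay intact.
    file_path, _, class_name = rest.rpartition(":")
    if not file_path or not class_name:
        raise ValueError(f"expected 'dynamic:<path>:<ClassName>', got {spec!r}")
    # Relative paths are resolved against the current working directory.
    module_spec = importlib.util.spec_from_file_location(Path(file_path).stem, file_path)
    if module_spec is None or module_spec.loader is None:
        raise ImportError(f"cannot load a module from {file_path!r}")
    module = importlib.util.module_from_spec(module_spec)
    module_spec.loader.exec_module(module)  # runs the adapter file
    return getattr(module, class_name)


# e.g. load_adapter_class("dynamic:models/googletrans_310a.py:GoogleTranslateModel")
```

Loading from a file path rather than an installed module keeps the backend pluggable: any `.py` file that defines the named class can be dropped in without packaging it.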


## Powered by

The pipeline construction components were taken from AREkit [[github]](https://github.com/nicolay-r/AREkit)

<p float="left">
<a href="https://github.com/nicolay-r/AREkit"><img src="https://github.com/nicolay-r/ARElight/assets/14871187/01232f7a-970f-416c-b7a4-1cda48506afe"/></a>
</p>

            
