en2th-transliterator


- Name: en2th-transliterator
- Version: 0.1.0
- Home page: https://github.com/tchayintr/en2th-transliterator
- Bug tracker: https://github.com/tchayintr/en2th-transliterator/issues
- Summary: A Python package for transliterating English text to Thai using a ByT5 model
- Upload time: 2025-02-26 04:10:34
- Author: Thodsaporn Chay-intr <t.chayintr@gmail.com>
- Requires Python: >=3.6
- License: MIT
- Keywords: thai, transliteration, nlp, byt5, transformers

# En2Th Transliterator

[![PyPI version](https://badge.fury.io/py/en2th-transliterator.svg)](https://badge.fury.io/py/en2th-transliterator)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A Python package for transliterating English text to Thai using a **ByT5** model.

## Features

- **Byte-level processing**: Works directly on UTF-8 bytes with no subword vocabulary, making it more robust to spelling variations
- **Beam search & sampling**: Lets you trade off output quality against diversity
- **Batch processing**: Efficient for large-scale transliteration
- **Mixed precision (FP16)**: Faster inference on compatible GPUs
- **Command-line interface**: Easy to use from the terminal
- **Hugging Face integration**: Automatically downloads and caches the model

## Installation

You can install the package via pip:

```bash
pip install en2th-transliterator
```

## Usage

### As a Python Package

#### Basic Usage

```python
from en2th_transliterator import En2ThTransliterator

# Initialize with the default model
model = En2ThTransliterator()

# Transliterate a single text
thai_text = model.transliterate("hello")
print(f"Thai: {thai_text}")
```

#### Advanced Usage

```python
from en2th_transliterator import En2ThTransliterator

# Initialize with custom parameters
model = En2ThTransliterator(
    model_path=None,  # Use default HF model
    max_length=50,
    num_beams=5,
    length_penalty=1.5,
    verbose=True,
    fp16=True  # Enable mixed precision
)

# Transliterate using sampling
thai_text = model.transliterate(
    "artificial intelligence",
    temperature=0.8,
    top_k=40,
    top_p=0.95
)
print(f"Thai: {thai_text}")

# Batch transliteration
english_texts = ["computer", "keyboard", "mouse", "monitor"]
thai_texts = model.batch_transliterate(
    english_texts,
    batch_size=2,
    temperature=0.5
)

for eng, thai in zip(english_texts, thai_texts):
    print(f"{eng} → {thai}")
```
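The `temperature`, `top_k` and `top_p` arguments above are the standard sampling controls used in Hugging Face-style text generation. The pure-Python sketch below illustrates what they typically do to a single step's next-token distribution: temperature rescales the scores, top-k keeps the k most likely tokens, and top-p keeps the smallest set of tokens whose probabilities add up to `top_p`. It is an illustration of the general technique under that assumption, not this package's internal code; `filter_distribution`, `sample` and the toy scores are hypothetical.

```python
import math
import random


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token: probability} after temperature, top-k and top-p filtering.

    Hypothetical helper: `logits` maps candidate tokens to raw scores.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature scaling: values < 1 sharpen the distribution, > 1 flatten it.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    # Softmax, subtracting the max score for numerical stability.
    max_score = max(scaled.values())
    exp = {tok: math.exp(s - max_score) for tok, s in scaled.items()}
    total = sum(exp.values())
    probs = sorted(((tok, e / total) for tok, e in exp.items()),
                   key=lambda item: item[1], reverse=True)
    # Top-k: keep only the k most likely tokens (0 disables this filter).
    if top_k > 0:
        probs = probs[:top_k]
    # Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalise the surviving tokens so their probabilities sum to 1.
    mass = sum(p for _, p in kept)
    return {tok: p / mass for tok, p in kept}


def sample(logits, rng, **kwargs):
    """Draw one token from the filtered distribution."""
    dist = filter_distribution(logits, **kwargs)
    tokens, weights = zip(*dist.items())
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy next-character scores (made-up values, for illustration only).
toy_logits = {"ก": 2.0, "ค": 1.0, "ข": 0.5, "ฆ": -1.0}
print(filter_distribution(toy_logits, temperature=0.8, top_k=3, top_p=0.95))
print(sample(toy_logits, random.Random(0), temperature=0.8, top_k=3, top_p=0.95))
```

Lower temperatures and tighter `top_k`/`top_p` values concentrate probability on the most likely characters, which is why they tend to give more conservative transliterations.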

### Command Line Interface

#### Basic Usage
```bash
en2th-transliterate --text "hello"
```

#### Transliterate from a File
```bash
en2th-transliterate --file input.txt --output results.txt
```

#### Output in JSON Format
```bash
en2th-transliterate --file input.txt --format json --output results.json
```

#### Output in TSV Format
```bash
en2th-transliterate --file input.txt --format tsv --output results.tsv
```

#### Using Custom Parameters
```bash
en2th-transliterate --text "hello" --fp16 --temperature 0.7 --num-beams 5
```
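To drive the command-line tool from another Python program, you can assemble the same flags shown above and hand them to `subprocess`. In this sketch, `build_cli_args` and `run_cli` are hypothetical helpers, not part of the package; they only use flags that appear in this README, and what each flag actually does is defined by the installed `en2th-transliterate` command.

```python
import subprocess
from typing import List, Optional


def build_cli_args(text: Optional[str] = None,
                   file: Optional[str] = None,
                   output: Optional[str] = None,
                   fmt: Optional[str] = None,
                   fp16: Optional[bool] = None,
                   temperature: Optional[float] = None,
                   num_beams: Optional[int] = None) -> List[str]:
    """Build an argv list for en2th-transliterate (hypothetical helper)."""
    args = ["en2th-transliterate"]
    # This helper's own rule: prefer --text, otherwise fall back to --file.
    if text is not None:
        args += ["--text", text]
    elif file is not None:
        args += ["--file", file]
    else:
        raise ValueError("either text or file is required")
    if output is not None:
        args += ["--output", output]
    if fmt is not None:
        args += ["--format", fmt]  # e.g. "json" or "tsv", as in the examples above
    if fp16 is not None:
        args.append("--fp16" if fp16 else "--no-fp16")
    if temperature is not None:
        args += ["--temperature", str(temperature)]
    if num_beams is not None:
        args += ["--num-beams", str(num_beams)]
    return args


def run_cli(**kwargs) -> None:
    """Run the CLI; requires the package's console script to be on PATH."""
    subprocess.run(build_cli_args(**kwargs), check=True)
```

For example, `run_cli(file="input.txt", fmt="json", output="results.json")` runs the same command as the JSON example above; `check=True` makes a non-zero exit status raise `subprocess.CalledProcessError`.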

## Model

The package uses a **ByT5** model fine-tuned on English-to-Thai transliteration data. The model operates at the **byte level**: it reads and writes raw UTF-8 bytes rather than subword tokens, so it needs no language-specific vocabulary and is more tolerant of spelling variations in the input.

This package uses the [yacht/byt5-base-en2th-transliterator](https://huggingface.co/yacht/byt5-base-en2th-transliterator) model from Hugging Face Hub.
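To make "byte level" concrete: ByT5 has no subword vocabulary, and each UTF-8 byte of the input becomes one token. In the ByT5 tokenizer shipped with Hugging Face Transformers, the first three IDs are reserved for special tokens (pad = 0, eos = 1, unk = 2), so a byte with value `b` maps to ID `b + 3`, and an end-of-sequence ID is appended. The stdlib-only sketch below reproduces that mapping for illustration; `byt5_encode` and `byt5_decode` are hypothetical names, and in practice the model's own tokenizer does this for you.

```python
from typing import List

# Reserved IDs in the Hugging Face ByT5 tokenizer: pad=0, eos=1, unk=2.
PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
BYTE_OFFSET = 3  # byte value b is encoded as token ID b + 3


def byt5_encode(text: str) -> List[int]:
    """Map text to ByT5-style token IDs: one ID per UTF-8 byte, plus EOS."""
    return [b + BYTE_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def byt5_decode(ids: List[int]) -> str:
    """Invert byt5_encode, skipping the reserved special IDs."""
    data = bytes(i - BYTE_OFFSET for i in ids if i >= BYTE_OFFSET)
    return data.decode("utf-8", errors="replace")


print(byt5_encode("hello"))       # [107, 104, 111, 111, 114, 1]
print(len(byt5_encode("เฮลโล")))  # each Thai character takes 3 UTF-8 bytes
```

Because Thai characters take three bytes each, Thai output sequences are about three times longer in tokens than their character count, which is worth keeping in mind when choosing `max_length`.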

## Performance Optimization

### FP16 Mixed Precision

The package supports FP16 mixed precision for faster inference on compatible GPUs. This is enabled by default but can be disabled if needed:

```python
from en2th_transliterator import En2ThTransliterator
model = En2ThTransliterator(fp16=False)  # run in full FP32 precision
```

Or from the command line:

```bash
en2th-transliterate --text "hello" --no-fp16
```
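FP16 mainly pays off on a GPU. If the same code may also run on CPU-only machines, one option is to request FP16 only when a CUDA device is visible. This sketch assumes the model runs on PyTorch (the usual backend for Hugging Face ByT5 checkpoints, but not stated explicitly here), and `should_use_fp16` is a hypothetical helper, not part of the package. Whether the package already falls back to FP32 on CPU is not documented here.

```python
def should_use_fp16() -> bool:
    """Return True only if PyTorch is importable and a CUDA GPU is visible.

    Hypothetical helper; assumes the transliterator runs on PyTorch.
    """
    try:
        import torch
    except ImportError:
        # No PyTorch available: FP16 on GPU is not an option.
        return False
    return bool(torch.cuda.is_available())
```

Usage: `model = En2ThTransliterator(fp16=should_use_fp16())`.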

### Batch Processing

When transliterating many texts, `batch_transliterate()` is usually more efficient than calling `transliterate()` on each one:

```python
from en2th_transliterator import En2ThTransliterator
model = En2ThTransliterator()
texts = ["hello", "world", "computer", "science"]
results = model.batch_transliterate(texts, batch_size=4)
```
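`batch_size` controls how many inputs are processed together. The sketch below shows the usual way a list is split into consecutive batches; `iter_batches` is a hypothetical helper, and it is an assumption that the package groups inputs in input order like this. Larger batches generally improve GPU throughput at the cost of memory, and the final batch may be smaller than `batch_size`.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


texts = ["hello", "world", "computer", "science", "python"]
for batch in iter_batches(texts, batch_size=2):
    print(batch)
```

With five texts and `batch_size=2`, this yields two full batches and a final batch holding the single remaining text.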

## Development

### Setting Up Development Environment

```bash
# Clone the repository
git clone https://github.com/tchayintr/en2th-transliterator.git
cd en2th-transliterator

# Install in development mode
pip install -e .
```

### Running Tests

```bash
# Run the test script from the repository root
python test_package.py
```

### Building the Package

```bash
# Install build tools
pip install build twine

# Build the package
python -m build

# Upload to PyPI
python -m twine upload dist/*
```

## License

This project is licensed under the **MIT License**. See the LICENSE file for details.

## Citation

If you use this package in your research, please cite:

```bibtex
@software{en2th_transliterator,
  author = {Thodsaporn Chay-intr},
  title = {En2Th Transliterator: English to Thai Transliteration using ByT5},
  year = {2025},
  url = {https://github.com/tchayintr/en2th-transliterator}
}
```

## Acknowledgements

- This package uses the [ByT5](https://huggingface.co/google/byt5-base) architecture developed by Google Research
- The model was fine-tuned on English-Thai transliteration data from wannaphong's [thai-english-transliteration-dictionary](https://github.com/wannaphong/thai-english-transliteration-dictionary)

            
