# En2Th Transliterator
[![PyPI version](https://badge.fury.io/py/en2th-transliterator.svg)](https://badge.fury.io/py/en2th-transliterator)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
A Python package for transliterating English text to Thai using a **ByT5** model.
## Features
- **Byte-level processing**: More robust against spelling variations
- **Beam search & sampling**: Tune output quality and diversity via decoding parameters (beams, temperature, top-k/top-p)
- **Batch processing**: Efficient for large-scale transliteration
- **Mixed precision (FP16)**: Faster inference on compatible GPUs
- **Command-line interface**: Easy to use from the terminal
- **Hugging Face integration**: Automatically downloads and caches the model
## Installation
You can install the package via pip:
```bash
pip install en2th-transliterator
```
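After installation, you can quickly check that the package imports correctly; the model weights themselves are downloaded and cached automatically the first time `En2ThTransliterator` is instantiated:
```bash
python -c "from en2th_transliterator import En2ThTransliterator; print('import OK')"
```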
## Usage
### As a Python Package
#### Basic Usage
```python
from en2th_transliterator import En2ThTransliterator
# Initialize with the default model
model = En2ThTransliterator()
# Transliterate a single text
thai_text = model.transliterate("hello")
print(f"Thai: {thai_text}")
```
#### Advanced Usage
```python
from en2th_transliterator import En2ThTransliterator
# Initialize with custom parameters
model = En2ThTransliterator(
model_path=None, # Use default HF model
max_length=50,
num_beams=5,
length_penalty=1.5,
verbose=True,
fp16=True # Enable mixed precision
)
# Transliterate using sampling
thai_text = model.transliterate(
"artificial intelligence",
temperature=0.8,
top_k=40,
top_p=0.95
)
print(f"Thai: {thai_text}")
# Batch transliteration
english_texts = ["computer", "keyboard", "mouse", "monitor"]
thai_texts = model.batch_transliterate(
english_texts,
batch_size=2,
temperature=0.5
)
for eng, thai in zip(english_texts, thai_texts):
print(f"{eng} → {thai}")
```
### Command Line Interface
#### Basic Usage
```bash
en2th-transliterate --text "hello"
```
#### Transliterate from a File
```bash
en2th-transliterate --file input.txt --output results.txt
```
#### Output in JSON Format
```bash
en2th-transliterate --file input.txt --format json --output results.json
```
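If you want to post-process the JSON results in Python, a minimal sketch is shown below. The exact schema of `results.json` is not documented here, so the snippet only loads and inspects whatever structure the file contains:
```python
# Minimal sketch: load the CLI's JSON output for further processing.
# The schema of results.json is not documented here, so we only load and print it.
import json

with open("results.json", encoding="utf-8") as f:
    results = json.load(f)

print(results)
```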
#### Output in TSV Format
```bash
en2th-transliterate --file input.txt --format tsv --output results.tsv
```
#### Using Custom Parameters
```bash
en2th-transliterate --text "hello" --fp16 --temperature 0.7 --num-beams 5
```
## Model
The package utilizes a **ByT5** model fine-tuned on English-to-Thai transliteration data. Because the model operates at the **byte level** rather than on a subword vocabulary, it is robust to spelling variations and unseen words in the English input.
This package uses the [yacht/byt5-base-en2th-transliterator](https://huggingface.co/yacht/byt5-base-en2th-transliterator) model from Hugging Face Hub.
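If you prefer to bypass this package and call the checkpoint directly through the `transformers` library, a minimal sketch is shown below; the generation arguments are illustrative and are not the package's defaults:
```python
# Minimal sketch: using the underlying ByT5 checkpoint directly via transformers.
# Generation arguments here are illustrative, not the package's defaults.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "yacht/byt5-base-en2th-transliterator"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

inputs = tokenizer("hello", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=50, num_beams=5)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```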
## Performance Optimization
### FP16 Mixed Precision
The package supports FP16 mixed precision for faster inference on compatible GPUs. This is enabled by default but can be disabled if needed:
```python
model = En2ThTransliterator(fp16=False)
```
Or from the command line:
```bash
en2th-transliterate --text "hello" --no-fp16
```
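Under the hood, an FP16 switch typically amounts to casting the model weights to half precision on a CUDA device. The sketch below illustrates that idea with a PyTorch backend; it is not the package's actual implementation:
```python
# Rough sketch of what an FP16 switch usually does with a PyTorch backend.
# Illustrative only; the package's internal implementation may differ.
import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("yacht/byt5-base-en2th-transliterator")
if torch.cuda.is_available():
    model = model.half().to("cuda")  # cast weights to float16 and move to the GPU
```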
### Batch Processing
For transliterating multiple texts, batch processing is more efficient:
```python
texts = ["hello", "world", "computer", "science"]
results = model.batch_transliterate(texts, batch_size=4)
```
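For example, the non-empty lines of a text file can be collected and transliterated in one call (the file name is illustrative; `model` is an `En2ThTransliterator` instance as above):
```python
# Example: batch-transliterating the non-empty lines of a file.
# "input.txt" is an illustrative file name.
with open("input.txt", encoding="utf-8") as f:
    texts = [line.strip() for line in f if line.strip()]

thai_texts = model.batch_transliterate(texts, batch_size=8)
for eng, thai in zip(texts, thai_texts):
    print(f"{eng}\t{thai}")
```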
## Development
### Setting Up Development Environment
```bash
# Clone the repository
git clone https://github.com/tchayintr/en2th-transliterator.git
cd en2th-transliterator
# Install in development mode
pip install -e .
```
### Running Tests
```bash
# Run the test script
python test_package.py
```
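The repository's `test_package.py` is not reproduced here, but a minimal sketch of what such a smoke test might contain:
```python
# test_package.py -- minimal smoke-test sketch (not the repository's actual test script).
from en2th_transliterator import En2ThTransliterator

def main():
    model = En2ThTransliterator()
    result = model.transliterate("hello")
    assert isinstance(result, str) and result, "expected a non-empty Thai string"
    print(f"hello -> {result}")

if __name__ == "__main__":
    main()
```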
### Building the Package
```bash
# Install build tools
pip install build twine
# Build the package
python -m build
# Upload to PyPI
python -m twine upload dist/*
```
## License
This project is licensed under the **MIT License** - see the LICENSE file for details.
## Citation
If you use this package in your research, please cite:
```bibtex
@software{en2th_transliterator,
author = {Thodsaporn Chay-intr},
title = {En2Th Transliterator: English to Thai Transliteration using ByT5},
year = {2025},
url = {https://github.com/tchayintr/en2th-transliterator}
}
```
## Acknowledgements
- This package uses the [ByT5](https://huggingface.co/google/byt5-base) architecture developed by Google Research
- The model was fine-tuned on English-Thai transliteration data from the [thai-english-transliteration-dictionary](https://github.com/wannaphong/thai-english-transliteration-dictionary)