# Khmerocr_tools | Synthetic Data Generator
## Introduction
The Khmer Text Image Generator is a Python library that generates synthetic images containing Khmer text for use in training optical character recognition (OCR) models. It allows users to customize various aspects of the generated images, such as the text content, font style, background color, and blur effect.
## Features
- Generate synthetic images containing Khmer text
- Supply custom text content from a file
- Choose from multiple font styles
- Optionally apply a random blur effect to images
- Generate corresponding labels for each image
## Installation
You can install the Khmer Text Image Generator using pip:
```bash
pip install khmerocr_tools
```
## Usage
- Create a text file containing your word list (e.g. `dict.txt`) and add all the Khmer words you want to generate images for, or download the [sample data here](https://github.com/MetythornPenn/khmerocr_tools/blob/main/dict.txt).
- Create a folder named `font` and download all the fonts from this link: [font](https://github.com/MetythornPenn/khmerocr_tools/tree/main/font).
- Create a Python script to generate the data, e.g. `test.py`:
```python
from khmerocr_tools import synthetic_data

# Set parameters
image_height = 128                        # height of each generated image, in pixels
output_folder = 'output'                  # folder where generated images are saved
output_labels_file = 'output/labels.txt'  # file where labels are written
text_file_path = "dict.txt"               # Khmer word list
font_option = [1, 2]                      # fonts 1 and 2; [] selects all fonts

# Generate images and labels
synthetic_data(
    text_file_path,
    image_height,
    output_folder,
    output_labels_file,
    font_option=font_option,
    random_blur=True,
)
```
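Because `synthetic_data` relies on the word list and the `font` folder prepared in the steps above, a short pre-flight check can catch a missing file before generation starts. The sketch below uses only the standard library. `check_inputs` is a hypothetical helper, not part of `khmerocr_tools`; it assumes one entry per non-empty line of `dict.txt`, and it creates the output folder ahead of time because whether the library does so itself is not documented here.

```python
from pathlib import Path
from typing import List


def check_inputs(text_file_path: str, font_dir: str = "font",
                 output_folder: str = "output") -> List[str]:
    """Hypothetical pre-flight check (not part of khmerocr_tools).

    Confirms the word list and font folder exist, ensures the output
    folder exists, and returns the non-empty lines of the word list
    (assumption: one entry per line).
    """
    text_path = Path(text_file_path)
    if not text_path.is_file():
        raise FileNotFoundError(f"word list not found: {text_path}")
    if not Path(font_dir).is_dir():
        raise FileNotFoundError(f"font folder not found: {font_dir}")

    # Read as UTF-8 so Khmer characters decode correctly; drop blank lines
    # and surrounding whitespace.
    words = [line.strip()
             for line in text_path.read_text(encoding="utf-8").splitlines()
             if line.strip()]
    if not words:
        raise ValueError(f"word list is empty: {text_path}")

    # Create the output folder (and any missing parents) if needed.
    Path(output_folder).mkdir(parents=True, exist_ok=True)
    return words
```

In `test.py`, calling `check_inputs(text_file_path, output_folder=output_folder)` just before `synthetic_data(...)` would fail fast with a clear message if a prerequisite is missing.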
## Parameters
- `image_height`: Height of the generated images in pixels.
- `output_folder`: Path to the folder where generated images will be saved.
- `output_labels_file`: Path to the file where labels will be saved.
- `text_file_path`: Path to the text file containing Khmer text for generation.
- `font_option`: List of integers selecting which fonts to use:
  - 1 for AKbalthom KhmerLer Regular.
  - 2 for Khmer MEF1 Regular.
  - 3 for Khmer OS Battambang Regular.
  - 4 for Khmer OS Muol Light Regular.
  - 5 for Khmer OS Siemreap Regular.
  - Use an empty list `[]` to select all available fonts.
- `random_blur`: Boolean flag indicating whether to apply a random blur effect to the images.
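The `font_option` rules above (numbers 1 to 5, with an empty list meaning all fonts) can be illustrated with a small hypothetical helper. This is not the library's internal code: `FONT_OPTIONS` and `resolve_fonts` are names introduced here for illustration, and raising `ValueError` for an unknown number is an assumption, since how `khmerocr_tools` handles out-of-range values is not documented.

```python
from typing import Dict, List

# Font numbers as listed for `font_option` (hypothetical mapping for illustration).
FONT_OPTIONS: Dict[int, str] = {
    1: "AKbalthom KhmerLer Regular",
    2: "Khmer MEF1 Regular",
    3: "Khmer OS Battambang Regular",
    4: "Khmer OS Muol Light Regular",
    5: "Khmer OS Siemreap Regular",
}


def resolve_fonts(font_option: List[int]) -> List[str]:
    """Hypothetical helper: map font_option numbers to font names.

    An empty list selects every available font. Unknown numbers raise
    ValueError (an assumption; the library's behaviour is not documented).
    """
    if not font_option:
        return list(FONT_OPTIONS.values())
    unknown = [n for n in font_option if n not in FONT_OPTIONS]
    if unknown:
        raise ValueError(
            f"unknown font option(s): {unknown}; valid: {sorted(FONT_OPTIONS)}"
        )
    return [FONT_OPTIONS[n] for n in font_option]
```

For example, `resolve_fonts([1, 2])` yields the AKbalthom KhmerLer and Khmer MEF1 fonts, matching the `font_option = [1, 2]` used in the usage script.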
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.