# OCR toolkits
## Introduction
A collection of functions for working with OCR and generating synthetic data.
## Features
- Generate synthetic images containing Khmer text
- Customize text content from a file
- Choose from multiple font styles
- Option to apply random blur effect to images
- Generate corresponding labels for each image
## Installation
You can install OCR toolkits using pip:
```bash
pip install ocr_toolkits
```
## Usage
### Move files from one folder to another, filtered by extension
```python
from ocr_toolkits import move_files_ext
# Move all files from the 'src' directory to the 'dst1' directory
move_files_ext(
    src_dir='src',
    dst_dir='dst1',
)
# Move only .jpg files from the 'src' directory to the 'dst1' directory
move_files_ext(
    src_dir='src',
    dst_dir='dst1',
    ext='.jpg'
)
```
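To make the filtering behaviour concrete, here is a minimal standard-library sketch of a move-by-extension helper. It is not the implementation of `move_files_ext`; the function name `move_by_extension`, its return value, and the case-insensitive suffix match are assumptions made for illustration.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def move_by_extension(src_dir: str, dst_dir: str, ext: Optional[str] = None) -> List[Path]:
    """Hypothetical helper: move files from src_dir to dst_dir.

    If ext is given (e.g. '.jpg'), only files with that suffix are moved.
    Returns the new paths of the moved files.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the destination if needed

    moved = []
    # sorted() materializes the listing before any file is moved
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue  # skip sub-directories
        if ext is not None and path.suffix.lower() != ext.lower():
            continue  # skip files with a different extension
        target = dst / path.name
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Calling `move_by_extension('src', 'dst1', ext='.jpg')` would move only the `.jpg` files and leave everything else in `src`.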
### Change file extensions in a folder, filtered by extension
```python
from ocr_toolkits import change_files_ext
# Example usage
change_files_ext(
    src_dir='src',
    dst_dir='dst',
    ext='.png'
)
```
### Delete files from a folder, filtered by extension
```python
from ocr_toolkits import delete_files_ext
delete_files_ext(
    dir='dst',
    ext='.jpg',
)
```
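Because deletion cannot be undone, it can help to preview which files match before calling `delete_files_ext`. The sketch below uses only the standard library; `list_files_with_ext` is a hypothetical helper, not part of `ocr_toolkits`, and the case-insensitive match is an assumption.

```python
from pathlib import Path
from typing import List


def list_files_with_ext(directory: str, ext: str) -> List[Path]:
    """Hypothetical helper: return the files in `directory` whose suffix is `ext`."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ext.lower()
    )
```

For example, `for p in list_files_with_ext('dst', '.jpg'): print(p)` lists the candidates without touching them.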
### Autocorrect gender
```python
from ocr_toolkits.postprocess import autocorrect_gender
# Example usage with return_eng=True (English output)
corrected_gender_eng = autocorrect_gender("ប្រុ", return_eng=True)
print(corrected_gender_eng) # Output: Male
# Example usage with return_eng=False (Khmer output)
corrected_gender_kh = autocorrect_gender("ស្រ", return_eng=False)
print(corrected_gender_kh) # Output: ស្រី
```
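One way such a correction could work is fuzzy matching of the OCR output against the two valid labels. The sketch below uses `difflib` from the standard library; it is an illustration under that assumption, not the logic of `autocorrect_gender`, and the name `closest_gender` and the `0.6` cutoff are hypothetical choices.

```python
from difflib import get_close_matches
from typing import Optional

# Valid Khmer labels mapped to their English equivalents.
# Written with escapes so the code points are unambiguous.
GENDERS = {
    "\u1794\u17d2\u179a\u17bb\u179f": "Male",    # Khmer word for "male"
    "\u179f\u17d2\u179a\u17b8": "Female",        # Khmer word for "female"
}


def closest_gender(text: str, return_eng: bool = False) -> Optional[str]:
    """Hypothetical helper: snap a truncated OCR reading to the nearest valid label."""
    matches = get_close_matches(text, list(GENDERS), n=1, cutoff=0.6)
    if not matches:
        return None  # nothing similar enough; leave the decision to the caller
    khmer = matches[0]
    return GENDERS[khmer] if return_eng else khmer
```

With this sketch, a reading that is missing its final character still resolves to the full label, while unrelated text returns `None` instead of a guess.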
### Resize image
```python
from ocr_toolkits import resize_image
resized_image = resize_image(
    image_path='./images/img.jpg',
    width=555,
    height=555,
    save=True,
    save_path='save.jpg'
)
```
### Generate synthetic data
- Create a text file containing the word list, e.g. `dict.txt`, and add all the Khmer words you want to generate, or download the [sample data here](https://github.com/MetythornPenn/khmerocr_tools/blob/main/dict.txt).
- Create a folder called `font` and download all the fonts from this link: [font](https://github.com/MetythornPenn/khmerocr_tools/tree/main/font).
- Create a Python script to generate the data, e.g. `test.py`:
```python
from khmerocr_tools import synthetic_data
# Set parameters
image_height = 128
output_folder = 'output'
output_labels_file = 'output/labels.txt'
text_file_path = "dict.txt"
font_option = [1, 2]
# Generate images and labels
synthetic_data(
    text_file_path,
    image_height,
    output_folder,
    output_labels_file,
    font_option=font_option,
    random_blur=True
)
```
## Parameters
- `image_height`: Height of the generated images in pixels.
- `output_folder`: Path to the folder where generated images will be saved.
- `output_labels_file`: Path to the file where labels will be saved.
- `text_file_path`: Path to the text file containing Khmer text for generation.
- `font_option`: List of integers representing font options.
  - 1 for AKbalthom KhmerLer Regular.
  - 2 for Khmer MEF1 Regular.
  - 3 for Khmer OS Battambang Regular.
  - 4 for Khmer OS Muol Light Regular.
  - 5 for Khmer OS Siemreap Regular.
  - Use an empty list `[]` to select all available fonts.
- `random_blur`: Boolean flag indicating whether to apply random blur effect to images.
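
The `font_option` rule above (a list of numbers, with an empty list meaning every font) can be sketched as a small lookup. The mapping mirrors the list in this section; the function name `resolve_font_options` and the error raised for unknown numbers are assumptions for illustration, not the library's behaviour.

```python
from typing import Dict, List

# Font numbers and names as listed in the Parameters section.
FONT_NAMES: Dict[int, str] = {
    1: "AKbalthom KhmerLer Regular",
    2: "Khmer MEF1 Regular",
    3: "Khmer OS Battambang Regular",
    4: "Khmer OS Muol Light Regular",
    5: "Khmer OS Siemreap Regular",
}


def resolve_font_options(font_option: List[int]) -> List[str]:
    """Hypothetical helper: turn font numbers into font names."""
    if not font_option:
        # An empty list selects all available fonts.
        return list(FONT_NAMES.values())
    unknown = [n for n in font_option if n not in FONT_NAMES]
    if unknown:
        raise ValueError(f"Unknown font option(s): {unknown}")
    return [FONT_NAMES[n] for n in font_option]
```

For instance, `resolve_font_options([1, 2])` yields the first two font names, and `resolve_font_options([])` yields all five.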
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
Raw data
{
"_id": null,
"home_page": "https://github.com/MetythornPenn/ocr-toolkits.git",
"name": "ocr-toolkits",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "ocr-toolkits",
"author": "Metythorn Penn",
"author_email": "metythorn@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/58/4e/46cfac37ce74fc33171ff273b3f58e9c284b65a356ef379c90ec3cb7dfae/ocr-toolkits-0.0.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache Software License 2.0",
"summary": "Ocr_tools is a Python library that generates synthetic images containing Khmer text and other important toolbox",
"version": "0.0.3",
"project_urls": {
"Homepage": "https://github.com/MetythornPenn/ocr-toolkits.git"
},
"split_keywords": [
"ocr-toolkits"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "c3d8a2838f7f62166b4cb0270aed47af95ddaac41e59ce61c02f7db0283da506",
"md5": "cbed219a4c364c0098a499487df93812",
"sha256": "0fcd706d5361a656c7754cabc957ec465390418f08b41bfd6d4a38701c4dce12"
},
"downloads": -1,
"filename": "ocr_toolkits-0.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "cbed219a4c364c0098a499487df93812",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 9238,
"upload_time": "2024-06-11T09:16:17",
"upload_time_iso_8601": "2024-06-11T09:16:17.727040Z",
"url": "https://files.pythonhosted.org/packages/c3/d8/a2838f7f62166b4cb0270aed47af95ddaac41e59ce61c02f7db0283da506/ocr_toolkits-0.0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "584e46cfac37ce74fc33171ff273b3f58e9c284b65a356ef379c90ec3cb7dfae",
"md5": "8ae81f4949b1cbb1ea5324dfc0ccb608",
"sha256": "ad4ceec815f3b100f5e6de24f667e016045f2e7604d51e1baeae199a9e3d2833"
},
"downloads": -1,
"filename": "ocr-toolkits-0.0.3.tar.gz",
"has_sig": false,
"md5_digest": "8ae81f4949b1cbb1ea5324dfc0ccb608",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 8822,
"upload_time": "2024-06-11T09:16:19",
"upload_time_iso_8601": "2024-06-11T09:16:19.581008Z",
"url": "https://files.pythonhosted.org/packages/58/4e/46cfac37ce74fc33171ff273b3f58e9c284b65a356ef379c90ec3cb7dfae/ocr-toolkits-0.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-06-11 09:16:19",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "MetythornPenn",
"github_project": "ocr-toolkits",
"github_not_found": true,
"lcname": "ocr-toolkits"
}