ocr-toolkits


- Name: ocr-toolkits
- Version: 0.0.3
- Home page: https://github.com/MetythornPenn/ocr-toolkits.git
- Summary: Ocr_tools is a Python library that generates synthetic images containing Khmer text, along with other useful OCR tools.
- Upload time: 2024-06-11 09:16:19
- Author: Metythorn Penn
- License: Apache Software License 2.0
- Keywords: ocr-toolkits
# OCR toolkits

## Introduction

A collection of functions for working with OCR and generating synthetic data.

## Features

- Generate synthetic images containing Khmer text
- Customize text content from a file
- Choose from multiple font styles
- Option to apply random blur effect to images
- Generate corresponding labels for each image

## Installation

You can install `ocr-toolkits` using pip:

```bash
pip install ocr_toolkits
```


## Usage

### Move files from one folder to another, filtered by extension
```python
from ocr_toolkits import move_files_ext

# Move all files from the 'src' directory to the 'dst1' directory
move_files_ext(
    src_dir='src',
    dst_dir='dst1',
)

# Move only .jpg files from the 'src' directory to the 'dst1' directory
move_files_ext(
    src_dir='src',
    dst_dir='dst1',
    ext='.jpg'
)
```
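To illustrate what this operation amounts to, here is a minimal standard-library sketch. `move_files_by_ext` is a hypothetical name, and treating `ext=None` as "move everything" and matching extensions case-insensitively are assumptions, not documented behavior of `move_files_ext`.

```python
import shutil
from pathlib import Path
from typing import Optional


def move_files_by_ext(src_dir: str, dst_dir: str, ext: Optional[str] = None) -> list[Path]:
    """Move files from src_dir into dst_dir, optionally keeping only one extension.

    Hypothetical helper for illustration; not the ocr_toolkits implementation.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the destination if missing
    moved = []
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue  # skip sub-directories
        # Assumption: compare case-insensitively, so '.JPG' matches '.jpg'
        if ext is not None and path.suffix.lower() != ext.lower():
            continue
        target = dst / path.name
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Sorting the directory listing first makes the order of moved files deterministic and avoids modifying the directory while iterating over it.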

### Change file extensions in a folder, filtered by extension
```python

from ocr_toolkits import change_files_ext

# Example usage
change_files_ext(
    src_dir='src',
    dst_dir='dst',
    ext='.png'
)
```
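The exact semantics of `change_files_ext` are not spelled out here, so the following is only a sketch of one plausible reading: every file in `src_dir` is copied to `dst_dir` with its extension replaced by `ext`. `copy_with_new_ext` is a hypothetical helper and may not match what the library does.

```python
import shutil
from pathlib import Path


def copy_with_new_ext(src_dir: str, dst_dir: str, ext: str) -> list[Path]:
    """Copy every file in src_dir into dst_dir, replacing its extension with ext.

    Hypothetical helper; only the file name changes, not the file contents.
    """
    if not ext.startswith("."):
        ext = "." + ext  # accept both 'png' and '.png'
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(src_dir).iterdir()):
        if path.is_file():
            # with_suffix swaps only the last suffix: 'img.jpg' -> 'img.png'
            target = dst / path.with_suffix(ext).name
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            written.append(target)
    return written
```

Note that changing the extension only renames the file; turning a JPEG into a real PNG requires re-encoding the image with an image library.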

### Delete files from a folder, filtered by extension
```python
from ocr_toolkits import delete_files_ext

# Delete all .jpg files in the 'dst' directory
delete_files_ext(
    dir='dst',
    ext='.jpg',
)
```
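Deletion cannot be undone, so it can help to preview the matching files first. The sketch below is a hypothetical stand-in (`delete_files_by_ext`, with an assumed `dry_run` flag), not the `delete_files_ext` implementation.

```python
from pathlib import Path


def delete_files_by_ext(dir_path: str, ext: str, dry_run: bool = False) -> list[Path]:
    """Delete files in dir_path whose extension matches ext (case-insensitive).

    Hypothetical helper. With dry_run=True it only returns the files that
    would be deleted and leaves the directory untouched.
    """
    matches = [
        p for p in sorted(Path(dir_path).iterdir())
        if p.is_file() and p.suffix.lower() == ext.lower()
    ]
    if not dry_run:
        for p in matches:
            p.unlink()  # permanently removes the file
    return matches
```

Running once with `dry_run=True` and inspecting the returned list is a cheap safeguard before the real call.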


### Autocorrect gender
```python
from ocr_toolkits.postprocess import autocorrect_gender

# Example usage with return_eng=True (English output)
corrected_gender_eng = autocorrect_gender("ប្រុ", return_eng=True)
print(corrected_gender_eng)  # Output: Male

# Example usage with return_eng=False (Khmer output)
corrected_gender_kh = autocorrect_gender("ស្រ", return_eng=False)
print(corrected_gender_kh)  # Output: ស្រី
```
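The idea behind this kind of autocorrection is to snap a partially recognized OCR reading to the nearest valid value. The sketch below approximates that with `difflib.SequenceMatcher` similarity; the function name `autocorrect_gender_sketch`, the `GENDERS` table and the `min_ratio` cutoff are assumptions, and the library may use a different matching rule.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical lookup table: Khmer spelling -> English label
GENDERS = {"ប្រុស": "Male", "ស្រី": "Female"}


def autocorrect_gender_sketch(text: str, return_eng: bool = True,
                              min_ratio: float = 0.6) -> Optional[str]:
    """Snap a noisy OCR reading to the closest known gender word.

    Illustrative only; may differ from ocr_toolkits.postprocess.autocorrect_gender.
    """
    text = text.strip()
    best, best_score = None, 0.0
    for khmer in GENDERS:
        # ratio() is 2*M/T: M matched characters, T total characters in both strings
        score = SequenceMatcher(None, text, khmer).ratio()
        if score > best_score:
            best, best_score = khmer, score
    if best is None or best_score < min_ratio:
        return None  # too far from any known value to correct safely
    return GENDERS[best] if return_eng else best
```

For example, `autocorrect_gender_sketch("ប្រុ")` returns `"Male"` because the truncated reading shares 4 of its 4 code points with `ប្រុស` (ratio 8/9) but only 2 with `ស្រី` (ratio 0.5).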

### Resize an image
```python
from ocr_toolkits import resize_image

resized_image = resize_image(
    image_path='./images/img.jpg', 
    width=555,
    height=555,
    save=True,
    save_path='save.jpg'
)

```

### Generate synthetic Khmer text images

- Create a text file containing a word list (e.g. `dict.txt`) with all the Khmer words you want to generate, or download the [sample data here](https://github.com/MetythornPenn/khmerocr_tools/blob/main/dict.txt).

- Create a folder called `font` and download all the fonts from this link: [font](https://github.com/MetythornPenn/khmerocr_tools/tree/main/font)

- Create a Python script to generate the data, e.g. `test.py`:
```python
from khmerocr_tools import synthetic_data

# Set parameters
image_height = 128
output_folder = 'output'
output_labels_file = 'output/labels.txt'
text_file_path = "dict.txt"
font_option = [1, 2]  # 1 = AKbalthom KhmerLer Regular, 2 = Khmer MEF1 Regular

# Generate images and labels
synthetic_data(
    text_file_path,
    image_height,
    output_folder,
    output_labels_file,
    font_option=font_option,
    random_blur=True
)
```
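The folder preparation steps can also be scripted. This sketch assumes `dict.txt` holds one word per line in UTF-8 (the expected format is not documented here), and `prepare_workspace` is a hypothetical helper; the fonts themselves still need to be downloaded into `font/`.

```python
from pathlib import Path


def prepare_workspace(root: str, words: list[str]) -> dict[str, Path]:
    """Create dict.txt, font/ and output/ under root for the generator script.

    Assumption (not confirmed by the docs): dict.txt is UTF-8, one entry per line.
    """
    base = Path(root)
    (base / "font").mkdir(parents=True, exist_ok=True)    # download the fonts here
    (base / "output").mkdir(parents=True, exist_ok=True)  # images and labels.txt go here
    # Drop blank entries and duplicates while preserving the original order
    unique = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    dict_path = base / "dict.txt"
    dict_path.write_text("\n".join(unique) + "\n", encoding="utf-8")
    return {"dict": dict_path, "font": base / "font", "output": base / "output"}
```

Writing with an explicit `encoding="utf-8"` avoids platform-dependent default encodings mangling Khmer text.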

## Parameters

- `image_height`: Height of the generated images in pixels.
- `output_folder`: Path to the folder where generated images will be saved.
- `output_labels_file`: Path to the file where labels will be saved.
- `text_file_path`: Path to the text file containing Khmer text for generation.
- `font_option`: List of integers representing font options. 
  - 1 for AKbalthom KhmerLer Regular.
  - 2 for Khmer MEF1 Regular.
  - 3 for Khmer OS Battambang Regular.
  - 4 for Khmer OS Muol Light Regular.
  - 5 for Khmer OS Siemreap Regular.
  - Use an empty list `[]` to select all available fonts.
- `random_blur`: Boolean flag indicating whether to apply a random blur effect to the images.
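To make the `font_option` rules concrete, here is a small sketch that maps the IDs to the font names listed above, with an empty list selecting every font. `resolve_fonts` is hypothetical, and rejecting unknown IDs is an assumption about sensible behavior rather than documented library behavior.

```python
# Font IDs and names as listed in the font_option parameter description
FONTS = {
    1: "AKbalthom KhmerLer Regular",
    2: "Khmer MEF1 Regular",
    3: "Khmer OS Battambang Regular",
    4: "Khmer OS Muol Light Regular",
    5: "Khmer OS Siemreap Regular",
}


def resolve_fonts(font_option: list[int]) -> list[str]:
    """Translate font_option IDs into font names; an empty list selects all fonts."""
    if not font_option:
        return list(FONTS.values())
    # Assumption: fail loudly on IDs outside 1-5 instead of silently ignoring them
    unknown = [i for i in font_option if i not in FONTS]
    if unknown:
        raise ValueError(f"unknown font option(s): {unknown}; valid IDs are 1-5")
    return [FONTS[i] for i in font_option]
```

With this mapping, `font_option = [1, 2]` from the example script resolves to AKbalthom KhmerLer Regular and Khmer MEF1 Regular.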



## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

Raw data

{
    "_id": null,
    "home_page": "https://github.com/MetythornPenn/ocr-toolkits.git",
    "name": "ocr-toolkits",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "ocr-toolkits",
    "author": "Metythorn Penn",
    "author_email": "metythorn@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/58/4e/46cfac37ce74fc33171ff273b3f58e9c284b65a356ef379c90ec3cb7dfae/ocr-toolkits-0.0.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache Software License 2.0",
    "summary": "Ocr_tools is a Python library that generates synthetic images containing Khmer text and other important toolbox",
    "version": "0.0.3",
    "project_urls": {
        "Homepage": "https://github.com/MetythornPenn/ocr-toolkits.git"
    },
    "split_keywords": [
        "ocr-toolkits"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c3d8a2838f7f62166b4cb0270aed47af95ddaac41e59ce61c02f7db0283da506",
                "md5": "cbed219a4c364c0098a499487df93812",
                "sha256": "0fcd706d5361a656c7754cabc957ec465390418f08b41bfd6d4a38701c4dce12"
            },
            "downloads": -1,
            "filename": "ocr_toolkits-0.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "cbed219a4c364c0098a499487df93812",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 9238,
            "upload_time": "2024-06-11T09:16:17",
            "upload_time_iso_8601": "2024-06-11T09:16:17.727040Z",
            "url": "https://files.pythonhosted.org/packages/c3/d8/a2838f7f62166b4cb0270aed47af95ddaac41e59ce61c02f7db0283da506/ocr_toolkits-0.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "584e46cfac37ce74fc33171ff273b3f58e9c284b65a356ef379c90ec3cb7dfae",
                "md5": "8ae81f4949b1cbb1ea5324dfc0ccb608",
                "sha256": "ad4ceec815f3b100f5e6de24f667e016045f2e7604d51e1baeae199a9e3d2833"
            },
            "downloads": -1,
            "filename": "ocr-toolkits-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "8ae81f4949b1cbb1ea5324dfc0ccb608",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 8822,
            "upload_time": "2024-06-11T09:16:19",
            "upload_time_iso_8601": "2024-06-11T09:16:19.581008Z",
            "url": "https://files.pythonhosted.org/packages/58/4e/46cfac37ce74fc33171ff273b3f58e9c284b65a356ef379c90ec3cb7dfae/ocr-toolkits-0.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-11 09:16:19",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "MetythornPenn",
    "github_project": "ocr-toolkits",
    "github_not_found": true,
    "lcname": "ocr-toolkits"
}