# Vietnamese Handwriting Text Recognition (aka vnhtr package)
This project deploys and improves on two foundational models, one from [TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr) and one from [VietOCR](https://github.com/pbcquoc/vietocr).
## Proposed Architecture
### VGG Transformer with Rethinking Head
![VGG Transformer with Rethinking Head](https://github.com/nguyenhoanganh2002/vnhtr/assets/79850337/82876cdd-b84a-47da-9339-6362bd0400d1)
### TrOCR with Rethinking Head
![TrOCR with Rethinking Head](https://github.com/nguyenhoanganh2002/vnhtr/assets/79850337/9295c94f-5059-4a03-a3f3-950e0ab92e30)
## Usage
### `vnhtr` package
```bash
pip install vnhtr
```
```python
from PIL import Image
from vnhtr.vnhtr_script.tools import VGGTransformer, TrOCR

# Load each predictor onto the first CUDA device.
vta_predictor = VGGTransformer("cuda:0")
tra_predictor = TrOCR("cuda:0")

# predict() takes a list of PIL images; open the sample once and reuse it.
image = Image.open("/content/out_sample_2.jpg")
vta_predictor.predict([image])
tra_predictor.predict([image])
```
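Since `predict` takes a list of images, a larger set can be fed to it in fixed-size batches. The helper below is a minimal sketch of that idea, not part of the `vnhtr` package: `chunked` and `predict_in_batches` are hypothetical names, and the code assumes that `predict` returns one result per input image, in order, which this README does not confirm.

```python
from typing import Any, Callable, Iterator, List, Sequence


def chunked(items: Sequence[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def predict_in_batches(
    predict: Callable[[List[Any]], Sequence[Any]],
    images: Sequence[Any],
    batch_size: int = 8,
) -> List[Any]:
    """Call `predict` on each batch and collect the results in input order.

    Assumption: `predict` returns one result per image in the batch.
    """
    results: List[Any] = []
    for batch in chunked(images, batch_size):
        results.extend(predict(batch))
    return results


# Hypothetical usage with the predictors from the example above:
#   images = [Image.open(path) for path in image_paths]
#   texts = predict_in_batches(vta_predictor.predict, images, batch_size=8)
```

A small batch size keeps GPU memory use bounded; the best value depends on the image sizes and the device, so treat `8` as a placeholder.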
### Full implementation
```bash
git clone https://github.com/nguyenhoanganh2002/vnhtr
cd ./vnhtr/vnhtr/source
pip install -r requirements.txt
```
* Pretrain/Finetune VGG Transformer/TrOCR (pretraining on a large dataset and then finetuning on a wild dataset)
```bash
python VGGTransformer/train.py
python VisionEncoderDecoder/train.py
```
* Pretrain VGG Transformer/TrOCR with Rethinking Head (large dataset)
```bash
python VGGTransformer/adapter_trainer.py
python VisionEncoderDecoder/adapter_trainer.py
```
* Finetune VGG Transformer/TrOCR with Rethinking Head (wild dataset)
```bash
python VGGTransformer/finetune.py
python VisionEncoderDecoder/finetune.py
```
* Instantiate the models directly, without going through the training or finetuning phases.
```python
from VGGTransformer.config import config as vggtransformer_cf
from VGGTransformer.models import VGGTransformer, AdapterVGGTransformer
from VisionEncoderDecoder.config import config as trocr_cf
from VisionEncoderDecoder.model import VNTrOCR, AdapterVNTrOCR
vt_base = VGGTransformer(vggtransformer_cf)
vt_adapter = AdapterVGGTransformer(vggtransformer_cf)
tr_base = VNTrOCR(trocr_cf)
tr_adapter = AdapterVNTrOCR(trocr_cf)
```
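The training scripts listed above can also be chained from a single Python entry point. The following is a rough sketch under assumptions: it takes the stages to be run in the order they appear in this README (the README does not say whether that order is required), it must be launched from `vnhtr/vnhtr/source`, and `MODEL_DIRS`, `build_commands`, and `run_stages` are hypothetical names that do not exist in the repository.

```python
import subprocess
import sys
from typing import Dict, List

# Script names exactly as listed in this README, in the order they appear.
STAGE_SCRIPTS: List[str] = ["train.py", "adapter_trainer.py", "finetune.py"]

# Hypothetical short keys mapped to the repository's model directories.
MODEL_DIRS: Dict[str, str] = {
    "vgg": "VGGTransformer",
    "trocr": "VisionEncoderDecoder",
}


def build_commands(model: str) -> List[List[str]]:
    """Return one command per stage for the chosen model family."""
    model_dir = MODEL_DIRS[model]
    return [[sys.executable, f"{model_dir}/{script}"] for script in STAGE_SCRIPTS]


def run_stages(model: str, dry_run: bool = True) -> None:
    """Print each stage command and, unless dry_run is set, execute it."""
    for command in build_commands(model):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the chain as soon as one stage fails.
            subprocess.run(command, check=True)
```

Calling `run_stages("vgg")` only prints the commands; pass `dry_run=False` to execute them.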
For access to the full dataset and pretrained weights, please contact: [anh.nh204511@gmail.com](mailto:anh.nh204511@gmail.com)
Raw data
{
"_id": null,
"home_page": "https://github.com/nguyenhoanganh2002/vnhtr",
"name": "vnhtr",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "ocr,vnocr,htr,vnhtr",
"author": "nguyenhoanganh2002",
"author_email": "anh.nh204511@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/c9/ee/0e75d3c39b87df9daacee0e04f640f014439663e62cc6f7f515f237bb046/vnhtr-0.1.8.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Encoder-Decoder base for Vietnamese handwriting recognition",
"version": "0.1.8",
"project_urls": {
"Homepage": "https://github.com/nguyenhoanganh2002/vnhtr"
},
"split_keywords": [
"ocr",
"vnocr",
"htr",
"vnhtr"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "65dba81df657c54395cb3b716cb833caf4989dbe32b617955a88e3435068ca17",
"md5": "6bb979c4e3ccaa67d7b0e190b4345334",
"sha256": "88dab8c51e4d641a6de8127dd8d3902f30198235ef976b05e6b1bd0c75d28725"
},
"downloads": -1,
"filename": "vnhtr-0.1.8-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6bb979c4e3ccaa67d7b0e190b4345334",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 50592,
"upload_time": "2024-01-13T13:57:57",
"upload_time_iso_8601": "2024-01-13T13:57:57.196623Z",
"url": "https://files.pythonhosted.org/packages/65/db/a81df657c54395cb3b716cb833caf4989dbe32b617955a88e3435068ca17/vnhtr-0.1.8-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c9ee0e75d3c39b87df9daacee0e04f640f014439663e62cc6f7f515f237bb046",
"md5": "4fd62697b313f397a99ec65c94a234ba",
"sha256": "39bb0fe41c4ed1d6f2a3bf6e879aaafbc53ba7eaeb75e7d64c6448d860215d19"
},
"downloads": -1,
"filename": "vnhtr-0.1.8.tar.gz",
"has_sig": false,
"md5_digest": "4fd62697b313f397a99ec65c94a234ba",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 32903,
"upload_time": "2024-01-13T13:57:59",
"upload_time_iso_8601": "2024-01-13T13:57:59.472876Z",
"url": "https://files.pythonhosted.org/packages/c9/ee/0e75d3c39b87df9daacee0e04f640f014439663e62cc6f7f515f237bb046/vnhtr-0.1.8.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-01-13 13:57:59",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "nguyenhoanganh2002",
"github_project": "vnhtr",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "vnhtr"
}