# apsis-ocr
![](/deployment/images/apsis.png)
Apsis-OCR is a mixed-language (Bangla and English) OCR system for printed documents, developed at [Apsis Solutions Limited](https://apsissolutions.com/).

The full system is built from three components:
* Text detection : DBNet
* Text recognition:
    * Bangla Text : ApsisNet
        * ApsisNet is a model developed at Apsis Solutions Limited.
        * It is used by [bbOCR](https://github.com/BengaliAI/bbocr/blob/dev/modules.md) as the recognition model.
        * In the linked [paper](https://arxiv.org/abs/2308.10647), ApsisNet is found to be the best among the available recognition models compared (such as Tesseract and EasyOCR).
    * English Text : SVTR-LCNet
* Text classification (Bangla/English) : DenseNet121
# **Installation**
## **As module/pypi package**
### **cpu installation**
```bash
pip install apsisocr
pip install onnxruntime
pip install fastdeploy-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
```
### **gpu installation**
Using a conda environment is recommended, especially for GPU installations.
* **installing cudatoolkit and cudnn**:
```bash
conda install cudatoolkit
conda install cudnn
```
* **installing packages**
```bash
pip install apsisocr
pip install onnxruntime-gpu
python -m pip install -U fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
```
* **exporting environment variables**
```bash
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
```
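After the GPU setup it can help to confirm that ONNX Runtime actually sees the CUDA device. The following is a minimal sketch and not part of apsisocr: it relies only on `onnxruntime.get_available_providers()`, returns an empty list when ONNX Runtime is not installed, and the helper names `available_providers` and `has_cuda` are hypothetical.

```python
def available_providers():
    """Return the execution providers ONNX Runtime reports, or [] if it is not installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return list(ort.get_available_providers())


def has_cuda(providers):
    """True when the CUDA execution provider is in the given list."""
    return "CUDAExecutionProvider" in providers


if __name__ == "__main__":
    providers = available_providers()
    print("providers:", providers)
    print("CUDA available:", has_cuda(providers))
```

If `CUDAExecutionProvider` is missing after a GPU install, the `LD_LIBRARY_PATH` export above is a reasonable first thing to re-check.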
## **Building from source : Linux/Ubuntu**
Using a conda environment is recommended.
* **clone the repository** :
```bash
git clone https://github.com/mnansary/apsisOCR.git
cd apsisOCR
```
* **create a conda environment**:
```bash
conda create -n apsisocr python=3.9
```
* **activate conda environment**:
```bash
conda activate apsisocr
```
* **cpu installation** :
```bash
bash install.sh cpu
```
* **gpu installation** :
```bash
bash install.sh gpu
```
# Usage
## ApsisNet : Bangla Recognizer
* usage
```python
from apsisocr import ApsisNet
bnocr=ApsisNet()
bnocr.infer(crops)
```
* docstring for ```ApsisNet.infer```
```python
"""
Perform inference on image crops.

Args:
    crops (list[np.ndarray]): List of image crops.
    batch_size (int): Batch size for inference (default: 32).
    normalize_unicode (bool): Flag to normalize unicode (default: True).

Returns:
    list[str]: List of inferred texts.
"""
```
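The `normalize_unicode` flag matters because the same Bangla text can be stored as different code point sequences that render identically, which makes string comparison against ground truth unreliable. The sketch below illustrates the general idea with the standard library's `unicodedata` (NFC form) only. It is an assumption that this resembles what ApsisNet does internally; the library may well use a different, Bangla-specific normalizer. The helper name `nfc` is hypothetical.

```python
import unicodedata


def nfc(text):
    """Normalize text to Unicode NFC form."""
    return unicodedata.normalize("NFC", text)


# The vowel sign O can be stored as one code point (U+09CB)
# or as two (U+09C7 followed by U+09BE); both render the same.
composed = "\u0995\u09cb"          # KA + O-sign
decomposed = "\u0995\u09c7\u09be"  # KA + E-sign + AA-sign

print(composed == decomposed)            # raw strings are compared code point by code point
print(nfc(composed) == nfc(decomposed))  # compared after normalization
```

Normalizing both the recognizer output and the reference text in the same way before comparing them avoids counting such encoding differences as recognition errors.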
## SVTR-LCNet : English Recognizer
```python
from apsisocr import SVTRLCNet
enocr=SVTRLCNet()
enocr.infer(crops)
```
* docstring for ```SVTRLCNet.infer```
```python
"""
Perform inference on image crops.

Args:
    crops (list[np.ndarray]): List of image crops.
    batch_size (int): Batch size for inference.

Returns:
    list[str]: List of recognized texts.
"""
```
## DenseNet121BnEnClassifier : Language classifier
```python
from apsisocr import DenseNet121BnEnClassifier
lang=DenseNet121BnEnClassifier()
lang.infer(crops)
```
* docstring for ```DenseNet121BnEnClassifier.infer```
```python
"""
Perform inference on image crops.

Args:
    crops (list[np.ndarray]): List of image crops.
    batch_size (int): Batch size for inference (default: 32).

Returns:
    list[str]: List of inferred languages.
"""
```
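One way to combine the classifier with the two recognizers is to route each crop by its predicted language, recognize each group in a single batch, and put the texts back in the original order. The sketch below is a hypothetical helper, not part of apsisocr. It takes the recognizers as plain callables so it can be tried without loading any model, and it assumes the classifier returns language codes such as `"bn"` and `"en"`, which this README does not confirm.

```python
def route_and_recognize(crops, langs, recognizers):
    """Recognize each crop with the recognizer registered for its language.

    crops: list of image crops
    langs: list of language codes, one per crop
    recognizers: dict mapping a language code to a callable that takes a
        list of crops and returns a list of texts
    """
    if len(crops) != len(langs):
        raise ValueError("crops and langs must have the same length")
    texts = [""] * len(crops)
    # group crop indices by language so each model sees one batch
    groups = {}
    for idx, lang in enumerate(langs):
        groups.setdefault(lang, []).append(idx)
    for lang, indices in groups.items():
        recognize = recognizers.get(lang)
        if recognize is None:
            continue  # unknown language code: leave the text empty
        outputs = recognize([crops[i] for i in indices])
        for i, text in zip(indices, outputs):
            texts[i] = text
    return texts
```

With the real models this might be called as `route_and_recognize(crops, lang.infer(crops), {"bn": bnocr.infer, "en": enocr.infer})`, again assuming those language codes.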
## PaddleDBNet : Text Detector
* See the official [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) repository for a better understanding of the model.
```python
# initialization
from apsisocr import PaddleDBNet
detector=PaddleDBNet()
# getting word boxes
word_boxes=detector.get_word_boxes(img)
# getting line boxes
line_boxes=detector.get_line_boxes(img)
# getting crop with either of the results
crops=detector.get_crops(img,word_boxes)
```
## ApsisOCR : Overall System
```python
from apsisocr import ApsisOCR
ocr=ApsisOCR()
results=ocr(img_path)
```
* docstring for ```ApsisOCR.__call__```
```python
"""
Perform OCR on an image.

Args:
    img_path (str): Path to the image file.

Returns:
    dict: OCR results containing recognized text and associated information.
    The dictionary has the following structure:
        {
            "text"   : multiline text with newline separators
            "result" : list of dictionaries, each with the following structure:
                {
                    "line_no" : the line number of the word
                    "word_no" : the word number in the line
                    "poly"    : the four-point polygonal bounding box of the word in the image
                    "text"    : the recognized text
                    "lang"    : the classified language code
                }
        }
"""
```
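Given the structure described in the docstring, the per-word entries can be regrouped into lines of text. The sketch below runs on a small hand-written dictionary with made-up values (not real OCR output) and assumes `line_no` and `word_no` are sortable integers. `lines_from_result` is a hypothetical helper, not part of apsisocr.

```python
from collections import defaultdict


def lines_from_result(results):
    """Group word entries by line_no and join them in word_no order."""
    lines = defaultdict(list)
    for word in results["result"]:
        lines[word["line_no"]].append(word)
    ordered = []
    for line_no in sorted(lines):
        words = sorted(lines[line_no], key=lambda w: w["word_no"])
        ordered.append(" ".join(w["text"] for w in words))
    return ordered


# Made-up example in the documented shape; the "en" code and the
# polygon coordinates are placeholders, not values from the library.
sample = {
    "text": "hello world\nOCR",
    "result": [
        {"line_no": 2, "word_no": 1, "text": "OCR", "lang": "en",
         "poly": [[0, 20], [30, 20], [30, 30], [0, 30]]},
        {"line_no": 1, "word_no": 2, "text": "world", "lang": "en",
         "poly": [[50, 0], [95, 0], [95, 10], [50, 10]]},
        {"line_no": 1, "word_no": 1, "text": "hello", "lang": "en",
         "poly": [[0, 0], [45, 0], [45, 10], [0, 10]]},
    ],
}

print(lines_from_result(sample))  # ['hello world', 'OCR']
```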
**See ```useage/useage.ipynb``` for examples.**
# **Deployment**
* ```cd deployment```: change directory to deployment folder
* change the configs as required in ```config.py```
```python
# This port will be used by the api_ocr.py
OCR_API_PORT=3032
# This API address is displayed after deploying api_ocr.py and is used in app.py
OCR_API="http://172.20.4.53:3032/ocr"
```
* running the api and app:
```bash
python api_ocr.py # deploys at port 3032 by default
streamlit run app.py --server.port 3033 # deploys the Streamlit frontend at port 3033
```
* The **api_ocr.py** script lets the API be used without any UI (a Postman screenshot is shown below)
![](/deployment/images/api_ocr.png)
* The **app.py** runs a streamlit UI
![](/deployment/images/app.png)
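The API can also be called from a script instead of Postman. The sketch below uses only the standard library and makes several assumptions that this README does not confirm: that the endpoint accepts a `multipart/form-data` POST, that the upload field is named `file`, and that the response body is JSON. Check `api_ocr.py` for the actual request format before relying on it. `build_multipart` and `post_image` are hypothetical helpers.

```python
import json
import os
import uuid
import urllib.request


def build_multipart(field_name, filename, data, mime="application/octet-stream"):
    """Encode one file upload as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_image(api_url, image_path, field_name="file"):
    """POST an image to the OCR API and decode the reply as JSON (assumed format)."""
    with open(image_path, "rb") as fh:
        body, content_type = build_multipart(
            field_name, os.path.basename(image_path), fh.read()
        )
    request = urllib.request.Request(
        api_url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call might look like `post_image("http://172.20.4.53:3032/ocr", "page.png")`, where the URL is the `OCR_API` value from `config.py` and `page.png` is a placeholder file name.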
**TESTED GPU INFERENCE SERVER CONFIG**
```python
OS : Ubuntu 20.04.6 LTS
Memory : 62.4 GiB
Processor : Intel® Xeon(R) Silver 4214R CPU @ 2.40GHz × 24
Graphics : NVIDIA RTX A6000/PCIe/SSE2
Gnome : 3.36.8
```
# License
Contents of this repository are restricted to non-commercial research purposes only under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/).
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a>
Change Log
===========
0.0.1 (24/09/2023)
-------------------
- added usage and examples
0.0.2 (24/09/2023)
-------------------
- added licensing
0.0.3 (05/11/2023)
-------------------
- added base application without line and word number
0.0.4 (01/05/2024)
-------------------
- apsisbnocr
0.0.5 (22/11/2024)
------------------
- added Null handling to apsisbnocr
0.0.6 (22/11/2024)
------------------
- added Null handling to apsisbnocr - output typo update
0.0.7 (22/11/2024)
------------------
- BNbaseOCR -> location and text only
Raw data
{
"_id": null,
"home_page": "https://github.com/mnansary/apsis-ocr/",
"name": "apsisocr",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "ocr, multilingual-ocr, scene ocr, apsisnet",
"author": "Nazmuddoha Ansary",
"author_email": "nazmuddoha.ansary@apsissolutions.com",
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Bangla English Mixed Languge Scene / printed OCR Toolkit",
"version": "0.0.7",
"project_urls": {
"Homepage": "https://github.com/mnansary/apsis-ocr/"
},
"split_keywords": [
"ocr",
" multilingual-ocr",
" scene ocr",
" apsisnet"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "9a351e2b960548f439589b840acc9bddbba9a0e59697de11a79f9ca683871217",
"md5": "e03d3d54cc1864458ccfa7ac38ec1743",
"sha256": "a8d4b8596c0336baa43b31db99ba6775eae72bbe2315fcaf8f13acff5e362f56"
},
"downloads": -1,
"filename": "apsisocr-0.0.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e03d3d54cc1864458ccfa7ac38ec1743",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 22568,
"upload_time": "2024-11-21T23:24:39",
"upload_time_iso_8601": "2024-11-21T23:24:39.124369Z",
"url": "https://files.pythonhosted.org/packages/9a/35/1e2b960548f439589b840acc9bddbba9a0e59697de11a79f9ca683871217/apsisocr-0.0.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-21 23:24:39",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "mnansary",
"github_project": "apsis-ocr",
"github_not_found": true,
"lcname": "apsisocr"
}