apsisocr

Name	apsisocr
Version	0.0.7
home_page	https://github.com/mnansary/apsis-ocr/
Summary	Bangla English Mixed Language Scene / printed OCR Toolkit
upload_time	2024-11-21 23:24:39
maintainer	None
docs_url	None
author	Nazmuddoha Ansary
requires_python	None
license	MIT
keywords	ocr multilingual-ocr scene ocr apsisnet
requirements	No requirements were recorded.

# apsis-ocr
![](/deployment/images/apsis.png) 

Apsis-OCR is a mixed-language (Bangla and English) OCR system for printed documents, developed at [Apsis Solutions Limited](https://apsissolutions.com/).

The full system is built from three components:
* Text detection: DBNet
* Text recognition:
    * Bangla text: ApsisNet
        * ApsisNet is a model developed at Apsis Solutions Limited.
        * It is used as the recognition model by [bbOCR](https://github.com/BengaliAI/bbocr/blob/dev/modules.md).
        * In the linked [paper](https://arxiv.org/abs/2308.10647), ApsisNet performed best among the available recognition models compared (such as Tesseract and EasyOCR).
    * English text: SVTR-LCNet
* Text (language) classification: DenseNet121


# **Installation**


## **As a module / PyPI package**
### **CPU installation**

```bash
pip install apsisocr
pip install onnxruntime
pip install fastdeploy-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
```

### **GPU installation**

A conda environment is recommended, especially for GPU installation.

* **Install cudatoolkit and cuDNN**:

```bash
conda install cudatoolkit
conda install cudnn
```

* **Install the packages**:

```bash
pip install apsisocr
pip install onnxruntime-gpu
python -m pip install -U fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
```

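* **Export environment variables**: the command below appends an `LD_LIBRARY_PATH` export to the environment's `activate.d` script, which conda runs on activation, so re-activate the environment afterwards for it to take effect. (`$CUDNN_PATH` is not set by the steps above; point it at your cuDNN installation if that lives outside the conda environment.)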

```bash
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
```
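After the GPU setup, one quick check is to ask ONNX Runtime which execution providers the installed build offers. `onnxruntime.get_available_providers()` lists the providers compiled into the build, so it confirms that the GPU package is the one installed, but it does not prove that the CUDA and cuDNN libraries will load; such problems typically surface only when an `InferenceSession` is created. This covers onnxruntime only (not fastdeploy), and the helper name is illustrative, not part of apsisocr.

```python
# Sanity check: does the installed onnxruntime build offer the CUDA provider?
def cuda_provider_available() -> bool:
    """Return True if onnxruntime lists CUDAExecutionProvider, False otherwise."""
    try:
        import onnxruntime as ort
    except ImportError:
        # neither onnxruntime nor onnxruntime-gpu is installed here
        return False
    return "CUDAExecutionProvider" in ort.get_available_providers()


if __name__ == "__main__":
    print("CUDA provider available:", cuda_provider_available())
```

If this prints `False` inside the GPU environment, the CPU-only `onnxruntime` package may have been installed alongside or instead of `onnxruntime-gpu`.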

## **Building from source: Linux/Ubuntu**
A conda environment is recommended.

* **Clone the repository**:
```bash
git clone https://github.com/mnansary/apsisOCR.git
cd apsisOCR
```


* **Create a conda environment**:

```bash
conda create -n apsisocr python=3.9
```

* **Activate the conda environment**:

```bash
conda activate apsisocr
```

* **CPU installation**:

```bash
bash install.sh cpu
```

* **GPU installation**:

```bash
bash install.sh gpu
```

# Usage


## ApsisNet: Bangla Recognizer

* Usage:
```python
from apsisocr import ApsisNet

bnocr = ApsisNet()
# crops: list of word images (np.ndarray), e.g. from PaddleDBNet.get_crops
texts = bnocr.infer(crops)
```
* docstring for ```ApsisNet.infer```

```python
"""
Perform inference on image crops.

Args:
    crops (list[np.ndarray]): List of image crops.
    batch_size (int): Batch size for inference (default: 32).
    normalize_unicode (bool): Flag to normalize unicode (default: True).

Returns:
    list[str]: List of inferred texts.
"""
```
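The `batch_size` argument bounds how many crops go through the model in one forward pass. The helper below is a hypothetical sketch of that kind of order-preserving chunking (`iter_batches` is not part of apsisocr, and this is not ApsisNet's internal code):

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 32) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items, keeping input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 70 crops with the default batch_size of 32 -> three forward passes
sizes = [len(b) for b in iter_batches(list(range(70)), batch_size=32)]
print(sizes)  # [32, 32, 6]
```

Larger batches generally mean fewer passes at the cost of more memory per pass, so `batch_size` is the knob to lower if inference runs out of GPU memory.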

## SVTR-LCNet: English Recognizer

```python
from apsisocr import SVTRLCNet

enocr = SVTRLCNet()
# crops: list of word images (np.ndarray), e.g. from PaddleDBNet.get_crops
texts = enocr.infer(crops)
```

* docstring for ```SVTRLCNet.infer```

```python
"""
Perform inference on image crops.

Args:
    crops (list[np.ndarray]): List of image crops.
    batch_size (int): Batch size for inference.

Returns:
    list[str]: List of recognized texts.
"""
```


## DenseNet121BnEnClassifier: Language Classifier

```python
from apsisocr import DenseNet121BnEnClassifier

lang = DenseNet121BnEnClassifier()
# crops: list of word images (np.ndarray); returns one language label per crop
langs = lang.infer(crops)
```

* docstring for ```DenseNet121BnEnClassifier.infer```

```python
"""
Perform inference on image crops.

Args:
    crops (list[np.ndarray]): List of image crops.
    batch_size (int): Batch size for inference (default: 32).

Returns:
    list[str]: List of inferred languages.
"""
```
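The classifier's labels can be used to send each crop to the matching recognizer. The sketch below batches crops per language and restores the original order. The function name `recognize_by_language` is hypothetical, the `"bn"`/`"en"` label strings are assumptions (the docstring above does not list the exact codes), and this is not necessarily how `ApsisOCR` does it internally.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

Crop = TypeVar("Crop")  # np.ndarray in practice


def recognize_by_language(
    crops: Sequence[Crop],
    langs: Sequence[str],
    recognizers: Dict[str, Callable[[List[Crop]], List[str]]],
) -> List[str]:
    """Send each crop to the recognizer for its language (one batched call per
    language) and return the texts in the original crop order."""
    if len(crops) != len(langs):
        raise ValueError("crops and langs must have the same length")
    texts: List[str] = [""] * len(crops)  # crops with no matching recognizer stay ""
    for lang, infer in recognizers.items():
        # positions of all crops classified as this language
        idx = [i for i, label in enumerate(langs) if label == lang]
        if not idx:
            continue
        outputs = infer([crops[i] for i in idx])
        for i, text in zip(idx, outputs):
            texts[i] = text
    return texts


if __name__ == "__main__":
    demo = recognize_by_language(
        ["c0", "c1", "c2"],
        ["en", "bn", "en"],
        {"bn": lambda xs: [f"bn:{x}" for x in xs], "en": lambda xs: [f"en:{x}" for x in xs]},
    )
    print(demo)  # ['en:c0', 'bn:c1', 'en:c2']
```

With the objects from the sections above, a call might look like `recognize_by_language(crops, lang.infer(crops), {"bn": bnocr.infer, "en": enocr.infer})`, with the keys adjusted to whatever labels the classifier actually returns.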

## PaddleDBNet: Text Detector

* See the official [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) repository for details of the model.

```python
from apsisocr import PaddleDBNet

# initialization
detector = PaddleDBNet()
# img: the input image (see useage/useage.ipynb for how it is loaded)
# word-level boxes
word_boxes = detector.get_word_boxes(img)
# line-level boxes
line_boxes = detector.get_line_boxes(img)
# crop regions using either set of boxes (word boxes here)
crops = detector.get_crops(img, word_boxes)
```
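`get_crops` turns detected boxes into crop images. As an illustration of the idea only (not the library's implementation), the sketch below crops the axis-aligned bounding rectangle of a four-point polygon with NumPy, assuming each box is four `(x, y)` pixel coordinates like the `poly` field described under ApsisOCR. `crop_polygon_bbox` is a hypothetical name.

```python
import numpy as np


def crop_polygon_bbox(img: np.ndarray, poly) -> np.ndarray:
    """Crop the axis-aligned bounding rectangle of a polygon given as (x, y) points."""
    pts = np.asarray(poly, dtype=np.float64).reshape(-1, 2)
    h, w = img.shape[:2]
    # round outwards, then clamp to the image so out-of-frame boxes do not fail
    x0 = int(np.clip(np.floor(pts[:, 0].min()), 0, w))
    x1 = int(np.clip(np.ceil(pts[:, 0].max()), 0, w))
    y0 = int(np.clip(np.floor(pts[:, 1].min()), 0, h))
    y1 = int(np.clip(np.ceil(pts[:, 1].max()), 0, h))
    return img[y0:y1, x0:x1]


if __name__ == "__main__":
    img = np.zeros((100, 200, 3), dtype=np.uint8)
    box = [[10, 20], [60, 20], [60, 40], [10, 40]]
    print(crop_polygon_bbox(img, box).shape)  # (20, 50, 3)
```

Axis-aligned cropping is a simplification: for rotated or skewed text, a perspective warp of the quadrilateral usually gives cleaner crops.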

## ApsisOCR: Overall System

```python
from apsisocr import ApsisOCR

ocr = ApsisOCR()
# img_path: path to an image file
results = ocr(img_path)
```

* docstring for ```ApsisOCR.__call__```

```python
"""
Perform OCR on an image.

Args:
    img_path (str): Path to the image file.

Returns:
    dict: OCR results containing the recognized text and associated information,
    with the following structure:
        {
            "text"   : multiline text with newline separators
            "result" : list of dictionaries, each with the following structure:
                {
                    "line_no" : the line number of the word
                    "word_no" : the word number within the line
                    "poly"    : the four-point polygonal bounding box of the word in the image
                    "text"    : the recognized text
                    "lang"    : the classified language code
                }
        }
"""
```
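Each entry in `"result"` carries `line_no` and `word_no`, so the words can be regrouped into lines, for example to re-lay-out the text after filtering by `lang`. A minimal sketch, assuming both numbers are integers and that words are joined with single spaces (the library's own `"text"` field may use different spacing); `rebuild_text` is a hypothetical helper, not part of apsisocr.

```python
from collections import defaultdict
from typing import Dict, List


def rebuild_text(words: List[dict]) -> str:
    """Join word entries into lines: lines sorted by line_no, words by word_no."""
    lines: Dict[int, List[dict]] = defaultdict(list)
    for w in words:
        lines[w["line_no"]].append(w)
    return "\n".join(
        " ".join(w["text"] for w in sorted(lines[n], key=lambda w: w["word_no"]))
        for n in sorted(lines)
    )


# hypothetical entries in the shape of results["result"] (other keys omitted)
sample = [
    {"line_no": 1, "word_no": 2, "text": "world"},
    {"line_no": 0, "word_no": 1, "text": "OCR"},
    {"line_no": 1, "word_no": 1, "text": "hello"},
    {"line_no": 0, "word_no": 0, "text": "Apsis"},
]
print(rebuild_text(sample))
```

With real output this would be called as `rebuild_text(results["result"])`.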

**See ```useage/useage.ipynb``` for examples.**


# **Deployment**
* ```cd deployment```: change to the deployment folder
* Edit the settings in ```config.py``` as required:

```python
# Port on which api_ocr.py serves the OCR API
OCR_API_PORT=3032
# API address shown when api_ocr.py starts; app.py sends its requests here
OCR_API="http://172.20.4.53:3032/ocr"
```
* Run the API and the app:

```bash
python api_ocr.py  # serves the API on port 3032 by default
streamlit run app.py --server.port 3033  # serves the Streamlit frontend on port 3033
```
* **api_ocr.py** exposes the OCR system as an API that can be used without any UI (a Postman screenshot is shown below)

![](/deployment/images/api_ocr.png) 

* **app.py** runs a Streamlit UI

![](/deployment/images/app.png) 
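Because `OCR_API` in `config.py` repeats the port already set in `OCR_API_PORT`, the two can drift apart when only one is edited. One option, sketched under the assumption that `config.py` is plain module-level Python assignments (as the snippet above suggests), is to derive the URL from the port. `OCR_API_HOST` is a hypothetical name, not an existing setting.

```python
# config.py (sketch): build OCR_API from a single port setting so they stay in sync.
# OCR_API_HOST is hypothetical: the address of the machine running api_ocr.py.
OCR_API_HOST = "172.20.4.53"
OCR_API_PORT = 3032
OCR_API = f"http://{OCR_API_HOST}:{OCR_API_PORT}/ocr"
```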


**TESTED GPU INFERENCE SERVER CONFIG**  

```text
OS          : Ubuntu 20.04.6 LTS      
Memory      : 62.4 GiB 
Processor   : Intel® Xeon(R) Silver 4214R CPU @ 2.40GHz × 24    
Graphics    : NVIDIA RTX A6000/PCIe/SSE2
Gnome       : 3.36.8
```
# License
Contents of this repository are restricted to non-commercial research purposes only under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). 

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a>



Change Log
===========

0.0.1 (24/09/2023)
-------------------
- added usage and examples

0.0.2 (24/09/2023)
-------------------
- added licensing

0.0.3 (05/11/2023)
-------------------
- added base application (without line and word numbers)

0.0.4 (01/05/2024)
-------------------
- apsisbnocr

0.0.5 (22/11/2024)
------------------
- added Null handling to apsisbnocr

0.0.6 (22/11/2024)
------------------
- added Null handling to apsisbnocr - output typo update

0.0.7 (22/11/2024)
------------------
- BNbaseOCR -> location and text only

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/mnansary/apsis-ocr/",
    "name": "apsisocr",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "ocr, multilingual-ocr, scene ocr, apsisnet",
    "author": "Nazmuddoha Ansary",
    "author_email": "nazmuddoha.ansary@apsissolutions.com",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Bangla English Mixed Languge Scene / printed OCR Toolkit",
    "version": "0.0.7",
    "project_urls": {
        "Homepage": "https://github.com/mnansary/apsis-ocr/"
    },
    "split_keywords": [
        "ocr",
        " multilingual-ocr",
        " scene ocr",
        " apsisnet"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9a351e2b960548f439589b840acc9bddbba9a0e59697de11a79f9ca683871217",
                "md5": "e03d3d54cc1864458ccfa7ac38ec1743",
                "sha256": "a8d4b8596c0336baa43b31db99ba6775eae72bbe2315fcaf8f13acff5e362f56"
            },
            "downloads": -1,
            "filename": "apsisocr-0.0.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e03d3d54cc1864458ccfa7ac38ec1743",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 22568,
            "upload_time": "2024-11-21T23:24:39",
            "upload_time_iso_8601": "2024-11-21T23:24:39.124369Z",
            "url": "https://files.pythonhosted.org/packages/9a/35/1e2b960548f439589b840acc9bddbba9a0e59697de11a79f9ca683871217/apsisocr-0.0.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-21 23:24:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mnansary",
    "github_project": "apsis-ocr",
    "github_not_found": true,
    "lcname": "apsisocr"
}

Nazmuddoha Ansary