apsisocr


* **Name**: apsisocr
* **Version**: 0.0.3
* **Home page**: https://github.com/mnansary/apsis-ocr/
* **Summary**: Bangla English Mixed Language Scene / printed OCR Toolkit
* **Upload time**: 2023-11-05 15:32:42
* **Author**: Nazmuddoha Ansary
* **License**: MIT
* **Keywords**: ocr, multilingual-ocr, scene ocr, apsisnet

# apsis-ocr
![](/deployment/images/apsis.png) 

Apsis-OCR is a mixed-language (Bangla and English) OCR system for printed documents, developed at [Apsis Solutions Limited](https://apsissolutions.com/).

The full system is built from three components:
* Text detection : DBNet
* Text recognition:
    * Bangla Text : ApsisNet 
        * ApsisNet is a model developed at Apsis Solutions Limited. 
        * It is used by [bbOCR](https://github.com/BengaliAI/bbocr/blob/dev/modules.md) as the recognition model 
        * The linked [paper](https://arxiv.org/abs/2308.10647) reports ApsisNet as the best of the available recognition models it compares (such as Tesseract and EasyOCR).
    * English Text : SVTR-LCNet
* Text classification : DenseNet121    


# **Installation**


## **As module/pypi package**
### **cpu installation**

```bash
pip install apsisocr
pip install onnxruntime
pip install fastdeploy-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
```
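
After running the commands above, a quick check that the packages are visible to the interpreter can save a confusing import error later. The following is a minimal sketch, not part of apsisocr. The import names are assumptions: `apsisocr` and `onnxruntime` mirror the pip package names, and `fastdeploy` is assumed to be the module installed by `fastdeploy-python`.

```python
import importlib.util

# Assumed import names (see the note above); adjust if your install differs.
MODULES = ("apsisocr", "onnxruntime", "fastdeploy")


def check_modules(names=MODULES):
    """Return {module_name: bool} by locating each module without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```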

### **gpu installation**

Using a conda environment is recommended, especially for GPU installations.

* **installing cudatoolkit and cudnn**: 

```bash
conda install cudatoolkit
conda install cudnn
```

* **installing packages**

```bash
pip install apsisocr
pip install onnxruntime-gpu
python -m pip install -U fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
```

* **exporting environment variables**

```bash
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
```
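
Scripts in `activate.d` only run when the environment is activated, so the export takes effect after the next `conda activate`. The sketch below is one way to confirm, from Python, that the expected directories ended up on `LD_LIBRARY_PATH`. The helper name `missing_library_dirs` is hypothetical, and the check only looks at the two variables used in the export line above.

```python
import os


def missing_library_dirs(env=None):
    """Return the expected library directories that are absent from LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    # Split the search path and ignore trailing slashes when comparing.
    current = [p.rstrip("/") for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    expected = []
    if env.get("CONDA_PREFIX"):
        expected.append(os.path.join(env["CONDA_PREFIX"], "lib"))
    if env.get("CUDNN_PATH"):
        expected.append(os.path.join(env["CUDNN_PATH"], "lib"))
    return [d for d in expected if d.rstrip("/") not in current]


if __name__ == "__main__":
    print(missing_library_dirs() or "LD_LIBRARY_PATH contains the expected directories")
```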

## **Building from source : Linux/Ubuntu**
Using a conda environment is recommended.

* **clone the repository** : 
```bash
git clone https://github.com/mnansary/apsisOCR.git
cd apsisOCR
```


* **create a conda environment**: 

```bash
conda create -n apsisocr python=3.9
```

* **activate conda environment**: 

```bash
conda activate apsisocr

```
* **cpu installation**  :

```bash
bash install.sh cpu
``` 
* **gpu installation**  :
    
```bash
bash install.sh gpu
``` 

# Usage


## ApsisNet : Bangla Recognizer

* usage
```python
from apsisocr import ApsisNet
bnocr=ApsisNet()
# crops: list of np.ndarray image crops (e.g. from PaddleDBNet.get_crops)
texts=bnocr.infer(crops)
```
* docstring for ```ApsisNet.infer```

```python
"""
Perform inference on image crops.

Args:
    crops (list[np.ndarray]): List of image crops.
    batch_size (int): Batch size for inference (default: 32).
    normalize_unicode (bool): Flag to normalize unicode (default: True).

Returns:
    list[str]: List of inferred texts.
"""
```
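
The `batch_size` argument conventionally means that the crops are processed in consecutive chunks of at most that many items. The sketch below only illustrates that convention in plain Python. It is not ApsisNet's internal code, and the helper name `iter_batches` is hypothetical.

```python
def iter_batches(items, batch_size=32):
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 70 placeholder "crops" split with the documented default of 32
sizes = [len(batch) for batch in iter_batches(list(range(70)), batch_size=32)]
print(sizes)
```

A smaller `batch_size` generally lowers peak memory use at the cost of more model calls.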

## SVTR-LCNet : English Recognizer

```python
from apsisocr import SVTRLCNet
enocr=SVTRLCNet()
enocr.infer(crops)
```

* docstring for ```SVTRLCNet.infer```

```python
"""
Perform inference on image crops.

Args:
    crops (list[np.ndarray]): List of image crops.
    batch_size (int): Batch size for inference.

Returns:
    list[str]: List of recognized texts.
"""
```


## DenseNet121BnEnClassifier : Language classifier

```python
from apsisocr import DenseNet121BnEnClassifier
lang=DenseNet121BnEnClassifier()
lang.infer(crops)
```

* docstring for ```DenseNet121BnEnClassifier.infer```

```python
"""
Perform inference on image crops.

Args:
    crops (list[np.ndarray]): List of image crops.
    batch_size (int): Batch size for inference (default: 32).

Returns:
    list[str]: List of inferred languages.
"""
```
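
The classifier output can be used to send each crop to the matching recognizer. The sketch below shows one way to do that while keeping the original crop order. It is an illustration under assumptions, not the routing used inside `ApsisOCR`: the helper `recognize_mixed` is hypothetical, and the language labels `"bn"` and `"en"` are placeholders for whatever strings `DenseNet121BnEnClassifier.infer` actually returns.

```python
def recognize_mixed(crops, langs, recognizers, default=""):
    """Run each crop through the recognizer for its language, keeping input order."""
    texts = [default] * len(crops)
    # Group crop indices by language so each recognizer is called once.
    groups = {}
    for idx, lang in enumerate(langs):
        groups.setdefault(lang, []).append(idx)
    for lang, indices in groups.items():
        infer = recognizers.get(lang)
        if infer is None:
            continue  # unknown label: keep the default text
        outputs = infer([crops[i] for i in indices])
        for i, text in zip(indices, outputs):
            texts[i] = text
    return texts


# Stub recognizers stand in for the real models so the sketch runs anywhere.
# With the library this could be {"bn": bnocr.infer, "en": enocr.infer},
# assuming the classifier labels match these (hypothetical) keys.
stubs = {
    "bn": lambda batch: [f"bn:{c}" for c in batch],
    "en": lambda batch: [f"en:{c}" for c in batch],
}
print(recognize_mixed(["c0", "c1", "c2"], ["bn", "en", "bn"], stubs))
```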

## PaddleDBNet : Text Detector

* See the official [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) repository for a better understanding of the model

```python
# initialization
from apsisocr import PaddleDBNet
detector=PaddleDBNet()
# getting word boxes
word_boxes=detector.get_word_boxes(img)
# getting line boxes
line_boxes=detector.get_line_boxes(img)
# getting crop with either of the results
crops=detector.get_crops(img,word_boxes)
```

## ApsisOCR : Overall System

```python
from apsisocr import ApsisOCR
ocr=ApsisOCR()
results=ocr(img_path)
```

* docstring for ```ApsisOCR.__call__```

```python
"""
Perform OCR on an image.

Args:
    img_path (str): Path to the image file.

Returns:
    dict: OCR results containing recognized text and associated information. The dictionary has the following structure:
            {
            "text" : multiline text with newline separators
            "result" : list of dictionaries, each with the following structure:
                        {
                        "line_no" : the line number of the word
                        "word_no" : the word number in the line 
                        "poly"    : the four point polygonal bounding box of the word in the image
                        "text"    : the recognized text 
                        "lang"    : the classified language code
                        }
            }
"""  
```
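
The per-word entries in `"result"` can be regrouped into lines using `line_no` and `word_no`. The sketch below assumes every entry carries both keys as documented above. The sample dictionary is hand-written in the documented shape and is not real OCR output: its polygon coordinates and the `"en"` language code are placeholders.

```python
from collections import defaultdict


def lines_from_result(words):
    """Group word entries by line_no and join their text in word_no order."""
    lines = defaultdict(list)
    for word in words:
        lines[word["line_no"]].append(word)
    return [
        " ".join(w["text"] for w in sorted(lines[no], key=lambda w: w["word_no"]))
        for no in sorted(lines)
    ]


# Hand-written sample (placeholder values), deliberately out of order.
sample = {
    "text": "hello world\nfoo",
    "result": [
        {"line_no": 1, "word_no": 2, "poly": [[60, 0], [110, 0], [110, 20], [60, 20]],
         "text": "world", "lang": "en"},
        {"line_no": 1, "word_no": 1, "poly": [[0, 0], [50, 0], [50, 20], [0, 20]],
         "text": "hello", "lang": "en"},
        {"line_no": 2, "word_no": 1, "poly": [[0, 30], [30, 30], [30, 50], [0, 50]],
         "text": "foo", "lang": "en"},
    ],
}
print("\n".join(lines_from_result(sample["result"])))
```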

**check ```useage/useage.ipynb``` for examples**


# **Deployment**
* ```cd deployment```: change directory to deployment folder
* change the configs as required in ```config.py```

```python
# This port will be used by the api_ocr.py 
OCR_API_PORT=3032
# This API address is displayed after deploying api_ocr.py and is used in app.py
OCR_API="http://172.20.4.53:3032/ocr"
```
* running the api and app:

```bash
python api_ocr.py # deploys at port 3032 by default
streamlit run app.py --server.port 3033 # serves the Streamlit frontend on port 3033
```
* **api_ocr.py** exposes the API so it can be used without any UI (a Postman screenshot is shown below)

![](/deployment/images/api_ocr.png) 

* The **app.py** runs a streamlit UI 

![](/deployment/images/app.png) 
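
Before pointing `app.py` at the API, it can help to confirm that something is listening on the configured address. The sketch below only tests TCP reachability of the host and port in a URL such as `OCR_API`. It makes no assumption about the request format of the `/ocr` endpoint, and the helper name `api_reachable` is hypothetical.

```python
import socket
from urllib.parse import urlparse


def api_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call, using the value of OCR_API from config.py:
# api_reachable("http://172.20.4.53:3032/ocr")
```

A `True` result only shows that the port accepts connections, not that the OCR models loaded correctly.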


**TESTED GPU INFERENCE SERVER CONFIG**  

```
OS          : Ubuntu 20.04.6 LTS      
Memory      : 62.4 GiB 
Processor   : Intel® Xeon(R) Silver 4214R CPU @ 2.40GHz × 24    
Graphics    : NVIDIA RTX A6000/PCIe/SSE2
Gnome       : 3.36.8
```
# License
Contents of this repository are restricted to non-commercial research purposes only under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). 

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a>



Change Log
===========

0.0.1 (24/09/2023)
-------------------
- added usage and examples

0.0.2 (24/09/2023)
-------------------
- added licensing

0.0.3 (05/11/2023)
-------------------
- added base application without line and word number

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/mnansary/apsis-ocr/",
    "name": "apsisocr",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "ocr,multilingual-ocr,scene ocr,apsisnet",
    "author": "Nazmuddoha Ansary",
    "author_email": "nazmuddoha.ansary@apsissolutions.com",
    "download_url": "https://files.pythonhosted.org/packages/cd/e1/a202d1e113570b6177e10edf4ed59fb4ce2291d9fdc6706ce3898121ecc7/apsisocr-0.0.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Bangla English Mixed Languge Scene / printed OCR Toolkit",
    "version": "0.0.3",
    "project_urls": {
        "Homepage": "https://github.com/mnansary/apsis-ocr/"
    },
    "split_keywords": [
        "ocr",
        "multilingual-ocr",
        "scene ocr",
        "apsisnet"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a9e838469270895ac4f67c7175c4a22b63343be9f40688639cf7dd0a78b33317",
                "md5": "93dfec66652123388483c56cfff4f9f2",
                "sha256": "5ad136c873d9a562a6e20886ca001c5875c8b13922c3b72705a6c135a15f9584"
            },
            "downloads": -1,
            "filename": "apsisocr-0.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "93dfec66652123388483c56cfff4f9f2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 18862,
            "upload_time": "2023-11-05T15:32:39",
            "upload_time_iso_8601": "2023-11-05T15:32:39.303454Z",
            "url": "https://files.pythonhosted.org/packages/a9/e8/38469270895ac4f67c7175c4a22b63343be9f40688639cf7dd0a78b33317/apsisocr-0.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cde1a202d1e113570b6177e10edf4ed59fb4ce2291d9fdc6706ce3898121ecc7",
                "md5": "0dbc0dbdc36c2af590a2d12bf947927c",
                "sha256": "36dd7d4f1916e846ff66a258d92fd34a90f37662239db9c8bbb65c2f5241fe98"
            },
            "downloads": -1,
            "filename": "apsisocr-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "0dbc0dbdc36c2af590a2d12bf947927c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 19180,
            "upload_time": "2023-11-05T15:32:42",
            "upload_time_iso_8601": "2023-11-05T15:32:42.541849Z",
            "url": "https://files.pythonhosted.org/packages/cd/e1/a202d1e113570b6177e10edf4ed59fb4ce2291d9fdc6706ce3898121ecc7/apsisocr-0.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-11-05 15:32:42",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mnansary",
    "github_project": "apsis-ocr",
    "github_not_found": true,
    "lcname": "apsisocr"
}
        