kocr

- Name: kocr
- Version: 0.1.4
- Home page: https://github.com/kime541200/kocr
- Summary: Used to build the server and client endpoints of an OCR service.
- Upload time: 2024-08-21 15:21:30
- Author: Kim Chen
- Requires Python: >=3.9
- License: None

# KOCR

![banner](./img/ComfyUI_00081_.png)

## Introduction

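Struggling to extract information from PDFs? Give OCR a try! 🤩

Text in a PDF file is easy to read, but pulling that information out is often a headache 😓. Don't worry, OCR (optical character recognition) is here to help! 🙌

**OCR converts the images inside a PDF file into editable text and reports the position of each piece of text.** 🤯 This means you can easily:

* Copy text from a PDF document into other applications 📑
* Search a PDF file for specific keywords 🔎
* Organize tabular data automatically 📊
* Analyze and process text more efficiently 📈

This project aims to provide a convenient, practical way to extract information from PDF files quickly and efficiently. 🚀

It uses the open-source pretrained models from [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR.git) 💪 and provides both a server and a client:

* **Runs entirely locally!** Once the models are on disk, no internet connection is needed, so you can use it any time! 🌎
* **Simple to use!** Easy to set up, so you can get started quickly 🚀

Unlock the full potential of your PDF files! ✨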

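## Prerequisites
- [Python](https://www.python.org/downloads/)
- [Docker](https://docs.docker.com/engine/install/ubuntu/)
- Download the required models from the [PaddleOCR](https://paddlepaddle.github.io/PaddleOCR/ppocr/model_list.html) model list. Choose model sizes to suit your needs; you need at least one of each of the following three model types:
  - Detection model (det)
  - Recognition model (rec)
  - Text direction classification model (cls)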

## Usage - Server

There are two ways to start the server:
- pip
- Docker (recommended)

### pip

```bash
conda create -n kocr python=3.11 -y -q
conda activate kocr
pip install kocr
```

Before starting the server, you can set the model directories and the port to run on:

```bash
export OCR_MODEL_ROOT=/data/models/paddleocr
export DET_MODEL=/det/en/en_PP-OCRv3_det_infer
export REC_MODEL=/rec/en/en_PP-OCRv4_rec_infer
export CLS_MODEL=/cls/ch/ch_ppocr_mobile_v2.0_cls_slim_infer
export PORT=8868

python -m kocr.api_server
```

If the server starts normally, you should see:
```bash
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8868 (Press CTRL+C to quit)
```
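To confirm from a script that the server came up, a plain TCP connection attempt is enough. The sketch below uses only the standard library; `is_port_open` is a hypothetical helper, not part of kocr, and a successful connection only shows that something is listening on the port, not that OCR requests will succeed.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the example configuration above, `is_port_open("127.0.0.1", 8868)` should return `True` once the startup log lines have been printed.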

### Docker
1. Pull the Docker image
```bash
docker pull kime541200/kocr:latest
```

2. Create the Docker container
```bash
sudo docker run -it \
--gpus='"device=0"' \
-v /data/models/paddleocr:/data/models/paddleocr \
-e OCR_MODEL_ROOT=/data/models/paddleocr \
-e DET_MODEL=/det/en/en_PP-OCRv3_det_infer \
-e REC_MODEL=/rec/en/en_PP-OCRv4_rec_infer \
-e CLS_MODEL=/cls/ch/ch_ppocr_mobile_v2.0_cls_slim_infer \
-e OCR_PORT=8868 \
-p 8868:8868 \
-w /usr/src/app/kocr \
--restart unless-stopped \
--name kocr \
kime541200/kocr:latest \
python api_server.py
```

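Where:
- `OCR_MODEL_ROOT` is the root directory that holds the OCR models
- `DET_MODEL` is the directory that holds the detection model
- `REC_MODEL` is the directory that holds the recognition model
- `CLS_MODEL` is the directory that holds the text direction classification model

By default, the server reads the detection model from `{OCR_MODEL_ROOT}{DET_MODEL}`. If no model is found there, it downloads one into that directory (this requires an internet connection). The other two models are handled the same way.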
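The path rule above can be sketched as follows. This is an illustration of the `{OCR_MODEL_ROOT}{DET_MODEL}` concatenation only, not kocr's actual code; `resolve_model_dirs` and `missing_model_dirs` are hypothetical helpers that you could use to check, before starting the server, which models would trigger a download.

```python
import os
from typing import Mapping

# Variable names come from the README; the lookup logic is an assumption.
MODEL_VARS = ("DET_MODEL", "REC_MODEL", "CLS_MODEL")


def resolve_model_dirs(env: Mapping[str, str]) -> dict[str, str]:
    """Join OCR_MODEL_ROOT with each model sub-path by plain concatenation."""
    root = env["OCR_MODEL_ROOT"]
    # Plain concatenation mirrors `{OCR_MODEL_ROOT}{DET_MODEL}`. The sub-paths
    # start with "/", so os.path.join would discard the root entirely.
    return {name: root + env[name] for name in MODEL_VARS}


def missing_model_dirs(env: Mapping[str, str]) -> list[str]:
    """List the resolved directories that do not exist on disk yet."""
    return [p for p in resolve_model_dirs(env).values() if not os.path.isdir(p)]
```

Calling `missing_model_dirs(os.environ)` after the `export` lines returns an empty list when all three model directories are already in place.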

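The container example above mounts a host directory into the container with `-v /data/models/paddleocr:/data/models/paddleocr` because I keep the models under `/data/models/paddleocr` on the host. Adjust this to match your own setup.

`-e OCR_PORT` sets the port the server runs on.

Once the container is created, the server keeps running in the background (pass `-d` to `docker run` to start it detached), for example at http://0.0.0.0:8868.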

## Usage - Client

### PDF OCR

```python
from kocr.app.client.classes.OcrClient import OcrClient
from kocr.app.ocr.utils.utils import decode_base64_image, draw_text_box

ocr_client = OcrClient(host='http://127.0.0.1:8868')  # change IP and port if needed

def run():
    # Leave `specific_pages` as `None` to stream every page in the PDF file
    for result in ocr_client.send_pdf(pdf_path='/path/to/file.pdf', specific_pages=[1, 3, 21]):
        img = decode_base64_image(result['base64_img'])
        draw_text_box(img=img, ocr_results=result['result'])
    
if __name__ == "__main__":
    run()
```

### Image OCR

```python
from kocr.app.client.classes.OcrClient import OcrClient
from PIL import Image
from kocr.app.ocr.utils.utils import image_to_base64, decode_base64_image, draw_text_box

ocr_client = OcrClient(host='http://127.0.0.1:8868') # change IP and port if needed

def run():
    # Load a local image
    image = Image.open("/path/to/image.jpg")

    img_base64 = image_to_base64(image)
    response = ocr_client.send_image(img_base64=img_base64)

    # Handle the server's response
    if response.status_code == 200:
        img = decode_base64_image(base64_str=response.json()['base64_img'])
        ocr_result = response.json()['result']
        draw_text_box(img=img, ocr_results=ocr_result)
        
    else:
        print(f"Failed to send image. Status code: {response.status_code}")

if __name__ == "__main__":
    run()
```
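The image travels to the server as base64 text, which is why the example converts it with `image_to_base64` first and decodes the returned `base64_img` afterwards. The sketch below shows the same round trip on raw file bytes using only the standard library. It is not kocr's implementation (the real helper takes a PIL image), and `bytes_to_base64` and `base64_to_bytes` are hypothetical names.

```python
import base64


def bytes_to_base64(data: bytes) -> str:
    """Encode raw bytes (e.g. the contents of a JPEG file) as ASCII base64 text."""
    return base64.b64encode(data).decode("ascii")


def base64_to_bytes(text: str) -> bytes:
    """Decode base64 text back into the original bytes."""
    return base64.b64decode(text)
```

Base64 output is about a third larger than the input, which is worth keeping in mind when sending large images.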

### Sliding-window OCR (for larger images)

```python
from PIL import Image
from kocr.app.client.classes.OcrClient import OcrClient
from kocr.app.ocr.classes import OcrConfig
from kocr.app.ocr.utils.utils import image_to_base64, decode_base64_image, draw_text_box

ocr_client = OcrClient(host='http://127.0.0.1:8868')

def run():
    # Load a local image
    image = Image.open("/path/to/large.jpg")

    img_base64 = image_to_base64(image)
    # The sliding-window size must be set explicitly
    config = {
        "slice":{'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}
    }
    response = ocr_client.send_image(img_base64=img_base64, config=OcrConfig(**config))

    # Handle the server's response
    if response.status_code == 200:
        img = decode_base64_image(base64_str=response.json()['base64_img'])
        ocr_result = response.json()['result']
        draw_text_box(img=img, ocr_results=ocr_result)
        
    else:
        print(f"Failed to send image. Status code: {response.status_code}")
    

if __name__ == "__main__":
    run()
```
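To give a feel for what a sliding window does to a large image, here is a minimal tiling sketch. It assumes the stride is used as both the tile size and the step, so tiles do not overlap; kocr's actual slicing may differ, and the `merge_x_thres` / `merge_y_thres` values, which presumably control how text boxes split across tiles are merged, are not modeled here. `sliding_windows` is a hypothetical helper.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def sliding_windows(width: int, height: int, stride_x: int, stride_y: int) -> Iterator[Box]:
    """Yield non-overlapping tiles that together cover a width x height image."""
    if stride_x <= 0 or stride_y <= 0:
        raise ValueError("strides must be positive")
    for top in range(0, height, stride_y):
        for left in range(0, width, stride_x):
            # Clip the last tile in each row and column to the image border.
            yield (left, top, min(left + stride_x, width), min(top + stride_y, height))
```

For a 700 x 1000 image with the strides from the example (300 horizontal, 500 vertical), this yields six tiles, the last column being only 100 pixels wide.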

            
