| Name | kocr |
| Version | 0.1.4 |
| home_page | https://github.com/kime541200/kocr |
| Summary | Use to build server end-point and client end-point of OCR service. |
| upload_time | 2024-08-21 15:21:30 |
| maintainer | None |
| docs_url | None |
| author | Kim Chen |
| requires_python | >=3.9 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# KOCR

## Introduction
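Struggling to extract information from PDFs? Give OCR a try! 🤩
Text in a PDF file is easy to read, but extracting the information it contains is often a headache 😓. Don't worry, OCR (optical character recognition) is here to help! 🙌
**OCR converts the images inside a PDF file into editable text and provides the position of each piece of recognized text.** 🤯 This means you can easily:
* Copy text from a PDF document into other applications 📑
* Search a PDF file for specific keywords 🔎
* Organize tabular data automatically 📊
* Analyze and process text more efficiently 📈

This project aims to provide a convenient, practical way to extract information from PDF files quickly and efficiently. 🚀
It uses the open-source pretrained models from [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR.git) 💪 and provides both a server and a client:
* **Runs entirely locally!** Once the models are on disk, no internet connection is required, so you can use it anytime! 🌎
* **Simple to use!** Setup is easy, so you can get started quickly 🚀

Unlock the full potential of your PDF files! ✨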
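## Prerequisites
- [Python](https://www.python.org/downloads/)
- [Docker](https://docs.docker.com/engine/install/ubuntu/)
- Download the models you need from the [PaddleOCR](https://paddlepaddle.github.io/PaddleOCR/ppocr/model_list.html) model list. Choose model sizes to suit your needs; you need at least one model of each of the following three types:
  - Detection model (det)
  - Recognition model (rec)
  - Text direction classification model (cls)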
## Usage - Server
There are two ways to start the server:
- pip
- Docker (recommended)
### pip
```bash
conda create -n kocr python=3.11 -y -q
conda activate kocr
pip install kocr
```
Before starting the server, you can set the model directories and the port to run on:
```bash
export OCR_MODEL_ROOT=/data/models/paddleocr
export DET_MODEL=/det/en/en_PP-OCRv3_det_infer
export REC_MODEL=/rec/en/en_PP-OCRv4_rec_infer
export CLS_MODEL=/cls/ch/ch_ppocr_mobile_v2.0_cls_slim_infer
export PORT=8868
python -m kocr.api_server
```
If the server starts correctly, you should see:
```bash
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8868 (Press CTRL+C to quit)
```
### Docker
1. Pull the Docker image
```bash
docker pull kime541200/kocr:latest
```
2. Create Docker container
```bash
sudo docker run -it \
--gpus='"device=0"' \
-v /data/models/paddleocr:/data/models/paddleocr \
-e OCR_MODEL_ROOT=/data/models/paddleocr \
-e DET_MODEL=/det/en/en_PP-OCRv3_det_infer \
-e REC_MODEL=/rec/en/en_PP-OCRv4_rec_infer \
-e CLS_MODEL=/cls/ch/ch_ppocr_mobile_v2.0_cls_slim_infer \
-e OCR_PORT=8868 \
-p 8868:8868 \
-w /usr/src/app/kocr \
--restart unless-stopped \
--name kocr \
kime541200/kocr:latest \
python api_server.py
```
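Where:
- `OCR_MODEL_ROOT` is the root directory where the OCR models are stored
- `DET_MODEL` is the directory of the detection model
- `REC_MODEL` is the directory of the recognition model
- `CLS_MODEL` is the directory of the text direction classification model

By default, the server reads the detection model from `{OCR_MODEL_ROOT}{DET_MODEL}`. If the model is not there, it is downloaded into that directory (internet connection required). The other two models work the same way.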
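
To make the `{OCR_MODEL_ROOT}{DET_MODEL}` convention concrete, here is a minimal sketch of how the three model directories resolve. It assumes the two values are joined by plain string concatenation, as the notation suggests, which is why the example values for `DET_MODEL`, `REC_MODEL`, and `CLS_MODEL` start with `/`. The helper names `resolve_model_dir` and `resolve_all` are hypothetical and are not part of kocr.

```python
import os


def resolve_model_dir(model_root: str, model_subdir: str) -> str:
    """Concatenate root and sub-path with no separator (assumed behaviour)."""
    return f"{model_root}{model_subdir}"


def resolve_all(env) -> dict:
    """Resolve the det/rec/cls model directories from a mapping of variables."""
    root = env.get("OCR_MODEL_ROOT", "")
    keys = ("DET_MODEL", "REC_MODEL", "CLS_MODEL")
    return {key: resolve_model_dir(root, env.get(key, "")) for key in keys}


if __name__ == "__main__":
    # Pass os.environ instead of this dict to check your real settings.
    example_env = {
        "OCR_MODEL_ROOT": "/data/models/paddleocr",
        "DET_MODEL": "/det/en/en_PP-OCRv3_det_infer",
        "REC_MODEL": "/rec/en/en_PP-OCRv4_rec_infer",
        "CLS_MODEL": "/cls/ch/ch_ppocr_mobile_v2.0_cls_slim_infer",
    }
    for name, path in resolve_all(example_env).items():
        # A missing directory means the server would need to download the model.
        print(f"{name}: {path} (exists: {os.path.isdir(path)})")
```

Running this before starting the server is a quick way to spot a missing leading `/` or a model that has not been downloaded yet.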
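The container example above mounts a host directory into the container with `-v /data/models/paddleocr:/data/models/paddleocr` because I keep the models under `/data/models/paddleocr` on the host. Adjust this to match your own setup.
`-e OCR_PORT` sets the port the server runs on.
Once the container is created, the server runs in the background, for example at http://0.0.0.0:8868.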
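
Before pointing a client at the server, it can help to confirm that something is listening on the published port. The sketch below uses only the Python standard library and checks TCP reachability; it does not call any kocr endpoint. The function name `is_port_open` is hypothetical, and the host and port are the example values used above. Note that `0.0.0.0` is the address the server binds to, so a client on the same machine would connect to `127.0.0.1` instead.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    # Example values from this README; change them to match your setup.
    print("server reachable:", is_port_open("127.0.0.1", 8868))
```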
## Usage - Client
### PDF OCR
```python
from kocr.app.client.classes.OcrClient import OcrClient
from kocr.app.ocr.utils.utils import decode_base64_image, draw_text_box

# Change the IP and port if needed
ocr_client = OcrClient(host='http://127.0.0.1:8868')


def run():
    # Leave `specific_pages` as `None` to stream every page in the PDF file
    for result in ocr_client.send_pdf(pdf_path='/path/to/file.pdf', specific_pages=[1, 3, 21]):
        # Each streamed result carries the page image and its OCR results
        img = decode_base64_image(result['base64_img'])
        draw_text_box(img=img, ocr_results=result['result'])


if __name__ == "__main__":
    run()
```
### Image OCR
```python
from PIL import Image

from kocr.app.client.classes.OcrClient import OcrClient
from kocr.app.ocr.utils.utils import image_to_base64, decode_base64_image, draw_text_box

# Change the IP and port if needed
ocr_client = OcrClient(host='http://127.0.0.1:8868')


def run():
    # Load a local image
    image = Image.open("/path/to/image.jpg")

    img_base64 = image_to_base64(image)
    response = ocr_client.send_image(img_base64=img_base64)

    # Handle the server's response
    if response.status_code == 200:
        payload = response.json()
        img = decode_base64_image(base64_str=payload['base64_img'])
        ocr_result = payload['result']
        draw_text_box(img=img, ocr_results=ocr_result)
    else:
        print(f"Failed to send image. Status code: {response.status_code}")


if __name__ == "__main__":
    run()
```
### Sliding-window OCR (for larger images)
```python
from PIL import Image

from kocr.app.client.classes.OcrClient import OcrClient
from kocr.app.ocr.classes import OcrConfig
from kocr.app.ocr.utils.utils import image_to_base64, decode_base64_image, draw_text_box

ocr_client = OcrClient(host='http://127.0.0.1:8868')


def run():
    # Load a local image
    image = Image.open("/path/to/large.jpg")

    img_base64 = image_to_base64(image)
    # The sliding-window parameters must be set
    config = {
        "slice": {
            'horizontal_stride': 300,
            'vertical_stride': 500,
            'merge_x_thres': 50,
            'merge_y_thres': 35,
        }
    }
    response = ocr_client.send_image(img_base64=img_base64, config=OcrConfig(**config))

    # Handle the server's response
    if response.status_code == 200:
        payload = response.json()
        img = decode_base64_image(base64_str=payload['base64_img'])
        ocr_result = payload['result']
        draw_text_box(img=img, ocr_results=ocr_result)
    else:
        print(f"Failed to send image. Status code: {response.status_code}")


if __name__ == "__main__":
    run()
```
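
To give a feel for what the `slice` settings imply for a large image, the sketch below enumerates window rectangles over an image. It is only an illustration under stated assumptions: it treats `horizontal_stride` and `vertical_stride` as the width and height of each window and tiles them without overlap, which may not match how kocr actually slices the image. It also ignores `merge_x_thres` and `merge_y_thres`. The function name `window_boxes` is hypothetical.

```python
def window_boxes(width, height, horizontal_stride, vertical_stride):
    """List (left, top, right, bottom) boxes tiling a width x height image.

    Assumption: each window is one stride wide and one stride tall, and
    windows at the right and bottom edges are clipped to the image size.
    """
    boxes = []
    for top in range(0, height, vertical_stride):
        for left in range(0, width, horizontal_stride):
            right = min(left + horizontal_stride, width)
            bottom = min(top + vertical_stride, height)
            boxes.append((left, top, right, bottom))
    return boxes


if __name__ == "__main__":
    # Example: a 1000 x 1200 image with the stride values used above.
    for box in window_boxes(1000, 1200, 300, 500):
        print(box)
```

Under these assumptions, smaller strides produce more windows per image, so the values are a trade-off between per-window detail and the number of OCR passes.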
Raw data
{
"_id": null,
"home_page": "https://github.com/kime541200/kocr",
"name": "kocr",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": null,
"author": "Kim Chen",
"author_email": "kime541200@outlook.com",
"download_url": "https://files.pythonhosted.org/packages/18/4d/e7e58b7ed74be650a633975ea54fa4f6f6af0f35eb030478d7b429f93ab6/kocr-0.1.4.tar.gz",
"platform": null,
"description": "# KOCR\n\n\n\n## Introduction\n\nPDF \u4e2d\u7684\u8cc7\u8a0a\u96e3\u4ee5\u63d0\u53d6\uff1f \u8a66\u8a66 OCR \u5427\uff01\ud83e\udd29\n\nPDF \u6587\u4ef6\u4e2d\u7684\u6587\u5b57\u96d6\u7136\u6613\u65bc\u95b1\u8b80\uff0c\u4f46\u60f3\u8981\u63d0\u53d6\u5176\u4e2d\u7684\u8cc7\u8a0a\u537b\u5e38\u5e38\u8b93\u4eba\u982d\u75db\ud83d\ude13\u3002\u5225\u64d4\u5fc3\uff0cOCR (\u5149\u5b78\u5b57\u7b26\u8b58\u5225) \u4f86\u6551\u4f60\u5566\uff01\ud83d\ude4c\n\n**OCR \u80fd\u5c07 PDF \u6587\u4ef6\u5167\u7684\u5716\u50cf\u8f49\u63db\u70ba\u53ef\u7de8\u8f2f\u7684\u6587\u672c\u5167\u5bb9\uff0c\u4e26\u63d0\u4f9b\u6bcf\u500b\u6587\u5b57\u7684\u4f4d\u7f6e\u8cc7\u8a0a\u3002** \ud83e\udd2f \u9019\u610f\u5473\u7740\u4f60\u53ef\u4ee5\u8f15\u9b06\u7684\uff1a\n\n* \u5c07 PDF \u6587\u6863\u4e2d\u7684\u6587\u5b57\u8907\u88fd\u5230\u5176\u4ed6\u61c9\u7528\u7a0b\u5f0f\u4e2d \ud83d\udcd1\n* \u641c\u5c0b PDF \u6587\u4ef6\u4e2d\u7684\u7279\u5b9a\u95dc\u9375\u5b57\ud83d\udd0e\n* \u81ea\u52d5\u6574\u7406\u8868\u683c\u6570\u636e\ud83d\udcca\n* \u66f4\u6709\u6548\u7387\u5730\u5206\u6790\u548c\u8655\u7406\u6587\u672c\u4fe1\u606f\ud83d\udcc8\n\n\u9019\u500b\u5c08\u6848\u65e8\u5728\u63d0\u4f9b\u4e00\u500b\u65b9\u4fbf\u53c8\u5be6\u7528\u7684\u89e3\u6c7a\u65b9\u6848\uff0c\u8b93\u4f60\u5feb\u901f\u3001\u9ad8\u6548\u5730\u63d0\u53d6 PDF \u6587\u4ef6\u4e2d\u7684\u8cc7\u8a0a\u3002\ud83d\ude80\n\n\u63a1\u7528\u4e86 [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR.git) \u958b\u6e90\u9810\u8a13\u7df4\u6a21\u578b \ud83d\udcaa\uff0c\u4e26\u642d\u5efa\u4e86 server \u7aef\u548c client \u7aef\uff1a\n\n* **\u5b8c\u5168\u672c\u5730\u904b\u884c\uff01** \u4f60\u4e0d\u9700\u8981\u9023\u7db2\uff0c\u4efb\u4f55\u6642\u5019\u90fd\u53ef\u4ee5\u4f7f\u7528\u5b83\uff01\ud83c\udf0e\n* **\u7c21\u55ae\u6613\u7528!** \u8f15\u9b06\u67b6\u8a2d\u5b8c\u6210\uff0c\u8b93\u4f60\u5feb\u901f\u4e0a\u624b \ud83d\ude80\n\n\u89e3\u9396 PDF \u6587\u4ef6\u7684\u7121\u9650\u6f5b\u529b\u5427\uff01\u2728\n\n## Pre-require\n- 
[Python](https://www.python.org/downloads/)\n- [Docker](https://docs.docker.com/engine/install/ubuntu/)\n- \u5230[PaddleOCR](https://paddlepaddle.github.io/PaddleOCR/ppocr/model_list.html)\u5b98\u7db2\u4e0b\u8f09\u76f8\u95dc\u6a21\u578b, \u6a21\u578b\u5c3a\u5bf8\u8acb\u4f9d\u64da\u5404\u81ea\u9700\u6c42\u4e0b\u8f09, \u81f3\u5c11\u9808\u4e0b\u8f09\u4ee5\u4e0b3\u7a2e\u6a21\u578b\u5404\u4e00\u500b\n - \u6aa2\u6e2c\u6a21\u578b(det)\n - \u8b58\u5225\u6a21\u578b(rec)\n - \u6587\u672c\u65b9\u5411\u5206\u985e\u6a21\u578b(cls)\n\n## Usage - Server\n\n\u8981\u555f\u52d5server\u6709\u5169\u7a2e\u65b9\u5f0f:\n- pip\n- (\u63a8\u85a6)Docker\n\n### pip\n\n```bash\nconda create -n kocr python=3.11 -y -q\nconda activate kocr\npip install kocr\n```\n\n\u555f\u52d5server\u524d\u53ef\u4ee5\u5148\u8a2d\u5b9a\u6a21\u578b\u76ee\u9304\u4ee5\u53ca\u8981\u904b\u884c\u7684port\n\n```bash\nexport OCR_MODEL_ROOT=/data/models/paddleocr\nexport DET_MODEL=/det/en/en_PP-OCRv3_det_infer\nexport REC_MODEL=/rec/en/en_PP-OCRv4_rec_infer\nexport CLS_MODEL=/cls/ch/ch_ppocr_mobile_v2.0_cls_slim_infer\nexport PORT=8868\n\npython -m kocr.api_server\n```\n\n\u6b63\u5e38\u555f\u52d5server\u7684\u8a71\u61c9\u8a72\u6703\u770b\u5230\n```bash\nINFO: Application startup complete.\nINFO: Uvicorn running on http://0.0.0.0:8868 (Press CTRL+C to quit)\n```\n\n### Docker\n1. Build Docker image\n```bash\ndocker pull kime541200/kocr:latest\n```\n\n2. 
Create Docker container\n```bash\nsudo docker run -it \\\n--gpus='\"device=0\"' \\\n-v /data/models/paddleocr:/data/models/paddleocr \\\n-e OCR_MODEL_ROOT=/data/models/paddleocr \\\n-e DET_MODEL=/det/en/en_PP-OCRv3_det_infer \\\n-e REC_MODEL=/rec/en/en_PP-OCRv4_rec_infer \\\n-e CLS_MODEL=/cls/ch/ch_ppocr_mobile_v2.0_cls_slim_infer \\\n-e OCR_PORT=8868 \\\n-p 8868:8868 \\\n-w /usr/src/app/kocr \\\n--restart unless-stopped \\\n--name kocr \\\nkime541200/kocr:latest \\\npython api_server.py\n```\n\n\u5176\u4e2d \n- `OCR_MODEL_ROOT` \u662f\u5b58\u653eOCR\u6a21\u578b\u7684\u6839\u76ee\u9304\n- `DET_MODEL` \u662f\u5b58\u653e\u6aa2\u6e2c\u6a21\u578b\u7684\u76ee\u9304\n- `REC_MODEL` \u662f\u5b58\u653e\u8b58\u5225\u6a21\u578b\u7684\u76ee\u9304\n- `CLS_MODEL` \u662f\u5b58\u653e\u6587\u672c\u65b9\u5411\u5206\u985e\u6a21\u578b\u7684\u76ee\u9304\n\nServer\u7aef\u9810\u8a2d\u60c5\u6cc1\u4e0b\u6703\u53bb `{OCR_MODEL_ROOT}{DET_MODEL}` \u8b80\u53d6\u6aa2\u6e2c\u6a21\u578b, \u6c92\u6709\u7684\u8a71\u5c31\u6703\u76f4\u63a5\u4e0b\u8f09\u5230\u8a72\u76ee\u9304(\u9808\u9023\u7db2), \u5176\u4ed6\u5169\u500b\u6a21\u578b\u4f9d\u6b64\u985e\u63a8\u3002\n\n\u9019\u908a\u63d0\u4f9b\u5efa\u7acb\u5bb9\u5668\u7684\u7bc4\u4f8b\u4e2d\u4ee5 `-v /data/models/paddleocr:/data/models/paddleocr` \u5c07\u672c\u6a5f\u7684\u76ee\u9304\u639b\u8f09\u9032\u5bb9\u5668\u4e2d, \u662f\u56e0\u70ba\u6211\u5c07\u6a21\u578b\u653e\u5728\u672c\u6a5f\u7684 `/data/models/paddleocr` \u5e95\u4e0b, \u5be6\u969b\u60c5\u6cc1\u53ef\u4f9d\u500b\u4eba\u9700\u6c42\u9032\u884c\u8abf\u6574\u3002\n\n`-e OCR_PORT` \u5247\u53ef\u7528\u4f86\u8a2d\u7f6eserver\u904b\u884c\u7684port\u3002\n\n\u5bb9\u5668\u5efa\u7acb\u5f8cserver\u6703\u5728\u80cc\u666f\u904b\u884c, \u4f8b\u5982: http://0.0.0.0:8868\u3002\n\n## Usage - Client\n\n### PDF OCR\n\n```python\nfrom kocr.app.client.classes.OcrClient import OcrClient\nfrom kocr.app.ocr.utils.utils import decode_base64_image, draw_text_box\n\nocr_client = OcrClient(host='http://127.0.0.1:8868') # 
change IP and port if needed\n\ndef run():\n # leave `specific_pages` to `None` will stream every pages in the PDF file\n for result in ocr_client.send_pdf(pdf_path='/path/to/file.pdf', specific_pages=[1, 3, 21]): \n img = decode_base64_image(result['base64_img'])\n draw_text_box(img=img, ocr_results=result['result'])\n \nif __name__ == \"__main__\":\n run()\n```\n\n### \u5716\u7247OCR\n\n```python\nfrom kocr.app.client.classes.OcrClient import OcrClient\nfrom PIL import Image\nfrom kocr.app.ocr.utils.utils import image_to_base64, decode_base64_image, draw_text_box\n\nocr_client = OcrClient(host='http://127.0.0.1:8868') # change IP and port if needed\n\ndef run():\n # \u8f09\u5165\u672c\u5730\u5f71\u50cf\n image = Image.open(\"/path/to/image.jpg\")\n\n img_base64 = image_to_base64(image)\n response = ocr_client.send_image(img_base64=img_base64)\n\n # \u8f38\u51fa\u4f3a\u670d\u5668\u7684\u56de\u61c9\n if response.status_code == 200:\n img = decode_base64_image(base64_str=response.json()['base64_img'])\n ocr_result = response.json()['result']\n draw_text_box(img=img, ocr_results=ocr_result)\n \n else:\n print(f\"Failed to send image. 
Status code: {response.status_code}\")\n\nif __name__ == \"__main__\":\n run()\n```\n\n### \u6ed1\u52d5\u8996\u7a97OCR (\u8655\u7406\u8f03\u5927\u5716\u7247)\n\n```python\nfrom PIL import Image\nfrom kocr.app.client.classes.OcrClient import OcrClient\nfrom kocr.app.ocr.classes import OcrConfig\nfrom kocr.app.ocr.utils.utils import image_to_base64, decode_base64_image, draw_text_box\n\nocr_client = OcrClient(host='http://127.0.0.1:8868')\n\ndef run():\n # \u8f09\u5165\u672c\u5730\u5f71\u50cf\n image = Image.open(\"/path/to/large.jpg\")\n\n img_base64 = image_to_base64(image)\n # must set the slide window's size\n config = {\n \"slice\":{'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}\n }\n response = ocr_client.send_image(img_base64=img_base64, config=OcrConfig(**config))\n\n # \u8f38\u51fa\u4f3a\u670d\u5668\u7684\u56de\u61c9\n if response.status_code == 200:\n img = decode_base64_image(base64_str=response.json()['base64_img'])\n ocr_result = response.json()['result']\n draw_text_box(img=img, ocr_results=ocr_result)\n \n else:\n print(f\"Failed to send image. Status code: {response.status_code}\")\n \n\nif __name__ == \"__main__\":\n run()\n```\n",
"bugtrack_url": null,
"license": null,
"summary": "Use to build server end-point and client end-point of OCR service.",
"version": "0.1.4",
"project_urls": {
"Homepage": "https://github.com/kime541200/kocr"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6a5f4a5005af67a0aceff7417c0199fe05feabaf1181d4f560eb95bd3234d7b7",
"md5": "d792aab8bb1e9c1d447cb53997a2f349",
"sha256": "e7e145995265bd99f58daef1c447d0db0d0186fbde47e8c93e35442a985181cc"
},
"downloads": -1,
"filename": "kocr-0.1.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d792aab8bb1e9c1d447cb53997a2f349",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 16759,
"upload_time": "2024-08-21T15:21:28",
"upload_time_iso_8601": "2024-08-21T15:21:28.856178Z",
"url": "https://files.pythonhosted.org/packages/6a/5f/4a5005af67a0aceff7417c0199fe05feabaf1181d4f560eb95bd3234d7b7/kocr-0.1.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "184de7e58b7ed74be650a633975ea54fa4f6f6af0f35eb030478d7b429f93ab6",
"md5": "02a4ba7fbc17b1920fc401517c79fb56",
"sha256": "5e47515554c8295b981e52f4adadce6484498acdc5a4fb0728803889dd7e18d3"
},
"downloads": -1,
"filename": "kocr-0.1.4.tar.gz",
"has_sig": false,
"md5_digest": "02a4ba7fbc17b1920fc401517c79fb56",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 13972,
"upload_time": "2024-08-21T15:21:30",
"upload_time_iso_8601": "2024-08-21T15:21:30.620228Z",
"url": "https://files.pythonhosted.org/packages/18/4d/e7e58b7ed74be650a633975ea54fa4f6f6af0f35eb030478d7b429f93ab6/kocr-0.1.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-21 15:21:30",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "kime541200",
"github_project": "kocr",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "kocr"
}