winocr

Name	winocr
Version	0.0.15
home_page	https://github.com/GitHub30/winocr
Summary	Windows.Media.Ocr
upload_time	2024-10-25 04:11:57
author	Tomofumi Inoue

# WinOCR
[![Python](https://img.shields.io/pypi/pyversions/winocr.svg)](https://badge.fury.io/py/winocr)
[![PyPI](https://badge.fury.io/py/winocr.svg)](https://badge.fury.io/py/winocr)

# Installation
```powershell
pip install winocr
```

<details>
  <summary>Full install</summary>
  
  ```powershell
  pip install winocr[all]
  ```
</details>

# Usage

## Pillow

Specify the language to recognize with the `lang` parameter (the second argument).

```python
import winocr
from PIL import Image

img = Image.open('test.jpg')
(await winocr.recognize_pil(img, 'ja')).text
```
![](https://camo.githubusercontent.com/4e68db4fc3106c03e9919eb4391ce7548c1321429f9dc1a95a6937f51f01d5f6/68747470733a2f2f63616d6f2e716969746175736572636f6e74656e742e636f6d2f363337383562393633666135643637653966326265316163396534393533353739663463323538342f36383734373437303733336132663266373136393639373436313264363936643631363736353264373337343666373236353265373333333265363137303264366536663732373436383635363137333734326433313265363136643631376136663665363137373733326536333666366432663330326633323330333833333336333332663333333733303631333533343633333832643632333533363631326433353333363233383264333636363332333532643336333433333333363333313336333033383338363636313265373036653637)
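The snippet above relies on top-level `await`, which works in Jupyter and IPython but not in a plain Python script. A minimal sketch of a script-friendly wrapper follows. `run_sync` and `ocr_text` are hypothetical helper names, not part of winocr, and the sketch assumes `winocr.recognize_pil` returns an awaitable as in the example above.

```python
import asyncio


def run_sync(make_awaitable):
    """Run the awaitable returned by make_awaitable() to completion.

    The awaitable is created inside the event loop, so this also works for
    objects that are awaitable but are not coroutines.
    """
    async def runner():
        return await make_awaitable()

    return asyncio.run(runner())


def ocr_text(path, lang='ja'):
    """Hypothetical helper: OCR one image file and return its text."""
    # Imported lazily so that run_sync stays usable without winocr installed.
    import winocr
    from PIL import Image

    with Image.open(path) as img:
        return run_sync(lambda: winocr.recognize_pil(img, lang)).text
```

In a script this could be called as `print(ocr_text('test.jpg', 'ja'))`. The package's own `recognize_pil_sync` function may be a simpler choice when no custom event-loop handling is needed.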

## OpenCV

```python
import winocr
import cv2

img = cv2.imread('test.jpg')
(await winocr.recognize_cv2(img, 'ja')).text
```
![](https://camo.githubusercontent.com/fbbc81dd9fb138032625585dd3cd41a4b14b14621be77c11a15ea8949a3cc8a3/68747470733a2f2f63616d6f2e716969746175736572636f6e74656e742e636f6d2f616439313337366536316230653332613234336664633932613435383665383763386636383362612f36383734373437303733336132663266373136393639373436313264363936643631363736353264373337343666373236353265373333333265363137303264366536663732373436383635363137333734326433313265363136643631376136663665363137373733326536333666366432663330326633323330333833333336333332663635333833303331333336333338333632643631333833333332326433393338333736333264333533373633333832643331333533383335333633353636333433313330333433323265373036653637)

## Connect to local runtime on Colaboratory

Create a local connection by following [these instructions](https://research.google.com/colaboratory/local-runtimes.html).

```powershell
pip install jupyterlab jupyter_http_over_ws
jupyter serverextension enable --py jupyter_http_over_ws
jupyter notebook --NotebookApp.allow_origin='https://colab.research.google.com' --ip=0.0.0.0 --port=8888 --NotebookApp.port_retries=0
```

![](https://i.imgur.com/gvj959U.png)

![](https://i.imgur.com/o9e0Fwk.png)

winocr also works in Jupyter Notebook and JupyterLab.

## REPL

```python
import cv2
from winocr import recognize_cv2_sync

img = cv2.imread('testocr.png')
recognize_cv2_sync(img)['text']
# 'This is a lot of 12 point text to test the ocr code and see if it works on all types of file format. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox.'
```

```python
from PIL import Image
from winocr import recognize_pil_sync

img = Image.open('testocr.png')
recognize_pil_sync(img)['text']
# 'This is a lot of 12 point text to test the ocr code and see if it works on all types of file format. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox.'
```

## Multi-Processing

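```python
from PIL import Image
import concurrent.futures
from winocr import recognize_pil_sync

# The __main__ guard is needed on Windows when this runs as a script,
# because each worker process re-imports the main module.
if __name__ == '__main__':
    images = [Image.open('testocr.png') for _ in range(1000)]

    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = list(executor.map(recognize_pil_sync, images))
    print(results)
```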
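For long lists of images, passing `chunksize` to `executor.map` can reduce the overhead of sending work to the processes one item at a time. The following is a small sketch of that idea. `map_in_chunks` is a hypothetical helper, `chunksize=16` is an arbitrary starting value rather than a measured optimum, and the thread-based run at the bottom is only a smoke test with a stand-in function.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def map_in_chunks(func, items, chunksize=16, max_workers=None,
                  executor_cls=ProcessPoolExecutor):
    """Apply func to every item in parallel and return results in input order."""
    with executor_cls(max_workers=max_workers) as executor:
        # chunksize batches the items sent to each worker process;
        # thread pools accept the argument but ignore it.
        return list(executor.map(func, items, chunksize=chunksize))


if __name__ == '__main__':
    # Smoke test with threads and a stand-in function instead of OCR.
    print(map_in_chunks(str.upper, ['a', 'b'], executor_cls=ThreadPoolExecutor))
```

With winocr installed, the call would presumably look like `map_in_chunks(recognize_pil_sync, images)`.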

## Web API

Run the server:
```powershell
pip install winocr[api]
winocr_serve
```

### curl

```bash
curl "http://localhost:8000/?lang=ja" --data-binary @test.jpg
```
![](https://camo.githubusercontent.com/658ff5e7ff505281fc464f642579ab8dac1a7e9120a0345c0eeaf0f46995c404/68747470733a2f2f63616d6f2e716969746175736572636f6e74656e742e636f6d2f396463623138383330656665343832643962626231633861393064383032303566373131313265642f36383734373437303733336132663266373136393639373436313264363936643631363736353264373337343666373236353265373333333265363137303264366536663732373436383635363137333734326433313265363136643631376136663665363137373733326536333666366432663330326633323330333833333336333332663636363433313338333433353636363332643635333633343337326433303634333736363264333533333336363532643336333436353333363333303332363336333338363133313265373036653637)

### Python

```python
import requests

# Read the image as raw bytes; the context manager closes the file afterwards.
with open('test.jpg', 'rb') as f:
    data = f.read()
requests.post('http://localhost:8000/?lang=ja', data).json()['text']
```

![](https://camo.githubusercontent.com/fb338aadf3f057e14c4b6474f4802b6958f9264aff634fdf22d7d5b321747bd5/68747470733a2f2f63616d6f2e716969746175736572636f6e74656e742e636f6d2f303438353362653766613263333839623339323161653461303938663165343161626162316136372f36383734373437303733336132663266373136393639373436313264363936643631363736353264373337343666373236353265373333333265363137303264366536663732373436383635363137333734326433313265363136643631376136663665363137373733326536333666366432663330326633323330333833333336333332663634333733313632333333353331333632643332363436363635326436313634333933313264363236363631333532643636333433373636333736323338333636333332363336333265373036653637)
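If `requests` is not available, the same call can be sketched with the standard library only. `build_ocr_url` and `ocr_file` are hypothetical helper names. The sketch assumes the server accepts the raw image bytes as the POST body and returns JSON with a `text` field, as the examples above suggest.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_ocr_url(base_url, lang=None):
    """Return the endpoint URL, adding ?lang=... only when lang is given."""
    url = base_url.rstrip('/') + '/'
    if lang:
        url += '?' + urlencode({'lang': lang})
    return url


def ocr_file(path, base_url='http://localhost:8000', lang=None, timeout=30):
    """POST the raw image bytes and return the recognized text."""
    with open(path, 'rb') as f:
        body = f.read()
    request = Request(build_ocr_url(base_url, lang), data=body, method='POST')
    with urlopen(request, timeout=timeout) as response:
        # The JSON reply is expected to contain a 'text' field.
        return json.load(response)['text']
```

For example, `ocr_file('test.jpg', lang='ja')` would post to `http://localhost:8000/?lang=ja`.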

To run OCR from a hosted Colaboratory runtime, expose the local server with `./ngrok http 8000` and post to the ngrok URL it prints.

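```python
import requests
from PIL import Image
from io import BytesIO

img = Image.open('test.jpg')
# Apply any preprocessing to img here, then encode it as JPEG bytes.
buf = BytesIO()
img.save(buf, format='JPEG')
# Replace the host with the URL printed by your own ngrok session.
requests.post('https://15a5fabf0d78.ngrok.io/?lang=ja', buf.getvalue()).json()['text']
```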
![](https://camo.githubusercontent.com/61adc7eb41c54bedfd19ab3ce2e55dd7b0c865a22c0ab787439296a0afc75d7a/68747470733a2f2f63616d6f2e716969746175736572636f6e74656e742e636f6d2f656538343938663932656566303336333262623064336162623236646531323639393730393030632f36383734373437303733336132663266373136393639373436313264363936643631363736353264373337343666373236353265373333333265363137303264366536663732373436383635363137333734326433313265363136643631376136663665363137373733326536333666366432663330326633323330333833333336333332663333333933303634333933353339333732643331363433353335326433353634333136333264333833353332333032643331333536333334363133383331333133383634363436343265373036653637)

```python
import cv2
import requests

img = cv2.imread('test.jpg')
# Apply any preprocessing to img here, then send it as JPEG bytes.
requests.post('https://15a5fabf0d78.ngrok.io/?lang=ja', cv2.imencode('.jpg', img)[1].tobytes()).json()['text']
```
![](https://camo.githubusercontent.com/a303dc95a4df7dbef67143a983b7792172b3d1b1837b0be7e7fa3c8a92b728d7/68747470733a2f2f63616d6f2e716969746175736572636f6e74656e742e636f6d2f653566346530626630353338623835316464643532353837393630306137313261336365393738612f36383734373437303733336132663266373136393639373436313264363936643631363736353264373337343666373236353265373333333265363137303264366536663732373436383635363137333734326433313265363136643631376136663665363137373733326536333666366432663330326633323330333833333336333332663339363133333634363133353334363132643336333636343336326433393631333833323264333733303338363632643634363433343337363333323338333233313339333033303265373036653637)

### JavaScript

If you only need to support Chrome and English text, you can also consider the Text Detection API.

```javascript
// File
const file = document.querySelector('[type=file]').files[0]
await fetch('http://localhost:8000/', {method: 'POST', body: file}).then(r => r.json())

// Blob
const blob = await fetch('https://image.itmedia.co.jp/ait/articles/1706/15/news015_16.jpg').then(r=>r.blob())
await fetch('http://localhost:8000/?lang=ja', {method: 'POST', body: blob}).then(r => r.json())
```

The OCR server can also run on Windows Server.

# Information that can be obtained
The result exposes the text **angle**, the full **text**, each **line**, each **word**, and the **BoundingBox** of every word.

```python
import pprint

import winocr
from PIL import Image

img = Image.open('test.jpg')
result = await winocr.recognize_pil(img, 'ja')
pprint.pprint({
    'text_angle': result.text_angle,
    'text': result.text,
    'lines': [{
        'text': line.text,
        'words': [{
            'bounding_rect': {'x': word.bounding_rect.x, 'y': word.bounding_rect.y, 'width': word.bounding_rect.width, 'height': word.bounding_rect.height},
            'text': word.text
        } for word in line.words]
    } for line in result.lines]
})
```
![](https://camo.githubusercontent.com/c0715ad500369e6b1b498293335bd8844e38baee7ead335a7047128947f0b9b6/68747470733a2f2f63616d6f2e716969746175736572636f6e74656e742e636f6d2f636561393234303738393733346663323734383663363265666563373936623633393764376433352f36383734373437303733336132663266373136393639373436313264363936643631363736353264373337343666373236353265373333333265363137303264366536663732373436383635363137333734326433313265363136643631376136663665363137373733326536333666366432663330326633323330333833333336333332663633363633353334333736323331333132643331333033383634326436333633333533333264363533383633333332643331333636363333333736353634333233383631363333353265373036653637)
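The per-word rectangles are often easier to use as corner coordinates. The sketch below flattens a result into `(text, (left, top, right, bottom))` pairs. `word_boxes` is a hypothetical helper that relies only on the attributes shown above (`lines`, `words`, `text`, `bounding_rect`).

```python
def word_boxes(result):
    """Yield (text, (left, top, right, bottom)) for every recognized word."""
    for line in result.lines:
        for word in line.words:
            rect = word.bounding_rect
            # Convert x/y/width/height into corner coordinates.
            box = (rect.x, rect.y, rect.x + rect.width, rect.y + rect.height)
            yield word.text, box
```

Each box has the layout that Pillow's `ImageDraw.Draw(img).rectangle(box, outline='red')` expects, which is one way to visualize the detected words.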

# Language installation
```powershell
# Run as Administrator
Add-WindowsCapability -Online -Name "Language.OCR~~~en-US~0.0.1.0"
Add-WindowsCapability -Online -Name "Language.OCR~~~ja-JP~0.0.1.0"

# Search for installed languages
Get-WindowsCapability -Online -Name "Language.OCR*"
# A language whose State is NotPresent is not installed; install it if you need it.
Name         : Language.OCR~~~hu-HU~0.0.1.0
State        : NotPresent
DisplayName  : Hungarian optical character recognition
Description  : Hungarian optical character recognition
DownloadSize : 194407
InstallSize  : 535714

Name         : Language.OCR~~~it-IT~0.0.1.0
State        : NotPresent
DisplayName  : Italian optical character recognition
Description  : Italian optical character recognition
DownloadSize : 159875
InstallSize  : 485922

Name         : Language.OCR~~~ja-JP~0.0.1.0
State        : Installed
DisplayName  : Japanese optical character recognition
Description  : Japanese optical character recognition
DownloadSize : 1524589
InstallSize  : 3398536

Name         : Language.OCR~~~ko-KR~0.0.1.0
State        : NotPresent
DisplayName  : Korean optical character recognition
Description  : Korean optical character recognition
DownloadSize : 3405683
InstallSize  : 7890408
```
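To check from Python which OCR languages are installed, the text output of `Get-WindowsCapability` can be parsed. This is a sketch under the assumption that the output keeps the `Name : ...` / `State : ...` layout shown above. `parse_ocr_capabilities` is a hypothetical helper, not part of winocr.

```python
def parse_ocr_capabilities(output):
    """Map each language tag to its State from Get-WindowsCapability output."""
    states = {}
    tag = None
    for line in output.splitlines():
        key, sep, value = line.partition(':')
        if not sep:
            continue  # blank lines and lines without a "key : value" pair
        key, value = key.strip(), value.strip()
        if key == 'Name' and value.startswith('Language.OCR~~~'):
            # 'Language.OCR~~~ja-JP~0.0.1.0' -> 'ja-JP'
            tag = value.split('~~~', 1)[1].split('~', 1)[0]
        elif key == 'State' and tag is not None:
            states[tag] = value
            tag = None
    return states
```

The input string could come from running the PowerShell command through `subprocess.run(..., capture_output=True, text=True)` in an elevated session. Languages whose value is `Installed` are ready to use.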

If you would rather skip Python and run OCR directly from PowerShell, see [this gist](https://gist.github.com/GitHub30/8bc1e784148e4f9801520c7e7ba191ea).

# Multi-Processing

Processing in parallel was about 3 times faster in the benchmark below: 1,000 images took 48 s on 1 core and 16 s on 4 cores. Using more cores should speed it up further.

```python
from PIL import Image

images = [Image.open('testocr.png') for i in range(1000)]
```

### 1 core (elapsed 48s)

The CPU is not fully utilized.
![](https://camo.githubusercontent.com/a9003bdc7db7d8c0524fd8f9ef2394eac4a7ad68ba618954f518ed81a12738e8/68747470733a2f2f63616d6f2e716969746175736572636f6e74656e742e636f6d2f633963393931656231343733313337383636666238363933656231643462656637623661646466632f36383734373437303733336132663266373136393639373436313264363936643631363736353264373337343666373236353265373333333265363137303264366536663732373436383635363137333734326433313265363136643631376136663665363137373733326536333666366432663330326633323330333833333336333332663636363133323633333236363335333232643339363633383336326436343334333533323264363433323633333732643631363233333633333036353330363136363338333736343265373036653637)

```python
import winocr

[(await winocr.recognize_pil(img)).text for img in images]
```
![](https://camo.githubusercontent.com/5e965ce96d5b3fdb5220c619ceb1597d09fea8d34df5f3a7a0b5388a8286a034/68747470733a2f2f63616d6f2e716969746175736572636f6e74656e742e636f6d2f356261623862393830666565333764363632663733383933646632613463306234623439346464312f36383734373437303733336132663266373136393639373436313264363936643631363736353264373337343666373236353265373333333265363137303264366536663732373436383635363137333734326433313265363136643631376136663665363137373733326536333666366432663330326633323330333833333336333332663635363536353336363336333332333032643337363636333335326433373634363233373264333833343634363232643633363136353631363533323634363536363631333933393265373036653637)

### 4 cores (elapsed 16s)

CPU usage reaches 100%.

![](https://camo.githubusercontent.com/9bc7fc8bbf5c1e5cc9a89e4fb2233900867b6f79019ee530fe36e5d36c896ad9/68747470733a2f2f63616d6f2e716969746175736572636f6e74656e742e636f6d2f323732326136303261313930616335653534646637313634623965336366373134636234386434322f36383734373437303733336132663266373136393639373436313264363936643631363736353264373337343666373236353265373333333265363137303264366536663732373436383635363137333734326433313265363136643631376136663665363137373733326536333666366432663330326633323330333833333336333332663337363336353635363236363331363532643636333233323636326433353330363533353264333933363335363132643334333033323636363636333337333833363334333536323265373036653637)

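Create a worker module. The `%%writefile` cell magic saves the cell to `worker.py`, so the worker processes can import the function from a real module.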
```python
%%writefile worker.py
import winocr
import asyncio

async def ensure_coroutine(awaitable):
    return await awaitable

def recognize_pil_text(img):
    return asyncio.run(ensure_coroutine(winocr.recognize_pil(img))).text
```

```python
import worker
import concurrent.futures

with concurrent.futures.ProcessPoolExecutor() as executor:
  # https://stackoverflow.com/questions/62488423
  results = executor.map(worker.recognize_pil_text, images)
list(results)
```

![](https://camo.githubusercontent.com/cd21e01dd05a064986c764e0b86aa98f3b25ad3b346ff5bdfee3d1dd7dbae132/68747470733a2f2f63616d6f2e716969746175736572636f6e74656e742e636f6d2f653137323531336435386531306339616436646464313438656562373865316263313132663632342f36383734373437303733336132663266373136393639373436313264363936643631363736353264373337343666373236353265373333333265363137303264366536663732373436383635363137333734326433313265363136643631376136663665363137373733326536333666366432663330326633323330333833333336333332663631333733313336333733353337333132643631363133353634326436333632333133353264363136343631333132643631333236343332333033303635333533383635363233383265373036653637)
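The timings reported in this section (48 s on 1 core, 16 s on 4 cores, 1,000 images) can be turned into speedup, parallel efficiency, and throughput figures. The helper names below are illustrative only.

```python
def speedup(serial_seconds, parallel_seconds):
    """How many times faster the parallel run is."""
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds, parallel_seconds, workers):
    """Fraction of the ideal linear speedup that was achieved."""
    return speedup(serial_seconds, parallel_seconds) / workers


print(speedup(48, 16))        # 3.0
print(efficiency(48, 16, 4))  # 0.75
print(1000 / 16)              # 62.5 images per second on 4 cores
```

An efficiency of 0.75 means the 4-core run reached three quarters of a perfect 4x speedup, so gains from extra cores are likely to be somewhat less than linear.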

Raw data

{
    "_id": null,
    "home_page": "https://github.com/GitHub30/winocr",
    "name": "winocr",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "Tomofumi Inoue",
    "author_email": "funaox@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/46/d6/98fa493ecd6b9c411351e1d01ab6da770ae5c534046fd158fe53b689550b/winocr-0.0.15.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Windows.Media.Ocr",
    "version": "0.0.15",
    "project_urls": {
        "Bug Tracker": "https://github.com/GitHub30/winocr/issues",
        "Homepage": "https://github.com/GitHub30/winocr"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cb6cc076c34b70df9fbb84df1dc61cc5be5f36438afc81ecde6dc0e33ed68d24",
                "md5": "6a7999a117380d5171bc67f03fe71f6a",
                "sha256": "48ff0a0a2dbcd1f7e84b4fb22575f5d4aba0ec99fdfe07fb0d04254d00826f18"
            },
            "downloads": -1,
            "filename": "winocr-0.0.15-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6a7999a117380d5171bc67f03fe71f6a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 7665,
            "upload_time": "2024-10-25T04:11:56",
            "upload_time_iso_8601": "2024-10-25T04:11:56.203116Z",
            "url": "https://files.pythonhosted.org/packages/cb/6c/c076c34b70df9fbb84df1dc61cc5be5f36438afc81ecde6dc0e33ed68d24/winocr-0.0.15-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "46d698fa493ecd6b9c411351e1d01ab6da770ae5c534046fd158fe53b689550b",
                "md5": "c130546c1a523fdb3a23bb5176b790d5",
                "sha256": "767f85e8331f6348985813f37763b21838090d23acd9afdd372fb1f7cabc3881"
            },
            "downloads": -1,
            "filename": "winocr-0.0.15.tar.gz",
            "has_sig": false,
            "md5_digest": "c130546c1a523fdb3a23bb5176b790d5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 8120,
            "upload_time": "2024-10-25T04:11:57",
            "upload_time_iso_8601": "2024-10-25T04:11:57.677705Z",
            "url": "https://files.pythonhosted.org/packages/46/d6/98fa493ecd6b9c411351e1d01ab6da770ae5c534046fd158fe53b689550b/winocr-0.0.15.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-25 04:11:57",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "GitHub30",
    "github_project": "winocr",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "winocr"
}
