shiftlab-ocr


Name: shiftlab-ocr
Version: 0.3.2
Home page: https://github.com/konverner/shiftlab_ocr
Summary: SHIFT OCR is a library for handwriting text segmentation and character recognition.
Upload time: 2023-07-04 13:49:23
Author: Konstantin Verner
Requires Python: >=3.6
License: MIT
Keywords: data, computer vision, handwriting, doc2text
Requirements: none recorded

# SHIFTLAB OCR

SHIFT OCR is a library for segmenting handwritten text and recognizing its characters.
 
# Get Started

``` 
pip install shiftlab_ocr
```
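
To confirm that the installation succeeded, the installed version can be queried from Python. This is a minimal sketch using the standard-library `importlib.metadata` module (Python 3.8+); the helper name `installed_version` is hypothetical and not part of the library, and the distribution name is assumed to match the one used in the `pip` command above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Distribution name assumed to be the one passed to pip above.
version = installed_version("shiftlab_ocr")
print("shiftlab_ocr version:", version if version else "not installed")
```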
## Doc2Text
`Reader` from the `doc2text` module first detects text regions in an image and then recognizes the text in each of them.

![](https://raw.githubusercontent.com/konverner/shiftlab_ocr/main/demo_image.png)

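```
# `import urllib` alone does not reliably load the `request` submodule
import urllib.request

from shiftlab_ocr.doc2text.reader import Reader

# download the demo image
urllib.request.urlretrieve(
    'https://raw.githubusercontent.com/konverner/shiftlab_ocr/main/demo_image.png',
    'test.png')

# detect and recognize the text in the image
reader = Reader()
result = reader.doc2text("test.png")
```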

Display recognized text:

```
print(result[0])

Действительно ли добро сильнее зла?
Именно над этим вопросом аставля заставляет
читателей задуматься В. Тендряков.
Автор рассматривает данную пробле-
му на конкретном примере, рассказывая
историю 00 заблудившемся немце русских
солдатах, которые пожалели врала и
позволи ему остаться землянке. 

```
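
The recognized text keeps the line breaks of the page, so a word hyphenated at the end of a line (such as `пробле-` / `му` above) stays split. The following sketch merges such words. It assumes `result[0]` is a plain string, as the `print` call above suggests; `join_hyphenated_lines` is a hypothetical helper, not part of the library.

```python
import re


def join_hyphenated_lines(text):
    """Merge words that were split by a hyphen at a line break.

    A hyphen that directly follows a word character and is followed by a
    line break and another word character is removed together with the break.
    """
    return re.sub(r"(?<=\w)-[ \t]*\r?\n[ \t]*(?=\w)", "", text)


sample = "данную пробле-\nму на конкретном примере"
print(join_hyphenated_lines(sample))
```

Note that this simple rule also removes the hyphen of a genuine compound word that happens to break at the end of a line.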

Display segmented crops:

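```
import matplotlib.pyplot as plt

def show_img_grid(images, N):
    # show up to n*n crops in an n x n grid, where n = floor(sqrt(N))
    n = int(N**(0.5))
    f, axarr = plt.subplots(n, n, figsize=(10,10), squeeze=False)
    for k, ax in enumerate(axarr.flat):
        if k < len(images):
            ax.imshow(images[k].img)
        else:
            ax.axis('off')  # fewer crops than cells: leave the cell empty
    plt.show()

show_img_grid(result[1], 48)
```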

![](https://github.com/konverner/shiftlab_ocr/blob/main/crops_image.png?raw=true)
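
The helper above draws a square grid with `floor(sqrt(N))` cells per side, so `N = 48` shows only 36 crops. A rectangular grid can show all of them. This is a sketch under assumptions: `grid_shape` and `show_all_crops` are hypothetical names, the default of 8 columns is arbitrary, and each crop is assumed to expose its pixels as `.img`, as in the example above.

```python
import math


def grid_shape(count, columns=8):
    """Return (rows, columns) of the smallest grid of the given width that holds `count` items."""
    if count <= 0 or columns <= 0:
        raise ValueError("count and columns must be positive")
    columns = min(columns, count)  # do not create more columns than items
    rows = math.ceil(count / columns)
    return rows, columns


def show_all_crops(crops, columns=8):
    """Plot every crop in a rectangular grid; unused cells stay blank."""
    import matplotlib.pyplot as plt  # imported lazily so grid_shape works without matplotlib

    rows, cols = grid_shape(len(crops), columns)
    fig, axes = plt.subplots(rows, cols, figsize=(10, 10), squeeze=False)
    for k, ax in enumerate(axes.flat):
        ax.axis("off")
        if k < len(crops):
            ax.imshow(crops[k].img)
    plt.show()
```

With the result from the example above, `show_all_crops(result[1])` would lay out 48 crops in 6 rows of 8.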

## Generator of handwriting

`Generator` renders synthetic handwritten text images with random backgrounds and handwriting fonts, either from a given string or from a list of strings stored in `source.txt`.

Generating a random sample from a string:

```
from shiftlab_ocr.generator.generator import Generator

g = Generator(lang='ru')
s = g.generate_from_string('Москва',min_length=4,max_length=24) # get from a string
s
```

![](https://sun9-51.userapi.com/impg/CSeyZPb4rDmP4aCYIDoMDx5VQMXcWO6CwtpGUA/vH_cghX1JtA.jpg?size=344x88&quality=96&sign=c61344d4c7f5576ffe03e750ca31f94c&type=album)

Generating a batch of random samples from `source.txt`:

```
import matplotlib.pyplot as plt
import numpy as np

# load source.txt with one word per line
g.upload_source('source.txt')
b = g.generate_batch(12,4,13) # get batch of random samples from source.txt

# plot the first element of each sample as an image
fig = plt.figure(figsize=(10, 10))
rows = int(len(b)/4) + 2
columns = int(len(b)/8) + 2
for i in range(len(b)):
    fig.add_subplot(rows, columns, i+1)
    plt.imshow(np.asarray(b[i][0]))
plt.show()
```

![](https://sun9-80.userapi.com/impg/ay9o11D8ItN65kDqYnZBahiZFk1zZ2wo5BYoMA/I_nNhdMQeLs.jpg?size=600x409&quality=96&sign=9d6a3ee935fcdc7112aec557eeed74f1&type=album)
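
The batch example expects a `source.txt` file with one word per line. A file in that layout can be produced with a few lines of standard-library Python. This is only a sketch: `write_source_file` is a hypothetical helper, and UTF-8 is an assumed encoding that the source does not specify.

```python
from pathlib import Path


def write_source_file(words, path="source.txt"):
    """Write one stripped, non-empty word per line (UTF-8) and return how many were written."""
    cleaned = [w.strip() for w in words if w.strip()]
    Path(path).write_text("".join(w + "\n" for w in cleaned), encoding="utf-8")
    return len(cleaned)
```

For example, calling `write_source_file(['Москва'])` before `g.upload_source('source.txt')` would give the generator a one-word source list.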

See also the [Google Colab demo](https://colab.research.google.com/drive/1FPfQY9HvjEPEdzfFEZsgSCk5P1TBUAse?usp=sharing).

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/konverner/shiftlab_ocr",
    "name": "shiftlab-ocr",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "data,computer vision,handwriting,doc2text",
    "author": "Konstantin Verner",
    "author_email": "konst.verner@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/02/72/046035814cd471be9534bd7b0cb831b376cc9cac178edf58d7bca68cab00/shiftlab-ocr-0.3.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "SHIFT OCR is a library for handwriting text segmentation and character recognition.",
    "version": "0.3.2",
    "project_urls": {
        "Homepage": "https://github.com/konverner/shiftlab_ocr"
    },
    "split_keywords": [
        "data",
        "computer vision",
        "handwriting",
        "doc2text"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cd30b3be68e7b970c7b3140607367dede17227e11209aa19ab9069ec08cbee52",
                "md5": "fed67162fd9052e23eda0312a52190e9",
                "sha256": "4f04a0a2292fda20d6d95ec652f24342a5a85a64984af63bac0c25e8656bc565"
            },
            "downloads": -1,
            "filename": "shiftlab_ocr-0.3.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fed67162fd9052e23eda0312a52190e9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 1656279,
            "upload_time": "2023-07-04T13:49:21",
            "upload_time_iso_8601": "2023-07-04T13:49:21.206909Z",
            "url": "https://files.pythonhosted.org/packages/cd/30/b3be68e7b970c7b3140607367dede17227e11209aa19ab9069ec08cbee52/shiftlab_ocr-0.3.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0272046035814cd471be9534bd7b0cb831b376cc9cac178edf58d7bca68cab00",
                "md5": "3328b64d7446d8db3bbc36477922b1b9",
                "sha256": "db95692c5af6c7a2a317ba36e6df5ce1fbbb15814a16743ade21098c5bdab162"
            },
            "downloads": -1,
            "filename": "shiftlab-ocr-0.3.2.tar.gz",
            "has_sig": false,
            "md5_digest": "3328b64d7446d8db3bbc36477922b1b9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 1604455,
            "upload_time": "2023-07-04T13:49:23",
            "upload_time_iso_8601": "2023-07-04T13:49:23.207685Z",
            "url": "https://files.pythonhosted.org/packages/02/72/046035814cd471be9534bd7b0cb831b376cc9cac178edf58d7bca68cab00/shiftlab-ocr-0.3.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-04 13:49:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "konverner",
    "github_project": "shiftlab_ocr",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "shiftlab-ocr"
}
        