Name | kraken |
Version | 6.0.0 |
home_page | https://kraken.re |
Summary | OCR/HTR engine for all the languages |
upload_time | 2025-08-29 21:11:39 |
maintainer | None |
docs_url | None |
author | Benjamin Kiessling |
requires_python | <3.13,>=3.9 |
license | Apache |
keywords | ocr, htr |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |

Description
===========

.. image:: https://github.com/mittagessen/kraken/actions/workflows/test.yml/badge.svg
   :target: https://github.com/mittagessen/kraken/actions/workflows/test.yml

kraken is a turn-key OCR system optimized for historical and non-Latin script
material.

kraken's main features are:

- Fully trainable layout analysis, reading order, and character recognition
- `Right-to-Left <https://en.wikipedia.org/wiki/Right-to-left>`_, `BiDi
  <https://en.wikipedia.org/wiki/Bi-directional_text>`_, and Top-to-Bottom
  script support
- `ALTO <https://www.loc.gov/standards/alto/>`_, PageXML, abbyyXML, and hOCR
  output
- Word bounding boxes and character cuts
- Multi-script recognition support
- `Public repository <https://zenodo.org/communities/ocr_models>`_ of model files
- Variable recognition network architecture


Installation
============

kraken runs on Linux and macOS (both x64 and ARM). It is installed with the
*pip* utility. To avoid polluting the global state of your distribution's
package manager, install it into a virtual environment. If you do not have one
set up or do not wish to manage virtual environments yourself, you can use
``pipx``:

.. code-block:: console

   $ sudo apt install pipx
   $ pipx install kraken


kraken works with any Python interpreter from 3.9 through 3.12 (the package
metadata declares ``<3.13,>=3.9``). The installation may fail because ``pipx``
defaults to an unsupported interpreter version. In that case, install a
compatible interpreter version such as 3.11 and specify it explicitly:

.. code-block:: console

   $ sudo apt install python3.11-full
   $ pipx install --python python3.11 kraken

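
Before installing, it can help to check whether the interpreter you intend to use falls inside the range declared in the package metadata (``requires_python``: ``<3.13,>=3.9``). The following is a minimal sketch using only the standard library; the function name ``is_supported`` and the two constants are illustrative and not part of kraken.

```python
import sys

# Bounds copied from the package metadata: requires_python "<3.13,>=3.9".
MIN_VERSION = (3, 9)    # inclusive lower bound
MAX_VERSION = (3, 13)   # exclusive upper bound


def is_supported(version=None):
    """Return True if the (major, minor) pair lies in the declared range.

    `version` may be any sequence starting with major and minor numbers;
    it defaults to the running interpreter's version.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "inside" if is_supported() else "outside"
    print(f"Python {current} is {status} the declared range")
```

Running the script with the interpreter that ``pipx`` or ``pip`` would use shows whether an explicit ``--python`` argument is likely to be needed.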

Installation using pip
----------------------

Create and activate a separate virtual environment using whatever tool you
like, then install kraken from PyPI:

.. code-block:: console

   $ pip install kraken

or install it by running pip in a checkout of the git repository:

.. code-block:: console

   $ pip install .

If you want direct PDF and multi-image TIFF/JPEG2000 support, you need to
install the ``pdf`` extras from PyPI (the quotes keep shells such as zsh from
interpreting the brackets):

.. code-block:: console

   $ pip install 'kraken[pdf]'


Finally, you'll need a model to do the actual recognition of characters. To
download the default model for printed French text and place it in the kraken
directory for the current user:

::

   $ kraken get 10.5281/zenodo.10592716

A list of libre models available in the central repository can be retrieved by
running:

::

   $ kraken list


Quickstart
==========

To recognize text in an image using the default parameters, including the
prerequisite steps of binarization and page segmentation:

::

   $ kraken -i image.tif image.txt binarize segment ocr

To binarize a single image using the nlbin algorithm:

::

   $ kraken -i image.tif bw.png binarize

To segment an image (binarized or not) with the new baseline segmenter:

::

   $ kraken -i image.tif lines.json segment -bl

To segment and OCR an image with an explicitly specified recognition model:

::

   $ kraken -i image.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel

All subcommands and options are documented. Use the ``--help`` option to get
more information.

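
The segment-and-recognize invocation above handles one image per call. For a folder of scans, one option is to drive the command line from a small script. The sketch below only assembles the same arguments shown in the Quickstart; the helper names ``build_command`` and ``recognize_all`` are hypothetical, and it assumes the ``kraken`` executable is on ``PATH`` and the model has already been downloaded.

```python
import subprocess
from pathlib import Path


def build_command(image, model, suffix=".txt"):
    """Build the argument list for one segment-and-recognize run.

    The output path is the input path with its extension replaced by
    `suffix`, mirroring `image.tif` -> `image.txt` in the Quickstart.
    """
    image = Path(image)
    output = image.with_suffix(suffix)
    return [
        "kraken",
        "-i", str(image), str(output),
        "segment", "-bl",
        "ocr", "-m", str(model),
    ]


def recognize_all(images, model):
    """Run kraken once per image; raises CalledProcessError on failure."""
    for image in images:
        subprocess.run(build_command(image, model), check=True)
```

A call such as ``recognize_all(sorted(Path("scans").glob("*.tif")), "catmus-print-fondue-large.mlmodel")`` would then process every TIFF in a (hypothetical) ``scans`` directory. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.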

Documentation
=============

Have a look at the `docs <https://kraken.re>`_.

Related Software
================

These days kraken is closely linked to the `eScriptorium
<https://gitlab.com/scripta/escriptorium/>`_ project developed in the same
eScripta research group. eScriptorium provides a user-friendly interface for
annotating data, training models, and running inference, among much else. There
is a `gitter channel <https://gitter.im/escripta/escriptorium>`_ that is mostly
intended for coordinating technical development but is also a good place to find
people with experience applying kraken to a wide variety of material.


Funding
=======

kraken is developed at the `École Pratique des Hautes Études <https://www.ephe.psl.eu>`_, `Université PSL <https://www.psl.eu>`_.

.. container:: twocol

   .. container::

      .. image:: https://raw.githubusercontent.com/mittagessen/kraken/main/docs/_static/normal-reproduction-low-resolution.jpg
         :width: 100
         :alt: Co-financed by the European Union

   .. container::

      This project was funded in part by the European Union (ERC, MiDRASH,
      project number 101071829).

.. container:: twocol

   .. container::

      .. image:: https://raw.githubusercontent.com/mittagessen/kraken/main/docs/_static/normal-reproduction-low-resolution.jpg
         :width: 100
         :alt: Co-financed by the European Union

   .. container::

      This project was partially funded through the RESILIENCE project, funded from
      the European Union’s Horizon 2020 Framework Programme for Research and
      Innovation.

.. container:: twocol

   .. container::

      .. image:: https://projet.biblissima.fr/sites/default/files/2021-11/biblissima-baseline-sombre-ia.png
         :width: 400
         :alt: Received funding from the Programme d’investissements d’Avenir

   .. container::

      This work received State aid managed by the Agence Nationale de la
      Recherche under the Programme d’Investissements d’Avenir, reference
      ANR-21-ESRE-0005 (Biblissima+).

Raw data
{
"_id": null,
"home_page": "https://kraken.re",
"name": "kraken",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.13,>=3.9",
"maintainer_email": null,
"keywords": "ocr, htr",
"author": "Benjamin Kiessling",
"author_email": "mittagessen@l.unchti.me",
"download_url": "https://files.pythonhosted.org/packages/27/8f/644f76db107904e172c6390d3dbb809a8eb156f4b42c1120f08973f5efdb/kraken-6.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache",
"summary": "OCR/HTR engine for all the languages",
"version": "6.0.0",
"project_urls": {
"Homepage": "https://kraken.re"
},
"split_keywords": [
"ocr",
" htr"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "3782c21ca6ec5ebd93f32cfc36abcf0b75f6d8e6a7abb1903f4359a87a325060",
"md5": "1d2086c9e2790042c29dc5b8ea88f4a5",
"sha256": "5aea2523c3e3a351861854151a90123bd17be6225b9b6401568e9de29206ae15"
},
"downloads": -1,
"filename": "kraken-6.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1d2086c9e2790042c29dc5b8ea88f4a5",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.13,>=3.9",
"size": 4974482,
"upload_time": "2025-08-29T21:11:36",
"upload_time_iso_8601": "2025-08-29T21:11:36.972013Z",
"url": "https://files.pythonhosted.org/packages/37/82/c21ca6ec5ebd93f32cfc36abcf0b75f6d8e6a7abb1903f4359a87a325060/kraken-6.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "278f644f76db107904e172c6390d3dbb809a8eb156f4b42c1120f08973f5efdb",
"md5": "306068ace70c89d6a3c8cd82e8c25321",
"sha256": "ab0388dff9008530caa0867fc1d544db6c3ac23f84d39a83a06e9f0e7556917f"
},
"downloads": -1,
"filename": "kraken-6.0.0.tar.gz",
"has_sig": false,
"md5_digest": "306068ace70c89d6a3c8cd82e8c25321",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.9",
"size": 12881100,
"upload_time": "2025-08-29T21:11:39",
"upload_time_iso_8601": "2025-08-29T21:11:39.257566Z",
"url": "https://files.pythonhosted.org/packages/27/8f/644f76db107904e172c6390d3dbb809a8eb156f4b42c1120f08973f5efdb/kraken-6.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-29 21:11:39",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "kraken"
}
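
The ``digests`` entries in the raw data above can be used to confirm that a downloaded archive matches what the index recorded. The following is a minimal standard-library sketch; the helper names ``sha256_of`` and ``matches_metadata`` are illustrative, and the local file path is whatever you saved the download as.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_metadata(path, expected_hex):
    """Compare a file's digest with the value from the "digests" object."""
    return sha256_of(path) == expected_hex.lower()
```

For the source distribution listed above, the expected value is the ``sha256`` field of the ``kraken-6.0.0.tar.gz`` entry, ``ab0388dff9008530caa0867fc1d544db6c3ac23f84d39a83a06e9f0e7556917f``, so ``matches_metadata("kraken-6.0.0.tar.gz", "ab0388df...")`` with the full digest should return ``True`` for an intact download.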