kraken


:Name: kraken
:Version: 5.2.2
:Home page: https://kraken.re
:Summary: OCR/HTR engine for all the languages
:Upload time: 2024-04-30 13:23:47
:Author: Benjamin Kiessling
:Requires Python: <=3.11.99,>=3.8
:License: Apache
:Keywords: ocr, htr

Description
===========

.. image:: https://github.com/mittagessen/kraken/actions/workflows/test.yml/badge.svg
    :target: https://github.com/mittagessen/kraken/actions/workflows/test.yml

kraken is a turn-key OCR system optimized for historical and non-Latin script
material.

kraken's main features are:

  - Fully trainable layout analysis, reading order, and character recognition
  - `Right-to-Left <https://en.wikipedia.org/wiki/Right-to-left>`_, `BiDi
    <https://en.wikipedia.org/wiki/Bi-directional_text>`_, and Top-to-Bottom
    script support
  - `ALTO <https://www.loc.gov/standards/alto/>`_, PageXML, abbyyXML, and hOCR
    output
  - Word bounding boxes and character cuts
  - Multi-script recognition support
  - `Public repository <https://zenodo.org/communities/ocr_models>`_ of model files
  - Variable recognition network architecture

Installation
============

kraken only runs on **Linux or Mac OS X**. Windows is not supported.
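
The platform restriction above, together with the Python range listed in the
package metadata for version 5.2.2 (``<=3.11.99,>=3.8``), can be checked before
installing. The following is only a minimal sketch; the names ``preflight``,
``MIN_PYTHON``, ``MAX_PYTHON``, and ``SUPPORTED_SYSTEMS`` are made up for this
example and are not part of kraken.

.. code-block:: python

  import platform
  import sys

  # Bounds copied from the 5.2.2 package metadata: "<=3.11.99,>=3.8".
  MIN_PYTHON = (3, 8)
  MAX_PYTHON = (3, 11)
  # platform.system() reports "Darwin" on Mac OS X.
  SUPPORTED_SYSTEMS = {"Linux", "Darwin"}


  def preflight(system=None, version=None):
      """Return a list of problems; an empty list means the checks passed."""
      system = platform.system() if system is None else system
      if version is None:
          version = sys.version_info
      version = tuple(version[:2])  # compare on (major, minor) only
      problems = []
      if system not in SUPPORTED_SYSTEMS:
          problems.append(f"unsupported operating system: {system}")
      if not (MIN_PYTHON <= version <= MAX_PYTHON):
          problems.append(
              "unsupported Python version: " + ".".join(map(str, version))
          )
      return problems


  if __name__ == "__main__":
      for problem in preflight():
          print(problem)

Passing ``system`` and ``version`` explicitly makes the check easy to try out
for a machine other than the one running the script.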

The latest stable releases can be installed either from `PyPI <https://pypi.org>`_:

::

  $ pip install kraken

or through `conda <https://anaconda.org>`_:

::

  $ conda install -c conda-forge -c mittagessen kraken

For direct PDF and multi-image TIFF/JPEG2000 support, install the ``pdf`` extras
package from PyPI:

::

  $ pip install 'kraken[pdf]'

or install ``pyvips`` manually with pip:

::

  $ pip install pyvips
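
Whether the optional dependency is visible to the current interpreter can be
checked without importing it. This is a rough sketch: ``has_module`` is a
hypothetical helper, and finding the ``pyvips`` package does not prove that the
underlying libvips library loads correctly.

.. code-block:: python

  import importlib.util


  def has_module(name):
      """Return True if a top-level module called ``name`` can be located."""
      return importlib.util.find_spec(name) is not None


  if __name__ == "__main__":
      if has_module("pyvips"):
          print("pyvips found")
      else:
          print("pyvips missing; PDF and multi-image input may be unavailable")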

Conda environment files are provided for the seamless installation of the main
branch as well:

::

  $ git clone https://github.com/mittagessen/kraken.git
  $ cd kraken
  $ conda env create -f environment.yml

or:

::

  $ git clone https://github.com/mittagessen/kraken.git
  $ cd kraken
  $ conda env create -f environment_cuda.yml

for CUDA acceleration with the appropriate hardware.

Finally, you'll need a model to do the actual character recognition. To
download the default model for printed French text and place it in the kraken
directory for the current user:

::

  $ kraken get 10.5281/zenodo.10592716

A list of libre models available in the central repository can be retrieved by
running:

::

  $ kraken list
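
For scripted setups, the two commands above can be driven from Python. This is
a sketch under the assumption that the ``kraken`` executable is on ``PATH``;
``model_commands`` and ``fetch_model`` are illustrative names, not kraken
functions.

.. code-block:: python

  import shutil
  import subprocess

  DEFAULT_MODEL = "10.5281/zenodo.10592716"  # identifier from the command above


  def model_commands(model_id=DEFAULT_MODEL):
      """Return the argument lists for ``kraken list`` and ``kraken get``."""
      return [["kraken", "list"], ["kraken", "get", model_id]]


  def fetch_model(model_id=DEFAULT_MODEL):
      """List the repository, then download one model; raise on any failure."""
      if shutil.which("kraken") is None:
          raise RuntimeError("kraken executable not found on PATH")
      for command in model_commands(model_id):
          # check=True turns a non-zero exit status into CalledProcessError.
          subprocess.run(command, check=True)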

Quickstart
==========

Recognizing text on an image using the default parameters including the
prerequisite steps of binarization and page segmentation:

::

  $ kraken -i image.tif image.txt binarize segment ocr

To binarize a single image using the nlbin algorithm:

::

  $ kraken -i image.tif bw.png binarize

To segment an image (binarized or not) with the new baseline segmenter:

::

  $ kraken -i image.tif lines.json segment -bl


To segment and OCR an image using the default model(s):

::

  $ kraken -i image.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel

All subcommands and options are documented. Use the ``--help`` option, on
``kraken`` itself or on any subcommand, to get more information.
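
To apply the last command to a whole directory of scans, the invocation can be
generated per file. The sketch below only reuses the arguments shown above and
starts one kraken process per image; ``ocr_command`` and ``ocr_directory`` are
hypothetical helpers, and kraken's own batch options (see the help output) may
well be a better fit.

.. code-block:: python

  import subprocess
  from pathlib import Path

  MODEL = "catmus-print-fondue-large.mlmodel"  # model file from the command above


  def ocr_command(image, model=MODEL):
      """Build ``kraken -i IMAGE IMAGE.txt segment -bl ocr -m MODEL``."""
      image = Path(image)
      output = image.with_suffix(".txt")  # page1.tif -> page1.txt
      return ["kraken", "-i", str(image), str(output),
              "segment", "-bl", "ocr", "-m", model]


  def ocr_directory(directory, pattern="*.tif", dry_run=True):
      """Build (and optionally run) one command per matching image."""
      paths = sorted(Path(directory).glob(pattern))
      commands = [ocr_command(path) for path in paths]
      for command in commands:
          if dry_run:
              print(" ".join(command))  # show what would be executed
          else:
              subprocess.run(command, check=True)
      return commands

With ``dry_run=True`` (the default here) nothing is executed, so the generated
commands can be reviewed before any processing starts.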

Documentation
=============

Have a look at the `docs <https://kraken.re>`_.

Related Software
================

These days kraken is quite closely linked to the `eScriptorium
<https://gitlab.com/scripta/escriptorium/>`_ project developed in the same eScripta research
group. eScriptorium provides a user-friendly interface for annotating data,
training models, and inference (but also much more). There is a `gitter channel
<https://gitter.im/escripta/escriptorium>`_ that is mostly intended for
coordinating technical development but is also a good place to find people with
experience in applying kraken to a wide variety of material.

Funding
=======

kraken is developed at the `École Pratique des Hautes Études <https://www.ephe.psl.eu>`_, `Université PSL <https://www.psl.eu>`_.

.. container:: twocol

   .. container::

        .. image:: https://raw.githubusercontent.com/mittagessen/kraken/main/docs/_static/normal-reproduction-low-resolution.jpg
          :width: 100
          :alt: Co-financed by the European Union

   .. container::

        This project was partially funded through the RESILIENCE project, funded from
        the European Union’s Horizon 2020 Framework Programme for Research and
        Innovation.


.. container:: twocol

   .. container::

      .. image:: https://projet.biblissima.fr/sites/default/files/2021-11/biblissima-baseline-sombre-ia.png
         :width: 400
         :alt: Received funding from the Programme d’investissements d’Avenir

   .. container::

        This work benefited from government funding managed by the Agence Nationale
        de la Recherche under the Programme d’Investissements d’Avenir, reference
        ANR-21-ESRE-0005 (Biblissima+).



            

Raw data

            {
    "_id": null,
    "home_page": "https://kraken.re",
    "name": "kraken",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<=3.11.99,>=3.8",
    "maintainer_email": null,
    "keywords": "ocr, htr",
    "author": "Benjamin Kiessling",
    "author_email": "mittagessen@l.unchti.me",
    "download_url": "https://files.pythonhosted.org/packages/79/87/f240e2b517616f0293025f4ecff12b469c9b46ed501f0abcfd82b1d51728/kraken-5.2.2.tar.gz",
    "platform": null,
    "description": "Description\n===========\n\n.. image:: https://github.com/mittagessen/kraken/actions/workflows/test.yml/badge.svg\n    :target: https://github.com/mittagessen/kraken/actions/workflows/test.yml\n\nkraken is a turn-key OCR system optimized for historical and non-Latin script\nmaterial.\n\nkraken's main features are:\n\n  - Fully trainable layout analysis, reading order, and character recognition\n  - `Right-to-Left <https://en.wikipedia.org/wiki/Right-to-left>`_, `BiDi\n    <https://en.wikipedia.org/wiki/Bi-directional_text>`_, and Top-to-Bottom\n    script support\n  - `ALTO <https://www.loc.gov/standards/alto/>`_, PageXML, abbyyXML, and hOCR\n    output\n  - Word bounding boxes and character cuts\n  - Multi-script recognition support\n  - `Public repository <https://zenodo.org/communities/ocr_models>`_ of model files\n  - Variable recognition network architecture\n\nInstallation\n============\n\nkraken only runs on **Linux or Mac OS X**. Windows is not supported.\n\nThe latest stable releases can be installed either from `PyPi <https://pypi.org>`_:\n\n::\n\n  $ pip install kraken\n\nor through `conda <https://anaconda.org>`_:\n\n::\n\n  $ conda install -c conda-forge -c mittagessen kraken\n\nIf you want direct PDF and multi-image TIFF/JPEG2000 support it is necessary to\ninstall the `pdf` extras package for PyPi:\n\n::\n\n  $ pip install kraken[pdf]\n\nor install `pyvips` manually with pip:\n\n::\n\n  $ pip install pyvips\n\nConda environment files are provided for the seamless installation of the main\nbranch as well:\n\n::\n\n  $ git clone https://github.com/mittagessen/kraken.git\n  $ cd kraken\n  $ conda env create -f environment.yml\n\nor:\n\n::\n\n  $ git clone https://github.com/mittagessen/kraken.git\n  $ cd kraken\n  $ conda env create -f environment_cuda.yml\n\nfor CUDA acceleration with the appropriate hardware.\n\nFinally you'll have to scrounge up a model to do the actual recognition of\ncharacters. 
To download the default model for printed French text and place it\nin the kraken directory for the current user:\n\n::\n\n  $ kraken get 10.5281/zenodo.10592716\n\nA list of libre models available in the central repository can be retrieved by\nrunning:\n\n::\n\n  $ kraken list\n\nQuickstart\n==========\n\nRecognizing text on an image using the default parameters including the\nprerequisite steps of binarization and page segmentation:\n\n::\n\n  $ kraken -i image.tif image.txt binarize segment ocr\n\nTo binarize a single image using the nlbin algorithm:\n\n::\n\n  $ kraken -i image.tif bw.png binarize\n\nTo segment an image (binarized or not) with the new baseline segmenter:\n\n::\n\n  $ kraken -i image.tif lines.json segment -bl\n\n\nTo segment and OCR an image using the default model(s):\n\n::\n\n  $ kraken -i image.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel\n\nAll subcommands and options are documented. Use the ``help`` option to get more\ninformation.\n\nDocumentation\n=============\n\nHave a look at the `docs <https://kraken.re>`_.\n\nRelated Software\n================\n\nThese days kraken is quite closely linked to the `eScriptorium\n<https://gitlab.com/scripta/escriptorium/>`_ project developed in the same eScripta research\ngroup. eScriptorium provides a user-friendly interface for annotating data,\ntraining models, and inference (but also much more). There is a `gitter channel\n<https://gitter.im/escripta/escriptorium>`_ that is mostly intended for\ncoordinating technical development but is also a spot to find people with\nexperience on applying kraken on a wide variety of material.\n\nFunding\n=======\n\nkraken is developed at the `\u00c9cole Pratique des Hautes \u00c9tudes <https://www.ephe.psl.eu>`_, `Universit\u00e9 PSL <https://www.psl.eu>`_.\n\n.. container:: twocol\n\n   .. container::\n\n        .. 
image:: https://raw.githubusercontent.com/mittagessen/kraken/main/docs/_static/normal-reproduction-low-resolution.jpg\n          :width: 100\n          :alt: Co-financed by the European Union\n\n   .. container::\n\n        This project was partially funded through the RESILIENCE project, funded from\n        the European Union\u2019s Horizon 2020 Framework Programme for Research and\n        Innovation.\n\n\n.. container:: twocol\n\n   .. container::\n\n      .. image:: https://projet.biblissima.fr/sites/default/files/2021-11/biblissima-baseline-sombre-ia.png\n         :width: 400\n         :alt: Received funding from the Programme d\u2019investissements d\u2019Avenir\n\n   .. container::\n\n        Ce travail a b\u00e9n\u00e9fici\u00e9 d\u2019une aide de l\u2019\u00c9tat g\u00e9r\u00e9e par l\u2019Agence Nationale de la\n        Recherche au titre du Programme d\u2019Investissements d\u2019Avenir portant la r\u00e9f\u00e9rence\n        ANR-21-ESRE-0005 (Biblissima+).\n\n\n",
    "bugtrack_url": null,
    "license": "Apache",
    "summary": "OCR/HTR engine for all the languages",
    "version": "5.2.2",
    "project_urls": {
        "Homepage": "https://kraken.re"
    },
    "split_keywords": [
        "ocr",
        " htr"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "37b9eb8fd631bdecd801986aca77d80cb181f986c2a51c115f1170f293e96601",
                "md5": "a6a05023003c050795eb4243133ac8e7",
                "sha256": "9c99eff9473c6382e1182ca1b1b991fe58230abe71c119f256db113c414afc5b"
            },
            "downloads": -1,
            "filename": "kraken-5.2.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a6a05023003c050795eb4243133ac8e7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<=3.11.99,>=3.8",
            "size": 4973286,
            "upload_time": "2024-04-30T13:23:44",
            "upload_time_iso_8601": "2024-04-30T13:23:44.042369Z",
            "url": "https://files.pythonhosted.org/packages/37/b9/eb8fd631bdecd801986aca77d80cb181f986c2a51c115f1170f293e96601/kraken-5.2.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7987f240e2b517616f0293025f4ecff12b469c9b46ed501f0abcfd82b1d51728",
                "md5": "a356472d18e0fcc12049d3874e44bbe3",
                "sha256": "f5b3362234d94ab9113e9ed6f6136753b6a11f0703bd6b1caa2c1ad16ad80ff1"
            },
            "downloads": -1,
            "filename": "kraken-5.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "a356472d18e0fcc12049d3874e44bbe3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<=3.11.99,>=3.8",
            "size": 12830778,
            "upload_time": "2024-04-30T13:23:47",
            "upload_time_iso_8601": "2024-04-30T13:23:47.307072Z",
            "url": "https://files.pythonhosted.org/packages/79/87/f240e2b517616f0293025f4ecff12b469c9b46ed501f0abcfd82b1d51728/kraken-5.2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-30 13:23:47",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "kraken"
}
        