# HTRMoPo

HTRMoPo is a schema and an implementation for an automatic text recognition
model repository hosted on the [Zenodo](https://zenodo.org) research data
infrastructure. It is designed to enable discoverability of models across a
wide range of software and ATR-related tasks and to aid in model selection.

There are two versions of the schema: `v0` and `v1`. `v0` is the legacy kraken
model schema for the Zenodo repository and is fairly limited: in particular, it
does not support non-recognition models and offers only limited ways of
incorporating model cards. `v1` is intended for all kinds of machine learning
models involved in ATR, independent of software.

## Schema
### v0
v0 is preserved mostly for historical interest. Records in v0 format consist of
a JSON metadata file and at most a single model file that is referenced in it.

### v1
Repository records following the v1 schema consist of a Markdown model card
with a YAML metadata front matter and an arbitrary number of files in the
record. There is an [example model card](schema/v1/model_card.md) that is
inspired by the Hugging Face example template, but in principle model cards
are free-form. The front matter can be validated against the JSON schema found
[here](schema/v1/metadata.schema.json).
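
To check a model card before uploading it, the front matter can be validated
with standard tooling. The following is a minimal sketch, assuming PyYAML and
the `jsonschema` package are installed and that the schema file from this
repository is available locally; file names are illustrative:

```python
import json

import yaml         # PyYAML
import jsonschema

# Read the model card and split off the YAML front matter, i.e. the block
# delimited by the first two '---' markers.
with open("model_card.md", encoding="utf-8") as fp:
    _, front_matter, _ = fp.read().split("---", 2)

metadata = yaml.safe_load(front_matter)

# Load the v1 metadata schema and validate; raises ValidationError on failure.
with open("schema/v1/metadata.schema.json", encoding="utf-8") as fp:
    schema = json.load(fp)

jsonschema.validate(instance=metadata, schema=schema)
print("front matter is valid")
```
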
## How does it work?

Install the Python library and prepare a model card for your ATR model, whether
it performs segmentation, recognition, reading order determination,
post-correction, or any other task.
Afterwards you need to create an account on [Zenodo](https://zenodo.org) and
create an API access token as described
[here](https://developers.zenodo.org/#creating-a-personal-access-token).
With the HTRMoPo reference implementation and the access token you can then
create model deposits on Zenodo. Deposits are publicly accessible immediately
but won't be discoverable until the community inclusion request has been
manually approved by one of the repository administrators.
Using a research data infrastructure like Zenodo ensures long-term
accessibility of the deposited models while also enabling good scientific
practices like reproducibility and crediting of contributions.
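
Condensed into a few shell commands, the whole workflow looks roughly as
follows; this is only a sketch (the placeholder token is yours to fill in) and
the `publish` invocation is described in detail further below:

```
~> pip install htrmopo
~> export ACCESS_TOKEN=<personal access token created on Zenodo>
~> htrmopo publish -i model_card.md -a ${ACCESS_TOKEN} model_dir
```
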
## Deposits and Identifiers
Each model in the repository consists of the model card with metadata and one
or more model files and is identified by two persistent, unique DOIs. One DOI
refers to the deposit itself, i.e. a single version of the model, while the
second one is the [concept DOI](https://zenodo.org/help/versioning). An
example is [10.5281/zenodo.7051646](https://zenodo.org/records/7051646) with
concept DOI [10.5281/zenodo.7051645](https://doi.org/10.5281/zenodo.7051645).
When a new version of a model is uploaded to the repository a new DOI is
created, for example
[10.5281/zenodo.14585602](https://zenodo.org/records/14585602) for the above
model, but the concept DOI remains the same. The concept DOI therefore
aggregates all versions of the model under a single identifier and always
links to its latest version.
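
Because the concept DOI always resolves to the latest version, following its
doi.org redirect is a quick way to find the newest deposit. A minimal sketch
using the `requests` package (an assumption; any HTTP client works):

```python
import requests

concept_doi = "10.5281/zenodo.7051645"
# doi.org redirects the concept DOI to the landing page of the latest version.
r = requests.get(f"https://doi.org/{concept_doi}", timeout=30)
print(r.url)  # e.g. https://zenodo.org/records/14585602 at the time of writing
```
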
## Python Library
A reference implementation for interacting with the repository on Zenodo is in
the htrmopo directory, containing both a Python library and command line
drivers.
The library can be installed using pip:
~> pip install htrmopo
### CLI
The `htrmopo` command line tool is used to query the repository, download
existing models, and upload and update items to it.
#### Querying the repository
To get a listing of all models:
~> htrmopo list
Retrieving model list ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ DOI ┃ summary ┃ model type ┃ keywords ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 10.5281/zenodo.7547437 │ │ │ │
│ ├── 10.5281/zenodo.10800223 │ HTR model for documentary Latin, Old French and Spanish medieval manuscripts (11th-16th) │ recognition │ Handwritten text recognition; Handwritten text recognition for Medieval manuscripts; Digital Paleography │
│ └── 10.5281/zenodo.7547438 │ HTR model for documentary Latin and Old French medieval manuscripts (12th-15th) │ recognition │ Handwritten text recognition; Handwritten text recognition for Medieval manuscripts; Digital Paleography │
│ 10.5281/zenodo.7050269 │ │ │ │
│ └── 10.5281/zenodo.7050270 │ Printed Arabic-Script Base Model Trained on the OpenITI Corpus │ recognition │ kraken_pytorch │
│ 10.5281/zenodo.6542743 │ │ │ │
│ └── 10.5281/zenodo.6542744 │ LECTAUREP Contemporary French Model (Administration) │ recognition │ kraken_pytorch; HTR; transcription model; recognition model; French; Contemporary French │
│ 10.5281/zenodo.13814199 │ │ │ │
│ └── 10.5281/zenodo.13814200 │ Segmentation model for historical Samaritan Manuscripts for one column pages, model trained on 13 pentateuchal Samaritan manuscripts │ recognition │ kraken_pytorch │
│ 10.5281/zenodo.6891851 │ │ │ │
│ ├── 10.5281/zenodo.7933402 │ Fraktur model trained from enhanced Austrian Newspapers dataset │ recognition │ kraken_pytorch; Fraktur; Latin │
│ └── 10.5281/zenodo.6891852 │ Fraktur model trained from enhanced Austrian Newspapers dataset │ recognition │ kraken_pytorch │
│ 10.5281/zenodo.8193497 │ │ │ │
│ └── 10.5281/zenodo.8193498 │ Transcription model for Lucien Peraire's handwriting (French, 20th century) │ recognition │ kraken_pytorch; HTR; Peraire; Manu McFrench; contemporary handwriting; French │
│ 10.5281/zenodo.5468664 │ │ │ │
│ └── 10.5281/zenodo.5468665 │ Medieval Hebrew manuscripts in Sephardi bookhand version 1.0 │ recognition │ kraken_pytorch │
│ 10.5281/zenodo.10592715 │ │ │ │
│ └── 10.5281/zenodo.10592716 │ CATMuS-Print (Large, 2024-01-30) - Diachronic model for French prints and other languages │ recognition │ kraken_pytorch; optical text recognition │
│ 10.5281/zenodo.7051645 │ │ │ │
│ ├── 10.5281/zenodo.14585602 │ Printed Urdu Base Model Trained on the OpenITI Corpus │ recognition │ automatic-text-recognition │
│ ├── 10.5281/zenodo.14574660 │ Printed Urdu Base Model Trained on the OpenITI Corpus │ recognition │ kraken_pytorch │
│ └── 10.5281/zenodo.7051646 │ Printed Urdu Base Model Trained on the OpenITI Corpus │ recognition │ kraken_pytorch │
│ 10.5281/zenodo.5468285 │ │ │ │
│ └── 10.5281/zenodo.5468286 │ Medieval Hebrew manuscripts version 1.0 │ recognition │ kraken_pytorch │
│ 10.5281/zenodo.6657808 │ │ │ │
│ ├── 10.5281/zenodo.10886224 │ Model train on openly licensed data from HTR-United from the 17th century to the 21st were used. │ recognition │ kraken_pytorch │
│ ├── 10.5281/zenodo.6657809 │ Model train on openly licensed data from HTR-United. All French manuscript data from the 17th century to the 21st were used (72k lines). │ recognition │ kraken_pytorch │
│ └── 10.5281/zenodo.10874058 │ Model train on openly licensed data from HTR-United. All French manuscript data from the 17th century to the 21st were used. │ recognition │ kraken_pytorch │
│ 10.5281/zenodo.7234165 │ │ │ │
....
Records are represented as a tree structure in the left-most column. The DOI at
the root of each tree is a [concept DOI](https://zenodo.org/help/versioning),
which always links to the most recent version of a model. The leaves of the
tree are the individual versions of the record, ordered chronologically. Either
type of DOI is accepted as an argument by the commands below, although it is
recommended to reference a concrete version in contexts where reproducibility
is desired.
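
For example, both of the following invocations refer to the Urdu base model
from the listing above, but only the second one pins a specific version:

```
~> htrmopo show 10.5281/zenodo.7051645   # concept DOI, resolves to the latest version
~> htrmopo show 10.5281/zenodo.14585602  # version DOI, suitable for reproducible references
```
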
To fetch the metadata for a single model (both v0 and v1 schema):
~> htrmopo show 10.5281/zenodo.10800223
HTR model for documentary Latin, Old French and Spanish medieval manuscripts (11th-16th)
┌──────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ DOI │ 10.5281/zenodo.10800223 │
│ concept DOI │ 10.5281/zenodo.7547437 │
│ publication date │ 2024-03-14T01:47:02+00:00 │
│ model type │ recognition │
│ script │ Latin │
│ alphabet │ ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ a b c d e f g h i j k l m n o p │
│ │ q r s t u v w x y z { | } ~ ¡ £ § ª « ¬ ° ¶ º » ½ ¾ À Ä Ç È É Ë Ï Û Ü à á â ä æ ç è é ê ë ì í î ï ñ ò ó ô ö ù ú û ü ÿ ā ă ē ĕ ę ī ō ŏ œ ŭ ƒ ȩ ˀ ο а е о с ᗅ │
│ │ – — ‘ ’ ” „ † … ⁖ ₎ 〈 〉 ✳ ꝫ │
│ │ 0x9, SPACE, 0x92, 0x97, NO-BREAK SPACE, COMBINING MACRON, COMBINING LATIN SMALL LETTER A, COMBINING LATIN SMALL LETTER E, COMBINING LATIN SMALL LETTER O, │
│ │ COMBINING LATIN SMALL LETTER U, COMBINING LATIN SMALL LETTER C, WORD JOINER, 0xf2f7 │
│ keywords │ Handwritten text recognition │
│ │ Handwritten text recognition for Medieval manuscripts │
│ │ Digital Paleography │
│ metrics │ cer: 7.82 │
│ license │ MIT │
│ creators │ Torres Aguilar, Sergio (https://orcid.org/0000-0002-1801-3147) (University of Luxembourg) │
│ │ Jolivet, Vincent (École nationale des chartes) │
│ │ Sergio Torres Aguilar (University of Luxembourg) │
│ description │ The model was trained on diplomatic transcriptions of documentary manuscripts from the Late-medieval period (12-15th) and early modernity (16th). The │
│ │ training and evaluation sets entail 215k lines and 2.4M of tokens using open source corpora. │
│ │ │
└──────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Downloading a single model:
~> htrmopo get 10.5281/zenodo.7547437
Processing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
Model name: /home/mittagessen/.local/share/htrmopo/0ac39ba5-8f85-5ea1-913a-f84a13ca756f
By default, models are placed in reproducible locations inside the application
state directory, which is printed once the download has finished. The `-o`
option allows customization of that behavior:
~> htrmopo get -o manu 10.5281/zenodo.7547437
Processing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
Model name: /home/mittagessen/manu
#### Publishing models
There are two modes of publishing ATR models with the `htrmopo` command. The
first creates new stand-alone deposits, while the second creates a new version
of an existing record; all versions are grouped under the same concept DOI.
Updating a model deposit is usually done when a prior model has been retrained
with additional training data, the metadata has been refined, or additional
evaluation has been performed.
The calls for both modes are very similar, the only difference being the `-d`
option, which gives the DOI of an existing model deposit in the repository:
~> htrmopo publish -i model_card.md -a ${ACCESS_TOKEN} model_dir
Uploading ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
model PID: 10.5072/zenodo.146629
~> htrmopo publish -d 10.5072/zenodo.146502 -i model_card.md -a ${ACCESS_TOKEN} model_dir
Uploading ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
model PID: 10.5072/zenodo.146627
#### Configuration
The tool is intended to work out of the box but for testing purposes it can be
useful to point it to another instance of
[InvenioRDM](https://inveniosoftware.org/products/rdm/), such as the [Zenodo
sandbox](https://sandbox.zenodo.org/), in order not to pollute the main
repository with spurious deposits.
You can set the OAI-PMH API endpoint (required for querying) and the InvenioRDM
endpoint (needed for querying and publishing) with the `MODEL_REPO_OAI_URL` and
`MODEL_REPO_URL` environment variables, for example:
MODEL_REPO_URL=https://sandbox.zenodo.org/api/ htrmopo publish -i model_card.md -a ....
will upload a model to the sandbox instance of Zenodo.
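
For a complete sandbox session both variables would typically be set. The
following sketch assumes the sandbox exposes its OAI-PMH endpoint under the
standard InvenioRDM path, which may differ on other instances:

```
~> export MODEL_REPO_URL=https://sandbox.zenodo.org/api/
~> export MODEL_REPO_OAI_URL=https://sandbox.zenodo.org/oai2d  # assumed OAI-PMH endpoint
~> htrmopo list
~> htrmopo publish -i model_card.md -a ${ACCESS_TOKEN} model_dir
```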