pylaia


| Field | Value |
| --- | --- |
| Name | pylaia |
| Version | 1.1.2 |
| Upload time | 2024-10-16 15:51:04 |
| Requires Python | <3.11,>=3.9 |
| License | MIT |
| Keywords | htr ocr python |
| Requirements | No requirements were recorded. |

<div align="center">

# PyLaia

**PyLaia is a device agnostic, PyTorch based, deep learning toolkit for handwritten document analysis.**

**It is also a successor to [Laia](https://github.com/jpuigcerver/Laia).**

[![pipeline status](https://gitlab.teklia.com/atr/pylaia/badges/master/pipeline.svg)](https://gitlab.teklia.com/atr/pylaia/-/commits/master)
[![Coverage](https://gitlab.teklia.com/atr/pylaia/badges/master/coverage.svg)](https://gitlab.teklia.com/atr/pylaia/-/commits/master)
[![Code quality](https://img.shields.io/codefactor/grade/github/jpuigcerver/PyLaia?&label=CodeFactor&logo=CodeFactor&labelColor=2782f7)](https://www.codefactor.io/repository/github/jpuigcerver/PyLaia)

[![Python: 3.9 | 3.10](https://img.shields.io/badge/python-3.9%20%7C%203.10-blue)](https://www.python.org/)
[![PyTorch: 1.13.0 | 1.13.1](https://img.shields.io/badge/PyTorch-1.13.0%20%7C%201.13.1-8628d5.svg?&logo=PyTorch&logoColor=white&labelColor=%23ee4c2c)](https://pytorch.org/)
[![pre-commit: enabled](https://img.shields.io/badge/pre--commit-enabled-76877c?&logo=pre-commit&labelColor=1f2d23)](https://github.com/pre-commit/pre-commit)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?)](https://github.com/ambv/black)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

</div>

Get started by having a look at our [Documentation](https://atr.pages.teklia.com/pylaia)!

## Installation

To install PyLaia from PyPI:

```bash
pip install pylaia
```
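
After installing, a quick environment check can catch the most common problems. The sketch below is not part of PyLaia: it assumes only the `requires_python` constraint listed in the package metadata (`<3.11,>=3.9`) and the four script names listed in this section. The helper names `python_is_supported` and `missing_scripts` are hypothetical.

```python
import shutil
import sys

# Console scripts named in this README.
PYLAIA_SCRIPTS = (
    "pylaia-htr-create-model",
    "pylaia-htr-train-ctc",
    "pylaia-htr-decode-ctc",
    "pylaia-htr-netout",
)


def python_is_supported(version=None):
    """Return True if (major, minor) satisfies >=3.9,<3.11."""
    major_minor = tuple((version or sys.version_info)[:2])
    return (3, 9) <= major_minor < (3, 11)


def missing_scripts(names=PYLAIA_SCRIPTS):
    """Return the script names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("Scripts not found on PATH:", missing_scripts() or "none")
```

If a script is reported as missing even though `pip install` succeeded, the directory where pip places executables is probably not on `PATH`, for example when installing outside an activated virtual environment.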

The following Python scripts will be installed on your system:

- [`pylaia-htr-create-model`](laia/scripts/htr/create_model.py): Create a VGG-like model with BLSTMs on top for handwritten text recognition. The script provides several options to customize the model. The architecture is based on the paper ["Are Multidimensional Recurrent Layers Really Necessary for Handwritten Text Recognition?"](https://ieeexplore.ieee.org/document/8269951) (2017) by J. Puigcerver.
- [`pylaia-htr-train-ctc`](laia/scripts/htr/train_ctc.py): Train a model using the CTC algorithm and a set of text-line images and their transcripts.
- [`pylaia-htr-decode-ctc`](laia/scripts/htr/decode_ctc.py): Decode text-line images using a trained model and the CTC algorithm. It can also output the character/word segmentation boundaries of the recognized symbols.
- [`pylaia-htr-netout`](laia/scripts/htr/netout.py): Dump the output of the model for a set of text-line images so that it can be decoded with an external language model.

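To give an idea of what CTC decoding involves, here is a minimal sketch of greedy (best-path) decoding in plain Python. It is an illustration of the general technique, not PyLaia's implementation: the function name `ctc_greedy_decode`, the symbol table, and the choice of index 0 for the blank symbol are all assumptions made for the example.

```python
def ctc_greedy_decode(frame_scores, symbols, blank=0):
    """Greedy CTC decoding: best symbol per frame, collapse repeats, drop blanks.

    frame_scores: one list of per-symbol scores for each time step.
    symbols: symbol table indexed like the scores (hypothetical).
    blank: index of the CTC blank symbol (assumed to be 0 here).
    """
    # Pick the highest-scoring symbol index at every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]

    decoded = []
    previous = None
    for index in best_path:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank.
        if index != previous and index != blank:
            decoded.append(symbols[index])
        previous = index
    return "".join(decoded)


if __name__ == "__main__":
    symbols = ["<blank>", "a", "b"]
    frames = [
        [0.1, 0.8, 0.1],    # a
        [0.2, 0.7, 0.1],    # a (repeat, collapsed)
        [0.9, 0.05, 0.05],  # blank (separates the two a's)
        [0.1, 0.8, 0.1],    # a
        [0.1, 0.2, 0.7],    # b
        [0.1, 0.1, 0.8],    # b (repeat, collapsed)
    ]
    print(ctc_greedy_decode(frames, symbols))  # prints "aab"
```

The blank frame between the two runs of `a` is what lets CTC output a doubled character. Decoding with an external language model replaces this per-frame argmax with a search over the full per-frame output.
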
## Contributing

If you want to contribute a new feature or have found a problem with PyLaia, please head to [CONTRIBUTING.md](https://gitlab.teklia.com/atr/pylaia/-/blob/master/CONTRIBUTING.md) to learn more, then follow these steps.

1.  Fork it ( <https://gitlab.teklia.com/atr/pylaia/-/forks/new> )
2.  Create your feature branch (`git checkout -b my-new-feature`)
3.  Commit your changes (`git commit -am 'Add some feature'`)
4.  Push to the branch (`git push origin my-new-feature`)
5.  Create a new Merge Request ( <https://gitlab.teklia.com/atr/pylaia/-/merge_requests/new> )

### Code of conduct

We are committed to providing a friendly, safe and welcoming environment for all. Please read and
respect the [PyLaia Code of Conduct](https://gitlab.teklia.com/atr/pylaia/-/blob/master/CODE_OF_CONDUCT.md).

## Acknowledgments

Work in this toolkit was financially supported by the [Pattern Recognition and Human Language Technology (PRHLT) Research Center](https://www.prhlt.upv.es/).

## Citation

* Article describing the latest contributions to PyLaia

```bib
@inproceedings{pylaia2024,
    author = "Tarride, Solène and Schneider, Yoann and Generali, Marie and Boillet, Melodie and Abadie, Bastien and Kermorvant, Christopher",
    title = "Improving Automatic Text Recognition with Language Models in the PyLaia Open-Source Library",
    booktitle = "Submitted at ICDAR",
    year = "2024"
}
```

* Original article

```bib
@inproceedings{laia2017,
  author={Puigcerver, Joan},
  booktitle={2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)},
  title={Are Multidimensional Recurrent Layers Really Necessary for Handwritten Text Recognition?},
  year={2017},
  volume={01},
  number={},
  pages={67-72},
  doi={10.1109/ICDAR.2017.20}}
```

* GitLab repository

```bib
@software{pylaia-teklia,
  author = {Teklia},
  title = {PyLaia},
  year = {2022},
  url = {https://gitlab.teklia.com/atr/pylaia/},
  version = {1.1.0},
  note = {commit SHA}
}
```

* GitHub repository

```bib
@misc{puigcerver2018pylaia,
  author = {Joan Puigcerver and Carlos Mocholí},
  title = {PyLaia},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/jpuigcerver/PyLaia/}},
  commit = {commit SHA}
}
```

## Contact

🆘 Have a question about PyLaia? Please contact us on [support.teklia.com](https://support.teklia.com/c/machine-learning/pylaia/13).

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "pylaia",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.11,>=3.9",
    "maintainer_email": "Teklia <contact@teklia.com>",
    "keywords": "HTR OCR python",
    "author": null,
    "author_email": "Joan Puigcerver <joapuipe@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/51/b7/47f07f442c9bd651cef7d36b382cd26856e567e1ab724cd2fe53e5ae0e40/pylaia-1.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": null,
    "version": "1.1.2",
    "project_urls": {
        "Documentation": "https://atr.pages.teklia.com/pylaia/",
        "Downloads": "https://gitlab.teklia.com/atr/pylaia",
        "Homepage": "https://atr.pages.teklia.com/pylaia/",
        "Source": "https://gitlab.teklia.com/atr/pylaia/",
        "Tracker": "https://gitlab.teklia.com/atr/pylaia/issues/"
    },
    "split_keywords": [
        "htr",
        "ocr",
        "python"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "36d24d2ecc9677eddef508ef4a1671fbcfd1459f904600e8be0edac712d2f7f2",
                "md5": "da3912e93189f62c50551e3e6447ce55",
                "sha256": "8b894d3a84a8636b58cbd8b5a1e08024b77f3de42fc2a81e772e4710c734399b"
            },
            "downloads": -1,
            "filename": "pylaia-1.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "da3912e93189f62c50551e3e6447ce55",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.11,>=3.9",
            "size": 93416,
            "upload_time": "2024-10-16T15:51:02",
            "upload_time_iso_8601": "2024-10-16T15:51:02.555856Z",
            "url": "https://files.pythonhosted.org/packages/36/d2/4d2ecc9677eddef508ef4a1671fbcfd1459f904600e8be0edac712d2f7f2/pylaia-1.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "51b747f07f442c9bd651cef7d36b382cd26856e567e1ab724cd2fe53e5ae0e40",
                "md5": "ac8ff69819deb3f05db095397cd619f3",
                "sha256": "873fabd7f382ce7d267cc8af2e0f1848ab6f3382eb2c7e5a0fbfa8c0f576b95a"
            },
            "downloads": -1,
            "filename": "pylaia-1.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "ac8ff69819deb3f05db095397cd619f3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.11,>=3.9",
            "size": 66849,
            "upload_time": "2024-10-16T15:51:04",
            "upload_time_iso_8601": "2024-10-16T15:51:04.163072Z",
            "url": "https://files.pythonhosted.org/packages/51/b7/47f07f442c9bd651cef7d36b382cd26856e567e1ab724cd2fe53e5ae0e40/pylaia-1.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-16 15:51:04",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "pylaia"
}
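
The `sha256` values under `digests` can be used to check a downloaded file before installing it. The following is a minimal sketch using only the Python standard library. The helper names `sha256_of_file` and `matches_digest` are hypothetical, and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA-256 digest with the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution listed above, this would be called as `matches_digest("pylaia-1.1.2.tar.gz", "873fabd7f382ce7d267cc8af2e0f1848ab6f3382eb2c7e5a0fbfa8c0f576b95a")`, assuming the archive was saved under that name in the current directory.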
        