pylelemmatize


Name: pylelemmatize
Version: 0.1.0
Home page: https://github.com/anguelos/pylelemmatize
Summary: A set of utilities for handling alphabets of corpora and OCR/HTR datasets
Upload time: 2025-11-02 17:56:21
Author: Anguelos Nicolaou
Requires Python: >=3.6
Requirements: numpy, unidecode, fargv, matplotlib, scipy, tqdm, networkx, torch, lxml, pytest, pytest-cov, sphinx, sphinx-autodoc-typehints, myst-parser, sphinx-rtd-theme, sphinx-copybutton, sphinxcontrib-mermaid, linkify-it-py, sphinx_rtd_theme

# PyLeLemmatize

[![PyPI](https://img.shields.io/pypi/v/pylelemmatize.svg)](https://pypi.org/project/pylelemmatize/)
[![Python](https://img.shields.io/pypi/pyversions/pylelemmatize.svg)](https://pypi.org/project/pylelemmatize/)
[![Build](https://github.com/anguelos/pylelemmatize/actions/workflows/tests.yml/badge.svg)](https://github.com/anguelos/pylelemmatize/actions/workflows/tests.yml)
[![Docs](https://readthedocs.org/projects/pylelemmatize/badge/?version=latest)](https://pylelemmatize.readthedocs.io/en/latest/)
[![License](https://img.shields.io/github/license/anguelos/pylelemmatize.svg)](https://github.com/anguelos/pylelemmatize/blob/main/LICENSE)

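A fast, modular lemmatization toolkit for Python.

PyLeLemmatize is a Python package for character-level lemmatization of text. It provides a simple and efficient way to reduce the characters of a large alphabet to simpler ones, and includes utilities for handling the alphabets of corpora and OCR/HTR datasets.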

## Installation

### Install from GitHub with pip

To install PyLeLemmatize directly from GitHub using pip, run the following command:

```sh
pip install git+https://github.com/anguelos/pylelemmatize.git
```

### Install from source

To install PyLeLemmatize from the source code, follow these steps:

1. Clone the repository.
2. Navigate to the project directory.
3. Install the package.

```sh
git clone https://github.com/anguelos/pylelemmatize.git
cd pylelemmatize
pip install -e ./
# If you don't want a development (editable) install, run: pip install ./
```
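
After installing, one quick way to confirm that the command-line tools named in the Usage section ended up on your `PATH` is a small standard-library check. This is only a sketch: the helper `missing_commands` is hypothetical and not part of the package, and the command names are the ones used in this README.

```python
import shutil

# Command names taken from the Usage section of this README.
COMMANDS = [
    "ll_evaluate_merges",
    "ll_extract_corpus_alphabet",
    "ll_test_corpus_on_alphabets",
]


def missing_commands(commands):
    """Return the commands that cannot be found on the current PATH."""
    return [name for name in commands if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_commands(COMMANDS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All command-line tools were found.")
```

If a command is reported missing, the usual cause is that the install went into a different Python environment than the one currently active.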

## Usage

### Command Line Invocation

#### Evaluate Merges

```sh
ll_evaluate_merges -h # get help string with the cli interface
ll_evaluate_merges -corpus_glob  './sample_data/wienocist_charter_1/wienocist_charter_1*'
```

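Note that the merge CER (character error rate) is not symmetric: swapping the two characters in each merge pair can change the result considerably.

```sh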
# The following gives a CER of 0.0591
ll_evaluate_merges -corpus_glob  './sample_data/wienocist_charter_1/wienocist_charter_1*' -merges '[("I", "J"), ("i", "j")]'
# While the following gives a CER of 0.0007
ll_evaluate_merges -corpus_glob  './sample_data/wienocist_charter_1/wienocist_charter_1*' -merges '[("J", "I"), ("j", "i")]'
```
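
The asymmetry can be illustrated with a toy calculation. The sketch below is not the package's implementation: it assumes each merge pair means (source, target), that a merge replaces every source character with the target, and that the CER is the fraction of characters changed. The helpers `apply_merges` and `merge_cer` are hypothetical names.

```python
def apply_merges(text, merges):
    """Replace each source character with its target (assumed pair order)."""
    table = str.maketrans({src: dst for src, dst in merges})
    return text.translate(table)


def merge_cer(text, merges):
    """Fraction of characters altered by the merges (substitutions only)."""
    if not text:
        return 0.0
    merged = apply_merges(text, merges)
    changed = sum(a != b for a, b in zip(text, merged))
    return changed / len(text)


sample = "Iohannis iudex in Jerusalem"
cer_i_to_j = merge_cer(sample, [("I", "J"), ("i", "j")])
cer_j_to_i = merge_cer(sample, [("J", "I"), ("j", "i")])
print(f"I->J: {cer_i_to_j:.4f}  J->I: {cer_j_to_i:.4f}")
```

Under these assumptions, merging a frequent character into a rare one alters many more characters than the reverse, which is consistent with the gap between the two CER values reported above.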

#### Extract corpus alphabet
```sh
ll_extract_corpus_alphabet -h # get help string with the cli interface
ll_extract_corpus_alphabet -corpus_glob './sample_data/wienocist_charter_1/wienocist_charter_1*'
```
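
Conceptually, extracting a corpus alphabet amounts to collecting every distinct character that occurs in the matched files. The sketch below shows that idea with the standard library only. It is an assumption about the general technique, not the tool's actual code: the function names are hypothetical, and it treats every matched path as a UTF-8 plain-text file, which may not hold for the sample data.

```python
import glob
from collections import Counter


def count_characters(texts):
    """Count how often each character occurs across an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(text)
    return counts


def read_corpus(corpus_glob, encoding="utf-8"):
    """Yield the contents of every file matching the glob pattern."""
    for path in sorted(glob.glob(corpus_glob)):
        with open(path, encoding=encoding) as handle:
            yield handle.read()


def corpus_alphabet(corpus_glob):
    """Return the sorted alphabet as a string, plus per-character counts."""
    counts = count_characters(read_corpus(corpus_glob))
    return "".join(sorted(counts)), counts
```

Keeping the counts alongside the alphabet makes it easy to spot rare characters, which are the natural candidates for merging.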

#### Test corpus on alphabets
```sh
ll_test_corpus_on_alphabets -h # get help string with the cli interface
ll_test_corpus_on_alphabets -corpus_glob './sample_data/wienocist_charter_1/wienocist_charter_1*' -alphabets 'bmp_mufi,ascii,mes1,iso8859_2' -verbose
```
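
One way to think about testing a corpus against an alphabet is to measure what fraction of its characters the alphabet can represent. The sketch below does this for encodings that ship with Python (`ascii` and `iso8859_2` from the command above); `bmp_mufi` and `mes1` are not Python codecs and are not covered. The function `encodable_fraction` is a hypothetical helper, and the tool itself may compute something different.

```python
def encodable_fraction(text, codec):
    """Fraction of characters in text that the given codec can encode."""
    if not text:
        return 1.0
    encodable = 0
    for char in text:
        try:
            char.encode(codec)
            encodable += 1
        except UnicodeEncodeError:
            pass  # this character falls outside the codec's alphabet
    return encodable / len(text)


if __name__ == "__main__":
    sample = "łódź"
    for codec in ("ascii", "iso8859_2"):
        print(codec, encodable_fraction(sample, codec))
```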

## Contributing

Contributions are welcome!

## License

This project is licensed under the MIT License.

            
