# PyLeLemmatize
[PyPI](https://pypi.org/project/pylelemmatize/) | [Tests](https://github.com/anguelos/pylelemmatize/actions/workflows/tests.yml) | [Documentation](https://pylelemmatize.readthedocs.io/en/latest/) | [License](https://github.com/anguelos/pylelemmatize/blob/main/LICENSE)
A fast, modular lemmatization toolkit for Python.
PyLeLemmatize is a Python package for lemmatizing text at the character level. It provides a simple and efficient way to reduce large character sets (alphabets) to simpler ones.
## Installation
### Install from GitHub with pip
To install PyLeLemmatize directly from GitHub using pip, run the following command:
```sh
pip install git+https://github.com/anguelos/pylelemmatize.git
```
### Install from GitHub from source
To install PyLeLemmatize from the source code, follow these steps:
1. Clone the repository.
2. Navigate to the project directory.
3. Install the package.
```sh
git clone https://github.com/anguelos/pylelemmatize.git
cd pylelemmatize
pip install -e ./
# If you don't want a development (editable) install, run: pip install ./
```
## Usage
### Command Line Invocation
#### Evaluate Merges
```sh
ll_evaluate_merges -h # print the help text for the CLI
ll_evaluate_merges -corpus_glob './sample_data/wienocist_charter_1/wienocist_charter_1*'
```
Note that the merge CER is not symmetric: swapping the two characters of each merge pair can give a very different result.
```sh
# The following gives a CER of 0.0591
ll_evaluate_merges -corpus_glob './sample_data/wienocist_charter_1/wienocist_charter_1*' -merges '[("I", "J"), ("i", "j")]'
# While the following gives a CER of 0.0007
ll_evaluate_merges -corpus_glob './sample_data/wienocist_charter_1/wienocist_charter_1*' -merges '[("J", "I"), ("j", "i")]'
```
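The asymmetry can be illustrated with a minimal Python sketch. It is not the package's implementation: it assumes that each merge is a `(source, target)` pair that replaces every `source` character with `target`, and that the CER is the fraction of characters changed by the merges. The function names `apply_merges` and `substitution_cer` are hypothetical.

```python
from typing import Iterable, Tuple


def apply_merges(text: str, merges: Iterable[Tuple[str, str]]) -> str:
    """Replace each source character with its target (assumed merge semantics)."""
    table = str.maketrans({source: target for source, target in merges})
    return text.translate(table)


def substitution_cer(text: str, merges: Iterable[Tuple[str, str]]) -> float:
    """Fraction of characters that the merges change (illustrative CER only)."""
    if not text:
        return 0.0
    merged = apply_merges(text, merges)
    # Merges are one-to-one substitutions, so both strings have equal length
    # and the edit distance is simply the number of differing positions.
    changed = sum(a != b for a, b in zip(text, merged))
    return changed / len(text)


sample = "in civitate jurisdictio"
forward = substitution_cer(sample, [("I", "J"), ("i", "j")])
backward = substitution_cer(sample, [("J", "I"), ("j", "i")])
print(f"i -> j: {forward:.4f}   j -> i: {backward:.4f}")
```

Under these assumptions, replacing a frequent character such as `i` touches many more positions than replacing a rare one such as `j`, which would explain why the two invocations above report such different values.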
#### Extract corpus alphabet
```sh
ll_extract_corpus_alphabet -h # print the help text for the CLI
ll_extract_corpus_alphabet -corpus_glob './sample_data/wienocist_charter_1/wienocist_charter_1*'
```
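As a rough illustration of what extracting a corpus alphabet involves, the following sketch counts every character found in the files matching a glob pattern. It uses only the Python standard library, not the package's API, and it assumes the matched files are plain UTF-8 text. The function names `alphabet_from_texts` and `alphabet_from_glob` are hypothetical.

```python
import glob
import os
from collections import Counter
from typing import Iterable, Iterator


def alphabet_from_texts(texts: Iterable[str]) -> Counter:
    """Count how often each character occurs across all given texts."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(text)  # a str is an iterable of characters
    return counts


def alphabet_from_glob(pattern: str, encoding: str = "utf-8") -> Counter:
    """Count characters in every regular file matching the glob pattern."""

    def read_all() -> Iterator[str]:
        for path in sorted(glob.glob(pattern)):
            if os.path.isfile(path):  # skip directories matched by the glob
                with open(path, encoding=encoding) as handle:
                    yield handle.read()

    return alphabet_from_texts(read_all())


if __name__ == "__main__":
    counts = alphabet_from_glob(
        "./sample_data/wienocist_charter_1/wienocist_charter_1*"
    )
    # The alphabet is the sorted set of distinct characters.
    print("".join(sorted(counts)))
```

Keeping the counts rather than only the set of characters makes it easy to spot rare characters that may be candidates for merging.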
#### Test corpus on alphabets
```sh
ll_test_corpus_on_alphabets -h # print the help text for the CLI
ll_test_corpus_on_alphabets -corpus_glob './sample_data/wienocist_charter_1/wienocist_charter_1*' -alphabets 'bmp_mufi,ascii,mes1,iso8859_2' -verbose
```
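The idea of testing a corpus against an alphabet can be sketched with Python's built-in codecs. This is not how the package defines its alphabets: of the names used above, only `ascii` and `iso8859_2` are also Python codec names, so the sketch covers just those two, and the function name `encodable_fraction` is hypothetical.

```python
def encodable_fraction(text: str, codec: str) -> float:
    """Fraction of characters in text that the given Python codec can encode."""
    if not text:
        return 1.0
    encodable = 0
    for character in text:
        try:
            character.encode(codec)
            encodable += 1
        except UnicodeEncodeError:
            # The character lies outside the codec's repertoire.
            pass
    return encodable / len(text)


# "\u010d" is c with caron, "\u0161" is s with caron, "\u017f" is long s.
sample = "\u010da\u0161a \u017f"
for codec in ("ascii", "iso8859_2"):
    print(codec, round(encodable_fraction(sample, codec), 3))
```

A fraction below 1.0 means that some characters of the corpus would be lost or would need to be mapped to other characters before the corpus fits that alphabet.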
## Contributing
Contributions are welcome!
## License
This project is licensed under the MIT License.
Raw data
{
"_id": null,
"home_page": "https://github.com/anguelos/pylelemmatize",
"name": "pylelemmatize",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": null,
"author": "Anguelos Nicolaou",
"author_email": "anguelos.nicolaou@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/8a/87/0cd057b049f3613f27972ff864be268ef2a540a039a7e5a44f49735e14d5/pylelemmatize-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A set utilities for hadling alphabets of corpora and OCR/HTR datasets",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/anguelos/pylelemmatize"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "bb674a7435798ed4f6353ed0a67b1ccf789625e98a16d4f8448f0b8444b7637d",
"md5": "2d14062bee18c75bcfdfc3c24afa3393",
"sha256": "f9021e92a0a095404b29230f9ce7dc90ceb86167093e2a63007fc808b1ed5ae3"
},
"downloads": -1,
"filename": "pylelemmatize-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2d14062bee18c75bcfdfc3c24afa3393",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 54428,
"upload_time": "2025-11-02T17:56:19",
"upload_time_iso_8601": "2025-11-02T17:56:19.930424Z",
"url": "https://files.pythonhosted.org/packages/bb/67/4a7435798ed4f6353ed0a67b1ccf789625e98a16d4f8448f0b8444b7637d/pylelemmatize-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "8a870cd057b049f3613f27972ff864be268ef2a540a039a7e5a44f49735e14d5",
"md5": "816a70a7a1cd54c8465cdb149c5ced6d",
"sha256": "2e4cbd7ff7692fc1d0de9b04c1a86b36d23840315e206af17a440c4ee8a66576"
},
"downloads": -1,
"filename": "pylelemmatize-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "816a70a7a1cd54c8465cdb149c5ced6d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 45526,
"upload_time": "2025-11-02T17:56:21",
"upload_time_iso_8601": "2025-11-02T17:56:21.775029Z",
"url": "https://files.pythonhosted.org/packages/8a/87/0cd057b049f3613f27972ff864be268ef2a540a039a7e5a44f49735e14d5/pylelemmatize-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-11-02 17:56:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "anguelos",
"github_project": "pylelemmatize",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "numpy",
"specs": []
},
{
"name": "unidecode",
"specs": []
},
{
"name": "fargv",
"specs": []
},
{
"name": "matplotlib",
"specs": []
},
{
"name": "scipy",
"specs": []
},
{
"name": "tqdm",
"specs": []
},
{
"name": "networkx",
"specs": []
},
{
"name": "torch",
"specs": []
},
{
"name": "lxml",
"specs": []
},
{
"name": "pytest",
"specs": []
},
{
"name": "pytest-cov",
"specs": []
},
{
"name": "sphinx",
"specs": []
},
{
"name": "sphinx-autodoc-typehints",
"specs": []
},
{
"name": "myst-parser",
"specs": []
},
{
"name": "sphinx-rtd-theme",
"specs": []
},
{
"name": "sphinx-copybutton",
"specs": []
},
{
"name": "sphinxcontrib-mermaid",
"specs": []
},
{
"name": "linkify-it-py",
"specs": []
},
{
"name": "sphinx_rtd_theme",
"specs": []
}
],
"lcname": "pylelemmatize"
}