# Wolaita_POST
## Overview
Wolaita_POST is a Python package for Part-of-Speech (POS) tagging of Wolaita text. It uses deep learning sequence models, including bidirectional GRU (Bi-GRU) and bidirectional LSTM (Bi-LSTM) networks and other architectures, combined with FastText word embeddings. It is designed to work with pretrained models, so text can be tagged without training a model from scratch. The package is aimed at researchers and developers working on Natural Language Processing (NLP) for lesser-resourced languages who need a ready-made tool for analyzing Wolaita text.
## Features
- Accurate POS Tagging: Utilizes deep learning models (Bi-GRU, Bi-LSTM, etc.) to achieve precise Part-of-Speech tagging for Wolaita language text.
- Pretrained Models: Ready-to-use pretrained models for quick deployment and high accuracy.
- FastText Embeddings: Incorporates FastText word embeddings to capture subword information and improve performance on low-resource languages.
- Easy Integration: Simple API that allows researchers and developers to integrate POS tagging into their NLP pipelines.
- Supports Wolaita Language: Specifically designed for the Wolaita language, addressing the challenges of processing lesser-resourced languages.
- Customizable: Flexible configuration to accommodate different models, tokenizers, and word vectors based on project requirements.
- Efficient Deployment: The tagger's output can feed downstream NLP applications, such as machine translation and named entity recognition (NER).
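The FastText feature above relies on subword information: FastText represents a word not only by its own vector but also by the character n-grams it contains, which helps with rare and morphologically complex words. The sketch below shows how those n-grams are formed (boundary markers `<` and `>`, lengths 3 to 6 by default in FastText). It is an illustration of the technique only, not code from Wolaita_POST; the function name `char_ngrams` is hypothetical, and it works on Unicode characters rather than reproducing FastText's internal hashing.

```python
from typing import List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Return FastText-style character n-grams of `word` (illustrative only).

    The word is wrapped in '<' and '>' boundary markers, as FastText does,
    so that prefixes and suffixes produce distinct n-grams.
    """
    bracketed = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        # Slide a window of width n across the bracketed word.
        for start in range(len(bracketed) - n + 1):
            grams.append(bracketed[start:start + n])
    return grams


if __name__ == "__main__":
    # Trigrams only, to keep the output short.
    print(char_ngrams("where", minn=3, maxn=3))
    # ['<wh', 'whe', 'her', 'ere', 're>']
```

A word that never appeared in training still shares many of these n-grams with known words, which is why subword embeddings tend to help on low-resource languages.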
## Installation
Install Wolaita_POST from PyPI with pip (in a Google Colab or Jupyter notebook cell, prefix the command with `!`):

```bash
pip install Wolaita_POST
```
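To confirm that pip installed the package into the environment your Python session is actually using, you can query its installed version. This is a minimal sketch using the standard library's `importlib.metadata`, which needs Python 3.8 or newer (the package itself declares `>=3.6`). The distribution name `Wolaita-POST` is taken from the package metadata; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
from typing import Optional


def installed_version(dist_name: str = "Wolaita-POST") -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version())
```

If this prints `None`, the package is not installed in the active environment.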
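## Usage
After installation, use Wolaita_POST as follows.

1. Import the package:

   ```python
   from Wolaita_POST import pos_tagger
   ```

2. Set the file paths for your pretrained model, word vectors, and tokenizers. The paths below point to a Google Drive folder mounted in Google Colab; replace them with the locations of your own files:

   ```python
   model_path = "/content/drive/MyDrive/POS/Bi_GRU_model.h5"  # Adjust if your model file has a different extension
   word_vector_path = "/content/drive/MyDrive/POS/fasttext_compatible.bin"
   word_tokenizer_path = "/content/drive/MyDrive/POS/wolaita_tokenizerX.pkl"
   tag_tokenizer_path = "/content/drive/MyDrive/POS/wolaita_tag_tokenizerY.pkl"
   ```

3. Initialize the POS tagger. `WolaitaPOSTagger` is accessed through the `pos_tagger` module imported in step 1 (assuming that module defines the class), and the instance is named `tagger` so that it does not shadow the module:

   ```python
   # Assumes WolaitaPOSTagger is defined in the pos_tagger module imported above.
   tagger = pos_tagger.WolaitaPOSTagger(
       model_path=model_path,
       word_vector_path=word_vector_path,
       word_tokenizer_path=word_tokenizer_path,
       tag_tokenizer_path=tag_tokenizer_path,
   )
   ```

4. Tag Wolaita text. As in this example, the text is passed as a list of strings:

   ```python
   text = ["Insert your sample text here"]
   tagged_text = tagger.tag(text)
   print(tagged_text)
   ```

`tagged_text` contains the part-of-speech tags for the given Wolaita text.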
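The README does not document how the four files are used internally, but a typical neural tagging pipeline combines them like this: the word tokenizer maps words to integer ids, the model (with its FastText vectors) produces a probability distribution over tags for each position, and the tag tokenizer maps the most probable tag id back to a label. The sketch below illustrates only the encoding and decoding ends of such a pipeline in plain Python. It assumes the pickled tokenizers behave like Keras' `Tokenizer` (a `word_index` dict with index 0 reserved for padding and the OOV token at index 1, plus an `index_word`-style reverse mapping for tags), which is a guess based on the `.pkl`/`.h5` file types. `encode`, `decode`, the placeholder words, the tag labels, and the fake probabilities are all hypothetical and not part of the Wolaita_POST API.

```python
from typing import Dict, List, Sequence

PAD_ID = 0  # assumed: index 0 reserved for padding (Keras Tokenizer convention)
OOV_ID = 1  # assumed: out-of-vocabulary token at index 1


def encode(tokens: Sequence[str], word_index: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to integer ids, then truncate or right-pad to max_len."""
    ids = [word_index.get(tok, OOV_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def decode(probs: Sequence[Sequence[float]], index_tag: Dict[int, str], n_tokens: int) -> List[str]:
    """Pick the highest-probability tag id for each real (non-padding) token."""
    tags = []
    for row in probs[:n_tokens]:
        best = max(range(len(row)), key=row.__getitem__)  # argmax over tag ids
        tags.append(index_tag.get(best, "UNK"))  # fall back if argmax hits padding
    return tags


if __name__ == "__main__":
    word_index = {"<OOV>": 1, "word_a": 2, "word_b": 3}  # hypothetical vocabulary
    index_tag = {1: "TAG_X", 2: "TAG_Y"}  # hypothetical tag labels
    tokens = ["word_a", "unseen", "word_b"]
    print(encode(tokens, word_index, max_len=5))  # [2, 1, 3, 0, 0]
    # Stand-in for model output: one probability row per padded position.
    fake_probs = [
        [0.0, 0.9, 0.1],
        [0.0, 0.2, 0.8],
        [0.0, 0.6, 0.4],
        [1.0, 0.0, 0.0],
        [1.0, 0.0, 0.0],
    ]
    print(list(zip(tokens, decode(fake_probs, index_tag, len(tokens)))))
    # [('word_a', 'TAG_X'), ('unseen', 'TAG_Y'), ('word_b', 'TAG_X')]
```

Only the first `n_tokens` rows are decoded, so predictions for padding positions are discarded rather than reported as tags.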
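## Running Tests
To verify that the package works, run its test suite with pytest. In a Google Colab or Jupyter notebook cell, run the following command; the leading `!` passes it to the shell, and the output is saved to `test_report.txt`:

```
!pytest /content/drive/MyDrive/Wolaita_POST/tests > test_report.txt
```

In a regular terminal, drop the leading `!` and replace the path with the location of the `tests` directory in your copy of the project.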
## License
This project is licensed under the MIT License. See the LICENSE file for more details.

## Contributing
Contributions are welcome! If you have suggestions for improving the package or find any issues, feel free to open a pull request or submit an issue on GitHub.

## Acknowledgements
Special thanks to the developers and researchers who contributed to this project, making it possible to expand NLP resources for the Wolaita language.
## Raw data
{
"_id": null,
"home_page": "https://github.com/Sisagegn/Wolaita_POST",
"name": "Wolaita-POST",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": "Wolaita POS tagging NLP deep learning",
"author": "Sisagegn Samuel",
"author_email": "samuelsisagegn@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/bc/0b/7c284a88126d41dcd9fe1ac9661036ddd9ea8f934ca7e8c122836086d8be/wolaita_post-0.1.0.tar.gz",
"platform": null,
"description": "# Wolaita_POST\n## Overview\nWolaita_POST is a Python package designed for accurate Part-of-Speech (POS) tagging in the Wolaita language. It employs deep learning models, including Bi-GRU, Bi-LSTM, and others, and integrates FastText embeddings for enhanced performance. The package utilizes pretrained models to simplify deployment and improve tagging accuracy. Wolaita_POST is an essential tool for researchers and developers focused on Natural Language Processing (NLP) for lesser-resourced languages, providing a robust solution for Wolaita language text analysis.\n\n## Features\n- Accurate POS Tagging: Utilizes deep learning models (Bi-GRU, Bi-LSTM, etc.) to achieve precise Part-of-Speech tagging for Wolaita language text.\n- Pretrained Models: Ready-to-use pretrained models for quick deployment and high accuracy.\n- FastText Embeddings: Incorporates FastText word embeddings to capture subword information and improve performance on low-resource languages.\n- Easy Integration: Simple API that allows researchers and developers to integrate POS tagging into their NLP pipelines.\n- Supports Wolaita Language: Specifically designed for the Wolaita language, addressing the challenges of processing lesser-resourced languages.\n- Customizable: Flexible configuration to accommodate different models, tokenizers, and word vectors based on project requirements.\n- Efficient Deployment: Enables easy deployment for various NLP applications, such as machine translation and named entity recognition (NER).\n\n## Installation\nTo install Wolaita_POST, you can use pip:\n- pip install Wolaita_POST\n\n##Usage\n\nAfter installation, you can use Wolaita_POST as follows:\n1. Import the package:\nfrom Wolaita_POST import pos_tagger\n2. 
Set file paths for your pretrained model, word vectors, and tokenizers:\nmodel_path = \"/content/drive/MyDrive/POS/Bi_GRU_model.h5\" # Adjust if your model file has a different extension\nword_vector_path = \"/content/drive/MyDrive/POS/fasttext_compatible.bin\"\nword_tokenizer_path = \"/content/drive/MyDrive/POS/wolaita_tokenizerX.pkl\"\ntag_tokenizer_path = \"/content/drive/MyDrive/POS/wolaita_tag_tokenizerY.pkl\"\n3. Initialize the POS tagger:\npos_tagger = WolaitaPOSTagger(\n model_path=model_path,\n word_vector_path=word_vector_path,\n word_tokenizer_path=word_tokenizer_path,\n tag_tokenizer_path=tag_tokenizer_path\n)\n4. Use the POS tagger to tag Wolaita text:\ntext = ['Insert your sample text here']\ntagged_text = pos_tagger.tag(text)\nprint(tagged_text)\n\nThe tagged_text will contain the part-of-speech tags for the given Wolaita text.\n\n##Running Tests\nIf you want to verify functionality, you can use pytest. Run this command in your project directory:\n- !pytest /content/drive/MyDrive/Wolaita_POST/tests > test_report.txt\n\n##License\nThis project is licensed under the MIT License. See the LICENSE file for more details.\n\n##Contributing\nContributions are welcome! If you have suggestions for improving the package or find any issues, feel free to open a pull request or submit an issue on GitHub.\n\n##Acknowledgements\nSpecial thanks to the developers and researchers who contributed to this project, making it possible to expand NLP resources for the Wolaita language.\n\n",
"bugtrack_url": null,
"license": null,
"summary": "A POS tagger for the Wolaita language using deep learning",
"version": "0.1.0",
"project_urls": {
"Documentation": "https://github.com/Sisagegn/Wolaita_POST/wiki",
"Homepage": "https://github.com/Sisagegn/Wolaita_POST",
"Source": "https://github.com/Sisagegn/Wolaita_POST",
"Tracker": "https://github.com/Sisagegn/Wolaita_POST/issues"
},
"split_keywords": [
"wolaita",
"pos",
"tagging",
"nlp",
"deep",
"learning"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b8c0e4a25cc3cf7d4c927d77ecae9ab0a1d478a96d9b5455da03ce9d2742648c",
"md5": "235206866656df3d04427bd365cf661d",
"sha256": "8d4086d1499c1dcf90443fecd6e61711d01cc342f6ac99cffa656f8c8df5d495"
},
"downloads": -1,
"filename": "Wolaita_POST-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "235206866656df3d04427bd365cf661d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 4197,
"upload_time": "2024-11-07T20:36:18",
"upload_time_iso_8601": "2024-11-07T20:36:18.976245Z",
"url": "https://files.pythonhosted.org/packages/b8/c0/e4a25cc3cf7d4c927d77ecae9ab0a1d478a96d9b5455da03ce9d2742648c/Wolaita_POST-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "bc0b7c284a88126d41dcd9fe1ac9661036ddd9ea8f934ca7e8c122836086d8be",
"md5": "840e629bfb36ad53e95b252029d29ded",
"sha256": "755c48b3ecdea28b86c9a4e206af6a81f1b62997a477ec08c43e34005592b529"
},
"downloads": -1,
"filename": "wolaita_post-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "840e629bfb36ad53e95b252029d29ded",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 5126,
"upload_time": "2024-11-07T20:36:20",
"upload_time_iso_8601": "2024-11-07T20:36:20.611486Z",
"url": "https://files.pythonhosted.org/packages/bc/0b/7c284a88126d41dcd9fe1ac9661036ddd9ea8f934ca7e8c122836086d8be/wolaita_post-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-07 20:36:20",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Sisagegn",
"github_project": "Wolaita_POST",
"github_not_found": true,
"lcname": "wolaita-post"
}