# Aamraz - Kurdish NLP collection
## Overview
Aamraz, written "ئامراز" in Kurdish script, means "instrument". This project is a collection of Natural Language Processing (NLP) tools for the Kurdish language.
Despite being spoken by millions, Kurdish remains an under-resourced language in Natural Language Processing (NLP).
Recognizing the rich cultural heritage and historical significance of the Kurdish people, we—regardless of ethnicity—are committed to advancing tools and pre-trained models that empower the Kurdish language in modern research and technology.
Our work aims to foster further development and provide a foundation for future research and applications in NLP. See the [GitHub repository](https://github.com/MohammadDevelop/Aamraz).
## Installation
    pip install aamraz
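
To confirm that the install succeeded, one option is to query the installed version from Python. This is a minimal sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical and is not part of aamraz.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent.

    Hypothetical helper for illustration; not part of aamraz.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if aamraz is installed, otherwise None.
print(installed_version("aamraz"))
```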
## Base Features
- **Normalization**
- **Tokenization**
- **Stemming**
- **Word Embedding:** Creates vector representations of words.
- **Sentence Embedding:** Creates vector representations of sentences.
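
To make the idea of a sentence vector concrete, one common way to build it is to average the vectors of the words in the sentence. The sketch below shows that averaging step with made-up three-dimensional vectors. It is an assumption for illustration only: this README does not state how aamraz computes sentence embeddings, and `mean_pool` is a hypothetical helper, not part of the library.

```python
def mean_pool(word_vectors):
    """Average equal-length word vectors into one sentence vector.

    Hypothetical helper for illustration; not part of aamraz.
    """
    if not word_vectors:
        raise ValueError("need at least one word vector")
    dim = len(word_vectors[0])
    if any(len(v) != dim for v in word_vectors):
        raise ValueError("all word vectors must have the same dimension")
    count = len(word_vectors)
    return [sum(v[i] for v in word_vectors) / count for i in range(dim)]


# Toy vectors standing in for real embeddings (values are made up).
toy_vectors = {
    "w1": [1.0, 0.0, 2.0],
    "w2": [3.0, 4.0, 0.0],
}
sentence_vector = mean_pool([toy_vectors["w1"], toy_vectors["w2"]])
print(sentence_vector)  # [2.0, 2.0, 1.0]
```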
## Usage
```python
import aamraz

# Normalization
normalizer = aamraz.Normalizer()
sample_sentence="قڵبەکەم بە کوردی قسە دەکات."
normalized_sentence = normalizer.normalize(sample_sentence)
print(normalized_sentence)

# Tokenization
tokenizer = aamraz.WordTokenizer()
sample_sentence = "زوانی له دربره"
tokens = tokenizer.tokenize(sample_sentence)
print(tokens)

# Embedding with fastText (model_path points to a local pre-trained model file)
model_path = 'kurdish_fasttext_skipgram_dim300_v1.bin'
embedding_model = aamraz.EmbeddingModel(model_path, dim=50)

sample_word = "ئامراز"
sample_sentence = "زوانی له دربره"

word_vector = embedding_model.word_embedding(sample_word)
sentence_vector = embedding_model.sentence_embedding(sample_sentence)

print(word_vector)
print(sentence_vector)

# Embedding with word2vec (model_path points to a local pre-trained model file)
model_path = 'kurdish_word2vec_model_dim100_v1.bin'
embedding_model = aamraz.EmbeddingModel(model_path, type='word2vec')

sample_word = "ئامراز"
sample_sentence = "زوانی له دربره"

word_vector = embedding_model.word_embedding(sample_word)
sentence_vector = embedding_model.sentence_embedding(sample_sentence)

print(word_vector)
print(sentence_vector)

# Stemming
stemmer = aamraz.Stemmer(method='simple')
stemmed = stemmer.stem("کتێبەکانمان")
print(stemmed)
```
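
Once word or sentence vectors are available, a typical next step is to compare them. The following is a minimal sketch of cosine similarity in plain Python. It assumes the vectors are sequences of numbers of equal length; the return type of `word_embedding` and `sentence_embedding` is not documented here, and `cosine_similarity` is a hypothetical helper, not part of aamraz.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors.

    Hypothetical helper for illustration; not part of aamraz.
    """
    a = [float(x) for x in a]
    b = [float(x) for x in b]
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors (made-up values): orthogonal, then parallel.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))  # 1.0
```

Values close to 1 indicate vectors pointing in nearly the same direction; values near 0 indicate little similarity.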
## Raw data
{
"_id": null,
"home_page": "https://github.com/MohammadDevelop/Aamraz",
"name": "aamraz",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": null,
"author": "Mohammad Mahmoodi Varnamkhasti",
"author_email": "research@amzmohammad.com",
"download_url": "https://files.pythonhosted.org/packages/80/3a/e215f33c52e27b0223400bbd6c9c0466de944566b9ec22801b9e234aa6e1/aamraz-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "This project is a collection of Natural Language Processing tools for Kurdish Language.",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/MohammadDevelop/Aamraz"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6afe4003eec358ec3754bf161ee4774d18f2560346d5ad300d67d129155289ec",
"md5": "4077e8efbc989d24986b4d4b53066aba",
"sha256": "e261f08030a0a64391fccdd140fc2ba5321a183b7d3bf7f1aea4a843a1d46046"
},
"downloads": -1,
"filename": "aamraz-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4077e8efbc989d24986b4d4b53066aba",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 10714,
"upload_time": "2024-10-11T20:17:30",
"upload_time_iso_8601": "2024-10-11T20:17:30.207988Z",
"url": "https://files.pythonhosted.org/packages/6a/fe/4003eec358ec3754bf161ee4774d18f2560346d5ad300d67d129155289ec/aamraz-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "803ae215f33c52e27b0223400bbd6c9c0466de944566b9ec22801b9e234aa6e1",
"md5": "01389600a3d83739f531dc985e3e5acf",
"sha256": "2fa62ce66711a0436d39fa2a07140dc2b9a92eb3e939157f2f54849a2705154d"
},
"downloads": -1,
"filename": "aamraz-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "01389600a3d83739f531dc985e3e5acf",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 10100,
"upload_time": "2024-10-11T20:17:34",
"upload_time_iso_8601": "2024-10-11T20:17:34.427935Z",
"url": "https://files.pythonhosted.org/packages/80/3a/e215f33c52e27b0223400bbd6c9c0466de944566b9ec22801b9e234aa6e1/aamraz-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-11 20:17:34",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "MohammadDevelop",
"github_project": "Aamraz",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "aamraz"
}