aamraz

Name	aamraz
Version	0.1.0
home_page	https://github.com/MohammadDevelop/Aamraz
Summary	This project is a collection of Natural Language Processing tools for Kurdish Language.
upload_time	2024-10-11 20:17:34
maintainer	None
docs_url	None
author	Mohammad Mahmoodi Varnamkhasti
requires_python	>=3.6
license	None
requirements	No requirements were recorded.

# Aamraz - Kurdish NLP collection

## Overview
Aamraz, written "ئامراز" in Kurdish script, means "instrument". This project is a collection of Natural Language Processing (NLP) tools for the Kurdish language.
Despite being spoken by millions, Kurdish remains an under-resourced language in NLP.
Recognizing the rich cultural heritage and historical significance of the Kurdish people, we are committed, regardless of ethnicity, to advancing tools and pre-trained models that empower the Kurdish language in modern research and technology.
Our work aims to foster further development and provide a foundation for future research and applications in NLP. See the [GitHub repository](https://github.com/MohammadDevelop/Aamraz).


## Installation
    pip install aamraz

## Base Features
- **Normalization**
- **Tokenization**
- **Stemming**
- **Word Embedding:** Creates vector representations of words.
- **Sentence Embedding:** Creates vector representations of sentences.
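
To make the last two features concrete: a sentence vector is often built from word vectors by averaging them (mean pooling). The sketch below illustrates that general idea only. This README does not state how aamraz computes sentence embeddings, so `mean_pool` and the toy vectors are hypothetical and not part of the library.

```python
from typing import Dict, List


def mean_pool(tokens: List[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the tokens that have an entry in `vectors`.

    Illustrative baseline only; not taken from the aamraz source code.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("none of the tokens has a vector")
    dim = len(known[0])
    # Component-wise mean over all known token vectors.
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy 2-dimensional vectors, invented for illustration.
toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0]}

# Tokens without a vector ("unknown") are skipped.
sentence_vector = mean_pool(["a", "b", "unknown"], toy_vectors)
print(sentence_vector)  # [0.5, 0.5]
```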

## Usage
```python
import aamraz

# Normalization: clean up a raw sentence before further processing
normalizer = aamraz.Normalizer()
sample_sentence="قڵبە‌کە‌م‌ بە‌  کوردی‌  قسە‌ دە‌کات‌."
normalized_sentence = normalizer.normalize(sample_sentence)
print(normalized_sentence)

# Tokenization: split a sentence into word tokens
tokenizer = aamraz.WordTokenizer()
sample_sentence = "زوانی له دربره"
tokens = tokenizer.tokenize(sample_sentence)
print(tokens)

# Embedding with fastText
# model_path is the path to a pre-trained model file.
# Note: the file name suggests 300-dimensional vectors while dim=50 is
# requested here; see the repository for how `dim` is applied.
model_path = 'kurdish_fasttext_skipgram_dim300_v1.bin'
embedding_model = aamraz.EmbeddingModel(model_path, dim=50)

sample_word = "ئامراز"
sample_sentence = "زوانی له دربره"

word_vector = embedding_model.word_embedding(sample_word)
sentence_vector = embedding_model.sentence_embedding(sample_sentence)

print(word_vector)
print(sentence_vector)

# Embedding with word2vec
model_path = 'kurdish_word2vec_model_dim100_v1.bin'
embedding_model = aamraz.EmbeddingModel(model_path, type='word2vec')

sample_word = "ئامراز"
sample_sentence = "زوانی له دربره"

word_vector = embedding_model.word_embedding(sample_word)
sentence_vector = embedding_model.sentence_embedding(sample_sentence)

print(word_vector)
print(sentence_vector)

# Stemming
stemmer = aamraz.Stemmer(method='simple')
stemmed = stemmer.stem("کتێبەکانمان")
print(stemmed)
```
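
Once word or sentence vectors are available, a typical next step is to compare them. The following is a minimal sketch of cosine similarity in plain Python. It assumes the vectors returned by `word_embedding` and `sentence_embedding` are sequences of floats (for example lists or NumPy arrays), which this README does not specify. `cosine_similarity` is a hypothetical helper, not part of aamraz, and the stand-in vectors replace real model output so that the snippet runs without a model file.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in vectors; in practice these would come from
# embedding_model.word_embedding(...) or embedding_model.sentence_embedding(...).
vec_a = [1.0, 2.0, 3.0]
vec_b = [2.0, 4.0, 6.0]   # same direction as vec_a
vec_c = [3.0, 0.0, -1.0]  # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))  # close to 1.0
print(cosine_similarity(vec_a, vec_c))  # 0.0
```

Values near 1 indicate vectors pointing in a similar direction, and values near 0 indicate unrelated directions.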

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/MohammadDevelop/Aamraz",
    "name": "aamraz",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": "Mohammad Mahmoodi Varnamkhasti",
    "author_email": "research@amzmohammad.com",
    "download_url": "https://files.pythonhosted.org/packages/80/3a/e215f33c52e27b0223400bbd6c9c0466de944566b9ec22801b9e234aa6e1/aamraz-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "This project is a collection of Natural Language Processing tools for Kurdish Language.",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://github.com/MohammadDevelop/Aamraz"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6afe4003eec358ec3754bf161ee4774d18f2560346d5ad300d67d129155289ec",
                "md5": "4077e8efbc989d24986b4d4b53066aba",
                "sha256": "e261f08030a0a64391fccdd140fc2ba5321a183b7d3bf7f1aea4a843a1d46046"
            },
            "downloads": -1,
            "filename": "aamraz-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4077e8efbc989d24986b4d4b53066aba",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 10714,
            "upload_time": "2024-10-11T20:17:30",
            "upload_time_iso_8601": "2024-10-11T20:17:30.207988Z",
            "url": "https://files.pythonhosted.org/packages/6a/fe/4003eec358ec3754bf161ee4774d18f2560346d5ad300d67d129155289ec/aamraz-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "803ae215f33c52e27b0223400bbd6c9c0466de944566b9ec22801b9e234aa6e1",
                "md5": "01389600a3d83739f531dc985e3e5acf",
                "sha256": "2fa62ce66711a0436d39fa2a07140dc2b9a92eb3e939157f2f54849a2705154d"
            },
            "downloads": -1,
            "filename": "aamraz-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "01389600a3d83739f531dc985e3e5acf",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 10100,
            "upload_time": "2024-10-11T20:17:34",
            "upload_time_iso_8601": "2024-10-11T20:17:34.427935Z",
            "url": "https://files.pythonhosted.org/packages/80/3a/e215f33c52e27b0223400bbd6c9c0466de944566b9ec22801b9e234aa6e1/aamraz-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-11 20:17:34",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "MohammadDevelop",
    "github_project": "Aamraz",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "aamraz"
}
