Name: qalsadi
Version: 0.5.1
Home page: http://qalsadi.sourceforge.net/
Summary: Qalsadi Arabic Morphological Analyzer and lemmatizer for Python
Upload time: 2025-07-27 08:47:17
Docs: https://pythonhosted.org/qalsadi/
Author: Taha Zerrouki
Requires Python: >=3.6
License: GPL
Keywords: arabic, nlp, morphological analysis, lemmatizer
# Qalsadi Arabic Morphological Analyzer and Lemmatizer for Python

المكتبة البرمجية [القلصادي](https://github.com/linuxscout/qalsadi)  أداة متخصصة في التحليل الصرفي للنصوص العربية. تعتمد على قاعدة بيانات معجمية لتحليل النصوص سواء كانت مشكولة جزئياً أو كلياً. تقدم هذه المكتبة تشكيل الكلمات وتحليلها الصرفي، بالإضافة إلى تقييم درجة شيوع الكلمة في اللغة العربية المعاصرة.

متوفرة للتجربة على موقع [مشكال](http://tahadz.com/mishkal)، قسم  أدوات/تحليل

[Qalsadi](https://github.com/linuxscout/qalsadi) library is a specialized tool for morphological analysis of Arabic texts. It uses a lexical database to analyze fully or partially vocalized texts, providing both morphological analysis and diacritics. Additionally, it evaluates the frequency of word usage in contemporary Arabic and uses the "Qutrub" tool for verb conjugation.

The demo is available on [Mishkal](http://tahadz.com/mishkal), under Tools/Analysis.

Developer: Taha Zerrouki, http://tahadz.com
    taha dot zerrouki at gmail dot com

Features  |   value
----------|---------------------------------------------------------------------------------
Authors   | [AUTHORS.md](https://github.com/linuxscout/qalsadi/blob/master/AUTHORS.md)
Release   | 0.5.1
License   |[GPL](https://github.com/linuxscout/qalsadi/blob/master/LICENSE)
Tracker   |[linuxscout/qalsadi/Issues](https://github.com/linuxscout/qalsadi/issues)
Website   |[https://pypi.python.org/pypi/qalsadi](https://pypi.python.org/pypi/qalsadi)
Doc       |[Package Documentation](https://qalsadi.readthedocs.io/)
Source    |[Github](http://github.com/linuxscout/qalsadi)
Download  |[sourceforge](http://qalsadi.sourceforge.net)
Feedbacks |[Comments](http://tahadz.com/qalsadi/contact)
Accounts  |[@Twitter](https://twitter.com/linuxscout)  [@Sourceforge](http://sourceforge.net/projects/qalsadi/)



## Citation
If you cite Qalsadi in academic work, you can use one of the following citations:
```
T. Zerrouki, Qalsadi, Arabic morphological analyzer library for Python, https://pypi.python.org/pypi/qalsadi/
```
Another Citation:
```
Zerrouki, Taha. "Towards An Open Platform For Arabic Language Processing." (2020).
```
or in BibTeX format:

```bibtex
@misc{zerrouki2012qalsadi,
  title={qalsadi, Arabic morphological analyzer library for Python},
  author={Zerrouki, Taha},
  url={https://pypi.python.org/pypi/qalsadi},
  year={2012}
}
```

```bibtex
@thesis{zerrouki2020towards,
  title={Towards An Open Platform For Arabic Language Processing},
  author={Zerrouki, Taha},
  year={2020}
}
```


## Features  مزايا
 - Lemmatization
 - Vocalized text analysis
 - Verb analysis via the Qutrub library
 - Word frequency in modern Arabic usage

### Applications

* Stemming texts
* Text Classification and categorization
* Sentiment Analysis
* Named Entities Recognition

### Installation

```
pip install qalsadi
```
#### Requirements

``` 
pip install -r requirements.txt 
```

## Usage
### Demo
The demo is available on [Tahadz.com](http://tahadz.com/mishkal), under Tools/Analysis (قسم أدوات - تحليل).
### Example 
#### Lemmatization
```python
>>> import qalsadi.lemmatizer 
>>> text = u"""هل تحتاج إلى ترجمة كي تفهم خطاب الملك؟ اللغة "الكلاسيكية" (الفصحى) موجودة في كل اللغات وكذلك اللغة "الدارجة" .. الفرنسية التي ندرس في المدرسة ليست الفرنسية التي يستخدمها الناس في شوارع باريس .. وملكة بريطانيا لا تخطب بلغة شوارع لندن .. لكل مقام مقال"""
>>> lemmer = qalsadi.lemmatizer.Lemmatizer()
>>> # lemmatize a word
... lemmer.lemmatize("يحتاج")
'احتاج'
>>> # lemmatize a word with a specific pos
>>> lemmer.lemmatize("وفي")
'في'
>>> lemmer.lemmatize("وفي", pos="v")
'وفى'

>>> lemmas = lemmer.lemmatize_text(text)
>>> print(lemmas)
['هل', 'احتاج', 'إلى', 'ترجمة', 'كي', 'تفهم', 'خطاب', 'ملك', '؟', 'لغة', '"', 'كلاسيكي', '"(', 'فصحى', ')', 'موجود', 'في', 'كل', 'لغة', 'ذلك', 'لغة', '"', 'دارج', '"..', 'فرنسي', 'التي', 'درس', 'في', 'مدرسة', 'ليست', 'فرنسي', 'التي', 'استخدم', 'ناس', 'في', 'شوارع', 'باريس', '..', 'ملك', 'بريطانيا', 'لا', 'خطب', 'بلغة', 'شوارع', 'دنو', '..', 'كل', 'مقام', 'مقالي']
>>> # lemmatize a text and return lemma pos
... lemmas = lemmer.lemmatize_text(text, return_pos=True)
>>> print(lemmas)
[('هل', 'stopword'), ('احتاج', 'verb'), ('إلى', 'stopword'), ('ترجمة', 'noun'), ('كي', 'stopword'), ('تفهم', 'noun'), ('خطاب', 'noun'), ('ملك', 'noun'), '؟', ('لغة', 'noun'), '"', ('كلاسيكي', 'noun'), '"(', ('فصحى', 'noun'), ')', ('موجود', 'noun'), ('في', 'stopword'), ('كل', 'stopword'), ('لغة', 'noun'), ('ذلك', 'stopword'), ('لغة', 'noun'), '"', ('دارج', 'noun'), '"..', ('فرنسي', 'noun'), ('التي', 'stopword'), ('درس', 'verb'), ('في', 'stopword'), ('مدرسة', 'noun'), ('ليست', 'stopword'), ('فرنسي', 'noun'), ('التي', 'stopword'), ('استخدم', 'verb'), ('ناس', 'noun'), ('في', 'stopword'), ('شوارع', 'noun'), ('باريس', 'all'), '..', ('ملك', 'noun'), ('بريطانيا', 'noun'), ('لا', 'stopword'), ('خطب', 'verb'), ('بلغة', 'noun'), ('شوارع', 'noun'), ('دنو', 'verb'), '..', ('كل', 'stopword'), ('مقام', 'noun'), ('مقالي', 'noun')]

>>> # Get vocalized output lemmas
>>> lemmer.set_vocalized_lemma()
>>> lemmas = lemmer.lemmatize_text(text)
>>> print(lemmas)
['هَلْ', 'اِحْتَاجَ', 'إِلَى', 'تَرْجَمَةٌ', 'كَيْ', 'تَفَهُّمٌ', 'خَطَّابٌ', 'مَلَكٌ', '؟', 'لُغَةٌ', '"', 'كِلاَسِيكِيٌّ', '"(', 'فُصْحَى', ')', 'مَوْجُودٌ', 'فِي', 'كُلَّ', 'لُغَةٌ', 'ذَلِكَ', 'لُغَةٌ', '"', 'دَارِجٌ', '"..', 'فَرَنْسِيّ', 'الَّتِي', 'دَرَسَ', 'فِي', 'مَدْرَسَةٌ', 'لَيْسَتْ', 'فَرَنْسِيّ', 'الَّتِي', 'اِسْتَخْدَمَ', 'نَاسٌ', 'فِي', 'شَوَارِعٌ', 'باريس', '..', 'مَلَكٌ', 'برِيطانِيا', 'لَا', 'خَطَبَ', 'بَلَغَةٌ', 'شَوَارِعٌ', 'أَدَانَ', '..', 'كُلَّ', 'مَقَامٌ', 'مَقَالٌ']
>>> # get all lemmas for each word text
>>> lemmas = lemmer.lemmatize_text(text, all=True)
>>> lemmas
[['هل', 'وهل', 'هال'], ['احتاج'], ['إلى'], ['ترجمة'], ['كي'], ['تف', 'أفهم', 'فهم', 'تفهم'], ['خاطب', 'خطاب'], ['مالك', 'ملك'], ['؟'], ['لغة'], ['"'], ['كلاسيكي'], ['"('], ['فصحى'], [')'], ['موجود'], ['في'], ['أكل', 'كال', 'كل', 'وكل'], ['لغة'], ['كذلك', 'ذل'], ['لغة'], ['"'], ['دارج'], ['"..'], ['فرنسة', 'فرنسي'], ['التي'], ['درس'], ['في'], ['مدرس', 'مدرسة'], ['يس', 'ليست', 'لاس', 'ليس'], ['فرنسة', 'فرنسي'], ['التي'], ['استخدم'], ['ناس'], ['في'], ['شارع'], ['باريس'], ['..'], ['مالك', 'ملك', 'ملكة'], ['بريطانيا', 'بريطاني'], ['لا'], ['خطب'], ['بالغ', 'لغة', 'بلغة'], ['شارع'], ['دن', 'دنى', 'دان', 'ناد', 'دنو', 'أدنى', 'أدان', 'دنا', 'ودن'], ['..'], ['كل'], ['مقام'], ['مقالي', 'مقال']]

```
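Note that `lemmatize_text(text, return_pos=True)` mixes `(lemma, pos)` tuples with bare punctuation strings. A small helper (hypothetical, not part of qalsadi) can filter such output down to content words:

```python
# Hypothetical post-processing helper; qalsadi itself does not provide this.
# The input mimics lemmatize_text(text, return_pos=True): a mix of
# (lemma, pos) tuples and bare punctuation strings.

def content_lemmas(items, keep=("noun", "verb")):
    """Return lemmas whose POS tag is in `keep`, skipping punctuation."""
    lemmas = []
    for item in items:
        if isinstance(item, tuple):
            lemma, pos = item
            if pos in keep:
                lemmas.append(lemma)
        # bare strings are punctuation tokens; skip them
    return lemmas

sample = [('هل', 'stopword'), ('احتاج', 'verb'), ('ترجمة', 'noun'), '؟']
print(content_lemmas(sample))  # ['احتاج', 'ترجمة']
```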

#### Morphology analysis
``` python
import qalsadi.analex as qa

text = "لا يحمل الحقد من تعلو به الرتب"
analyzer = qa.Analex()
result = analyzer.check_text(text)
print(result)
```
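`check_text` returns one sub-list of candidate analyses per input word (see the output description below). A common post-processing step is to pick one candidate per word; the sketch below does this with dummy dicts standing in for real `StemmedWord` objects, not actual qalsadi output:

```python
# Dummy stand-in for analyzer.check_text(...): one sub-list of candidate
# analyses per word, here reduced to plain dicts with a "lemma" field.
results = [
    [{"word": "لا", "lemma": "لا"}],
    [{"word": "يحمل", "lemma": "حمل"}, {"word": "يحمل", "lemma": "أحمل"}],
]

# Keep only the first candidate analysis of each word.
first_choice = [candidates[0]["lemma"] for candidates in results if candidates]
print(first_choice)  # ['لا', 'حمل']
```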

#### Morphology analysis display

* The morphology analysis generates many fields; to manage the display, use the `ResultFormatter` class:

  ```python
  import qalsadi.analex as qa
  from qalsadi.resultformatter import ResultFormatter
  
  text = "لا يحمل الحقد من تعلو به الرتب"
  analyzer = qa.Analex()
  results = analyzer.check_text(text)
  formatter = ResultFormatter(results)
  
  # Use main fields display
  formatter.set_used_fields("main")
  print(formatter.as_table())
  
  ```

  * Other table formats:

    ```python
    # other table formats
    print(formatter.as_table(tablefmt="github"))
    # tablefmt accepts all table format values from the tabulate library:
    # "plain" (default), "grid", "pipe" (Markdown), "html", "latex", "tsv"
    ```

    

  * Other display formats:

    ```python
    print(formatter.as_csv())
    print(formatter.as_json())
    print(formatter.as_xml())
    ```

  * Saving results to files:

    ```python
    formatter.as_csv("output/results.csv")
    formatter.to_json("output/results.json")
    formatter.to_xml("output/results.xml")
    ```

    

  * Change fields to display:

    ```python
    profile  = "main" # other values: "all" "roots", "lemmas", "inflect"
    formatter.set_used_fields(profile)
    ```

  * Add custom fields:

    * If a given field name is not valid, it is ignored.

    ```python
    profile  = "main" # other values: "roots", "lemmas", "inflect"
    formatter.set_used_fields(profile, additional_fields=["root","INVALID"])
    ```

    





#### Output description

* The result of morphology analysis is a list of lists of `StemmedWord` objects from the `qalsadi.stemmedword` module.

The `StemmedWord` behaves like a `dict`; it contains the following fields:


Category   | Applied on | Feature             | Example           | Description
-----------|------------|---------------------|-------------------|------------
affix      | all        | affix_key           | ال--َاتُ-       | affix key
affix      | all        | affix               |                   | affixes
input      | all        | word                | البيانات        | the input word
input      | all        | unvocalized         |                   | unvocalized form
original   | noun/verb  | freq                | 694644            | word frequency in contemporary Arabic
original   | all        | original_tags       | (u                | tags of the original word
original   | all        | original            | بَيَانٌ         | the original (dictionary) word
original   | all        | root                | بين             | root
output     | all        | type                | Noun:مصدر       | word type
output     | all        | semivocalized       | الْبَيَانَات    | vocalized word without the case mark
output     | all        | vocalized           | الْبَيَانَاتُ   | fully vocalized word
output     | all        | stem                | بيان            | stem
output     | all        | lemma               | بيان            | lemma

* For more details about fields in the output, see [DOCS/DataDescription](docs/datadescription.md)
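Because each `StemmedWord` behaves like a `dict` with the fields above, downstream code can treat one analysis as a plain mapping. The sketch below uses a hand-written dict (field names and example values taken from the table) in place of a real `StemmedWord`:

```python
# Stand-in for one StemmedWord; in real use this would be an entry
# returned by analyzer.check_text(...).
analysis = {
    "word": "البيانات",
    "vocalized": "الْبَيَانَاتُ",
    "lemma": "بيان",
    "root": "بين",
    "type": "Noun:مصدر",
    "freq": 694644,
}

# Extract only the fields needed for, say, a lemma/root index.
summary = {key: analysis[key] for key in ("word", "lemma", "root")}
print(summary)  # {'word': 'البيانات', 'lemma': 'بيان', 'root': 'بين'}
```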

#### Using cache

Qalsadi can use a cache to speed up analysis. Four kinds of cache are available:

* Memory cache
* Pickle cache
* PickleDB cache
* CodernityDB cache

To use one of them, see the following examples:
* Using a factory method

```python
>>> import qalsadi.analex
>>> from qalsadi.cachemanager.cache_factory import Cache_Factory
>>> analyzer = qalsadi.analex.Analex()
>>> # list available cache names
>>> Cache_Factory.list()
['', 'memory', 'pickle', 'pickledb', 'codernity']
>>> # configure the cacher and the path used to store the cache
>>> path = 'cache/qalsasicache.pickledb'
>>> cacher = Cache_Factory.factory("pickledb", path)
>>> analyzer.set_cacher(cacher)
>>> # enable the use of the cacher
>>> analyzer.enable_allow_cache_use()
```
* Memory cache

```python
>>> import qalsadi.analex
>>> from qalsadi.cachemanager.cache import Cache
>>> analyzer = qalsadi.analex.Analex()
>>> # configure the cacher
>>> cacher = Cache()
>>> analyzer.set_cacher(cacher)
>>> # enable the use of the cacher
>>> analyzer.enable_allow_cache_use()
>>> # disable the use of the cacher
>>> analyzer.disable_allow_cache_use()
```
* Pickle cache

```python
>>> import qalsadi.analex
>>> from qalsadi.cachemanager.cache_pickle import Cache
>>> analyzer = qalsadi.analex.Analex()
>>> # configure the path used to store the cache
>>> path = 'cache/qalsadiCache.pickle'
>>> cacher = Cache(path)
>>> analyzer.set_cacher(cacher)
>>> # enable the use of the cacher
>>> analyzer.enable_allow_cache_use()
```
* Pickledb cache

```python
>>> import qalsadi.analex
>>> from qalsadi.cachemanager.cache_pickledb import Cache
>>> analyzer = qalsadi.analex.Analex()
>>> # configure the path used to store the cache
>>> path = 'cache/qalsadiCache.pickledb'
>>> cacher = Cache(path)
>>> analyzer.set_cacher(cacher)
>>> # enable the use of the cacher
>>> analyzer.enable_allow_cache_use()
```
* CodernityDB cache

```python
>>> import qalsadi.analex
>>> from qalsadi.cachemanager.cache_codernity import Cache
>>> analyzer = qalsadi.analex.Analex()
>>> # configure the path used to store the cache
>>> path = 'cache'
>>> cacher = Cache(path)
>>> analyzer.set_cacher(cacher)
>>> # enable the use of the cacher
>>> analyzer.enable_allow_cache_use()
```
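All the cachers above follow the same pattern: construct a cacher, attach it with `set_cacher()`, then enable it. The idea behind word-level caching can be sketched in plain Python; the `MemoryCache` class below is an illustration only, not qalsadi's actual cache implementation:

```python
# Minimal illustration of word-level caching; qalsadi's real cache classes
# live in qalsadi.cachemanager and are richer than this sketch.

class MemoryCache:
    """Keep analysis results for already-seen words in a dict."""

    def __init__(self):
        self._store = {}

    def is_cached(self, word):
        return word in self._store

    def add(self, word, analyses):
        self._store[word] = analyses

    def get(self, word):
        return self._store.get(word)


def analyze_with_cache(word, analyze, cache):
    """Return cached analyses when available; otherwise compute and store."""
    if cache.is_cached(word):
        return cache.get(word)
    analyses = analyze(word)  # stand-in for a real morphological analysis
    cache.add(word, analyses)
    return analyses


calls = []

def dummy_analyze(word):
    calls.append(word)
    return [word + "-analysis"]

cache = MemoryCache()
analyze_with_cache("كتاب", dummy_analyze, cache)
analyze_with_cache("كتاب", dummy_analyze, cache)
print(len(calls))  # 1: the analyzer ran only once for the repeated word
```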

            
