qalsadi

Name: qalsadi
Version: 0.5
Home page: http://qalsadi.sourceforge.net/
Summary: Qalsadi Arabic Morphological Analyzer and lemmatizer for Python
Docs URL: https://pythonhosted.org/qalsadi/
Author: Taha Zerrouki
License: GPL
Upload time: 2023-07-17 06:39:26
# Qalsadi Arabic Morphological Analyzer and Lemmatizer for Python

Developers: Taha Zerrouki: http://tahadz.com
    taha dot zerrouki at gmail dot com

Features  |   value
----------|---------------------------------------------------------------------------------
Authors   | [Authors.md](https://github.com/linuxscout/qalsadi/master/AUTHORS.md)
Release   | 0.5 
License   |[GPL](https://github.com/linuxscout/qalsadi/master/LICENSE)
Tracker   |[linuxscout/qalsadi/Issues](https://github.com/linuxscout/qalsadi/issues)
Website   |[https://pypi.python.org/pypi/qalsadi](https://pypi.python.org/pypi/qalsadi)
Doc       |[Package documentation](https://qalsadi.readthedocs.io/)
Source    |[Github](http://github.com/linuxscout/qalsadi)
Download  |[sourceforge](http://qalsadi.sourceforge.net)
Feedback  |[Comments](http://tahadz.com/qalsadi/contact)
Accounts  |[@Twitter](https://twitter.com/linuxscout)  [@Sourceforge](http://sourceforge.net/projects/qalsadi/)



## Citation
If you cite this work in academic research, you can use the following citation:
```
T. Zerrouki, Qalsadi, Arabic morphological analyzer library for Python, https://pypi.python.org/pypi/qalsadi/
```
Another citation:
```
Zerrouki, Taha. "Towards An Open Platform For Arabic Language Processing." (2020).
```
or in BibTeX format:

```bibtex
@misc{zerrouki2012qalsadi,
  title={qalsadi, Arabic morphological analyzer library for Python},
  author={Zerrouki, Taha},
  url={https://pypi.python.org/pypi/qalsadi},
  year={2012}
}
```

```bibtex
@thesis{zerrouki2020towards,
  title={Towards An Open Platform For Arabic Language Processing},
  author={Zerrouki, Taha},
  year={2020}
}
```


## Features
 - Lemmatization
 - Vocalized text analysis
 - Uses the Qutrub library to analyze verbs
 - Gives word frequency in modern Arabic usage

### Applications

* Stemming texts
* Text Classification and categorization
* Sentiment Analysis
* Named Entities Recognition

### Installation

```
pip install qalsadi
```
#### Requirements

``` 
pip install -r requirements.txt 
```

## Usage
### Demo
The demo is available on [Tahadz.com](http://tahadz.com/mishkal), under Tools > Analysis.
### Example 
#### Lemmatization
```python
>>> import qalsadi.lemmatizer 
>>> text = u"""هل تحتاج إلى ترجمة كي تفهم خطاب الملك؟ اللغة "الكلاسيكية" (الفصحى) موجودة في كل اللغات وكذلك اللغة "الدارجة" .. الفرنسية التي ندرس في المدرسة ليست الفرنسية التي يستخدمها الناس في شوارع باريس .. وملكة بريطانيا لا تخطب بلغة شوارع لندن .. لكل مقام مقال"""
>>> lemmer = qalsadi.lemmatizer.Lemmatizer()
>>> # lemmatize a word
... lemmer.lemmatize("يحتاج")
'احتاج'
>>> # lemmatize a word with a specific pos
>>> lemmer.lemmatize("وفي")
'في'
>>> lemmer.lemmatize("وفي", pos="v")
'وفى'

>>> lemmas = lemmer.lemmatize_text(text)
>>> print(lemmas)
['هل', 'احتاج', 'إلى', 'ترجمة', 'كي', 'تفهم', 'خطاب', 'ملك', '؟', 'لغة', '"', 'كلاسيكي', '"(', 'فصحى', ')', 'موجود', 'في', 'كل', 'لغة', 'ذلك', 'لغة', '"', 'دارج', '"..', 'فرنسي', 'التي', 'درس', 'في', 'مدرسة', 'ليست', 'فرنسي', 'التي', 'استخدم', 'ناس', 'في', 'شوارع', 'باريس', '..', 'ملك', 'بريطانيا', 'لا', 'خطب', 'بلغة', 'شوارع', 'دنو', '..', 'كل', 'مقام', 'مقالي']
>>> # lemmatize a text and return lemma pos
... lemmas = lemmer.lemmatize_text(text, return_pos=True)
>>> print(lemmas)
[('هل', 'stopword'), ('احتاج', 'verb'), ('إلى', 'stopword'), ('ترجمة', 'noun'), ('كي', 'stopword'), ('تفهم', 'noun'), ('خطاب', 'noun'), ('ملك', 'noun'), '؟', ('لغة', 'noun'), '"', ('كلاسيكي', 'noun'), '"(', ('فصحى', 'noun'), ')', ('موجود', 'noun'), ('في', 'stopword'), ('كل', 'stopword'), ('لغة', 'noun'), ('ذلك', 'stopword'), ('لغة', 'noun'), '"', ('دارج', 'noun'), '"..', ('فرنسي', 'noun'), ('التي', 'stopword'), ('درس', 'verb'), ('في', 'stopword'), ('مدرسة', 'noun'), ('ليست', 'stopword'), ('فرنسي', 'noun'), ('التي', 'stopword'), ('استخدم', 'verb'), ('ناس', 'noun'), ('في', 'stopword'), ('شوارع', 'noun'), ('باريس', 'all'), '..', ('ملك', 'noun'), ('بريطانيا', 'noun'), ('لا', 'stopword'), ('خطب', 'verb'), ('بلغة', 'noun'), ('شوارع', 'noun'), ('دنو', 'verb'), '..', ('كل', 'stopword'), ('مقام', 'noun'), ('مقالي', 'noun')]

>>> # Get vocalized output lemmas
>>> lemmer.set_vocalized_lemma()
>>> lemmas = lemmer.lemmatize_text(text)
>>> print(lemmas)
['هَلْ', 'اِحْتَاجَ', 'إِلَى', 'تَرْجَمَةٌ', 'كَيْ', 'تَفَهُّمٌ', 'خَطَّابٌ', 'مَلَكٌ', '؟', 'لُغَةٌ', '"', 'كِلاَسِيكِيٌّ', '"(', 'فُصْحَى', ')', 'مَوْجُودٌ', 'فِي', 'كُلَّ', 'لُغَةٌ', 'ذَلِكَ', 'لُغَةٌ', '"', 'دَارِجٌ', '"..', 'فَرَنْسِيّ', 'الَّتِي', 'دَرَسَ', 'فِي', 'مَدْرَسَةٌ', 'لَيْسَتْ', 'فَرَنْسِيّ', 'الَّتِي', 'اِسْتَخْدَمَ', 'نَاسٌ', 'فِي', 'شَوَارِعٌ', 'باريس', '..', 'مَلَكٌ', 'برِيطانِيا', 'لَا', 'خَطَبَ', 'بَلَغَةٌ', 'شَوَارِعٌ', 'أَدَانَ', '..', 'كُلَّ', 'مَقَامٌ', 'مَقَالٌ']
>>> 
```
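The flat lemma list returned by `lemmatize_text` is convenient for downstream counting. As a minimal, self-contained sketch (using a short excerpt of the output above as sample data), lemma frequencies can be tallied with `collections.Counter`:

```python
from collections import Counter

# A short excerpt of the lemma list produced by lemmatize_text() above.
lemmas = ['هل', 'احتاج', 'إلى', 'ترجمة', 'كي', 'تفهم', 'خطاب', 'ملك',
          'لغة', 'في', 'كل', 'لغة', 'ذلك', 'لغة', 'في', 'شوارع', 'في']

# Count how often each lemma occurs in the excerpt.
freq = Counter(lemmas)
# 'لغة' and 'في' each appear three times in this excerpt.
print(freq.most_common(2))
```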

#### Morphology analysis
```python
import qalsadi.analex as qa

filename = "samples/text.txt"
try:
    # Read the sample file as UTF-8 text.
    with open(filename, encoding="utf8") as myfile:
        text = myfile.read()
    if not text:
        text = u"السلام عليكم"
except IOError:
    text = u"أسلم"
    print("Could not read the file; using a default text.")

debug = False
limit = 500
analyzer = qa.Analex()
analyzer.set_debug(debug)
result = analyzer.check_text(text)
print('----------------python format result-------')
print(result)
# check_text returns one list of analysis cases per word.
for word_cases in result:
    print("-------------One word detailed case------")
    for analyzed in word_cases:
        print("-------------one case for word------")
        print(repr(analyzed))
```



#### Output description
Category   | Applied on | feature              | example          | Notes
-----------|------------|----------------------|------------------|------
affix      | all        | affix_key            | ال--َاتُ-        | affix key
affix      | all        | affix                |                  | affixes
input      | all        | word                 | البيانات         | the input word
input      | all        | unvocalized          |                  | unvocalized form
morphology | noun       | tag_mamnou3          | 0                | diptote (forbidden from full declension)
morphology | verb       | tag_confirmed        | 0                | emphasized verb feature
morphology | verb       | tag_mood             | 0                | mood of the imperfect verb (subjunctive, jussive, indicative)
morphology | verb       | tag_pronoun          | 0                | attached pronoun
morphology | verb       | tag_transitive       | 0                | transitive/intransitive
morphology | verb       | tag_voice            | 0                | active/passive voice
morphology | noun       | tag_regular          | 1                | regular/irregular
morphology | noun/verb  | tag_gender           | 3                | gender (masculine/feminine)
morphology | verb       | tag_person           | 4                | person (first/second/third)
morphology | noun       | tag_number           | 21               | number (singular/dual/plural)
original   | noun/verb  | freq                 | 694644           | word frequency in modern use
original   | all        | original_tags        | (u               | tags of the original word
original   | all        | original             | بَيَانٌ          | the original word (lemma)
original   | all        | root                 | بين              | the root
original   | all        | tag_original_gender  | مذكر             | gender of the original word
original   | noun       | tag_original_number  | مفرد             | number of the original word
output     | all        | type                 | Noun:مصدر        | word type
output     | all        | semivocalized        | الْبَيَانَات     | vocalized form without the case mark
output     | all        | vocalized            | الْبَيَانَاتُ    | fully vocalized form
output     | all        | stem                 | بيان             | the stem
syntax     | all        | tag_break            | 0                | word is detached from what precedes it
syntax     | all        | tag_initial          | 0                | syntactic feature: word at the start of a sentence
syntax     | all        | tag_transparent      | 0                | apposition (badal)
syntax     | noun       | tag_added            | 0                | syntactic feature: word is an annexed noun (mudaf)
syntax     | all        | need                 |                  | word requires another word (transitive verbs, operators); not yet implemented
syntax     | tool       | action               |                  | grammatical effect of the tool word
syntax     | tool       | object_type          |                  | type of the governed word, e.g. a noun for a preposition
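Each analysis case carries the features listed above. As a hypothetical, self-contained sketch (using plain dicts in place of the real result objects, whose exact accessors may differ), the cases for a word can be filtered by type and ranked by the `freq` field to pick the most common reading:

```python
# Hypothetical stand-ins for qalsadi analysis cases, shown as plain dicts;
# the real objects expose these features through their own accessors.
cases = [
    {"word": "البيانات", "type": "Noun:مصدر", "root": "بين", "freq": 694644},
    {"word": "البيانات", "type": "Verb", "root": "بين", "freq": 1200},
]

# Keep only the noun readings, then pick the most frequent one.
nouns = [c for c in cases if c["type"].startswith("Noun")]
best = max(nouns, key=lambda c: c["freq"])
print(best["root"])  # root of the most frequent noun reading
```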

#### Using a Cache
Qalsadi can use a cache to speed up analysis. Four kinds of cache are available:

* Memory cache
* Pickle cache
* Pickledb cache
* CodernityDB cache

To use one of them, see the following examples:
* Using a factory method
```python
>>> import qalsadi.analex
>>> from qalsadi.cache_factory import Cache_Factory
>>> analyzer = qalsadi.analex.Analex()
>>> # list available cache names
>>> Cache_Factory.list()
['', 'memory', 'pickle', 'pickledb', 'codernity']
>>> # configure cacher
>>> # configure path used to store the cache
>>> path = 'cache/qalsasicache.pickledb'
>>> cacher = Cache_Factory.factory("pickledb", path)
>>> analyzer.set_cacher(cacher)
>>> # to enable the use of cacher
>>> analyzer.enable_allow_cache_use()
```
* Memory cache

```python
>>> import qalsadi.analex
>>> analyzer = qalsadi.analex.Analex()
>>> # configure cacher
>>> import qalsadi.cache
>>> cacher = qalsadi.cache.Cache()
>>> analyzer.set_cacher(cacher)
>>> # to enable the use of cacher
>>> analyzer.enable_allow_cache_use()
>>> # to disable the use of cacher
>>> analyzer.disable_allow_cache_use()
```
* Pickle cache

```python
>>> import qalsadi.analex
>>> from qalsadi.cache_pickle import Cache
>>> analyzer = qalsadi.analex.Analex()
>>> # configure cacher
>>> # configure path used to store the cache
>>> path = 'cache/qalsadiCache.pickle'
>>> cacher = Cache(path)
>>> analyzer.set_cacher(cacher)
>>> # to enable the use of cacher
>>> analyzer.enable_allow_cache_use()

```
* Pickledb cache

```python
>>> import qalsadi.analex
>>> from qalsadi.cache_pickledb import Cache
>>> analyzer = qalsadi.analex.Analex()
>>> # configure cacher
>>> # configure path used to store the cache
>>> path = 'cache/qalsadiCache.pickledb'
>>> cacher = Cache(path)
>>> analyzer.set_cacher(cacher)
>>> # to enable the use of cacher
>>> analyzer.enable_allow_cache_use()

```
* CodernityDB cache


```python
>>> import qalsadi.analex
>>> from qalsadi.cache_codernity import Cache
>>> analyzer = qalsadi.analex.Analex()
>>> # configure cacher
>>> # configure path used to store the cache
>>> path = 'cache'
>>> cacher = Cache(path)
>>> analyzer.set_cacher(cacher)
>>> # to enable the use of cacher
>>> analyzer.enable_allow_cache_use()
```
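All four cachers play the same role: memoize per-word analysis results so that repeated words skip re-analysis. A minimal dict-backed sketch of that idea (a simplified, hypothetical interface for illustration, not the actual `qalsadi.cache.Cache` API) looks like this:

```python
class MemoryCache:
    """Simplified in-memory cache keyed by the word form (illustrative only)."""

    def __init__(self):
        self._store = {}

    def is_already_checked(self, word):
        # True if this word was analyzed before.
        return word in self._store

    def get_checked(self, word):
        # Return the stored analysis results for the word.
        return self._store[word]

    def add_checked(self, word, results):
        # Store the analysis results for later lookups.
        self._store[word] = results


cache = MemoryCache()
if not cache.is_already_checked("بيان"):
    cache.add_checked("بيان", ["...analysis cases..."])
print(cache.get_checked("بيان"))
```

On a second encounter of the same word, `is_already_checked` returns `True` and the stored results are reused instead of re-running the analyzer.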


            
