Persian SpellChecker
===============================================================================

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
   :target: https://opensource.org/licenses/MIT/
   :alt: License
.. image:: https://img.shields.io/github/v/release/AshkanFeyzollahi/fa-spellchecker
   :target: https://github.com/AshkanFeyzollahi/fa-spellchecker/releases/
   :alt: GitHub Release
.. image:: https://img.shields.io/pypi/v/fa-spellchecker
   :target: https://pypi.org/project/fa-spellchecker/
   :alt: PyPI - Version
.. image:: https://img.shields.io/pypi/dm/fa-spellchecker
   :target: https://pypi.org/project/fa-spellchecker/
   :alt: PyPI - Downloads
.. image:: https://img.shields.io/readthedocs/fa-spellchecker
   :target: https://fa-spellchecker.readthedocs.io/en/latest/
   :alt: Read the Docs

A pure-Python Persian spell checker based on `Peter Norvig's blog post <https://norvig.com/spell-correct.html>`__ on setting up a simple spell-checking algorithm, and inspired by `pyspellchecker <https://github.com/barrust/pyspellchecker>`__.

Like **pyspellchecker**, it uses a Levenshtein Distance algorithm to find permutations within an edit distance of 2 from the original word. It then compares all permutations (insertions, deletions, replacements, and transpositions) to known words in a word frequency list. Words that occur more often in the frequency list are considered more likely to be the correct result.

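
To make the approach concrete, here is a minimal, self-contained sketch of Norvig's algorithm adapted to Persian. It is not fa-spellchecker's actual code: the tiny ``WORD_FREQUENCY`` table and the ``PERSIAN_LETTERS`` alphabet are illustrative assumptions, whereas the package ships a frequency list built from a real corpus.

.. code:: python

    from collections import Counter

    # Hypothetical toy frequency list (assumption); the real package builds one from a corpus.
    WORD_FREQUENCY = Counter({"صابون": 120, "سابق": 40, "صبح": 300})

    # Letters used for replacements and insertions (assumed Persian alphabet, including آ).
    PERSIAN_LETTERS = "آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی"


    def edits1(word):
        """All strings one edit away: deletions, transpositions, replacements, insertions."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [left + right[1:] for left, right in splits if right]
        transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
        replaces = [left + c + right[1:] for left, right in splits if right for c in PERSIAN_LETTERS]
        inserts = [left + c + right for left, right in splits for c in PERSIAN_LETTERS]
        return set(deletes + transposes + replaces + inserts)


    def edits2(word):
        """All strings two edits away."""
        return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


    def known(words):
        """Keep only the words present in the frequency list."""
        return {w for w in words if w in WORD_FREQUENCY}


    def candidates(word):
        # Prefer the word itself, then distance-1 matches, then distance-2 matches.
        return known([word]) or known(edits1(word)) or known(edits2(word)) or {word}


    def correction(word):
        # Pick the candidate that occurs most often in the frequency list.
        return max(candidates(word), key=lambda w: WORD_FREQUENCY[w])


    print(correction("سابون"))  # صابون

Searching distance-1 edits before distance-2 edits mirrors Norvig's post: a closer match is preferred even when a more distant one is more frequent.
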
**fa-spellchecker** is made **ONLY** for the **Persian** language and requires ``Python>=3.7``. For information on how the Persian dictionary was created and how it can be updated and improved, see the **Dictionary Creation and Updating** section below.

Installation
-------------------------------------------------------------------------------

To start using **fa-spellchecker** in your Python 3 projects, install it in one of the following ways:

1. Install the package with the Python package manager **pip**:

   .. code:: bash

      pip install fa-spellchecker

2. Build the package from source with the ``build`` package, then install the resulting wheel:

   .. code:: bash

      git clone https://github.com/AshkanFeyzollahi/fa-spellchecker.git
      cd fa-spellchecker
      pip install build
      python -m build
      pip install dist/*.whl

Quickstart
-------------------------------------------------------------------------------

Before getting started, make sure **fa-spellchecker** is installed (see the **Installation** section above). Here are some examples of using **fa-spellchecker**:

1. A simple Persian word correction example:

   .. code:: python

      # Import the SpellChecker class from faspellchecker
      from faspellchecker import SpellChecker

      # Initialize a faspellchecker.SpellChecker instance
      spellchecker = SpellChecker()

      # Correct a misspelled Persian word
      print(spellchecker.correction('سابون'))  # 'صابون'


2. A more advanced example that corrects the Persian words in a sentence, using `hazm <https://github.com/roshan-research/hazm>`__ for tokenization:

   .. code:: python

      # Import dependencies
      from hazm import word_tokenize

      from faspellchecker import SpellChecker
      from faspellchecker.utils import ignore_non_persian_words

      # Define a sentence of Persian words
      a_persian_sentence = "من به پارک رفتم و در آنجا با دوشت هایم بازی کردم"

      # Tokenize the sentence into a list of words
      tokenized_sentence = word_tokenize(a_persian_sentence)

      # Drop non-Persian words (here every word is Persian according to
      # `is_word_persian`, so the given list is returned unchanged)
      tokenized_sentence = ignore_non_persian_words(tokenized_sentence)

      # Initialize a faspellchecker.SpellChecker instance
      spellchecker = SpellChecker()

      # Find all misspelled words
      for misspelled_word in spellchecker.unknown(tokenized_sentence):
          # Print the candidate corrections for each misspelled word
          print(spellchecker.candidates(misspelled_word))

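
The second example only prints candidates. To produce a corrected sentence, each misspelled token can be swapped for its best correction. The sketch below is hypothetical: ``correct_tokens`` and ``StubChecker`` are not part of faspellchecker. The function only assumes a checker object exposing the ``unknown()`` and ``correction()`` methods used in the examples above, and it keeps the original token if ``correction()`` returns nothing (an assumption about its behaviour when no candidate is found).

.. code:: python

    class StubChecker:
        """Hypothetical stand-in for faspellchecker.SpellChecker, used only for this demo."""

        def __init__(self, fixes):
            # Maps each misspelled word to its correction.
            self.fixes = fixes

        def unknown(self, words):
            # The stub treats exactly the keys of `fixes` as misspelled.
            return {word for word in words if word in self.fixes}

        def correction(self, word):
            return self.fixes.get(word)


    def correct_tokens(tokens, checker):
        """Return a copy of `tokens` with every misspelled token replaced by its correction."""
        misspelled = checker.unknown(tokens)
        corrected = []
        for token in tokens:
            if token in misspelled:
                # Fall back to the original token if no correction is available (assumption).
                corrected.append(checker.correction(token) or token)
            else:
                corrected.append(token)
        return corrected


    tokens = ["با", "دوشت", "هایم", "بازی", "کردم"]
    print(" ".join(correct_tokens(tokens, StubChecker({"دوشت": "دوست"}))))  # با دوست هایم بازی کردم

With the real package, a ``SpellChecker()`` instance would be passed instead of the stub.
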

For more information on how to use this package, see the `online documentation <https://fa-spellchecker.readthedocs.io/en/latest/>`__.

Dictionary Creation and Updating
-------------------------------------------------------------------------------

I have provided a script, ``scripts/build_dictionary.py``, that takes text files of words and sentences (here, the ``.txt`` files in the `resources <resources/>`__ folder) and generates a *Persian* word frequency list from the words found in them.

Any new file added to `resources <resources/>`__ is used by ``scripts/build_dictionary.py`` as an additional source when building the Persian dictionary file, which ``faspellchecker`` then uses.

The easiest way to build the Persian dictionary file with ``scripts/build_dictionary.py`` is:

.. code:: bash

    git clone https://github.com/AshkanFeyzollahi/fa-spellchecker.git
    cd fa-spellchecker
    python scripts/build_dictionary.py

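
As an illustration of what such a build step involves, the sketch below counts Persian words across the ``.txt`` files of a folder. It is hypothetical and does not reproduce ``scripts/build_dictionary.py``: the function names, the letter-based tokenization, and the JSON output format are all assumptions.

.. code:: python

    import json
    import re
    from collections import Counter
    from pathlib import Path

    # Assumption: a Persian word is a run of Persian alphabet letters (punctuation, digits
    # and Latin text are ignored).
    PERSIAN_WORD = re.compile("[آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی]+")


    def build_frequency_list(resources_dir):
        """Count Persian words across all .txt files in `resources_dir`."""
        counts = Counter()
        for path in sorted(Path(resources_dir).glob("*.txt")):
            text = path.read_text(encoding="utf-8")
            counts.update(PERSIAN_WORD.findall(text))
        return counts


    def save_frequency_list(counts, output_path):
        """Write the counts as JSON, most frequent first (hypothetical output format)."""
        data = dict(counts.most_common())
        Path(output_path).write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")

For example, ``save_frequency_list(build_frequency_list("resources"), "frequency.json")`` would count every Persian word in the ``resources`` folder and write the result to a hypothetical ``frequency.json`` file.
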

Help with updating and maintaining the dictionary is very welcome! To contribute, start a `discussion <https://github.com/AshkanFeyzollahi/fa-spellchecker/discussions>`__ on GitHub or open a pull request that updates the include and exclude files.

Credits
-------------------------------------------------------------------------------

* `Peter Norvig <https://norvig.com/spell-correct.html>`__: blog post on setting up a simple spell-checking algorithm.
* `persiannlp/persian-raw-text <https://github.com/persiannlp/persian-raw-text>`__: contains a large amount of raw Persian text, including several Persian corpora. The VOA corpus from this repository was used to create the word frequency list.
