Persian SpellChecker
===============================================================================

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
   :target: https://opensource.org/licenses/MIT/
   :alt: License
.. image:: https://img.shields.io/github/v/release/AshkanFeyzollahi/fa-spellchecker
   :target: https://github.com/AshkanFeyzollahi/fa-spellchecker/releases/
   :alt: GitHub Release
.. image:: https://img.shields.io/pypi/v/fa-spellchecker
   :target: https://pypi.org/project/fa-spellchecker/
   :alt: PyPI - Version
.. image:: https://img.shields.io/pypi/dm/fa-spellchecker
   :target: https://pypi.org/project/fa-spellchecker/
   :alt: PyPI - Downloads
.. image:: https://img.shields.io/readthedocs/fa-spellchecker
   :target: https://fa-spellchecker.readthedocs.io/en/latest/
   :alt: Read the Docs

Pure Python Persian spell checking, based on `Peter Norvig's blog post <https://norvig.com/spell-correct.html>`__ on setting up a simple spell checking algorithm and inspired by `pyspellchecker <https://github.com/barrust/pyspellchecker>`__.

As described in **pyspellchecker**, it uses a Levenshtein distance algorithm to find permutations within an edit distance of 2 from the original word. It then compares all permutations (insertions, deletions, replacements, and transpositions) to known words in a word frequency list. Words that appear more often in the frequency list are more likely to be the correct result.

**fa-spellchecker** is made **ONLY** for **Persian** and requires `Python>=3.7`. For information on how the Persian vocabulary was created and how it can be updated and improved, see the **Vocabulary Creation and Updating** section of this readme.

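The edit-distance idea described above can be illustrated with a minimal sketch in the style of Peter Norvig's post. This is not the package's actual implementation: the names `WORD_FREQUENCY`, `PERSIAN_LETTERS`, `edits1`, `edits2`, `known` and `toy_correction` are hypothetical, and the tiny frequency table is made up purely for illustration.

```python
from collections import Counter

# Hypothetical toy frequency list; the real package ships its own vocabulary.
WORD_FREQUENCY = Counter({"صابون": 120, "سالون": 3, "پدر": 300})

# Letters used to generate insertions and replacements (an assumption of
# this sketch, not necessarily the alphabet the package uses).
PERSIAN_LETTERS = "آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی"


def edits1(word):
    """Return every string that is one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [
        left + letter + right[1:]
        for left, right in splits
        if right
        for letter in PERSIAN_LETTERS
    ]
    inserts = [left + letter + right for left, right in splits for letter in PERSIAN_LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """Return every string that is two edits away from `word`."""
    return {second for first in edits1(word) for second in edits1(first)}


def known(words):
    """Keep only the words that appear in the frequency list."""
    return {word for word in words if word in WORD_FREQUENCY}


def toy_correction(word):
    """Pick the most frequent known word, preferring smaller edit distances."""
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    return max(candidates, key=lambda candidate: WORD_FREQUENCY[candidate])


result = toy_correction("سابون")
```

In this toy table both `صابون` and `سالون` are one edit away from the misspelling, so the choice between them is made by frequency alone, which is the ranking step the paragraph above describes.
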
Installation
-------------------------------------------------------------------------------

Before using **fa-spellchecker** in your Python 3 projects, install it in one of the following ways:

1. Install the package with the Python package manager **pip**:

   .. code:: bash

      pip install fa-spellchecker

2. Build the package from source:

   .. code:: bash

      git clone https://github.com/AshkanFeyzollahi/fa-spellchecker.git
      cd fa-spellchecker
      python -m build

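To confirm that the installation succeeded, the installed distribution can be looked up from Python. The following is a small sketch using the standard-library `importlib.metadata` module, which is available from Python 3.8 onward; the helper name `installed_version` is hypothetical and not part of **fa-spellchecker**.

```python
# Requires Python 3.8+, where importlib.metadata is in the standard library.
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("fa-spellchecker")
    print(found if found else "fa-spellchecker is not installed")
```
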
Quickstart
-------------------------------------------------------------------------------

To get started with **fa-spellchecker**, install it first (see the **Installation** section of this readme if you have not done so already). Here are some examples:

1. A simple Persian word correction example:

   .. code:: python

      # Import SpellChecker object from faspellchecker
      from faspellchecker import SpellChecker

      # Initialize a faspellchecker.SpellChecker instance
      spellchecker = SpellChecker()

      # Correct the Persian misspelled word
      print(spellchecker.correction('سابون'))  # 'صابون'

2. A more advanced example that checks the Persian words in a sentence with the help of `hazm <https://github.com/roshan-research/hazm>`__:

   .. code:: python

      # Import dependencies
      from hazm import word_tokenize

      from faspellchecker import SpellChecker

      # Define a sentence of Persian words
      a_persian_sentence = "من به پارک رفتم و در آنجا با دوشت هایم بازی کردم"

      # Tokenize the sentence into a list of words
      tokenized_sentence = word_tokenize(a_persian_sentence)

      # Initialize a faspellchecker.SpellChecker instance
      spellchecker = SpellChecker()

      # Find all misspelled words
      for misspelled_word in spellchecker.unknown(tokenized_sentence):
          # And display the candidate corrections for the misspelled word
          print(spellchecker.candidates(misspelled_word))

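Building on the examples above, the sketch below replaces each unknown token with its suggested correction. It relies only on the `unknown` and `correction` methods shown in this readme, and makes two assumptions that should be checked against the documentation: that `unknown` returns the unknown tokens in their original form, and that `correction` may return nothing when no suggestion exists. `StubSpellChecker` and `correct_tokens` are hypothetical names; the stub only exists so the sketch runs without the package installed.

```python
class StubSpellChecker:
    """Hypothetical stand-in for faspellchecker.SpellChecker (illustration only)."""

    def __init__(self, corrections):
        # Maps a misspelled word to the correction this stub should suggest.
        self._corrections = corrections

    def unknown(self, words):
        # Treat every word that has an entry in the map as "unknown".
        return {word for word in words if word in self._corrections}

    def correction(self, word):
        return self._corrections.get(word)


def correct_tokens(tokens, spellchecker):
    """Return a copy of `tokens` with unknown words replaced by suggestions."""
    unknown_words = set(spellchecker.unknown(tokens))
    corrected = []
    for token in tokens:
        suggestion = spellchecker.correction(token) if token in unknown_words else None
        # Keep the original token when there is no suggestion.
        corrected.append(suggestion or token)
    return corrected


tokens = ["با", "دوشت", "بازی"]
checker = StubSpellChecker({"دوشت": "دوست"})
corrected = correct_tokens(tokens, checker)
```

With the real package, a `SpellChecker()` instance would be passed in place of the stub, and the tokens would come from a tokenizer such as `hazm.word_tokenize`.
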
For more information on how to use this package, see the `online documentation <https://fa-spellchecker.readthedocs.io/en/latest/>`__.

Vocabulary Creation and Updating
-------------------------------------------------------------------------------

The repository provides a script that, given text files of words and sentences (in this case the txt files in the `resources <resources/>`__ folder), generates a *Persian* word frequency list from the words found in the text.

New files added to `resources <resources/>`__ are picked up by `scripts/build_vocabulary.py` as additional input when it builds the Persian vocabulary file, which is then used by `faspellchecker`.

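Conceptually, a word frequency list is just a count of how often each word occurs in the source text. The sketch below shows that idea only; it is not the contents of `scripts/build_vocabulary.py`. The names `PERSIAN_WORD`, `count_words` and `count_words_in_folder` are hypothetical, and the regular expression is a simplifying assumption that treats any run of Arabic-script letters as a word, with no normalization and with the zero-width non-joiner acting as a separator.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: a word is a run of Arabic-script letters plus the extra
# Persian letters (pe, che, zhe, keheh, gaf, Farsi yeh). Punctuation,
# digits and diacritics act as separators in this sketch.
PERSIAN_WORD = re.compile(
    "[\u0621-\u063A\u0641-\u064A\u067E\u0686\u0698\u06A9\u06AF\u06CC]+"
)


def count_words(texts):
    """Count word occurrences across an iterable of text strings."""
    counts = Counter()
    for text in texts:
        counts.update(PERSIAN_WORD.findall(text))
    return counts


def count_words_in_folder(folder):
    """Count words in every .txt file directly inside `folder`."""
    paths = sorted(Path(folder).glob("*.txt"))
    return count_words(path.read_text(encoding="utf-8") for path in paths)


counts = count_words(["من صابون", "صابون، در"])
```

A real vocabulary builder would likely also normalize characters and filter rare or invalid words, which this sketch leaves out.
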
The easiest way to build the Persian vocabulary files is to run `scripts/build_vocabulary.py`:

.. code:: bash

   git clone https://github.com/AshkanFeyzollahi/fa-spellchecker.git
   cd fa-spellchecker
   python scripts/build_vocabulary.py

Any help in updating and maintaining the vocabulary would be greatly appreciated. To contribute, start a `discussion <https://github.com/AshkanFeyzollahi/fa-spellchecker/discussions>`__ on GitHub or open a pull request that updates the include and exclude files.

Credits
-------------------------------------------------------------------------------

* `Peter Norvig <https://norvig.com/spell-correct.html>`__'s blog post on setting up a simple spell checking algorithm.
* `persiannlp/persian-raw-text <https://github.com/persiannlp/persian-raw-text>`__ contains a large amount of Persian text, including Persian corpora. The VOA corpus was taken from this repository to create the word frequency list.