Persian SpellChecker
===============================================================================

:Name: fa-spellchecker
:Version: 0.3.1
:Summary: Pure Python Spell Checking
:Author: Ashkan Feyzollahi
:Upload time: 2025-01-06 06:49:23
:Requires Python: >=3.7
:License: MIT
:Keywords: python, spelling, natural language processing, nlp, typo, checker, persian
:Homepage: https://github.com/AshkanFeyzollahi/fa-spellchecker

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
    :target: https://opensource.org/licenses/MIT/
    :alt: License
.. image:: https://img.shields.io/github/v/release/AshkanFeyzollahi/fa-spellchecker
    :target: https://github.com/AshkanFeyzollahi/fa-spellchecker/releases/
    :alt: GitHub Release
.. image:: https://img.shields.io/pypi/v/fa-spellchecker
    :target: https://pypi.org/project/fa-spellchecker/
    :alt: PyPI - Version
.. image:: https://img.shields.io/pypi/dm/fa-spellchecker
    :target: https://pypi.org/project/fa-spellchecker/
    :alt: PyPI - Downloads
.. image:: https://img.shields.io/readthedocs/fa-spellchecker
    :target: https://fa-spellchecker.readthedocs.io/en/latest/
    :alt: Read the Docs

Pure Python Persian spell checking, based on `Peter Norvig's blog post <https://norvig.com/spell-correct.html>`__ on setting up a simple spell checking algorithm and inspired by `pyspellchecker <https://github.com/barrust/pyspellchecker>`__.

As described in **pyspellchecker**, it uses a Levenshtein distance algorithm to find permutations within an edit distance of 2 from the original word. It then compares all permutations (insertions, deletions, replacements, and transpositions) to known words in a word frequency list. Words that appear more often in the frequency list are more likely to be the correct result.

**fa-spellchecker** supports **only** the **Persian** language and requires ``Python>=3.7``. For information on how the Persian dictionary was created and how it can be updated and improved, see the **Dictionary Creation and Updating** section of this readme.
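
The approach described above can be sketched in a few lines, following the structure of Peter Norvig's post. This is an illustration only, not the code used inside **fa-spellchecker**: the alphabet string, the helper names (``edits1``, ``edits2``, ``correction``) and the toy frequency table are all assumptions made for the example.

.. code:: python

    from collections import Counter

    # Assumed alphabet for this sketch; the real package may use a different set.
    PERSIAN_LETTERS = "آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی"


    def edits1(word):
        """Return every string that is one edit away from ``word``."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [left + right[1:] for left, right in splits if right]
        transposes = [
            left + right[1] + right[0] + right[2:]
            for left, right in splits
            if len(right) > 1
        ]
        replaces = [
            left + letter + right[1:]
            for left, right in splits
            if right
            for letter in PERSIAN_LETTERS
        ]
        inserts = [left + letter + right for left, right in splits for letter in PERSIAN_LETTERS]
        return set(deletes + transposes + replaces + inserts)


    def edits2(word):
        """Return every string that is two edits away from ``word``."""
        return {second for first in edits1(word) for second in edits1(first)}


    def correction(word, frequencies):
        """Pick the most frequent known word within two edits of ``word``."""

        def known(words):
            return {w for w in words if w in frequencies}

        # Prefer the word itself, then one-edit matches, then two-edit matches.
        candidates = (
            known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
        )
        return max(candidates, key=lambda w: frequencies.get(w, 0))


    # Toy frequency table (hypothetical counts, for demonstration only).
    toy_frequencies = Counter({"صابون": 5, "کتاب": 3})
    print(correction("سابون", toy_frequencies))  # صابون

Candidates are tried in order of edit distance, so a known word one edit away always wins over a more frequent word two edits away; frequency only breaks ties within the same distance.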

Installation
-------------------------------------------------------------------------------

To start using **fa-spellchecker** in your Python 3 projects, install it in one of the following ways:

    1. Install the package with the Python package manager **pip**:

    .. code:: bash

        pip install fa-spellchecker

    2. Build the package from source (this assumes the ``build`` package is already installed):

    .. code:: bash

        git clone https://github.com/AshkanFeyzollahi/fa-spellchecker.git
        cd fa-spellchecker
        python -m build


Quickstart
-------------------------------------------------------------------------------

To get started with **fa-spellchecker**, install it first; if it is not installed yet, see the **Installation** section of this readme. Here are some examples:

    1. A simple Persian word correction example:

    .. code:: python

        # Import SpellChecker object from faspellchecker
        from faspellchecker import SpellChecker

        # Initialize a faspellchecker.SpellChecker instance
        spellchecker = SpellChecker()

        # Correct the Persian misspelled word
        print(spellchecker.correction('سابون')) # 'صابون'

    2. A more advanced example that corrects the Persian words in a sentence with the help of `hazm <https://github.com/roshan-research/hazm>`__:

    .. code:: python

        # Import dependencies
        from hazm import word_tokenize

        from faspellchecker import SpellChecker
        from faspellchecker.utils import ignore_non_persian_words

        # Define a sentence of Persian words
        a_persian_sentence = "من به پارک رفتم و در آنجا با دوشت هایم بازی کردم"

        # Tokenize the sentence into a list of words
        tokenized_sentence = word_tokenize(a_persian_sentence)

        # Ignore the non-Persian words (in this case there are none according to
        # the function `is_word_persian`, so this call returns the given list
        # unchanged)
        tokenized_sentence = ignore_non_persian_words(tokenized_sentence)

        # Initialize a faspellchecker.SpellChecker instance
        spellchecker = SpellChecker()

        # Find all misspelled words
        for misspelled_word in spellchecker.unknown(tokenized_sentence):
            # And display the candidate corrections for each misspelled word
            print(spellchecker.candidates(misspelled_word))
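
The loop above only prints candidates. A possible next step is to rebuild the token list with the top suggestion substituted for each unknown word. The helper below is a hypothetical sketch, not part of **fa-spellchecker**: the name ``correct_tokens`` is made up, and it assumes that ``unknown`` returns an iterable of unrecognized words and that ``correction`` returns either a string or ``None``.

.. code:: python

    def correct_tokens(tokens, checker):
        """Return a copy of ``tokens`` with unknown words replaced by suggestions.

        ``checker`` is any object that offers ``unknown(words)`` and
        ``correction(word)``, such as the ``SpellChecker`` instance shown above.
        """
        unknown_words = set(checker.unknown(tokens))
        corrected = []
        for token in tokens:
            if token in unknown_words:
                suggestion = checker.correction(token)
                # Keep the original token when no suggestion is available.
                corrected.append(suggestion if suggestion else token)
            else:
                corrected.append(token)
        return corrected

With the objects from the example above, it could be called as ``correct_tokens(tokenized_sentence, spellchecker)``.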

For more information on how to use this package, see the `online documentation <https://fa-spellchecker.readthedocs.io/en/latest/>`__.

Dictionary Creation and Updating
-------------------------------------------------------------------------------

I have provided a script that, given text files of words and sentences (in this case the txt files in the `resources <resources/>`__ folder), generates a *Persian* word frequency list from the words found in the text.

Any new files added to `resources <resources/>`__ are picked up by ``scripts/build_dictionary.py`` and used as sources for the Persian dictionary file, which is then used by ``faspellchecker``.

The easiest way to build the Persian dictionary files with ``scripts/build_dictionary.py``:
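
To illustrate what a word frequency list is, here is a minimal sketch that counts Persian words in a few strings. It is not the logic of ``scripts/build_dictionary.py``: the regular expression, the ``count_words`` name and the sample sentences are assumptions for the example, and the pattern is a simplification that ignores details such as the zero-width non-joiner used inside some Persian words.

.. code:: python

    import re
    from collections import Counter

    # Assumed definition of a word: a run of Arabic-script letters plus the
    # Persian-specific letters (pe, che, zhe, keheh, gaf, Farsi yeh).
    PERSIAN_WORD = re.compile(
        r"[\u0621-\u063A\u0641-\u064A\u067E\u0686\u0698\u06A9\u06AF\u06CC]+"
    )


    def count_words(texts):
        """Count how often each Persian word occurs across ``texts``."""
        counts = Counter()
        for text in texts:
            counts.update(PERSIAN_WORD.findall(text))
        return counts


    # Hypothetical sample input standing in for the contents of resource files.
    sample_texts = ["من به پارک رفتم", "من کتاب خواندم و به خانه رفتم"]
    frequencies = count_words(sample_texts)
    print(frequencies.most_common(3))

A table like ``frequencies`` is what the spell checker consults when it ranks candidate corrections: the higher the count, the more likely the word is the intended one.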

.. code:: bash

    git clone https://github.com/AshkanFeyzollahi/fa-spellchecker.git
    cd fa-spellchecker
    python scripts/build_dictionary.py

Any help in updating and maintaining the dictionary would be greatly appreciated. To contribute, start a `discussion <https://github.com/AshkanFeyzollahi/fa-spellchecker/discussions>`__ on GitHub or open a pull request that updates the include and exclude files.

Credits
-------------------------------------------------------------------------------

* `Peter Norvig <https://norvig.com/spell-correct.html>`__ for his blog post on setting up a simple spell checking algorithm.
* `persiannlp/persian-raw-text <https://github.com/persiannlp/persian-raw-text>`__, which contains a large amount of Persian text, including Persian corpora. The VOA corpus from this repository was used to create the word frequency list.

            
