:Package: fa-spellchecker
:Version: 0.1.2
:Summary: Pure Python Spell Checking
:Uploaded: 2024-12-22 10:03:33
:License: MIT
:Requires-Python: >=3.7
:Keywords: python, spelling, natural language processing, nlp, typo, checker, persian

Persian SpellChecker
===============================================================================

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
    :target: https://opensource.org/licenses/MIT/
    :alt: License
.. image:: https://img.shields.io/github/v/release/AshkanFeyzollahi/fa-spellchecker
    :target: https://github.com/AshkanFeyzollahi/fa-spellchecker/releases/
    :alt: GitHub Release
.. image:: https://img.shields.io/pypi/v/fa-spellchecker
    :target: https://pypi.org/project/fa-spellchecker/
    :alt: PyPI - Version
.. image:: https://img.shields.io/pypi/dm/fa-spellchecker
    :target: https://pypi.org/project/fa-spellchecker/
    :alt: PyPI - Downloads
.. image:: https://img.shields.io/readthedocs/fa-spellchecker
    :target: https://fa-spellchecker.readthedocs.io/en/latest/
    :alt: Read the Docs

**fa-spellchecker** is a pure Python Persian spell checker based on `Peter Norvig's blog post <https://norvig.com/spell-correct.html>`__ on setting up a simple spell-checking algorithm, and inspired by `pyspellchecker <https://github.com/barrust/pyspellchecker>`__.

As described in **pyspellchecker**, it uses a Levenshtein distance algorithm to find permutations within an edit distance of 2 of the original word. It then compares all permutations (insertions, deletions, replacements, and transpositions) against the known words in a word frequency list; words that occur more often in the frequency list are more likely to be the correct result.
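
The candidate generation and ranking described above can be sketched in a few lines of Python, following the structure of Norvig's post. This is a minimal illustration, not **fa-spellchecker**'s internal code: the letter set ``PERSIAN_LETTERS`` and the toy ``frequencies`` dictionary passed to the functions are assumptions made for the example.

.. code:: python

    # A minimal Norvig-style corrector (illustrative sketch, not the package's code).

    # Assumed alphabet: common Persian letters; the real package may use another set.
    PERSIAN_LETTERS = "آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی"


    def edits1(word):
        """Return every string one edit away from ``word``."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [left + right[1:] for left, right in splits if right]
        transposes = [left + right[1] + right[0] + right[2:]
                      for left, right in splits if len(right) > 1]
        replaces = [left + letter + right[1:]
                    for left, right in splits if right for letter in PERSIAN_LETTERS]
        inserts = [left + letter + right
                   for left, right in splits for letter in PERSIAN_LETTERS]
        return set(deletes + transposes + replaces + inserts)


    def edits2(word):
        """Return every string two edits away from ``word``."""
        return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


    def known(words, frequencies):
        """Keep only the words that appear in the frequency list."""
        return {w for w in words if w in frequencies}


    def candidates(word, frequencies):
        """Prefer the word itself, then known words 1 edit away, then 2 edits away,
        and otherwise return the word unchanged."""
        return (known([word], frequencies)
                or known(edits1(word), frequencies)
                or known(edits2(word), frequencies)
                or {word})


    def correction(word, frequencies):
        """Pick the candidate that occurs most often in the frequency list."""
        return max(candidates(word, frequencies), key=lambda w: frequencies.get(w, 0))

For example, with a toy frequency list ``{"صابون": 5, "دوست": 7}``, ``correction("سابون", ...)`` returns ``"صابون"`` (one replacement away) and ``correction("دوشت", ...)`` returns ``"دوست"``.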

**fa-spellchecker** supports **only** the **Persian** language and requires ``Python>=3.7``. For information on how the Persian vocabulary was created and how it can be updated and improved, see the **Vocabulary Creation and Updating** section below.

Installation
-------------------------------------------------------------------------------

**fa-spellchecker** can be installed in your Python 3 projects in either of the following ways:

1. With the Python package manager **pip**:

   .. code:: bash

       pip install fa-spellchecker

2. By building the package from source and installing the resulting wheel:

   .. code:: bash

       git clone https://github.com/AshkanFeyzollahi/fa-spellchecker.git
       cd fa-spellchecker
       python -m pip install build        # the PyPA "build" frontend used below
       python -m build                    # writes an sdist and a wheel into dist/
       python -m pip install dist/*.whl


Quickstart
-------------------------------------------------------------------------------

Before you start, make sure **fa-spellchecker** is installed (see the **Installation** section above). Here are some examples of using **fa-spellchecker**:

1. A simple Persian word correction example:

   .. code:: python

       # Import the SpellChecker class from faspellchecker
       from faspellchecker import SpellChecker

       # Initialize a faspellchecker.SpellChecker instance
       spellchecker = SpellChecker()

       # Correct a misspelled Persian word
       print(spellchecker.correction('سابون'))  # 'صابون'

2. Finding the misspelled words in a sentence and suggesting corrections, using `hazm <https://github.com/roshan-research/hazm>`__ for tokenization (install it separately with ``pip install hazm``):

   .. code:: python

       # hazm provides the Persian word tokenizer
       from hazm import word_tokenize

       from faspellchecker import SpellChecker

       # "I went to the park and played with my friends there",
       # with 'دوشت' misspelled (the correct word is 'دوست')
       a_persian_sentence = "من به پارک رفتم و در آنجا با دوشت هایم بازی کردم"

       # Tokenize the sentence into a list of words
       tokenized_sentence = word_tokenize(a_persian_sentence)

       # Initialize a faspellchecker.SpellChecker instance
       spellchecker = SpellChecker()

       # Find every word that is not in the vocabulary
       for misspelled_word in spellchecker.unknown(tokenized_sentence):
           # Display the misspelled word and its candidate corrections
           print(misspelled_word, spellchecker.candidates(misspelled_word))

For more information on how to use this package, see the `online documentation <https://fa-spellchecker.readthedocs.io/en/latest/>`__.

Vocabulary Creation and Updating
-------------------------------------------------------------------------------

I have provided a script, ``scripts/build_vocabulary.py``, that reads text files of words and sentences (in this case, the ``.txt`` files in the `resources <resources/>`__ folder) and generates a *Persian* word frequency list from the words found in them.

Adding new files to `resources <resources/>`__ makes ``scripts/build_vocabulary.py`` include them when it builds the Persian vocabulary file, which ``faspellchecker`` then uses.

The easiest way to build the Persian vocabulary file with ``scripts/build_vocabulary.py`` is:

.. code:: bash

    git clone https://github.com/AshkanFeyzollahi/fa-spellchecker.git
    cd fa-spellchecker
    python scripts/build_vocabulary.py
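
As a rough illustration of what building a word frequency list involves, the sketch below counts the Persian words in every ``.txt`` file of a folder with ``collections.Counter``. It is not the actual ``scripts/build_vocabulary.py``: the tokenizing regular expression, the function names ``count_words`` and ``build_frequency_list``, and the choice to treat the zero-width non-joiner as a separator (so a compound word is split in two) are all assumptions made for this example.

.. code:: python

    import re
    from collections import Counter
    from pathlib import Path

    # Assumed token pattern: runs of Arabic-script letters used in Persian
    # (U+0621-U+063A, U+0641-U+064A, plus پ چ ژ ک گ ی). Digits, punctuation such
    # as "،" and the zero-width non-joiner (U+200C) all act as word separators.
    PERSIAN_WORD = re.compile(
        "[\u0621-\u063a\u0641-\u064a\u067e\u0686\u0698\u06a9\u06af\u06cc]+"
    )


    def count_words(text):
        """Count how often each Persian word occurs in ``text``."""
        return Counter(PERSIAN_WORD.findall(text))


    def build_frequency_list(folder):
        """Merge the word counts of every .txt file in ``folder``."""
        counts = Counter()
        for path in sorted(Path(folder).glob("*.txt")):
            counts.update(count_words(path.read_text(encoding="utf-8")))
        return counts

Calling ``build_frequency_list("resources").most_common(10)`` would then list the ten most frequent words; how the real script filters and stores its vocabulary file is not covered by this sketch.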

Help with updating and maintaining the vocabulary is very welcome. To contribute, start a `discussion <https://github.com/AshkanFeyzollahi/fa-spellchecker/discussions>`__ on GitHub, or open a pull request that updates the include and exclude files.

Credits
-------------------------------------------------------------------------------

* `Peter Norvig's blog post <https://norvig.com/spell-correct.html>`__ on setting up a simple spell-checking algorithm.
* `persiannlp/persian-raw-text <https://github.com/persiannlp/persian-raw-text>`__: a large collection of raw Persian text corpora. The VOA corpus from this repository was used to create the word frequency list.

            

Package metadata
-------------------------------------------------------------------------------

:Author: Ashkan Feyzollahi <ashkanfeyzollahi@gmail.com>
:Homepage: https://github.com/AshkanFeyzollahi/fa-spellchecker
:Documentation: https://fa-spellchecker.readthedocs.io/en/latest/
:Bug tracker: https://github.com/AshkanFeyzollahi/fa-spellchecker/issues
:Wheel: `fa_spellchecker-0.1.2-py3-none-any.whl <https://files.pythonhosted.org/packages/56/15/05fcc1a6be99d9b35d81e9df223e9d6cb839bdc9558b5786027a4f82cdb2/fa_spellchecker-0.1.2-py3-none-any.whl>`__ (691,816 bytes, uploaded 2024-12-22 10:03:30 UTC; SHA-256 ``06186b22fc69c6662f92966f4d4efb96d062507b44e216abcbe370cdff3f2db8``)
:Source distribution: `fa_spellchecker-0.1.2.tar.gz <https://files.pythonhosted.org/packages/6d/96/db42b9d9712352e5043b318917a47c924139834f58d08ac7ae671844d47e/fa_spellchecker-0.1.2.tar.gz>`__ (691,172 bytes, uploaded 2024-12-22 10:03:33 UTC; SHA-256 ``810afaab301804f59c25d26ede2e8bfb05171f47483cc78751b12db6041d4969``)