:Name: thefuzz
:Version: 0.22.1
:Home page: https://github.com/seatgeek/thefuzz
:Summary: Fuzzy string matching in python
:Upload time: 2024-01-19 19:18:23
:Author: Adam Cohen
:Requires Python: >=3.8
:License: MIT
:Requirements: rapidfuzz, pycodestyle, hypothesis, pytest, docutils, Pygments, wheel, setuptools, gitchangelog, restructuredtext_lint

.. image:: https://github.com/seatgeek/thefuzz/actions/workflows/ci.yml/badge.svg
    :target: https://github.com/seatgeek/thefuzz

TheFuzz
=======

Fuzzy string matching like a boss. It uses `Levenshtein Distance <https://en.wikipedia.org/wiki/Levenshtein_distance>`_ to calculate the differences between sequences in a simple-to-use package.
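
For readers unfamiliar with the metric, the following is a minimal pure-Python sketch of Levenshtein distance. It is only an illustration: the function name ``levenshtein`` is hypothetical, and thefuzz itself relies on rapidfuzz rather than on code like this.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the prefix of `a` processed so far
    # and b[:j]. Before any character of `a` is read, that is j insertions.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

The sketch keeps only two rows of the dynamic-programming table, so memory use grows with the length of the second string rather than with the product of both lengths.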

Requirements
============

-  Python 3.8 or higher
-  `rapidfuzz <https://github.com/maxbachmann/RapidFuzz/>`_

For testing
~~~~~~~~~~~
-  pycodestyle
-  hypothesis
-  pytest

Installation
============

Using pip via PyPI

.. code:: bash

    pip install thefuzz


Using pip via GitHub

.. code:: bash

    pip install git+https://github.com/seatgeek/thefuzz.git@0.19.0#egg=thefuzz

Adding to your ``requirements.txt`` file (run ``pip install -r requirements.txt`` afterwards)

.. code:: bash

    git+ssh://git@github.com/seatgeek/thefuzz.git@0.19.0#egg=thefuzz

Manually via Git

.. code:: bash

    git clone https://github.com/seatgeek/thefuzz.git thefuzz
    cd thefuzz
    pip install .


Usage
=====

.. code:: python

    >>> from thefuzz import fuzz
    >>> from thefuzz import process

Simple Ratio
~~~~~~~~~~~~

.. code:: python

    >>> fuzz.ratio("this is a test", "this is a test!")
        97
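
To give a feel for where a score like 97 comes from, here is a rough standard-library approximation. It assumes a ratio of the form ``2 * M / T`` (matching characters over combined length), which is what ``difflib.SequenceMatcher`` computes. The name ``simple_ratio`` is hypothetical, and thefuzz delegates the real computation to rapidfuzz, so results may differ for other inputs.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Similarity of two strings as an integer from 0 to 100."""
    # SequenceMatcher.ratio() returns 2 * M / T, where M is the number of
    # matching characters and T is the combined length of both strings.
    return round(100 * SequenceMatcher(None, a, b).ratio())


# 14 matching characters out of 14 + 15 = 29 in total: 28 / 29 is about 0.966.
print(simple_ratio("this is a test", "this is a test!"))  # 97
```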

Partial Ratio
~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.partial_ratio("this is a test", "this is a test!")
        100
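
One way to think about a partial ratio is "the best score the shorter string achieves against any equally long slice of the longer one". The sketch below implements that idea with the standard library. ``partial_ratio_sketch`` is a hypothetical name and a simplification, not the algorithm thefuzz or rapidfuzz actually runs.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(a: str, b: str) -> int:
    """Best 0-100 similarity of the shorter string against windows of the longer."""
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0  # arbitrary choice for empty input in this sketch
    window = len(shorter)
    best = 0.0
    # Slide a window of len(shorter) across the longer string.
    for start in range(len(longer) - window + 1):
        candidate = longer[start:start + window]
        best = max(best, SequenceMatcher(None, shorter, candidate).ratio())
    return round(100 * best)


# The first 14 characters of the longer string equal the shorter one exactly.
print(partial_ratio_sketch("this is a test", "this is a test!"))  # 100
```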

Token Sort Ratio
~~~~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
        91
    >>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
        100
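
The jump from 91 to 100 can be explained by sorting the words before comparing, so that word order stops mattering. The sketch below illustrates that idea with the standard library. The helper names are hypothetical, and the preprocessing (lowercasing and splitting on whitespace) is an assumption that is simpler than what thefuzz applies.

```python
from difflib import SequenceMatcher


def sort_tokens(text: str) -> str:
    """Lowercase the text, split it on whitespace, and rejoin the sorted words."""
    return " ".join(sorted(text.lower().split()))


def token_sort_ratio_sketch(a: str, b: str) -> int:
    """Compare two strings after sorting their words (0 to 100)."""
    matcher = SequenceMatcher(None, sort_tokens(a), sort_tokens(b))
    return round(100 * matcher.ratio())


# Both inputs become "a bear fuzzy was wuzzy" once their words are sorted.
print(token_sort_ratio_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
```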

Token Set Ratio
~~~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
        84
    >>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
        100
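
A token set comparison ignores repeated words as well as word order. One common way to build such a score is to compare the shared words against each string's shared-plus-unique words and keep the best result. The sketch below follows that approach. Treat it as an assumption about the general technique rather than a description of thefuzz internals; all names in it are hypothetical.

```python
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> int:
    """Plain 0-100 similarity used as the building block."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_ratio_sketch(a: str, b: str) -> int:
    """Compare two strings as sets of words (0 to 100)."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    shared = " ".join(sorted(tokens_a & tokens_b))
    only_a = " ".join(sorted(tokens_a - tokens_b))
    only_b = " ".join(sorted(tokens_b - tokens_a))
    # Shared words first, followed by the words unique to each side.
    combined_a = f"{shared} {only_a}".strip()
    combined_b = f"{shared} {only_b}".strip()
    return max(
        _ratio(shared, combined_a),
        _ratio(shared, combined_b),
        _ratio(combined_a, combined_b),
    )


# The duplicate "fuzzy" disappears once each string is reduced to a set of words.
print(token_set_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100
```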

Partial Token Sort Ratio
~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.token_sort_ratio("fuzzy was a bear", "wuzzy fuzzy was a bear")
        84
    >>> fuzz.partial_token_sort_ratio("fuzzy was a bear", "wuzzy fuzzy was a bear")
        100

Process
~~~~~~~

.. code:: python

    >>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
    >>> process.extract("new york jets", choices, limit=2)
        [('New York Jets', 100), ('New York Giants', 78)]
    >>> process.extractOne("cowboys", choices)
        ('Dallas Cowboys', 90)

You can also pass a ``scorer`` argument to ``extractOne`` to choose how candidates are scored. A typical use case is matching file paths (``songs`` below is assumed to be a list of path strings defined elsewhere):

.. code:: python

    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
        ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
        ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)
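
Conceptually, extraction with a pluggable scorer amounts to scoring every choice and keeping the best ones. The sketch below shows that pattern using only the standard library. ``extract``, ``extract_one`` and ``default_scorer`` are hypothetical stand-ins, and the scores they produce are not the ones thefuzz would return.

```python
from difflib import SequenceMatcher


def default_scorer(query: str, choice: str) -> int:
    """Case-insensitive similarity from 0 to 100 (standard-library stand-in)."""
    matcher = SequenceMatcher(None, query.lower(), choice.lower())
    return round(100 * matcher.ratio())


def extract(query, choices, scorer=default_scorer, limit=5):
    """Return up to `limit` (choice, score) pairs, best score first."""
    scored = [(choice, scorer(query, choice)) for choice in choices]
    # sorted() is stable, so tied choices keep their original order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


def extract_one(query, choices, scorer=default_scorer):
    """Return the single best (choice, score) pair, or None without choices."""
    best = extract(query, choices, scorer=scorer, limit=1)
    return best[0] if best else None


teams = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract("new york jets", teams, limit=2))
print(extract_one("cowboys", teams))
```

Because the scorer is an ordinary function of two strings, swapping in a different comparison only requires passing another callable as ``scorer``.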

.. |Build Status| image:: https://github.com/seatgeek/thefuzz/actions/workflows/ci.yml/badge.svg
   :target: https://github.com/seatgeek/thefuzz

            
