.. image:: https://travis-ci.org/seatgeek/fuzzywuzzy.svg?branch=master
    :target: https://travis-ci.org/seatgeek/fuzzywuzzy

FuzzyWuzzy
==========

Fuzzy string matching like a boss. It uses `Levenshtein Distance <https://en.wikipedia.org/wiki/Levenshtein_distance>`_ to calculate the differences between sequences in a simple-to-use package.
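
For readers new to the metric, here is a minimal pure-Python sketch of Levenshtein distance: the smallest number of single-character insertions, deletions, and substitutions needed to turn one string into another. The function name ``levenshtein`` is made up for this illustration and is not part of FuzzyWuzzy's API; the library reports 0-100 similarity scores rather than raw edit counts.

.. code:: python

    def levenshtein(a, b):
        """Return the minimum number of single-character edits turning a into b."""
        if len(a) < len(b):
            a, b = b, a  # keep the inner row as short as possible
        # previous[j] holds the distance between the processed prefix of a and b[:j]
        previous = list(range(len(b) + 1))
        for i, ch_a in enumerate(a, start=1):
            current = [i]  # distance from a[:i] to the empty string
            for j, ch_b in enumerate(b, start=1):
                cost = 0 if ch_a == ch_b else 1
                current.append(min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution (or match)
                ))
            previous = current
        return previous[-1]

    print(levenshtein("this is a test", "this is a test!"))  # 1
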

Requirements
============

- Python 2.7 or higher
- difflib
- `python-Levenshtein <https://github.com/ztane/python-Levenshtein/>`_ (optional; provides a 4-10x speedup in string
  matching, though it may produce `differing results for certain cases <https://github.com/seatgeek/fuzzywuzzy/issues/128>`_)
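
To check whether the optional speedup is available in the current environment, something like the following should work. It assumes the package is importable under the module name ``Levenshtein``; ``has_speedup`` is a hypothetical helper, not a FuzzyWuzzy function.

.. code:: python

    import importlib.util

    def has_speedup():
        """Return True if the optional ``Levenshtein`` module can be imported."""
        # find_spec looks the module up without actually importing it
        return importlib.util.find_spec("Levenshtein") is not None

    print("python-Levenshtein available:", has_speedup())
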

For testing
~~~~~~~~~~~

- pycodestyle
- hypothesis
- pytest

Installation
============

Using pip via PyPI:

.. code:: bash

    pip install fuzzywuzzy

To install ``python-Levenshtein`` as well:

.. code:: bash

    pip install fuzzywuzzy[speedup]

Using pip via GitHub:

.. code:: bash

    pip install git+https://github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy

Adding to your ``requirements.txt`` file (run ``pip install -r requirements.txt`` afterwards):

.. code:: bash

    git+ssh://git@github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy

Manually via Git:

.. code:: bash

    git clone https://github.com/seatgeek/fuzzywuzzy.git fuzzywuzzy
    cd fuzzywuzzy
    python setup.py install

Usage
=====

.. code:: python

    >>> from fuzzywuzzy import fuzz
    >>> from fuzzywuzzy import process

Simple Ratio
~~~~~~~~~~~~

.. code:: python

    >>> fuzz.ratio("this is a test", "this is a test!")
    97
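
As a rough sketch of where a score like 97 comes from, the standard library's ``difflib.SequenceMatcher`` can produce a comparable 0-100 ratio. ``simple_ratio`` below is an illustrative helper, not FuzzyWuzzy's implementation, and scores may differ when ``python-Levenshtein`` is installed.

.. code:: python

    from difflib import SequenceMatcher

    def simple_ratio(s1, s2):
        """Similarity of two strings as an integer from 0 to 100."""
        # SequenceMatcher.ratio() returns 2 * M / T, where M is the number of
        # matched characters and T is the combined length of both strings.
        return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))

    # 14 matched characters out of 14 + 15 in total: 2 * 14 / 29 = 0.9655...
    print(simple_ratio("this is a test", "this is a test!"))  # 97
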

Partial Ratio
~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.partial_ratio("this is a test", "this is a test!")
    100
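
The idea behind a partial ratio is to score the shorter string against the best-matching part of the longer one. The sketch below does this with a brute-force sliding window; ``partial_ratio_sketch`` is a made-up name, and the library may pick candidate windows differently, so treat this as an illustration of the concept only.

.. code:: python

    from difflib import SequenceMatcher

    def partial_ratio_sketch(s1, s2):
        """Best 0-100 score of the shorter string against any same-length window."""
        shorter, longer = sorted((s1, s2), key=len)
        if not shorter:
            return 0
        window = len(shorter)
        best = 0.0
        # Slide a window of the shorter string's length across the longer string
        for start in range(len(longer) - window + 1):
            candidate = longer[start:start + window]
            best = max(best, SequenceMatcher(None, shorter, candidate).ratio())
        return int(round(100 * best))

    # The first 14 characters of the longer string match exactly
    print(partial_ratio_sketch("this is a test", "this is a test!"))  # 100
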

Token Sort Ratio
~~~~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    91
    >>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    100
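
Token sorting makes the comparison insensitive to word order: each string is split into words, the words are sorted, and the rejoined strings are compared. The following is a simplified sketch under that assumption; ``token_sort_ratio_sketch`` is a hypothetical name, and its ASCII-only normalization is cruder than what the library does.

.. code:: python

    import re
    from difflib import SequenceMatcher

    def token_sort_ratio_sketch(s1, s2):
        """0-100 similarity after lowercasing and sorting the words of each string."""
        def normalize(s):
            # Replace anything that is not an ASCII letter or digit with a space,
            # then sort the remaining words alphabetically.
            tokens = re.sub(r"[^0-9a-z]+", " ", s.lower()).split()
            return " ".join(sorted(tokens))

        matcher = SequenceMatcher(None, normalize(s1), normalize(s2))
        return int(round(100 * matcher.ratio()))

    # Both strings normalize to "a bear fuzzy was wuzzy"
    print(token_sort_ratio_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
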

Token Set Ratio
~~~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    84
    >>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    100
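
A token set comparison ignores repeated words as well as word order. One way to sketch it is to compare the shared words against each string's shared-plus-unique words and keep the best score. The helper names below are invented for this example, and the exact steps inside the library may differ.

.. code:: python

    import re
    from difflib import SequenceMatcher

    def _ratio(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))

    def _tokens(s):
        # Lowercase, drop non-alphanumeric ASCII characters, keep unique words
        return set(re.sub(r"[^0-9a-z]+", " ", s.lower()).split())

    def token_set_ratio_sketch(s1, s2):
        """Best 0-100 score among comparisons built from shared and unique words."""
        t1, t2 = _tokens(s1), _tokens(s2)
        if not t1 or not t2:
            return 0
        common = " ".join(sorted(t1 & t2))
        combined1 = (common + " " + " ".join(sorted(t1 - t2))).strip()
        combined2 = (common + " " + " ".join(sorted(t2 - t1))).strip()
        return max(
            _ratio(common, combined1),
            _ratio(common, combined2),
            _ratio(combined1, combined2),
        )

    # Both strings reduce to the same word set: {"a", "bear", "fuzzy", "was"}
    print(token_set_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100
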

Process
~~~~~~~

.. code:: python

    >>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
    >>> process.extract("new york jets", choices, limit=2)
    [('New York Jets', 100), ('New York Giants', 78)]
    >>> process.extractOne("cowboys", choices)
    ('Dallas Cowboys', 90)

You can also pass additional parameters to the ``extractOne`` method to make it use a specific scorer. A typical use case is matching file paths:

.. code:: python

    >>> # songs is a list of file paths, defined elsewhere
    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
    ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
    ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)
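
Conceptually, a scorer is just a function that takes two strings and returns a number, and picking the best match means scoring every choice and keeping the highest. The sketch below shows that pattern with standard-library code only; ``extract_one_sketch``, ``char_ratio``, and ``shared_word_score`` are hypothetical names for illustration, not FuzzyWuzzy functions.

.. code:: python

    from difflib import SequenceMatcher

    def char_ratio(a, b):
        """Character-level similarity from 0 to 100, ignoring case."""
        return int(round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()))

    def shared_word_score(a, b):
        """Share of distinct words the two strings have in common, from 0 to 100."""
        words_a, words_b = set(a.lower().split()), set(b.lower().split())
        if not words_a or not words_b:
            return 0
        return int(round(100 * len(words_a & words_b) / len(words_a | words_b)))

    def extract_one_sketch(query, choices, scorer=char_ratio):
        """Return (choice, score) for the best-scoring choice, or None if empty."""
        scored = [(choice, scorer(query, choice)) for choice in choices]
        return max(scored, key=lambda pair: pair[1], default=None)

    choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
    print(extract_one_sketch("new york jets", choices))
    print(extract_one_sketch("jets new york", choices, scorer=shared_word_score))

Swapping the scorer changes what "closest" means without touching the selection logic, which is why the file path example above returns a different song with ``fuzz.token_sort_ratio``.
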

.. |Build Status| image:: https://api.travis-ci.org/seatgeek/fuzzywuzzy.png?branch=master
    :target: https://travis-ci.org/seatgeek/fuzzywuzzy

Known Ports
===========

FuzzyWuzzy is being ported to other languages too! Here are a few ports we know about:

- Java: `xpresso's fuzzywuzzy implementation <https://github.com/WantedTechnologies/xpresso/wiki/Approximate-string-comparison-and-pattern-matching-in-Java>`_
- Java: `fuzzywuzzy (Java port) <https://github.com/xdrop/fuzzywuzzy>`_
- Rust: `fuzzyrusty (Rust port) <https://github.com/logannc/fuzzyrusty>`_
- JavaScript: `fuzzball.js (JavaScript port) <https://github.com/nol13/fuzzball.js>`_
- C++: `Tmplt/fuzzywuzzy <https://github.com/Tmplt/fuzzywuzzy>`_
- C#: `fuzzysharp (.NET port) <https://github.com/BoomTownRoi/BoomTown.FuzzySharp>`_
- Go: `go-fuzzywuzzy (Go port) <https://github.com/paul-mannino/go-fuzzywuzzy>`_
- Free Pascal: `FuzzyWuzzy.pas (Free Pascal port) <https://github.com/DavidMoraisFerreira/FuzzyWuzzy.pas>`_
- Kotlin multiplatform: `FuzzyWuzzy-Kotlin <https://github.com/willowtreeapps/fuzzywuzzy-kotlin>`_
- R: `fuzzywuzzyR (R port) <https://github.com/mlampros/fuzzywuzzyR>`_

Raw data
{
"_id": null,
"home_page": "https://github.com/seatgeek/fuzzywuzzy",
"name": "fuzzywuzzy",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "Adam Cohen",
"author_email": "adam@seatgeek.com",
"download_url": "https://files.pythonhosted.org/packages/11/4b/0a002eea91be6048a2b5d53c5f1b4dafd57ba2e36eea961d05086d7c28ce/fuzzywuzzy-0.18.0.tar.gz",
"platform": "",
"bugtrack_url": null,
"license": "GPLv2",
"summary": "Fuzzy string matching in python",
"version": "0.18.0",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "237450dba93f7226c7dfbdd04a1355c6",
"sha256": "928244b28db720d1e0ee7587acf660ea49d7e4c632569cad4f1cd7e68a5f0993"
},
"downloads": -1,
"filename": "fuzzywuzzy-0.18.0-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "237450dba93f7226c7dfbdd04a1355c6",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 18272,
"upload_time": "2020-02-13T21:06:25",
"upload_time_iso_8601": "2020-02-13T21:06:25.209912Z",
"url": "https://files.pythonhosted.org/packages/43/ff/74f23998ad2f93b945c0309f825be92e04e0348e062026998b5eefef4c33/fuzzywuzzy-0.18.0-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "29708593c35b1ca67c329f853d9abcd0",
"sha256": "45016e92264780e58972dca1b3d939ac864b78437422beecebb3095f8efd00e8"
},
"downloads": -1,
"filename": "fuzzywuzzy-0.18.0.tar.gz",
"has_sig": false,
"md5_digest": "29708593c35b1ca67c329f853d9abcd0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 28888,
"upload_time": "2020-02-13T21:06:27",
"upload_time_iso_8601": "2020-02-13T21:06:27.054783Z",
"url": "https://files.pythonhosted.org/packages/11/4b/0a002eea91be6048a2b5d53c5f1b4dafd57ba2e36eea961d05086d7c28ce/fuzzywuzzy-0.18.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2020-02-13 21:06:27",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "seatgeek",
"github_project": "fuzzywuzzy",
"travis_ci": true,
"coveralls": false,
"github_actions": false,
"tox": true,
"lcname": "fuzzywuzzy"
}