Jaro Winkler Distance
=====================

.. image:: https://travis-ci.org/nap/jaro-winkler-distance.svg?branch=master
   :target: https://travis-ci.org/nap/jaro-winkler-distance
.. image:: https://coveralls.io/repos/nap/jaro-winkler-distance/badge.svg?branch=master&service=github
   :target: https://coveralls.io/github/nap/jaro-winkler-distance?branch=master
.. image:: https://img.shields.io/github/license/nap/jaro-winkler-distance.svg
   :target: https://raw.githubusercontent.com/nap/jaro-winkler-distance/master/LICENSE
.. image:: https://img.shields.io/pypi/pyversions/pyjarowinkler.svg
   :target: https://pypi.python.org/pypi/pyjarowinkler

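Find the Jaro Winkler distance, a similarity score between two strings (1.0 for identical strings, 0.0 when no characters match).
The Jaro measure is a weighted sum of the percentage of matched characters in each string
and the percentage of matched characters that are not transposed. Winkler's modification increases
this measure for strings that share matching initial characters (a common prefix).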
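
As a rough illustration of the measure described above, the sketch below computes the Jaro and Jaro Winkler similarities following the Wikipedia definition: characters match when they are equal and no more than ``longer / 2 - 1`` positions apart, the common prefix is capped at four characters, and the scaling factor defaults to 0.1. The function names ``jaro`` and ``jaro_winkler`` are hypothetical; this is not the ``pyjarowinkler`` source code.

.. code-block:: python

    def jaro(s1: str, s2: str) -> float:
        """Jaro similarity: 1.0 for identical strings, 0.0 when nothing matches."""
        if s1 == s2:
            return 1.0
        len1, len2 = len(s1), len(s2)
        if len1 == 0 or len2 == 0:
            return 0.0
        # Characters match only if equal and at most this many positions apart.
        window = max(max(len1, len2) // 2 - 1, 0)
        matched1 = [False] * len1
        matched2 = [False] * len2
        m = 0
        for i, ch in enumerate(s1):
            for j in range(max(0, i - window), min(len2, i + window + 1)):
                if not matched2[j] and s2[j] == ch:
                    matched1[i] = matched2[j] = True
                    m += 1
                    break
        if m == 0:
            return 0.0
        # Transpositions: half the number of matched characters that
        # appear in a different order in the two strings.
        seq1 = [c for c, ok in zip(s1, matched1) if ok]
        seq2 = [c for c, ok in zip(s2, matched2) if ok]
        t = sum(a != b for a, b in zip(seq1, seq2)) / 2
        return (m / len1 + m / len2 + (m - t) / m) / 3


    def jaro_winkler(s1: str, s2: str, scaling: float = 0.1) -> float:
        """Boost the Jaro score by the common prefix length (capped at 4)."""
        sim = jaro(s1, s2)
        prefix = 0
        for a, b in zip(s1[:4], s2[:4]):
            if a != b:
                break
            prefix += 1
        return sim + prefix * scaling * (1 - sim)


    print(round(jaro("hello", "haloa"), 4))          # 0.7333
    print(round(jaro_winkler("hello", "haloa"), 2))  # 0.76

Because the prefix is capped at four characters, the Wikipedia article notes that ``scaling`` should not exceed 0.25, otherwise the adjusted score could exceed 1.
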

The Implementation
------------------
The original implementation is based on the `Jaro Winkler Similarity Algorithm <http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance>`_ article on `Wikipedia <http://wikipedia.org>`_.
This Python version is based on the implementation in the `Apache StringUtils <http://commons.apache.org/proper/commons-lang/apidocs/src-html/org/apache/commons/lang3/StringUtils.html#line.7141>`_ library.

Correctness
-----------
Unit tests similar to those found in the ``StringUtils`` library were used to validate the implementation.

Note
----
StringUtils uses a matching limit of ``shorter / 2 + 1``. This differs from Wikipedia and from `Winkler's paper <http://www.amstat.org/sections/srms/Proceedings/papers/1990_056.pdf>`_, where the maximum distance between matching characters is ``longer / 2 - 1``.
As of ``version 1.8``, the algorithm correctly handles the ``"CTRATE" - "TRACE"`` example from Wikipedia.
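
To show why the choice of matching limit matters, the sketch below scores the ``"CTRATE"`` / ``"TRACE"`` pair under both limits. It assumes that ``shorter / 2 + 1`` and ``longer / 2 - 1`` are the maximum distance (with integer division) between the positions of two matching characters; that reading and the helper name ``jaro_with_window`` are illustrative assumptions, not the library's code.

.. code-block:: python

    def jaro_with_window(s1: str, s2: str, window: int) -> float:
        """Jaro similarity where characters match within +/- ``window`` positions."""
        matched1, matched2 = [False] * len(s1), [False] * len(s2)
        m = 0
        for i, ch in enumerate(s1):
            for j in range(max(0, i - window), min(len(s2), i + window + 1)):
                if not matched2[j] and s2[j] == ch:
                    matched1[i] = matched2[j] = True
                    m += 1
                    break
        if m == 0:
            return 0.0
        # Half the number of matched characters that are out of order.
        seq1 = [c for c, ok in zip(s1, matched1) if ok]
        seq2 = [c for c, ok in zip(s2, matched2) if ok]
        t = sum(a != b for a, b in zip(seq1, seq2)) / 2
        return (m / len(s1) + m / len(s2) + (m - t) / m) / 3


    a, b = "CTRATE", "TRACE"
    shorter, longer = sorted((len(a), len(b)))
    stringutils_window = shorter // 2 + 1        # 3: "C" at 0 can reach "C" at 3
    wikipedia_window = max(longer // 2 - 1, 0)   # 2: "C" stays unmatched

    print(round(jaro_with_window(a, b, stringutils_window), 4))  # 0.8111
    print(round(jaro_with_window(a, b, wikipedia_window), 4))    # 0.8222

Under the wider limit the leading ``C`` finds a partner, which adds a match but also introduces transpositions, so the two rules produce different scores for the same pair.
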

Example
-------

::

    >>> from pyjarowinkler import distance
    >>> # Scaling is 0.1 by default
    >>> print(distance.get_jaro_distance("hello", "haloa", winkler=True, scaling=0.1))
    0.76
    >>> # Rounded to 12 digits so the printed value is the same on Python 2 and 3
    >>> print(round(distance.get_jaro_distance("hello", "haloa", winkler=False, scaling=0.1), 12))
    0.733333333333

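
The two results above differ only by the Winkler prefix bonus. Assuming the Wikipedia definition, ``"hello"`` and ``"haloa"`` share three matching characters (``h``, ``l``, ``o``) in the same order and a one-character common prefix, so both numbers can be reproduced by hand:

.. code-block:: python

    # Hand computation for "hello" vs "haloa" (both of length 5).
    m = 3            # matching characters: h, l, o
    t = 0            # transpositions: the matches appear in the same order
    prefix = 1       # common prefix "h"
    scaling = 0.1    # scaling value used in the example above

    jaro = (m / 5 + m / 5 + (m - t) / m) / 3
    winkler = jaro + prefix * scaling * (1 - jaro)

    print(round(jaro, 12))    # 0.733333333333
    print(round(winkler, 2))  # 0.76
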
:Version: 1.8 of 2016-03-22
Raw data
{
"_id": null,
"home_page": "https://github.com/nap/jaro-winkler-distance",
"name": "pyjarowinkler",
"maintainer": "",
"docs_url": null,
"requires_python": null,
"maintainer_email": "",
"keywords": "jaro winkler distance score string delta diff",
"author": "Jean-Bernard Ratte",
"author_email": "jean.bernard.ratte@unary.ca",
"download_url": "https://files.pythonhosted.org/packages/04/c2/d560c1eebd87b668394daee4ac07959bc1a00db56364b86863470a8c23e4/pyjarowinkler-1.8.tar.gz",
"platform": "Linux",
"description": "Jaro Winkler Distance\r\n=====================\r\n\r\n.. image:: https://travis-ci.org/nap/jaro-winkler-distance.svg?branch=master\r\n :target: https://travis-ci.org/nap/jaro-winkler-distance\r\n.. image:: https://coveralls.io/repos/nap/jaro-winkler-distance/badge.svg?branch=master&service=github\r\n :target: https://coveralls.io/github/nap/jaro-winkler-distance?branch=master\r\n.. image:: https://img.shields.io/github/license/nap/jaro-winkler-distance.svg\r\n :target: https://raw.githubusercontent.com/nap/jaro-winkler-distance/master/LICENSE\r\n.. image:: https://img.shields.io/pypi/pyversions/pyjarowinkler.svg\r\n :target: https://pypi.python.org/pypi/pyjarowinkler\r\n\r\nFind the Jaro Winkler Distance which indicates the similarity score between two Strings.\r\nThe Jaro measure is the weighted sum of percentage of matched characters from each file\r\nand transposed characters. Winkler increased this measure for matching initial characters.\r\n\r\nThe Implementation\r\n------------------\r\nThe original implementation is based on the `Jaro Winkler Similarity Algorithm <http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance>`_ article that can be found on `Wikipedia <http://wikipedia.org>`_.\r\nThis Python version of the original implementation is based on the `Apache StringUtils <http://commons.apache.org/proper/commons-lang/apidocs/src-html/org/apache/commons/lang3/StringUtils.html#line.7141>`_ library.\r\n\r\nCorrectness\r\n-----------\r\nUnittest similar to what you will find in the ``StringUtils`` library were used to validate implementation.\r\n\r\nNote\r\n----\r\nA limit of ``shorter / 2 + 1`` is used in StringUtils, this differs from Wikipedia and also `Winkler's paper <http://www.amstat.org/sections/srms/Proceedings/papers/1990_056.pdf>`_, where a distance of ``longer / 2 - 1`` is used, corresponding to positions of ``longer / 2``.\r\nAs of ``version 1.8``, the algorithm now correctly works with the ``\"CTRATE\" - \"TRACE\"`` example 
from Wikipedia.\r\n\r\nExample\r\n-------\r\n\r\n::\r\n\r\n >>> from pyjarowinkler import distance\r\n >>> # Scaling is 0.1 by default\r\n >>> print distance.get_jaro_distance(\"hello\", \"haloa\", winkler=True, scaling=0.1)\r\n 0.76\r\n >>> print distance.get_jaro_distance(\"hello\", \"haloa\", winkler=False, scaling=0.1)\r\n 0.733333333333\r\n\r\n:Version: 1.8 of 2016-03-22",
"bugtrack_url": null,
"license": "http://www.apache.org/licenses/",
"summary": "Find the Jaro Winkler Distance which indicates the similarity score between two Strings",
"version": "1.8",
"project_urls": {
"Download": "https://github.com/nap/jaro-winkler-distance/archive/v1.8.zip",
"Homepage": "https://github.com/nap/jaro-winkler-distance"
},
"split_keywords": [
"jaro",
"winkler",
"distance",
"score",
"string",
"delta",
"diff"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b958b89073047b447e02b08d4f64fbb984e5a4dfef4134477350b256c625c779",
"md5": "fc9a5bd0344c24c10cf57e7dce6e0370",
"sha256": "dc80f4e606a6384729a577d0a0dff5aceadb9efbe19bd0fc04e79d55ffd1e0aa"
},
"downloads": -1,
"filename": "pyjarowinkler-1.8-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "fc9a5bd0344c24c10cf57e7dce6e0370",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 5904,
"upload_time": "2016-03-23T02:09:37",
"upload_time_iso_8601": "2016-03-23T02:09:37.939536Z",
"url": "https://files.pythonhosted.org/packages/b9/58/b89073047b447e02b08d4f64fbb984e5a4dfef4134477350b256c625c779/pyjarowinkler-1.8-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "04c2d560c1eebd87b668394daee4ac07959bc1a00db56364b86863470a8c23e4",
"md5": "82b244b397493e53a70cd05db498fb3c",
"sha256": "49828834eddae6a078ee1329dca572541192a3f49e407608f4063c692c1ef1df"
},
"downloads": -1,
"filename": "pyjarowinkler-1.8.tar.gz",
"has_sig": false,
"md5_digest": "82b244b397493e53a70cd05db498fb3c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 4589,
"upload_time": "2016-03-23T02:09:46",
"upload_time_iso_8601": "2016-03-23T02:09:46.887986Z",
"url": "https://files.pythonhosted.org/packages/04/c2/d560c1eebd87b668394daee4ac07959bc1a00db56364b86863470a8c23e4/pyjarowinkler-1.8.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2016-03-23 02:09:46",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "nap",
"github_project": "jaro-winkler-distance",
"travis_ci": true,
"coveralls": false,
"github_actions": false,
"tox": true,
"lcname": "pyjarowinkler"
}