pyjarowinkler

:Name: pyjarowinkler
:Version: 1.8
:Home page: https://github.com/nap/jaro-winkler-distance
:Download: https://github.com/nap/jaro-winkler-distance/archive/v1.8.zip
:Summary: Find the Jaro-Winkler distance, which indicates the similarity score between two strings
:Upload time: 2016-03-23 02:09:46
:Author: Jean-Bernard Ratte (jean.bernard.ratte@unary.ca)
:License: http://www.apache.org/licenses/
:Keywords: jaro winkler distance score string delta diff
:Requirements: No requirements were recorded.


Jaro Winkler Distance
=====================

.. image:: https://travis-ci.org/nap/jaro-winkler-distance.svg?branch=master
    :target: https://travis-ci.org/nap/jaro-winkler-distance
.. image:: https://coveralls.io/repos/nap/jaro-winkler-distance/badge.svg?branch=master&service=github
    :target: https://coveralls.io/github/nap/jaro-winkler-distance?branch=master
.. image:: https://img.shields.io/github/license/nap/jaro-winkler-distance.svg
    :target: https://raw.githubusercontent.com/nap/jaro-winkler-distance/master/LICENSE
.. image:: https://img.shields.io/pypi/pyversions/pyjarowinkler.svg
    :target: https://pypi.python.org/pypi/pyjarowinkler

Find the Jaro-Winkler distance, which indicates the similarity score between two strings.
The Jaro measure is the weighted sum of the percentage of matched characters from each string
and of transposed characters. Winkler increased this measure for strings whose initial characters match.
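
As a rough illustration of the two measures, here is a minimal sketch that follows the definitions in the Wikipedia article: a match window of ``max(len) // 2 - 1``, a common prefix capped at 4 characters, and a scaling factor of 0.1. It is not the code shipped in ``pyjarowinkler``; the names ``jaro_similarity`` and ``jaro_winkler_similarity`` are hypothetical, and details such as case folding, rounding, and the exact match window may differ from the package.

::

    def jaro_similarity(s1: str, s2: str) -> float:
        """Hypothetical sketch: (m/|s1| + m/|s2| + (m - t)/m) / 3."""
        if s1 == s2:
            return 1.0
        len1, len2 = len(s1), len(s2)
        if len1 == 0 or len2 == 0:
            return 0.0
        # Two characters match if they are equal and at most `window` apart.
        window = max(max(len1, len2) // 2 - 1, 0)
        matched1 = [False] * len1
        matched2 = [False] * len2
        m = 0
        for i, ch in enumerate(s1):
            for j in range(max(0, i - window), min(len2, i + window + 1)):
                if not matched2[j] and s2[j] == ch:
                    matched1[i] = matched2[j] = True
                    m += 1
                    break
        if m == 0:
            return 0.0
        # t is half the number of matched characters that appear out of order.
        seq1 = [c for c, hit in zip(s1, matched1) if hit]
        seq2 = [c for c, hit in zip(s2, matched2) if hit]
        t = sum(a != b for a, b in zip(seq1, seq2)) / 2
        return (m / len1 + m / len2 + (m - t) / m) / 3


    def jaro_winkler_similarity(s1: str, s2: str, scaling: float = 0.1) -> float:
        """Hypothetical sketch: sim_j + l * scaling * (1 - sim_j), l = shared prefix <= 4."""
        sim_j = jaro_similarity(s1, s2)
        prefix = 0
        for a, b in zip(s1[:4], s2[:4]):
            if a != b:
                break
            prefix += 1
        return sim_j + prefix * scaling * (1 - sim_j)

With this sketch, ``jaro_similarity("hello", "haloa")`` is 11/15 (about 0.7333) and ``jaro_winkler_similarity("hello", "haloa")`` is 0.76 up to floating-point rounding, the same values shown in the Example section.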

The Implementation
------------------
The original implementation is based on the `Jaro Winkler Similarity Algorithm <http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance>`_ article on `Wikipedia <http://wikipedia.org>`_.
This Python version is based on the implementation in the `Apache StringUtils <http://commons.apache.org/proper/commons-lang/apidocs/src-html/org/apache/commons/lang3/StringUtils.html#line.7141>`_ library.

Correctness
-----------
Unit tests similar to those in the ``StringUtils`` library were used to validate the implementation.

Note
----
StringUtils uses a match limit of ``shorter / 2 + 1``, where ``shorter`` and ``longer`` are the lengths of the shorter and longer strings. This differs from Wikipedia and from `Winkler's paper <http://www.amstat.org/sections/srms/Proceedings/papers/1990_056.pdf>`_, which use a distance of ``longer / 2 - 1``, corresponding to positions of ``longer / 2``.
As of ``version 1.8``, the algorithm correctly handles the ``"CTRATE" - "TRACE"`` example from Wikipedia.
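
To make the difference concrete, the sketch below computes both limits for ``"CTRATE"`` and ``"TRACE"`` and counts how many characters pair up under each. It assumes that ``/`` in the formulas above is integer division (as for Java ``int`` values), treats both numbers as the maximum allowed distance between matching positions, and clamps a negative distance to 0; the real StringUtils loop bounds may differ. The helper names are hypothetical and not part of the ``pyjarowinkler`` API.

::

    def stringutils_limit(s1: str, s2: str) -> int:
        # Limit described for StringUtils: shorter / 2 + 1 (integer division)
        return min(len(s1), len(s2)) // 2 + 1


    def wikipedia_distance(s1: str, s2: str) -> int:
        # Wikipedia / Winkler distance: longer / 2 - 1, clamped at 0 (assumption)
        return max(max(len(s1), len(s2)) // 2 - 1, 0)


    def count_matches(s1: str, s2: str, window: int) -> int:
        """Greedy left-to-right pairing of equal characters at most `window` apart."""
        used = [False] * len(s2)
        matches = 0
        for i, ch in enumerate(s1):
            for j in range(max(0, i - window), min(len(s2), i + window + 1)):
                if not used[j] and s2[j] == ch:
                    used[j] = True
                    matches += 1
                    break
        return matches


    a, b = "CTRATE", "TRACE"
    for name, window in (("StringUtils", stringutils_limit(a, b)),
                         ("Wikipedia", wikipedia_distance(a, b))):
        print(name, window, count_matches(a, b, window))

In this sketch the StringUtils limit is 3, which lets the leading ``C`` of ``"CTRATE"`` pair with the ``C`` at index 3 of ``"TRACE"`` for 5 matches; the Wikipedia distance is 2, which rules that pair out and leaves 4 matches.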

Example
-------

The outputs below are as printed by Python 2; on Python 3 the ``winkler=False`` result is printed with more decimal digits.

::

    >>> from pyjarowinkler import distance
    >>> # Scaling is 0.1 by default
    >>> print(distance.get_jaro_distance("hello", "haloa", winkler=True, scaling=0.1))
    0.76
    >>> print(distance.get_jaro_distance("hello", "haloa", winkler=False, scaling=0.1))
    0.733333333333

:Version: 1.8 of 2016-03-22
            
