StringDist

Name: StringDist
Version: 1.0.9
Home page: https://github.com/obulkin/string-dist
Summary: This package provides the stringdist module, which includes several functions for calculating string distances. Under the hood, a C extension module is preferentially used for optimal performance, with an automatic fallback to a Python implementation.
Upload time: 2017-05-11 07:54:54
Author: Oleg Bulkin
License: MIT
Keywords: string metric string distance edit distance levenshtein damerau-levenshtein optimal string alignment distance
Requirements: No requirements were recorded.
==========
StringDist
==========

This package provides the ``stringdist`` module, which includes functions for 
calculating raw and normalized versions of the following string distance 
measurements:

* Levenshtein distance
* Restricted Damerau-Levenshtein distance (a.k.a. optimal string alignment 
  distance)
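Both measurements can be computed with the classic dynamic-programming table. The restricted Damerau-Levenshtein (optimal string alignment) variant only adds one case for swapping two adjacent characters. The following is a minimal pure-Python sketch of that textbook approach, not the package's own code; the names ``_edit_distance``, ``levenshtein_sketch`` and ``osa_sketch`` are hypothetical.

```python
def _edit_distance(a: str, b: str, transpositions: bool) -> int:
    """Textbook unit-cost edit distance (illustrative sketch).

    transpositions=False gives the Levenshtein distance; transpositions=True
    gives the restricted Damerau-Levenshtein (optimal string alignment) one.
    """
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all i characters of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # Swap of two adjacent characters, also at cost 1.
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def levenshtein_sketch(a: str, b: str) -> int:
    return _edit_distance(a, b, transpositions=False)


def osa_sketch(a: str, b: str) -> int:
    return _edit_distance(a, b, transpositions=True)


print(levenshtein_sketch('ab', 'ba'))  # 2: two substitutions
print(osa_sketch('ab', 'ba'))          # 1: one adjacent swap
```

The only difference between the two measurements is the extra transposition check, which is why swapped neighbors such as ``ab``/``ba`` score lower under the restricted Damerau-Levenshtein distance.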

For optimal performance, the package compiles and uses a C extension module 
under the hood, but a Python implementation is included as well and will 
automatically be used if C extensions are not supported by the system 
(e.g. when the selected interpreter is PyPy).
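A common way to get this "C if available, Python otherwise" behavior at import time is to try importing the compiled module and fall back on ``ImportError``. The sketch below illustrates only that general pattern: the module name ``_hypothetical_cext`` is made up, and this is not necessarily how StringDist wires up its own fallback.

```python
try:
    # Hypothetical compiled extension; the name is illustrative only.
    from _hypothetical_cext import levenshtein
    IMPLEMENTATION = 'C'
except ImportError:
    # Pure-Python fallback, used when the extension was not built or cannot
    # be loaded (for example, on an interpreter without C-extension support).
    def levenshtein(a: str, b: str) -> int:
        # Two-row Levenshtein dynamic programming with unit costs.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                curr.append(min(prev[j] + 1,                # deletion
                                curr[j - 1] + 1,            # insertion
                                prev[j - 1] + (ca != cb)))  # substitution
            prev = curr
        return prev[-1]
    IMPLEMENTATION = 'Python'

print(IMPLEMENTATION, levenshtein('test', 'testing'))
```

Callers only ever see one ``levenshtein`` name, so the choice of implementation stays invisible to them.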

Installation
============

To install this package, just use pip::

    pip install StringDist

All Python versions ``>=3.3`` should be supported.

Usage
=====

To use the package, simply import the ``stringdist`` module and call the 
desired function, passing in two strings::

    import stringdist
    stringdist.levenshtein('test', 'testing')  # 3: insert 'i', 'n' and 'g'

The available functions are as follows:

* ``levenshtein``
* ``levenshtein_norm``
* ``rdlevenshtein``
* ``rdlevenshtein_norm``

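Raw distances assume that every allowed operation (inserting, deleting or 
substituting a single character, plus, for the restricted Damerau-Levenshtein 
distance, transposing two adjacent characters) has a cost of ``1``. 
Normalized distances are floats in the range ``[0.0, 1.0]``, where ``0.0`` 
always corresponds to a raw value of ``0`` and ``1.0`` always corresponds to a 
raw value equal to the length of the longer string, i.e. the largest possible 
raw value.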
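Because every operation costs ``1`` and the raw distance can never exceed the length of the longer string, normalization amounts to dividing by that length. A minimal sketch of the arithmetic, assuming the raw distance has already been computed; the helper name ``normalize`` is hypothetical, and returning ``0.0`` for two empty strings is an assumption, since the text above does not cover that case.

```python
def normalize(raw: int, a: str, b: str) -> float:
    """Map a raw unit-cost distance onto [0.0, 1.0] (hypothetical helper).

    Divides by the length of the longer string, which is the largest raw
    value a unit-cost Levenshtein or restricted Damerau-Levenshtein
    distance can take.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # assumption: two empty strings count as identical
    return raw / longest


# 'test' -> 'testing' takes 3 insertions; the longer string has 7 characters.
print(normalize(3, 'test', 'testing'))  # 3 / 7
```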

**Note**: The restricted Damerau-Levenshtein distance is not a true distance 
metric because it does not satisfy the 
`triangle inequality <https://en.wikipedia.org/wiki/Triangle_inequality>`_. 
This makes it a poor choice for applications that involve evaluating the 
similarity of more than two strings, such as clustering.
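The standard counterexample uses the strings ``CA``, ``AC`` and ``ABC``. Optimal string alignment cannot edit a substring more than once, so it cannot swap ``C`` and ``A`` and then insert ``B`` between them. The sketch below recomputes the three distances with a self-contained textbook implementation written for illustration (not the package's own code; the name ``osa`` is hypothetical).

```python
def osa(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    # d[i][j] is the distance between a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


x, y, z = 'CA', 'AC', 'ABC'
print(osa(x, y), osa(y, z), osa(x, z))     # 1 1 3
print(osa(x, z) <= osa(x, y) + osa(y, z))  # False: triangle inequality fails
```

A true metric would require ``d(CA, ABC) <= d(CA, AC) + d(AC, ABC) = 2``, but the restricted distance is ``3``.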

Bugs and Requests
=================

Please use `GitHub Issues <https://github.com/obulkin/string-dist/issues>`_ 
for bugs and feature requests, checking first to make sure you're not creating 
a duplicate issue.

Contributing
============

Pull requests are welcome. Please discuss your plans first by creating a 
GitHub issue, and use good coding style. For Python, this means adhering to 
PEP 8 and other relevant PEPs. If in doubt, use a linter like 
`Pylint <https://www.pylint.org>`_.

To run unit tests::

    git clone https://github.com/obulkin/string-dist.git {directory}
    cd {directory}
    python setup.py install
    python -m unittest -v test_stringdist

You can run the tests without installing the package, but then the Python 
implementation is always used, because the C variant has to be compiled 
first. By the same token, changes to the C code only show up in the tests 
after recompilation, which you can trigger by running 
``python setup.py install`` again.

Contributors
============

* Oleg Bulkin <o.bulkin@gmail.com>

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/obulkin/string-dist",
    "name": "StringDist",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "string metric string distance edit distance levenshtein damerau-levenshtein optimal string alignment distance",
    "author": "Oleg Bulkin",
    "author_email": "o.bulkin@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/85/f0/c56cbe92b4b06fbc7adaa81917ad34d7027834e166fff2d2db73961c67fa/StringDist-1.0.9.tar.gz",
    "platform": "UNKNOWN",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "This package provides the stringdist module, which includes several functions for calculating string distances. Under the hood, a C extension module is preferentially used for optimal performance, with an automatic fallback to a Python implementation.",
    "version": "1.0.9",
    "project_urls": {
        "Homepage": "https://github.com/obulkin/string-dist"
    },
    "split_keywords": [
        "string",
        "metric",
        "string",
        "distance",
        "edit",
        "distance",
        "levenshtein",
        "damerau-levenshtein",
        "optimal",
        "string",
        "alignment",
        "distance"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "85f0c56cbe92b4b06fbc7adaa81917ad34d7027834e166fff2d2db73961c67fa",
                "md5": "7491a2a39dcb0d84253cf58902d9ec41",
                "sha256": "91e6d4a348223db094d029e7e3de9ce89c561738047555dfad60ff5ccb7a5b74"
            },
            "downloads": -1,
            "filename": "StringDist-1.0.9.tar.gz",
            "has_sig": false,
            "md5_digest": "7491a2a39dcb0d84253cf58902d9ec41",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 7400,
            "upload_time": "2017-05-11T07:54:54",
            "upload_time_iso_8601": "2017-05-11T07:54:54.261901Z",
            "url": "https://files.pythonhosted.org/packages/85/f0/c56cbe92b4b06fbc7adaa81917ad34d7027834e166fff2d2db73961c67fa/StringDist-1.0.9.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2017-05-11 07:54:54",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "obulkin",
    "github_project": "string-dist",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "stringdist"
}
        