==========
StringDist
==========

This package provides the ``stringdist`` module, which includes functions for
calculating raw and normalized versions of the following string distance
measurements:

* Levenshtein distance
* Restricted Damerau-Levenshtein distance (a.k.a. optimal string alignment
  distance)
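
For reference, the first measurement can be computed with the classic Wagner-Fischer dynamic-programming table. The following is a minimal pure-Python sketch with unit costs that keeps only two rows of the table. ``levenshtein_sketch`` is a hypothetical name, and this is an illustration rather than the package's own implementation.

```python
def levenshtein_sketch(a: str, b: str) -> int:
    """Return the Levenshtein distance between a and b (unit costs).

    Hypothetical illustration; not the stringdist implementation.
    """
    # At the start of iteration i, previous[j] is the distance between
    # a[:i-1] and b[:j]; only two rows of the DP table are kept.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion from a
                current[j - 1] + 1,      # insertion into a
                previous[j - 1] + cost,  # substitution (free on a match)
            ))
        previous = current
    return previous[-1]


print(levenshtein_sketch('test', 'testing'))   # 3
print(levenshtein_sketch('kitten', 'sitting'))  # 3
```

Each cell takes the cheapest of a deletion, an insertion, or a substitution, so the final cell holds the minimum number of single-character edits.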

For optimal performance, the package compiles and uses a C extension module
under the hood, but a Python implementation is included as well and will
automatically be used if C extensions are not supported by the system
(e.g. when the selected interpreter is PyPy).
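
A fallback like the one described above is commonly implemented with a guarded import. The sketch below shows that general pattern only; the module name ``_stringdist_c_hypothetical`` and the flag ``USING_C_EXTENSION`` are hypothetical, and this is not necessarily how ``stringdist`` itself selects its implementation.

```python
# Prefer the compiled extension; fall back to pure Python when it cannot
# be imported (for example, it was never built or the interpreter is PyPy).
try:
    # Hypothetical name for a compiled C extension module.
    from _stringdist_c_hypothetical import levenshtein  # type: ignore
    USING_C_EXTENSION = True
except ImportError:
    USING_C_EXTENSION = False

    def levenshtein(a: str, b: str) -> int:
        """Pure-Python fallback: two-row Levenshtein DP with unit costs."""
        previous = list(range(len(b) + 1))
        for i, char_a in enumerate(a, start=1):
            current = [i]
            for j, char_b in enumerate(b, start=1):
                current.append(min(previous[j] + 1,       # deletion
                                   current[j - 1] + 1,    # insertion
                                   previous[j - 1] + (char_a != char_b)))
            previous = current
        return previous[-1]


print('C extension in use:', USING_C_EXTENSION)
print(levenshtein('test', 'testing'))  # 3
```

Because both branches bind the same public name, callers never need to know which implementation was loaded.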

Installation
============

To install this package, just use pip::

    pip install StringDist

All Python versions ``>=3.3`` should be supported.

Usage
=====

To use the package, simply import the ``stringdist`` module and call the
desired function, passing in two strings::

    import stringdist
    stringdist.levenshtein('test', 'testing')  # 3 (three insertions)
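
The available functions are as follows:

* ``levenshtein``: raw Levenshtein distance
* ``levenshtein_norm``: normalized Levenshtein distance
* ``rdlevenshtein``: raw restricted Damerau-Levenshtein distance
* ``rdlevenshtein_norm``: normalized restricted Damerau-Levenshtein distance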
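
Raw distances assume that every allowed operation has a cost of ``1``. The
allowed operations are insertion, deletion, and substitution of a single
character, plus, for the restricted Damerau-Levenshtein distance,
transposition of two adjacent characters. Normalized distances are floats in
the range ``[0.0, 1.0]``, where ``0.0`` always corresponds to a raw value of
``0`` and ``1.0`` always corresponds to the length of the longer string, i.e.
the largest possible raw value.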
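
The normalization rule above amounts to dividing the raw distance by the length of the longer string. A minimal sketch, assuming exactly that rule; ``normalize`` is a hypothetical helper, and returning ``0.0`` for two empty strings is an assumption made here to avoid dividing by zero, not documented package behavior.

```python
def normalize(raw_distance: int, a: str, b: str) -> float:
    """Map a raw edit distance onto [0.0, 1.0] (hypothetical helper)."""
    longest = max(len(a), len(b))
    if longest == 0:
        # Two empty strings: the raw distance is necessarily 0.
        # Returning 0.0 here is an assumption, not documented behavior.
        return 0.0
    # With unit costs the raw distance never exceeds the longer length
    # (substitute the overlap, insert the rest), so the ratio stays in range.
    return raw_distance / longest


# 'test' -> 'testing' needs 3 insertions; the longer string has 7 characters.
print(normalize(3, 'test', 'testing'))  # 3 / 7, roughly 0.43
```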

**Note**: The restricted Damerau-Levenshtein distance is not a true distance
metric because it does not satisfy the
`triangle inequality <https://en.wikipedia.org/wiki/Triangle_inequality>`_.
This makes it a poor choice for applications that involve evaluating the
similarity of more than two strings, such as clustering.
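
The violation can be reproduced with a short pure-Python sketch of the restricted Damerau-Levenshtein (optimal string alignment) distance. ``osa_distance`` is a hypothetical name and this is not the package's own code; the strings ``'CA'``, ``'AC'`` and ``'ABC'`` are the standard textbook counterexample.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Hypothetical illustration with unit costs; not the stringdist code.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] is the distance between the prefixes a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (free on a match)
            )
            # Transposition of two adjacent characters. Looking back only to
            # d[i-2][j-2] is what makes it "restricted": a transposed pair
            # cannot be edited again afterwards.
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


x, y, z = 'CA', 'AC', 'ABC'
print(osa_distance(x, y), osa_distance(y, z), osa_distance(x, z))  # 1 1 3
# The triangle inequality would require d(x, z) <= d(x, y) + d(y, z).
print(osa_distance(x, z) <= osa_distance(x, y) + osa_distance(y, z))  # False
```

The unrestricted Damerau-Levenshtein distance from ``'CA'`` to ``'ABC'`` is only 2 (transpose, then insert), which is why that variant is a true metric.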

Bugs and Requests
=================

Please use `GitHub Issues <https://github.com/obulkin/string-dist/issues>`_
for bugs and feature requests, checking first to make sure you're not creating
a duplicate issue.

Contributing
============

Pull requests are welcome. Please discuss your plans in a GitHub issue first,
and use good coding style. For Python, this means following the rules laid out
in PEP 8 and other relevant PEPs. If in doubt, use a linter like
`Pylint <https://www.pylint.org>`_.

To run the unit tests::

    git clone https://github.com/obulkin/string-dist.git {directory}
    cd {directory}
    python setup.py install
    python -m unittest -v test_stringdist

You can run the tests without installing the package, but then the Python
implementation will always be used, because the C variant has to be compiled
first. For the same reason, changes to the C code will not show up in the
tests until they are recompiled, which can be done by running
``python setup.py install`` again.

Contributors
============

* Oleg Bulkin <o.bulkin@gmail.com>