editdistpy


- Name: editdistpy
- Version: 0.1.3
- Home page: https://github.com/mammothb/editdistpy
- Summary: Fast Levenshtein and Damerau optimal string alignment algorithms.
- Upload time: 2021-11-29 12:37:52
- Author: mmb L
- Requires Python: >=3.6
- License: MIT
- Keywords: edit distance, levenshtein, damerau
- Requirements: none recorded

editdistpy <br>
[![PyPI version](https://badge.fury.io/py/editdistpy.svg)](https://badge.fury.io/py/editdistpy)
[![Tests](https://github.com/mammothb/editdistpy/actions/workflows/tests.yml/badge.svg)](https://github.com/mammothb/editdistpy/actions/workflows/tests.yml)
========

editdistpy is a fast implementation of the Levenshtein edit distance and
the Damerau-Levenshtein optimal string alignment (OSA) edit distance
algorithms. The original C# project can be found at [SoftWx.Match](https://github.com/softwx/SoftWx.Match).

## Installation

The easiest way to install editdistpy is using `pip`:
```
pip install -U editdistpy
```

## Usage

You can specify the `max_distance` you care about. If the edit distance exceeds
this `max_distance`, `-1` is returned. Specifying a sensible maximum distance
can improve speed significantly.

You can also specify `max_distance=sys.maxsize` if you wish for the actual edit
distance to always be computed.

### Levenshtein

```python
import sys

from editdistpy import levenshtein

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: 6
```
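
To make the `max_distance` and `-1` convention concrete, here is a minimal pure-Python sketch of a bounded Levenshtein distance. The name `reference_levenshtein` is hypothetical and the code is a textbook dynamic-programming version for illustration only; it is not how editdistpy is implemented and it is far slower.

```python
import sys


def reference_levenshtein(a: str, b: str, max_distance: int = sys.maxsize) -> int:
    """Textbook Levenshtein distance; returns -1 if it exceeds max_distance."""
    # The distance is at least the length difference, so bail out early.
    if abs(len(a) - len(b)) > max_distance:
        return -1
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        # Row minima never decrease, so once a whole row exceeds the limit
        # the final distance must exceed it as well.
        if min(current) > max_distance:
            return -1
        previous = current
    distance = previous[-1]
    return distance if distance <= max_distance else -1


print(reference_levenshtein("flintstone", "hanson", 2))            # -1
print(reference_levenshtein("flintstone", "hanson", sys.maxsize))  # 6
```

The early exits are what make a small `max_distance` cheap: the function can stop as soon as the bound is provably exceeded instead of filling in the whole table.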

### Damerau-Levenshtein OSA

```python
import sys

from editdistpy import damerau_osa

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: 6
```
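
The example pair above contains no adjacent swapped characters, so both algorithms return the same value. The sketch below shows where optimal string alignment differs from plain Levenshtein. It is an unoptimized reference version with a hypothetical name (`reference_osa`) and no `max_distance` handling, not the library's implementation.

```python
def reference_osa(a: str, b: str) -> int:
    """Optimal string alignment distance using a full DP table."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        table[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,         # deletion
                table[i][j - 1] + 1,         # insertion
                table[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition: "xy" in a lines up with "yx" in b.
            if (
                i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                table[i][j] = min(table[i][j], table[i - 2][j - 2] + 1)
    return table[-1][-1]


# One transposition; plain Levenshtein needs two substitutions here.
print(reference_osa("abcd", "acbd"))          # 1
# OSA never edits a transposed pair again, so this is 3 rather than 2.
print(reference_osa("ca", "abc"))             # 3
print(reference_osa("flintstone", "hanson"))  # 6
```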

## Benchmark

A simple benchmark was run on Python 3.8.12 against [editdistance](https://github.com/roy-ht/editdistance), which implements the Levenshtein edit distance
algorithm.

The script used by the benchmark can be found [here](https://github.com/mammothb/editdistpy/blob/master/tests/benchmarks.py).
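
As a rough illustration of how a per-pass figure of this kind can be measured, the sketch below uses the standard `timeit` module. It is not the linked benchmark script: `report` and `stand_in` are hypothetical names, and the workload is a placeholder so the snippet runs without any third-party package.

```python
import timeit


def report(label, func, iterations=10_000):
    """Time func over many calls and print a usec/pass summary line."""
    total_seconds = timeit.timeit(func, number=iterations)
    usec_per_pass = total_seconds / iterations * 1e6
    print(
        f"{label:<30} {usec_per_pass:.4f} usec/pass "
        f"{total_seconds * 1e3:.2f} msec total {iterations} iterations"
    )
    return usec_per_pass


def stand_in():
    # Placeholder workload. Swap in a real call such as
    # levenshtein.distance(string_1, string_2, max_distance).
    return sum(1 for x, y in zip("example", "samples") if x != y)


report("stand_in", stand_in)
```

Timing many iterations and dividing by the count smooths out per-call noise, which matters when a single call takes well under a microsecond.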

For clarity, the following string pairs were used.

### Single word (completely different)
"xabxcdxxefxgx"<br>
"1ab2cd34ef5g6"

### Single word (similar)
"example" <br>
"samples"

### Single word (identical ending)
"kdeisfnexabxcdxlskdixefxgx"<br>
"xabxcdxlskdixefxgx"

### Short string
"short sentence with words"<br>
"shrtsen tence wit mispeledwords"

### Long string
"Lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod rem"<br>
"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium"

```
single_dif string
        test_damerau_osa               0.5202 usec/pass 1040.36 msec total 2000000 iterations
        test_levenshtein               0.3547 usec/pass 709.40 msec total 2000000 iterations
        test_editdistance              0.6399 usec/pass 1279.81 msec total 2000000 iterations
        test_damerau_osa early_cutoff  0.5134 usec/pass 1026.72 msec total 2000000 iterations
        test_levenshtein early_cutoff  0.3862 usec/pass 772.31 msec total 2000000 iterations
single_sim string
        test_damerau_osa               0.2983 usec/pass 596.57 msec total 2000000 iterations
        test_levenshtein               0.2433 usec/pass 486.68 msec total 2000000 iterations
        test_editdistance              0.3942 usec/pass 788.36 msec total 2000000 iterations
        test_damerau_osa early_cutoff  0.2865 usec/pass 572.90 msec total 2000000 iterations
        test_levenshtein early_cutoff  0.2363 usec/pass 472.61 msec total 2000000 iterations
single_end string
        test_damerau_osa               0.3332 usec/pass 666.32 msec total 2000000 iterations
        test_levenshtein               0.3300 usec/pass 659.93 msec total 2000000 iterations
        test_editdistance              0.7902 usec/pass 1580.42 msec total 2000000 iterations
        test_damerau_osa early_cutoff  0.3199 usec/pass 639.74 msec total 2000000 iterations
        test_levenshtein early_cutoff  0.3205 usec/pass 641.01 msec total 2000000 iterations
short string
        test_damerau_osa               0.9925 usec/pass 1984.97 msec total 2000000 iterations
        test_levenshtein               0.6379 usec/pass 1275.76 msec total 2000000 iterations
        test_editdistance              0.9587 usec/pass 1917.37 msec total 2000000 iterations
        test_damerau_osa early_cutoff  0.7535 usec/pass 1506.91 msec total 2000000 iterations
        test_levenshtein early_cutoff  0.5794 usec/pass 1158.79 msec total 2000000 iterations
long string
        test_damerau_osa               8.6244 usec/pass 17248.73 msec total 2000000 iterations
        test_levenshtein               4.2367 usec/pass 8473.36 msec total 2000000 iterations
        test_editdistance              2.0407 usec/pass 4081.31 msec total 2000000 iterations
        test_damerau_osa early_cutoff  1.0795 usec/pass 2158.99 msec total 2000000 iterations
        test_levenshtein early_cutoff  0.9031 usec/pass 1806.28 msec total 2000000 iterations
```

While `max_distance=10` significantly reduces the computation time (most
visibly for the long string pair), it may not be a sensible value in every use case.

editdistpy also performs better on shorter strings, so it may be the more
suitable library if your use case mainly involves comparing short strings.
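
To put numbers on these two observations, the sketch below recomputes ratios from the usec/pass figures in the results table above. It assumes the `early_cutoff` rows correspond to the `max_distance=10` runs; the dictionary layout and key names are illustrative only.

```python
# usec/pass figures copied from the results table above.
# Tuple order: (levenshtein, editdistance, levenshtein early_cutoff)
results = {
    "single_dif": (0.3547, 0.6399, 0.3862),
    "single_sim": (0.2433, 0.3942, 0.2363),
    "single_end": (0.3300, 0.7902, 0.3205),
    "short": (0.6379, 0.9587, 0.5794),
    "long": (4.2367, 2.0407, 0.9031),
}

ratios = {}
for name, (lev, other, lev_cutoff) in results.items():
    ratios[name] = {
        # Above 1 means editdistance took longer than editdistpy.
        "editdistance_vs_levenshtein": other / lev,
        # Above 1 means the early cutoff made editdistpy faster.
        "cutoff_speedup": lev / lev_cutoff,
    }
    print(
        f"{name:<11} editdistance/levenshtein = {other / lev:.2f}, "
        f"cutoff speedup = {lev / lev_cutoff:.2f}"
    )
```

On these figures the first ratio is above 1 for every pair except the long strings, and the cutoff speedup is only large for the long strings.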

## Changelog

See the [changelog](https://github.com/mammothb/editdistpy/blob/master/CHANGELOG.md) for a history of notable changes to editdistpy.



            
