polars-strsim


- Name: polars-strsim
- Version: 0.2.3
- Summary: Polars extension for string similarity
- Author: Jeremy Foxcroft
- Requires Python: >=3.8
- Keywords: polars-extension, string-similarity
- Upload time: 2024-10-03 19:42:35
- Repository: https://github.com/foxcroftjn/polars-strsim
- Issues: https://github.com/foxcroftjn/polars-strsim/issues
<a href="https://pypi.org/project/polars-strsim/">
    <img src="https://img.shields.io/pypi/v/polars-strsim.svg" alt="PyPi Latest Release"/>
</a>

# String Similarity Measures for Polars

This package provides Python bindings for computing several string similarity measures directly on a Polars DataFrame. All of the measures are implemented in Rust and computed in parallel.

The similarity measures that have been implemented are:

- Levenshtein
- Jaro
- Jaro-Winkler
- Jaccard
- Sørensen-Dice
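
Jaro and Jaro-Winkler are the least self-explanatory measures in this list. The sketch below is a pure-Python rendering of the textbook definitions, not the package's Rust code: two characters match if they are equal and at most `max(len(a), len(b)) // 2 - 1` positions apart, and Jaro-Winkler then boosts the Jaro score for a shared prefix. The prefix weight `p = 0.1` and the 4-character prefix cap are the conventional defaults; that polars-strsim uses the same constants is an assumption, although they do reproduce the `phillips`/`philips` row of the example output further down. The function names are hypothetical and chosen so they do not clash with the package's own `jaro` and `jaro_winkler`.

```python
def jaro_similarity(a: str, b: str) -> float:
    """Textbook Jaro similarity (pure-Python sketch, not the package's Rust code)."""
    if not a and not b:
        return 1.0  # two empty strings: maximally similar
    if not a or not b:
        return 0.0  # exactly one empty string: maximally different

    # Two characters "match" if they are equal and at most `window` positions apart.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(i + window + 1, len(b))):
            if not b_matched[j] and b[j] == ch:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Compare matched characters in order; half the mismatched positions are transpositions.
    a_seq = [ch for ch, m in zip(a, a_matched) if m]
    b_seq = [ch for ch, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2

    return (matches / len(a) + matches / len(b) + (matches - transpositions) / matches) / 3


def jaro_winkler_similarity(a: str, b: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro score boosted for a shared prefix.

    Assumed constants: p = 0.1 and a prefix cap of 4 (textbook defaults).
    """
    sim = jaro_similarity(a, b)
    prefix = 0
    for x, y in zip(a[:max_prefix], b[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return sim + prefix * p * (1.0 - sim)


print(round(jaro_similarity("phillips", "philips"), 6))          # 0.958333
print(round(jaro_winkler_similarity("phillips", "philips"), 6))  # 0.975
```

For `phillips` vs `philips`, all 7 characters of the shorter name match with no transpositions, so Jaro = (7/8 + 7/7 + 7/7) / 3 ≈ 0.958333; the shared 4-character prefix `phil` then lifts Jaro-Winkler to 0.958333 + 4 × 0.1 × (1 - 0.958333) = 0.975.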

Each similarity measure returns a value between 0.0 and 1.0 (inclusive), where 0.0 means the inputs are maximally different and 1.0 means they are maximally similar. If either input is null, the result is null, as the last two rows of the example output show.
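
For Levenshtein, one way to normalize the edit distance into this range is to divide it by the length of the longer string and subtract the result from 1. That this is exactly how polars-strsim normalizes is an assumption, but it reproduces every non-null Levenshtein value in the example output below: `phillips` and `philips` differ by one deletion out of eight characters, giving 1 - 1/8 = 0.875. The helper names are hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    # At the start of each outer iteration, prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein_distance(a, b) / longest


print(levenshtein_similarity("phillips", "philips"))  # 0.875
print(levenshtein_similarity("", "phillips"))         # 0.0
```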

## Installing the Library

### With pip

```bash
pip install polars-strsim
```

### From Source

To build and install this library from source, first make sure [cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html) is installed. You will also need maturin, which you can install with `pip install 'maturin[patchelf]'`.

Then, from the root of a clone of the repository, run `maturin develop --release` to build polars-strsim and install it into the currently active Python virtual environment.

## Using the Library

**Input:**

```python
import polars as pl
from polars_strsim import levenshtein, jaro, jaro_winkler, jaccard, sorensen_dice

df = pl.DataFrame(
    {
        "name_a": ["phillips", "phillips", ""        , "", None      , None],
        "name_b": ["phillips", "philips" , "phillips", "", "phillips", None],
    }
).with_columns(
    levenshtein=levenshtein("name_a", "name_b"),
    jaro=jaro("name_a", "name_b"),
    jaro_winkler=jaro_winkler("name_a", "name_b"),
    jaccard=jaccard("name_a", "name_b"),
    sorensen_dice=sorensen_dice("name_a", "name_b"),
)

with pl.Config(ascii_tables=True):
    print(df)
```
**Output:**
```
shape: (6, 7)
+----------+----------+-------------+----------+--------------+---------+---------------+
| name_a   | name_b   | levenshtein | jaro     | jaro_winkler | jaccard | sorensen_dice |
| ---      | ---      | ---         | ---      | ---          | ---     | ---           |
| str      | str      | f64         | f64      | f64          | f64     | f64           |
+=======================================================================================+
| phillips | phillips | 1.0         | 1.0      | 1.0          | 1.0     | 1.0           |
| phillips | philips  | 0.875       | 0.958333 | 0.975        | 0.875   | 0.933333      |
|          | phillips | 0.0         | 0.0      | 0.0          | 0.0     | 0.0           |
|          |          | 1.0         | 1.0      | 1.0          | 1.0     | 1.0           |
| null     | phillips | null        | null     | null         | null    | null          |
| null     | null     | null        | null     | null         | null    | null          |
+----------+----------+-------------+----------+--------------+---------+---------------+
```
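
The Jaccard and Sørensen-Dice values in the second row deserve a closer look. As plain character sets, `phillips` and `philips` are identical ({p, h, i, l, s}), which would score 1.0 on both measures. The observed 0.875 and 0.933333 instead match counting characters as multisets, where `phillips` has one extra `l`: Jaccard = 7/8 and Sørensen-Dice = 2 × 7 / (8 + 7) = 14/15. The sketch below implements that multiset interpretation with `collections.Counter`. It is inferred from the output table, not taken from the package's documentation or source, and the function names are hypothetical. It covers non-null strings only; nulls are handled as the table shows.

```python
from collections import Counter


def _multiset_overlap(a: str, b: str) -> int:
    """Characters shared by both strings, counting repeats (minimum of the counts)."""
    return sum((Counter(a) & Counter(b)).values())


def jaccard_similarity(a: str, b: str) -> float:
    """Intersection size / union size over character multisets (inferred interpretation)."""
    if not a and not b:
        return 1.0  # two empty strings score 1.0, as in the table
    union = sum((Counter(a) | Counter(b)).values())  # maximum of the counts per character
    return _multiset_overlap(a, b) / union


def sorensen_dice_similarity(a: str, b: str) -> float:
    """2 * intersection size / (len(a) + len(b)) over character multisets (inferred)."""
    if not a and not b:
        return 1.0
    return 2 * _multiset_overlap(a, b) / (len(a) + len(b))


print(set("phillips") == set("philips"))                          # True: plain sets can't tell them apart
print(jaccard_similarity("phillips", "philips"))                  # 0.875
print(round(sorensen_dice_similarity("phillips", "philips"), 6))  # 0.933333
```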


            
