<a href="https://pypi.org/project/polars-strsim/">
<img src="https://img.shields.io/pypi/v/polars-strsim.svg" alt="PyPI Latest Release"/>
</a>
# String Similarity Measures for Polars
This package provides Python bindings for computing string similarity measures directly on a Polars DataFrame. All measures are implemented in Rust and computed in parallel.
The following similarity measures are implemented:
- Levenshtein
- Jaro
- Jaro-Winkler
- Jaccard
- Sørensen-Dice
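Each measure returns a value normalized to the range 0.0 to 1.0 (inclusive), where 0.0 means the two strings are maximally different and 1.0 means they are maximally similar. As the example under "Using the Library" shows, two empty strings score 1.0, and a null in either input yields a null result.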
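To make the normalization concrete, here is a minimal pure-Python sketch of a normalized Levenshtein similarity. It is not the library's Rust implementation. The function name `levenshtein_similarity` is hypothetical, and dividing the edit distance by the length of the longer string is an assumption, chosen because it reproduces the `levenshtein` values in the example output.

```python
def levenshtein_similarity(a: str, b: str) -> float:
    """Normalized Levenshtein similarity in [0.0, 1.0] (illustrative sketch)."""
    if not a and not b:
        return 1.0  # two empty strings are treated as identical

    # Classic dynamic-programming edit distance, keeping one row at a time.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    distance = previous[-1]

    # Assumption: normalize by the length of the longer string.
    return 1.0 - distance / max(len(a), len(b))


print(levenshtein_similarity("phillips", "philips"))
```

"phillips" and "philips" differ by one deletion, so the distance is 1 and the similarity is 1 - 1/8.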
## Installing the Library
### With pip
```bash
pip install polars-strsim
```
### From Source
To build and install this library from source, first ensure you have [cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html) installed. You will also need maturin, which you can install via `pip install 'maturin[patchelf]'`.
You can then install polars-strsim into your current Python environment by running `maturin develop --release`.
## Using the Library
**Input:**
```python
import polars as pl
from polars_strsim import levenshtein, jaro, jaro_winkler, jaccard, sorensen_dice

# Pairs cover identical, near-identical, empty, and null inputs.
df = pl.DataFrame(
    {
        "name_a": ["phillips", "phillips", "", "", None, None],
        "name_b": ["phillips", "philips", "phillips", "", "phillips", None],
    }
).with_columns(
    levenshtein=levenshtein("name_a", "name_b"),
    jaro=jaro("name_a", "name_b"),
    jaro_winkler=jaro_winkler("name_a", "name_b"),
    jaccard=jaccard("name_a", "name_b"),
    sorensen_dice=sorensen_dice("name_a", "name_b"),
)

with pl.Config(ascii_tables=True):
    print(df)
```
**Output:**
```
shape: (6, 7)
+----------+----------+-------------+----------+--------------+---------+---------------+
| name_a   | name_b   | levenshtein | jaro     | jaro_winkler | jaccard | sorensen_dice |
| ---      | ---      | ---         | ---      | ---          | ---     | ---           |
| str      | str      | f64         | f64      | f64          | f64     | f64           |
+=======================================================================================+
| phillips | phillips | 1.0         | 1.0      | 1.0          | 1.0     | 1.0           |
| phillips | philips  | 0.875       | 0.958333 | 0.975        | 0.875   | 0.933333      |
|          | phillips | 0.0         | 0.0      | 0.0          | 0.0     | 0.0           |
|          |          | 1.0         | 1.0      | 1.0          | 1.0     | 1.0           |
| null     | phillips | null        | null     | null         | null    | null          |
| null     | null     | null        | null     | null         | null    | null          |
+----------+----------+-------------+----------+--------------+---------+---------------+
```
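The `jaccard` and `sorensen_dice` values in the second row (0.875 and 0.933333) are consistent with comparing the two strings as multisets of characters. This is an inference from the output above, not a documented guarantee of the library. The sketch below reproduces those two numbers in pure Python under that assumption; the function names `jaccard_multiset` and `sorensen_dice_multiset` are hypothetical.

```python
from collections import Counter


def jaccard_multiset(a: str, b: str) -> float:
    """Jaccard similarity over character multisets (assumed tokenization)."""
    counts_a, counts_b = Counter(a), Counter(b)
    intersection = sum((counts_a & counts_b).values())  # per-character minimum
    union = sum((counts_a | counts_b).values())         # per-character maximum
    return 1.0 if union == 0 else intersection / union


def sorensen_dice_multiset(a: str, b: str) -> float:
    """Sørensen-Dice similarity over character multisets (assumed tokenization)."""
    counts_a, counts_b = Counter(a), Counter(b)
    intersection = sum((counts_a & counts_b).values())
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2 * intersection / total


print(jaccard_multiset("phillips", "philips"))
print(round(sorensen_dice_multiset("phillips", "philips"), 6))
```

Here "phillips" has 8 characters and "philips" has 7, and they share 7 characters counted with multiplicity, giving 7/8 for Jaccard and 14/15 for Sørensen-Dice.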
Raw data
{
"_id": null,
"home_page": null,
"name": "polars-strsim",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "polars-extension, string-similarity",
"author": "Jeremy Foxcroft",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/cd/0a/2f7dac45cfebc9372faabd183f52f05a7b5349722666f492056d64bc8e1f/polars_strsim-0.2.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Polars extension for string similarity",
"version": "0.2.3",
"project_urls": {
"Issues": "https://github.com/foxcroftjn/polars-strsim/issues",
"Repository": "https://github.com/foxcroftjn/polars-strsim"
},
"split_keywords": [
"polars-extension",
" string-similarity"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "3c2610dcc77881417ed1fb349b19785fc8cd041ae1d4c9fea0e493a45563cee6",
"md5": "39b2416f751f1fa79d2594f0576e0053",
"sha256": "099b41cecdaa6bf70dc70bb1bc657303cfa7d84ad662da7acc06d0bc393ff88f"
},
"downloads": -1,
"filename": "polars_strsim-0.2.3-cp38-abi3-macosx_10_12_x86_64.whl",
"has_sig": false,
"md5_digest": "39b2416f751f1fa79d2594f0576e0053",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 3296707,
"upload_time": "2024-10-03T19:42:23",
"upload_time_iso_8601": "2024-10-03T19:42:23.739630Z",
"url": "https://files.pythonhosted.org/packages/3c/26/10dcc77881417ed1fb349b19785fc8cd041ae1d4c9fea0e493a45563cee6/polars_strsim-0.2.3-cp38-abi3-macosx_10_12_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "22fd9c150b8efd43baa59d1cf602c8bde240bbf863f09f5502633b3caf49d5b3",
"md5": "e07c5cf3ac910b40adbef3c81f853bb8",
"sha256": "1c12eefd99d0d6f31748a4d2d0fc787dd7e4f2a99c2a4bffd4f76a78baa829dc"
},
"downloads": -1,
"filename": "polars_strsim-0.2.3-cp38-abi3-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "e07c5cf3ac910b40adbef3c81f853bb8",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 3124560,
"upload_time": "2024-10-03T19:42:21",
"upload_time_iso_8601": "2024-10-03T19:42:21.732847Z",
"url": "https://files.pythonhosted.org/packages/22/fd/9c150b8efd43baa59d1cf602c8bde240bbf863f09f5502633b3caf49d5b3/polars_strsim-0.2.3-cp38-abi3-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "8af4ac701f2dd822a4505e9ec04abb38ff69fa360ab3c8ea49112605532e6fb0",
"md5": "a72be08fa22ab0854520f8957974f708",
"sha256": "f993301f9390bd80880622ba09bc6b53c81b495b7bd797e01481b63964da3de7"
},
"downloads": -1,
"filename": "polars_strsim-0.2.3-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"has_sig": false,
"md5_digest": "a72be08fa22ab0854520f8957974f708",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 3821788,
"upload_time": "2024-10-03T19:42:12",
"upload_time_iso_8601": "2024-10-03T19:42:12.201905Z",
"url": "https://files.pythonhosted.org/packages/8a/f4/ac701f2dd822a4505e9ec04abb38ff69fa360ab3c8ea49112605532e6fb0/polars_strsim-0.2.3-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "72e8a76c3cd3c4a1e0b1b79cf9560ce1c039694b8ad97b21cdce278533b1ac5b",
"md5": "89f7f96b3d3427c787ad3ca42b2c04e9",
"sha256": "ce09bd951b6302b1fa904cb5d22870dfcb4092b115501cadb6a0a94ae5c0ff1f"
},
"downloads": -1,
"filename": "polars_strsim-0.2.3-cp38-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl",
"has_sig": false,
"md5_digest": "89f7f96b3d3427c787ad3ca42b2c04e9",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 3982552,
"upload_time": "2024-10-03T19:42:14",
"upload_time_iso_8601": "2024-10-03T19:42:14.404673Z",
"url": "https://files.pythonhosted.org/packages/72/e8/a76c3cd3c4a1e0b1b79cf9560ce1c039694b8ad97b21cdce278533b1ac5b/polars_strsim-0.2.3-cp38-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "6ebdc8fe9f8f9de7828eb9194d570d2288a5f3a3dfb1dae1b1ee1eae46db7026",
"md5": "3ef389293c76ad12013a47e84ee9e8a8",
"sha256": "f05c6eaedc91ab657de84abc13796c3bf4e47ff304be10fb137db8db71fa9eb8"
},
"downloads": -1,
"filename": "polars_strsim-0.2.3-cp38-abi3-manylinux_2_17_i686.manylinux2014_i686.whl",
"has_sig": false,
"md5_digest": "3ef389293c76ad12013a47e84ee9e8a8",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 4286243,
"upload_time": "2024-10-03T19:42:16",
"upload_time_iso_8601": "2024-10-03T19:42:16.668271Z",
"url": "https://files.pythonhosted.org/packages/6e/bd/c8fe9f8f9de7828eb9194d570d2288a5f3a3dfb1dae1b1ee1eae46db7026/polars_strsim-0.2.3-cp38-abi3-manylinux_2_17_i686.manylinux2014_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "223b7b235b457075aa8ab63a251d33046301678f2af0040a1018c1659c5da941",
"md5": "85730d37a84eac64bc425a50ad2d7438",
"sha256": "a811fd2b169e2c4f732d087303d1038d84ba309f4debb7852f29f2487a3f8a7b"
},
"downloads": -1,
"filename": "polars_strsim-0.2.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "85730d37a84eac64bc425a50ad2d7438",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 3961747,
"upload_time": "2024-10-03T19:42:19",
"upload_time_iso_8601": "2024-10-03T19:42:19.221853Z",
"url": "https://files.pythonhosted.org/packages/22/3b/7b235b457075aa8ab63a251d33046301678f2af0040a1018c1659c5da941/polars_strsim-0.2.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "a4b48f8f25245d3669b11e2615e5ba1496675d3e798bc61e0b54506ae4fa3db6",
"md5": "7ea49d97ccd30034e180dd4520818286",
"sha256": "dbb5afea1d06e8934d579e53deda6e883a59d28558336a339090bda50bf2022a"
},
"downloads": -1,
"filename": "polars_strsim-0.2.3-cp38-abi3-musllinux_1_2_aarch64.whl",
"has_sig": false,
"md5_digest": "7ea49d97ccd30034e180dd4520818286",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 3876065,
"upload_time": "2024-10-03T19:42:26",
"upload_time_iso_8601": "2024-10-03T19:42:26.112586Z",
"url": "https://files.pythonhosted.org/packages/a4/b4/8f8f25245d3669b11e2615e5ba1496675d3e798bc61e0b54506ae4fa3db6/polars_strsim-0.2.3-cp38-abi3-musllinux_1_2_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "e6bcdd7ae1edf72fbcbe261a39e19e29f1e3360addfe9e21bf18d09a6a9647d0",
"md5": "a61617a9b36d01fd05e94b3396b45ad9",
"sha256": "bfcd90efdcb54016caa119ea8c651df27096f3b2f9a0fda0f2faa2b150861203"
},
"downloads": -1,
"filename": "polars_strsim-0.2.3-cp38-abi3-musllinux_1_2_armv7l.whl",
"has_sig": false,
"md5_digest": "a61617a9b36d01fd05e94b3396b45ad9",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 4124209,
"upload_time": "2024-10-03T19:42:28",
"upload_time_iso_8601": "2024-10-03T19:42:28.807214Z",
"url": "https://files.pythonhosted.org/packages/e6/bc/dd7ae1edf72fbcbe261a39e19e29f1e3360addfe9e21bf18d09a6a9647d0/polars_strsim-0.2.3-cp38-abi3-musllinux_1_2_armv7l.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "2713b71b1f6af4d9dfc283dfafc881c8dcbb42e9706d77a63d91d2ba9bc18b85",
"md5": "afa42deeb81eb37e080ed40fa9c2e463",
"sha256": "f308588690b0a3bf4a4347d6ca7cdd4578f6ea6c290bd83572dcf89b06331853"
},
"downloads": -1,
"filename": "polars_strsim-0.2.3-cp38-abi3-musllinux_1_2_i686.whl",
"has_sig": false,
"md5_digest": "afa42deeb81eb37e080ed40fa9c2e463",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 4182439,
"upload_time": "2024-10-03T19:42:31",
"upload_time_iso_8601": "2024-10-03T19:42:31.115022Z",
"url": "https://files.pythonhosted.org/packages/27/13/b71b1f6af4d9dfc283dfafc881c8dcbb42e9706d77a63d91d2ba9bc18b85/polars_strsim-0.2.3-cp38-abi3-musllinux_1_2_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "e3293df8d42ba9269e99db76c309bf2c9da88b4bdc16b7239b594d725823f370",
"md5": "f35873aa0e02be45dc38394314cf2129",
"sha256": "22e7c8afed562f7461e5c2e31c35184f024dd730f4477c5f3325d7beb74b5d52"
},
"downloads": -1,
"filename": "polars_strsim-0.2.3-cp38-abi3-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "f35873aa0e02be45dc38394314cf2129",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 4054099,
"upload_time": "2024-10-03T19:42:33",
"upload_time_iso_8601": "2024-10-03T19:42:33.769991Z",
"url": "https://files.pythonhosted.org/packages/e3/29/3df8d42ba9269e99db76c309bf2c9da88b4bdc16b7239b594d725823f370/polars_strsim-0.2.3-cp38-abi3-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "233d9d68fa5a272989d063f174c81f888a2e30886e55edba7c3209338a3aaa0f",
"md5": "f809380ce5a5fa47b1292685c11357d2",
"sha256": "33e9e760558dde60296dba2893bf5001afa9b47bfc5bfddd081fb75a6d571401"
},
"downloads": -1,
"filename": "polars_strsim-0.2.3-cp38-abi3-win32.whl",
"has_sig": false,
"md5_digest": "f809380ce5a5fa47b1292685c11357d2",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 2892703,
"upload_time": "2024-10-03T19:42:39",
"upload_time_iso_8601": "2024-10-03T19:42:39.033571Z",
"url": "https://files.pythonhosted.org/packages/23/3d/9d68fa5a272989d063f174c81f888a2e30886e55edba7c3209338a3aaa0f/polars_strsim-0.2.3-cp38-abi3-win32.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "b256e8ba294ed528a02bfaeecf83f8b732c3b230545f243d3b1b9044e72ef3f5",
"md5": "651fce92976016f9f781c90e8ad3278a",
"sha256": "12364251d8584cf0a42faf03a78c2ae5dc19a23fba74674ce3275d9b3475a193"
},
"downloads": -1,
"filename": "polars_strsim-0.2.3-cp38-abi3-win_amd64.whl",
"has_sig": false,
"md5_digest": "651fce92976016f9f781c90e8ad3278a",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.8",
"size": 3281660,
"upload_time": "2024-10-03T19:42:36",
"upload_time_iso_8601": "2024-10-03T19:42:36.777919Z",
"url": "https://files.pythonhosted.org/packages/b2/56/e8ba294ed528a02bfaeecf83f8b732c3b230545f243d3b1b9044e72ef3f5/polars_strsim-0.2.3-cp38-abi3-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "cd0a2f7dac45cfebc9372faabd183f52f05a7b5349722666f492056d64bc8e1f",
"md5": "8041f1243e80e57a8e942a6c3897c80b",
"sha256": "3e92bc81c933867e3e812a7000a51bc830d78377e079e0fc98bb26ad022879e3"
},
"downloads": -1,
"filename": "polars_strsim-0.2.3.tar.gz",
"has_sig": false,
"md5_digest": "8041f1243e80e57a8e942a6c3897c80b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 31358,
"upload_time": "2024-10-03T19:42:35",
"upload_time_iso_8601": "2024-10-03T19:42:35.176365Z",
"url": "https://files.pythonhosted.org/packages/cd/0a/2f7dac45cfebc9372faabd183f52f05a7b5349722666f492056d64bc8e1f/polars_strsim-0.2.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-03 19:42:35",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "foxcroftjn",
"github_project": "polars-strsim",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "polars-strsim"
}