<p align="center">
<a href="https://github.com/Intsights/fastzy">
<img src="https://raw.githubusercontent.com/Intsights/fastzy/master/images/logo.png" alt="Logo">
</a>
<h3 align="center">
Python library, written in Rust, for fast fuzzy search over large files
</h3>
</p>

![license](https://img.shields.io/badge/MIT-License-blue)
![Python](https://img.shields.io/badge/Python-3.7%20%7C%203.8%20%7C%203.9%20%7C%203.10%20%7C%203.11-blue)
![Build](https://github.com/Intsights/fastzy/workflows/Build/badge.svg)
[![PyPi](https://img.shields.io/pypi/v/fastzy.svg)](https://pypi.org/project/fastzy/)
## Table of Contents
- [Table of Contents](#table-of-contents)
- [About The Project](#about-the-project)
  - [Built With](#built-with)
  - [Performance](#performance)
  - [Installation](#installation)
- [Usage](#usage)
- [License](#license)
- [Contact](#contact)
## About The Project
Fastzy is a library written in Rust that searches a file for lines within a given Levenshtein distance of a pattern. It measures the distance with mbleven's algorithm; when the requested distance exceeds 3, where mbleven becomes slower, it uses the Wagner-Fischer algorithm instead. The library loads the whole file into memory and builds a lightweight index keyed by line length. Since two strings whose lengths differ by more than the maximum distance can never be within that distance, only lines of candidate lengths are compared, rather than every line in the file.
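The length index described above can be illustrated with a minimal pure-Python sketch. This is not fastzy's Rust implementation: `LengthIndexedSearcher` and `wagner_fischer` are hypothetical names, the sketch takes a list of lines (for example from `open(path).read().splitlines()`) instead of a file path, and it uses Wagner-Fischer for every comparison rather than switching between mbleven and Wagner-Fischer.

```python
from collections import defaultdict


def wagner_fischer(a: str, b: str) -> int:
    """Levenshtein distance via the classic Wagner-Fischer DP, O(len(a) * len(b))."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            )
        prev = curr
    return prev[-1]


class LengthIndexedSearcher:
    """Hypothetical in-memory searcher that buckets lines by length."""

    def __init__(self, lines):
        # Index: line length -> lines of that length, in input order.
        self.buckets = defaultdict(list)
        for line in lines:
            self.buckets[len(line)].append(line)

    def search(self, pattern: str, max_distance: int) -> list:
        # |len(a) - len(b)| is a lower bound on the Levenshtein distance,
        # so only buckets inside this length window can hold matches.
        low = max(0, len(pattern) - max_distance)
        high = len(pattern) + max_distance
        matches = []
        for length in range(low, high + 1):
            for line in self.buckets.get(length, []):
                if wagner_fischer(pattern, line) <= max_distance:
                    matches.append(line)
        return matches


if __name__ == "__main__":
    searcher = LengthIndexedSearcher(["test", "texts", "next", "textbook", "tax"])
    print(searcher.search("text", max_distance=1))  # ['test', 'next', 'texts']
```

Here "textbook" is never compared at all, because its length (8) lies outside the window 3 to 5. Results in this sketch come back grouped by length, a side effect of the bucketing.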
### Built With
* [mbleven](https://github.com/fujimotos/mbleven)
* [PyO3](https://github.com/PyO3/pyo3)
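mbleven decides whether two strings are within `k` edits by trying a small set of edit "models" instead of filling a full dynamic-programming table. Below is a simplified Python sketch of that idea, assuming we simply enumerate all `3**k` sequences of substitute/delete/insert operations; the real mbleven prunes this set with precomputed tables, and `within_distance` is a hypothetical name, not part of fastzy's API.

```python
from itertools import product


def _model_cost(a: str, b: str, model: tuple) -> int:
    """Walk a and b, spending one edit from `model` at each mismatch.

    Returns the number of edits this path needs, or len(model) + 1 if the
    model runs out of operations before the walk finishes.
    """
    i = j = used = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Matching equal characters greedily never increases the distance.
            i += 1
            j += 1
            continue
        if used == len(model):
            return len(model) + 1  # model exhausted: this path needs more than k edits
        op = model[used]
        used += 1
        if op == "r":    # substitute a[i] with b[j]
            i += 1
            j += 1
        elif op == "d":  # delete a[i]
            i += 1
        else:            # "i": insert b[j]
            j += 1
    # Any leftover characters on either side must be deleted or inserted.
    return used + (len(a) - i) + (len(b) - j)


def within_distance(a: str, b: str, k: int) -> bool:
    """Return True if levenshtein(a, b) <= k, mbleven-style (small k only)."""
    if abs(len(a) - len(b)) > k:
        return False
    # Try every sequence of k edit operations ("models"); there are 3**k of them.
    return any(_model_cost(a, b, model) <= k for model in product("rdi", repeat=k))
```

Because the number of models grows as `3**k`, this approach only pays off for small distances, which is consistent with fastzy switching to Wagner-Fischer once the requested distance exceeds 3.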
### Performance
| Library | Function | Time |
| ------------- | ------------- | ------------- |
| [polyleven](https://github.com/fujimotos/polyleven) | `polyleven.levenshtein('text')` | 8.48s |
| [fastzy](https://github.com/Intsights/fastzy) | `searcher.search('text')` | 0.003s |
### Installation
```sh
pip3 install fastzy
```
## Usage
```python
import fastzy

# open a file and index it in memory
searcher = fastzy.Searcher(
    file_path='input_text_file.txt',
    separator='',
)

# find every line within a Levenshtein distance of 1 from 'text'
results = searcher.search(
    pattern='text',
    max_distance=1,
)
print(results)  # e.g. ['test', 'texts', 'next'], depending on the file's contents
```
## License
Distributed under the MIT License. See `LICENSE` for more information.
## Contact
Gal Ben David - gal@intsights.com

Project Link: [https://github.com/Intsights/fastzy](https://github.com/Intsights/fastzy)
Raw data
{
"_id": null,
"home_page": "https://github.com/intsights/fastzy",
"name": "fastzy",
"maintainer": null,
"docs_url": null,
"requires_python": "",
"maintainer_email": null,
"keywords": "fuzzy,levenshtein,rust",
"author": "Gal Ben David <gal@intsights.com>",
"author_email": "Gal Ben David <gal@intsights.com>",
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Python library for fast fuzzy search over a big file written in Rust",
"version": "0.5.0",
"split_keywords": [
"fuzzy",
"levenshtein",
"rust"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "5cc72ce68f950af59f60125e47753fa23b498beae601b26b6b858fe62705dcc5",
"md5": "6beeab9d5e6493a0bba94a87bcd33ece",
"sha256": "256a3137e90de4345d8ab51ab1356321f44f3858d14f014d9cf26514f704905f"
},
"downloads": -1,
"filename": "fastzy-0.5.0-cp310-cp310-macosx_10_7_x86_64.whl",
"has_sig": false,
"md5_digest": "6beeab9d5e6493a0bba94a87bcd33ece",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": null,
"size": 215938,
"upload_time": "2023-01-09T12:19:41",
"upload_time_iso_8601": "2023-01-09T12:19:41.783599Z",
"url": "https://files.pythonhosted.org/packages/5c/c7/2ce68f950af59f60125e47753fa23b498beae601b26b6b858fe62705dcc5/fastzy-0.5.0-cp310-cp310-macosx_10_7_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "a67867d59961c30574cb9da3fcf2893be673b1e5a5178174d251f81412279048",
"md5": "08fb3ab318206bf0401a9ec24f755768",
"sha256": "a91c22d1d521aa6d28eb96488ad2b460d5a73aa87c9458f4cc90ede6cfc189d8"
},
"downloads": -1,
"filename": "fastzy-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "08fb3ab318206bf0401a9ec24f755768",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": null,
"size": 221745,
"upload_time": "2023-01-09T12:17:40",
"upload_time_iso_8601": "2023-01-09T12:17:40.966865Z",
"url": "https://files.pythonhosted.org/packages/a6/78/67d59961c30574cb9da3fcf2893be673b1e5a5178174d251f81412279048/fastzy-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "4efe22a46e4b03e982d25e9b00f9f7535ba1219ff0be151b6b07b669a487f303",
"md5": "dc737352e9139b4305deaad84d2aea96",
"sha256": "2f560fb9a05be8ed14434cb3f2c976eb9fcc54bfa3a4745a79430391cf9865ba"
},
"downloads": -1,
"filename": "fastzy-0.5.0-cp310-none-win_amd64.whl",
"has_sig": false,
"md5_digest": "dc737352e9139b4305deaad84d2aea96",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": null,
"size": 153135,
"upload_time": "2023-01-09T12:18:05",
"upload_time_iso_8601": "2023-01-09T12:18:05.572328Z",
"url": "https://files.pythonhosted.org/packages/4e/fe/22a46e4b03e982d25e9b00f9f7535ba1219ff0be151b6b07b669a487f303/fastzy-0.5.0-cp310-none-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "d3697e25721a89697c5f564b531183869943c372858f3e50ca8567be6bc768eb",
"md5": "7d845d6af2bc09311b5dabfbeea0814e",
"sha256": "fc12465aeba29c26b33935fe8b54b8d63eda467e4ee69c9a022b4dfc63e4214f"
},
"downloads": -1,
"filename": "fastzy-0.5.0-cp311-cp311-macosx_10_7_x86_64.whl",
"has_sig": false,
"md5_digest": "7d845d6af2bc09311b5dabfbeea0814e",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": null,
"size": 215936,
"upload_time": "2023-01-09T12:17:44",
"upload_time_iso_8601": "2023-01-09T12:17:44.323650Z",
"url": "https://files.pythonhosted.org/packages/d3/69/7e25721a89697c5f564b531183869943c372858f3e50ca8567be6bc768eb/fastzy-0.5.0-cp311-cp311-macosx_10_7_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "8571cb2a9a005a7ff061e81791fe549ec445ff8e2d1c8f3e2bd1366280b41e64",
"md5": "b9645841d23f36bb1655baace82401bc",
"sha256": "71888737fb918f75751fbb0dd0346bd579da705b6326414c0d8d70af3f7f6069"
},
"downloads": -1,
"filename": "fastzy-0.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "b9645841d23f36bb1655baace82401bc",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": null,
"size": 221745,
"upload_time": "2023-01-09T12:18:19",
"upload_time_iso_8601": "2023-01-09T12:18:19.554468Z",
"url": "https://files.pythonhosted.org/packages/85/71/cb2a9a005a7ff061e81791fe549ec445ff8e2d1c8f3e2bd1366280b41e64/fastzy-0.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "a0de2a780dfef65aebcbf9f4177dd0af626e634fbd05aa2eadcce912ca798f2b",
"md5": "b31d1e869344e0ec21777ed29ea3dc83",
"sha256": "cc7c0d574aed292a2ee6e0fe4dca6b96e88b4ee1098279b444aee566065aacc7"
},
"downloads": -1,
"filename": "fastzy-0.5.0-cp311-none-win_amd64.whl",
"has_sig": false,
"md5_digest": "b31d1e869344e0ec21777ed29ea3dc83",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": null,
"size": 153137,
"upload_time": "2023-01-09T12:18:34",
"upload_time_iso_8601": "2023-01-09T12:18:34.998278Z",
"url": "https://files.pythonhosted.org/packages/a0/de/2a780dfef65aebcbf9f4177dd0af626e634fbd05aa2eadcce912ca798f2b/fastzy-0.5.0-cp311-none-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "7b3f5350e8a4799113562e33d6c358e5d96c1b0fa1bd59cb1b43aeb4985fa249",
"md5": "58ea76b638d5dfe251c316504dc2fea9",
"sha256": "1c8f216ed67bb46b12a8d9ad2500e3ea64d01796000abbc9409883a62c45ef88"
},
"downloads": -1,
"filename": "fastzy-0.5.0-cp37-cp37m-macosx_10_7_x86_64.whl",
"has_sig": false,
"md5_digest": "58ea76b638d5dfe251c316504dc2fea9",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": null,
"size": 215943,
"upload_time": "2023-01-09T12:17:49",
"upload_time_iso_8601": "2023-01-09T12:17:49.661237Z",
"url": "https://files.pythonhosted.org/packages/7b/3f/5350e8a4799113562e33d6c358e5d96c1b0fa1bd59cb1b43aeb4985fa249/fastzy-0.5.0-cp37-cp37m-macosx_10_7_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "f7400706a613169ec3c9a21c37a8e3fa553a4301eb966a196c4ad91aa9647070",
"md5": "97a5441cac3d71f76dd131f0d28341e0",
"sha256": "c2ecdfd63092879a8df7ac115f972867c56e4041fb6b2280085e0a3d618ab528"
},
"downloads": -1,
"filename": "fastzy-0.5.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "97a5441cac3d71f76dd131f0d28341e0",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": null,
"size": 221682,
"upload_time": "2023-01-09T12:17:41",
"upload_time_iso_8601": "2023-01-09T12:17:41.487029Z",
"url": "https://files.pythonhosted.org/packages/f7/40/0706a613169ec3c9a21c37a8e3fa553a4301eb966a196c4ad91aa9647070/fastzy-0.5.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "c7bd437148f2aa22fff84f14cf0c859cc2bc1444b59427d82d73a59d1a7894b7",
"md5": "b04ac5d22266a548c017e21f22584f8a",
"sha256": "3734a947c129ba384e86972ae4a3a8a25a04eb66d8d0d8068760159c93cf6f52"
},
"downloads": -1,
"filename": "fastzy-0.5.0-cp37-none-win_amd64.whl",
"has_sig": false,
"md5_digest": "b04ac5d22266a548c017e21f22584f8a",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": null,
"size": 153130,
"upload_time": "2023-01-09T12:18:26",
"upload_time_iso_8601": "2023-01-09T12:18:26.265483Z",
"url": "https://files.pythonhosted.org/packages/c7/bd/437148f2aa22fff84f14cf0c859cc2bc1444b59427d82d73a59d1a7894b7/fastzy-0.5.0-cp37-none-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "2e22103150e4692048e7074c859fe85f29f12aaabc50684cd4a701c5ff9d3210",
"md5": "376e61f2501759d566ba96db457e5e7e",
"sha256": "dda6b3127f1db282e88b0c4bc311b5b836c68e3d18365ab90ae82cdf579698ec"
},
"downloads": -1,
"filename": "fastzy-0.5.0-cp38-cp38-macosx_10_7_x86_64.whl",
"has_sig": false,
"md5_digest": "376e61f2501759d566ba96db457e5e7e",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": null,
"size": 215884,
"upload_time": "2023-01-09T12:17:22",
"upload_time_iso_8601": "2023-01-09T12:17:22.256648Z",
"url": "https://files.pythonhosted.org/packages/2e/22/103150e4692048e7074c859fe85f29f12aaabc50684cd4a701c5ff9d3210/fastzy-0.5.0-cp38-cp38-macosx_10_7_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "852adb6ede2b86b3b9394057fde58f84860ceb9b08c581c89c8806729c89921a",
"md5": "3679b23ede8e17eca140fd0927521c41",
"sha256": "ab4574d8bfc63c62c3402a62a8bcd54e8338a1b72d89f097aadff4110840ec87"
},
"downloads": -1,
"filename": "fastzy-0.5.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "3679b23ede8e17eca140fd0927521c41",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": null,
"size": 221684,
"upload_time": "2023-01-09T12:17:44",
"upload_time_iso_8601": "2023-01-09T12:17:44.999018Z",
"url": "https://files.pythonhosted.org/packages/85/2a/db6ede2b86b3b9394057fde58f84860ceb9b08c581c89c8806729c89921a/fastzy-0.5.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "c44e38300d3191a7f7da669709821e20b5f0dc279f4b3791eb46ef2336b76953",
"md5": "6d9d9d6c16af643354b7bdd51a9c04dd",
"sha256": "7cb80becdeb2ddbb6a84c0b3d4fcb1d132702b04cc083e6bf66e5462398fe890"
},
"downloads": -1,
"filename": "fastzy-0.5.0-cp38-none-win_amd64.whl",
"has_sig": false,
"md5_digest": "6d9d9d6c16af643354b7bdd51a9c04dd",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": null,
"size": 153159,
"upload_time": "2023-01-09T12:18:31",
"upload_time_iso_8601": "2023-01-09T12:18:31.833651Z",
"url": "https://files.pythonhosted.org/packages/c4/4e/38300d3191a7f7da669709821e20b5f0dc279f4b3791eb46ef2336b76953/fastzy-0.5.0-cp38-none-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "a44287db8fe92663fb65e3d5a514301abe968dff64a0712f0004454ff5c4dcdf",
"md5": "6a110bbb8bafb831cdf28226be0cbcd5",
"sha256": "4fdf38e3dfe140b86d3313e78435cfe33194cb12bc286504bff0c6c7aa1f3e04"
},
"downloads": -1,
"filename": "fastzy-0.5.0-cp39-cp39-macosx_10_7_x86_64.whl",
"has_sig": false,
"md5_digest": "6a110bbb8bafb831cdf28226be0cbcd5",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": null,
"size": 215979,
"upload_time": "2023-01-09T12:17:27",
"upload_time_iso_8601": "2023-01-09T12:17:27.357347Z",
"url": "https://files.pythonhosted.org/packages/a4/42/87db8fe92663fb65e3d5a514301abe968dff64a0712f0004454ff5c4dcdf/fastzy-0.5.0-cp39-cp39-macosx_10_7_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "d5684c0e69d1de88f3630135ea803f43017095a0d70c50cabd73952f9c8ba2d7",
"md5": "c887c9fff1f95cbf13779e6e42797e3e",
"sha256": "a428304c8b0c469a065f41dcfb037cb065a8ba2a8be29bcf654b2be00ced423c"
},
"downloads": -1,
"filename": "fastzy-0.5.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "c887c9fff1f95cbf13779e6e42797e3e",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": null,
"size": 221800,
"upload_time": "2023-01-09T12:18:02",
"upload_time_iso_8601": "2023-01-09T12:18:02.767028Z",
"url": "https://files.pythonhosted.org/packages/d5/68/4c0e69d1de88f3630135ea803f43017095a0d70c50cabd73952f9c8ba2d7/fastzy-0.5.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "0b434aca24c67c7b866aaee5b1033d8a6f67d2e3920a04662e487ee285caaaa8",
"md5": "4c1edb463095fd25d04fc9249a89fc84",
"sha256": "02a6c17cd61e5608f0239d64c181ea74d9b061f0eec632ca0941515dd9622b01"
},
"downloads": -1,
"filename": "fastzy-0.5.0-cp39-none-win_amd64.whl",
"has_sig": false,
"md5_digest": "4c1edb463095fd25d04fc9249a89fc84",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": null,
"size": 153135,
"upload_time": "2023-01-09T12:18:23",
"upload_time_iso_8601": "2023-01-09T12:18:23.616239Z",
"url": "https://files.pythonhosted.org/packages/0b/43/4aca24c67c7b866aaee5b1033d8a6f67d2e3920a04662e487ee285caaaa8/fastzy-0.5.0-cp39-none-win_amd64.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-01-09 12:19:41",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "intsights",
"github_project": "fastzy",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "fastzy"
}