# Fuzzy Comparison Utilities for DataFrame Columns
## pip install fuzzypandaswuzzy
#### Tested against Windows 10 / Python 3.10 / Anaconda
This module provides a function to perform fuzzy comparison between two columns of a DataFrame using the RapidFuzz library.
It also extends the DataFrame class to add a method for fuzzy comparison between two columns.

Module dependencies:

- pandas (pd)
- numpy (np)
- RapidFuzz (from rapidfuzz import process, fuzz)

Usage:

```python
import pandas as pd
from rapidfuzz import fuzz
from fuzzypandaswuzzy import pd_add_fuzzy_all

pd_add_fuzzy_all()

df = pd.read_csv(r"arcore_devicelist.csv")
df2 = df.d_fuzzy2cols(scorer=fuzz.QRatio)  # compares the first 2 columns
```

```
           aa_value1   aa_match  aa_index_v2     aa_value2
0            Mobicel  82.352943         1978    Mobicel_R1
1            Hyundai  66.666664         5425         Cunda
2               OPPO  66.666664        10102         P7PRO
3            samsung  80.000000          745      samseong
4               DEXP  66.666664         1174            EP
...              ...        ...          ...           ...
22523          TECNO  76.923080          587      TECNO-i5
22524          STYLO  83.333336         7272       STYLOF1
22525  GarantiaMOVIL  52.631580        16788        armani
22526  Cherry_Mobile  72.000000         3510  Cherry_Comet
22527         SANSUI  53.333332         3465     ASUS_P00I
```

Note:

The 'scorer' parameter in the fuzzy_compare function and d_fuzzy2cols method accepts a scoring function from the RapidFuzz library (e.g., fuzz.WRatio, fuzz.QRatio, etc.). If no scorer is specified, the default scorer used is fuzz.QRatio.

For more information on the RapidFuzz library, visit https://github.com/maxbachmann/rapidfuzz.
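To make the `aa_match`, `aa_index_v2` and `aa_value2` columns easier to read, here is a minimal pure-Python sketch of how such a score and best match could be computed. It is not this package's implementation and it does not call RapidFuzz. It assumes that `fuzz.QRatio`, when no preprocessing is applied, reduces to a normalized Indel similarity, `200 * LCS / (len(a) + len(b))`, which is consistent with the sample scores above (for example `samsung` / `samseong` gives 80.0). The names `lcs_length`, `indel_ratio` and `best_match` are hypothetical helpers introduced only for this illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Similarity in [0, 100]: 200 * LCS / (len(a) + len(b)).

    Assumption: stands in for a RapidFuzz scorer; two empty strings
    are scored as 0 in this sketch.
    """
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return 200.0 * lcs_length(a, b) / total


def best_match(value, candidates, scorer=indel_ratio):
    """Return (score, index, candidate) for the highest-scoring candidate."""
    scores = [scorer(value, c) for c in candidates]
    index = max(range(len(candidates)), key=scores.__getitem__)
    return scores[index], index, candidates[index]


# Toy "second column"; each value of the "first column" is matched against it.
candidates = ["TECNO-i5", "samseong", "Mobicel_R1"]
for value in ["Mobicel", "samsung", "TECNO"]:
    score, index, match = best_match(value, candidates)
    print(f"{value:>8}  {score:9.6f}  {index}  {match}")
```

Each printed row mirrors one row of the result frame: the value from the first column, its best score, the position of the best candidate in the second column, and that candidate. This is a quadratic loop meant for reading, not for frames with tens of thousands of rows.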
Raw data
{
"_id": null,
"home_page": "https://github.com/hansalemaos/fuzzypandaswuzzy",
"name": "fuzzypandaswuzzy",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "fuzzy,wuzzy,pandas",
"author": "Johannes Fischer",
"author_email": "aulasparticularesdealemaosp@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/f6/43/90648bb007b6e80d153d0aa6ad5dd59ef69489df695d4519e540a41190b7/fuzzypandaswuzzy-0.10.tar.gz",
"platform": null,
"description": "\r\n# Fuzzy Comparison Utilities for DataFrame Columns\r\n\r\n## pip install fuzzypandaswuzzy \r\n\r\n#### Tested against Windows 10 / Python 3.10 / Anaconda \r\n\r\n\t\r\n```python\r\n\r\nThis module provides a function to perform fuzzy comparison between two columns of a DataFrame using the RapidFuzz library.\r\nIt also extends the DataFrame class to add a method for fuzzy comparison between two columns.\r\n\r\nModule dependencies:\r\n\t- pandas (pd)\r\n\t- numpy (np)\r\n\t- RapidFuzz (from rapidfuzz import process, fuzz)\r\n\r\nUsage:\r\n\timport pandas as pd\r\n\tfrom rapidfuzz import fuzz\r\n\tfrom fuzzypandaswuzzy import pd_add_fuzzy_all\r\n\tpd_add_fuzzy_all()\r\n\r\n\tdf = pd.read_csv(r\"arcore_devicelist.csv\")\r\n\tdf2 = df.d_fuzzy2cols(scorer=fuzz.QRatio) # compares the first 2 columns\r\n\r\n\t\t\t aa_value1 aa_match aa_index_v2 aa_value2\r\n\t0 Mobicel 82.352943 1978 Mobicel_R1\r\n\t1 Hyundai 66.666664 5425 Cunda\r\n\t2 OPPO 66.666664 10102 P7PRO\r\n\t3 samsung 80.000000 745 samseong\r\n\t4 DEXP 66.666664 1174 EP\r\n\t\t\t\t ... ... ... ...\r\n\t22523 TECNO 76.923080 587 TECNO-i5\r\n\t22524 STYLO 83.333336 7272 STYLOF1\r\n\t22525 GarantiaMOVIL 52.631580 16788 armani\r\n\t22526 Cherry_Mobile 72.000000 3510 Cherry_Comet\r\n\t22527 SANSUI 53.333332 3465 ASUS_P00I\r\n\r\n\r\nNote:\r\n\tThe 'scorer' parameter in the fuzzy_compare function and d_fuzzy2cols method accepts a scoring function from the RapidFuzz library\r\n\t(e.g., fuzz.WRatio, fuzz.QRatio, etc.). If no scorer is specified, the default scorer used is fuzz.QRatio.\r\n\r\n\tFor more information on the RapidFuzz library, visit https://github.com/maxbachmann/rapidfuzz.\t\r\n```\r\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Fuzzy Comparison Utilities for DataFrame Columns",
"version": "0.10",
"project_urls": {
"Homepage": "https://github.com/hansalemaos/fuzzypandaswuzzy"
},
"split_keywords": [
"fuzzy",
"wuzzy",
"pandas"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ff0b14af787ab9f24cb1fe9912eb0e2d39fa908da779459b5e033ac981ad264b",
"md5": "8fe40de3bdb1bd036b605536f7d83ab6",
"sha256": "2b32b821a4b715d55b8a0f43fc4b290cfb109a379d2aff38d62bb7c7de59b6c4"
},
"downloads": -1,
"filename": "fuzzypandaswuzzy-0.10-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8fe40de3bdb1bd036b605536f7d83ab6",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 22625,
"upload_time": "2023-07-30T00:08:49",
"upload_time_iso_8601": "2023-07-30T00:08:49.262000Z",
"url": "https://files.pythonhosted.org/packages/ff/0b/14af787ab9f24cb1fe9912eb0e2d39fa908da779459b5e033ac981ad264b/fuzzypandaswuzzy-0.10-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f64390648bb007b6e80d153d0aa6ad5dd59ef69489df695d4519e540a41190b7",
"md5": "59a657176bfb5b926959c9913abfe86f",
"sha256": "8515cb4b94a9b3db21ad0daa5b3790d19513abdf6151ecb70058ee24be782e1e"
},
"downloads": -1,
"filename": "fuzzypandaswuzzy-0.10.tar.gz",
"has_sig": false,
"md5_digest": "59a657176bfb5b926959c9913abfe86f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 21767,
"upload_time": "2023-07-30T00:08:50",
"upload_time_iso_8601": "2023-07-30T00:08:50.767114Z",
"url": "https://files.pythonhosted.org/packages/f6/43/90648bb007b6e80d153d0aa6ad5dd59ef69489df695d4519e540a41190b7/fuzzypandaswuzzy-0.10.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-30 00:08:50",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hansalemaos",
"github_project": "fuzzypandaswuzzy",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "fuzzypandaswuzzy"
}