fuzzypandaswuzzy

Name: fuzzypandaswuzzy
Version: 0.10
Home page: https://github.com/hansalemaos/fuzzypandaswuzzy
Summary: Fuzzy Comparison Utilities for DataFrame Columns
Upload time: 2023-07-30 00:08:50
Author: Johannes Fischer
License: MIT
Keywords: fuzzy, wuzzy, pandas
Requirements: No requirements were recorded.
            
# Fuzzy Comparison Utilities for DataFrame Columns

## pip install fuzzypandaswuzzy 

#### Tested against Windows 10 / Python 3.10 / Anaconda 

	
This module provides a function that performs a fuzzy comparison between two columns of a DataFrame using the RapidFuzz library.
It also extends the DataFrame class with a method, `d_fuzzy2cols`, for the same comparison.

Module dependencies:

- pandas (pd)
- numpy (np)
- RapidFuzz (from rapidfuzz import process, fuzz)

Usage:

```python
import pandas as pd
from rapidfuzz import fuzz
from fuzzypandaswuzzy import pd_add_fuzzy_all

# Add the fuzzy comparison method to the DataFrame class.
pd_add_fuzzy_all()

df = pd.read_csv(r"arcore_devicelist.csv")
df2 = df.d_fuzzy2cols(scorer=fuzz.QRatio)  # compares the first 2 columns

# Output:
#            aa_value1   aa_match  aa_index_v2     aa_value2
# 0            Mobicel  82.352943         1978    Mobicel_R1
# 1            Hyundai  66.666664         5425         Cunda
# 2               OPPO  66.666664        10102         P7PRO
# 3            samsung  80.000000          745      samseong
# 4               DEXP  66.666664         1174            EP
# ...              ...        ...          ...           ...
# 22523          TECNO  76.923080          587      TECNO-i5
# 22524          STYLO  83.333336         7272       STYLOF1
# 22525  GarantiaMOVIL  52.631580        16788        armani
# 22526  Cherry_Mobile  72.000000         3510  Cherry_Comet
# 22527         SANSUI  53.333332         3465     ASUS_P00I
```

Note:

The `scorer` parameter of the `fuzzy_compare` function and the `d_fuzzy2cols` method accepts a scoring function from the RapidFuzz library (e.g., `fuzz.WRatio`, `fuzz.QRatio`). If no scorer is specified, `fuzz.QRatio` is used.

For more information on the RapidFuzz library, visit https://github.com/maxbachmann/rapidfuzz.
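
To make the output columns easier to read, here is a minimal pure-Python sketch of the kind of lookup the sample output suggests: for each value in the first column, find the best-scoring value in the second column and record its score and position. It is not the package's implementation and does not use RapidFuzz. The names `indel_ratio` and `best_matches` are hypothetical, and `indel_ratio` is a stand-in scorer based on the longest common subsequence, `100 * 2 * LCS / (len(a) + len(b))`, which happens to be consistent with the scores shown in the sample output (for example 82.35 for `Mobicel` and `Mobicel_R1`).

```python
from typing import Callable, List, Sequence, Tuple


def indel_ratio(a: str, b: str) -> float:
    """Stand-in scorer: 100 * 2 * LCS(a, b) / (len(a) + len(b)), case-sensitive."""
    if not a and not b:
        return 100.0
    # Dynamic-programming LCS length, keeping only the previous row.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return 100.0 * 2 * prev[-1] / (len(a) + len(b))


def best_matches(
    left: Sequence[str],
    right: Sequence[str],
    scorer: Callable[[str, str], float] = indel_ratio,
) -> List[Tuple[str, float, int, str]]:
    """For each value in `left`, return (value, score, index_in_right, matched_value)."""
    rows = []
    for value in left:
        scores = [scorer(value, candidate) for candidate in right]
        # Position of the highest score; ties resolve to the first candidate.
        best = max(range(len(right)), key=scores.__getitem__)
        rows.append((value, scores[best], best, right[best]))
    return rows


# Tiny illustrative inputs taken from the sample output above.
brands = ["Mobicel", "samsung", "STYLO"]
models = ["samseong", "STYLOF1", "Mobicel_R1"]
for row in best_matches(brands, models):
    print(row)
```

The four tuple fields mirror the `aa_value1`, `aa_match`, `aa_index_v2`, and `aa_value2` columns. This brute-force version scores every pair, so it is only meant for reading alongside the output, not for a 22,000-row file.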

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/hansalemaos/fuzzypandaswuzzy",
    "name": "fuzzypandaswuzzy",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "fuzzy,wuzzy,pandas",
    "author": "Johannes Fischer",
    "author_email": "aulasparticularesdealemaosp@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/f6/43/90648bb007b6e80d153d0aa6ad5dd59ef69489df695d4519e540a41190b7/fuzzypandaswuzzy-0.10.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Fuzzy Comparison Utilities for DataFrame Columns",
    "version": "0.10",
    "project_urls": {
        "Homepage": "https://github.com/hansalemaos/fuzzypandaswuzzy"
    },
    "split_keywords": [
        "fuzzy",
        "wuzzy",
        "pandas"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ff0b14af787ab9f24cb1fe9912eb0e2d39fa908da779459b5e033ac981ad264b",
                "md5": "8fe40de3bdb1bd036b605536f7d83ab6",
                "sha256": "2b32b821a4b715d55b8a0f43fc4b290cfb109a379d2aff38d62bb7c7de59b6c4"
            },
            "downloads": -1,
            "filename": "fuzzypandaswuzzy-0.10-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8fe40de3bdb1bd036b605536f7d83ab6",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 22625,
            "upload_time": "2023-07-30T00:08:49",
            "upload_time_iso_8601": "2023-07-30T00:08:49.262000Z",
            "url": "https://files.pythonhosted.org/packages/ff/0b/14af787ab9f24cb1fe9912eb0e2d39fa908da779459b5e033ac981ad264b/fuzzypandaswuzzy-0.10-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f64390648bb007b6e80d153d0aa6ad5dd59ef69489df695d4519e540a41190b7",
                "md5": "59a657176bfb5b926959c9913abfe86f",
                "sha256": "8515cb4b94a9b3db21ad0daa5b3790d19513abdf6151ecb70058ee24be782e1e"
            },
            "downloads": -1,
            "filename": "fuzzypandaswuzzy-0.10.tar.gz",
            "has_sig": false,
            "md5_digest": "59a657176bfb5b926959c9913abfe86f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 21767,
            "upload_time": "2023-07-30T00:08:50",
            "upload_time_iso_8601": "2023-07-30T00:08:50.767114Z",
            "url": "https://files.pythonhosted.org/packages/f6/43/90648bb007b6e80d153d0aa6ad5dd59ef69489df695d4519e540a41190b7/fuzzypandaswuzzy-0.10.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-30 00:08:50",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hansalemaos",
    "github_project": "fuzzypandaswuzzy",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "fuzzypandaswuzzy"
}
        