a-pandas-ex-fuzzymerge


| Field | Value |
| --- | --- |
| Name | a-pandas-ex-fuzzymerge |
| Version | 0.10 |
| Home page | https://github.com/hansalemaos/a_pandas_ex_fuzzymerge |
| Summary | Merges two DataFrames using fuzzy matching on specified columns |
| Upload time | 2023-10-05 10:56:12 |
| Docs URL | None |
| Author | Johannes Fischer |
| License | MIT |
| Keywords | merge, dataframe, fuzzy, rapidfuzz |
| Requirements | numexpr, numpy, pandas, rapidfuzz |
| Travis-CI | No Travis |
| Coveralls test coverage | No coveralls |
            
# Merges two DataFrames using fuzzy matching on specified columns

## Tested against Windows / Python 3.11 / Anaconda

## pip install a-pandas-ex-fuzzymerge

This function performs fuzzy matching between two DataFrames, `df1` and `df2`,
on the columns named in `left_on` and `right_on`. Fuzzy matching finds values
that are similar rather than identical, which makes it useful for joining data
with small variations such as typos or abbreviations.

Parameters:

- `df1` (DataFrame): The first DataFrame to be merged.
- `df2` (DataFrame): The second DataFrame to be merged.
- `right_on` (str): The column name in `df2` to be used for matching.
- `left_on` (str): The column name in `df1` to be used for matching.
- `usedtype` (numpy.dtype, optional): The data type to use for the distance matrix. Defaults to `np.uint8`.
- `scorer` (function, optional): The scoring function to use for fuzzy matching. Defaults to `fuzz.WRatio`.
- `concat_value` (bool, optional): Whether to add a `concat_value` column containing the similarity scores to the result DataFrame. Defaults to `True`.
- `**kwargs`: Additional keyword arguments to pass to the `pandas.merge` function.

Returns:

- DataFrame: A merged DataFrame containing the rows that matched under the specified fuzzy criteria.

Example:

```python
import numpy as np
import pandas as pd
from rapidfuzz import fuzz

from a_pandas_ex_fuzzymerge import pd_add_fuzzymerge

# Make the d_fuzzy_merge method available on DataFrames.
pd_add_fuzzymerge()

df1 = pd.read_csv(
    "https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv"
)

# Triple both frames and append a random number to every name, so the two
# Name columns are similar but no longer identical.
df2 = df1.copy()
df2 = pd.concat([df2 for _ in range(3)], ignore_index=True)
df2.Name = df2.Name + np.random.uniform(1, 2000, len(df2)).astype("U")
df1 = pd.concat([df1 for _ in range(3)], ignore_index=True)
df1.Name = df1.Name + np.random.uniform(1, 2000, len(df1)).astype("U")

df3 = df1.d_fuzzy_merge(
    df2,
    right_on="Name",
    left_on="Name",
    usedtype=np.uint8,
    scorer=fuzz.partial_ratio,
    concat_value=True,
)
print(df3)
```
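
To make the roles of `scorer`, `usedtype`, and the similarity-score column more concrete, here is a minimal sketch of one way a fuzzy merge can work. It is not the package's implementation: `similarity`, `fuzzy_merge_sketch`, `min_score`, and the toy data are hypothetical names chosen for illustration, and the standard library's `difflib.SequenceMatcher` stands in for a rapidfuzz scorer so that the snippet needs only numpy and pandas.

```python
from difflib import SequenceMatcher

import numpy as np
import pandas as pd


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score (stand-in for a rapidfuzz scorer)."""
    return round(SequenceMatcher(None, a, b).ratio() * 100)


def fuzzy_merge_sketch(left, right, left_on, right_on, min_score=80, dtype=np.uint8):
    """Hypothetical helper: pair each left row with its best-scoring right row."""
    left_vals = left[left_on].astype(str).to_numpy()
    right_vals = right[right_on].astype(str).to_numpy()

    # Full score matrix: one row per left value, one column per right value.
    scores = np.array(
        [[similarity(a, b) for b in right_vals] for a in left_vals], dtype=dtype
    )

    # For every left row, find the best-scoring right row and its score.
    best_col = scores.argmax(axis=1)
    best_score = scores[np.arange(len(left_vals)), best_col]

    # Keep only the pairs that reach the (assumed) minimum score.
    keep = best_score >= min_score
    matched_left = left.loc[keep].reset_index(drop=True)
    matched_right = right.iloc[best_col[keep]].reset_index(drop=True)

    merged = matched_left.join(matched_right, lsuffix="_x", rsuffix="_y")
    merged["score"] = best_score[keep]
    return merged


# Toy data, invented for this example.
left = pd.DataFrame({"name": ["Jon Smith", "Maria Garcia", "Zed"], "id": [1, 2, 3]})
right = pd.DataFrame(
    {"name": ["John Smith", "Maria Garcia", "Bob Stone"], "city": ["Oslo", "Lima", "Rome"]}
)

merged = fuzzy_merge_sketch(left, right, left_on="name", right_on="name")
print(merged)
```

Scores between 0 and 100 fit in `np.uint8`, so the matrix costs one byte per pair instead of the eight bytes a `float64` matrix would need, which is presumably why the package defaults `usedtype` to `np.uint8`. In the toy data, "Zed" has no sufficiently similar partner and is dropped from the result.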

            

        