healmatcher


- Name: healmatcher
- Version: 0.0.48
- Home page: https://github.com/JosephKBS/healmatcher
- Summary: Fast and simple probabilistic data matching package
- Upload time: 2024-06-20 22:31:30
- Author: Joseph Shim
- Maintainer: None
- Docs URL: None
- Requires Python: None
- License: None
- Keywords: probabilistic match, probabilistic data match, splink
- Requirements: No requirements were recorded.

# healmatcher
- `healmatcher` is a simple but fast probabilistic data matching package developed by the NYULH HEAL Lab.
- The package is optimized for matching healthcare databases (e.g. EHR), as it was designed to link Medicaid and Client Database System data.
- The core linkage process relies heavily on the `Splink` package.
- Currently, the model supports 4 variables (`sex`, `date of birth`, `last 4 digits of ssn`, and `first 2 letters of last name`) to run the linkage process.
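
The four variables listed above usually have to be derived from fuller source fields. The following is a minimal sketch of that preparation step, not part of `healmatcher` itself: the helper name `to_match_fields`, its arguments, and the output formats (ISO date strings, SSN digits kept as a string) are assumptions chosen for illustration. Whatever format is chosen should be applied identically to both datasets.

```python
from datetime import date


def to_match_fields(sex, birth_date, ssn, last_name):
    """Reduce one raw record to the four linkage fields (hypothetical helper)."""
    # Keep only the digits so "123-45-6789" and "123456789" behave the same.
    digits = "".join(ch for ch in str(ssn) if ch.isdigit())
    return {
        "sex": sex,
        "dob": birth_date.isoformat(),         # assumed date format
        "ssn": digits[-4:],                    # last 4 digits of SSN
        "ln": last_name.strip().lower()[:2],   # first 2 letters of last name
    }


record = to_match_fields(1, date(2012, 1, 1), "123-45-6789", "Smith")
print(record)
# {'sex': 1, 'dob': '2012-01-01', 'ssn': '6789', 'ln': 'sm'}
```

Note that the usage example in this README stores `ssn` as integers; keeping it as a string here is only a design choice of the sketch, made so that leading zeros are not lost.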


## How to install

`pip install healmatcher`


## How to use (example)
```python
# Install the package first (shell: pip install healmatcher;
# notebook cell: !pip install healmatcher)

# Load packages
import pandas as pd
from healmatcher import hm

# Create example datasets
testa = pd.DataFrame({
    'sex':[1,2,1,2,1,2,1,2,1,2],
    'dob':['2012-1-1','2011-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ss','zz','rr','ww','wa','tr','tt','hh','gq'],
    'PROVIDER_NUMBER':[2,1,1,1,1,1,1,1,2,1]
})
testb = pd.DataFrame({
    'sex':[2,2,1,1,1,2,1,2,1,1],
    'dob':['2012-1-1','2001-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ls','zz','rr','wb','wa','tr','tt','ha','gq'],
    'PROVIDER_NUMBER':[2,1,1,1,1,1,1,1,2,1]
})

# Run matching
hm(
    df_a = testa,
    df_b = testb,
    col_a=['sex','dob','ssn','ln'],
    col_b=['sex','dob','ssn','ln'],
    match_prob_threshold = 0.001,
    iteration = 20,
    model2 = True,
    blocking_rule_for_training_input = 'PROVIDER_NUMBER',
    onetoone = True,
    match_summary = True
)
```
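
To see why a probabilistic matcher is useful on the example data, the sketch below compares the two example datasets field by field using plain Python. It is an illustration only and does not reproduce what `hm` or Splink do internally. It assumes, purely for the sake of the comparison, that row `i` of `testa` and row `i` of `testb` describe the same person; the variable names `a`, `b`, `agreement`, and `exact` are introduced here.

```python
# Same values as testa / testb in the example, as plain dicts of lists.
a = {
    'sex': [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
    'dob': ['2012-1-1', '2011-12-1', '1999-1-1', '1998-11-1', '2012-11-1',
            '1984-1-1', '1982-1-1', '1975-1-1', '1967-1-1', '1954-1-1'],
    'ssn': [1111, 2222, 3333, 4444, 5555, 6666, 7777, 8888, 9999, 1010],
    'ln': ['as', 'ss', 'zz', 'rr', 'ww', 'wa', 'tr', 'tt', 'hh', 'gq'],
}
b = {
    'sex': [2, 2, 1, 1, 1, 2, 1, 2, 1, 1],
    'dob': ['2012-1-1', '2001-12-1', '1999-1-1', '1998-11-1', '2012-11-1',
            '1984-1-1', '1982-1-1', '1975-1-1', '1967-1-1', '1954-1-1'],
    'ssn': [1111, 2222, 3333, 4444, 5555, 6666, 7777, 8888, 9999, 1010],
    'ln': ['as', 'ls', 'zz', 'rr', 'wb', 'wa', 'tr', 'tt', 'ha', 'gq'],
}
fields = ['sex', 'dob', 'ssn', 'ln']

# Assumption: row i in a and row i in b refer to the same person.
# agreement[i] holds one True/False flag per field for that row pair.
agreement = [
    [a[f][i] == b[f][i] for f in fields]
    for i in range(len(a['sex']))
]

# Row pairs that an exact join on all four fields would link.
exact = [i for i, row in enumerate(agreement) if all(row)]

for i, row in enumerate(agreement):
    differing = [f for f, same in zip(fields, row) if not same]
    print(i, differing or 'all four fields agree')
print('exact matches:', exact)
# exact matches: [2, 5, 6, 7]
```

Under that assumption, only 4 of the 10 row pairs agree on every field, so an exact join would miss the other 6, each of which differs in just one or two fields.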

## Updates

- `use_save_model=True`: load a pre-trained model to run matching
- `save_model_path=PATH`: path of the model to load (JSON format)
- `export_model=True`: save the current model
- `export_model_path=PATH`: path where the current model is saved
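
One way these four arguments might be combined is sketched below: export a model in one run and load it in a later one. This is a hedged illustration, not documented behavior. The file name `hm_model.json` is hypothetical, and which of the other `hm` arguments are still required when a saved model is loaded is an assumption. The `hm` calls are left commented out so the snippet runs without the package installed.

```python
# Arguments shared by both runs (taken from the usage example in this README).
common = dict(
    col_a=['sex', 'dob', 'ssn', 'ln'],
    col_b=['sex', 'dob', 'ssn', 'ln'],
    match_prob_threshold=0.001,
)

MODEL_PATH = "hm_model.json"  # hypothetical file name

# First run: train as usual and save the resulting model.
export_kwargs = {**common, "export_model": True, "export_model_path": MODEL_PATH}

# Later run: skip training and load the saved model instead.
reuse_kwargs = {**common, "use_save_model": True, "save_model_path": MODEL_PATH}

# from healmatcher import hm
# hm(df_a=testa, df_b=testb, **export_kwargs)
# hm(df_a=testa, df_b=testb, **reuse_kwargs)
```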


## Follow up
- Please visit our repo if you have any questions.

## Webpage

- [healmatcher](https://pypi.org/project/healmatcher/)
- [healmatcher-github](https://github.com/JosephKBS/healmatcher)



            
