# healmatcher
- `healmatcher` is a simple, fast probabilistic data matching package developed by the NYULH HEAL Lab.
- The package is optimized for matching healthcare databases (e.g. EHR), as it was designed to link Medicaid and Client Database System data.
- The core linkage processes rely heavily on the `Splink` package.
- Currently, the model supports 4 variables (`sex`, `date of birth`, `last 4 digits of ssn`, and `first 2 letters of last name`) to run the linkage process.
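If your source data holds full identifiers, the four supported variables can be derived with pandas before matching. The following is a minimal sketch, not part of `healmatcher` itself: the raw column names (`birth_date`, `full_ssn`, `last_name`) are hypothetical, and keeping `ssn` as a string is an assumption made here to preserve leading zeros.

```python
import pandas as pd

# Hypothetical raw records; these column names are illustrative only.
raw = pd.DataFrame({
    "sex": [1, 2],
    "birth_date": ["2012-01-01", "1984-01-01"],
    "full_ssn": ["123-45-1111", "987-65-6666"],
    "last_name": ["Ashford", "Walker"],
})

prepared = pd.DataFrame({
    "sex": raw["sex"],
    "dob": raw["birth_date"],
    # Strip non-digit characters, then keep the last 4 digits of the SSN.
    "ssn": raw["full_ssn"].str.replace(r"\D", "", regex=True).str[-4:],
    # Keep the first 2 letters of the last name, lowercased.
    "ln": raw["last_name"].str.strip().str.lower().str[:2],
})

print(prepared)
```

The output column names (`sex`, `dob`, `ssn`, `ln`) follow the usage example in this README.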
## How to install
`pip install healmatcher`
## How to use (example)
```python
# Install the package (notebook cell syntax; from a shell, run `pip install healmatcher`)
!pip install healmatcher

# Load packages
import pandas as pd
from healmatcher import hm

# Create example datasets
testa = pd.DataFrame({
    'sex': [1,2,1,2,1,2,1,2,1,2],
    'dob': ['2012-1-1','2011-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn': [1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln': ['as','ss','zz','rr','ww','wa','tr','tt','hh','gq'],
    'PROVIDER_NUMBER': [2,1,1,1,1,1,1,1,2,1]
})
testb = pd.DataFrame({
    'sex': [2,2,1,1,1,2,1,2,1,1],
    'dob': ['2012-1-1','2001-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn': [1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln': ['as','ls','zz','rr','wb','wa','tr','tt','ha','gq'],
    'PROVIDER_NUMBER': [2,1,1,1,1,1,1,1,2,1]
})

# Run matching on the four supported columns
hm(
    df_a=testa,
    df_b=testb,
    col_a=['sex','dob','ssn','ln'],
    col_b=['sex','dob','ssn','ln'],
    match_prob_threshold=0.001,
    iteration=20,
    model2=True,
    blocking_rule_for_training_input='PROVIDER_NUMBER',
    onetoone=True,
    match_summary=True
)
```
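For contrast, the sketch below runs an exact join on the same four columns using plain pandas, with no `healmatcher` involved. It rebuilds the example frames above (without `PROVIDER_NUMBER`) and shows that only rows agreeing on every field are linked, which is the gap probabilistic matching is meant to close. The variable names `cols` and `exact` are introduced here for illustration.

```python
import pandas as pd

cols = ["sex", "dob", "ssn", "ln"]

testa = pd.DataFrame({
    "sex": [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
    "dob": ["2012-1-1", "2011-12-1", "1999-1-1", "1998-11-1", "2012-11-1",
            "1984-1-1", "1982-1-1", "1975-1-1", "1967-1-1", "1954-1-1"],
    "ssn": [1111, 2222, 3333, 4444, 5555, 6666, 7777, 8888, 9999, 1010],
    "ln": ["as", "ss", "zz", "rr", "ww", "wa", "tr", "tt", "hh", "gq"],
})
testb = pd.DataFrame({
    "sex": [2, 2, 1, 1, 1, 2, 1, 2, 1, 1],
    "dob": ["2012-1-1", "2001-12-1", "1999-1-1", "1998-11-1", "2012-11-1",
            "1984-1-1", "1982-1-1", "1975-1-1", "1967-1-1", "1954-1-1"],
    "ssn": [1111, 2222, 3333, 4444, 5555, 6666, 7777, 8888, 9999, 1010],
    "ln": ["as", "ls", "zz", "rr", "wb", "wa", "tr", "tt", "ha", "gq"],
})

# Exact join: a pair is linked only if all four columns are identical.
exact = testa.merge(testb, on=cols, how="inner")

print(f"{len(exact)} of {len(testa)} rows agree on all four columns")
# prints "4 of 10 rows agree on all four columns"
```

The remaining six rows share the same `ssn` in both frames but differ in `sex`, `dob`, or `ln`, so an exact join drops them.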
## Updates
- `use_save_model=True`: load a pre-trained model to run matching
- `save_model_path=PATH`: path of the model to load (JSON format)
- `export_model=True`: save the current model
- `export_model_path=PATH`: path where the current model is saved
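The options above suggest a two-step workflow: export a model once, then reuse it in later runs. The sketch below only builds the keyword arguments for each step. The file name `model.json` is a placeholder, and passing these dictionaries alongside the usual `hm` arguments is an assumption about usage rather than documented behavior.

```python
# Placeholder path; replace with a real location for the JSON model file.
MODEL_PATH = "model.json"

# Step 1: arguments for a run that saves the current model.
export_kwargs = {
    "export_model": True,
    "export_model_path": MODEL_PATH,
}

# Step 2: arguments for a later run that loads the saved model.
reuse_kwargs = {
    "use_save_model": True,
    "save_model_path": MODEL_PATH,
}

# Assumed usage (not executed here), with the other arguments as in the example:
# hm(df_a=testa, df_b=testb, col_a=cols, col_b=cols, **export_kwargs)
# hm(df_a=testa, df_b=testb, col_a=cols, col_b=cols, **reuse_kwargs)
print(export_kwargs)
print(reuse_kwargs)
```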
## Follow up
- Please visit our repo if you have any questions.
## Webpage
- [healmatcher](https://pypi.org/project/healmatcher/)
- [healmatcher-github](https://github.com/JosephKBS/healmatcher)
## Package metadata
- Name: healmatcher
- Version: 0.0.48
- Summary: Fast and simple probabilistic data matching package
- Author: Joseph Shim
- Keywords: probabilistic match, probabilistic data match, splink
- Homepage: https://github.com/JosephKBS/healmatcher
- Uploaded: 2024-06-20
- Files: `healmatcher-0.0.48-py3-none-any.whl` (7286 bytes), `healmatcher-0.0.48.tar.gz` (7044 bytes)