# README
## Language
- [English](#English)
- [中文](#中文)
# <div id = "English">English</div>
## Introduction
pymatchingtools is a toolbox of common matching methods for causal inference.
I have used several of the common causal inference packages and found that almost all of them only implement the matching methods themselves, ignoring the balance checks before matching and the refutation tests after matching. Without those checks, we cannot judge whether the matches are usable.
This Python package is designed to help you do the following in a relatively simple way:
- 1) evaluate the balance of variables before matching;
- 2) complete data matching;
- 3) evaluate the robustness of the results by refutation tests.
Because of a heavy workload and limited time and energy, I have only implemented propensity score matching so far. If you need other methods, please leave me a message and I will schedule them for a future update.
## Installation
Installing with pip is recommended. Python 3.7 or above is required.
```bash
$ pip install pymatchingtools
```
## Example
This example uses the Boston house price dataset and is divided into five steps:
- Data Preparation
- Initialising the Matching class
- Variable balance checking before matching
- Matching
- Refutation check after matching
For more information, see example.ipynb.
### Data Preparation
First load the data. Only the pandas DataFrame format is supported.
```python
import pandas as pd

# Column names of the Boston housing dataset (the file has no header row)
column_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
data = pd.read_csv('housing.csv', header=None, delimiter=r'\s+', names=column_names)
```
### Initialising the Matching class
Initialise the Matching class with the prepared data:
```python
from pymatchingtools.matching import PropensityScoreMatch
matcher = PropensityScoreMatch(data=data)
```
### Variable balance checking before matching
There are two ways to do this: use a patsy-formatted formula, or pass in the covariates (x) and the indicator variable (y).
To use a formula, proceed as follows. Set ```summary_print=True``` to print the result of the balance check.
```python
formula = 'CHAS ~ CRIM + ZN + INDUS + NOX + RM + AGE + DIS + RAD'
summary_df = matcher.get_match_info(formula=formula, summary_print=True)
```
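To make the structure of the formula concrete: the term to the left of `~` names the treatment indicator, and the `+`-separated terms on the right name the covariates. The helper below is a hypothetical illustration (`split_formula` is not part of pymatchingtools) and handles only plain additive formulas, not the full patsy grammar.

```python
def split_formula(formula):
    """Split a simple additive formula into (treatment, covariates).

    Hypothetical helper for illustration only; it does not handle
    interactions, transformations, or other patsy syntax.
    """
    lhs, rhs = formula.split("~", 1)
    treatment = lhs.strip()
    covariates = [term.strip() for term in rhs.split("+") if term.strip()]
    return treatment, covariates


treatment, covariates = split_formula(
    "CHAS ~ CRIM + ZN + INDUS + NOX + RM + AGE + DIS + RAD"
)
print(treatment)   # CHAS
print(covariates)  # ['CRIM', 'ZN', 'INDUS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD']
```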
To pass in the covariates (x) and the indicator variable (y) instead:
```python
y = data[['CHAS']]
# Each covariate must be its own list element, not one comma-joined string
x = data[['CRIM', 'ZN', 'INDUS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD']]
summary_df = matcher.get_match_info(x=x, y=y, summary_print=True)
```
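A statistic commonly reported in balance checks is the standardized mean difference (SMD) between the treated and control groups. Which statistics `get_match_info` actually reports is not stated here, so the following is a generic sketch rather than the package's implementation; the function name and the toy values are assumptions for illustration.

```python
import math
import statistics


def standardized_mean_difference(treated, control):
    """SMD: difference in means divided by the pooled standard deviation."""
    mean_diff = statistics.mean(treated) - statistics.mean(control)
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return mean_diff / pooled_sd


# Toy covariate values (made up), split by a binary treatment indicator.
treated_values = [6.1, 6.4, 5.9, 6.8]
control_values = [6.0, 6.2, 6.3, 5.7, 6.5]

smd = standardized_mean_difference(treated_values, control_values)
print(f"SMD = {smd:.3f}")
```

An absolute SMD below roughly 0.1 is a widely used rule of thumb for acceptable balance on a covariate.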
### Matching
Get the matches via the ``match`` method. Set ``is_fliter=True`` for matching without replacement.
Both GLM and LGBM are supported for training the propensity score model.
Only the Manhattan distance is supported for now; more distances will be added gradually.
Only nearest-neighbour matching is implemented, so ``method`` does not need to be changed.
```python
matched_data = matcher.match(
    method='min',
    is_fliter=True,
    fit_mathod='glm'
)
```
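To illustrate what nearest-neighbour matching without replacement does, here is a minimal greedy sketch on pre-computed propensity scores. It is not the package's implementation: the function name, the dictionary-based inputs, and the processing order are assumptions. For one-dimensional scores, the Manhattan distance is simply the absolute difference.

```python
def greedy_nearest_match(treated_scores, control_scores):
    """Match each treated unit to its closest unused control unit.

    Both arguments map a unit id to a propensity score. A control is
    removed from the pool once used (matching without replacement).
    """
    available = dict(control_scores)
    matches = {}
    for treated_id, score in treated_scores.items():
        if not available:
            break  # more treated units than controls
        best_id = min(available, key=lambda cid: abs(available[cid] - score))
        matches[treated_id] = best_id
        del available[best_id]
    return matches


treated = {"t1": 0.62, "t2": 0.35}
control = {"c1": 0.30, "c2": 0.60, "c3": 0.90}
print(greedy_nearest_match(treated, control))  # {'t1': 'c2', 't2': 'c1'}
```

Greedy matching depends on the order in which treated units are processed. For matching with replacement, the `del` line would simply be dropped.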
### Refutation check after matching
Use the ```after_match_check``` method to run refutation tests. The following tests are currently supported:
- 1) adding a random confounder;
- 2) placebo test;
- 3) data subset test.
```python
matcher.after_match_check(
    outcome_var='MEDV',
    frac=0.8,
    match_method='min'
)
```
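The intuition behind two of these tests can be sketched without the package. In a placebo test the treatment labels are shuffled, so the estimated effect should move toward zero. In a data subset test the estimate is recomputed on a random fraction of the rows (compare `frac=0.8` above) and should stay close to the original. The estimator here is a plain difference in means on made-up data, and all names are assumptions, not the internals of `after_match_check`.

```python
import random
import statistics


def diff_in_means(rows):
    """Naive effect estimate: mean outcome of treated minus control."""
    treated = [y for t, y in rows if t == 1]
    control = [y for t, y in rows if t == 0]
    return statistics.mean(treated) - statistics.mean(control)


def placebo_estimate(rows, rng):
    """Shuffle the treatment labels, then re-estimate the effect."""
    labels = [t for t, _ in rows]
    rng.shuffle(labels)
    return diff_in_means([(t, y) for t, (_, y) in zip(labels, rows)])


def subset_estimate(rows, frac, rng):
    """Re-estimate the effect on a random fraction of the rows."""
    sample = rng.sample(rows, int(len(rows) * frac))
    return diff_in_means(sample)


rng = random.Random(0)
# Made-up data: (treatment, outcome) pairs with a true effect of 5.
rows = [(i % 2, 20 + 5 * (i % 2) + rng.gauss(0, 1)) for i in range(200)]

print("original:", round(diff_in_means(rows), 2))
print("placebo: ", round(placebo_estimate(rows, rng), 2))
print("subset:  ", round(subset_estimate(rows, 0.8, rng), 2))
```

If the placebo estimate stays far from zero, or the subset estimate drifts far from the original, the original estimate deserves suspicion.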
# <div id = "中文">中文</div>
## 简介
pymatchingtools是一个常见的因果推断中匹配方法的工具箱
我曾经用过现在python里有的常见的因果推断相关的包, 但发现几乎所有的包只是实现了方法,而忽视了推断前的平衡性检查,以及推断后的反驳式检验. 这样的匹配结果,我们无法判断其可用性
这个python包的设计初衷是, 能够用较为简单的方式,帮助大家完成:
- 1)评估匹配前的变量平衡性;
- 2)完成一次Matching方式的推断;
- 3)评估当前Matching方式得到的结果是否具备鲁棒性
由于平时工作繁忙,时间精力有限,目前仅实现了倾向性得分匹配的方法,如果有其他方法需要,请给我留言,我会排期更新和实现
## 安装方法
建议使用pip方式安装, 安装的python版本需要限制在3.7以上
```bash
$ pip install pymatchingtools
```
## 使用示例
这里采用波士顿房价数据集进行说明,整个使用分为5个步骤
- 数据准备
- 初始化Matching类
- 匹配前的变量平衡性检查
- 匹配
- 匹配后的反驳式检验
更多信息可以看example.ipynb
### 数据准备
需要先导入相关的数据,目前仅支持DataFrame格式
```python
column_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
data = pd.read_csv('housing.csv', header=None, delimiter=r"\s+", names=column_names)
```
### 初始化Matching类
将我们准备好的原始数据放入Matching类中进行初始化
```python
from pymatchingtools.matching import PropensityScoreMatch
matcher = PropensityScoreMatch(data=data)
```
### 匹配前的变量平衡性检查
目前支持两种方式,一种是使用patsy格式的公式,另一种是传入相应的协变量和指示变量
使用公式的方法如下, 如果需要打印出相应的检查结果,可以令```summary_print=True```
```python
formula = 'CHAS ~ CRIM + ZN + INDUS + NOX + RM + AGE + DIS + RAD'
summary_df = matcher.get_match_info(formula=formula, summary_print=True)
```
如果是传入相应的协变量和指示变量,则需要
```python
y = data[['CHAS']]
x = data[['CRIM', 'ZN', 'INDUS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD']]
summary_df = matcher.get_match_info(x=x, y=y, summary_print=True)
```
### 匹配
通过```match```方法获取匹配结果,如果是无放回抽样,限制```is_fliter==True```
支持GLM和LGBM两种模式去训练倾向性得分模型
距离的实现方式目前仅实现了曼哈顿距离,后续会逐渐更新和补充更多距离
这里method仅实现了最近匹配,无需限制
```python
matched_data = matcher.match(
    method='min',
    is_fliter=True,
    fit_mathod='glm'
)
```
### 匹配后的反驳式检验
使用```after_match_check```方法进行反驳式检验, 目前支持的反驳式检验有:
- 1)添加随机混淆;
- 2)安慰剂检验;
- 3)数据子集检验
```python
matcher.after_match_check(
outcome_var='MEDV',
frac=0.8,
match_method='min'
)
```
Raw data
{
"_id": null,
"home_page": "https://github.com/Trouvaille98/pymatchingtools",
"name": "pymatchingtools",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "causal inference, PSM, Matching, observational study, pymatchingtools, psm, propensity score, propensity score matching, balance check",
"author": "Trouvaille98",
"author_email": "dulingzhi.0710@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/85/e4/ed811c1e2359ae89d01687b4dbf870b78a6c4eb1297345c1e468fab45808/pymatchingtools-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A toolbox of common matching methods",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/Trouvaille98/pymatchingtools"
},
"split_keywords": [
"causal inference",
" psm",
" matching",
" observational study",
" pymatchingtools",
" psm",
" propensity score",
" propensity score matching",
" balance check"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "c4d6d5121039e8c1185a38c9da43fb27698fbdac144766a635d5b9787dd6cfa3",
"md5": "6f8e8513e0cc93b7f2ab47bd257497e1",
"sha256": "2e578c0977ba2dd24269556d999e63dee0778a6b13168870898bd89eb2556780"
},
"downloads": -1,
"filename": "pymatchingtools-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6f8e8513e0cc93b7f2ab47bd257497e1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 12816,
"upload_time": "2024-09-17T15:05:15",
"upload_time_iso_8601": "2024-09-17T15:05:15.001117Z",
"url": "https://files.pythonhosted.org/packages/c4/d6/d5121039e8c1185a38c9da43fb27698fbdac144766a635d5b9787dd6cfa3/pymatchingtools-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "85e4ed811c1e2359ae89d01687b4dbf870b78a6c4eb1297345c1e468fab45808",
"md5": "c0c055615f956f7692a6729839698650",
"sha256": "ada2c8323f27700a73913e2a4e77e237f0d03c7ad9385f218abfaf3d9e69dad0"
},
"downloads": -1,
"filename": "pymatchingtools-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "c0c055615f956f7692a6729839698650",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 14665,
"upload_time": "2024-09-17T15:05:16",
"upload_time_iso_8601": "2024-09-17T15:05:16.502317Z",
"url": "https://files.pythonhosted.org/packages/85/e4/ed811c1e2359ae89d01687b4dbf870b78a6c4eb1297345c1e468fab45808/pymatchingtools-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-17 15:05:16",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Trouvaille98",
"github_project": "pymatchingtools",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "pymatchingtools"
}