| Name | syriskmodels |
| Version | 0.2.4 |
| home_page | None |
| Summary | Credit risk model toolkit |
| upload_time | 2024-08-07 15:27:52 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | None |
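# `riskmodels` - Risk model toolkit
`riskmodels` provides functions and algorithms commonly used in risk model development. It currently covers the following areas:
* Data exploration
* Variable binning
* Logistic regression modeling
* Scorecard conversion
* Model evaluation

The main features are described below; see the code documentation for details.
## Data exploration
### Sample distribution: `riskmodels.utils.sample_stats`
This function counts the total number of samples, the numbers of good and bad samples, and the bad rate. It is typically combined with groupby, as in the following example: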
```{python}
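# Assumes df is a pandas DataFrame with a binary target column 'y'
from riskmodels.utils import sample_stats

# Sample statistics by application month
df.groupby('apply_month').apply(sample_stats, target='y')
# Sample statistics by credit product
df.groupby('product_id').apply(sample_stats, target='y')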
```
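To make the reported quantities concrete, here is a minimal pure-Python sketch of the same kind of statistics. It is not the library's implementation: `sample_stats_sketch`, `grouped_stats`, and the output keys are hypothetical names, and the target is assumed to be coded 1 for bad and 0 for good.

```python
from collections import defaultdict


def sample_stats_sketch(rows, target="y"):
    """Hypothetical stand-in: total/good/bad counts and the bad rate.

    Assumes the target is coded 1 for bad and 0 for good.
    """
    total = len(rows)
    bad = sum(1 for row in rows if row[target] == 1)
    good = total - bad
    bad_rate = bad / total if total else float("nan")
    return {"total": total, "good": good, "bad": bad, "bad_rate": bad_rate}


def grouped_stats(rows, key, target="y"):
    """Compute the statistics per group, mimicking groupby(...).apply(...)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {name: sample_stats_sketch(members, target)
            for name, members in sorted(groups.items())}


# Toy data: two application months with two samples each.
rows = [
    {"apply_month": "2024-01", "y": 0},
    {"apply_month": "2024-01", "y": 1},
    {"apply_month": "2024-02", "y": 0},
    {"apply_month": "2024-02", "y": 0},
]
stats_by_month = grouped_stats(rows, key="apply_month")
```

Each group yields one row of counts, which is why the function pairs naturally with a group-by over month or product.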
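### Variable exploration: `riskmodels.detector.detect`
> Note: this function is derived from `toad`.

This function summarizes variable distributions. For numeric variables it reports statistics such as the missing rate, maximum, minimum, mean, and variance; for categorical variables it reports the most frequent categories.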
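The sketch below illustrates the kind of per-variable summary described above using only the standard library. It does not reproduce the output format of `detect`; the helper names and dictionary keys are hypothetical, and missing values are assumed to be represented by `None`.

```python
import statistics
from collections import Counter


def describe_numeric(values):
    """Summarise a numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "missing_rate": 1 - len(present) / len(values),
        "max": max(present),
        "min": min(present),
        "mean": statistics.mean(present),
        "variance": statistics.variance(present),  # sample variance
    }


def describe_categorical(values):
    """Summarise a categorical column by its most frequent category."""
    present = [v for v in values if v is not None]
    top, freq = Counter(present).most_common(1)[0]
    return {
        "missing_rate": 1 - len(present) / len(values),
        "top": top,
        "freq": freq,
    }


# Toy columns with one missing value each.
income = [3000, 4500, None, 5200, 4100]
channel = ["app", "web", "app", None, "app"]
numeric_summary = describe_numeric(income)
categorical_summary = describe_categorical(channel)
```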
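## Variable binning: the `riskmodels.scorecard` module
This module is a refactoring of the `scorecardpy` project. Its main goal is to make binning methods extensible.
The original project bins in three steps: special-value handling → fine binning (equal-width) → coarse binning (ChiMerge or tree-based). This refactoring makes the following improvements:
* Fine binning gains an equal-frequency option. Credit data is usually heavily skewed, so equal-width binning can lose detail in the region where the data is concentrated; equal-frequency binning is a better fit.
* The tree-based coarse binning now supports a monotonicity constraint (enabled with `ensure_monotonic=True`; the default is `False`).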
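
The point about skewed data can be seen on a toy example. The following sketch compares equal-width and equal-frequency cut points on a made-up, heavily right-skewed sample; the helper functions are illustrative only and are not the library's `HistogramInitBin` or `QuantileInitBin` classes.

```python
def equal_width_breaks(values, n_bins):
    """Cut points that split the value range into n_bins equal intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]


def equal_frequency_breaks(values, n_bins):
    """Cut points taken at evenly spaced ranks of the sorted sample."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = [ordered[(n * i) // n_bins] for i in range(1, n_bins)]
    return sorted(set(cuts))  # drop duplicate cut points caused by ties


def bin_counts(values, breaks):
    """Count samples per bin for left-closed, right-open bins."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        idx = sum(1 for b in breaks if v >= b)
        counts[idx] += 1
    return counts


# Skewed toy data: most values are small, a few are very large.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
          12, 14, 16, 18, 20, 25, 30, 50, 200, 1000]

width_counts = bin_counts(values, equal_width_breaks(values, 4))
freq_counts = bin_counts(values, equal_frequency_breaks(values, 4))
```

On this sample the equal-width bins put 19 of the 20 values into the first bin, while the equal-frequency bins hold 5 values each, so the later coarse binning step still has detail to work with in the dense region.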
### The `woebin` function
```{python}
def woebin(dt,
           y,
           x=None,
           var_skip=None,
           breaks_list=None,
           special_values=None,
           positive="bad|1",
           no_cores=None,
           methods=None,
           ignore_const_cols=True,
           ignore_datetime_cols=True,
           check_cate_num=True,
           replace_blank=True,
           **kwargs): ...
```
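This function's interface is largely the same as that of `sc.woebin`. The main changes are:
* methods: defaults to `['quantile', 'tree']`, i.e. equal-frequency binning followed by tree binning. The binning methods supported out of the box are:
  * `hist`: equal-width binning, registered class `riskmodels.scorecard.HistogramInitBin`
  * `quantile`: equal-frequency binning, registered class `riskmodels.scorecard.QuantileInitBin`
  * `tree`: tree binning, registered class `riskmodels.scorecard.TreeOptimBin`
  * `chi2`/`chimerge`: ChiMerge binning, registered class `riskmodels.scorecard.ChiMergeOptimBin`

  **Notes** on using this parameter:
  * The first method must be an (unsupervised) fine binning method; the available choices are `hist` and `quantile`.
  * A fine binning method must not come after another binning method. For example, if `quantile` is listed after `tree`, the equal-frequency binning does not take effect.
  * The list may contain only a fine binning method, e.g. `['quantile']` or `['hist']`, which gives purely unsupervised binning.
  * The list may have more than two entries, e.g. `['quantile', 'tree', 'chi2']`, which applies ChiMerge on top of the tree binning to merge adjacent bins that do not differ significantly.
* `**kwargs`: parameters passed to the individual binning methods; see the documentation of the binning classes for details. The most common ones are:
  * Equal-width and equal-frequency binning
    * initial_bins: number of fine bins, default 20
  * Tree and ChiMerge binning
    * bin_num_limit: maximum number of final bins (excluding special values), default 5
    * count_distr_limit: minimum share of the total sample in each bin, default 0.05
    * stop_limit: stopping criterion; for tree binning it is the relative increase in IV, for ChiMerge it is the p-value of the independence test; default 0.05
    * ensure_monotonic (tree binning only): whether to enforce monotonicity (excluding special values), default `False`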
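
The ordering rules for `methods` can be expressed as a small check. The sketch below is only an illustration of those rules: `check_methods` and the two name sets are hypothetical, and the README does not say whether the library itself validates the list or reports errors this way.

```python
# Method names as listed in the README.
INIT_METHODS = {"hist", "quantile"}            # unsupervised fine binning
OPTIM_METHODS = {"tree", "chi2", "chimerge"}   # supervised coarse binning


def check_methods(methods):
    """Return a list of rule violations for a `methods` list (empty if valid)."""
    if not methods:
        # Assumption of this sketch: an empty list is treated as invalid.
        return ["methods must not be empty"]
    problems = []
    unknown = [m for m in methods if m not in INIT_METHODS | OPTIM_METHODS]
    if unknown:
        problems.append(f"unknown methods: {unknown}")
    if methods[0] not in INIT_METHODS:
        problems.append("the first method must be 'hist' or 'quantile'")
    if any(m in INIT_METHODS for m in methods[1:]):
        problems.append("fine binning methods may only appear first")
    return problems
```

With a valid list in hand, a call would presumably look like `woebin(df, y='y', methods=['quantile', 'tree'], initial_bins=20, bin_num_limit=5, ensure_monotonic=True)`, where `df` and the column name `'y'` are placeholders.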
#### Extending binning methods
(Omitted)
### The `woebin_ply` function
```{python}
def woebin_ply(dt, bins, no_cores=None, replace_blank=False, value='woe'):
    ...
```
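This function's interface is largely the same as that of `sc.woebin_ply`, with one added parameter:
* value: one of `['woe', 'index', 'bin']`; defaults to `'woe'`
  * With `value='woe'`, raw values are replaced by WoE values and the returned column is named `<variable name>_woe`, consistent with `sc.woebin_ply`.
  * With `value='index'`, raw values are replaced by the index from the variable's binning result data frame and the returned column is named `<variable name>_index`.
  * With `value='bin'`, the result is the bin interval `[a,b)` for numeric variables or `a%,%b` for categorical variables, and the returned column is named `<variable name>_bin`.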
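
The three `value` options amount to three different lookups on the same bin assignment. The sketch below shows this for a single numeric variable. It is a simplified illustration, not the library's code: the cut points and WoE values are made up, `apply_bins` is a hypothetical helper, and the index here is simply the bin's position, which may differ from the index in the actual binning data frame (for example when special-value bins are present).

```python
import math
from bisect import bisect_right

# Hypothetical binning result for one numeric variable:
# cut points plus one WoE value per bin (illustrative numbers only).
breaks = [26.0, 35.0]
edges = [-math.inf] + breaks + [math.inf]
woe_values = [0.42, -0.05, -0.31]


def apply_bins(x, value="woe"):
    """Map a raw value to its bin's WoE, position, or interval label."""
    idx = bisect_right(breaks, x)  # left-closed, right-open bins [a,b)
    if value == "woe":
        return woe_values[idx]
    if value == "index":
        return idx
    if value == "bin":
        return f"[{edges[idx]},{edges[idx + 1]})"
    raise ValueError(f"unsupported value: {value!r}")
```

Keeping the bin label or index instead of the WoE value is handy when the binned variable feeds a cross-tabulation or a report rather than a regression.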
### The `woebin_psi` function
```{python}
def woebin_psi(df_base, df_cmp, bins):
    ...
```
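This is a new function for computing the PSI (population stability index) of variables; see the function's documentation for detailed usage.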
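For reference, the usual PSI definition sums `(p_cmp - p_base) * ln(p_cmp / p_base)` over bins, where `p_base` and `p_cmp` are the shares of each bin in the base and comparison samples. The sketch below applies that formula to per-bin counts. It is an assumption that `woebin_psi` follows this standard definition; the `psi` helper and the `eps` clipping are illustrative choices, not taken from the library.

```python
import math


def psi(base_counts, cmp_counts, eps=1e-6):
    """Population stability index over bins shared by two samples."""
    base_total = sum(base_counts)
    cmp_total = sum(cmp_counts)
    total = 0.0
    for b, c in zip(base_counts, cmp_counts):
        # Clip proportions away from zero so the logarithm stays finite.
        p_base = max(b / base_total, eps)
        p_cmp = max(c / cmp_total, eps)
        total += (p_cmp - p_base) * math.log(p_cmp / p_base)
    return total


# Identical distributions give 0; a shift between bins gives a positive value.
stable = psi([10, 20, 30], [10, 20, 30])
shifted = psi([50, 50], [25, 75])
```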
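### Other functions
Other commonly used functions are:
* sc_bins_to_df: consolidates the return value of `woebin` into a WoE table and an IV table
* woebin_breaks: saves the cut points and special values from the return value of `woebin`
* woebin_plot: draws bivariate plots from the return value of `woebin`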
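
As background for the WoE and IV tables mentioned above, the sketch below computes both from per-bin good and bad counts. It assumes the sign convention WoE = ln(bad share / good share), which is the convention `scorecardpy` is generally understood to use; whether `riskmodels` keeps it is not stated in this README. The function name and row keys are hypothetical, and every bin is assumed to contain at least one good and one bad sample.

```python
import math


def woe_iv_table(good_counts, bad_counts):
    """Per-bin WoE and IV contribution, plus the variable's total IV."""
    good_total = sum(good_counts)
    bad_total = sum(bad_counts)
    rows = []
    for good, bad in zip(good_counts, bad_counts):
        good_distr = good / good_total
        bad_distr = bad / bad_total
        woe = math.log(bad_distr / good_distr)  # assumed sign convention
        rows.append({
            "good": good,
            "bad": bad,
            "woe": woe,
            "bin_iv": (bad_distr - good_distr) * woe,
        })
    total_iv = sum(row["bin_iv"] for row in rows)
    return rows, total_iv


# Two bins: the first is riskier than average, the second safer.
rows, total_iv = woe_iv_table(good_counts=[80, 120], bad_counts=[20, 10])
```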
## Logistic regression modeling
## Scorecard conversion
### The `make_scorecard` function
```{python}
def make_scorecard(sc_bins, coef, *, base_points=600, base_odds=50, pdo=20):
    ...
```
This function builds a scorecard. `coef` is a dictionary of coefficients for the variables in the model: `{<variable name>_woe: coefficient}`
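The parameters `base_points`, `base_odds`, and `pdo` match the usual scorecard scaling, sketched below. This is not the implementation of `make_scorecard`. It rests on three assumptions that the README does not confirm: (1) odds are expressed as good:bad, so a higher score means lower risk; (2) `pdo` is the number of points that doubles the odds; (3) the model predicts the probability of bad from WoE values defined as ln(bad share / good share). All helper names are hypothetical.

```python
import math


def scaling_constants(base_points=600, base_odds=50, pdo=20):
    """Return (factor, offset) so that score = offset + factor * ln(odds)."""
    factor = pdo / math.log(2)
    offset = base_points - factor * math.log(base_odds)
    return factor, offset


def score_from_odds(odds, **kwargs):
    """Score for given good:bad odds under assumptions (1) and (2)."""
    factor, offset = scaling_constants(**kwargs)
    return offset + factor * math.log(odds)


def bin_points(woe, coef, factor):
    """Points contributed by one bin under assumption (3)."""
    return -factor * coef * woe


factor, offset = scaling_constants()
score_at_base = score_from_odds(50)      # equals base_points
score_at_double = score_from_odds(100)   # base_points + pdo
```

Under these assumptions a bin with negative WoE (fewer bads than average) and a positive coefficient contributes positive points, and the intercept of the regression is absorbed into the base score.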
## Model evaluation
Raw data
{
"_id": null,
"home_page": null,
"name": "syriskmodels",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "YAO Siyuan <siyuan89@163.com>",
"download_url": "https://files.pythonhosted.org/packages/a0/cb/60b3c6ef1a2f25127fc2819db4e30304f3c53f53de2d47f48185acdce9e3/syriskmodels-0.2.4.tar.gz",
"platform": null,
"description": "# `riskmodels` - \u98ce\u9669\u6a21\u578b\u5de5\u5177\u5e93\r\n\r\n`riskmodels`\u610f\u5728\u63d0\u4f9b\u98ce\u9669\u6a21\u578b\u5f00\u53d1\u4e2d\u5e38\u7528\u7684\u51fd\u6570\u548c\u7b97\u6cd5\uff0c\u76ee\u524d\u4e3b\u8981\u8986\u76d6\u4e86\u4ee5\u4e0b\u51e0\u7c7b\u529f\u80fd\uff1a\r\n\r\n* \u6570\u636e\u63a2\u7d22\r\n* \u53d8\u91cf\u5206\u7bb1\r\n* \u903b\u8f91\u56de\u5f52\u5efa\u6a21\r\n* \u8bc4\u5206\u5361\u8f6c\u6362\r\n* \u6a21\u578b\u8bc4\u4f30\r\n\r\n\u4ee5\u4e0b\u5bf9\u4e3b\u8981\u529f\u80fd\u8fdb\u884c\u8bf4\u660e\uff0c\u8be6\u7ec6\u8bf4\u660e\u8bf7\u53c2\u89c1\u4ee3\u7801\u6587\u6863\u3002\r\n\r\n## \u6570\u636e\u63a2\u7d22\r\n\r\n### \u6837\u672c\u5206\u5e03: `riskmodels.utils.sample_stats`\r\n\r\n\u8be5\u51fd\u6570\u4f5c\u7528\u662f\u7edf\u8ba1\u6837\u672c\u4e2d\u7684\u603b\u6837\u672c\u6570\u3001\u597d\u574f\u6837\u672c\u6570\u53ca\u574f\u7387\uff0c\u4e00\u822c\u4f1a\u7ed3\u5408 groupby \u4f7f\u7528\u3002\u5982\u4e0b\u4f8b\uff1a\r\n\r\n```{python}\r\n# \u6309\u7167\u7533\u8bf7\u6708\u4efd\u8fdb\u884c\u6837\u672c\u7edf\u8ba1\r\ndf.groupby('apply_month').apply(sample_stats, target='y')\r\n\r\n# \u6309\u7167\u4e0d\u540c\u4fe1\u8d37\u4ea7\u54c1\u8fdb\u884c\u6837\u672c\u7edf\u8ba1\r\ndf.groupby('product_id').apply(sample_stats, target='y')\r\n```\r\n\r\n### \u53d8\u91cf\u63a2\u7d22: `riskmodels.detector.detect`\r\n\r\n> \u6ce8\uff1a\u8be5\u51fd\u6570\u6e90\u81ea `toad`\r\n\r\n\u8be5\u51fd\u6570\u7528\u4e8e\u53d8\u91cf\u5206\u5e03\u3002\u5bf9\u4e8e\u6570\u503c\u578b\u53d8\u91cf\uff0c\u7edf\u8ba1\u5176\u7a7a\u503c\u7387\u3001\u6700\u5927\u503c\u3001\u6700\u5c0f\u503c\u3001\u5e73\u5747\u503c\u3001\u65b9\u5dee\u7b49\u7edf\u8ba1\u91cf\uff1b\u5bf9\u4e8e\u7c7b\u522b\u578b\u53d8\u91cf\uff0c\u7edf\u8ba1\u5668\u51fa\u73b0\u9891\u6b21\u6700\u9ad8\u7684\u7c7b\u522b\u3002\r\n\r\n## \u53d8\u91cf\u5206\u7bb1: `riskmodels.scorecard`\u6a21\u5757\r\n\r\n\u672c\u6a21\u5757\u57fa\u4e8e `scorecardpy` 
\u9879\u76ee\u8fdb\u884c\u91cd\u6784\uff0c\u4e3b\u8981\u76ee\u7684\u662f\u63d0\u4f9b\u5206\u7bb1\u65b9\u6cd5\u7684\u53ef\u6269\u5c55\u6027\u3002\r\n\r\n\u539f\u9879\u76ee\u7684\u5206\u7bb1\u6b65\u9aa4\u4e3a\uff1a\u7279\u6b8a\u503c\u5904\u7406 \u2192 \u7ec6\u5206\u7bb1\uff1a\u7b49\u8ddd\u5206\u7bb1 \u2192 \u7c97\u5206\u7bb1\uff1aChiMerge/\u6811\u65b9\u6cd5\uff0c\u672c\u6b21\u91cd\u6784\u8fdb\u884c\u4e86\u5982\u4e0b\u4f18\u5316\uff1a\r\n\r\n* \u7ec6\u5206\u7bb1\u589e\u52a0\u7b49\u9891\u5206\u7bb1\uff0c\u7531\u4e8e\u4fe1\u8d37\u573a\u666f\u7684\u6570\u636e\u504f\u5ea6\u6781\u5927\uff0c\u7b49\u8ddd\u5206\u7bb1\u53ef\u80fd\u5728\u6570\u636e\u96c6\u4e2d\u90e8\u5206\u4e22\u5931\u7ec6\u8282\uff0c\u7b49\u9891\u5206\u7bb1\u66f4\u4e3a\u5408\u9002\r\n* \u7c97\u5206\u7bb1\u4e2d\u7684\u6811\u65b9\u6cd5\uff0c\u589e\u52a0\u4e86\u5bf9\u5355\u8c03\u6027\u7ea6\u675f\u7684\u652f\u6301(\u901a\u8fc7`ensure_monotonic=True`\u6253\u5f00\uff0c\u9ed8\u8ba4\u4e3a`False`)\r\n\r\n### `woebin`\u51fd\u6570\r\n\r\n```{python}\r\ndef woebin(dt,\r\n y,\r\n x=None,\r\n var_skip=None,\r\n breaks_list=None,\r\n special_values=None,\r\n positive=\"bad|1\",\r\n no_cores=None,\r\n methods=None,\r\n ignore_const_cols=True,\r\n ignore_datetime_cols=True,\r\n check_cate_num=True,\r\n replace_blank=True,\r\n **kwargs): ...\r\n```\r\n\r\n\u8be5\u51fd\u6570\u4e0e`sc.woebin`\u51fd\u6570\u63a5\u53e3\u57fa\u672c\u7c7b\u4f3c\uff0c\u4e3b\u8981\u53d8\u66f4\u5982\u4e0b\uff1a\r\n\r\n* methods: \u9ed8\u8ba4\u4e3a`['quantile', 'tree']`, \u5373\u91c7\u7528\u7b49\u9891\u5206\u7bb1\u2192\u6811\u5206\u7bb1\u7684\u5206\u7bb1\u65b9\u5f0f\uff1b\u8be5\u53c2\u6570\u9ed8\u8ba4\u53ef\u652f\u6301\u7684\u5206\u7bb1\u65b9\u6cd5\u5305\u62ec\r\n * `hist`: \u7b49\u8ddd\u5206\u7bb1\uff0c\u6ce8\u518c\u7c7b`riskmodels.scorecard.HistogramInitBin`\r\n * `quantile`: \u7b49\u9891\u5206\u7bb1\uff0c\u6ce8\u518c\u7c7b`riskmodels.scorecard.QuantileInitBin`\r\n * `tree`: 
\u6811\u5206\u7bb1\uff0c\u6ce8\u518c\u7c7b`riskmodels.scorecard.TreeOptimBin`\r\n * `chi2`/`chimerge`: ChiMerge\u5206\u7bb1\uff0c\u6ce8\u518c\u7c7b`riskmodels.scorecard.ChiMergeOptimBin`\r\n \u4f7f\u7528\u8be5\u53c2\u6570\u6709\u4ee5\u4e0b**\u6ce8\u610f\u4e8b\u9879**\uff1a\r\n * \u9996\u4e2a\u5206\u7bb1\u65b9\u6cd5\u5fc5\u987b\u4e3a\uff08\u65e0\u76d1\u7763\uff09\u7ec6\u5206\u7bb1\u65b9\u6cd5\uff0c\u6b64\u5904\u53ef\u9009\u4e3a`hist`\u548c`quantile`\u4e24\u7c7b\uff1b\r\n * \u7ec6\u5206\u7bb1\u65b9\u6cd5\u4e0d\u53ef\u4f4d\u4e8e\u5176\u4ed6\u5206\u7bb1\u65b9\u6cd5\u4e4b\u540e\uff0c\u5982`['quantile', 'tree']`\uff0c\u6b64\u65f6\u7b49\u9891\u5206\u7bb1\u65b9\u6cd5\u4e0d\u751f\u6548\uff1b\r\n * \u53ef\u4ee5\u53ea\u5305\u542b\u7ec6\u5206\u7bb1\uff0c\u5982`['quantile']`\u6216`['hist']`\uff0c\u6b64\u65f6\u4e3a\u7eaf\u65e0\u76d1\u7763\u5206\u7bb1\uff1b\r\n * \u5217\u8868\u957f\u5ea6\u53ef\u4ee5\u5927\u4e8e2\uff0c\u4f8b\u5982\uff1a`['quantile', 'tree', 'chi2]`\uff0c\u5373\u5728\u6811\u5206\u7bb1\u7684\u57fa\u7840\u4e0a\uff0c\u518d\u7528ChiMerge\u65b9\u6cd5\u5bf9\u65e0\u663e\u8457\u5dee\u5f02\u7684\u76f8\u90bb\u5206\u7bb1\u8fdb\u884c\u5408\u5e76\u3002\r\n* `**kwargs`: \u8be5\u53c2\u6570\u4e3a\u5404\u4e2a\u5206\u7bb1\u65b9\u6cd5\u6240\u9700\u8981\u7684\u53c2\u6570\uff0c\u5177\u4f53\u53ef\u89c1\u5206\u7bb1\u65b9\u6cd5\u7c7b\u7684\u6587\u6863\uff0c\u4e0b\u5217\u6700\u5e38\u89c1\u53c2\u6570\u3002\r\n * \u7b49\u8ddd\u5206\u7bb1\u548c\u7b49\u9891\u5206\u7bb1\r\n * initial_bins: \u7ec6\u5206\u7bb1\u7684\u6570\u91cf\uff0c\u9ed8\u8ba420\r\n * \u6811\u5206\u7bb1\u548cChiMerge\u5206\u7bb1\r\n * bin_num_limit: \u6700\u7ec8\u5206\u7bb1\u7684\u6700\u5927\u6570\u91cf\uff08\u4e0d\u542b\u7279\u6b8a\u503c\uff09\uff0c\u9ed8\u8ba45\r\n * count_distr_limit: \u5206\u7bb1\u6837\u672c\u5360\u603b\u6837\u672c\u7684\u6700\u5c0f\u6bd4\u4f8b\uff0c\u9ed8\u8ba40.05\r\n * stop_limit: 
\u5206\u7bb1\u505c\u6b62\u6761\u4ef6\uff0c\u6811\u5206\u7bb1\u4e3aIV\u503c\u76f8\u5bf9\u589e\u91cf\uff0cChiMerge\u4e3a\u72ec\u7acb\u6027\u68c0\u9a8cP\u503c\uff0c\u9ed8\u8ba40.05\r\n * ensure_monotonic\uff08\u4ec5\u6811\u5206\u7bb1\u652f\u6301\uff09: \u662f\u5426\u4fdd\u8bc1\u5355\u8c03\u6027\uff08\u4e0d\u542b\u7279\u6b8a\u503c\uff09\uff0c\u9ed8\u8ba4`False`\r\n\r\n#### \u5206\u7bb1\u65b9\u6cd5\u7684\u6269\u5c55\r\n\r\n\uff08\u7565\uff09\r\n\r\n### `woebin_ply`\u51fd\u6570\r\n\r\n```{python}\r\ndef woebin_ply(dt, bins, no_cores=None, replace_blank=False, value='woe'):\r\n ...\r\n```\r\n\r\n\u8be5\u51fd\u6570\u4e0e`sc.woebin_ply`\u51fd\u6570\u63a5\u53e3\u57fa\u672c\u7c7b\u4f3c\uff0c\u589e\u52a0\u5982\u4e0b\u53c2\u6570\uff1a\r\n\r\n* value: \u53ef\u9009\u9879\u4e3a `['woe', 'index', 'bin']`\uff0c\u9ed8\u8ba4\u4e3a 'woe'\r\n * value='woe'\u65f6\uff0c\u5c06\u539f\u59cb\u503c\u66ff\u6362\u4e3awoe\u503c\uff0c\u8fd4\u56de\u7684\u5b57\u6bb5\u540d\u4e3a `\u53d8\u91cf\u540d_woe`\uff0c\u4e0e`sc.woebin_ply`\u4e00\u81f4\uff1b\r\n * value='index'\u65f6\uff0c\u5c06\u539f\u59cb\u503c\u66ff\u6362\u4e3a\u53d8\u91cf\u5206\u7bb1\u7ed3\u679c\u6570\u636e\u6846\u4e2d\u7684index\uff0c\u8fd4\u56de\u7684\u5b57\u6bb5\u540d\u4e3a `\u53d8\u91cf\u540d_index`\uff1b\r\n * value='bin' \u65f6\uff0c\u8fd4\u56de\u7ed3\u679c\u4e3a\u5206\u7bb1\u533a\u95f4 [a,b) \u3010\u6570\u503c\u578b\u53d8\u91cf\u3011\u6216 a%,%b \u3010\u7c7b\u522b\u578b\u53d8\u91cf\u3011\uff0c\u8fd4\u56de\u7684\u5b57\u6bb5\u540d\u4e3a `\u53d8\u91cf\u540d_bin`\u3002\r\n\r\n### `woebin_psi`\u51fd\u6570\r\n\r\n```{python}\r\ndef woebin_psi(df_base, df_cmp, bins):\r\n ...\r\n```\r\n\r\n\u8be5\u51fd\u6570\u4e3a\u65b0\u589e\u51fd\u6570\uff0c\u7528\u4e8e\u8ba1\u7b97\u53d8\u91cfPSI\u503c\uff0c\u8be6\u7ec6\u4f7f\u7528\u65b9\u5f0f\u89c1\u51fd\u6570\u6587\u6863\u3002\r\n\r\n\r\n\r\n\r\n\r\n### \u5176\u4ed6\u51fd\u6570\r\n\r\n\u5176\u4ed6\u5e38\u7528\u51fd\u6570\u5217\u4e3e\u5982\u4e0b\uff1a\r\n\r\n* sc_bins_to_df: 
\u6574\u5408`woebin`\u8fd4\u56de\u503c\uff0c\u751f\u6210woe\u8868\u548civ\u8868\r\n* woebin_breaks: \u6839\u636e`woebin`\u8fd4\u56de\u503c\u4fdd\u5b58\u5207\u5206\u70b9\u548c\u7279\u6b8a\u503c\u70b9\r\n* woebin_plot: \u6839\u636e`woebin`\u8fd4\u56de\u503c\u751f\u6210bivar\u56fe\u50cf\r\n\r\n\r\n## \u903b\u8f91\u56de\u5f52\u5efa\u6a21\r\n\r\n## \u8bc4\u5206\u5361\u8f6c\u6362\r\n\r\n### `make_scorecard`\u51fd\u6570\r\n\r\n```{python}\r\ndef make_scorecard(sc_bins, coef, *, base_points=600, base_odds=50, pdo=20):\r\n ...\r\n```\r\n\r\n\u8be5\u51fd\u6570\u7528\u4e8e\u751f\u6210\u8bc4\u5206\u5361\uff0c\u5176\u4e2d`coef`\u4e3a\u5404\u4e2a\u5165\u6a21\u53d8\u91cf\u7684\u7cfb\u6570\u5b57\u5178\uff1a `{\u53d8\u91cf\u540d_woe: \u7cfb\u6570}`\r\n\r\n## \u6a21\u578b\u8bc4\u4f30\r\n",
"bugtrack_url": null,
"license": null,
"summary": "\u4fe1\u7528\u98ce\u9669\u6a21\u578b\u5de5\u5177\u5305",
"version": "0.2.4",
"project_urls": {
"Bug Tracker": "https://github.com/siyuany/riskmodels/issues",
"Homepage": "https://github.com/siyuany/riskmodels"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "313ca98385433e22ea519cf1745b00f29bef2b514b4640a4bad3a20ebfaa8057",
"md5": "22b0341ab299924f241f9a45652bcb49",
"sha256": "025a86821c429e2eef6923ac19ad9891f14982b5dec00bdb3316ccd6712508ab"
},
"downloads": -1,
"filename": "syriskmodels-0.2.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "22b0341ab299924f241f9a45652bcb49",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 37052,
"upload_time": "2024-08-07T15:27:51",
"upload_time_iso_8601": "2024-08-07T15:27:51.099193Z",
"url": "https://files.pythonhosted.org/packages/31/3c/a98385433e22ea519cf1745b00f29bef2b514b4640a4bad3a20ebfaa8057/syriskmodels-0.2.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a0cb60b3c6ef1a2f25127fc2819db4e30304f3c53f53de2d47f48185acdce9e3",
"md5": "5de50ed7ee833fa19f0c91d035a2d3c7",
"sha256": "c9803a91d16c37ae14a0b9bc1187385a77844b5e4b4c1007a95d390720b9528e"
},
"downloads": -1,
"filename": "syriskmodels-0.2.4.tar.gz",
"has_sig": false,
"md5_digest": "5de50ed7ee833fa19f0c91d035a2d3c7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 37392,
"upload_time": "2024-08-07T15:27:52",
"upload_time_iso_8601": "2024-08-07T15:27:52.726957Z",
"url": "https://files.pythonhosted.org/packages/a0/cb/60b3c6ef1a2f25127fc2819db4e30304f3c53f53de2d47f48185acdce9e3/syriskmodels-0.2.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-07 15:27:52",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "siyuany",
"github_project": "riskmodels",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "syriskmodels"
}