rascpy

Name	rascpy JSON
Version	2025.8.24 JSON
	download
home_page	None
Summary	The RASC project has made some supplements and improvements to existing statistical methods in order to provide data analysts and modelers with a more accurate and easier-to-use algorithmic framework.
upload_time	2025-08-24 12:27:53
maintainer	None
docs_url	None
author	SIFU DATA SCIENCE
requires_python	None
license	Apache-2.0
keywords	credit risk management scorecard optimal bins auto report xgb auto params search step-wise liner-regression step-wise logistic-regression reject inference
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# RASC
The RASC project makes some supplements and improvements to existing statistical methods in order to provide data analysts and modelers with a more accurate and easier-to-use algorithmic framework.
Project evolution process:
- Phase 1 (ScoreConflow): The RASC provides a "business instruction + unattended" scoring card model development method. The original intention of the RASC project is to provide more accurate algorithms and more labor-saving scoring card development tools.
- Phase 2 (Risk Actuarial Score Card): Based on the previous phase, a risk actuarial score card is provided to build a model by comprehensively considering user default probability, user profit level, and data cost.
- Phase 3 (Risk Actuarial Statistics): This phase goes beyond scorecard development. It aims to improve the areas where statistics and machine learning fall short in actual risk measurement. Building on the previous phase, it adds features such as XGB automatic parameters search, high-dimensional stratified sampling, filling mixed data types, and rejection inference.
## Resources
[English Documents,Tutorials and Samples (Version = 2025.8.24)](https://github.com/sifuHK/rasc/tree/main/2025.8.24)
[中文文档、教程和示例 (版本 = 2025.8.24)](https://gitee.com/sifuHK/rasc/tree/master/2025.8.24)
## Install
pip install rascpy
## Project Introduction
Main Functions Include:
1. Provide binning algorithms that support monotonic or U-shaped constraints.
1. In addition to user-specified constraints, RASC can also automatically identify constraints (monotonic or U-shaped) applicable to the data based on the training and validation sets.Users can export the recognition results to Excel and compare them with their own understanding of variable trends.
1. It can handle multiple special values and None values at the same time.
1. Provide a more accurate binning algorithm, resulting in a higher IV. Mathematically, this set of split nodes is proven to be the global optimal node, regardless of constraints or even for categorical variables.
1. Provide Python's two-way stepwise regression algorithm (including logistic regression and linear regression), and supports multiple constraints, such as coefficient signs, P-values, group variable number limits, and other variable selecting conditions.Supports exporting it to Excel that the entire model iteration process and the reasons for each variable's exclusion from the model.
1. Provide a bidirectional stepwise regression algorithm with actuarial capabilities. This algorithm uses company profits as part of a loss function that considers the model's prediction accuracy, individual user profitability, and data costs. (This is during the testing phase and will undergo significant changes.)
1. Provide a more convenient missing value filling function. Common missing value filling methods can only handle missing values, but cannot handle special values, especially in scenarios where the data contains both missing values and special values. Special values cannot be simply equated with missing values. Simply treating special values as missing values without considering the business scenario will lead to information loss. Special values will turn numerical data into a complex data type that mixes categorical and numerical types. Currently, no model can directly handle this type of data (although some models can produce results, they are not accurate and have no practical significance). This problem can be solved by using the Impute package provided by RASC. The transformed data can be directly fed into any model and meet practical business requirements.
1. Provide high-precision, high-dimensional stratified sampling. The stratified sampling provided by machine learning is overly simplistic. When stratified sampling is performed solely based on the Y label, a phenomenon may occur: after the variables in the training and test sets are segmented with equal frequency according to the same nodes, the event rates of Y differ significantly. This makes it difficult to narrow the metric differences between the training and validation sets during modeling. Without high-precision, high-dimensional stratified sampling, this issue can only be addressed by reducing model performance to improve model generalization. Another phenomenon is that after binning, the binning results of the training and validation sets differ significantly, manifesting as significant differences in IV values. Without high-precision, high-dimensional stratified sampling, the only way to improve generalization is to increase the bin width. The stratified sampling method provided by RASC has been tested and shown to significantly mitigate this phenomenon, reducing the discrepancy between the training and validation sets without compromising model performance or binning IV. (Excessive disparity between datasets is often caused by inconsistencies in the high-dimensional joint distribution, but due to sampling precision limitations, this can only be treated as overfitting.)
1. Provide automatic parameter tuning for xgboost. According to tests, models created with other parameter tuning frameworks often exhibit significant discrepancies between training and validation set metrics. However, the xgboost automatic parameter tuning framework provided by RASC minimizes the discrepancy between training and validation set metrics.
1. Support the rejection inference model of the scorecard.
1. Support the rejection inference model of xgboost.
1. Provide model report output function, users can easily generate Excel documents for model development reports.
1. Provide batch beautification and export of DataFrame to Excel. The output format is similar to Excel pivot table with color scale.
1. Support for automated modeling. The functions provided above can be called using traditional API methods, allowing users to assemble each module through programming to build models. Alternatively, unattended modeling can be achieved by using the AI instruction templates provided by RASC.
## Introduction To Main Modules
### 1.Bins
The optimal split nodes calculated by Bins is a set of split nodes that maximizes IV with a mathematical proof.
For categorical variables, including ordered and unordered categories, there is also a set of split nodes that can be mathematically proven to maximize IV.
Its main functions are:
1. Find the split nodes that maximizes IV with or without constraints. Five constraint settings are supported: monotonic (automatically determines increasing or decreasing), monotonically decreasing, monotonically increasing, U-shaped (automatically determines convex or concave), and automatically set appropriate constraints (automatically determines monotonically decreasing, monotonically increasing, convex U-shaped, and concave U-shaped).
1. For categorical variables with or without constraints, the global optimal split nodes can also be figure out to maximize IV.
1. Use "Minimum difference in event rates between adjacent bins" instead of "Information Gain" or "Chi-Square Gain" to prevent the formation of bins with too small differences. This allows users to intuitively understand the size of the differences between bins. This feature is also supported for categorical variables.
1. Do not replace the minimum value of the first bin with negative infinity, nor the maximum value of the last bin with positive infinity. This ensures that outliers are not masked by extending extreme values to infinity. RASC also provides a comprehensive mechanism to handle online values exceeding modeling boundaries. This resolves the common contradiction between the need to detect outliers as early as possible during data analysis and the need to mask them in online applications to prevent process bottlenecks (while still providing timely alerts).
1. The concept of wildcards is introduced to solve the problem that the online values of categorical variables exceed the modeling value range.
1. Support multi-process parallel computing.
1. Support binning of weighted samples.
1. Support for left closed and right open binning.

In most cases, users do not need to interact directly with Bins components. However, RASC is designed to be pluggable, so advanced users can use Bins modules independently, just like any other Python module.
### 2.Reg_Step_Wise_MP
It is a linear/logistic two-way stepwise regression implemented in Python, which adds the following features to the traditional two-way stepwise regression:
1. When performing stepwise variable selection for logistic regression, AUC, KS, and LIFT metrics can be used instead of AIC and BIC. For some business scenarios, AUC and KS are more appropriate. For example, in ranking tasks, a model built using the KS metric uses fewer variables while maintaining the same KS, thereby reducing data costs.
1. When performing stepwise variable selection, use other datasets to calculate model evaluation metrics rather than the modeling dataset. Especially when the data size is large and a validation set is included in addition to the training and test sets, it is recommended to use the validation set to calculate evaluation metrics to guide variable selection. This helps reduce overfitting.
1. Supports using partial data to calculate model evaluation metrics to guide variable selection. For example, if a business requires a certain pass rate of N%, then the bad event rate can be minimized for the top N% of samples, without requiring all samples to be included in the calculation. Actual testing shows that in appropriate scenarios, using partial data as evaluation metrics results in fewer variables than using full data, but the metrics of interest to users remain unchanged. Because the model focuses only on the top, more easily distinguishable sample points, business objectives can be achieved without requiring too many variables.
1. Supports setting multiple conditions. Variables must meet all conditions simultaneously to be included in the model. Built-in conditions include: P-Value, VIF, correlation coefficient, coefficient sign, number of variables in a group, etc.
1. Supports specifying variables that must be entered into the model. If the specified variables conflict with the four conditions, a comprehensive mechanism has been designed to resolve the problem.
1. The modeling process is exported to Excel, recording the reasons for deleting each variable and the process information of each round of stepwise regression.
1. Support actuarial calculations, using company profits as a loss function that takes into account the model's prediction accuracy, the profit level of a single user, and data costs (in the testing phase, there will be significant changes later)

In most cases, users do not need to interact directly with the Reg_Step_Wise_MP component. However, RASC is designed to be pluggable, and advanced users can use the Reg_Step_Wise_MP module independently, just like any other Python module.
### 3.Cutter
Perform equal frequency segmentation or segmentation according to specified split points, which has the following enhancements over the built-in segmenters of Python or Pandas:
1. A mathematically provable analytical solution with minimum global error.
1. All split points are derived from the original data. The minimum and maximum values for each interval are derived from the original data. This is different from Python or Pandas built-in splitters, which modify the minimum and maximum values at each end of each group.
1. More humane support for left closed and right open: the last group is right closed.
1. A globally optimal segmentation solution can be given even for extremely skewed data.
1. Support weighted series.
1. Supports user-specified special values. Special values are grouped separately, and users can also combine multiple special values into one group through configuration.
1. Users can specify how to handle None values. If not specified and the sequence contains None values, the None values will be automatically grouped together.
1. When a sequence is split using a specified split point, if the maximum or minimum value of the sequence exceeds the maximum or minimum value of the split point, the maximum and minimum values of the split point will be automatically extended.

It is recommended to try using Cutter to replace the built-in equal frequency segmentation component of Python or Pandas.
### 4. Other Modules
There are also other modules that can significantly improve the accuracy and efficiency of data analysis and modeling:
The rascpy.Impute package can handle data with multiple special values and None values (binary classification tasks). This solves the current problem of using Impute to treat special values as None or as normal values, which can result in information loss or render the model meaningless.
1. Provides high-precision, high-dimensional stratified sampling. This solves the current problem of reducing the discrepancy between training and test set metrics by compromising model performance due to low sampling precision. rascpy.Sampling can reduce the discrepancy between training and test set metrics by minimizing the differences in dataset distribution without compromising model performance.
1. Provides automatic parameter tuning for xgboost. rascpy.Tree.auto_xgb differs from other automatic parameter tuning frameworks in that it can reduce the model variance while maintaining high training set metrics.
1. Support scorecard and xgboost rejection inference.
1. In addition to manually calling the above modules, users can choose to use AI instructions to automatically complete modeling without human supervision.
## Usage Tutorial
### Scorecard Development Example
```Python
from rascpy.ScoreCard import CardFlow
if __name__ == '__main__':# Windows must write the main function, Linux and MacOS do not need to write the main function
# Pass in the command file
scf = CardFlow('./inst.txt')
# There are 11 steps in total: 1. Read datas, 2. Equal frequency binning, 3. Variable pre-filtering, 4. Monotonicity suggestion, 5. Optimal binning, 6. WOE conversion, 7. Variable filtering, 8. Modeling, 9. Generate scorecard, 10. Output model report, 11. Develop rejection inference scorecard
scf.start(start_step=1,end_step=11)# will automatically give the score card + the score card of rejection inference

# You can stop at any step, as follows:
scf.start(start_step=1,end_step=10)#No scorecard of rejection inference will be developed
scf.start(start_step=1,end_step=9)#No model report will be output

# If the results of the run are not modified, there is no need to run again. As shown below, steps 1-4 that have been run will be automatically loaded (will not be affected by restarting the computer)
scf.start(start_step=5,end_step=8)

# You can also omit start_step and end_step, abbreviated as:
scf.start(1,10)
```
After each step of scf.start is completed, a lot of useful intermediate data will be retained. This data will be saved in the work_space specified in inst.txt as pkl. Users can manually load and access this data at any time. It can also be called through the CardFlow object instance. The intermediate results generated after each step is completed are:
- step1: scf.datas
- step2: scf.train_freqbins,scf.freqbins_stat
- step3: scf.fore_col_indices,scf.fore_filtered_cols
- step4: scf.mono_suggests,scf.mono_suggests_eventproba
- step5: scf.train_optbins,scf.optbins_stat
- step6: scf.woes
- step7: scf.col_indices,scf.filtered_cols,scf.filters_middle_data,scf.used_cols
- step8: scf.in_clf_cols,scf.clf_del_cols,scf.clf,scf.clf_perf,scf.clf_coef,scf.del_reason,scf.step_proc
- step9: scf.card
- step11:scf.rejInfer.train_freqbins,scf.rejInfer.freqbins_stat,
scf.rejInfer.fore_col_indices,scf.rejInfer.fore_filtered_cols,
scf.rejInfer.mono_suggests,scf.rejInfer.mono_suggests_eventproba,
scf.rejInfer.train_optbins,scf.rejInfer.optbins_stat,scf.rejInfer.woes,
scf.rejInfer.col_indices,scf.rejInfer.filtered_cols,scf.rejInfer.filters_middle_data,scf.rejInfer.used_cols
scf.rejInfer.in_clf_cols,scf.rejInfer.clf_del_cols,scf.rejInfer.clf,scf.rejInfer.clf_perf,scf.rejInfer.clf_coef,scf.rejInfer.del_reason,scf.rejInfer.step_proc
And store the newly synthesized dataset for rejection inference in scf.datas['rejData']['__synData']
**load_step**
```Python
# load_step is only loading without execution. If your Python program is closed after execution and needs to be read again, there is no need to run it again. Just load the previous result. Even if the user closes the Python kernel or restarts the computer, the user can easily restore the CardFlow instance and call the intermediate data.
# load_step avoids the trouble of loading pkl to obtain intermediate datas. CardFlow instance is equivalent to an intermediate data management container.
# For example: load all steps 5 and before, and then call them through scf.xx
from rascpy.ScoreCard import CardFlow
scf = CardFlow('./inst.txt')
scf.start(load_step = 5)
print(scf.datas)
print(scf.train_optbins)
```
#### Example Of A Instruction File
```txt
[PROJECT INST]
model_name = Test
work_space = ../ws
no_cores = -1

[DATA INST]
model_data_file_path = ../data/model
oot_data_file_path = ../data/oot
reject_data_file_path = ../data/rej
sample_weight_col = sample_weight
default_spec_value = {-1}

[BINS INST]
default_mono=L+
default_distr_min=0.02
default_rate_gain_min=0.001
default_bin_cnt_max = 8
default_spec_distr_min=${default_distr_min}
default_spec_comb_policy = A

[FILTER INST]
filters = {"big_homogeneity":0.99,"small_iv":0.02,"big_ivCoV":0.3,"big_corr":0.8,"big_psi":0.2}
filter_data_names = {"big_homogeneity":"train,test","small_iv":"train,test","big_ivCoV":"train,test","big_corr":"train","big_psi":"train,test"}

[MODEL INST]
measure_index=ks
pvalue_max=0.05
vif_max=2
corr_max=0.7
default_coef_sign = +

[CARD INST]
base_points=500
base_event_rate=0.05
pdo=50

[REPORT INST]
y_stat_group_cols = data_group
show_lift = 5,10,20

[REJECT INFER INST]
reject_train_data_name = rej
only_base_feas = True
```
#### Detailed Description Of All Instructions
[English all_instructions_detailed_desc.txt](https://github.com/sifuHK/rasc/blob/main/2025.8.24/all_instructions_detailed_desc.txt)
[中文 全部指令详细说明.txt](https://gitee.com/sifuHK/rasc/blob/master/2025.8.24/全部指令详细说明.txt)
### Optimal Binning Example
In the scorecard development example, rascpy.Bins.OptBin/OptBin_mp is automatically called through CardFlow.
Users can also manually call OptBin/OptBin_mp to build their own modeling solutions.
``` Python
# OptBin_mp is a multi-process version of OptBin
from rascpy.Bins import OptBin,OptBin_mp
# Main parameter description
# mono: Specifies the monotonicity constraint for each variable, such as: L+ is linearly increasing, U is automatically selecting from positive U or negative U, and A is automatically selecting from L+, L-, Uu, and Un based on the data. For the value range, see [BINS INST]:mono in "Detailed Description of All Instructions".
# default_mono: Default monotonicity constraint for variables not set in mono
# distr_min: Specify the minimum bin ratio of normal values except special values for each variable
# default_distr_min: If the variable is not configured in distr_min, the default minimum binning ratio of the normal value
# spec_value: Specify the special value of each variable. For the rules of writing special values, see [DATA INST]:spec_value in "Detailed description of all instructions".
# default_spec_value: The default special value of the variable that does not appear in spec_value. When the special value you configured does not exist in a certain variable, the special value configuration will be automatically ignored. This command is very convenient to use when there is global unified special values in the data.
# spec_distr_min: The minimum percentage of each special value for each variable (when the type is a double-nested dict) or the minimum percentage of all special values for the variable (when the type is a single-layer dict). If the proportion of special value of a variable is too small, the special values are merged using the merging strategy specified by spec_comb_policy. The purpose of special value merging is to reduce abnormal results caused by special values with too small a percentage.
# default_spec_distr_min: If the variable is not in spec_distr_min, the default special value minimum percentage under the variable. The value can be a dict (to configure the default minimum percentage for each special value separately) or a number (all special values use the same default percentage)
# spec_comb_policy: Specifies the merging rule for special values for each variable. When the proportion of the special value is less than the threshold specified by spec_distr_min, the merging rule is used. For the value range, see [BINS INST]:spec_comb_policy in "Detailed Description of All Instructions".
# default_spec_comb_policy: If the variable is not configured in spec_comb_policy, the default special value merging rule is used. If the variable has no special value, this parameter is automatically ignored.
# order_cate_vars: Specifies the ordered categorical variables in the data and gives the order of each category. ** represents a wildcard character; all unconfigured categories are merged into the wildcard character. Wildcards are well-suited for variables with long-tail distributions. If the order of a variable is set to None, lexicographic order is used.
# unorder_cate_vars: Specifies the unordered categorical variables in the data. Unordered categories will be sorted according to the event rate. Each variable corresponds to a float: if the proportion of the category is less than the threshold, it will be merged into the wildcard category. The corresponding value of the variable is None: no limit on the proportion (may cause large fluctuations)
# no_wild_treat: When a categorical variable does not have a wildcard and an uncovered category appears in the dataset, the method about how to treat the category. For the value range, see [CATE DATA INST]: no_wild_treat in "Detailed Description of All Instructions".
# default_no_wild_treat: If there is no variable configured in no_wild_treat, the default treatment method for this category will be used if an uncovered category occurs.
# cust_bins: User manually bins the variable
# cores: The number of CPU cores used by multiple processes. None: All cores int: When it is greater than 1, it specifies the number of cores to be used. When it is less than 0, it specifies the number of cores reserved for the system, that is, all cores minus the specified number of cores. When it is equal to 1, it turns off multiple processes and uses a single process, which is equivalent to calling OptBin
if __name__ == '__main__':# Windows must write the main function, Linux and MacOS do not need to write the main function
optBins = OptBin_mp(X_dats,y_dats,mono={'x1':'L+','x2':'U'},default_mono='A',
distr_min={'x1':0.05},default_distr_min=0.02,default_rate_gain_min=0.001,
bin_cnt_max={'x2':5},default_bin_cnt_max=8,
spec_value={'x1':['{-999,-888}','{-1000,None}']}, default_spec_value=['{-999,-888}','{-1000}'],
spec_distr_min={'x1':{'{-1000,None}':0.01,'{-999,-888}':0.05},'x2':0.01},default_spec_distr_min=0.02,
spec_comb_policy={'x2':'F','x3':'L'},default_spec_comb_policy='A',
order_cate_vars={'x7':['v3','v1','v2'],'x8':['v5','**','v4'],'x9':None},
unorder_cate_vars={"x10":0.01,"x11":None},no_wild_treat={'x10':'H','x11':'m'},default_no_wild_treat='M',
cust_bins={'x4':['[1.0,4.0)','[4.0,9.0)','[9.0,9.0]','{-997}','{-999,-888}','{-1000,None}']},cores=-1)
```
### Bidirectional Stepwise Logistic Regression Example
In the scorecard development example, rascpy.Reg_Step_Wise_MP.LogisticReg is automatically called through CardFlow.
Users can also manually call LogisticReg to build their own modeling solutions.
``` Python
from rascpy.Reg_Step_Wise_MP import LogisticReg
# Generate data: There are 10 variables in total, of which the first 4 are useful variables, the middle 2 are redundant variables (there is collinearity with the first 4 variables), and the last 4 are useless variables. Add appropriate noise
X, y = make_classification(n_samples=10000,n_features=10,n_informative=4,n_redundant=2,shuffle=False,random_state=random_state,class_sep=2)
# Convert X to a DataFrame and modify the column names to match their variable effects. rascpy.Reg_Step_Wise_MP.LogisticReg can only accept DataFrame as X
X = pd.DataFrame(X,columns=['informative_1','informative_2','informative_3','informative_4','redundant_1','redundant_2','useless_1','useless_2','useless_3','useless_4'])
# Convert y to Series. rascpy.Reg_Step_Wise_MP.LogisticReg can only accept Series as y
y=pd.Series(y).loc[X.index]
# Main parameter description
# measure: A metric used to determine whether a parameter should be entered into the model. It supports aic, bic, roc_auc, ks, and other indicators.
# pvalue_max: The pvalue of all variables in model cannot exceed this value. rasc designs a complex and reasonable mechanism to ensure that the pvalue of all variables in model is not greater than this value.
# vif_max: The vif of all variables in model cannot exceed this value. rasc designs a complex and reasonable mechanism to ensure that the vif of all variables in model is not greater than this value.
# corr_max: The pairwise correlation coefficients of all variables in model cannot exceed this value. rasc designs a complex and reasonable mechanism to ensure that the pairwise correlation coefficients of all variables in model are not greater than this value.
# iter_num: number of rounds of stepwise regression
# results_save: Output the model effect, information related to the model coefficients, reasons for deleting variables, and details of each round of stepwise regression into an Excel.
# Return value description
# in_vars: all variables entered into the model
# clf_final: The final model returned
# clf_perf: Model performance
# clf_coef: information related to model coefficients
# del_reason: reason for deleting variable
# step_proc: Details of each round of stepwise regression
if __name__ == '__main__':# Windows must write the main function, Linux and MacOS do not need to write the main function
lr = LogisticReg(X,y,measure='roc_auc',pvalue_max=0.05,vif_max=3,corr_max=0.8,iter_num=20,results_save = 'test_logit.xlsx')
in_vars,clf_final,clf_perf,clf_coef,del_reason,step_proc = lr.fit()

# Other important parameters
# user_save_cols: variables that are forced to enter into the model. A complex and reasonable mechanism is designed to handle conflicts between user_save_cols and commands such as pvalue_max, vif_max, and corr_max.
# coef_sign: dict, used to specify the coefficient sign of each variable
# default_coef_sign:When the variable is not in coef_sign, the default value of the coefficient sign constraint for that variable
# exc_group: Ensure that only one variable in the group can be entered into model. Subsequent versions will allow users to set the number of variables or the variables total cost limit in the group to be entered into model. For its value rules, see [MODEL INST]:exc_group in "Detailed Description of All Instructions".
```
### XGB Automatic Parameter Search Example
``` Python
from rascpy.Tree import auto_xgb
# Parameter description
# cands_num: auto_xgb will give a score to each hyperparameter tried during automatic parameter search. The higher the score, the more recommended the model trained with the hyperparameter is. Then the scores are sorted from high to low, and the models with the top cands_num scores are returned.
# In actual use, the model with the highest score (i.e. clf_cands[0]) is the best model in most cases. However, users can still select their favorite model from the candidate models clf_cands[n] according to their preferences.
# cost_time: The running time of auto_xgb. Because the essence of parameter search is a combinatorial explosion, the goal of any algorithm is to find the most likely optimal set of hyperparameters within a limited time. Therefore, the longer cost_time is, the more likely it is to find the optimal set of hyperparameters.
# However, in actual use, the author find that setting cost_time to 3-5 minutes is enough to yield the optimal model for most cases. Setting it longer generally fails to yield a higher-scoring model. If the user is dissatisfied with the model, they can try increasing cost_time, but increasing it to more than 8 minutes is not recommended and will likely be ineffective.
# If the user is not satisfied with the bias or variance of the model, the best approach is not to increase cost_time, but to try using a more accurate sampling method, such as rascpy.Impute.BCSpecValImpute
# Return value description
# perf_cands: list. Metrics of all candidate models. Each metric contains three pieces of information: train_ks(train_auc), val_ks(val_auc), |train - val| (the absolute value of the difference between the training set and the validation set)
# params_cands: list. Hyperparameters of all candidate models
# clf_cands: list. All candidate models
# vars_cands: list. All candidate model input variables
# Note: The indexes of these four return values are relative. If the user decides to use the clf_cands[0] model, he can view the model's metrics through perf_cands[0], the model's hyperparameters through params_cands[0], and the model's input variables through vars_cands[0].
perf_cands,params_cands,clf_cands,vars_cands = auto_xgb(train_X,train_y,val_X,val_y,metric='ks',cost_time=60*5,cands_num=5)
proba_hat = clf_cands[0].predict_proba(X)[:,1]#The columns of X need to completely correspond to the columns during training. Even if a column is not entered into the model, it must be passed into the predict_proba method.
# When making predictions, you can also try to use the more convenient predict_proba
from rascpy.Tool import predict_proba
proba_hat = predict_proba(clf_cands[0],X[vars_cands[0]],decimals=4)#Only the variables to be input into the model need to be passed in, which is very convenient for online systems. And the returned proba_hat is a Series with the same row index as X.
```
### Impute Example
BCSpecValImpute can be used to handle special values and missing values in data for binary classification problems. It can handle special values and missing values for continuous, unordered categorical, and ordered categorical variables.
BCSpecValImpute can simultaneously fill in empty values and transform special values.
If the data contains both None values and special values, most models cannot handle them well (in a business-meaningful way). We recommend using rascpy.Impute.BCSpecValImpute to preprocess the data before training it in a binary classification model.
``` Python
from rascpy.Impute import BCSpecValImpute
# Main parameter description
# spec_value: specifies the special value of each variable. For the rules of writing special values, see [DATA INST]:spec_value in "Detailed description of all instructions".
# default_spec_value: The default special value of the variable that does not appear in spec_value. When the special value you configured does not exist in a certain variable, the configuration of the special value will be automatically ignored. This command is very convenient to use when there is a global unified special value in the data.
# order_cate_vars: Specifies the ordered categorical variables in the data and gives the order of each category. ** represents a wildcard character; all unconfigured categories are merged into the wildcard character. Wildcards are well-suited for variables with long-tail distributions. If the order of a variable is set to None, lexicographic order is used.
# unorder_cate_vars: Specifies the unordered categorical variables in the data. Unordered categories will be sorted according to the event rate. If the value is float, if the proportion of the category is less than the threshold, it will be merged into the wildcard category. If the value is None, there is no limit on the proportion (which may cause large fluctuations)
# impute_None: Whether to fill in None values. Because some models can automatically handle None values, if you use such a model later, you can ignore None values when filling, and only need to handle special values. (Almost all models cannot directly handle data with both None values and special values)
bcsvi = BCSpecValImpute(spec_value={'x1':['{-999,-888}','{-1000,None}'],'x11':['{unknow}']},default_spec_value=['{-999}','{-1000}'],
order_cate_vars={'x8':['v5','**','v4'],'x9':None},
unorder_cate_vars={"x10":0.01,"x11":None},impute_None=True,cores=None)
bcsvi.fit(trainX,trainy,weight=None) # weight=None can be omitted
trainX = bcsvi.transform(trainX)
# trainX = bcsvi.fit_transform(trainX,trainy)
testX = bcsvi.transform(testX)

#View the specific filling rules:
print(bcsvi.impute_values)
#Output format: {'x1':{-999:2,-888:1,-1000:0,None:0},'x2':{-999:1,-1000:0},'x8':{None:'D'},'x11':{'unknow':'A'}}}
#From the results, we can see that the special value -999 of the numeric variable x1 is filled with 2, and the empty value is filled with 0, etc. The special value 'unknow' of the categorical variable x11 is filled with A
#If the key corresponding to a variable name is not found in the first-level dict, it means that the variable has no special value in the training set and does not need to be filled. (However, it is necessary to avoid the situation where special values exist in other datasets)
```
### High-Dimensional Stratified Sampling Example
One of the most important evaluation criteria for the effectiveness of a high-dimensional stratified sampling algorithm is whether the joint distribution of **each** x variable and y can remain consistent in each dataset after sampling.
If the data itself can be divided into multiple groups, then it is also required that the joint distribution of **each** x variable and y can remain consistent in each group in each dataset after sampling.
rascpy provides the rascpy.Sampling.split_cls algorithm, which is designed for high-precision sampling of binary classification problems. Compared with some sampling algorithms, it shows good consistency in joint distribution, regardless of whether the data contains groups. This is especially true for the x variable, which has a strong predictive effect.
``` Python
from rascpy.Sampling import split_cls
# Main parameter description
# dat: dataframe dataset
# y:y column name of the label
# test_size: sampling ratio
# w: column name of weight
# groups: data grouping fields
train,test = split_cls(dat,y='y',test_size=0.3,w='weight',groups=['c1','c2'],random_state=0)
```
### Scorecard Rejection Inference Model
There are three methods for developing scorecard rejection inference models. Users can choose any method based on their own situation.
Method 1: Complete the normal scorecard and rejection inference scorecard simultaneously. Suitable for developing scorecards from scratch
``` Python
from rascpy.ScoreCard import CardFlow
if __name__ == '__main__':# Windows must write the main function, Linux and MacOS do not need to write the main function
# Pass in the command file
scf = CardFlow('./inst.txt')
scf.start(start_step=1,end_step=11)# will automatically generate standard scorecards and rejection inference scorecards
```
Method 2: Complete the standard scorecard first, then generate the rejection inference scorecard. This is suitable for those who have already generated the standard scorecard with rascpy and need to generate a rejection inference scorecard.
``` Python
from rascpy.ScoreCard import CardFlow
if __name__ == '__main__':
# Pass in the command file
scf = CardFlow('./inst.txt')
scf.start(start_step=11,end_step=11)#If you have already run step 1 to step 10, you can set both start_step and end_step to 11 to generate a rejection inference scorecard.
```
Method 3: Directly call the CardRej module. This is suitable for those who have developed a scorecard using other python packages and then use rascpy to generate a rejection inference scorecard.
``` Python
from rascpy.ScoreCardRej import CardRej
if __name__ == '__main__':
# Main parameter description
# init_clf: unbiased logistic regression model
# init_optbins_stat_train: Unbiased bin statistics. Format: {'x1':pd.DataFrame(columns=['bin','woe'])}
# datas: data passed in by the user. Format example: {'rejData':{'rej':pd.DataFrame(),'otherRej':pd.DataFrame()},'ootData':{'oot1':pd.DataFrame(),'oot2':pd.DataFrame()}}
# inst_file: Instruction file. The instructions are the same as those in the 'Scorecard Development Example'. See "Detailed Instructions for All Instructions". If datas is empty, all data files under [DATA INST]:xx_data_file_path in the inst_file file will be automatically loaded. If datas is not empty, the configuration of [DATA INST]:xx_data_file_path will be ignored.
cr = CardRej(init_clf,init_optbins_stat_train,datas=None,inst_file='inst.txt')
cr.start()
```
Refer to the intermediate data generated by the rejection inference in step 11 of the "Scorecard Development Example". The intermediate data is called by scf.rejInfer.xx in Method 1 and Method 2, and by cr.xx in Method 3.
### Tree Rejection Inference Model
``` Python
from rascpy.TreeRej import auto_rej_xgb
# Main parameter description
# xx_w: weight of the corresponding dataset
# metric: two options, ks or auc
# Return value description
# not_rej_clf: non-rejection inference xgb model
# rej_clf: reject the inferred xgb model
# syn_train: synthetic data used to train the final round of rejection inference model
# syn_val: synthetic data used to validate the final round of rejection inference model
not_rej_clf,rej_clf,syn_train,syn_val = auto_rej_xgb(train_X,train_y,val_X,val_y,rej_train_X,rej_val_X,train_w=None,val_w=None,rej_train_w=None,rej_val_w=None,metric='auc')
```
## Contact Information
Email: scoreconflow@gmail.com
Email:scoreconflow@foxmail.com
WeChat:SCF_04

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "rascpy",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "Credit Risk Management, Scorecard, Optimal Bins, Auto Report, XGB Auto Params Search, Step-Wise Liner-Regression, Step-Wise Logistic-Regression, Reject Inference",
    "author": "SIFU DATA SCIENCE",
    "author_email": "scoreconflow@gmail.com",
    "download_url": null,
    "platform": null,
    "description": "# RASC\nThe RASC project makes some supplements and improvements to existing statistical methods in order to provide data analysts and modelers with a more accurate and easier-to-use algorithmic framework.\nProject evolution process:\n- Phase 1 (ScoreConflow): The RASC provides a \"business instruction + unattended\" scoring card model development method. The original intention of the RASC project is to provide more accurate algorithms and more labor-saving scoring card development tools.\n- Phase 2 (Risk Actuarial Score Card): Based on the previous phase, a risk actuarial score card is provided to build a model by comprehensively considering user default probability, user profit level, and data cost.  \n- Phase 3 (Risk Actuarial Statistics): This phase goes beyond scorecard development. It aims to improve the areas where statistics and machine learning fall short in actual risk measurement. Building on the previous phase, it adds features such as XGB automatic parameters search, high-dimensional stratified sampling, filling mixed data types, and rejection inference.\n## Resources\n[English Documents,Tutorials and Samples (Version = 2025.8.24)](https://github.com/sifuHK/rasc/tree/main/2025.8.24)   \n[\u4e2d\u6587\u6587\u6863\u3001\u6559\u7a0b\u548c\u793a\u4f8b (\u7248\u672c = 2025.8.24)](https://gitee.com/sifuHK/rasc/tree/master/2025.8.24)  \n## Install\npip install rascpy\n## Project Introduction\nMain Functions Include:\n1. Provide binning algorithms that support monotonic or U-shaped constraints.\n1. In addition to user-specified constraints, RASC can also automatically identify constraints (monotonic or U-shaped) applicable to the data based on the training and validation sets.Users can export the recognition results to Excel and compare them with their own understanding of variable trends.  \n1. It can handle multiple special values and None values at the same time.\n1. Provide a more accurate binning algorithm, resulting in a higher IV. Mathematically, this set of split nodes is proven to be the global optimal node, regardless of constraints or even for categorical variables.\n1. Provide Python's two-way stepwise regression algorithm (including logistic regression and linear regression), and supports multiple constraints, such as coefficient signs, P-values, group variable number limits, and other variable selecting conditions.Supports exporting it to Excel that the entire model iteration process and the reasons for each variable's exclusion from the model.\n1. Provide a bidirectional stepwise regression algorithm with actuarial capabilities. This algorithm uses company profits as part of a loss function that considers the model's prediction accuracy, individual user profitability, and data costs. (This is during the testing phase and will undergo significant changes.)\n1. Provide a more convenient missing value filling function. Common missing value filling methods can only handle missing values, but cannot handle special values, especially in scenarios where the data contains both missing values and special values. Special values cannot be simply equated with missing values. Simply treating special values as missing values without considering the business scenario will lead to information loss. Special values will turn numerical data into a complex data type that mixes categorical and numerical types. Currently, no model can directly handle this type of data (although some models can produce results, they are not accurate and have no practical significance). This problem can be solved by using the Impute package provided by RASC. The transformed data can be directly fed into any model and meet practical business requirements.\n1. Provide high-precision, high-dimensional stratified sampling. The stratified sampling provided by machine learning is overly simplistic. When stratified sampling is performed solely based on the Y label, a phenomenon may occur: after the variables in the training and test sets are segmented with equal frequency according to the same nodes, the event rates of Y differ significantly. This makes it difficult to narrow the metric differences between the training and validation sets during modeling. Without high-precision, high-dimensional stratified sampling, this issue can only be addressed by reducing model performance to improve model generalization. Another phenomenon is that after binning, the binning results of the training and validation sets differ significantly, manifesting as significant differences in IV values. Without high-precision, high-dimensional stratified sampling, the only way to improve generalization is to increase the bin width. The stratified sampling method provided by RASC has been tested and shown to significantly mitigate this phenomenon, reducing the discrepancy between the training and validation sets without compromising model performance or binning IV. (Excessive disparity between datasets is often caused by inconsistencies in the high-dimensional joint distribution, but due to sampling precision limitations, this can only be treated as overfitting.)\n1. Provide automatic parameter tuning for xgboost. According to tests, models created with other parameter tuning frameworks often exhibit significant discrepancies between training and validation set metrics. However, the xgboost automatic parameter tuning framework provided by RASC minimizes the discrepancy between training and validation set metrics.\n1. Support the rejection inference model of the scorecard.\n1. Support the rejection inference model of xgboost.\n1. Provide model report output function, users can easily generate Excel documents for model development reports.\n1. Provide batch beautification and export of DataFrame to Excel. The output format is similar to Excel pivot table with color scale.\n1. Support for automated modeling. The functions provided above can be called using traditional API methods, allowing users to assemble each module through programming to build models. Alternatively, unattended modeling can be achieved by using the AI instruction templates provided by RASC.\n## Introduction To Main Modules\n### 1.Bins\nThe optimal split nodes calculated by Bins is a set of split nodes that maximizes IV with a mathematical proof.\nFor categorical variables, including ordered and unordered categories, there is also a set of split nodes that can be mathematically proven to maximize IV.\nIts main functions are:\n1. Find the split nodes that maximizes IV with or without constraints. Five constraint settings are supported: monotonic (automatically determines increasing or decreasing), monotonically decreasing, monotonically increasing, U-shaped (automatically determines convex or concave), and automatically set appropriate constraints (automatically determines monotonically decreasing, monotonically increasing, convex U-shaped, and concave U-shaped).\n1. For categorical variables with or without constraints, the global optimal split nodes can also be figure out to maximize IV.\n1. Use \"Minimum difference in event rates between adjacent bins\" instead of \"Information Gain\" or \"Chi-Square Gain\" to prevent the formation of bins with too small differences. This allows users to intuitively understand the size of the differences between bins. This feature is also supported for categorical variables.\n1. Do not replace the minimum value of the first bin with negative infinity, nor the maximum value of the last bin with positive infinity. This ensures that outliers are not masked by extending extreme values to infinity. RASC also provides a comprehensive mechanism to handle online values exceeding modeling boundaries. This resolves the common contradiction between the need to detect outliers as early as possible during data analysis and the need to mask them in online applications to prevent process bottlenecks (while still providing timely alerts).\n1. The concept of wildcards is introduced to solve the problem that the online values of categorical variables exceed the modeling value range.\n1. Support multi-process parallel computing.\n1. Support binning of weighted samples.\n1. Support for left closed and right open binning.  \n  \nIn most cases, users do not need to interact directly with Bins components. However, RASC is designed to be pluggable, so advanced users can use Bins modules independently, just like any other Python module.  \n### 2.Reg_Step_Wise_MP\nIt is a linear/logistic two-way stepwise regression implemented in Python, which adds the following features to the traditional two-way stepwise regression:\n1. When performing stepwise variable selection for logistic regression, AUC, KS, and LIFT metrics can be used instead of AIC and BIC. For some business scenarios, AUC and KS are more appropriate. For example, in ranking tasks, a model built using the KS metric uses fewer variables while maintaining the same KS, thereby reducing data costs.\n1. When performing stepwise variable selection, use other datasets to calculate model evaluation metrics rather than the modeling dataset. Especially when the data size is large and a validation set is included in addition to the training and test sets, it is recommended to use the validation set to calculate evaluation metrics to guide variable selection. This helps reduce overfitting.\n1. Supports using partial data to calculate model evaluation metrics to guide variable selection. For example, if a business requires a certain pass rate of N%, then the bad event rate can be minimized for the top N% of samples, without requiring all samples to be included in the calculation. Actual testing shows that in appropriate scenarios, using partial data as evaluation metrics results in fewer variables than using full data, but the metrics of interest to users remain unchanged. Because the model focuses only on the top, more easily distinguishable sample points, business objectives can be achieved without requiring too many variables.\n1. Supports setting multiple conditions. Variables must meet all conditions simultaneously to be included in the model. Built-in conditions include: P-Value, VIF, correlation coefficient, coefficient sign, number of variables in a group, etc.\n1. Supports specifying variables that must be entered into the model. If the specified variables conflict with the four conditions, a comprehensive mechanism has been designed to resolve the problem.\n1. The modeling process is exported to Excel, recording the reasons for deleting each variable and the process information of each round of stepwise regression.\n1. Support actuarial calculations, using company profits as a loss function that takes into account the model's prediction accuracy, the profit level of a single user, and data costs (in the testing phase, there will be significant changes later)\n\nIn most cases, users do not need to interact directly with the Reg_Step_Wise_MP component. However, RASC is designed to be pluggable, and advanced users can use the Reg_Step_Wise_MP module independently, just like any other Python module.\n### 3.Cutter\nPerform equal frequency segmentation or segmentation according to specified split points, which has the following enhancements over the built-in segmenters of Python or Pandas:\n1. A mathematically provable analytical solution with minimum global error.\n1. All split points are derived from the original data. The minimum and maximum values for each interval are derived from the original data. This is different from Python or Pandas built-in splitters, which modify the minimum and maximum values at each end of each group.\n1. More humane support for left closed and right open: the last group is right closed.\n1. A globally optimal segmentation solution can be given even for extremely skewed data.\n1. Support weighted series.\n1. Supports user-specified special values. Special values are grouped separately, and users can also combine multiple special values into one group through configuration.\n1. Users can specify how to handle None values. If not specified and the sequence contains None values, the None values will be automatically grouped together.\n1. When a sequence is split using a specified split point, if the maximum or minimum value of the sequence exceeds the maximum or minimum value of the split point, the maximum and minimum values of the split point will be automatically extended.\n\nIt is recommended to try using Cutter to replace the built-in equal frequency segmentation component of Python or Pandas.\n### 4. Other Modules\nThere are also other modules that can significantly improve the accuracy and efficiency of data analysis and modeling:\nThe rascpy.Impute package can handle data with multiple special values and None values (binary classification tasks). This solves the current problem of using Impute to treat special values as None or as normal values, which can result in information loss or render the model meaningless.\n1. Provides high-precision, high-dimensional stratified sampling. This solves the current problem of reducing the discrepancy between training and test set metrics by compromising model performance due to low sampling precision. rascpy.Sampling can reduce the discrepancy between training and test set metrics by minimizing the differences in dataset distribution without compromising model performance.\n1. Provides automatic parameter tuning for xgboost. rascpy.Tree.auto_xgb differs from other automatic parameter tuning frameworks in that it can reduce the model variance while maintaining high training set metrics.\n1. Support scorecard and xgboost rejection inference.\n1. In addition to manually calling the above modules, users can choose to use AI instructions to automatically complete modeling without human supervision.\n## Usage Tutorial\n### Scorecard Development Example\n```Python\nfrom rascpy.ScoreCard import CardFlow\nif __name__ == '__main__':# Windows must write the main function, Linux and MacOS do not need to write the main function\n    # Pass in the command file\n    scf = CardFlow('./inst.txt')\n    # There are 11 steps in total: 1. Read datas, 2. Equal frequency binning, 3. Variable pre-filtering, 4. Monotonicity suggestion, 5. Optimal binning, 6. WOE conversion, 7. Variable filtering, 8. Modeling, 9. Generate scorecard, 10. Output model report, 11. Develop rejection inference scorecard\n    scf.start(start_step=1,end_step=11)# will automatically give the score card + the score card of rejection inference\n    \n    # You can stop at any step, as follows:\n    scf.start(start_step=1,end_step=10)#No scorecard of rejection inference will be developed \n    scf.start(start_step=1,end_step=9)#No model report will be output\n        \n    # If the results of the run are not modified, there is no need to run again. As shown below, steps 1-4 that have been run will be automatically loaded (will not be affected by restarting the computer)\n    scf.start(start_step=5,end_step=8)\n        \n    # You can also omit start_step and end_step, abbreviated as:\n    scf.start(1,10)\n```\nAfter each step of scf.start is completed, a lot of useful intermediate data will be retained. This data will be saved in the work_space specified in inst.txt as pkl. Users can manually load and access this data at any time. It can also be called through the CardFlow object instance. The intermediate results generated after each step is completed are:\n- step1: scf.datas\n- step2: scf.train_freqbins,scf.freqbins_stat\n- step3: scf.fore_col_indices,scf.fore_filtered_cols\n- step4: scf.mono_suggests,scf.mono_suggests_eventproba\n- step5: scf.train_optbins,scf.optbins_stat\n- step6: scf.woes\n- step7: scf.col_indices,scf.filtered_cols,scf.filters_middle_data,scf.used_cols\n- step8: scf.in_clf_cols,scf.clf_del_cols,scf.clf,scf.clf_perf,scf.clf_coef,scf.del_reason,scf.step_proc\n- step9: scf.card\n- step11:scf.rejInfer.train_freqbins,scf.rejInfer.freqbins_stat,\nscf.rejInfer.fore_col_indices,scf.rejInfer.fore_filtered_cols,\nscf.rejInfer.mono_suggests,scf.rejInfer.mono_suggests_eventproba,\nscf.rejInfer.train_optbins,scf.rejInfer.optbins_stat,scf.rejInfer.woes,\nscf.rejInfer.col_indices,scf.rejInfer.filtered_cols,scf.rejInfer.filters_middle_data,scf.rejInfer.used_cols\nscf.rejInfer.in_clf_cols,scf.rejInfer.clf_del_cols,scf.rejInfer.clf,scf.rejInfer.clf_perf,scf.rejInfer.clf_coef,scf.rejInfer.del_reason,scf.rejInfer.step_proc\nAnd store the newly synthesized dataset for rejection inference in scf.datas['rejData']['__synData']\n**load_step**\n```Python\n# load_step is only loading without execution. If your Python program is closed after execution and needs to be read again, there is no need to run it again. Just load the previous result. Even if the user closes the Python kernel or restarts the computer, the user can easily restore the CardFlow instance and call the intermediate data.\n# load_step avoids the trouble of loading pkl to obtain intermediate datas. CardFlow instance is equivalent to an intermediate data management container.\n# For example: load all steps 5 and before, and then call them through scf.xx\nfrom rascpy.ScoreCard import CardFlow\nscf = CardFlow('./inst.txt')\nscf.start(load_step = 5)\nprint(scf.datas)\nprint(scf.train_optbins)\n```\n#### Example Of A Instruction File\n```txt\n[PROJECT INST]\nmodel_name = Test\nwork_space = ../ws\nno_cores = -1\n\n[DATA INST]\nmodel_data_file_path = ../data/model\noot_data_file_path = ../data/oot\nreject_data_file_path = ../data/rej\nsample_weight_col = sample_weight\ndefault_spec_value = {-1}\n\n[BINS INST]\ndefault_mono=L+\ndefault_distr_min=0.02\ndefault_rate_gain_min=0.001\ndefault_bin_cnt_max = 8\ndefault_spec_distr_min=${default_distr_min}\ndefault_spec_comb_policy = A\n\n[FILTER INST]\nfilters = {\"big_homogeneity\":0.99,\"small_iv\":0.02,\"big_ivCoV\":0.3,\"big_corr\":0.8,\"big_psi\":0.2}\nfilter_data_names = {\"big_homogeneity\":\"train,test\",\"small_iv\":\"train,test\",\"big_ivCoV\":\"train,test\",\"big_corr\":\"train\",\"big_psi\":\"train,test\"}\n\n[MODEL INST]\nmeasure_index=ks\npvalue_max=0.05\nvif_max=2\ncorr_max=0.7\ndefault_coef_sign = +\n\n[CARD INST]\nbase_points=500\nbase_event_rate=0.05\npdo=50\n\n[REPORT INST]\ny_stat_group_cols = data_group\nshow_lift = 5,10,20\n\n[REJECT INFER INST]\nreject_train_data_name = rej\nonly_base_feas = True\n```\n#### Detailed Description Of All Instructions\n[English all_instructions_detailed_desc.txt](https://github.com/sifuHK/rasc/blob/main/2025.8.24/all_instructions_detailed_desc.txt)  \n[\u4e2d\u6587 \u5168\u90e8\u6307\u4ee4\u8be6\u7ec6\u8bf4\u660e.txt](https://gitee.com/sifuHK/rasc/blob/master/2025.8.24/\u5168\u90e8\u6307\u4ee4\u8be6\u7ec6\u8bf4\u660e.txt)   \n### Optimal Binning Example\nIn the scorecard development example, rascpy.Bins.OptBin/OptBin_mp is automatically called through CardFlow.\nUsers can also manually call OptBin/OptBin_mp to build their own modeling solutions.\n``` Python\n# OptBin_mp is a multi-process version of OptBin\nfrom rascpy.Bins import OptBin,OptBin_mp\n# Main parameter description\n# mono: Specifies the monotonicity constraint for each variable, such as: L+ is linearly increasing, U is automatically selecting from positive U or negative U, and A is automatically selecting from L+, L-, Uu, and Un based on the data. For the value range, see [BINS INST]:mono in \"Detailed Description of All Instructions\".\n# default_mono: Default monotonicity constraint for variables not set in mono\n# distr_min: Specify the minimum bin ratio of normal values except special values for each variable\n# default_distr_min: If the variable is not configured in distr_min, the default minimum binning ratio of the normal value\n# spec_value: Specify the special value of each variable. For the rules of writing special values, see [DATA INST]:spec_value in \"Detailed description of all instructions\".\n# default_spec_value: The default special value of the variable that does not appear in spec_value. When the special value you configured does not exist in a certain variable, the special value configuration will be automatically ignored. This command is very convenient to use when there is global unified special values in the data.\n# spec_distr_min: The minimum percentage of each special value for each variable (when the type is a double-nested dict) or the minimum percentage of all special values for the variable (when the type is a single-layer dict). If the proportion of special value of a variable is too small, the special values are merged using the merging strategy specified by spec_comb_policy. The purpose of special value merging is to reduce abnormal results caused by special values with too small a percentage.\n# default_spec_distr_min: If the variable is not in spec_distr_min, the default special value minimum percentage under the variable. The value can be a dict (to configure the default minimum percentage for each special value separately) or a number (all special values use the same default percentage)\n# spec_comb_policy: Specifies the merging rule for special values for each variable. When the proportion of the special value is less than the threshold specified by spec_distr_min, the merging rule is used. For the value range, see [BINS INST]:spec_comb_policy in \"Detailed Description of All Instructions\".\n# default_spec_comb_policy: If the variable is not configured in spec_comb_policy, the default special value merging rule is used. If the variable has no special value, this parameter is automatically ignored.\n# order_cate_vars: Specifies the ordered categorical variables in the data and gives the order of each category. ** represents a wildcard character; all unconfigured categories are merged into the wildcard character. Wildcards are well-suited for variables with long-tail distributions. If the order of a variable is set to None, lexicographic order is used.\n# unorder_cate_vars: Specifies the unordered categorical variables in the data. Unordered categories will be sorted according to the event rate. Each variable corresponds to a float: if the proportion of the category is less than the threshold, it will be merged into the wildcard category. The corresponding value of the variable is None: no limit on the proportion (may cause large fluctuations)\n# no_wild_treat: When a categorical variable does not have a wildcard and an uncovered category appears in the dataset, the method about how to treat the category. For the value range, see [CATE DATA INST]: no_wild_treat in \"Detailed Description of All Instructions\".\n# default_no_wild_treat: If there is no variable configured in no_wild_treat, the default treatment method for this category will be used if an uncovered category occurs.\n# cust_bins: User manually bins the variable\n# cores: The number of CPU cores used by multiple processes. None: All cores int: When it is greater than 1, it specifies the number of cores to be used. When it is less than 0, it specifies the number of cores reserved for the system, that is, all cores minus the specified number of cores. When it is equal to 1, it turns off multiple processes and uses a single process, which is equivalent to calling OptBin\nif __name__ == '__main__':# Windows must write the main function, Linux and MacOS do not need to write the main function\n    optBins = OptBin_mp(X_dats,y_dats,mono={'x1':'L+','x2':'U'},default_mono='A',\n                        distr_min={'x1':0.05},default_distr_min=0.02,default_rate_gain_min=0.001,\n                        bin_cnt_max={'x2':5},default_bin_cnt_max=8,\n                        spec_value={'x1':['{-999,-888}','{-1000,None}']}, default_spec_value=['{-999,-888}','{-1000}'],\n                        spec_distr_min={'x1':{'{-1000,None}':0.01,'{-999,-888}':0.05},'x2':0.01},default_spec_distr_min=0.02,\n                        spec_comb_policy={'x2':'F','x3':'L'},default_spec_comb_policy='A',\n                        order_cate_vars={'x7':['v3','v1','v2'],'x8':['v5','**','v4'],'x9':None},\n                        unorder_cate_vars={\"x10\":0.01,\"x11\":None},no_wild_treat={'x10':'H','x11':'m'},default_no_wild_treat='M',\n                        cust_bins={'x4':['[1.0,4.0)','[4.0,9.0)','[9.0,9.0]','{-997}','{-999,-888}','{-1000,None}']},cores=-1)\n```\n### Bidirectional Stepwise Logistic Regression Example\nIn the scorecard development example, rascpy.Reg_Step_Wise_MP.LogisticReg is automatically called through CardFlow.\nUsers can also manually call LogisticReg to build their own modeling solutions.\n``` Python\nfrom rascpy.Reg_Step_Wise_MP import LogisticReg\n# Generate data: There are 10 variables in total, of which the first 4 are useful variables, the middle 2 are redundant variables (there is collinearity with the first 4 variables), and the last 4 are useless variables. Add appropriate noise\nX, y = make_classification(n_samples=10000,n_features=10,n_informative=4,n_redundant=2,shuffle=False,random_state=random_state,class_sep=2)\n# Convert X to a DataFrame and modify the column names to match their variable effects. rascpy.Reg_Step_Wise_MP.LogisticReg can only accept DataFrame as X\nX = pd.DataFrame(X,columns=['informative_1','informative_2','informative_3','informative_4','redundant_1','redundant_2','useless_1','useless_2','useless_3','useless_4'])\n# Convert y to Series. rascpy.Reg_Step_Wise_MP.LogisticReg can only accept Series as y\ny=pd.Series(y).loc[X.index]\n# Main parameter description\n# measure: A metric used to determine whether a parameter should be entered into the model. It supports aic, bic, roc_auc, ks, and other indicators.\n# pvalue_max: The pvalue of all variables in model cannot exceed this value. rasc designs a complex and reasonable mechanism to ensure that the pvalue of all variables in model is not greater than this value.\n# vif_max: The vif of all variables in model cannot exceed this value. rasc designs a complex and reasonable mechanism to ensure that the vif of all variables in model is not greater than this value.\n# corr_max: The pairwise correlation coefficients of all variables in model cannot exceed this value. rasc designs a complex and reasonable mechanism to ensure that the pairwise correlation coefficients of all variables in model are not greater than this value.\n# iter_num: number of rounds of stepwise regression\n# results_save: Output the model effect, information related to the model coefficients, reasons for deleting variables, and details of each round of stepwise regression into an Excel.\n# Return value description\n# in_vars: all variables entered into the model\n# clf_final: The final model returned\n# clf_perf: Model performance\n# clf_coef: information related to model coefficients\n# del_reason: reason for deleting variable\n# step_proc: Details of each round of stepwise regression\nif __name__ == '__main__':# Windows must write the main function, Linux and MacOS do not need to write the main function\n    lr = LogisticReg(X,y,measure='roc_auc',pvalue_max=0.05,vif_max=3,corr_max=0.8,iter_num=20,results_save = 'test_logit.xlsx')\n    in_vars,clf_final,clf_perf,clf_coef,del_reason,step_proc = lr.fit()\n    \n# Other important parameters\n# user_save_cols: variables that are forced to enter into the model. A complex and reasonable mechanism is designed to handle conflicts between user_save_cols and commands such as pvalue_max, vif_max, and corr_max.\n# coef_sign: dict, used to specify the coefficient sign of each variable\n# default_coef_sign:When the variable is not in coef_sign, the default value of the coefficient sign constraint for that variable\n# exc_group: Ensure that only one variable in the group can be entered into model. Subsequent versions will allow users to set the number of variables or the variables total cost limit in the group to be entered into model. For its value rules, see [MODEL INST]:exc_group in \"Detailed Description of All Instructions\".\n```\n### XGB Automatic Parameter Search Example\n``` Python\nfrom rascpy.Tree import auto_xgb\n# Parameter description\n# cands_num: auto_xgb will give a score to each hyperparameter tried during automatic parameter search. The higher the score, the more recommended the model trained with the hyperparameter is. Then the scores are sorted from high to low, and the models with the top cands_num scores are returned.\n# In actual use, the model with the highest score (i.e. clf_cands[0]) is the best model in most cases. However, users can still select their favorite model from the candidate models clf_cands[n] according to their preferences.\n# cost_time: The running time of auto_xgb. Because the essence of parameter search is a combinatorial explosion, the goal of any algorithm is to find the most likely optimal set of hyperparameters within a limited time. Therefore, the longer cost_time is, the more likely it is to find the optimal set of hyperparameters.\n# However, in actual use, the author find that setting cost_time to 3-5 minutes is enough to yield the optimal model for most cases. Setting it longer generally fails to yield a higher-scoring model. If the user is dissatisfied with the model, they can try increasing cost_time, but increasing it to more than 8 minutes is not recommended and will likely be ineffective.\n# If the user is not satisfied with the bias or variance of the model, the best approach is not to increase cost_time, but to try using a more accurate sampling method, such as rascpy.Impute.BCSpecValImpute\n# Return value description\n# perf_cands: list. Metrics of all candidate models. Each metric contains three pieces of information: train_ks(train_auc), val_ks(val_auc), |train - val| (the absolute value of the difference between the training set and the validation set)\n# params_cands: list. Hyperparameters of all candidate models\n# clf_cands: list. All candidate models\n# vars_cands: list. All candidate model input variables \n# Note: The indexes of these four return values are relative. If the user decides to use the clf_cands[0] model, he can view the model's metrics through perf_cands[0], the model's hyperparameters through params_cands[0], and the model's input variables through vars_cands[0].\nperf_cands,params_cands,clf_cands,vars_cands = auto_xgb(train_X,train_y,val_X,val_y,metric='ks',cost_time=60*5,cands_num=5)\nproba_hat = clf_cands[0].predict_proba(X)[:,1]#The columns of X need to completely correspond to the columns during training. Even if a column is not entered into the model, it must be passed into the predict_proba method.\n# When making predictions, you can also try to use the more convenient predict_proba\nfrom rascpy.Tool import predict_proba\nproba_hat = predict_proba(clf_cands[0],X[vars_cands[0]],decimals=4)#Only the variables to be input into the model need to be passed in, which is very convenient for online systems. And the returned proba_hat is a Series with the same row index as X.\n```\n### Impute Example\nBCSpecValImpute can be used to handle special values and missing values in data for binary classification problems. It can handle special values and missing values for continuous, unordered categorical, and ordered categorical variables.\nBCSpecValImpute can simultaneously fill in empty values and transform special values.\nIf the data contains both None values and special values, most models cannot handle them well (in a business-meaningful way). We recommend using rascpy.Impute.BCSpecValImpute to preprocess the data before training it in a binary classification model.\n``` Python\nfrom rascpy.Impute import BCSpecValImpute\n# Main parameter description\n# spec_value: specifies the special value of each variable. For the rules of writing special values, see [DATA INST]:spec_value in \"Detailed description of all instructions\".\n# default_spec_value: The default special value of the variable that does not appear in spec_value. When the special value you configured does not exist in a certain variable, the configuration of the special value will be automatically ignored. This command is very convenient to use when there is a global unified special value in the data.\n# order_cate_vars: Specifies the ordered categorical variables in the data and gives the order of each category. ** represents a wildcard character; all unconfigured categories are merged into the wildcard character. Wildcards are well-suited for variables with long-tail distributions. If the order of a variable is set to None, lexicographic order is used.\n# unorder_cate_vars: Specifies the unordered categorical variables in the data. Unordered categories will be sorted according to the event rate. If the value is float, if the proportion of the category is less than the threshold, it will be merged into the wildcard category. If the value is None, there is no limit on the proportion (which may cause large fluctuations)\n# impute_None: Whether to fill in None values. Because some models can automatically handle None values, if you use such a model later, you can ignore None values when filling, and only need to handle special values. (Almost all models cannot directly handle data with both None values and special values)\nbcsvi = BCSpecValImpute(spec_value={'x1':['{-999,-888}','{-1000,None}'],'x11':['{unknow}']},default_spec_value=['{-999}','{-1000}'],\norder_cate_vars={'x8':['v5','**','v4'],'x9':None},\nunorder_cate_vars={\"x10\":0.01,\"x11\":None},impute_None=True,cores=None)\nbcsvi.fit(trainX,trainy,weight=None) # weight=None can be omitted\ntrainX = bcsvi.transform(trainX)\n# trainX = bcsvi.fit_transform(trainX,trainy)\ntestX = bcsvi.transform(testX)\n    \n#View the specific filling rules:\nprint(bcsvi.impute_values)\n#Output format: {'x1':{-999:2,-888:1,-1000:0,None:0},'x2':{-999:1,-1000:0},'x8':{None:'D'},'x11':{'unknow':'A'}}}\n#From the results, we can see that the special value -999 of the numeric variable x1 is filled with 2, and the empty value is filled with 0, etc. The special value 'unknow' of the categorical variable x11 is filled with A\n#If the key corresponding to a variable name is not found in the first-level dict, it means that the variable has no special value in the training set and does not need to be filled. (However, it is necessary to avoid the situation where special values exist in other datasets)\n```\n### High-Dimensional Stratified Sampling Example\nOne of the most important evaluation criteria for the effectiveness of a high-dimensional stratified sampling algorithm is whether the joint distribution of **each** x variable and y can remain consistent in each dataset after sampling.\nIf the data itself can be divided into multiple groups, then it is also required that the joint distribution of **each** x variable and y can remain consistent in each group in each dataset after sampling.\nrascpy provides the rascpy.Sampling.split_cls algorithm, which is designed for high-precision sampling of binary classification problems. Compared with some sampling algorithms, it shows good consistency in joint distribution, regardless of whether the data contains groups. This is especially true for the x variable, which has a strong predictive effect.\n``` Python\nfrom rascpy.Sampling import split_cls\n# Main parameter description\n# dat: dataframe dataset \n# y:y column name of the label\n# test_size: sampling ratio\n# w: column name of weight\n# groups: data grouping fields\ntrain,test = split_cls(dat,y='y',test_size=0.3,w='weight',groups=['c1','c2'],random_state=0)\n```\n### Scorecard Rejection Inference Model\nThere are three methods for developing scorecard rejection inference models. Users can choose any method based on their own situation.\nMethod 1: Complete the normal scorecard and rejection inference scorecard simultaneously. Suitable for developing scorecards from scratch\n``` Python\nfrom rascpy.ScoreCard import CardFlow\nif __name__ == '__main__':# Windows must write the main function, Linux and MacOS do not need to write the main function\n    # Pass in the command file\n    scf = CardFlow('./inst.txt')\n    scf.start(start_step=1,end_step=11)# will automatically generate standard scorecards and rejection inference scorecards\n```\nMethod 2: Complete the standard scorecard first, then generate the rejection inference scorecard. This is suitable for those who have already generated the standard scorecard with rascpy and need to generate a rejection inference scorecard.\n``` Python\nfrom rascpy.ScoreCard import CardFlow\nif __name__ == '__main__':\n    # Pass in the command file\n    scf = CardFlow('./inst.txt')\n    scf.start(start_step=11,end_step=11)#If you have already run step 1 to step 10, you can set both start_step and end_step to 11 to generate a rejection inference scorecard.\n```\nMethod 3: Directly call the CardRej module. This is suitable for those who have developed a scorecard using other python packages and then use rascpy to generate a rejection inference scorecard.\n``` Python\nfrom rascpy.ScoreCardRej import CardRej\nif __name__ == '__main__':\n    # Main parameter description\n    # init_clf: unbiased logistic regression model\n    # init_optbins_stat_train: Unbiased bin statistics. Format: {'x1':pd.DataFrame(columns=['bin','woe'])}\n    # datas: data passed in by the user. Format example: {'rejData':{'rej':pd.DataFrame(),'otherRej':pd.DataFrame()},'ootData':{'oot1':pd.DataFrame(),'oot2':pd.DataFrame()}}\n    # inst_file: Instruction file. The instructions are the same as those in the 'Scorecard Development Example'. See \"Detailed Instructions for All Instructions\". If datas is empty, all data files under [DATA INST]:xx_data_file_path in the inst_file file will be automatically loaded. If datas is not empty, the configuration of [DATA INST]:xx_data_file_path will be ignored.\n    cr = CardRej(init_clf,init_optbins_stat_train,datas=None,inst_file='inst.txt')\n    cr.start()\n```\nRefer to the intermediate data generated by the rejection inference in step 11 of the \"Scorecard Development Example\". The intermediate data is called by scf.rejInfer.xx in Method 1 and Method 2, and by cr.xx in Method 3.\n### Tree Rejection Inference Model\n``` Python\nfrom rascpy.TreeRej import auto_rej_xgb\n# Main parameter description\n# xx_w: weight of the corresponding dataset\n# metric: two options, ks or auc\n# Return value description\n# not_rej_clf: non-rejection inference xgb model\n# rej_clf: reject the inferred xgb model\n# syn_train: synthetic data used to train the final round of rejection inference model\n# syn_val: synthetic data used to validate the final round of rejection inference model\nnot_rej_clf,rej_clf,syn_train,syn_val = auto_rej_xgb(train_X,train_y,val_X,val_y,rej_train_X,rej_val_X,train_w=None,val_w=None,rej_train_w=None,rej_val_w=None,metric='auc')\n```\n## Contact Information\nEmail: scoreconflow@gmail.com\nEmail:scoreconflow@foxmail.com\nWeChat:SCF_04\n",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "The RASC project has made some supplements and improvements to existing statistical methods in order to provide data analysts and modelers with a more accurate and easier-to-use algorithmic framework.",
    "version": "2025.8.24",
    "project_urls": {
        "English Repository": "https://github.com/pypa/sampleproject/issues",
        "\u4e2d\u6587\u7f51\u5740": "https://donate.pypi.org"
    },
    "split_keywords": [
        "credit risk management",
        " scorecard",
        " optimal bins",
        " auto report",
        " xgb auto params search",
        " step-wise liner-regression",
        " step-wise logistic-regression",
        " reject inference"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "15389e13566d5ba5ab02ef3fac7c8e2c19f3cc9f05911edd97f6acb0d58aa786",
                "md5": "43053f29a4f13f49feef1b69cbd80a4d",
                "sha256": "cca18bc476fabd45a539bf00b72b984dc3aa1eb0fba3d2f1b91fbb36d0723326"
            },
            "downloads": -1,
            "filename": "rascpy-2025.8.24-cp310-cp310-macosx_10_9_universal2.whl",
            "has_sig": false,
            "md5_digest": "43053f29a4f13f49feef1b69cbd80a4d",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": null,
            "size": 2897195,
            "upload_time": "2025-08-24T12:27:53",
            "upload_time_iso_8601": "2025-08-24T12:27:53.771874Z",
            "url": "https://files.pythonhosted.org/packages/15/38/9e13566d5ba5ab02ef3fac7c8e2c19f3cc9f05911edd97f6acb0d58aa786/rascpy-2025.8.24-cp310-cp310-macosx_10_9_universal2.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0418b0f6c02f1c43bb29cdc35c78b096c84d4517c18f08764a34c66220cb3a0f",
                "md5": "db79543e18479a850774dd905c6b828f",
                "sha256": "b20c42545a653883c79ac47533e0df021a4c730ecbda7edd55ad7592f11cb779"
            },
            "downloads": -1,
            "filename": "rascpy-2025.8.24-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl",
            "has_sig": false,
            "md5_digest": "db79543e18479a850774dd905c6b828f",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": null,
            "size": 1729835,
            "upload_time": "2025-08-24T12:27:55",
            "upload_time_iso_8601": "2025-08-24T12:27:55.866467Z",
            "url": "https://files.pythonhosted.org/packages/04/18/b0f6c02f1c43bb29cdc35c78b096c84d4517c18f08764a34c66220cb3a0f/rascpy-2025.8.24-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6cfb037d77253b9802dc61e3b26cc47eb4980e336d12f961d8007879288359f1",
                "md5": "6a2e442a18227129c32f6219cf0114c9",
                "sha256": "108f84c38ece7a8ef0d5b323fa8c5e4ab421309cff3e6d6bcc34402a4365a602"
            },
            "downloads": -1,
            "filename": "rascpy-2025.8.24-cp310-cp310-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "6a2e442a18227129c32f6219cf0114c9",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": null,
            "size": 1246150,
            "upload_time": "2025-08-24T12:27:57",
            "upload_time_iso_8601": "2025-08-24T12:27:57.531121Z",
            "url": "https://files.pythonhosted.org/packages/6c/fb/037d77253b9802dc61e3b26cc47eb4980e336d12f961d8007879288359f1/rascpy-2025.8.24-cp310-cp310-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7a6e5883e6b5b61ed1b5af589eab0d096117e41a94e4f3ba3b859b36151aca9c",
                "md5": "6d4a8b2962bb3ee12ab3b3a639f15b22",
                "sha256": "6fa5f0afceb6b4ae98335f91d0b2d787458502c7a0aaf387d0d4f090b9c78902"
            },
            "downloads": -1,
            "filename": "rascpy-2025.8.24-cp311-cp311-macosx_10_9_universal2.whl",
            "has_sig": false,
            "md5_digest": "6d4a8b2962bb3ee12ab3b3a639f15b22",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": null,
            "size": 2958255,
            "upload_time": "2025-08-24T12:28:00",
            "upload_time_iso_8601": "2025-08-24T12:28:00.966417Z",
            "url": "https://files.pythonhosted.org/packages/7a/6e/5883e6b5b61ed1b5af589eab0d096117e41a94e4f3ba3b859b36151aca9c/rascpy-2025.8.24-cp311-cp311-macosx_10_9_universal2.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0a712ffa3d54bfbb70178711c6aa63561c37797097200138ef933259c87fbbdd",
                "md5": "b9766a1eb28a5bb44abd8c3436e53dc4",
                "sha256": "ad17c6cfbf54c51a9a228295fc891bcf78e4344e74d0ae3794443a8c4eb534c3"
            },
            "downloads": -1,
            "filename": "rascpy-2025.8.24-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl",
            "has_sig": false,
            "md5_digest": "b9766a1eb28a5bb44abd8c3436e53dc4",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": null,
            "size": 1771548,
            "upload_time": "2025-08-24T12:28:02",
            "upload_time_iso_8601": "2025-08-24T12:28:02.754011Z",
            "url": "https://files.pythonhosted.org/packages/0a/71/2ffa3d54bfbb70178711c6aa63561c37797097200138ef933259c87fbbdd/rascpy-2025.8.24-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c22742cf0eba1b6c879f77d3e3f157832f1135848bfb5a73deea12faabc10456",
                "md5": "67d19118f5a9ceb35c489af6a898b17c",
                "sha256": "8d773f1cf296f0287336fc0f2e0d812c4f473fb5cbb6975670fe2832a7ece3f7"
            },
            "downloads": -1,
            "filename": "rascpy-2025.8.24-cp311-cp311-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "67d19118f5a9ceb35c489af6a898b17c",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": null,
            "size": 1249720,
            "upload_time": "2025-08-24T12:28:04",
            "upload_time_iso_8601": "2025-08-24T12:28:04.421344Z",
            "url": "https://files.pythonhosted.org/packages/c2/27/42cf0eba1b6c879f77d3e3f157832f1135848bfb5a73deea12faabc10456/rascpy-2025.8.24-cp311-cp311-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b5f9678bcf0fb01586546ccb1694b29c7827a75a4d708210f9f46ca5e321712b",
                "md5": "ac74ad4a5903f886530e7074183d62cf",
                "sha256": "2ef45ab36301d361c35a80debddb668f3883142307396410b9fe62431f8ed2dc"
            },
            "downloads": -1,
            "filename": "rascpy-2025.8.24-cp312-cp312-macosx_10_13_universal2.whl",
            "has_sig": false,
            "md5_digest": "ac74ad4a5903f886530e7074183d62cf",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": null,
            "size": 2844736,
            "upload_time": "2025-08-24T12:28:06",
            "upload_time_iso_8601": "2025-08-24T12:28:06.259013Z",
            "url": "https://files.pythonhosted.org/packages/b5/f9/678bcf0fb01586546ccb1694b29c7827a75a4d708210f9f46ca5e321712b/rascpy-2025.8.24-cp312-cp312-macosx_10_13_universal2.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2e897000e7e0eff2a2828db4399119af75f285c9113a283f9db7a364e7f03cf1",
                "md5": "5cfdfb045ecc02f8dee72454069dc27d",
                "sha256": "571f08ec2d77e7d26f874d12234a60ac41b81f2bb09be149d40aad218184d205"
            },
            "downloads": -1,
            "filename": "rascpy-2025.8.24-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl",
            "has_sig": false,
            "md5_digest": "5cfdfb045ecc02f8dee72454069dc27d",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": null,
            "size": 1746239,
            "upload_time": "2025-08-24T12:28:07",
            "upload_time_iso_8601": "2025-08-24T12:28:07.672017Z",
            "url": "https://files.pythonhosted.org/packages/2e/89/7000e7e0eff2a2828db4399119af75f285c9113a283f9db7a364e7f03cf1/rascpy-2025.8.24-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8ab278682ac1c0df8d4a9a3909010e1545e5d50fda582a3087ca54c8d78db2bc",
                "md5": "f1985d215292fac99e08c2d6e57d2753",
                "sha256": "90168105479f3655cf9436d375e8217b9c0751570a3621ee67d3d3e6ace97470"
            },
            "downloads": -1,
            "filename": "rascpy-2025.8.24-cp312-cp312-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "f1985d215292fac99e08c2d6e57d2753",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": null,
            "size": 1198720,
            "upload_time": "2025-08-24T12:28:09",
            "upload_time_iso_8601": "2025-08-24T12:28:09.308249Z",
            "url": "https://files.pythonhosted.org/packages/8a/b2/78682ac1c0df8d4a9a3909010e1545e5d50fda582a3087ca54c8d78db2bc/rascpy-2025.8.24-cp312-cp312-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "48a8c54a35dbc2f359a1140b9c8d5396e634e9a2811627c282a1aa430b7f0e44",
                "md5": "9c6bb2b297e103046987bb9cb6c08b42",
                "sha256": "3ef6df7f6dc09153f66e90c96dd73efbeb9cc6dd2dc537dee150c44d7be8706d"
            },
            "downloads": -1,
            "filename": "rascpy-2025.8.24-cp313-cp313-macosx_10_13_universal2.whl",
            "has_sig": false,
            "md5_digest": "9c6bb2b297e103046987bb9cb6c08b42",
            "packagetype": "bdist_wheel",
            "python_version": "cp313",
            "requires_python": null,
            "size": 2829504,
            "upload_time": "2025-08-24T12:28:11",
            "upload_time_iso_8601": "2025-08-24T12:28:11.228478Z",
            "url": "https://files.pythonhosted.org/packages/48/a8/c54a35dbc2f359a1140b9c8d5396e634e9a2811627c282a1aa430b7f0e44/rascpy-2025.8.24-cp313-cp313-macosx_10_13_universal2.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9f71e21b502bdb9feb1d4f54376bfe83acd72c6a23e64377978d755810103d83",
                "md5": "7606a04f2844659360e58cbcfd4ee9c7",
                "sha256": "fe5c99e9533d9886d4501561cfd297c23f19415175231c33f1f1ccd04ee57972"
            },
            "downloads": -1,
            "filename": "rascpy-2025.8.24-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl",
            "has_sig": false,
            "md5_digest": "7606a04f2844659360e58cbcfd4ee9c7",
            "packagetype": "bdist_wheel",
            "python_version": "cp313",
            "requires_python": null,
            "size": 1755363,
            "upload_time": "2025-08-24T12:28:12",
            "upload_time_iso_8601": "2025-08-24T12:28:12.833809Z",
            "url": "https://files.pythonhosted.org/packages/9f/71/e21b502bdb9feb1d4f54376bfe83acd72c6a23e64377978d755810103d83/rascpy-2025.8.24-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c5e02b7db1e77d6098df0bad3a86d53fc4d53bf3ef0208a4b1b7f2ccdab85c81",
                "md5": "187ff2302633d95e134dfb0bbae40109",
                "sha256": "35f7579afca714f6ec7410f72a3772e047248bb18b789f46d9335d4a26f1da26"
            },
            "downloads": -1,
            "filename": "rascpy-2025.8.24-cp313-cp313-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "187ff2302633d95e134dfb0bbae40109",
            "packagetype": "bdist_wheel",
            "python_version": "cp313",
            "requires_python": null,
            "size": 1197900,
            "upload_time": "2025-08-24T12:28:14",
            "upload_time_iso_8601": "2025-08-24T12:28:14.322502Z",
            "url": "https://files.pythonhosted.org/packages/c5/e0/2b7db1e77d6098df0bad3a86d53fc4d53bf3ef0208a4b1b7f2ccdab85c81/rascpy-2025.8.24-cp313-cp313-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "672d0710b81aaa31b3289138ee02d627e39345a1014785d88a3eea7125a46f6a",
                "md5": "99569a7cc8a377cdee5dfda62df297c6",
                "sha256": "3d314b0398d0071e25e91515145633bc9e907fb42dbbccbbd394438835d9eb45"
            },
            "downloads": -1,
            "filename": "rascpy-2025.8.24-cp39-cp39-macosx_10_9_universal2.whl",
            "has_sig": false,
            "md5_digest": "99569a7cc8a377cdee5dfda62df297c6",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": null,
            "size": 2915504,
            "upload_time": "2025-08-24T12:28:16",
            "upload_time_iso_8601": "2025-08-24T12:28:16.111330Z",
            "url": "https://files.pythonhosted.org/packages/67/2d/0710b81aaa31b3289138ee02d627e39345a1014785d88a3eea7125a46f6a/rascpy-2025.8.24-cp39-cp39-macosx_10_9_universal2.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "f3abc19ff2ba8f59b3176b5bac6972d1f087a24978bbaf557f0042a0ef86c01e",
                "md5": "51b188c5461c0f6be0164cfbcbe7bc61",
                "sha256": "72bb2f24e3f44b7e760c3e8967b2b70948bb87c168bd131c99bb4b773c532fdb"
            },
            "downloads": -1,
            "filename": "rascpy-2025.8.24-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl",
            "has_sig": false,
            "md5_digest": "51b188c5461c0f6be0164cfbcbe7bc61",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": null,
            "size": 1735462,
            "upload_time": "2025-08-24T12:28:18",
            "upload_time_iso_8601": "2025-08-24T12:28:18.245963Z",
            "url": "https://files.pythonhosted.org/packages/f3/ab/c19ff2ba8f59b3176b5bac6972d1f087a24978bbaf557f0042a0ef86c01e/rascpy-2025.8.24-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "4dc4b7ad548018b064db3a41bcfc09285cdab9aee1304945f4bd5c676ba13c3d",
                "md5": "adbf70e1079ddffb042b98d333da74f3",
                "sha256": "1326e44855f3378c957cf272f386ee29b597dbb984899e8481362112040d9252"
            },
            "downloads": -1,
            "filename": "rascpy-2025.8.24-cp39-cp39-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "adbf70e1079ddffb042b98d333da74f3",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": null,
            "size": 1349989,
            "upload_time": "2025-08-24T12:28:19",
            "upload_time_iso_8601": "2025-08-24T12:28:19.647792Z",
            "url": "https://files.pythonhosted.org/packages/4d/c4/b7ad548018b064db3a41bcfc09285cdab9aee1304945f4bd5c676ba13c3d/rascpy-2025.8.24-cp39-cp39-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-24 12:27:53",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "pypa",
    "github_project": "sampleproject",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "rascpy"
}

SIFU DATA SCIENCE