# An Overview on the Landscape of R Packages for Open Source Scorecard Modelling

## Abstract


## 1. Introduction

`creditmodel` (Fan 2022), `scorecard` (Xie 2021), `scorecardModelUtils` (Poddar 2019), `smbinning` (Jopia 2019), `woeBinning` (Eichenberg 2018), `woe` (Thoppay 2015), `Information` (Larsen 2016), `InformationValue` (Prabhakaran 2016), `glmdisc` (Ehrhardt and Vandewalle 2020), `glmtree` (Ehrhardt 2020), `Rprofet` (Stratman et al. 2020) and `boottol` (Schiltgen 2015).

`smbinning`, `InformationValue` and `Information` are among the most popular, and they have been available for quite some time. Another popular toolbox is provided by the package `scorecard`, which has been frequently updated in the recent past, as has also happened with the package `creditmodel`.

`creditR` (Dis 2020), `riskr` (Kunst 2020), and `scoringTools` (Ehrhardt 2018).

In particular, a Python implementation of the `scorecard` R package (Xie 2021) is available5, which means that some of the results worked out in this paper are directly transferable to the Python world. Nonetheless, as R is the lingua franca of statistics (Ligges 2009), it provides access to a huge number of contributed packages and functionalities from the field of statistics beyond the aforementioned ones. For this reason, the paper concentrates on R and investigates whether scorecard development can be improved by access to other existing packages that were initially designed for other purposes but can ease the analyst's life. Where available, such functionalities are also mentioned in the corresponding sections.

`mlr3` (Lang et al. 2019, 2021) or `caret` (Kuhn 2008, 2021). Studies have investigated potential benefits from using modern machine learning algorithms (Baesens et al. 2002; Bischl et al. 2016; Lessmann et al. 2015; Louzada et al. 2016; Szepannek 2017), but regulators and the General Data Protection Regulation (GDPR) require models to be understandable (cf. Financial Stability Board 2017; Goodman and Flaxman 2017). The latter issue can be addressed by methodologies of explainable machine learning (for an overview, see Bücker et al. 2021), e.g., using frameworks as provided by the packages `DALEX` (Biecek 2018) or `iml` (Molnar et al. 2018), while taking into account to what extent a model actually is explainable (Szepannek 2019). It has further turned out that the use of current state-of-the-art ML algorithms is not necessarily beneficial in the credit scoring context (Chen et al. 2018; Szepannek 2017); they should rather be carefully analyzed in each specific situation instead of relying on preferred models (Rudin 2019). For this reason, this paper focuses on the traditional way of scorecard modelling as briefly described above.

## 2. Data

`creditability`) and 13 categorical as well as seven numeric predictors, and 1000 observations in total with 300 defaults (`level == "bad"`) and 700 nondefaults (`level == "good"`). The data are provided by several R packages such as `klaR` (Roever et al. 2020), `woeBinning`, `caret` or `scorecard`. For the examples in this paper, the data from the `scorecard` package are used, where in addition the levels of the categorical variables such as `present.employment.since`, `other.debtors.or.guarantors`, `job` or `housing` are sorted according to their expected order w.r.t. credit risk. Note that Groemping (2019) compared the data from the UCI repository to the original papers and made a corrected version of it available6 (cf. also Szepannek and Lübke 2021). Other (partly simulated) example data sets (amongst others, loan data of the peer-to-peer lending company Lending Club7) are contained within the packages `creditmodel`, `scoringTools`, `smbinning` and `riskr`.

`mlr3` (see Section 6). Instead, usually one single holdout set is used. The package `scorecard` has a function `split_df()` that splits data according to a prespecified percentage into training and validation sets. For the examples in the remainder of the paper, the following data are used:
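The example's own code is not reproduced in this chunk; a minimal sketch of such a split with `scorecard::split_df()` might look as follows (the 70/30 ratio and the seed are illustrative assumptions, not taken from the original example):

```r
library(scorecard)
data("germancredit")

# split the German credit data into a training and a validation set;
# split_df() returns a named list with elements $train and $test
dt_list <- split_df(germancredit, y = "creditability",
                    ratio = c(0.7, 0.3), seed = 42)
train <- dt_list$train
valid <- dt_list$test
```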

`smbinning`, `woe`, `creditR`, `riskr`, `glmdisc`, `scoringTools`, `scorecardModelUtils` and `creditmodel`8) do require the target variable to take only values 0 and 1 as in the example's data sets `train2` and `valid2`. Although this is of course easily obtained, the package `scorecardModelUtils` contains a function `fn_target()` that does this job and replaces the original target variable with a new one of name `Target`.
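If `fn_target()` is not used, a base R recoding suffices; the following self-contained sketch mimics the `train`/`train2` naming of the example data sets (the tiny stand-in data frame is an assumption for illustration):

```r
# stand-in for the training data with a factor target
train <- data.frame(creditability = factor(c("good", "bad", "good")))

# recode the factor target into 0 (nondefault) / 1 (default)
train2 <- train
train2$creditability <- as.integer(train$creditability == "bad")
```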

## 3. Binning and Weights of Evidence

#### 3.1. Overview

#### 3.2. Requirements

`predict()` function.

#### 3.3. Available Methodology for Automatic Binning

`partykit` (Hothorn and Zeileis 2015): `scorecard::woebin()`, `smbinning::smbinning()`, `scorecardModelUtils::iv_table()` and `riskr::superv_bin()`. The implementation in the `scorecardModelUtils` package merges the resulting bins to ensure monotonicity of the default rates w.r.t. the original variable, which might or might not be desired. For the same purpose, the package `smbinning` offers a separate function (`smbinning.monotonic()`). In contrast to all previously mentioned packages, the package `woeBinning` implements its own tree algorithm where either initial bins of similar WoE are merged (`woe.binning()`) or the set of bins is binary split (`woe.tree.binning()`) as long as the IV of the resulting variables decreases (increases) by less (more) than a prespecified percentage (argument `stop.limit`), while the initial bins are created to be of minimum size (`min.perc.total`). The function `creditmodel::get_breaks_all()` uses classification and regression trees (Breiman et al. 1984) of the package `rpart` (Therneau and Atkinson 2019)9 to create initial bins. An additional argument, `best = TRUE`, merges these bins subsequently according to different criteria such as the maximum number of bins, the minimum percentage of observations per bin, a threshold for the ${\chi}^{2}$ test or odds, a minimum population stability (cf. Section 4) or monotonicity of the default rates across the bins (all of these can be specified by the argument `bins_control`).

The `scorecard` package offers alternative algorithms (argument `method`) for automatic binning based on either the ${\chi}^{2}$ statistic or equal width or size of numeric variables.
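As an illustration (not taken from the paper's numbered examples), tree-based automatic binning with `scorecard` can be sketched as follows; switching `method = "tree"` to `"chimerge"` would select the ${\chi}^{2}$-based alternative:

```r
library(scorecard)
data("germancredit")

# automatic tree-based binning of all predictors
bins <- woebin(germancredit, y = "creditability", method = "tree")

# binning table (counts, default rates, WoE, IV) for one variable
bins$duration.in.month
```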

The package `glmdisc` is explicitly designed to be used in combination with logistic regression modelling for credit scoring (Ehrhardt et al. 2019). The bins are optimized to maximize either the AIC, BIC or Gini coefficient (cf. Section 6) of a subsequent logistic regression model (using binned variables, not WoEs) on validation data (argument `criterion=`). Second order interactions can also be considered (argument `interact = TRUE`). Note that this approach is comparatively intense in terms of computation time and does not take variable selection into account (cf. Section 5).

`Rprofet::BinProfet()` uses the function `greedy.bin()` of the package `binr` (Izrailev 2015). The package `scoringTools` contains a variety of functions (`chiM_iter()`, `mdlp_iter()`, `chi2_iter()`, `echi2_iter()`, `modchi2_iter()` and `topdown_iter()`) which provide interfaces to binning algorithms from the package `discretization` (Kim 2012). The `dlookr` package (Ryu 2021), which is primarily designed for exploratory data analysis, implements an interface (`binning_by()`) to `smbinning::smbinning()`.

#### 3.4. Manipulation of the Bins

`scorecard::woebin()` allows passing an argument `breaks_list`. Each element corresponds to a variable with manual binning and must be named like the corresponding variable. For numeric variables, it must be a vector of break points, and for factor variables, it must be a character vector of the desired bins given by the merged factor levels, separated by "`%,%`" (cf. output from Example 3 for the variable `purpose`). In addition, a function `scorecard::woebin_adj()` allows for an interactive adjustment of bins. The package `smbinning` provides two functions, `smbinning.custom()` and `smbinning.factor.custom()`.
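A minimal sketch of manual binning via `breaks_list` might look as follows; the break points for `duration.in.month` are purely hypothetical and only serve to show the mechanics:

```r
library(scorecard)
data("germancredit")

# hypothetical manual break points for a numeric variable
breaks_list <- list(duration.in.month = c(8, 16, 34, 44))

# rebin only this variable using the manual breaks
bins_adj <- woebin(germancredit, y = "creditability",
                   x = "duration.in.month", breaks_list = breaks_list)
```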

`scorecard`:

`total_iv` (not shown here).

`glmdisc`, `riskr`, `Rprofet`, `scorecard`, `smbinning`, `woeBinning`) provide a visualization of the bins on a variable level. Figure 2 (left) shows the binning resulting from the code in Example 2, which is similar for most packages. A mosaic plot of the bins, which simultaneously visualizes default rates and the size of the bins, is offered by the package `glmdisc` (Figure 2, right), although the names of the bins after automatic binning are not self-explanatory.

#### 3.5. Applying Bins to New Data

`scorecard` (`woebin_ply()`), `smbinning` (`smbinning.gen()` and `smbinning.factor.gen()`), `woeBinning` (`woe.binning.deploy()`), `creditmodel` (`split_bins_all()`), `glmdisc` (`discretize()`) and `scorecardModelUtils` (`num_to_cat()`) provide this functionality. Example 3 illustrates the application of binning results to a data set. Via the `to = "bin"` argument, either bins or WoEs can be assigned:
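A minimal sketch of this deployment step with `scorecard` (assuming bins created by `woebin()`, cf. above) could look as follows:

```r
library(scorecard)
data("germancredit")

bins <- woebin(germancredit, y = "creditability")

# assign bin labels (rather than WoE values) to the data
german_binned <- woebin_ply(germancredit, bins, to = "bin")
head(german_binned[, 1:4])
```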

For `ctree`-based binning (cf. above), a workaround using the `partykit::predict.party()` method for bin assignment can be used if the tree model is stored within the results object10. Alternatively, bins can be assigned manually via `cut()` for numeric variables or by using lookup tables for factor variables (cf. Zumel and Mount 2014, p. 23)11. It is worth mentioning that several packages (`smbinning` and `riskr`) implement binning only on a single variable level but not simultaneously for several selected variables or all variables of a data frame12.

#### 3.6. Binning of Categorical Variables

The package `smbinning` does not offer an automatic merging of levels for factor variables, and its function `smbinning.factor()` only returns figures similar to the table resulting from Example 2; each original level corresponds to exactly one bin. The bins can be manipulated afterwards via `smbinning.factor.custom()` and further applied to new data via `smbinning.factor.gen()`. An automatic binning of categorical variables based on conditional inference trees is supported by the packages `riskr` and `scorecard` (`method = "tree"`). Additional merging strategies are provided by the packages `glmdisc` and `creditmodel` (as described above), `scorecard` (`method = "chimerge"`) and `woeBinning` (according to similar WoEs).

Using `woeBinning`'s `woe.binning()` function, this can be ensured: initial bins of a minimum size (`min.perc.total`) are created, and smaller factor levels are initially bundled into a positive or negative 'miscellaneous' category according to the sign of the corresponding WoE, which is desirable to prevent overfitting. The package `scorecardModelUtils` offers a separate function `cat_new_class()` for this. All levels less frequent than specified by the argument `threshold` are merged together, and a data frame with the resulting mapping table is stored in the output element `$cat_class_new`13. The package `creditmodel` provides a function `merge_category()` which keeps the `m` most frequent categories and merges all other levels into a new category of name `"other"`, but no function is available to apply the same mapping to new data.

In addition to `woeBinning`'s `woe.binning()`, the functions `scorecard::woebin()`14 and `creditmodel::get_breaks_all()`15 also merge adjacent levels of similar default rates for categorical variables. An important difference between these implementations consists in how they deal with the missing natural order of the levels and thus the notion of what 'adjacent' means: in `woe.binning()`, the levels are sorted according to their WoE before merging. This is not the case for the other two functions, where levels are merged along their natural order, which is often alphabetical16. This might lead to an undesired binning, and as an important conclusion, an analyst should consider manually changing the level order for factor variables when working with the package `scorecard`17.

#### 3.7. Weights of Evidence

`scorecard::woebin_ply()` (with argument `to = "woe"`), `woeBinning::woe.binning.deploy()` (with argument `add.woe.or.dum.var = "woe"`) and `creditmodel::woe_trans_all()`.

Worth highlighting is the function `woe()` in the `klaR` package, probably the first and most comprehensive implementation of WoE computation in R. WoEs for binned variables are computed on the training data and stored in an `S3` object of class `woe` with a corresponding `predict.woe()` method that allows application to new data. Furthermore, via an argument `ids`, a subset of the variables can be selected for which WoEs are to be computed (default: all `factor` variables), and a real value `zeroadj` can be specified that is added to the frequency of bins with empty target levels for the computation of $f()$ in Equation (1) to prevent WoEs from resulting in $\pm \infty$. In contrast to other implementations, it allows observation `weights`, which can be necessary for reject inference.
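A minimal sketch of this workflow might look as follows; here, the `germancredit` data from `scorecard` merely stand in for binned training data, and the assumption is that the fitted object exposes the information values as element `$IV`:

```r
library(klaR)
library(scorecard)   # only for the germancredit example data
data("germancredit")

# compute WoEs for all factor variables, with zero adjustment
woemodel <- woe(creditability ~ ., data = germancredit, zeroadj = 0.5)
woemodel$IV   # information values per variable (assumed element name)

# apply the fitted WoEs to (new) data; replace = TRUE substitutes the factors
german_woe <- predict(woemodel, germancredit, replace = TRUE)
```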

#### 3.8. Short Benchmark Experiment

`age`, `amount` and `duration` are the most interesting ones. Further note that (although it is by far the most popular data set used in the literature), for reasons of its size and the balance of the target levels, the German credit data might not be representative of typical credit scorecard developments (Szepannek 2017). For this reason, the results should not be overemphasized but rather used to give an idea of the differences in performance of the various implementations.

`age`, `amount` and `duration`. The package `Rprofet` (which interfaces to `binr::bins.greedy()`, cf. above) returns the largest numbers of bins. The numbers of bins as returned by the tree-based binning via `smbinning` and `riskr` as well as by `glmdisc` and `creditmodel` are comparatively small.

`klaR`. Afterwards, univariate Gini coefficients (as one of the most commonly used measures for the performance evaluation of credit scoring models, cf. Section 6) of the WoE variables are computed using the package `pROC` (Robin et al. 2021). Note that some of the introduced functions for automatic binning allow for a certain degree of hyperparameterization which could be used to improve the binning results. However, as the purpose of automatic binning is not to provide a highly tuned perfect model but rather a solid basis for a subsequent manual bin adjustment, all results in the experiment are computed using the default parameterization. Further note that, for the package `Rprofet`, no validation performance is available as there exists no `predict()` method. For the package `riskr`, the workaround described above has been used to assign bins to validation data18. Concerning the results, it also has to be mentioned that the package `glmdisc` optimizes bins w.r.t. a subsequent logistic regression based on dummy variables of the bins, which takes into account the multivariate dependencies between the variables and not just the discriminative power of the single variables19.

For the package `creditmodel`, the results of the automatic binning for the variables `age`, `amount` and `duration` were significantly worse (below the LCL) than those of the best method. In summary, none of the packages clearly dominates the others, and at first glance, the choice of the algorithm does not seem to be crucial. In practice, it might be worth trying different algorithms and comparing their results to support the subsequent modelling step of their manual modification (cf. above).

#### 3.9. Summary of Available Packages for Binning

Only for `creditmodel` using default parameters was a significantly worse performance observed. However, because the resulting automatically generated bins should be analyzed and modified if necessary, the choice of an explicit algorithm for the initial automatic binning becomes less important. In summary, the package `woeBinning` offers quite a comprehensive toolbox with many desirable functionalities implemented, but unfortunately, no manual modification of the results from automatic binning is supported. For the latter, the `scorecard` package can be used, but it must be used with care for factor variables because its automatic binning of categorical variables suffers from dependence on the natural order of the factor levels. As a remedy, a function has been suggested in the supplementary code (cf. footnote 17) to import the results of `woeBinning`'s automatic binning into the result objects of the `scorecard` package for further processing.

## 4. Preselection of Variables

#### 4.1. Overview

- Information values of single variables;
- Population stability analyses of single variables on recent out-of-time data;
- Correlation analyses between variables.

#### 4.2. Information Value

For the package `creditR`, no adjustment is made, and the resulting IV becomes ∞. Some packages (`creditmodel`, `Information`, `InformationValue` and `smbinning`) return a value different from ∞, but from the documentation, it is not clear how it is computed. For the packages `scorecard` and `scorecardModelUtils`, the adjustment is known, and for the package `klaR`, the adjustment can be specified via an argument. Note that, depending on the adjustment, the resulting IVs of the affected variables may differ strongly.

`klaR` with zero adjustment (which in fact is not necessary here). The function `woe()` (cf. Example 4) automatically returns IVs for all factor variables.
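As an alternative (not part of the paper's numbered examples), total IVs can also be obtained directly with the `scorecard` package; a minimal sketch on the raw German credit data:

```r
library(scorecard)
data("germancredit")

# information values of all predictors w.r.t. the binary target
ivs <- iv(germancredit, y = "creditability")
head(ivs)   # one row per predictor with its total information value
```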

The package `creditR` also offers a function `IV_elimination()` that allows specifying an `iv_threshold` and returns a data set with the subset of variables whose IV on the training data lies above the threshold. Similarly, the package `scorecardModelUtils` offers a function `iv_filter()` that returns a list of variable names that pass (/fail) a prespecified threshold.

The package `creditR` can further be used to compute Gini coefficients for simple logistic regression models on each single variable via the function `Gini.univariate.data()`, and just as for IVs, this can be used for variable subset preselection (`Gini_elimination()`). The function `pred_ranking()` from the package `riskr` returns a summary table containing the IV as well as the values of the univariate AUC and KS statistic together with an interpretation.

#### 4.3. Population Stability Analysis

The function `SSI.calc.data()` from the package `creditR` returns a data frame of PSIs for all variables. The corresponding code (here, for a computation of PSIs between the training and validation — not OOT — sets) is given in Example 6.

The function `riskr::psi()` calculates the PSI for single variables and also provides a more detailed table on the bin-specific differences (cf. Example 7 for the variable `purpose`). It contains the absolute and relative distributions of the bins (for reasons of space, two columns with the absolute frequencies have been discarded from the output). The PSI of the variable as given by the `value` element of the output corresponds to the sum of the column `index`:

The package `smbinning` comes along with a function `smbinning.psi(df, y, x)`, which requires both the development and OOT samples to be in one data set (`df`) and a variable `y` that indicates the data set from which an observation originates. In addition to a function `get_psi_all()` for PSI calculation, the package `creditmodel` provides a function `get_psi_plots()` to visualize the stability of the bins for two data sets using bar plots with juxtaposed bars. The packages `creditR` and `scorecard` further offer functions that can be used for an OOT stability analysis of the final score (cf. Section 5).
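Conceptually, all of these functions compute the PSI of a binned variable as $\sum_i (p_i - q_i)\log(p_i/q_i)$ over its bins, where $p_i$ and $q_i$ are the bin shares in the two samples. A by-hand base R sketch (with illustrative simulated bin labels, not data from the paper) makes this explicit:

```r
# population stability index between two samples of one binned variable
psi <- function(expected, actual) {
  p <- table(expected) / length(expected)        # bin shares, development sample
  q <- table(actual)[names(p)] / length(actual)  # same bins, recent sample
  sum((p - q) * log(p / q))
}

set.seed(1)
dev <- sample(c("A", "B", "C"), 1000, replace = TRUE, prob = c(0.50, 0.30, 0.20))
oot <- sample(c("A", "B", "C"), 1000, replace = TRUE, prob = c(0.40, 0.35, 0.25))
psi(dev, oot)   # small positive value; identical distributions would give 0
```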

#### 4.4. Correlation Analysis

The `caret` package (Kuhn 2008, 2021) offers a function `findCorrelation()` that automatically identifies, among any two strongly correlated variables, the one that has the larger average (absolute) correlation to all other variables. A major advantage of performing correlation analysis in advance for variable preselection is that it can be used as another way to integrate experts' experience into the modelling: among variable clusters of high correlation, experts can choose which of these variables should be used or discarded for further modelling. There are some packages that are not originally intended to be used for credit scorecard modelling but that offer functions that can be used for this purpose. The package `corrplot` offers a function to visualize the correlation matrix and resort it such that groups of correlated variables are next to each other (cf. Figure 3, left). An alternative visualization is given by a phylogenetic tree of the clustered variables using the package `ape` (Paradis and Schliep 2018; Paradis et al. 2021), where the variable clustering is obtained using the package `ClustOfVar` (Chavent et al. 2012, 2017; cf. Figure 3, right). The code for the creation of both plots is given in the following example (note that the choice of `hclust.method = "complete"` in the left plot guarantees a minimum correlation among all variables in a cluster, but all correlations on the training data are below $0.35$ in this example).
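A minimal sketch combining both tools might look as follows; the built-in `mtcars` data merely stand in for the WoE-transformed training data, and the cutoff of 0.8 is an illustrative assumption:

```r
library(caret)     # findCorrelation()
library(corrplot)  # correlation matrix visualization

# correlation matrix of the (numeric) stand-in data
M <- cor(mtcars)

# reorder via hierarchical clustering so correlated groups sit together
corrplot(M, order = "hclust", hclust.method = "complete")

# indices of variables suggested for removal at |r| > 0.8
drop_idx <- findCorrelation(M, cutoff = 0.8)
names(mtcars)[drop_idx]
```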

The package `clustVarLV` (Vigneau et al. 2015, 2020) offers variable clustering such that the correlation between each variable and the first latent principal component of its variable cluster is maximized. The number of clusters K has to be prespecified. As can be seen in the output from Example 9 (only cluster 1 is shown), for each variable, the correlation to the cluster's latent component as well as the correlation to the 'closest' next cluster is shown.

The package `creditR` contains a function `variable.clustering()` that performs `cluster`'s `pam` (Maechler et al. 2021) on the transposed data for variable clustering. The (sparsely documented) function `correlation.cluster()` can be used to compute average correlations between the variables of each cluster20.

The package `Rprofet` provides two functions, `WOEClust_hclust()` and `WOEClust_kmeans()`, that perform `stats::hclust()` on the transformed data or `ClustOfVar::kmeansvar()` and return a data frame with variable names and cluster index together with the IV of each variable, which may help to select variables from the clusters. Unfortunately, they are only designed to work with output from the package's function `WOEProfet()` and require a list of a specific structure as input argument. The package `creditmodel`, in addition to a function `cor_plot()` for visualization of the correlation matrix, offers `char_cor()`, which computes a matrix of Cramér's V between a set of categorical variables, and `get_correlation_group()` for the detection of groups of correlated (numeric) variables. It also contains a function `fast_high_cor_filter()` for an automatic correlation-based variable selection: in a group of highly correlated variables, the one with the highest IV is selected, as shown in Example 10.

The package `scorecardModelUtils` offers an alternative for an automatic variable preselection based on Cramér's V using the function `cv_filter()`. Among two (categorical) variables with $V>$ `threshold`, the one with the lower IV is automatically removed (cf. Example 11). Finally, two functions, `iv_filter()` and `vif_filter()`, can be used for variable preselection based on IVs only (without taking correlations between the explanatory variables into account) and based on variance inflation (cf. also Section 5).

#### 4.5. Further Useful Functions to Support Variable Preselection

The package `scorecard` contains a function `var_filter()` that performs an automatic variable selection based on IV and further allows for specifying a maximum percentage of missing or identical values within a variable, but it does not account for correlations among the predictor variables. Alternatively, the package `creditmodel` has a function `feature_selector()` for automatic variable preselection based on IV, PSI, correlation and xgboost variable importance (Chen and Guestrin 2016).

The package `creditR` has two functions to identify variables with missing values (`na_checker()`) and to compute the percentage of variables with missing values (`missing_ratio()`). For the imputation of numeric variables in a data set with mean or median values, a function `na_filler_contvar()` is available. Of course, this has to be handled with care, as the mean or median value will typically not be the same on the training and validation data. The package `mlr` (Bischl et al. 2016, 2020) offers imputation that can be applied to new data.

The package `scorecardModelUtils` also provides a function `missing_val()` for imputation. The imputation rule can be either a function such as `"mean"`, `"median"` or `"mode"` or an explicit value such as -99999, which can be meaningful before binning to assign missing values to a separate bin. Similarly, for categorical variables, the assignment of a specific level such as `"missing_value"` can be meaningful. A function `missing_elimination()` removes all variables with a percentage of missing values above `missing_ratio_threshold` from the training (but not from the validation) data. The package `creditmodel` offers a convenient function `data_cleansing()` that can be used for the automatic deletion of variables with low variance or a high percentage of missing values, to remove duplicated observations and to reduce the number of levels of categorical variables. The package `riskr` provides two functions, `select_categorical()` and `select_numeric()`, to select all (non-)numeric variables of a data frame.

`univariate()` of the `scorecardModelUtils` package. A summary for numeric variables can be computed using the function `ez_summ_num()` from the package `riskr`. A general overview of packages explicitly designed for exploratory data analysis that provide further functionalities is given in Staniak and Biecek (2019). The packages `scorecard` (`one_hot()` and `var_scale()`) and `creditmodel` (`one_hot_encoding()`, `de_one_hot_encoding()`, `min_max_norm()`) provide functions for the one-hot encoding of categorical variables and the standardization of numeric variables.

## 5. Multivariate Modelling

#### 5.1. Variable Selection

`glm()` (with `family = binomial`). In addition to the manual variable preselection as described in the former section, typically a subsequent variable selection is performed, which can be accomplished by the `step()` function. Common criteria for variable selection are the AIC (`k = 2`) or BIC (`k = log(nrow(data))`). Example 12 gives an example of BIC-based variable selection.

For stepwise selection, an initial model (e.g., `null`) and the scope for the search have to be specified. This offers another possibility for expert knowledge integration: after each step, the criteria of all candidates are reported and can be used to decide, among several variable candidates of similar performance, for the one that is most appropriate from a business point of view. The corresponding variable can then be manually added to the formula of a new initial model in a subsequent variable selection step.
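The mechanics can be sketched in base R as follows; the simulated data frame merely stands in for the WoE-transformed training data of the paper:

```r
set.seed(42)
# stand-in for WoE-transformed training data with a 0/1 target
train_woe <- data.frame(x1 = rnorm(500), x2 = rnorm(500), x3 = rnorm(500))
train_woe$creditability <- rbinom(500, 1, plogis(1.5 * train_woe$x1 - train_woe$x2))

full <- glm(creditability ~ ., data = train_woe, family = binomial)
null <- glm(creditability ~ 1, data = train_woe, family = binomial)

# k = log(n) turns step() into BIC-based stepwise selection
model <- step(null, scope = formula(full), direction = "both",
              k = log(nrow(train_woe)), trace = FALSE)
```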

The function `smbinning.logitrank()` of the package `smbinning` runs all possible combinations of a specified set of variables, ranks them according to the AIC and returns the corresponding model formulas in the resulting data frame. Depending on the size of the preselected set of variables (cf. Section 4), this can be time-consuming.

`car` (Fox and Weisberg 2019; Fox et al. 2021) and `scorecard` offer a function `vif()` that can be used for this purpose, as can the functions `vif.calc()` and `lr_vif()` of the packages `creditR` and `creditmodel` (cf. Example 13).

The package `glmtree` offers a function `glmtree()` that computes a potential segmentation scheme according to a tree of recursive binary splits where each leaf of the tree consists of a logistic regression model. The resulting segmentation optimizes the AIC, BIC or, alternatively, the likelihood or the Gini coefficient on validation data. Note that this optimization does not account for variable selection as described above.

#### 5.2. Turning Logistic Regression Models into Scorecard Points

The package `scorecard` offers a function `scorecard()` that translates a `glm` object into scorecard points as described above and, in addition, returns key figures such as frequencies, default rates and WoEs for all bins. A function `scorecard_ply()` is available that can be used to assign scores to new data. In addition to the `glm` object, the `bins` as created by `scorecard`'s `woebin()` (cf. Section 3) have to be passed as an input argument. Further arguments specify the points to double the odds (`pdo`) as well as a fixed number of points `points0` that corresponds to odds of `odds0`, and whether the scorecard should contain an intercept or whether the intercept should be redistributed to all variables (`basepoints_eq0`). The function requires WoEs (not just the binned `factor`s), and the variable names in `coef(glm)` must match the convention of variable renaming as done by `scorecard`'s `woebin_ply()` function (i.e., a postfix `_woe`)21.

Moreover, a function `scorecard2()` is available that directly computes a scorecard based on the `bins` and a data frame of the original variables. Here, in addition, the name of the target variable (`y`) and a named vector (`x`) of the desired input variables have to be passed22. Example 14 illustrates the usage of `scorecard2()` and its application to new data (here represented by the validation set) as well as its output for the variable `duration.in.month`.
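A minimal sketch of this step (the variable selection and the scaling parameters are illustrative assumptions, not the paper's Example 14) might look as follows:

```r
library(scorecard)
data("germancredit")
dt <- split_df(germancredit, y = "creditability", seed = 42)

bins <- woebin(dt$train, y = "creditability")

# scorecard directly from bins and the original (unbinned) variables;
# points0/odds0/pdo define the score scaling
card <- scorecard2(bins = bins, dt = dt$train, y = "creditability",
                   x = c("duration.in.month", "credit.amount", "age.in.years"),
                   points0 = 600, odds0 = 1/19, pdo = 50)

# apply the scorecard to the validation data
scores <- scorecard_ply(dt$test, card)
```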

The package `scorecard` also provides a function `report()` that takes the data, the (original) names of all variables in the final scorecard model and a breaks list (cf. Section 3; it can be obtained from the bins) as input arguments and generates an Excel report summarizing the scorecard model. Different sheets are created with information and figures on the data, the model, the scorecard points, the model performance and the binning of all variables of the model, which can be used for model development documentation in practice.

In order to transform a logistic regression model based on `factor` variables (bins instead of WoEs) into scorecard points, the package `scorecardModelUtils` provides a function `scalling()`. Its output can be used to predict scores for new data with the function `scoring()` (cf. Example 15).

The package `creditmodel` transforms a `glm` object into scorecard points via a function `get_score_card()`, which requires a bin table created by `creditmodel::get_bins_table_all()` and is thus restricted to application within its own universe. In addition, if a table of scorecard points is not required, it offers a function `score_transfer()` that directly applies the `glm` object to data and scales the resulting points accordingly (cf. Example 16), as well as another function `p_to_score()` to turn posterior probabilities into score points.

The package `smbinning` scales scorecard points via `smbinning.scaling()`, which comes with a predict function `smbinning.scoring.gen()` that can be used to score new observations but requires that the binned variables have been generated with `smbinning.gen()` or `smbinning.factor.gen()` (cf. Section 3). A function `smbinning.scoring.sql()` is available that transforms the resulting scorecard into SQL code.

The package `Rprofet` also contains a function `ScorecardProfet()` for this purpose, which calculates a glm with corresponding scorecard points, but only based on the binning and WoEs as calculated by functions from the package itself (cf. Section 3), and no function is available for the application of the scorecard points to new data. The function `scaled.score()` of the package `creditR` transforms posterior default probabilities into scores where every `increase` points double the odds (of nondefault) and odds of `increase` correspond to `ceiling_score` points. In addition, the package `creditR` offers a function that can be used to recalibrate an existing glm on calibration data: a simple logistic regression is fit on the `calibration_data` with only one input variable, the log odds predicted by the current model.

#### 5.3. Class Imbalance

The package `klaR` allows for specifying observation weights for WoE computation (see Section 3.7). Within the `mlr3` framework, imbalance correction can be performed using `mlr3pipelines` (Binder et al. 2021). Several resampling algorithms are implemented in the packages `imbalance` (Cordón et al. 2018, 2020) and `unbalanced` (Pozzolo et al. 2015). The SMOTE algorithm is also implemented in the `smotefamily` package (Siriseriwan 2019).

## 6. Performance Evaluation

#### 6.1. Overview

Performance should always be assessed on validation data that have not been used for model development, e.g., using the resampling infrastructure of general frameworks (such as `mlr3`). While this is less critical in the case of simple models such as logistic regression, it should still be kept in mind, especially if the model is benchmarked against more flexible machine learning models such as support vector machines, random forests or gradient boosting (cf. e.g., Hastie et al. (2009)).

#### 6.2. Discrimination

Besides the base R function `ks.test()`, one of the most popular ways to compute the AUC in R is given by the package `ROCR` (Sing et al. 2005, 2020). Nonetheless, for the purpose of credit scorecard modelling, the package `pROC` (Robin et al. 2021) is preferred at this point for the following three reasons:

- Different from standard binary classification problems, credit scores are typically supposed to increase as the event (= default) probability decreases. The function `roc()` of the package `pROC` has an argument `direction` that allows for specifying this.
- In credit scoring applications, not all observations of a data set may be of equal importance; e.g., it may not be as important to distinguish which of two customers with small default probabilities has the higher score if his or her application will be accepted anyway. The package's function `auc()` has an additional argument `partial.auc` to compute the partial area under the curve (Robin et al. 2011).
- Finally, its function `ci()` can be used to compute confidence intervals for the AUC, using either the bootstrap or the method of DeLong (DeLong et al. 1988; Sun and Xu 2014), e.g., to support the comparison of two models.

Hence, `pROC` can be used for performance analysis.
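The three features above can be combined in a few lines (a sketch; `score` and `default` denote a hypothetical validation set where higher scores mean lower risk):

```r
library(pROC)

# direction = ">" encodes that nondefaults ("0") have the *higher* scores
r <- roc(response = default, predictor = score,
         levels = c("0", "1"), direction = ">")

auc(r)                                   # full AUC
auc(r, partial.auc = c(1, 0.8),          # partial AUC over high specificities
    partial.auc.focus = "specificity")
ci(r, method = "delong")                 # DeLong confidence interval for the AUC
```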

The package `creditR` offers a function `Kolmogorov.Smirnov()`, and `riskr` has two functions, `ks()` and `ks2()`, for computation of the Kolmogorov–Smirnov test statistic. In addition, `riskr` provides a function `divergence()` to compute the divergence between two empirical distributions, as well as `gg_dists()` and `gg_cum()` to visualize the score densities for defaults and nondefaults and their empirical cumulative distribution functions. To compute the Gini coefficient and related quantities, the package `riskr` provides the functions `aucroc()` (AUC), `gini()` (Gini coefficient), `gg_roc()` (visualization of the ROC curve), `gain()` (gains table for specified values on the x-axis) and `gg_gain()`/`gg_lift()` (visualization of the gains/lift chart).
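Both the KS statistic and the Gini coefficient can also be obtained with a few lines of base R (a sketch with hypothetical `score` and `default` vectors; the Gini coefficient is derived from the AUC as computed via `pROC`, cf. above):

```r
# Kolmogorov-Smirnov distance between the score distributions
# of nondefaults (0) and defaults (1)
ks <- ks.test(score[default == 0], score[default == 1])$statistic

# Gini coefficient as a linear transformation of the AUC
library(pROC)
gini <- 2 * auc(roc(default, score, direction = ">")) - 1
```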

In the package `creditmodel`, two functions `ks_value()` and `auc_value()` are available, as well as `model_result_plot()`, which visualizes the ROC curve, the cumulative score distributions of defaults vs. nondefaults, the lift chart and the default rate over equal-sized score bins. A table with the respective underlying numbers can be obtained via `perf_table()`.

The package `InformationValue` contains two functions, `ks_stat()` and `ks_plot()`, for Kolmogorov–Smirnov analysis, and several functions, `AUROC()`, `plot_ROC()`, `Concordance()` and `SomersD()` (Gini coefficient), to support analyses with regard to the Gini coefficient. Additionally, the confusion matrix (`confusionMatrix()`) and derived performance measures `misClassError()`, `sensitivity()`, `specificity()`, `precision()`, `npv()`, `kappaCohen()` and `youdensIndex()` (cf. e.g., Zumel and Mount (2014), chp. 5 for an overview) can be computed for a given cut off by the corresponding functions. Note that these measures are computed with respect to the nondefault target level (supposed to be coded as ‘1’ in the target variable). The function `optimalCutoff()` further supports cut off optimization w.r.t. the misclassification error, Youden’s index, or the minimum (/maximum) score such that no misclassified defaults (/nondefaults) occur in the data.
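These cut-off-based measures reduce to simple cross-tabulations and can be reproduced in base R (a sketch with hypothetical `score`, `nondefault` and `cutoff` objects; note the nondefault coding as ‘1’, as in `InformationValue`):

```r
# Predict "nondefault" (1) whenever the score exceeds the cut off
pred <- as.integer(score > cutoff)
cm   <- table(observed = nondefault, predicted = pred)

misclass    <- 1 - sum(diag(cm)) / sum(cm)      # misclassification error
sensitivity <- cm["1", "1"] / sum(cm["1", ])    # w.r.t. nondefaults
specificity <- cm["0", "0"] / sum(cm["0", ])
youden      <- sensitivity + specificity - 1    # Youden's index
```

A cut off optimization then amounts to evaluating such measures over a grid of candidate cut offs and picking the optimum.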

A confusion matrix can also be obtained by the function `fn_conf_mat()` of the `scorecardModelUtils` package. Numeric differences between the (0/1-coded) target and the model’s predictions in terms of MSE, MAE and RMSE can be computed by its `fn_error()` function. The package `boottol` contains a function `boottol()` to compute bootstrap confidence intervals for Gini, AUC and KS, where subsets of the data above different cut off values are also considered. Furthermore, it may be desirable to analyze the (cumulative) frequencies of the binned scores. A table of such frequencies is returned by the function `gini_table()` in the `scorecardModelUtils` package. Example 18 shows selected columns for a binned score using the function `gains_table()` from the `scorecard` package.

The H-measure (Hand 2009) has been proposed as a coherent alternative to the AUC and is implemented in the package `hmeasure` (Anagnostopoulos and Hand 2019). The expected maximum profit measure (Verbraken et al. 2014), as implemented in the package `EMP` (Bravo et al. 2019), further takes into account the profitability of a model.

#### 6.3. Performance Summary

A performance summary is provided, e.g., by the package `smbinning` (which returns the largest number of performance measures of the four functions from Table 5), as well as by the function `riskr::gg_perf()`, which produces several graphs of the scorecard’s performance (cf. Figure 4). Note that although ROC curves are one of the most popular tools for performance visualization of binary classifiers, they are hardly suited to visualize the performance difference between several competing models. One reason for this is that large areas of the TPR-FPR plane (e.g., everything below the main diagonal) are typically of no interest in a given data situation. For this reason, in practice, ROC curves are not very useful for model selection.

#### 6.4. Rating Calibration and Concentration

The package `creditR` contains a function `master.scale()` that takes a data frame with scores and corresponding default probabilities as input and uses the function `woeBinning::woe.binning()` to group scores of similar WoE (cf. Example 20). The function `odds_table()` of the `riskr` package allows setting a `breaks` argument with arbitrary bins.

For checking rating calibration, the package `creditR` contains three functions (`chisquare.test()`, `binomial.test()` and `adjusted.binomial.test()`) that provide a table with calibration indicators for each rating grade (cf. Example 19). Another function, `binomial.point()`, compares the observed average predicted default probability on the data with prespecified boundaries around some desired central tendency default probability. Bootstrap confidence intervals for the default probabilities of rating grades can be computed using the function `vas.test()` of the package `boottol`. A Hosmer–Lemeshow goodness-of-fit test (Hosmer and Lemeshow 2000) is, e.g., implemented by the function `hoslem.test()` in the `ResourceSelection` package (Lele et al. 2019).
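For example (the model object is hypothetical; `g` sets the number of groups into which the predicted probabilities are binned):

```r
library(ResourceSelection)

# y: observed 0/1 defaults; fitted(model): predicted default probabilities.
# A large p-value indicates no evidence against calibration.
hoslem.test(y, fitted(model), g = 10)
```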

Risk concentration across rating grades can be quantified by the Herfindahl–Hirschman index (HHI) using `creditR`’s `Herfindahl.Hirschman.Index()` or `Adjusted.Herfindahl.Hirschman.Index()`. Small values of the HHI indicate low risk concentration.
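The (unadjusted) HHI is simply the sum of squared rating grade shares and is easily computed by hand (base R sketch; the `grade` vector is hypothetical):

```r
# Share of observations per rating grade
shares <- prop.table(table(grade))

# HHI ranges from 1/k (uniform over k grades, low concentration)
# up to 1 (all observations in a single grade)
hhi <- sum(shares^2)
```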

#### 6.5. Cross Validation

Cross validation is supported by general machine learning frameworks (such as `mlr3` or `caret`). The function `k.fold.cross.validation.glm()` of the `creditR` package computes cross-validated Gini coefficients, while the function `perf_cv()` of the `scorecard` package offers an argument to specify different performance measures such as `“auc”`, `“gini”` and `“ks”`. Both functions allow setting seeds to guarantee reproducibility of the results. The function `fn_cross_index()` of the `scorecardModelUtils` package somewhat more generally returns a list of training observation indices that can be used to implement a cross validation and to compare models using identical folds.
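The principle behind these helpers can be sketched in base R (hypothetical data frame `dat` with a 0/1 `default` column; fold-wise AUC via `pROC`):

```r
library(pROC)

set.seed(42)                                     # reproducibility
k     <- 5
folds <- sample(rep(1:k, length.out = nrow(dat)))

cv_auc <- sapply(1:k, function(i) {
  fit <- glm(default ~ ., family = binomial, data = dat[folds != i, ])
  p   <- predict(fit, newdata = dat[folds == i, ], type = "response")
  as.numeric(auc(roc(dat$default[folds == i], p, direction = "<")))
})

mean(cv_auc)                                     # cross-validated AUC
```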

## 7. Reject Inference

#### 7.1. Overview

Reject inference is supported by the package `scoringTools`, which is available on GitHub but not on CRAN. It provides five functions for reject inference: `augmentation()`, `fuzzy_augmentation()`, `parcelling()`, `reclassification()` and `twins()`, which correspond to common reject inference strategies of the same name (cf. e.g., Finlay (2012)). In the following, two of the most popular strategies, namely augmentation and parcelling, are briefly explained as they are implemented within the package, together with an example of their usage.

#### 7.2. Augmentation

#### 7.3. Parcelling

Parcelling is implemented by the function `parcelling()` of the `scoringTools` package. Note that all other functions of this package have a similar syntax and output. For parcelling in particular, the `probs` argument specifies quantiles w.r.t. the predicted default probabilities (i.e., from low risk to high risk). Although in the example the factor vector `alpha` is constantly set to 1 for all bands, in practice it will be chosen to be increasing, at least for quantiles of high PDs. The resulting object contains two models, `financed_model` and `infered_model`, both of class `glm`. Note that both models are automatically calculated without any further parameterization options such as variable selection or a recomputation of the WoEs based on the combined sample of accepted applications and rejected applications with inferred target. For the latter purpose, the `woe()` function of the `klaR` package can be used, which, as the only one among all presented packages, supports the specification of observation weights. Finally, the combined sample can be used to rebuild the scorecard model as described in Section 4, Section 5 and Section 6.
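The parcelling strategy itself can be sketched in base R (an illustration of the idea, not the `scoringTools` implementation; all objects — `financed_model`, `accepts`, `rejects` — are hypothetical):

```r
# Band accepts and rejects by quantiles of the predicted PD (low to high risk)
pd_rej <- predict(financed_model, newdata = rejects, type = "response")
pd_acc <- predict(financed_model, newdata = accepts, type = "response")
breaks <- quantile(pd_rej, probs = seq(0, 1, 0.25))

band_rej <- cut(pd_rej, breaks, include.lowest = TRUE)
band_acc <- cut(pd_acc, breaks, include.lowest = TRUE)

# Observed default rate of the accepts per band, scaled by a prudence
# factor alpha that increases for the high-PD bands
rate  <- tapply(accepts$default, band_acc, mean)
alpha <- c(1, 1, 1.2, 1.5)

# Sample inferred labels for the rejects from a binomial distribution
rejects$default <- rbinom(nrow(rejects), 1,
                          pmin((alpha * rate)[band_rej], 1))
```

The combined sample of `accepts` and labelled `rejects` can then be used to refit the scorecard.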

## 8. Summary and Discussion

Entire frameworks for scorecard modelling are provided by the packages `scorecard`, `scorecardModelUtils`, `smbinning` and `creditmodel`. With regard to the important modelling step of variable binning and WoE computation, the package `woeBinning` provides an implementation that reflects a broad range of practical issues (cf. Section 3). The package `creditmodel` comes with a whole set of additional functionalities such as cohort analysis, correlation-based variable preselection or Cramer’s V. It further allows for an easy development of challenger models using xgboost (Chen et al. 2021), gradient boosting (Greenwell et al. 2020) or random forests (Liaw and Wiener 2002). In turn, it does not support manual modification of the bins but rather claims to make the development of binary classification models simple and fast. Unfortunately, its functions are poorly documented, and for the user it is not clear what exactly many of the functions do without looking into the source code. While the latter package seems to be based on individual experience, the package `scorecard` is close to the methodology as described in the literature (Siddiqi 2006).

The similarity of the packages’ functionalities can be visualized using the `logisticPCA` package (Landgraf 2016) on the binary data given by Table 6.

Beyond traditional scorecards, more flexible models can be developed, e.g., within the `mlr3` framework in combination with explainable ML methodology to fulfill regulatory requirements (Bücker et al. 2021). The availability of open source frameworks for scorecard modelling as described above may help bridge the gap between academic advances in machine learning research and the traditional modelling process in the financial industry.


## Notes

1. https://www.sas.com/en_us/software/credit-scoring.html (accessed on 15 February 2022).
2. https://cran.r-project.org/web/views/Finance.html (accessed on 15 February 2022).
3. https://www.openriskmanual.org/wiki/Credit_Scoring_with_Python (accessed on 15 February 2022).
4. https://towardsdatascience.com/how-to-develop-a-credit-risk-model-and-scorecard-91335fc01f03 (accessed on 15 February 2022).
5. https://github.com/ShichenXie/scorecardpy (accessed on 15 February 2022).
6. https://archive.ics.uci.edu/ml/datasets/South+German+Credit+%28UPDATE%29 (accessed on 15 February 2022).
7. https://www.lendingclub.com/ (accessed on 15 February 2022).
8. Note that the package creditmodel supports a pos_flag to define the level of the positive class, which currently does not work for binning and weights of evidence.
9. Argument equal_bins = FALSE, or initial bins of equal sample size otherwise.
10. An example code for the package riskr is given in Snippet 2 of the supplementary code.
11. An example using a lookup table for the variable purpose is given in Snippet 3 of the supplementary code.
12. A code example of looping through all (numeric) variables for the package smbinning is given in Snippet 4 of the supplementary code.
13. An example code for application of this mapping to new data is given in Snippet 5 of the supplementary code. The names of the resulting new levels are the concatenated old levels, separated by commas. Note that the function cannot deal with commas in the original level names: a new level <NA> will be assigned.
14. Using method = “chimerge”.
15. Using best = TRUE.
16. This can be easily checked using the variable purpose, cf. e.g., Snippet 6 of the supplementary code.
17. A code snippet for creating a breaks_list (cf. above) from a binning result using the package woeBinning, which can be imported for further use within the package scorecard, e.g., for manual manipulation of the bins, is given by the function woeBins2breakslist() in Snippet 7 of the supplementary code.
18. See footnote 10.
19. Note that the call of glmdisc() ran into an internal error (incorrect number of subscripts on matrix) for more than 10 iterations. For this reason, the number of iterations has been reduced to 10, which is much smaller than the default of 1000 iterations, and the reported Gini coefficient still varies strongly among subsequent iterations. For larger numbers of iterations, better results might have been possible.
20. Its argument data denotes the training data; output is a data frame with two variables specifying the variable names of the training data (character) and the corresponding cluster index, as given, e.g., by the result from variable.clustering(). Finally, its arguments variables and clusters denote the names of these two variables in the data frame from the output argument where the clustering results are stored.
21. A remedy for how it can be used in combination with WoE assignment using the package klaR, as shown in Example 4, is given in Snippet 9 of the supplementary code.
22. Snippet 10 of the supplementary code illustrates how the vector x of the names of the input variables in the original data frame can be extracted from the bicglm model after variable selection in Example 12.
23. For the function augmentation(), this is obtained by rounding the posterior probabilities to the first digit.
24. Here, the augmented weights within each score band are computed by $1+\frac{{n}_{rejected}}{{n}_{accepted}}$.
25. Within the function parcelling(), this is done by sampling the labels from a binomial distribution.
26. As an exception, the package creditR has been developed as an extension of the package woeBinning.
27. Cf. the corresponding footnotes in the paper. Supplementary code is available at https://github.com/g-rho/CSwR (accessed on 15 February 2022).

## References

- Anagnostopoulos, Christoforos, and David J. Hand. 2019. Hmeasure: The H-Measure and Other Scalar Classification Performance Metrics, R Package Version 1.0-2; Available online: https://CRAN.R-project.org/package=hmeasure (accessed on 15 February 2022).
- Anderson, Raymond. 2007. The Credit Scoring Toolkit: Theory and Practice for Retail Credit Risk Management and Decision Automation. Oxford: Oxford University Press. [Google Scholar]
- Anderson, Raymond. 2019. Credit Intelligence & Modelling: Many Paths through the Forest. Oxford: Oxford University Press. [Google Scholar]
- Azevedo, Ana, and Manuel F. Santos. 2008. KDD, SEMMA and CRISP-DM: A parallel overview. Paper presented at IADIS European Conference on Data Mining 2008, Amsterdam, The Netherlands, July 24–26; pp. 182–85. [Google Scholar]
- Baesens, Bart, Tony Van Gestel, Stijn Viaene, Maria Stepanova, Johan Suykens, and Jan Vanthienen. 2002. Benchmarking state-of-the-art classification algorithms for credit scoring. JORS 54: 627–35. [Google Scholar] [CrossRef]
- Banasik, John, and Jonathan Crook. 2007. Reject inference, augmentation and sample selection. European Journal of Operational Research 183: 1582–94. [Google Scholar] [CrossRef] [Green Version]
- Biecek, Przemyslaw. 2018. DALEX: Explainers for complex predictive models. Journal of Machine Learning Research 19: 1–5. [Google Scholar]
- Binder, Martin, Florian Pfisterer, Michel Lang, Lennart Schneider, Lars Kotthoff, and Bernd Bischl. 2021. mlr3pipelines—Flexible machine learning pipelines in r. Journal of Machine Learning Research 22: 1–7. [Google Scholar]
- Bischl, Bernd, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter, Stefan Coors, Janek Thomas, Theresa Ullmann, Marc Becker, Anne-Laure Boulesteix, and et al. 2021. Hyperparameter optimization: Foundations, algorithms, best practices and open challenges. arXiv arXiv:2107.05847. [Google Scholar]
- Bischl, Bernd, Tobias Kühn, and Gero Szepannek. 2016. On class imbalance correction for classification algorithms in credit scoring. In Proceedings Operations Research 2014. Edited by Marco Lübbecke, Arie Koster, Peter Lethmathe, Reinhard Madlener, Britta Peis and Grit Walther. Heidelberg: Springer, pp. 37–43. [Google Scholar]
- Bischl, Bernd, Michel Lang, Lars Kotthoff, Julia Schiffner, Jakob Richter, Erich Studerus, Giuseppe Casalicchio, and Zachary Jones. 2016. mlr: Machine learning in R. Journal of Machine Learning Research 17: 1–5. [Google Scholar]
- Bischl, Bernd, Michel Lang, Lars Kotthoff, Patrick Schratz, Julia Schiffner, Jakob Richter, Zachary Jones, Giuseppe Casalicchio, Mason Gallo, Jakob Bossek, and et al. 2020. mlr: Machine Learning in R, R Package Version 2.17.1; Available online: https://CRAN.R-project.org/package=mlr (accessed on 15 February 2022).
- Bischl, Bernd, Olaf Mersmann, Heike Trautmann, and Claus Weihs. 2012. Resampling methods for meta-model validation with recommendations for evolutionary computation. Evolutionary Computation 20: 249–75. [Google Scholar] [CrossRef]
- Bravo, Cristian, Seppe van den Broucke, and Thomas Verbraken. 2019. EMP: Expected Maximum Profit Classification Performance Measure, R Package Version 2.0.5; Available online: https://CRAN.R-project.org/package=EMP (accessed on 15 February 2022).
- Breiman, Leo, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. 1984. Classification and Regression Trees. New York: Wadsworth and Brooks. [Google Scholar]
- Brown, Iain, and Christophe Mues. 2012. An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Systems with Applications 39: 3446–53. [Google Scholar] [CrossRef] [Green Version]
- Bücker, Michael, Maarten van Kampen, and Walter Krämer. 2013. Reject inference in consumer credit scoring with nonignorable missing data. Journal of Banking & Finance 37: 1040–45. [Google Scholar]
- Bücker, Michael, Gero Szepannek, Alicja Gosiewska, and Przemyslaw Biecek. 2021. Transparency, Auditability and explainability of machine learning models in credit scoring. Journal of the Operational Research Society 73: 1–21. [Google Scholar] [CrossRef]
- Chavent, Marie, Vanessa Kuentz, Benoit Liquet, and Jerome Saracco. 2017. ClustOfVar: Clustering of Variables, R Package Version 1.1; Available online: https://CRAN.R-project.org/package=ClustOfVar (accessed on 15 February 2022).
- Chavent, Marie, Vanessa Kuentz-Simonet, Benoît Liquet, and Jerome Saracco. 2012. Clustofvar: An r package for the clustering of variables. Journal of Statistical Software 50: 1–6. [Google Scholar] [CrossRef] [Green Version]
- Chawla, Nitesh V., Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. Smote: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16: 321–57. [Google Scholar] [CrossRef]
- Chen, Chaofan, Kangcheng Lin, Cynthia Rudin, Yaron Shaposhnik, Sijia Wang, and Tong Wang. 2018. An interpretable model with globally consistent explanations for credit risk. arXiv arXiv:1811.12615. [Google Scholar]
- Chen, Tianqi, and Carlos Guestrin. 2016. Xgboost: A scalable tree boosting system. Paper presented at the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13–17; pp. 785–94. [Google Scholar] [CrossRef] [Green Version]
- Chen, Tianqi, Tong He, Michael Benesty, Vadim Khotilovich, Yuan Tang, Hyunsu Cho, Kailong Chen, Rory Mitchell, Ignacio Cano, Tianyi Zhou, and et al. 2021. xgboost: Extreme Gradient Boosting, R Package Version 1.5.0.2; Available online: https://CRAN.R-project.org/package=xgboost (accessed on 15 February 2022).
- Cordón, Ignacio, Salvador García, Alberto Fernández, and Francisco Herrera. 2018. Imbalance: Oversampling algorithms for imbalanced classification in r. Knowledge-Based Systems 161: 329–41. [Google Scholar] [CrossRef]
- Cordón, Ignacio, Salvador García, Alberto Fernández, and Francisco Herrera. 2020. imbalance: Preprocessing Algorithms for Imbalanced Datasets, R Package Version 1.0.2.1; Available online: https://CRAN.R-project.org/package=imbalance (accessed on 15 February 2022).
- Crone, Sven F., and Steven Finlay. 2012. Instance sampling in credit scoring: An empirical study of sample size and balancing. International Journal of Forecasting 28: 224–38. [Google Scholar] [CrossRef]
- Crook, Jonathan, and John Banasik. 2004. Does reject inference really improve the performance of application scoring models? Journal of Banking & Finance 28: 857–74. [Google Scholar] [CrossRef] [Green Version]
- Csárdi, Gábor. 2019. cranlogs: Download Logs from the ’RStudio’ ’CRAN’ Mirror, R Package Version 2.1.1; Available online: https://CRAN.R-project.org/package=cranlogs (accessed on 15 February 2022).
- DeLong, Elizabeth R., David M. DeLong, and Daniel L. Clarke-Pearson. 1988. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics 44: 837–45. [Google Scholar] [CrossRef]
- Dis, Ayhan. 2020. creditR: A Credit Risk Scoring and Validation Package, R Package Version 0.1.0; Available online: https://github.com/ayhandis/creditR (accessed on 15 February 2022).
- Dua, Dheeru, and Casey Graff. 2019. UCI Machine Learning Repository. Irvine: University of California. [Google Scholar]
- Ehrhardt, Adrien. 2018. scoringTools: Credit Scoring Tools, R Package Version 0.1; Available online: https://CRAN.R-project.org/package=scoringTools (accessed on 15 February 2022).
- Ehrhardt, Adrien. 2020. glmtree: Logistic Regression Trees, R Package Version 0.2; Available online: https://CRAN.R-project.org/package=glmtree (accessed on 15 February 2022).
- Ehrhardt, Adrien, Christophe Biernacki, Vincent Vandewalle, and Philippe Heinrich. 2019. Feature quantization for parsimonious and interpretable predictive models. arXiv arXiv:1903.08920. [Google Scholar]
- Ehrhardt, Adrien, Christophe Biernacki, Vincent Vandewalle, Philippe Heinrich, and Sébastien Beben. 2019. Réintégration des refusés en credit scoring. arXiv arXiv:1903.10855. [Google Scholar]
- Ehrhardt, Adrien, and Vincent Vandewalle. 2020. glmdisc: Discretization and Grouping for Logistic Regression, R Package Version 0.6; Available online: https://CRAN.R-project.org/package=glmdisc (accessed on 15 February 2022).
- Eichenberg, Thilo. 2018. woeBinning: Supervised Weight of Evidence Binning of Numeric Variables and Factors, R Package Version 0.1.6; Available online: https://CRAN.R-project.org/package=woeBinning (accessed on 15 February 2022).
- Fan, Dongping. 2022. creditmodel Toolkit for Credit Modeling, Analysis and Visualization, R Package Version 1.3.1; Available online: https://CRAN.R-project.org/package=creditmodel (accessed on 15 February 2022).
- Financial Stability Board. 2017. Artificial Intelligence and Machine Learning in Financial Services—Market Developments and Financial Stability Implications. Available online: https://www.fsb.org/2017/11/artificial-intelligence-and-machine-learning-in-financial-service/ (accessed on 15 February 2022).
- Finlay, Steven. 2012. Credit Scoring, Response Modelling and Insurance Rating. London: Palgarve MacMillan. [Google Scholar]
- Fox, John, and Sanford Weisberg. 2019. An R Companion to Applied Regression, 3rd ed. Thousand Oaks: Sage. [Google Scholar]
- Fox, John, Sanford Weisberg, Brad Price, Daniel Adler, Douglas Bates, Gabriel Baud-Bovy, Ben Bolker, Steve Ellison, David Firth, Michael Friendly, and et al. 2021. car: Companion to Applied Regression, R Package Version 3.0-12; Available online: https://CRAN.R-project.org/package=car (accessed on 15 February 2022).
- Goodman, Bryce, and Seth Flaxman. 2017. European union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine 38: 50–57. [Google Scholar] [CrossRef] [Green Version]
- Greenwell, Brandon, Bradley Boehmke, Jay Cunningham, and GBM Developers. 2020. gbm: Generalized Boosted Regression Models, R Package Version 2.1.8; Available online: https://CRAN.R-project.org/package=gbm (accessed on 15 February 2022).
- Groemping, Ulrike. 2019. South German Credit Data: Correcting a Widely Used Data Set. Technical Report 4/2019. Berlin: Department II, Beuth University of Applied Sciences Berlin. [Google Scholar]
- Hand, David. 2009. Measuring classifier performance: A coherent alternative to the area under the roc curve. Machine Learning 77: 103–23. [Google Scholar] [CrossRef] [Green Version]
- Hand, David, and William Henley. 1993. Can reject inference ever work? IMA Journal of Management Matehmatics 5: 45–55. [Google Scholar] [CrossRef]
- Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning, 2nd ed. New York: Springer. [Google Scholar]
- Hoffmann, Hans. 1994. German Credit Data Set (statlog). Available online: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data) (accessed on 15 February 2022).
- Hosmer, David W., and Stanley Lemeshow. 2000. Applied Logistic Regression. Hoboken: Wiley. [Google Scholar]
- Hothorn, Thorsten, Kurt Hornik, and Achim Zeileis. 2006. Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics 15: 651–74. [Google Scholar] [CrossRef] [Green Version]
- Hothorn, Torsten, and Achim Zeileis. 2015. partykit: A modular toolkit for recursive partytioning in R. Journal of Machine Learning Research 16: 3905–9. [Google Scholar]
- Izrailev, Sergei. 2015. Binr: Cut Numeric Values into Evenly Distributed Groups, R Package Version 1.1; Available online: https://CRAN.R-project.org/package=binr (accessed on 15 February 2022).
- Jopia, Herman. 2019. smbinning: Optimal Binning for Scoring Modeling, R Package Version 0.9; Available online: https://CRAN.R-project.org/package=smbinning (accessed on 15 February 2022).
- Kaszynski, Daniel. 2020. Background of credit scoring. In Credit Scoring in Context of Interpretable Machine Learning. Edited by Daniel Kaszynski, Bogumil Kaminski and Tomasz Szapiro. Warsaw: SGH, pp. 17–26. [Google Scholar]
- Kaszynski, Daniel, Bogumil Kaminski, and Tomasz Szapiro. 2020. Credit Scoring in Context of Interpretable Machine Learning. Warsaw: SGH. [Google Scholar]
- Kim, HyunJi. 2012. Discretization: Data Preprocessing, Discretization for Classification, R Package Version 1.0-1; Available online: https://CRAN.R-project.org/package=discretization (accessed on 15 February 2022).
- Kuhn, Max. 2008. Building predictive models in r using the caret package. Journal of Statistical Software 28: 1–26. [Google Scholar] [CrossRef] [Green Version]
- Kuhn, Max. 2021. Caret: Classification and Regression Training, R Package Version 6.0-90; Available online: https://CRAN.R-project.org/package=caret (accessed on 15 February 2022).
- Kunst, Joshua. 2020. Riskr: Functions to Facilitate the Evaluation, Monitoring and Modeling process, R Package Version 1.0; Available online: https://github.com/jbkunst/riskr (accessed on 15 February 2022).
- Landgraf, Andrew J. 2016. logisticPCA: Binary Dimensionality Reduction, R Package Version 0.2; Available online: https://CRAN.R-project.org/package=logisticPCA (accessed on 15 February 2022).
- Landgraf, Andrew J., and Yoonkyung Lee. 2015. Dimensionality reduction for binary data through the projection of natural parameters. arXiv arXiv:1510.06112. [Google Scholar] [CrossRef]
- Lang, Michel, Martin Binder, Jakob Richter, Patrick Schratz, Florian Pfisterer, Stefan Coors, Quay Au, Giuseppe Casalicchio, Lars Kotthoff, and Bernd Bischl. 2019. mlr3: A modern object-oriented machine learning framework in R. Journal of Open Source Software 4: 1903. [Google Scholar] [CrossRef] [Green Version]
- Lang, Michel, Bernd Bischl, Jakob Richter, Patrick Schratz, Giuseppe Casalicchio, Stefan Coors, Quay Au, and Martin Binder. 2021. mlr3: Machine Learning in R—Next Generation, R Package Version 0.13.0; Available online: https://CRAN.R-project.org/package=mlr3 (accessed on 15 February 2022).
- Larsen, Kim. 2016. Information: Data Exploration with Information Theory, R Package Version 0.0.9; Available online: https://CRAN.R-project.org/package=Information (accessed on 15 February 2022).
- Lele, Subhash R., Jonah L. Keim, and Peter Solymos. 2019. ResourceSelection: Resource Selection (Probability) Functions for Use-Availability Data, R Package Version 0.3-5; Available online: https://CRAN.R-project.org/package=ResourceSelection (accessed on 15 February 2022).
- Lessmann, Stefan, Bart Baesens, Hsin-Vonn Seow, and Lyn Thomas. 2015. Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research 247: 124–36. [Google Scholar] [CrossRef] [Green Version]
- Liaw, Andy, and Matthew Wiener. 2002. Classification and regression by randomforest. R News 2: 18–22. [Google Scholar]
- Ligges, Uwe. 2009. Programmieren Mit R, 3rd ed. Heidelberg: Springer. [Google Scholar]
- Little, Roderick, and Donald Rubin. 2002. Statistical Analysis with Missing Data. Hoboken: Wiley. [Google Scholar]
- Louzada, Francisco, Anderson Ara, and Guilherme Fernandes. 2016. Classification methods applied to credit scoring: A systematic review and overall comparison. Surveys in OR and Management Science 21: 117–34. [Google Scholar] [CrossRef] [Green Version]
- Maechler, Martin, Peter Rousseeuw, Anja Struyf, Mia Hubert, and Kurt Hornik. 2021. Cluster: Cluster Analysis Basics and Extensions, R Package Version 2.1.2; Available online: https://CRAN.R-project.org/package=cluster (accessed on 15 February 2022).
- Molnar, Christoph, Bernd Bischl, and Giuseppe Casalicchio. 2018. iml: An R package for Interpretable Machine Learning. JOSS 3: 786. [Google Scholar] [CrossRef] [Green Version]
- Paradis, Emmanuel, Simon Blomberg, Ben Bolker, Joseph Brown, Julien Claude, Hoa Sien Cuong, Richard Desper, Gilles Didier, Benoit Durand, Julien Dutheil, and et al. 2021. ape: Analyses of Phylogenetics and Evolution, R Package Version 5.6.1; Available online: https://CRAN.R-project.org/package=ape (accessed on 15 February 2022).
- Paradis, Emmanuel, and Klaus Schliep. 2018. ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35: 526–28.
- Poddar, Arya. 2019. scorecardModelUtils: Credit Scorecard Modelling Utils, R Package Version 0.0.1.0; Available online: https://CRAN.R-project.org/package=scorecardModelUtils (accessed on 15 February 2022).
- Pozzolo, Andrea Dal, Olivier Caelen, and Gianluca Bontempi. 2015. unbalanced: Racing for Unbalanced Methods Selection, R Package Version 2.0; Available online: https://CRAN.R-project.org/package=unbalanced (accessed on 15 February 2022).
- Prabhakaran, Selva. 2016. InformationValue: Performance Analysis and Companion Functions for Binary Classification Models, R Package Version 1.2.3; Available online: https://CRAN.R-project.org/package=InformationValue (accessed on 15 February 2022).
- Robin, Xavier, Natacha Turck, Alexandre Hainard, Natalia Tiberti, Frederique Lisacek, Jean-Charles Sanchez, and Markus Müller. 2011. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12: 1–8.
- Robin, Xavier, Natacha Turck, Alexandre Hainard, Natalia Tiberti, Frederique Lisacek, Jean-Charles Sanchez, Markus Müller, Stefan Siegert, and Matthias Doering. 2021. pROC: Display and Analyze ROC Curves, R Package Version 1.18.0; Available online: https://CRAN.R-project.org/package=pROC (accessed on 15 February 2022).
- Roever, Christian, Nils Raabe, Karsten Luebke, Uwe Ligges, Gero Szepannek, and Marc Zentgraf. 2020. klaR: Classification and Visualization, R Package Version 0.6-15; Available online: https://CRAN.R-project.org/package=klaR (accessed on 15 February 2022).
- Rudin, Cynthia. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1: 206–15.
- Ryu, Choonghyun. 2021. dlookr: Tools for Data Diagnosis, Exploration, Transformation, R Package Version 0.5.4; Available online: https://CRAN.R-project.org/package=dlookr (accessed on 15 February 2022).
- Scallan, Gerard. 2011. Class(ic) scorecards—Selecting attributes in logistic regression. Credit Scoring and Credit Control XIII. Available online: https://www.scoreplus.com/papers/paper (accessed on 15 February 2022).
- Schiltgen, Garrett. 2015. boottol: Bootstrap Tolerance Levels for Credit Scoring Validation Statistics, R Package Version 2.0; Available online: https://CRAN.R-project.org/package=boottol (accessed on 15 February 2022).
- Sharma, Dhruv. 2009. Guide to credit scoring in R. CRAN Documentation Contribution. Available online: https://cran.r-project.org/ (accessed on 15 February 2022).
- Siddiqi, Naeem. 2006. Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. Hoboken: Wiley.
- Sing, Tobias, Oliver Sander, Niko Beerenwinkel, and Thomas Lengauer. 2005. ROCR: Visualizing classifier performance in R. Bioinformatics 21: 3940–41.
- Sing, Tobias, Oliver Sander, Niko Beerenwinkel, and Thomas Lengauer. 2020. ROCR: Visualizing the Performance of Scoring Classifiers, R Package Version 1.0-11; Available online: https://CRAN.R-project.org/package=ROCR (accessed on 15 February 2022).
- Siriseriwan, Wacharasak. 2019. smotefamily: A Collection of Oversampling Techniques for Class Imbalance Problem Based on SMOTE, R Package Version 1.3.1; Available online: https://CRAN.R-project.org/package=smotefamily (accessed on 15 February 2022).
- Staniak, Mateusz, and Przemysław Biecek. 2019. The Landscape of R Packages for Automated Exploratory Data Analysis. The R Journal 11: 347–69.
- Stratman, Eric, Riaz Khan, and Allison Lempola. 2020. Rprofet: WOE Transformation and Scorecard Builder, R Package Version 2.2.1; Available online: https://CRAN.R-project.org/package=Rprofet (accessed on 15 February 2022).
- Sun, Xu, and Weichao Xu. 2014. Fast implementation of DeLong's algorithm for comparing the areas under correlated receiver operating characteristic curves. IEEE Signal Processing Letters 21: 1389–93.
- Szepannek, Gero. 2017. On the practical relevance of modern machine learning algorithms for credit scoring applications. WIAS Report Series 29: 88–96.
- Szepannek, Gero. 2019. How much can we see? A note on quantifying explainability of machine learning models. arXiv preprint arXiv:1910.13376.
- Szepannek, Gero, and Karsten Lübke. 2021. Facing the challenges of developing fair risk scoring models. Frontiers in Artificial Intelligence 4: 117.
- Therneau, Terry, and Beth Atkinson. 2019. rpart: Recursive Partitioning and Regression Trees, R Package Version 4.1-15; Available online: https://CRAN.R-project.org/package=rpart (accessed on 15 February 2022).
- Thomas, Lynn C., Jonathan N. Crook, and David B. Edelman. 2019. Credit Scoring and its Applications, 2nd ed. Philadelphia: SIAM.
- Thoppay, Sudarson. 2015. woe: Computes Weight of Evidence and Information Values, R Package Version 0.2; Available online: https://CRAN.R-project.org/package=woe (accessed on 15 February 2022).
- Verbraken, Thomas, Christian Bravo, Richard Weber, and Bart Baesens. 2014. Development and application of consumer credit scoring models using profit-based classification measures. European Journal of Operational Research 238: 505–13.
- Verstraeten, Geert, and Dirk Van den Poel. 2005. The impact of sample bias on consumer credit scoring performance and profitability. Journal of the Operational Research Society 56: 981–92.
- Vigneau, Evelyne, Mingkun Chen, and Veronique Cariou. 2020. ClustVarLV: Clustering of Variables Around Latent Variables, R Package Version 2.0.1; Available online: https://CRAN.R-project.org/package=ClustVarLV (accessed on 15 February 2022).
- Vigneau, Evelyne, Mingkun Chen, and El Mostafa Qannari. 2015. ClustVarLV: An R Package for the Clustering of Variables Around Latent Variables. The R Journal 7: 134–48.
- Vinciotti, Veronica, and David Hand. 2002. Scorecard construction with unbalanced class sizes. Journal of the Iranian Statistical Society 2: 189–205.
- Wrzosek, Malgorzata, Daniel Kaszynski, Karol Przanowski, and Sebastian Zajac. 2020. Selected machine learning methods used for credit scoring. In Credit Scoring in Context of Interpretable Machine Learning. Edited by Daniel Kaszynski, Bogumil Kaminski and Tomasz Szapiro. Warsaw: SGH, pp. 83–146.
- Xie, Shichen. 2021. scorecard: Credit Risk Scorecard, R Package Version 0.3.6; Available online: https://CRAN.R-project.org/package=scorecard (accessed on 15 February 2022).
- Zumel, Nina, and John Mount. 2014. Practical Data Science with R. New York: Manning.

**Figure 1.** CRAN release activity and download statistics (as returned by `cranlogs`, Csárdi 2019) of packages available on CRAN.

**Figure 2.** Visualization of the bins for the variable `purpose` as created by the package `scorecard` (**left**) and mosaic plot of the binning result by the package `glmdisc` (**right**).

**Figure 3.** Reordered correlation matrix (**left**) and phylogenetic tree of the clustered variables (**right**).

**Figure 4.** Scorecard performance graphs: ECDF (**top left**); score densities (**top right**); gains (**bottom left**); ROC (**bottom right**).

**Table 1.** Number of bins after automatic binning. Abbreviations of package names: sc = `scorecard`; woeB = `woeBinning` using `woe.binning()`; woeB.T = `woeBinning` using `woe.tree.binning()`; sMU = `scorecardModelUtils`; Rprof = `Rprofet`; smb = `smbinning`; and cremo = `creditmodel`.

| Variable | Unique | sc | woeB | woeB.T | glmdisc | sMU | Rprof | smb | cremo | riskr |
|---|---|---|---|---|---|---|---|---|---|---|
| Avg. # bins | | 6.33 | 4.33 | 6 | 2.67 | 3.67 | 11 | 2.67 | 2 | 2.67 |
| duration | 32 | 5 | 5 | 5 | 3 | 5 | 13 | 3 | 2 | 3 |
| amount | 663 | 7 | 4 | 6 | 1 | 3 | 11 | 3 | 2 | 3 |
| instRate | 4 | 4 | 4 | 4 | 1 | 4 | 4 | 4 | 2 | 1 |
| residence | 4 | 4 | 4 | 4 | 3 | 2 | 4 | 4 | 3 | 1 |
| age | 52 | 7 | 4 | 7 | 4 | 3 | 9 | 2 | 2 | 2 |
| numCredits | 4 | 2 | 3 | 3 | 2 | 2 | 3 | 4 | 2 | 1 |
| numLiable | 2 | 2 | 3 | 3 | 2 | 1 | 2 | 2 | 2 | 1 |
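The bin counts above can be reproduced along the following lines. This is an illustrative sketch rather than the paper's exact code: it uses `woebin()` from the `scorecard` package together with the `germancredit` data shipped with that package; the variable names (`duration.in.month`, `age.in.years`) follow that data set, not the shortened labels used in the table.

```r
# Sketch: automatic binning with the scorecard package (cf. the 'sc' column).
library(scorecard)

data("germancredit")

# Automatic (tree-based) binning of two numeric variables.
bins <- woebin(germancredit, y = "creditability",
               x = c("duration.in.month", "age.in.years"))

# woebin() returns one summary table per variable; its number of rows
# is the number of bins created.
sapply(bins, nrow)
```

Each element of `bins` also carries the WoE values and bin-wise counts that feed the later scorecard steps.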

| Variable | LCL | sc | woeB | woeB.T | glmdisc | sMU | smb | cremo | riskr |
|---|---|---|---|---|---|---|---|---|---|
| duration | 0.170 | 0.297 | 0.259 | 0.264 | 0.265 | 0.299 | 0.248 | 0.162 | 0.248 |
| amount | 0.116 | 0.251 | 0.179 | 0.227 | 0.000 | 0.196 | 0.219 | 0.069 | 0.219 |
| age | 0.078 | 0.179 | 0.169 | 0.222 | 0.189 | 0.200 | 0.187 | −0.003 | 0.187 |
| numLiable | 0.000 | 0.006 | 0.006 | 0.006 | 0.006 | 0.000 | 0.006 | 0.006 | 0.000 |
| numCredits | 0.000 | 0.068 | 0.068 | 0.068 | 0.068 | 0.068 | 0.061 | 0.068 | 0.000 |
| residence | 0.000 | 0.006 | 0.017 | 0.017 | 0.017 | 0.029 | 0.006 | 0.017 | 0.000 |
| instRate | 0.000 | 0.108 | 0.103 | 0.103 | 0.000 | 0.108 | 0.108 | 0.104 | 0.000 |

**Table 3.** Summary of the functionalities for binning and WoEs provided by the different packages, where ✓ denotes available and ✗ not available. An empty field means that this is not relevant w.r.t. the scope of the package. (1): workaround available (cf. above); (2): a separate bin (`00.NA`) is created—binning of new data (`split_bins_all()`) is possible but no WoE assignment (`woe_trans_all()`); (3): always bin 1 assigned; (4): separate function `missing_val()` for imputation; (5): additional function `cat_to_new()` merges levels smaller than a threshold (cf. above).

| Functionality | sc | smb | woeB | cremo | riskr | glmdisc | sMU | Rprof | klaR |
|---|---|---|---|---|---|---|---|---|---|
| automatic binning of numerics | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| automatic binning of factors | ✓ | ✗ | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ |
| store and predict numerics | ✓ | ✓ | ✓ | ✓ | (1) | ✓ | ✓ | ✗ | ✗ |
| store and predict factors | ✓ | ✓ | ✓ | ✗ | (1) | ✓ | ✗ | ✗ | ✗ |
| supports bin prediction | ✓ | ✓ | ✓ | ✗ | (1) | ✓ | ✓ | ✗ | ✗ |
| supports WoE prediction | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ |
| summary table | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ |
| plot | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | ✗ | ✓ | ✓ |
| manual modification | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ |
| multiple variables | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ |
| supported target levels | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ |
| adjust WoEs | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ | ✓ | | |
| NAs | ✓ | ✓ | ✓ | (2) | ✗ | (3) | (4) | ✓ | |
| new levels | ✗ | ✗ | ✓ | ✗ | (3) | | | | |
| level order irrelevant | ✗ | ✓ | ✓ | ✓ | | | | | |
| min. level size | ✗ | ✓ | ✗ | ✗ | ✗ | (5) | ✗ | | |

| Package | Function | Target Type | Multiple Variables | WoE Adjustment |
|---|---|---|---|---|
| creditR | `IV.calc.data()` | both, levels 0/1 | yes | no |
| creditmodel | `get_iv_all()` | both, levels 0/1 | yes | yes |
| Information | `create_infotables()` | numeric 0/1 | yes | yes |
| InformationValue | `IV()` | numeric 0/1 | no | yes |
| klaR | `woe()` | factor | yes | argument |
| riskr | `pred_ranking()` | numeric 0/1 | yes | no |
| scorecard | `iv()` | both | yes | 0.99 |
| scorecardModelUtils | `iv_table()` | numeric 0/1 | yes | 0.5 |
| smbinning | `smbinning.sumiv()` | numeric 0/1 | yes | yes |
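As a minimal sketch of how one of the functions in the table is used in practice, the following computes information values with `scorecard::iv()` on the `germancredit` data and filters by a threshold. The cut-off of 0.1 is a common rule of thumb, not something prescribed by the package.

```r
# Sketch: IV-based variable preselection with scorecard::iv().
library(scorecard)

data("germancredit")

# Information value for every candidate predictor.
ivs <- iv(germancredit, y = "creditability")

# Keep only variables of at least 'medium' predictive strength.
ivs[ivs$info_value > 0.1, ]
```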

| Metric | riskr `perf()` | scorecard `perf_eva()` | scorecardModelUtils `gini_table()` | smbinning `smbinning.metrics()` |
|---|---|---|---|---|
| KS | ✓ | ✓ | ✓ | ✓ |
| AUC | ✓ | ✓ | ✓ | |
| Gini | ✓ | ✓ | ✓ | |
| Divergence | ✓ | | | |
| Bin table | ✓ | | | |
| Confusion matrix | ✓ | ✓ | | |
| Accuracy | ✓ | ✓ | | |
| Good rate | ✓ | | | |
| Bad rate | ✓ | | | |
| TPR | ✓ | | | |
| FNR | ✓ | ✓ | | |
| TNR | ✓ | | | |
| FPR | ✓ | ✓ | | |
| PPV | ✓ | | | |
| FDR | ✓ | | | |
| FOR | ✓ | | | |
| NPV | ✓ | | | |
| ROC curve | ✓ | ✓ | ✓ | ✓ |
| Score densities | ✓ | ✓ | | |
| ECDF | ✓ | ✓ | ✓ | |
| Gain chart | ✓ | | | |
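A performance report like the one summarized in the `scorecard` column can be produced with `perf_eva()`. The sketch below is illustrative only: it fits a plain `glm()` on WoE-transformed data (not the paper's exact model) and then requests KS and ROC output.

```r
# Sketch: scorecard performance evaluation with scorecard::perf_eva().
library(scorecard)

data("germancredit")

# Bin all variables and apply the WoE transformation.
bins <- woebin(germancredit, y = "creditability")
woes <- woebin_ply(germancredit, bins)

# Plain logistic regression on the WoE-transformed predictors.
fit <- glm(creditability ~ ., family = binomial(), data = woes)

# Predicted default probabilities and a 0/1 label ('bad' = 1).
pred  <- predict(fit, type = "response")
label <- as.integer(germancredit$creditability == "bad")

# KS, AUC and Gini plus the requested plots.
perf_eva(pred = pred, label = label, show_plot = c("ks", "roc"))
```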

**Table 6.**Overview of R packages with the explicit scope of scorecard modelling and addressed stages of the development process.

| Package | Binning & WoEs | Preselection | Scorecard | Performance | Reject Inference |
|---|---|---|---|---|---|
| boottol | | | | ✓ | |
| creditmodel | ✓ | ✓ | ✓ | ✓ | |
| creditR | ✓ | ✓ | ✓ | ✓ | |
| glmdisc | ✓ | | ✓ | | |
| glmtree | | | ✓ | | |
| Information | | ✓ | | | |
| InformationValue | | ✓ | | | |
| riskr | ✓ | ✓ | | ✓ | |
| Rprofet | ✓ | | ✓ | | |
| scorecard | ✓ | ✓ | ✓ | ✓ | |
| scoringTools | ✓ | | | | ✓ |
| scorecardModelUtils | ✓ | ✓ | ✓ | ✓ | |
| smbinning | ✓ | ✓ | ✓ | ✓ | |
| woe | ✓ | ✓ | | | |
| woeBinning | ✓ | | | | |


© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Szepannek, G.
An Overview on the Landscape of R Packages for Open Source Scorecard Modelling. *Risks* **2022**, *10*, 67.
https://doi.org/10.3390/risks10030067
