Which Curve Fits Best: Fitting ROC Curve Models to Empirical Credit-Scoring Data

Abstract: In the practice of credit-risk management, the models for receiver operating characteristic (ROC) curves are helpful in describing the shape of an ROC curve, estimating the discriminatory power of a scorecard, and generating ROC curves without underlying data. The primary purpose of this study is to review the ROC curve models proposed in the literature, primarily in biostatistics, and to fit them to actual credit-scoring ROC data in order to determine which models could be used in credit-risk-management practice. We list several theoretical models for an ROC curve and describe them in the credit-scoring context. The model list includes the binormal, bigamma, bibeta, bilogistic, power, and bifractal curves. The models are then tested against empirical credit-scoring ROC data from publicly available presentations and papers, as well as from European retail lending institutions. Except for the power curve, all the presented models fit the data quite well. However, based on the results and other favourable properties, it is suggested that the binormal curve is the preferred choice for modelling credit-scoring ROC curves.


Introduction
In this paper, we discuss the modelling of receiver operating characteristic (ROC) curves in the domain of credit scoring. Over the last three decades, the retail lending world has transformed into a sophisticated, data-driven industry with automated processes and statistical tools. In a challenging environment where the danger of rising inflation, impending recession, repercussions of the COVID-19 pandemic and the Russo-Ukrainian war (to name only the most apparent risk factors) threaten the stability of the financial sector, a solid foundation of accurate data and proper tools seems even more needed.
Credit scoring is one of the most successful implementations of machine learning in finance. It is used to produce a numerical assessment of a customer's probability of not repaying the loan. Various tools are used to build credit-scoring models (scorecards), including logistic regression, decision trees and neural networks (Anderson 2007). The discriminatory power of a scoring model is usually assessed by the measures summarising its ROC curve.
ROC curve models are discussed in the literature primarily in the context of biostatistical applications. However, they may also prove to be a valuable tool in credit-risk management. As discussed in the literature section, credit-risk practitioners and researchers might need such models, for example, to estimate the AUROC (the area under the ROC curve) on a sample basis or to better understand the shape of the curve and its implications for credit decisions. In credit management practice, the need to model ROC curves also arises when one wants to assess the impact of the credit scorecard that is yet to be built. For example, a lender knows that an analyst team can produce a new scorecard with a Gini coefficient 15% higher than the current one. This new scorecard may be based on newly available data or may be developed using a new classification methodology. The question arises: to what extent introducing the scorecard can reduce credit losses or

Literature Background
The literature on ROC curves is extensive, but most of it comes from areas not related to credit risk. This is not surprising as ROC curves derive from signal-detection theory (Wichchukit and O'Mahony 2010; Swets 2014). They are currently popular in many domains as a method to graphically present the separation power of binary classifiers (Fawcett 2006). Examples of such binary classifiers include diagnostic tests and biomarkers in biostatistics (Hajian-Tilaki 2013; Mandrekar 2010; Faraggi et al. 2003; Cook 2007), detectors in signal processing (Bowyer et al. 2001; Atapattu et al. 2010), credit scores in banking (Blöchlinger and Leippold 2006; Thomas 2009; Anderson 2007), and, generally, binary classification models in machine-learning applications (Hamel 2009; Guido et al. 2020).
An ROC curve plots the true-positive fraction against the false-positive fraction as the cut-off point varies (Metz 1978; Krzanowski and Hand 2009). Depending on the domain, the true-positive and the false-positive fractions may be referred to as a detect rate and false-alarm rate in signal processing (Levy 2008; Chang 2010), as sensitivity and 1 − specificity in biostatistics (Park et al. 2004), or as a cumulative bad and good proportion in credit scoring (Thomas 2009).
An ROC curve can be summarised with its "area under the curve" (AUC or AUROC) index (Bamber 1975; Bradley 1997; Bewick et al. 2004), also called a c-statistic or a c index (Cook 2008). The AUROC measures the discrimination power of the binary classifier. Generally, with some reservations related to uncertainty (Pencina et al. 2008), the shape of the curve (Idczak 2019; Řezáč and Koláček 2012) as well as to the cost of misclassification (Cook 2007; Hand 2009), the higher the AUROC, the better. Plotting the ROC curve for a scorecard is not a necessity. Some researchers propose alternative approaches (e.g., Hand 2009), but the ROC curve is considered standard practice by credit-risk professionals (Anderson 2007; Thomas et al. 2017; Siddiqi 2017). Credit-scoring researchers and practitioners frequently use the AUROC to assess, improve, and compare scorecards (Djeundje et al. 2021; Tripathi et al. 2020; Xia et al. 2020; Shen et al. 2021; Lappas and Yannacopoulos 2021).
In the credit-scoring domain, the Gini coefficient, which, in fact, is a rescaled AUROC, is often used by the practitioners (Thomas et al. 2017, p. 191; Řezáč and Řezáč 2011):

$$\text{Gini} = 2 \cdot \text{AUROC} - 1$$

It should be noted that the Gini coefficient is equivalent to a special case of Somers' D statistic where one of the associated variables (the target/response) is binary (0/1) and the other one (the classifier) is at least ordinal (Somers 1962; Thomas et al. 2017, p. 189).
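As a small worked illustration of this rescaling, the sketch below computes the AUROC of a piecewise-linear empirical ROC curve with the trapezoid rule and converts it to a Gini coefficient via Gini = 2·AUROC − 1. The ROC points and the function names are made up for the example; they do not come from the data sets analysed in this paper.

```python
def auroc_trapezoid(points):
    """Area under a piecewise-linear ROC curve given (x, y) points."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def gini_from_auroc(auroc):
    """Rescale the AUROC from [0.5, 1] to a Gini coefficient in [0, 1]."""
    return 2.0 * auroc - 1.0


# Hypothetical points: x = cumulative share of goods, y = cumulative share of bads
roc_points = [(0.0, 0.0), (0.2, 0.5), (0.5, 0.8), (1.0, 1.0)]
auroc = auroc_trapezoid(roc_points)   # about 0.695
gini = gini_from_auroc(auroc)         # about 0.39
print(f"AUROC = {auroc:.3f}, Gini = {gini:.3f}")
```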
If the classifier function and the data set are given, then the empirical ROC curve is uniquely determined. However, an ROC curve model, i.e., a mathematical formula approximating the curve, may be needed in certain circumstances. The literature examples of such situations include (1) the sample-based inference and estimation of AUROC, (2) the ROC curve shape description, and (3) the simulation of the ROC curve when data are absent or scarce.
(1) The use of ROC curve models to estimate the AUROC is widespread, primarily in biostatistics. With the ROC curve model, the path of the curve can be estimated based on sample results, and the confidence intervals for the AUROC may be computed (Lahiri and Yang 2018; Gonçalves et al. 2014; Hanley 1996; Faraggi and Reiser 2002). Satchell and Xia (2008) discussed using the analytic models for ROC curves in the credit-scoring context. They claimed that the theoretical ROC curves are helpful, especially when the sample size of bad customers is small, as the models help increase the accuracy of the AUROC estimation. Indeed, the scarcity of data usually drives the need for such models. Some of the formulas described in this paper were derived in this context and serve as the basis for such an estimation (Hanley 1996; Bandos et al. 2017; Mossman and Peng 2016).
(2) Knowing the area under the curve turns out to be insufficient, especially when one takes no account of its shape. Janssens and Martens (2020) discussed the importance of ROC curve shapes in medical diagnostics. Hautus et al. (2008) utilised the binormal model to demonstrate how the shape of the ROC curve relates to the same-different sensory tests in the food industry. Omar and Ivrissimtzis (2019) fitted the theoretical ROC bibeta curve to the results of machine-learning models in order to show that such an approach provides additional insight into the behaviour at the ends of the ROC curves. Řezáč and Koláček (2012) and Idczak (2019) showed that, depending on the shape, some credit-scoring models excel at distinguishing the best customers from good ones, while others are preferred if a lender intends to exclude the worst loan applicants. A practical example of such an analysis was provided by Tang and Chi (2005); the model with a much lower AUROC still outperformed the competing, high-AUROC model in terms of the accuracy in classifying the best customers.
Another example when information about the shape of an ROC curve is needed is a situation when the data are censored or "truncated" (Scallan 2013), i.e., a lender does not have good/bad information on rejected applicants. Models for ROC curves could enable the modelling of the unknown portion of the curve.
Neglecting the shape of the ROC curve when summarising it with the AUROC has been raised as one of the major deficiencies of an ROC analysis, which led to proposals of alternative measures such as Hand's h-measure (Hand 2009; Hand and Anagnostopoulos 2013).
(3) Occasionally, a credit-risk manager might need to draw ROC curves when data are absent or scarce (for example, when a scorecard is yet to be built). In this context, Kürüm et al. (2012) showed how to use the binormal approximation to optimise the AUROC of the model for a set of corporate loans where the number of bad customers was limited. Outside the domain of credit scoring, Lloyd (2000) provided an extreme example of an inference from limited data. He estimated ROC curves from one data point per curve, assuming that the ranking function (equivalent to credit scoring in finance) was an unobserved latent variable. Of course, such an inference would not be possible without the ROC curve models (binormal model in this case).
In the literature, many models for ROC curves have been suggested. Most of the propositions come from medical statistics. The most frequently used model is the binormal curve (Hanley 1996; Metz 1978) and its modifications (Metz and Pan 1999). Other models include the bibeta (Chen and Hu 2016; Gneiting and Vogel 2022; Mossman and Peng 2016; Omar and Ivrissimtzis 2019), bigamma (Dorfman et al. 1997), bilogistic (Ogilvie and Creelman 1968), power curve (Birdsall 1973) and exponential/bifractal model (England 1988; Kochański 2021). These models will be discussed in more detail in the following section. We found no systematic review where multiple ROC curve models were fitted to the same data, be it medical data or data from any other field. However, a few papers discussed the empirical goodness of fit of particular models. For example, Swets (1986) claimed that in many empirical cases the binormal model turns out to have a good fit, and Gneiting and Vogel (2022) showed that the bibeta model fits better than the binormal one, especially under the assumption of a concave ROC curve.
Note that in the following text, the credit-scoring nomenclature is used. The binary classifier in question is the credit scoring; the observed values of the classifier are referred to as the credit scores, positives (signal, hits, cases) are "bads" or "bad customers", negatives (noise, false alarms, controls) are "goods" or "good customers", etc. In line with the practice in credit-risk management, the Gini coefficient is preferred over the AUROC as the summary statistic of an ROC curve.
Before we go any further, one thing needs to be clarified to avoid confusion. Despite the similarity of some names, ROC curve models and classification models are completely different animals. The latter are basically classification functions that return predictions or rankings (such as credit scores). When applied to data sets, the ranking functions generate empirical ROC curves. The ROC curve models, on the other hand, are approximations of the shape of ROC curves. If we find that, for example, the bilogistic curve best approximates the ROC curve of the scoring model, then it does not mean that the logistic regression was, or should be, used to develop the model. On the contrary, the ROC curve generated by logistic regression may be best approximated by the bibeta function, and the bilogistic model may prove to have the best fit when a neural network or support vector machine is used.

ROC Curve Models
One way to look at an ROC curve is to view it as a function [0; 1]→[0; 1] built using two cumulative distribution functions (CDFs). In the context of credit scoring, these two CDFs are those of the credit scores for good and bad customers. The general formula for an ROC curve is then (Gönen and Heller 2010):

$$y = F_B\left(F_G^{-1}(x)\right) \tag{2}$$

where $F_B$ is a CDF of the scores of bad customers, $F_G$ is a CDF of the scores of good customers, and $F_G^{-1}$ is its inverse. Equivalently, the curve can be written in parametric form:

$$x = F_G(s), \qquad y = F_B(s)$$

where $s$ denotes the value of the test, that is, the score or its monotone transformation. Several ROC curve models proposed below take advantage of this simple observation: it may be assumed that the two CDFs follow specific probability distributions. Equation (2) shows that an ROC curve is invariant to monotone transformations of the underlying scores; the score does not go directly to the ROC equation, only the CDFs do. Therefore, in the following text, the "scores" may refer to the scores themselves or to their monotone transformations.
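The construction of an ROC curve from two CDFs, and its invariance to monotone transformations of the score, can be checked numerically. The sketch below is only an illustration: the two normal score distributions and their parameters are hypothetical, and the function names are not taken from any library. It builds the curve as y = F_B(F_G⁻¹(x)), compares it with the point (F_G(s), F_B(s)) obtained for a cut-off s, and repeats the calculation after replacing the score s with exp(s).

```python
from math import exp, log
from statistics import NormalDist

# Hypothetical score distributions (illustrative parameters only)
good = NormalDist(mu=1.0, sigma=1.0)
bad = NormalDist(mu=0.0, sigma=1.2)


def roc(x):
    """ROC curve as a composition of CDFs: y = F_B(F_G^{-1}(x)), 0 < x < 1."""
    return bad.cdf(good.inv_cdf(x))


def roc_point(s):
    """Parametric form: the point (F_G(s), F_B(s)) for a cut-off score s."""
    return good.cdf(s), bad.cdf(s)


def roc_after_exp(x):
    """The same curve when every score s is replaced by t = exp(s)."""
    t = exp(good.inv_cdf(x))   # quantile of exp(score) among goods
    return bad.cdf(log(t))     # CDF of exp(score) among bads, evaluated at t


x_cut, y_cut = roc_point(0.5)
print(f"cut-off 0.5 -> point ({x_cut:.4f}, {y_cut:.4f}), curve gives {roc(x_cut):.4f}")
for x in (0.1, 0.5, 0.9):
    print(f"x = {x:.1f}: original {roc(x):.6f}, transformed {roc_after_exp(x):.6f}")
```

Both forms describe the same curve, and the transformed scores reproduce it because only the CDFs, not the score values themselves, enter the formula.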

Bibeta and Simplified Bibeta Models
If the scores for bad borrowers are distributed according to a beta distribution with parameters $\alpha_B$ and $\beta_B$ and the scores for good customers follow Beta($\alpha_G$, $\beta_G$), then the formula for the ROC curve is

$$y = F_{\alpha_B,\beta_B}\left(F^{-1}_{\alpha_G,\beta_G}(x)\right)$$

where $F_{\alpha_B,\beta_B}$ and $F_{\alpha_G,\beta_G}$ are CDFs of the two beta distributions (Gneiting and Vogel 2022; Omar and Ivrissimtzis 2019). Such a model could be called a "bibeta" ROC curve. The bibeta model that has been used in several articles (Chen and Hu 2016; Mossman and Peng 2016) is a simplified version of the above equation, where $\alpha_B = 1$ and $\beta_G = 1$. Then:

$$F_{1,\beta_B}(s) = 1 - (1 - s)^{\beta_B}, \qquad F_{\alpha_G,1}(s) = s^{\alpha_G}$$

and so, the formula for the bibeta ROC curve is reduced to the following:

$$y = 1 - \left(1 - x^{1/\alpha_G}\right)^{\beta_B} \tag{7}$$

In this paper, the curve generated by (7) will be called the "simplified" bibeta.
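A quick numerical check of the simplified bibeta curve is sketched below, assuming the closed form y = 1 − (1 − x^(1/α_G))^β_B that follows from the Beta(1, β_B) and Beta(α_G, 1) CDFs. The function names and parameter values are illustrative. For α_G = β_B = 2 the curve is y = 2√x − x, whose area is 5/6, so the Gini coefficient should come out close to 2/3.

```python
def simplified_bibeta_roc(x, alpha_g, beta_b):
    """Assumed closed form of the simplified bibeta curve (alpha_B = beta_G = 1)."""
    return 1.0 - (1.0 - x ** (1.0 / alpha_g)) ** beta_b


def gini_numeric(curve, n=20000):
    """Gini = 2 * AUROC - 1, with the area approximated by the midpoint rule."""
    area = sum(curve((k + 0.5) / n) for k in range(n)) / n
    return 2.0 * area - 1.0


# With both parameters equal to 1 the curve collapses to the diagonal (no discrimination)
diagonal_value = simplified_bibeta_roc(0.3, 1.0, 1.0)

# With alpha_G = beta_B = 2 the analytic Gini is 2/3
gini_22 = gini_numeric(lambda x: simplified_bibeta_roc(x, 2.0, 2.0))
print(f"diagonal check: {diagonal_value:.3f}, Gini for (2, 2): {gini_22:.4f}")
```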

Bigamma Model
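The "bigamma" model (Dorfman et al. 1997), by analogy, assumes that the scores, or some monotone transformation of them, follow two gamma distributions:

$$y = F_{\alpha_B,\beta_B}\left(F^{-1}_{\alpha_G,\beta_G}(x)\right)$$

where $F_{\alpha,\beta}$ now denotes the CDF of a gamma distribution with parameters $\alpha$ and $\beta$. Note that Dorfman et al. (1997) proposed that $\alpha_B = \alpha_G$, $\beta_B = 1$ and $0 < \beta_G \le 1$, but these restrictions are not included in this article, because they resulted in a much worse fit in the preliminary calculations.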

Binormal Model
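The same analogy could be used to build the binormal model:

$$y = \Phi_{\mu_B,\sigma_B}\left(\Phi^{-1}_{\mu_G,\sigma_G}(x)\right) \tag{9}$$

where $\Phi_{\mu,\sigma}$ is the CDF of a normal distribution with mean $\mu$ and standard deviation $\sigma$. Equation (9) has four parameters, but it can be rewritten in an equivalent form with just two parameters:

$$y = \Phi\left(a + b\,\Phi^{-1}(x)\right) \tag{10}$$

where $\Phi$ is a CDF of a standard normal variable, $\Phi^{-1}$ is its inverse, $a$ equals the distance between the mean scores of goods and bads measured in terms of units of s.d. of the good scores, and $b$ is the ratio between the standard deviations of the bad and good scores.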

Bilogistic Model
There is also another perspective of the binormal model. $\Phi^{-1}$ may be viewed as a (probit) link function in the more general equation for an ROC curve:

$$y = g^{-1}\left(\alpha_0 + \alpha_1\, g(x)\right)$$

where g( ) is a link function. Taking such a perspective, we can select another link function. If a logit function is taken:

$$g(p) = \ln\frac{p}{1 - p}$$

the bilogistic ROC curve model is derived. After several transformations, the formula for the bilogistic curve looks as follows:

$$y = \frac{1}{1 + e^{-\alpha_0}\left(\frac{1 - x}{x}\right)^{\alpha_1}} \tag{15}$$

Formula (15) will be used as the bilogistic model in the next section. All the curves discussed before the bilogistic model are examples of parametric ROC curve models, where one starts with distributions and then arrives at the formula. The bilogistic curve can be viewed as an "algebraic" ROC curve model, where the underlying distribution is somewhat "secondary" to the formula itself.
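The link-function view can be sketched in a few lines. The code below assumes the form logit(y) = α0 + α1·logit(x), which is algebraically the same as y = 1 / (1 + e^(−α0)·((1 − x)/x)^α1); both versions are implemented so the equivalence can be checked. The parameter values are the bilogistic estimates reported for data set D1 in Table 1, used here on the assumption that they refer to this parameterisation. Function names are illustrative.

```python
from math import exp, log


def logit(p):
    """Logit link: log-odds of a proportion p in (0, 1)."""
    return log(p / (1.0 - p))


def bilogistic_roc(x, alpha0, alpha1):
    """Link-function form: logit(y) = alpha0 + alpha1 * logit(x), for 0 < x < 1."""
    return 1.0 / (1.0 + exp(-(alpha0 + alpha1 * logit(x))))


def bilogistic_roc_closed(x, alpha0, alpha1):
    """Algebraically equivalent form without explicit logit calls."""
    return 1.0 / (1.0 + exp(-alpha0) * ((1.0 - x) / x) ** alpha1)


alpha0, alpha1 = 1.2884, 0.9279   # values reported for D1 in Table 1
xs = (0.1, 0.5, 0.9)
ys = [bilogistic_roc(x, alpha0, alpha1) for x in xs]
for x, y in zip(xs, ys):
    print(f"x = {x:.1f} -> y = {y:.4f}")
```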

Power Function
A simple example of a purely algebraic model is the "power function" (Birdsall 1973, p. 138):

$$y = x^{\theta} \tag{16}$$

The power function can also be derived as the "Lehmann ROC curve" based on proportional hazards specification (Gönen and Heller 2010). From a credit-scoring perspective, the power ROC curve has an attractive property, which could be called "fractal" (Kochański 2021). If the shape of an ROC curve follows Equation (16), and if we take any fraction of the lowest-scored customers and graph the ROC curve for this group, then the shape of the ROC curve remains the same. The AUROC also remains constant: AUROC = 1/(1 + θ), as well as the Gini coefficient: Gini = (1 − θ)/(1 + θ). If one would like to make the Gini coefficient an explicit function parameter, one could reformulate Equation (16) in the following way:

$$y = x^{\frac{1 - \gamma}{1 + \gamma}} \tag{17}$$

where γ is a parameter for the Gini coefficient.
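The "fractal" property and the relation between θ and the Gini coefficient can be verified with a short sketch, assuming the power curve y = x^θ. Restricting the curve to the lowest-scored customers corresponds to keeping x ≤ c and rescaling both axes to [0, 1]. The value of θ is the power-curve estimate reported for D1 in Table 1; the cut-off c and the function names are arbitrary choices for the illustration.

```python
def power_roc(x, theta):
    """Power ROC curve y = x**theta."""
    return x ** theta


def gini_from_theta(theta):
    """Gini implied by the power curve: (1 - theta) / (1 + theta)."""
    return (1.0 - theta) / (1.0 + theta)


def theta_from_gini(gamma):
    """Inverse relation, used when the Gini coefficient is the explicit parameter."""
    return (1.0 - gamma) / (1.0 + gamma)


theta = 0.4072                    # value reported for D1 in Table 1
gamma = gini_from_theta(theta)    # close to the 0.4212 reported there

# Keep only the part of the curve with x <= c and rescale both axes to [0, 1]
c = 0.3
u_values = [0.1, 0.25, 0.5, 0.75, 1.0]
restricted = [power_roc(c * u, theta) / power_roc(c, theta) for u in u_values]
original = [power_roc(u, theta) for u in u_values]
print(f"Gini = {gamma:.4f}")
print("shape preserved:", all(abs(r - o) < 1e-12 for r, o in zip(restricted, original)))
```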

Bifractal Model
Kochański (2021) proposed a function that keeps the shape (and the AUROC/Gini) when plotted for any fraction of the highest-scored customers:

$$y = 1 - (1 - x)^{\frac{1 + \gamma}{1 - \gamma}}$$

and showed that the empirical ROC curves lie somewhere between those two extremes. This observation prompted the development of the "bifractal" model:

$$y = \beta\left(1 - (1 - x)^{\frac{1 + \gamma}{1 - \gamma}}\right) + (1 - \beta)\, x^{\frac{1 - \gamma}{1 + \gamma}} \tag{19}$$

as a linear combination of the two "fractal" curves. The analogous formula was also experimentally found to provide a good fit and was referred to as "exponential ROC" or EROC by England (1988). Figure 1 illustrates the bifractal ROC curves for γ (Gini) = 0.5 and four levels of β.
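Because both fractal curves share the same Gini coefficient γ, any linear combination of them should have that Gini too. The sketch below checks this numerically. It rests on two assumptions that are not spelled out in the text above: the second fractal curve is taken to be y = 1 − (1 − x)^((1+γ)/(1−γ)), and β is taken to weight that curve while (1 − β) weights the power curve. The function names are illustrative.

```python
def upper_fractal(x, gamma):
    """Assumed curve that keeps its shape for the highest-scored customers."""
    return 1.0 - (1.0 - x) ** ((1.0 + gamma) / (1.0 - gamma))


def lower_fractal(x, gamma):
    """Power curve: keeps its shape for the lowest-scored customers."""
    return x ** ((1.0 - gamma) / (1.0 + gamma))


def bifractal_roc(x, beta, gamma):
    # Assumption: beta weights the first curve and (1 - beta) the power curve.
    return beta * upper_fractal(x, gamma) + (1.0 - beta) * lower_fractal(x, gamma)


def gini_numeric(curve, n=20000):
    """Gini = 2 * AUROC - 1, with the area approximated by the midpoint rule."""
    area = sum(curve((k + 0.5) / n) for k in range(n)) / n
    return 2.0 * area - 1.0


betas = (0.0, 0.25, 0.5, 1.0)
ginis = [gini_numeric(lambda x, b=beta: bifractal_roc(x, b, 0.5)) for beta in betas]
for beta, g in zip(betas, ginis):
    print(f"beta = {beta:.2f}: numerical Gini = {g:.4f}")
```

Under these assumptions, β changes only the shape of the curve, while γ alone fixes the area under it.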

The apparent advantage of the bifractal function in Equation (19) is that it contains the Gini coefficient as its explicit parameter. Moreover, the other parameter of the bifractal ROC curve has a meaningful and intuitive interpretation as a distance between the two fractal curves.

Reformulated Binormal and Midnormal Models
An explicit Gini coefficient could be a decisive advantage of the bifractal compared to other models. However, as it turns out, the binormal curve function may also be reformulated, and we may obtain a function with the Gini coefficient as a parameter. Thanks to a simple analytic formula for the area under the ROC curve (Bandos et al. 2017):

$$\text{AUROC} = \Phi\left(\frac{a}{\sqrt{1 + b^2}}\right)$$

Equation (10) transforms so that now the formula has the Gini coefficient as its parameter (γ):

$$y = \Phi\left(\sqrt{1 + b^2}\;\Phi^{-1}\left(\frac{\gamma + 1}{2}\right) + b\,\Phi^{-1}(x)\right) \tag{21}$$

The explicit Gini coefficient in the formula seems essential for the modelling. Consequently, the parametric form of the binormal model described by Equation (21) will be used for empirical curve fitting in the next section. Figure 2 illustrates the binormal ROC curves for γ (Gini) = 0.5 and various levels of b. Older works have also described a simplified version of the binormal model (this simplified version is sometimes referred to as the "normal" curve). This model is based on the assumption of the equal variances of the underlying score distributions of good and bad observations (Swets 1986). As a consequence, the b parameter equals 1, and Equation (21) is transformed to:

$$y = \Phi\left(\sqrt{2}\;\Phi^{-1}\left(\frac{\gamma + 1}{2}\right) + \Phi^{-1}(x)\right)$$

Such a model has only one parameter, the Gini coefficient, and will here be referred to as the "midnormal" model.
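The reparameterisation can be checked numerically. The sketch below assumes the two-parameter binormal form y = Φ(a + b·Φ⁻¹(x)) and the area formula AUROC = Φ(a/√(1 + b²)), from which a = √(1 + b²)·Φ⁻¹((γ + 1)/2). It then integrates the resulting curve and confirms that the numerical Gini matches γ, both for a general b and for the midnormal case b = 1. The values of b and γ are the binormal estimates reported for D1 in Table 1; the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def a_from_gini(gamma, b):
    """Solve Gini = 2 * Phi(a / sqrt(1 + b^2)) - 1 for a."""
    return sqrt(1.0 + b * b) * STD.inv_cdf((gamma + 1.0) / 2.0)


def binormal_roc_gini(x, b, gamma):
    """Binormal curve with the Gini coefficient as an explicit parameter."""
    return STD.cdf(a_from_gini(gamma, b) + b * STD.inv_cdf(x))


def gini_numeric(curve, n=20000):
    """Gini = 2 * AUROC - 1, with the area approximated by the midpoint rule."""
    area = sum(curve((k + 0.5) / n) for k in range(n)) / n
    return 2.0 * area - 1.0


b, gamma = 0.9539, 0.4290   # values reported for D1 in Table 1
gini_check = gini_numeric(lambda x: binormal_roc_gini(x, b, gamma))
midnormal_check = gini_numeric(lambda x: binormal_roc_gini(x, 1.0, gamma))
print(f"binormal: {gini_check:.4f}, midnormal: {midnormal_check:.4f}")
```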

Midfractal Model
Similarly, one can reduce the bifractal model (Equation (19)) to a one-parameter curve by setting β to 0.5 (in the middle of the two fractal curves):

$$y = \frac{1}{2}\left(1 - (1 - x)^{\frac{1 + \gamma}{1 - \gamma}}\right) + \frac{1}{2}\, x^{\frac{1 - \gamma}{1 + \gamma}}$$

Again, this curve has only one parameter γ, which stands for the Gini coefficient. This curve will here be referred to as the "midfractal" model.

Fitting ROC Curve Models to Empirical Data
The theoretical models for an ROC curve presented in the previous section can be fitted to the empirical ROC data of real-life scoring models in credit institutions. For the empirical analysis presented below, two types of sources of ROC curve data were used: (1) research/industry articles and presentations and (2) data from credit institutions obtained under an anonymity condition.
(1) We used the following papers containing data or at least graphs of empirical ROC curves related to credit scoring: Řezáč and Řezáč (2011) (2015) and Conolly (2017) were used. To obtain the numbers (x and y coordinates of the points that make up the empirical ROC) in some cases, it was necessary to read the data from the graph itself; therefore, an online tool was used to transform graphs into numbers by pointing and clicking.
(2) Four retail lenders in Europe shared the empirical ROC curves of their credit-scoring models. The data were provided under the condition of anonymity. These models are presented in this article under the symbols A1, A2, B1, B2, B3, C1, and D1.
All the empirical curves in the analysis reflect the discrimination characteristic of some form of a credit-scoring model; the only exception is the antifraud model from Wójcicki and Migut (2010), where the target variable is a fraudulent loan, not a default. The empirical ROC curves describe scoring models that are created using various methods, including support vector machines (Tobback and Martens 2019) or a proprietary methodology (Jennings 2015), but the dominant approach is logistic regression in various forms (Hahm and Lee 2011; Řezáč and Řezáč 2011; Wójcicki and Migut 2010; Berg et al. 2020). In the case of the data coming from the four credit institutions, we do not have information about the methods used to develop the scoring models.
Once the data are available, the question emerges: What is the adequate procedure for fitting the curve? The binormal and bilogistic curves may be fitted quite intuitively through a probit/logit transformation and a simple linear-regression fitting (Swets 1986), and the parametric ROC curve models may be fitted with maximum-likelihood procedures (Metz and Pan 1999; Ogilvie and Creelman 1968). Such a procedure is not available for algebraic models (such as the bifractal curve or power function). As it is reasonable to use the same fitting method for all the ROC curve models, we applied the minimum distance estimation (MDE) method as developed by Hsieh and Turnbull (1996) and Davidov and Nov (2012), and described by Jokiel-Rokita and Topolnicki (2019). We used the numerical optimisation in R (optim function from the R stats package). The "objective function" to be minimised is the $L_2$-distance measure between the empirical ROC curve and the theoretical ROC curve function (Jokiel-Rokita and Topolnicki 2019):

$$f_{obj}(\theta) = \int_0^1 \left(ROC_e(x) - ROC_t(\theta, x)\right)^2 dx$$

where $ROC_e(x)$ is the empirical ROC curve (piecewise linear interpolation of empirical ROC data points) and $ROC_t(\theta, x)$ is the ROC curve model with θ as a vector of parameters (1-4 parameters depending on the function). The minimal distance estimator $\hat{\theta}$ of the parameter vector is defined by:

$$\hat{\theta} = \arg\min_{\theta} f_{obj}(\theta)$$

As a result of the choice of such an objective function, the average vertical root-mean-square distance between the empirical curve and the theoretical curve was minimised.
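The minimum distance fit can be sketched as follows. This is not the code used for the paper, which relied on R's optim. It is a standard-library Python stand-in that approximates the squared L2 distance with a midpoint rule and minimises it with a simple compass (coordinate) search. The "empirical" points are synthetic, generated from a binormal curve with known parameters so that the recovered values can be checked. The Gini-parameterised binormal form and all function names are assumptions made for the illustration.

```python
from bisect import bisect_right
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def binormal_roc(x, b, gamma):
    """Binormal ROC curve with explicit Gini parameter gamma (assumed form)."""
    a = sqrt(1.0 + b * b) * STD.inv_cdf((gamma + 1.0) / 2.0)
    return STD.cdf(a + b * STD.inv_cdf(x))


def interpolate(points, x):
    """Piecewise-linear empirical ROC through sorted (x, y) points."""
    xs = [p[0] for p in points]
    i = min(max(bisect_right(xs, x) - 1, 0), len(points) - 2)
    (x0, y0), (x1, y1) = points[i], points[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def make_objective(points, model, n=400):
    """Return f_obj(theta): midpoint-rule approximation of the squared L2 distance."""
    grid = [(k + 0.5) / n for k in range(n)]
    emp = [interpolate(points, x) for x in grid]

    def f_obj(theta):
        try:
            return sum((e - model(x, *theta)) ** 2 for x, e in zip(grid, emp)) / n
        except (ValueError, ZeroDivisionError):  # parameters outside the valid range
            return float("inf")

    return f_obj


def compass_search(f, start, step=0.1, tol=1e-5):
    """Derivative-free minimiser: try +/- step on each coordinate, halve on failure."""
    best, f_best = list(start), f(start)
    while step > tol:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < f_best:
                    best, f_best, improved = trial, f_trial, True
        if not improved:
            step /= 2.0
    return best, f_best


# Synthetic "empirical" ROC points from a binormal curve with known parameters
true_b, true_gamma = 0.95, 0.43
points = ([(0.0, 0.0)]
          + [(k / 100, binormal_roc(k / 100, true_b, true_gamma)) for k in range(1, 100)]
          + [(1.0, 1.0)])

f_obj = make_objective(points, binormal_roc)
(b_hat, gamma_hat), f_min = compass_search(f_obj, [1.0, 0.5])
rms_pp = 100.0 * sqrt(f_min)   # RMS vertical distance in percentage points
print(f"b = {b_hat:.3f}, gamma = {gamma_hat:.3f}, RMS distance = {rms_pp:.3f} pp")
```

Any other curve model with the signature `model(x, *theta)` can be passed to `make_objective` in the same way, which mirrors the reason given above for choosing one fitting method for all models.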
Before we provide the summary results based on all the curve models and all the ROC data sets gathered, let us introduce an illustrative example. The example results of fitting four curves (bifractal, binormal, bilogistic, and power) to the empirical ROC curve obtained for the purposes of this study from an anonymous credit institution (D1) are presented in Figure 3 and Table 1. As can be seen, the binormal and bifractal models fit the data quite well, whereas the highest deviation was observed in the case of the power curve. Moreover, the Gini coefficient implied by the power curve differed slightly from the Gini coefficient implied by the first two models (and also differed from the actual underlying Gini coefficient, which was circa 0.43).

| Model | Parameters | Fitting Objective ($f_{obj}$) |
|---|---|---|
| Binormal (Equation (21)) | b = 0.9539; γ = 0.4290 | $f_{obj} = 8.09 \times 10^{-5}$ |
| Bifractal (Equation (19)) | β = 0.4239; γ = 0.4298 | $f_{obj} = 8.40 \times 10^{-5}$ |
| Bilogistic (Equation (12)) | α0 = 1.2884; α1 = 0.9279 | $f_{obj} = 3.31 \times 10^{-4}$ |
| Power (Equations (16) or (17)) | θ = 0.4072 or γ = 0.4212 | $f_{obj} = 1.27 \times 10^{-3}$ |

Table 2 presents the goodness of fit for all the data sets gathered. For reasons of clarity, the square root of the minimised objective ($f_{obj}$) was multiplied by 100; note that it can then be interpreted as the average (root mean squared) vertical distance between the empirical ROC curve and the fitted ROC curve, expressed in percentage points. As it turns out, the binormal model was the best in terms of goodness of fit in 10 cases. On average, the vertical distance between the empirical and binormal ROC curves was less than one percentage point and in the worst case it did not exceed two points, which could be considered an excellent fit. The bibeta model won in five instances. The bilogistic model, which on average fit slightly worse than the competing models, was the best in four cases. The bifractal model also showed a good fit, but it was worse than the binormal in all but one case. The power curve showed the worst fit, with one exception. Figure 4 summarises findings from Table 2 and presents the average goodness of fit (square root of $f_{obj}(\hat{\theta})$) for each model. The summarised data showed that all the models presented, except for the power curve, were comparable in terms of the average goodness of fit. However, both four-parameter models (bibeta and bigamma) and the binormal model were, on average, the ones matching the data best.
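The conversion from the minimised objective to percentage points is simple arithmetic. As a worked illustration, the sketch below applies it to the four f_obj values reported for data set D1; the dictionary and variable names are made up for the example.

```python
from math import sqrt

# Minimised objective values reported for data set D1
f_obj = {
    "binormal": 8.09e-5,
    "bifractal": 8.40e-5,
    "bilogistic": 3.31e-4,
    "power": 1.27e-3,
}

# RMS vertical distance in percentage points: 100 * sqrt(f_obj)
rms_pp = {model: 100.0 * sqrt(value) for model, value in f_obj.items()}
for model, value in rms_pp.items():
    print(f"{model:<10} {value:.2f} pp")
```

For D1, the binormal and bifractal fits are therefore about 0.9 percentage points away from the empirical curve on average, the bilogistic fit about 1.8, and the power curve about 3.6.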

Discussion and Conclusions
The empirical results presented in the previous section constitute, as far as we know, the first test of various ROC curve models against empirical credit-scoring data. They show that the bibeta model, on average, fits the empirical data best. The goodness of fit for the binormal model is at a comparable level. The binormal model is also the model that proved to be the best fit for the largest number of data sets. Obviously, the obtained results can be reinforced or undermined by further research based on more data on empirical ROC curves. This is one of the reasons why we are sharing the code used in this article that would allow anyone interested to perform such tests.
It is worth noting that ROC curve models are not predictive statistical models, as understood by, e.g., Hastie et al. (2016). ROC curve models are descriptive mathematical models; some are parametric (they use assumptions about probability distributions), whereas others are purely algebraic. Their primary purpose is to approximately describe empirical ROC curves. Therefore, the methods of selecting and assessing predictive models (Hastie et al. 2016; Ramspek et al. 2021) are not directly applicable in the exercise presented in the preceding section, as it is not about finding the best predictive model. Instead, it is an exercise in finding the best theoretical curve approximating the empirical one. It is somewhat similar to the curve-fitting tasks in computer science or engineering (Fang and Gossard 1995; Frisken 2008; Guest 2012). The minimum distance approach proposed and described by Hsieh and Turnbull (1996), Davidov and Nov (2012), and Jokiel-Rokita and Topolnicki (2019) seems to be the optimal method for such an exercise as it allows for fitting both parametric and algebraic curves in the same, unified way.
When assessing the appropriateness of a particular ROC curve model, goodness of fit is not the only criterion. One should consider some other aspects, including the number of parameters and the possibility of interpreting them. Intuitively, the fewer parameters in the formula, other things being equal, the better. As illustrated in Figure 4, there were three models with only one parameter (the Gini coefficient of a given scorecard): the midnormal, midfractal, and power models. The midnormal curve was the best-fitting model in this category. The binormal, bifractal, bilogistic and simplified bibeta models require two parameters. The binormal model won over all the other two-parameter models. The best-fitting (on average) model, the bibeta curve, has four parameters.
From the perspective of the credit-scoring modelling practice, it is vital to have an explicit Gini/AUC parameter in the ROC formula (Kochański 2021). Models based on fractal curves (the bifractal and midfractal models) fulfil this postulate. When Equation (21) is the basis of a binormal (or midnormal) curve, the explicit Gini parameter is also available. A power curve is another example of a curve that can be defined so that its (one and only) parameter is the Gini coefficient. Still, as shown in the previous section, its fit with the empirical data was much worse than that of the competing models. In consequence, there were five models with an explicit Gini parameter on our list; other models did not allow for simple reformulation aimed at obtaining the Gini coefficient as the input.
The other parameter of the bifractal model, responsible for the shape of the curve, also has an apparent meaning. However, the bifractal model lacks a theoretical foundation, which may be considered a substantial disadvantage of this approach.
A potential shortcoming of the binormal model is the presence of "hooks", i.e., nonconcave regions that are irrational for the ROC curve. Such a "hook" is a portion of the curve below the 45º diagonal, which makes random guessing in these regions a better option than making decisions based on the ROC. Curves with such "hooks" are referred to as "improper" curves. To address this shortcoming, "proper" ROC curves have been suggested (Chen and Hu 2016;Dorfman et al. 1997;Metz and Pan 1999).
It seems that in practice, in the credit-scoring context, the "improperness" does not constitute a problem. The binormal curve demonstrates no hooks for b = 1, and if b is close to 1, the size of the hook regions is negligible from a practical standpoint. For the empirical data sets from Section 3, the maximum b of the fitted binormal curve was 1.29, and the minimum was 0.79. Visual inspection confirmed that the hooks were not visible (yet they were present; for example, for the fitted binormal curve with b = 1.29 and γ = 0.74, there was a hook region for x < 10⁻⁹).
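The size of the hook region can be checked numerically. The sketch below assumes the standard binormal form y = Φ(a + bΦ⁻¹(x)), with a = √(1 + b²)·Φ⁻¹((γ + 1)/2) so that the Gini coefficient equals γ. Whether this matches Equation (21) exactly is an assumption, and the function names are illustrative. Under this form, the curve meets the diagonal where a + bz = z, with z = Φ⁻¹(x), so the part of the curve below the diagonal lies in the lower tail for b > 1 and in the upper tail for b < 1.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def binormal_intercept(gini, b):
    """Intercept a of the assumed binormal form that yields the given Gini."""
    return (1.0 + b * b) ** 0.5 * _N.inv_cdf((gini + 1.0) / 2.0)


def hook_region(gini, b):
    """Locate the part of the curve below the diagonal.

    Returns None when b == 1 (no crossing), otherwise a pair (side, width):
    side is "left" (region x < width) or "right" (region x > 1 - width).
    """
    if b == 1.0:
        return None
    a = binormal_intercept(gini, b)
    z0 = a / (1.0 - b)            # solves a + b*z = z
    width = _N.cdf(-abs(z0))      # tail mass beyond the crossing point
    return ("left" if b > 1.0 else "right"), width
```

Under these assumptions, `hook_region(0.74, 1.29)` returns a left-hand region whose width is well below 10⁻⁹, which is in line with the observation that such hooks are invisible on a plot. The exact threshold depends on the parameterisation actually used.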
Another argument against the binormal/midnormal curves (as well as against the bibeta and bigamma) is that these models require quite complex mathematical operations (Birdsall 1973, pp. 100-8). Such an argument would support the bilogistic and bifractal models; however, thanks to the availability of specialised computer software and cheap computing power, it is not as important as it probably would have been half a century ago.
Hanley (1988) summarised the arguments in favour of the binormal model. The claims included mathematical tractability and convenience, and theoretical considerations. Empirical results (Swets 1986) have also shown that the model fits the data quite well. Additionally, as shown by Hanley (1988), because of the relative scarcity of medical data, the random noise is much more visible than the deviations driven by the differences between the models. The scarcity of data is a less frequent problem in the credit-scoring domain, but the empirical argument (the fit turns out to be the best for many real-life instances) is even more vital.
Figure 5 shows the results of a visual inspection as proposed by Swets (1986). The plots are "binormal" in the sense that the cumulative bad and good proportions were rescaled according to their standard normal distribution deviates (Φ⁻¹(x) and Φ⁻¹(y)). If the binormal model is adequate, then the empirical data points should gather along a straight line. For most of the empirical curves presented in Section 3, this was true. The data from Tobback and Martens (2019) had the most pronounced deviations (see the irregular shape in Figure 5c), but none of the other ROC curve models could explain this anomaly.
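The visual check described above can be complemented by a simple numerical one. The following sketch transforms empirical ROC points to normal deviates and fits a least-squares line through them. Under the assumed binormal form y = Φ(a + bΦ⁻¹(x)), the intercept and slope estimate a and b, and an R² close to 1 suggests that the points lie near a straight line. The function name and the use of ordinary least squares are illustrative choices, not the procedure used in the paper.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def binormal_line_fit(xs, ys):
    """Fit v = a + b*u to u = Phi^-1(x), v = Phi^-1(y); return (a, b, r2).

    Points with x or y equal to 0 or 1 are skipped, because their normal
    deviates are infinite.
    """
    pts = [(_N.inv_cdf(x), _N.inv_cdf(y))
           for x, y in zip(xs, ys)
           if 0.0 < x < 1.0 and 0.0 < y < 1.0]
    n = len(pts)
    mean_u = sum(u for u, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    s_uu = sum((u - mean_u) ** 2 for u, _ in pts)
    s_vv = sum((v - mean_v) ** 2 for _, v in pts)
    s_uv = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    b = s_uv / s_uu               # slope of the line on the binormal plot
    a = mean_v - b * mean_u       # intercept
    r2 = s_uv ** 2 / (s_uu * s_vv)
    return a, b, r2
```

For points generated from an exact binormal curve, the fit recovers the generating a and b; for empirical data, a visibly lower R² or systematic curvature in the residuals would point to deviations such as those seen in Figure 5c.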

Table 3 brings together the advantages and disadvantages of the ROC curve models discussed in this article.

Table 3. ROC curve models: pros and cons.

| Model | Pros | Cons |
|---|---|---|
| Bigamma/bibeta/simple bibeta | Good fit. "Proper" with … | 4 parameters (2 in case of simple bibeta), no explicit AUROC parameter, complicated implementation (requires beta and gamma distribution functions). |
| Bilogistic | | On average, the worst among the models with more than one parameter, presence of non-concave regions. |
| Bifractal/midfractal | 2 parameters (or 1 in case of midfractal), explicit Gini parameter, interpretable shape parameter, only the simplest mathematical operations needed, monotone in the whole domain. | |
| Binormal/midnormal | 2 parameters (or 1 in case of midnormal), the model may be reformulated to produce an explicit Gini parameter. Mathematical tractability, convenience, good empirical fit in case of credit scoring and in other domains. | Presence of "hooks": non-concave regions of the curve if b ≠ 1. |
| Power | One parameter. | Very poor fit. |

So far, the credit-scoring literature has not devoted much attention to ROC curve models. With the exception of Satchell and Xia (2008), Kürüm et al. (2012), and Kochański (2021), we did not find articles in this area that directly refer to curve models, binormal or others. This study helps to fill that gap and provides credit-risk managers and researchers with a useful set of ROC curve modelling tools. As demonstrated in the literature section, ROC curve models are valuable for credit-risk management. They provide methods for determining confidence intervals and inferring the AUROC from a sample. Thanks to them, it is possible to model the impact of scoring models that have not yet been built. They also allow for a concise description of the curve shape: scorecards with the same AUROC but different curve shapes may be used differently (cutting off the worst customers versus selecting the best of the best). Each of these applications deserves separate research; the results in this paper offer a good starting point for such studies.
In conclusion, the binormal model seems to be the optimal approach to modelling credit-scoring ROC curves. When a one-parameter ROC curve model is needed, the midnormal model (the binormal model with the equal-variances assumption) seems to be the right choice. The binormal model can be reformulated to have the Gini coefficient as a parameter, a feature that is essential from a credit-risk-management perspective. Additionally, the mathematical tractability of the model, as well as convenience and theoretical considerations, provide arguments in favour of this approach. The "improperness" of the binormal model (the presence of nonconcave "hook" regions if the variances are not equal) seems to have little practical importance.