Evaluating the Performances of Biomarkers over a Restricted Domain of High Sensitivity

Abstract: The rapid advances in high-throughput technologies have posed a great challenge to the identification of novel diagnostic biomarkers through bioinformatics-driven analysis with contemporary models and methods. Diagnostic performance metrics such as the indexes based on the partial area under the ROC curve (pAUC) exhibit limitations when analysing genomic data. Among other issues, the inability to differentiate between biomarkers whose ROC curves cross each other with the same pAUC value, the inappropriate treatment of non-concave ROC curves, and the lack of a convenient interpretation restrict their use in practice. Here, we propose the fitted partial area index (FpAUC), which is computable through an algorithm valid for any ROC curve shape, as an alternative performance summary for the evaluation of highly sensitive biomarkers. The proposed approach is based on fitter upper and lower bounds of the pAUC in a high-sensitivity region. Through variance estimates, simulations, and case studies for diagnosing leukaemia, and ovarian and colon cancers, we demonstrate the usefulness of the proposed metric in terms of restoring the interpretation and improving diagnostic accuracy. It is robust and feasible even when the ROC curve shows hooks, and it resolves performance ties between competing biomarkers.


Introduction
Numerous problems in the medical and life sciences that deal with data from high-throughput experiments are focused on the identification of key biomarkers, and on the development of predictive models and medical prognosis systems. In the literature, two of the most intensively used statistical approaches for evaluating and comparing the overall binary diagnostic performance, both of single markers and of scoring functions combining several tests, have been the receiver operating characteristic (ROC) curve and the area under this ROC curve (AUC).
The main goal of a diagnostic (bio)marker or classifier is to discriminate instances with a condition of interest ($D = 1$) from those without it ($D = 0$), such as the presence of a suspected disease from its absence, a positive response to a targeted therapy from a negative one, transcriptional activity of a sequence from inactivity, and faulty modules in software systems from non-faulty ones. A continuous marker, $X$, can be dichotomised into positive and negative instances by choosing one of the marker scores $c$ as a cut-off point, also named the decision threshold. Given that the true status (real diagnosis) of each instance is known, which is named the gold standard, the diagnostic accuracy of a marker is mainly measured by its specificity and sensitivity. The first metric, also named the true negative ratio (TNR), is the probability for a negative instance to be correctly diagnosed as negative. The other one, also called the true positive ratio (TPR), is the probability for a positive instance to be correctly diagnosed as positive. Notice that the false-positive ratio (FPR or 1 − specificity) and TPR (or sensitivity) represent the probability of type-I error and the complementary probability of type-II error, respectively. In addition, both FPR and TPR are functions of the threshold running over the entire range of possible biomarker scores, defined formally as $\mathrm{FPR}(c) = P(X_0 > c)$ and $\mathrm{TPR}(c) = P(X_1 > c)$, where $X_0 = (X \mid D = 0)$ and $X_1 = (X \mid D = 1)$. The accuracy of a classifier is thereby measured by these two probabilities estimated at each diagnostic threshold through the ROC curve. This two-dimensional plot displays the pairs $(\mathrm{FPR}(c), \mathrm{TPR}(c))$ for all the thresholds $c$, and can be written either as $(x, \mathrm{ROC}(x))$ with $\mathrm{ROC}(x) = \mathrm{TPR} \circ \mathrm{FPR}^{-1}(x)$ for $x = \mathrm{FPR}(c) \in [0, 1]$, or analogously as $(\mathrm{ROC}^{-1}(y), y)$ with $\mathrm{ROC}^{-1}(y) = \mathrm{FPR} \circ \mathrm{TPR}^{-1}(y)$ for $y = \mathrm{TPR}(c) \in [0, 1]$.
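The definitions of FPR(c) and TPR(c) can be illustrated with a minimal Python sketch of the empirical ROC points. The function name `roc_points` and the toy scores are hypothetical and only serve the illustration; the strict inequality follows the definitions P(X_0 > c) and P(X_1 > c) given above.

```python
def roc_points(neg_scores, pos_scores):
    """Empirical ROC points (FPR(c), TPR(c)) over all observed cut-offs c.

    FPR(c) is the fraction of negative instances with score > c and
    TPR(c) the fraction of positive instances with score > c.
    A final cut-off of -infinity is added so that the curve ends at (1, 1).
    """
    cutoffs = sorted(set(neg_scores) | set(pos_scores), reverse=True)
    cutoffs.append(float("-inf"))
    points = []
    for c in cutoffs:
        fpr = sum(x > c for x in neg_scores) / len(neg_scores)
        tpr = sum(x > c for x in pos_scores) / len(pos_scores)
        points.append((fpr, tpr))
    return points


# Toy marker scores (illustrative values only).
negatives = [0.1, 0.4, 0.35, 0.8]
positives = [0.9, 0.6, 0.7, 0.3]
curve = roc_points(negatives, positives)
```

Lowering the cut-off can only increase both coordinates, which is the trade-off between sensitivity and specificity described in the text.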
Graphically, it depicts the trade-offs available between both aspects of biomarker diagnostic performance across all the range of possible thresholds. An increase in the sensitivity comes at the expense of a decrease in the specificity and vice versa [1][2][3].
The AUC is commonly used in many ROC-based analyses [4,5] as a single global index or summary metric for evaluating the overall discriminative ability of a predictive and prognostic test to correctly classify instances into one of the two mutually exclusive states of the condition of interest. The empirical AUC is equivalent to the Mann-Whitney U statistic, and its value is commonly interpreted as the probability that an instance randomly drawn among the ones with the condition of interest shows a marker score higher than an instance randomly selected from those without it [6,7]. The ROC curve of a perfect biomarker has AUC = 1; i.e., such a classifier perfectly separates instances with the condition of interest from those without it. Meanwhile, a completely random classifier has an ROC curve lying on the diagonal line (named the chance line), i.e., AUC = 0.5. In this case, the discriminatory ability of the diagnostic test is no better than chance (chance performance). Hence, the AUC varies from 0.5 to 1 for ROC curves reporting better-than-chance performance. The AUC has other convenient interpretations, such as the average sensitivity value over all values of specificity or the average specificity value over all values of sensitivity [2]. This overall evaluation metric depends on neither the cut-off value nor the prevalence of the cases, and thus is invariant under case-control sampling [8].
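The probabilistic interpretation of the empirical AUC can be made concrete with a short Python sketch that counts the pairs in which the positive instance scores higher than the negative one. The function name is hypothetical, and counting ties as one half is the usual Mann-Whitney convention, assumed here.

```python
def empirical_auc(neg_scores, pos_scores):
    """Empirical AUC as the Mann-Whitney proportion of correctly ordered pairs.

    Each (positive, negative) pair contributes 1 if the positive score is
    higher, 0.5 if the scores are tied, and 0 otherwise.
    """
    wins = 0.0
    for x1 in pos_scores:
        for x0 in neg_scores:
            if x1 > x0:
                wins += 1.0
            elif x1 == x0:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy marker scores (illustrative values only): 11 of the 16 pairs are
# correctly ordered, so the empirical AUC is 11/16.
auc_toy = empirical_auc([0.1, 0.4, 0.35, 0.8], [0.9, 0.6, 0.7, 0.3])
auc_perfect = empirical_auc([0.1, 0.2], [0.8, 0.9])
```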
Nevertheless, not all the regions of the ROC curve are of interest and importance in many bioinformatics and medical screening applications [18][19][20][21], since only low FPR or high TPR values are biologically relevant or clinically acceptable. For instance, a high specificity (low FPR) range on the horizontal axis would be demanded for the detection of a rare disease or for cancer screening, in which it is important to "rule in" a disease (e.g., a disease whose treatment implies major side effects); see [22]. On the other hand, a high sensitivity (high TPR) range on the vertical axis would be a priority when it is important to "rule out" a disease (e.g., a disease that is fatal if untreated) [3,23]. Thus, the AUC may not be a meaningful ROC-based metric of diagnostic performance in a pre-specified confined range. In such situations, the partial area under the ROC curve (pAUC) attracts more attention as a diagnostic accuracy metric, since it summarises the portion of the ROC curve over a pre-specified range [23][24][25][26], such as the rule-out (high sensitivity) or rule-in (high specificity) regions [27].
However, the pAUC has been questioned for its lack of a convenient interpretation, since, in contrast to the conventional AUC, a biomarker with locally better-than-chance performance might well yield pAUC values close to 0. In addition, the pAUC has some limitations as a metric of predictive accuracy, for example when comparing two classifiers whose ROC curves cross over the same restricted range and yield equal pAUC values, a problem that remains unsolved [28].
To address such shortcomings, some pAUC indexes have been developed by means of different transformations. Thus, the standardised partial AUC (SpAUC) index provided by McClish [24] is focused on a specificity range $(\mathrm{FPR}_1, \mathrm{FPR}_2)$. Plausible upper and lower bounds of the pAUC are proposed to scale the possible values into the interval (0.5, 1), so that the index can be interpreted appropriately as a measure of diagnostic performance; see also [2,3]. The upper limit of the pAUC for the SpAUC index is the area of the rectangle of unit height and base $(\mathrm{FPR}_2 - \mathrm{FPR}_1)$, corresponding to the pAUC of a perfect classifier ($\mathrm{ROC}(x) = 1$ for all $x \in (0, 1)$) restricted to the specificity range $(\mathrm{FPR}_1, \mathrm{FPR}_2)$. The lower limit is the pAUC between $\mathrm{FPR}_1$ and $\mathrm{FPR}_2$ for a completely non-informative biomarker ($\mathrm{FPR}(c) = \mathrm{TPR}(c)$ for all $c$), given by the area of the trapezoid with vertices $(\mathrm{FPR}_1, 0)$, $(\mathrm{FPR}_2, 0)$, $(\mathrm{FPR}_2, \mathrm{FPR}_2)$ and $(\mathrm{FPR}_1, \mathrm{FPR}_1)$. Furthermore, Ma et al. [5] derived two important properties which facilitate a suitable use of this summary index for proper ROC curves, i.e., ROC curves bounded below by the diagonal line, $\mathrm{ROC}(x) \geq x$. Nevertheless, the SpAUC index still presents some drawbacks that limit its widespread use, since the lower bound used in the SpAUC is not well-defined for improper ROC curves, i.e., ROC curves crossing the diagonal line, which are frequent in practice [29][30][31][32][33]. Moreover, the SpAUC is not able to distinguish between two crossing ROC curves with equal pAUC values in the range of interest. Vivo et al. [26] have provided an alternative pAUC index for any restricted specificity range, named the tighter partial area index (TpAUC), which overcomes such limitations of the SpAUC. Recently, the TpAUC has been implemented in the R/Bioconductor package ROCpAI [34], which also offers functions to estimate the pAUC and the SpAUC, and their respective stabilities given by their confidence intervals using bootstrap resampling.
On the other hand, in order to summarise meaningfully the diagnostic performance of a biomarker over a high sensitivity range, in which ruling out a disease is important, Jiang et al. [23] proposed a dual pAUC index, conceptually similar to the SpAUC, within the true-positive band $(\mathrm{TPR}_0, 1)$. This normalised partial area (NpAUC) index comes from dividing the pAUC by $1 - \mathrm{TPR}_0$, i.e., the area of the rectangle above the pre-selected high sensitivity. The NpAUC can be interpreted as the average value of specificity over all sensitivity values above $\mathrm{TPR}_0$ [2,3,23] and is also valid for use in improper settings. Nevertheless, as the authors mentioned, the values of this NpAUC index might be less than 0.5. Moreover, although this partial area index is a more meaningful summary of diagnostic performance in high sensitivity situations, it still presents some drawbacks for comparing two or more diagnostic performances over the same restricted interval of TPR values, since two portions of ROC curves may differ in their shape but enclose the same pAUC value.
In this work, to tackle such issues, an alternative pAUC index called the fitted partial area index (FpAUC) is proposed to summarise the discriminatory performances of highly sensitive markers. It is designed to satisfy the following characteristics: (a) to be equivalent to the full AUC when $\mathrm{TPR}_0 = 0$ for informative biomarkers, (b) to have a suitable interpretation as a diagnostic performance metric, (c) to be applicable to any ROC curve shape, and (d) to be capable of distinguishing between two or more crossing ROC curves with equal pAUC values. To do that, new upper and lower bounds are derived for the pAUC over the interval $(\mathrm{TPR}_0, 1)$ by adding an important characteristic of the ROC curve and covering all the possible shapes. Through an algorithm, we provide a complete framework for the evaluation of highly sensitive markers in terms of discriminatory capacity by the proposed FpAUC, which overcomes the mentioned drawbacks of the NpAUC. Moreover, our algorithm was implemented in the R programming language [35]. The code of the FpAUC function and the internal functions required for its computation are available in the Supplementary Materials.
The rest of this paper is organised as follows. In Section 2, we derive fitter bounds for the pAUC above a pre-specified sensitivity threshold under flexible assumptions, which do not only assume concave shapes as proper ROC curves; they also extend to any ROC curve shape. From these bounds, the construction of the novel FpAUC index as a more meaningful metric of the pAUC is discussed and provided through a general algorithm to compute it for any ROC curve covering all possible situations in practice. In Section 3, the variance of the FpAUC estimator is derived under the assumption of a binormal ROC model, and the performance of the estimate of the FpAUC index is also assessed via the results of simulation studies to verify the appropriateness of the proposed index. Moreover, applications to genomic datasets involving leukaemia and ovarian and colon cancers are illustrated in Section 4. The paper is completed with a discussion in Section 5.

ROC Partial Area for Highly Sensitive Diagnostic Markers
In this section, lower and upper bounds involved in the standardisation of the pAUC are derived. We show that the boundaries provided here are fitter than those given in [23]. In addition, the pAUC transformed into the FpAUC index can produce more reliable performance estimates for highly sensitive markers with any ROC curve shapes, satisfying the characteristics listed in the introduction.

Fitter Boundaries
We first consider the pAUC restricted to the interval $(\mathrm{TPR}_0, 1) \subseteq (0, 1)$, which is defined as the area that lies above $\mathrm{TPR}_0$ under an ROC curve, mathematically expressed as follows:

$$A_{\mathrm{TPR}_0} = \int_{\mathrm{TPR}_0}^{1} \left[ 1 - \mathrm{FPR}\left(\mathrm{TPR}^{-1}(y)\right) \right] dy, \tag{1}$$

where $\mathrm{TPR}_0$ is a pre-selected sensitivity threshold for a given diagnostic test, $y \in [\mathrm{TPR}_0, 1]$ represents the y-coordinate of an ROC curve, $\mathrm{TPR}^{-1}(y)$ is the decision threshold for the diagnostic classification, and $\mathrm{FPR}(\mathrm{TPR}^{-1}(y))$ corresponds to the x-coordinate of such an ROC curve. The $A_{\mathrm{TPR}_0}$ is bounded by 0 and 1. It is null when the interval is reduced to a point, and becomes identical to the AUC when the interval (0, 1) is considered. At first glance, $A_{\mathrm{TPR}_0}$ is bounded above by the area of the rectangle of the ROC space delimited by the band $(\mathrm{TPR}_0, 1)$ that encompasses it, i.e., the rectangle of side-lengths 1 and $1 - \mathrm{TPR}_0$. Moreover, when the ROC curve is proper in $(\mathrm{TPR}_0, 1)$, the $A_{\mathrm{TPR}_0}$ is bounded below by the area of the triangle with corners $(\mathrm{TPR}_0, \mathrm{TPR}_0)$, $(1, \mathrm{TPR}_0)$, and $(1, 1)$ (see Figure 1a). Therefore, the following boundaries can be established:

$$\frac{1}{2} (1 - \mathrm{TPR}_0)^2 \leq A_{\mathrm{TPR}_0} \leq 1 - \mathrm{TPR}_0. \tag{2}$$

It is easily observed that this lower bound is only applicable for proper ROC curves, which, on the other hand, implies that the AUC is bounded by 0.5 and 1. In addition, notice that the upper bound in (2) was used by Jiang et al. [23] to scale the $A_{\mathrm{TPR}_0}$, providing the NpAUC index, which is a more meaningful and more accurate index in such a high-sensitivity region. This NpAUC index will be further discussed in the next subsection.
Nevertheless, boundaries fitter than those given in (2) can be derived for the $A_{\mathrm{TPR}_0}$. Thus, considering that an ROC curve is determined by the representation of pairs of FPR and TPR, a fitter upper bound can be found by looking at the rectangle with opposite vertices $(\mathrm{FPR}_0, \mathrm{TPR}_0)$ and $(1, 1)$, where $\mathrm{FPR}_0 = \mathrm{FPR}(\mathrm{TPR}^{-1}(\mathrm{TPR}_0))$ (see Figure 1a). Thus, the $A_{\mathrm{TPR}_0}$ is closely bounded above by the area of the rectangle of height $1 - \mathrm{TPR}_0$ and width $1 - \mathrm{FPR}_0$; i.e.,

$$A_{\mathrm{TPR}_0} \leq (1 - \mathrm{TPR}_0)(1 - \mathrm{FPR}_0). \tag{3}$$

On the other hand, a narrower lower bound, also valid for any ROC curve shape, can be established by incorporating the negative likelihood ratio (NLR) of the ROC curve of a classifier into the assumptions. The NLR is defined as the false negative ratio ($1 - \mathrm{TPR}$) over the true negative ratio ($1 - \mathrm{FPR}$), and can be mathematically expressed by $\mathrm{NLR}(x) = (1 - \mathrm{ROC}(x))/(1 - x)$ for each point $(x, \mathrm{ROC}(x))$ on the ROC curve. It provides a diagnostic performance metric of how many times patients with a disease are more (or less) likely to have a negative result than patients without the disease [36]. Furthermore, $\mathrm{NLR}(x)$ represents the slope of the straight line which passes through the point $(x, \mathrm{ROC}(x))$ of the ROC curve and the upper-right corner $(1, 1)$. For concave ROC curves, the NLR is monotone decreasing [5], and consequently, the portion of the curve in the horizontal band $(\mathrm{TPR}_0, 1)$ is above the straight line connecting $(\mathrm{FPR}_0, \mathrm{TPR}_0)$ and $(1, 1)$ (see Figure 1a). By definition, the ROC curve is monotone non-decreasing, but it is not necessarily concave, since it might cross the chance line and/or display a hook showing locally worse-than-chance performance.
Hence, a lower boundary of the $A_{\mathrm{TPR}_0}$ can be provided when the NLR of the ROC curve is bounded above by its value at the lower extreme of the high sensitivity band $(\mathrm{TPR}_0, 1)$, i.e., $\mathrm{NLR}(x) \leq \mathrm{NLR}_0$ for $x \geq \mathrm{FPR}_0$ with $\mathrm{NLR}_0 = \mathrm{NLR}(\mathrm{FPR}_0)$, which will be called partially bounded NLR. Thereby, a fitter lower bound for the $A_{\mathrm{TPR}_0}$ can be found by looking at the triangle with corners $(\mathrm{FPR}_0, \mathrm{TPR}_0)$, $(1, \mathrm{TPR}_0)$, and $(1, 1)$ (Figure 1a):

$$\frac{1}{2} (1 - \mathrm{TPR}_0)(1 - \mathrm{FPR}_0) \leq A_{\mathrm{TPR}_0} \leq (1 - \mathrm{TPR}_0)(1 - \mathrm{FPR}_0). \tag{4}$$

As is shown in Figure 1b, an ROC curve can be partially proper in $(\mathrm{TPR}_0, 1)$. It is not concave over the entire high sensitivity range and dips below the line with slope $\mathrm{NLR}_0$, but it does not cross the chance line, which becomes a lower limit of the ROC curve in $(\mathrm{TPR}_0, 1)$. Thus, if there exists at least one $x \geq \mathrm{FPR}_0$ such that $\mathrm{NLR}(x) > \mathrm{NLR}_0$, and $\mathrm{NLR}(x) \leq 1$ for all $x \geq \mathrm{FPR}_0$, the pAUC is bounded below by the area of the triangle with vertices $(\mathrm{TPR}_0, \mathrm{TPR}_0)$, $(1, \mathrm{TPR}_0)$, and $(1, 1)$ (see Figure 1b):

$$\frac{1}{2} (1 - \mathrm{TPR}_0)^2 \leq A_{\mathrm{TPR}_0} \leq (1 - \mathrm{TPR}_0)(1 - \mathrm{FPR}_0). \tag{5}$$

Finally, an ROC curve might cross the chance line, having a hook at the upper-right corner (see Figure 1c), which corresponds to degrees of discriminatory accuracy worse than that of chance alone [37,38]. Thus, if there exists at least one $x \geq \mathrm{FPR}_0$ such that $\mathrm{NLR}(x) > \max\{1, \mathrm{NLR}_0\}$, then a positive lower bound of the pAUC cannot be found:

$$0 \leq A_{\mathrm{TPR}_0} \leq (1 - \mathrm{TPR}_0)(1 - \mathrm{FPR}_0). \tag{6}$$

Therefore, the pAUC above a pre-selected sensitivity threshold $\mathrm{TPR}_0$ of any diagnostic test can be classified into one of these three types based on the partial boundary of its NLR, providing fitter bounds to be used for building the FpAUC index.

The Fitted Partial Area Index: FpAUC
In order to summarise the diagnostic performance in the horizontal band $(\mathrm{TPR}_0, 1)$, the pAUC in (1) might be directly scaled by dividing it by the upper bound given in (2), which is the length of the high sensitivity interval. Thereby, Jiang et al. [23] introduced the NpAUC for highly sensitive diagnostic tests, which is mathematically expressed as follows:

$$\mathrm{NpAUC}_{\mathrm{TPR}_0} = \frac{A_{\mathrm{TPR}_0}}{1 - \mathrm{TPR}_0}. \tag{7}$$

This normalisation satisfies the first two characteristics mentioned in the introduction. As is easy to see from (7), the NpAUC becomes identical to the entire area when $\mathrm{TPR}_0 = 0$. It may be interpreted as the average specificity value of the diagnostic marker over all values of TPR between $\mathrm{TPR}_0$ and 1 when such a marker is used in the high sensitivity range of practical interest. However, despite the fact that the value of the NpAUC is bounded above by 1, its lower bound $(1 - \mathrm{TPR}_0)/2$ is less than 0.5, so the NpAUC takes values below 0.5 for any classifier whose pAUC is less than half the area of the horizontal band $(\mathrm{TPR}_0, 1)$. Furthermore, the NpAUC index might still compare diagnostic performances poorly when two ROC curves cross each other over the same high sensitivity range, inasmuch as two portions of ROC curves may differ in shape but encompass the same pAUC value, reporting the same NpAUC value.
For illustrative purposes, let us suppose a clinical task demanding a high sensitivity, $\mathrm{TPR}_0 = 0.8$, such as the discovery of new biomarkers for the detection of breast cancer in vast clinical samples. Amongst some diagnostic classifiers, there are two suitable candidates with the same performance for that sensitivity threshold, i.e., with the same pAUC value, $A_{0.8} = 0.142298$. Furthermore, their respective performances are described by ROC curves that cross the minimum sensitivity level at $\mathrm{FPR}_0 = 0.1233548$ and $\mathrm{FPR}_0 = 0.2362306$, respectively. Figure 2 displays these ROC curves for highly sensitive diagnostic tests, from among others with the same pAUC above the pre-selected sensitivity threshold $\mathrm{TPR}_0 = 0.8$, which correspond to the conventional binormal model with the following binormal parameters: $a = 2$ and $b = 1$ for $\mathrm{ROC}_1$; and $a = 3.4070515591$ and $b = 3.5706342338$ for $\mathrm{ROC}_2$. The NpAUC index provides the same value for both ROC curves, $A_{0.8}/(1 - 0.8) = 0.711491$. Thus, it is not appropriate for classifier comparison in such scenarios, since it is not sensitive enough to determine the best-performing diagnostic test. Clearly, a new partial area index is necessary to assist in the identification of key biomarkers for biomedical decision making.

One alternative for measuring the discriminatory performance of highly sensitive classifiers is to use a novel pAUC index in the interval of practical interest. To do that, we propose the following transformation of the $A_{\mathrm{TPR}_0}$:

$$A^{*}_{\mathrm{TPR}_0} = \frac{1}{2} \left[ 1 + \frac{A_{\mathrm{TPR}_0} - \min A_{\mathrm{TPR}_0}}{\max A_{\mathrm{TPR}_0} - \min A_{\mathrm{TPR}_0}} \right], \tag{8}$$

where $\min A_{\mathrm{TPR}_0}$ and $\max A_{\mathrm{TPR}_0}$ are the fitter lower and upper bounds of $A_{\mathrm{TPR}_0}$, respectively. The FpAUC index given by (8) reaches its minimum value of 1/2 when $A_{\mathrm{TPR}_0} = \min A_{\mathrm{TPR}_0}$, and its maximum value of 1 when $A_{\mathrm{TPR}_0} = \max A_{\mathrm{TPR}_0}$.
Furthermore, the characteristics mentioned in the introduction are satisfied, since the index becomes identical to the entire area when $\mathrm{TPR}_0 = 0$ for classifiers with better-than-chance performance, and can be interpreted as an average specificity value of a diagnostic test over all sensitivity values above $\mathrm{TPR}_0$. In addition, the proposed FpAUC index is mathematically well motivated and defined for any ROC curve shape from the corresponding fitter bounds derived in Section 2.1. Hence, Algorithm 1 is provided to compute the FpAUC value above the high sensitivity threshold $\mathrm{TPR}_0$ according to the ROC curve shape with respect to its NLR over $(\mathrm{TPR}_0, 1)$.

Algorithm 1 Computing the FpAUC value.
1: Set the high sensitivity threshold TPR 0 .
It is also worth remarking that the novel FpAUC index involves both aspects of diagnostic performance represented in the restricted portion of an ROC curve, FPR and TPR.
Regarding the capacity to distinguish between two or more classifiers, the FpAUC index is more sensitive than the NpAUC. In practice, two ROC curves might cross at a point where the sensitivity is higher than $\mathrm{TPR}_0$, as shown in Figure 2, but they could encompass the same pAUC in the horizontal band $(\mathrm{TPR}_0, 1)$. As in the above example, the two curves cannot be compared above the pre-specified sensitivity threshold 0.8 by means of the NpAUC index, but they can be compared with the FpAUC index, because it emphasises the performance differences between highly sensitive diagnostic tests.
In particular, for the above case study, the pAUC transformed to $A^{*}_{0.8}$ reaches the value of 0.811606 for $\mathrm{ROC}_1$, and the value is $A^{*}_{0.8} = 0.931551$ for $\mathrm{ROC}_2$. In other words, the proposed FpAUC index allows us to unambiguously compare both diagnostic performances from the same sensitivity threshold. Thus, it might help with choosing the best tests for biomedical decision making, since the diagnostic test evaluated by the second ROC curve has more relevance when a minimum sensitivity $\mathrm{TPR}_0 = 0.8$ is clinically demanded; i.e., its average specificity value is higher than that of the other test in this region.
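The three types of bounds in Section 2.1 and the transformation (8) can be sketched in Python for binormal ROC curves, reproducing the two values of the case study. This is a minimal sketch under stated assumptions, not the authors' R implementation: the function names are hypothetical, the pAUC is obtained by Simpson's rule on the integral over t = Φ^{-1}(y), and the NLR condition is checked on a finite grid rather than analytically.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def upper_tail(z):
    """1 - Phi(z), computed with erfc so that it stays accurate far in the tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def binormal_pauc(a, b, tpr0, t_max=10.0, n=4000):
    """pAUC over the band (tpr0, 1) for ROC(x) = Phi(a + b * Phi^{-1}(x)).

    Integrates Phi((a - t) / b) * phi(t) for t >= Phi^{-1}(tpr0) with
    Simpson's rule (n even); the tail beyond t_max is treated as negligible.
    """
    c = STD.inv_cdf(tpr0)
    h = (t_max - c) / n
    f = lambda t: STD.cdf((a - t) / b) * STD.pdf(t)
    total = f(c) + f(t_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(c + i * h)
    return total * h / 3.0


def fpauc_binormal(a, b, tpr0, grid=2000, z_max=8.0):
    """FpAUC for a binormal ROC curve, choosing the lower bound by NLR type."""
    z0 = (STD.inv_cdf(tpr0) - a) / b          # Phi^{-1}(FPR0)
    fpr0 = STD.cdf(z0)
    area = binormal_pauc(a, b, tpr0)
    upper = (1 - tpr0) * (1 - fpr0)           # fitter upper bound

    def nlr(z):
        # NLR at x = Phi(z): (1 - ROC(x)) / (1 - x)
        return upper_tail(a + b * z) / upper_tail(z)

    nlr0 = nlr(z0)
    nlr_max = max(nlr(z0 + (z_max - z0) * i / grid) for i in range(grid + 1))
    if nlr_max <= nlr0 + 1e-12:               # partially bounded NLR
        lower = upper / 2
    elif nlr_max <= 1 + 1e-12:                # above the chance line
        lower = (1 - tpr0) ** 2 / 2
    else:                                     # hook: no positive lower bound
        lower = 0.0
    return 0.5 * (1 + (area - lower) / (upper - lower))


fpauc_roc1 = fpauc_binormal(2.0, 1.0, 0.8)
fpauc_roc2 = fpauc_binormal(3.4070515591, 3.5706342338, 0.8)
```

Both curves fall into the first type here, for which (8) reduces to the pAUC divided by the fitter upper bound, e.g. 0.142298 / (0.2 × 0.8766452) ≈ 0.8116 for the first curve.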
In Figure 2, it is clearly shown that $\mathrm{ROC}_1$ has a higher AUC value than $\mathrm{ROC}_2$ does, namely, $\mathrm{AUC}_{\mathrm{ROC}_1} = 0.921350$ and $\mathrm{AUC}_{\mathrm{ROC}_2} = 0.820908$, which might drive one to discard the classifier $\mathrm{ROC}_2$. However, the reported FpAUC values reveal that the classifier $\mathrm{ROC}_2$ performs much better than $\mathrm{ROC}_1$ in the horizontal band (0.8, 1).
Similarly to [39] for pAUC, it was found that the FpAUC as a metric is different from AUC, because a classifier with a higher value of AUC does not necessarily lead to a higher value of FpAUC.

Performance of the Estimate of the FpAUC Index
The performance of the FpAUC index was assessed under the assumption of binormal ROC models, by the variance of the FpAUC estimator, and also through simulation studies in all plausible scenarios. The binormal model plays an important role in the signal detection theory for continuous classifiers [1], and is one of the most popular parametric models in ROC-based analyses, since it is obtained from the normality assumption of both groups, diseased or healthy subjects, or a monotone transformation of them [5,40]. In addition, the binormal ROC model involves a wide variety of possible curve shapes, proper curves, and improper curves crossing the chance line either at the upper-right corner or at the lower-left corner, which enables us to describe the performance in different situations.

Variance of the FpAUC Estimator
The variance of the FpAUC estimator in the horizontal band (TPR 0 , 1) can be obtained by using the first-order Taylor series approximation under the assumption of a parametric model, also known as the delta method (see among others [5,23,24,26]), or by using nonparametric resampling methods such as bootstrapping, an application of which to publicly available datasets is shown in the next section.
As mentioned above, the two-parameter binormal model is assumed in order to analyse the stochastic behaviour of the FpAUC estimate and its variance. Concretely, the binormal model is derived from the assumption that the classifier scores are normally distributed in the group of healthy subjects, $X_0 \sim N(\mu_0, \sigma_0^2)$, and in the group of diseased subjects, $X_1 \sim N(\mu_1, \sigma_1^2)$. Thus, the ROC curve for normally distributed test scores can be written as $\mathrm{ROC}(x) = \Phi(a + b\,\Phi^{-1}(x))$, where $\Phi$ represents the standard normal cumulative distribution function, $a = (\mu_1 - \mu_0)/\sigma_1$ and $b = \sigma_0/\sigma_1$. Analogously, it can also be expressed by $\mathrm{ROC}^{-1}(y) = \Phi\left((\Phi^{-1}(y) - a)/b\right)$. It is named the binormal ROC curve, having parameters $a$ and $b$, and without loss of generality, it can be assumed that $\mu_0 = 0$, $\sigma_0^2 = 1$ and $\mu_1 \geq 0$ due to the invariance of the ROC curve under strictly increasing transformations of the classifier.
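As a quick check of these two expressions, the following Python sketch evaluates the binormal ROC curve and its inverse, and recovers the FPR values at which the two curves of the case study in Section 2.2 cross the sensitivity level 0.8. The function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def binormal_roc(x, a, b):
    """TPR at a given FPR x in (0, 1): ROC(x) = Phi(a + b * Phi^{-1}(x))."""
    return STD.cdf(a + b * STD.inv_cdf(x))


def binormal_roc_inv(y, a, b):
    """FPR at a given TPR y in (0, 1): ROC^{-1}(y) = Phi((Phi^{-1}(y) - a) / b)."""
    return STD.cdf((STD.inv_cdf(y) - a) / b)


# FPR0 for the two binormal curves of the case study at TPR0 = 0.8.
fpr0_roc1 = binormal_roc_inv(0.8, 2.0, 1.0)
fpr0_roc2 = binormal_roc_inv(0.8, 3.4070515591, 3.5706342338)
```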
Under this theoretical framework, the pAUC above a pre-specified sensitivity threshold $\mathrm{TPR}_0$ given in (1) can be rewritten as

$$A_{\mathrm{TPR}_0} = \int_{\mathrm{TPR}_0}^{1} \left[ 1 - \Phi\left( \frac{\Phi^{-1}(y) - a}{b} \right) \right] dy = \int_{\Phi^{-1}(\mathrm{TPR}_0)}^{\infty} \Phi\left( \frac{a - t}{b} \right) \phi(t)\, dt,$$

where $\phi$ is the standard normal density and the last equality follows by substituting $t = \Phi^{-1}(y)$. Moreover, the former equation can be expressed, as (12), in terms of $\phi_B$ and $\Phi_B$, the density and cumulative distribution functions of the standard bivariate normal model with correlation coefficient $\rho$, respectively. An equivalent expression was used in [23]. Therefore, for an admissible minimum $\mathrm{TPR}_0$, the partial area estimate $\hat{A}_{\mathrm{TPR}_0}$ can be computed through (12) using the maximum likelihood estimates of the binormal ROC curve parameters, $\hat{a}$ and $\hat{b}$, as functions of the estimated means and variances for the healthy and disease groups, $\hat{a} = (\hat{\mu}_1 - \hat{\mu}_0)/\hat{\sigma}_1$ and $\hat{b} = \hat{\sigma}_0/\hat{\sigma}_1$ (see among others [2,3]). Hence, the FpAUC can be estimated from $\hat{A}_{\mathrm{TPR}_0}$ by using the expressions (9)-(11) given in Algorithm 1 according to the ROC curve shape with respect to its NLR in the horizontal band $(\mathrm{TPR}_0, 1)$.
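For any high sensitivity range $(\mathrm{TPR}_0, 1)$, the variance of the FpAUC estimate for the fitted binormal ROC curve can be approximated by using the delta method as follows:

$$\mathrm{Var}\left(\hat{A}^{*}_{\mathrm{TPR}_0}\right) \approx \left( \frac{\partial A^{*}_{\mathrm{TPR}_0}}{\partial a} \right)^{2} \mathrm{Var}(\hat{a}) + \left( \frac{\partial A^{*}_{\mathrm{TPR}_0}}{\partial b} \right)^{2} \mathrm{Var}(\hat{b}) + 2\, \frac{\partial A^{*}_{\mathrm{TPR}_0}}{\partial a} \frac{\partial A^{*}_{\mathrm{TPR}_0}}{\partial b}\, \mathrm{Cov}(\hat{a}, \hat{b}), \tag{13}$$

where $\mathrm{Var}(\hat{a})$, $\mathrm{Var}(\hat{b})$ and $\mathrm{Cov}(\hat{a}, \hat{b})$ are the asymptotic variances and covariance of the parameter estimates, which depend on $a$, $b$, and on $n_0$ and $n_1$, the sample sizes of the healthy and disease groups, respectively (e.g., see [5]). Therefore, the partial derivatives of the FpAUC estimator with respect to $a$ and $b$ are required to compute the variance (13) in the three cases, (9)-(11), established by Algorithm 1, which are represented for some particular binormal ROC models in Figure 3. To do that, the partial derivatives of the pAUC given in (12) with respect to $a$ and $b$ are also needed.

In the first case, when the binormal ROC curve has partially bounded NLR in the horizontal band $(\mathrm{TPR}_0, 1)$, i.e., $\mathrm{NLR}(x) \leq \mathrm{NLR}(\mathrm{FPR}_0)$ for all $x \geq \mathrm{FPR}_0$, the FpAUC estimator can be written in terms of the parameters $a$ and $b$, by substituting (12) into (9), as

$$A^{*}_{\mathrm{TPR}_0}(a, b) = \frac{A_{\mathrm{TPR}_0}(a, b)}{(1 - \mathrm{TPR}_0)\, \Phi(g_0(a, b))}, \tag{14}$$

where $g_0(a, b) = \left(a - \Phi^{-1}(\mathrm{TPR}_0)\right)/b$, so that $\Phi(g_0(a, b)) = 1 - \mathrm{FPR}_0$. Its partial derivatives with respect to $a$ and $b$ follow from those of (12) by the quotient rule, which enables us to compute the variance (13).

In the second case, the binormal ROC curve does not have a partially bounded NLR, but it is above the chance line, i.e., $\mathrm{NLR}(x) \leq 1$ for all $x \geq \mathrm{FPR}_0$, and there exists at least one $x > \mathrm{FPR}_0$ such that $\mathrm{NLR}(x) > \mathrm{NLR}_0$. Here, the FpAUC estimator can be written in terms of the parameters $a$ and $b$, by substituting (12) into (10), as

$$A^{*}_{\mathrm{TPR}_0}(a, b) = \frac{1}{2} \left[ 1 + \frac{A_{\mathrm{TPR}_0}(a, b) - \frac{1}{2}(1 - \mathrm{TPR}_0)^2}{(1 - \mathrm{TPR}_0)\, \Phi(g_0(a, b)) - \frac{1}{2}(1 - \mathrm{TPR}_0)^2} \right], \tag{15}$$

whose partial derivatives with respect to the parameters $a$ and $b$ are obtained in the same way.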
Finally, in the third case, the NLR of the binormal ROC curve cannot be upper bounded in the horizontal band $(\mathrm{TPR}_0, 1)$, and thus, by substituting (12) into (11), the FpAUC estimator can be written in terms of $a$ and $b$ as

$$A^{*}_{\mathrm{TPR}_0}(a, b) = \frac{1}{2} \left[ 1 + \frac{A_{\mathrm{TPR}_0}(a, b)}{(1 - \mathrm{TPR}_0)\, \Phi(g_0(a, b))} \right], \tag{16}$$

and then its variance can be computed from (13) by using the corresponding partial derivatives with respect to the parameters $a$ and $b$.
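The delta-method approximation can be sketched numerically for the first case. The Python code below is a hedged illustration, not the authors' derivation: the partial derivatives are approximated by central finite differences instead of closed-form expressions, the pAUC is integrated by Simpson's rule, and the variance and covariance expressions for the estimates of a and b are the usual large-sample results for two independent normal samples, which are assumed here. All function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()


def binormal_pauc(a, b, tpr0, t_max=10.0, n=2000):
    """pAUC over (tpr0, 1): Simpson's rule on Phi((a - t)/b) * phi(t), t >= Phi^{-1}(tpr0)."""
    c = STD.inv_cdf(tpr0)
    h = (t_max - c) / n
    f = lambda t: STD.cdf((a - t) / b) * STD.pdf(t)
    total = f(c) + f(t_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(c + i * h)
    return total * h / 3.0


def fpauc_case1(a, b, tpr0):
    """First-case FpAUC: pAUC divided by (1 - TPR0) * Phi(g0), g0 = (a - Phi^{-1}(TPR0)) / b."""
    g0 = (a - STD.inv_cdf(tpr0)) / b
    return binormal_pauc(a, b, tpr0) / ((1 - tpr0) * STD.cdf(g0))


def delta_variance(a, b, tpr0, n0, n1, h=1e-5):
    """Delta-method variance of the first-case FpAUC estimate."""
    # Central finite differences for the two partial derivatives.
    d_a = (fpauc_case1(a + h, b, tpr0) - fpauc_case1(a - h, b, tpr0)) / (2 * h)
    d_b = (fpauc_case1(a, b + h, tpr0) - fpauc_case1(a, b - h, tpr0)) / (2 * h)
    # Assumed large-sample (co)variances of the binormal parameter estimates.
    var_a = (n0 * (a * a + 2) + 2 * n1 * b * b) / (2 * n0 * n1)
    var_b = (n0 + n1) * b * b / (2 * n0 * n1)
    cov_ab = a * b / (2 * n1)
    return d_a * d_a * var_a + d_b * d_b * var_b + 2 * d_a * d_b * cov_ab


var_n50 = delta_variance(2.0, 1.0, 0.8, 50, 50)
var_n500 = delta_variance(2.0, 1.0, 0.8, 500, 500)
```

With equal group sizes, every term is inversely proportional to the sample size, so the approximate variance for n_0 = n_1 = 500 is one tenth of that for n_0 = n_1 = 50.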
In order to illustrate the stochastic behaviour of the FpAUC estimate and its variance, Figure 3 displays examples of binormal ROC models, including each one of possible curve shapes: concave ROC curves for b = 1 (Figure 3d-f), improper ROC curves crossing the chance line in the upper-right corner for b = 0.5 < 1 (Figure 3a-c), and improper ROC curves crossing the chance line in the lower-left corner for b = 2 > 1 (Figure 3g-i).
For each value of $b$, five binormal ROC curves with AUC values of 0.55, 0.65, 0.75, 0.85, and 0.95 were considered, and consequently, the parameter $a = \sqrt{1 + b^2}\, \Phi^{-1}(\mathrm{AUC})$ [10]. The three graphics in the left column (Figure 3a,d,g) depict the behaviour of the FpAUC estimates (14)-(16) as a function of the high sensitivity threshold $\mathrm{TPR}_0$. As is shown in Figure 3g for $b > 1$, the binormal ROC curves have a hook at the beginning, causing a change in the boundary of the NLR above $\mathrm{TPR}_0$, whereas this is not the case for $b \leq 1$. The remaining six graphics in the central and right columns display the behaviour of the variances of the FpAUC as functions of $\mathrm{TPR}_0$. Obviously, (13) depends on the sample sizes assumed for the healthy and disease groups, $n_0$ and $n_1$, respectively. Thus, we have considered two different settings. The central column shows Figure 3b,e,h for $n_0 = n_1 = 50$, and the right column corresponds to Figure 3c,f,i for $n_0 = n_1 = 500$. In general, all variance estimates suggest relatively good accuracy of the FpAUC index, since they are very small and tend to 0 as the high sensitivity range increases. In particular, this behaviour is also shown for $b > 1$ in Figure 3h,i, although the hook at the beginning produces a discontinuity point due to the change of the NLR boundary.
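The relation between a, b and the AUC of a binormal curve, AUC = Φ(a / sqrt(1 + b^2)), can be checked with a few lines of Python. The function names are hypothetical; the two parameter pairs are those of the curves in the case study of Section 2.2.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def binormal_auc(a, b):
    """Full AUC of the binormal ROC curve with parameters a and b."""
    return STD.cdf(a / math.sqrt(1.0 + b * b))


def binormal_a_from_auc(auc, b):
    """Parameter a that yields a target AUC (in (0, 1)) for a given b."""
    return math.sqrt(1.0 + b * b) * STD.inv_cdf(auc)


auc_roc1 = binormal_auc(2.0, 1.0)
auc_roc2 = binormal_auc(3.4070515591, 3.5706342338)
# Values of a for AUC = 0.85 under the three shapes b = 0.5, 1, 2.
a_values = {b: binormal_a_from_auc(0.85, b) for b in (0.5, 1.0, 2.0)}
```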

Simulation Studies
Through a set of simulation studies, the performance of the FpAUC estimates was assessed in terms of biases, standard deviations, and percentile confidence intervals (CI), proving the operating properties of the proposed FpAUC index, such as its robustness and feasibility, even when the fitted ROC curve has hooks and/or crosses the chance line over a high sensitivity range.
Similarly to the simulation studies in [5,26], test scores for both healthy ($X_0$) and diseased ($X_1$) subjects were generated from normal distributions with parameters set appropriately to obtain binormal ROC curves with AUC = 0.55, 0.65, 0.75, 0.85, and 0.95, and $b$ = 0.5, 1, 2, and 3. Such settings for the parameter $b$ allowed us to analyse the different shapes of the underlying binormal ROC curve: concave ROC curves ($b = 1$) and improper ROC curves ($b \neq 1$), including curves crossing the chance line with a hook at the upper-right corner ($b < 1$) and with a hook at the lower-left corner ($b > 1$).
Within each one of the simulation scenarios, empirical means and standard deviations were computed from the 10,000 estimations of the FpAUC index, for five high sensitivity thresholds, $\mathrm{TPR}_0$ = 0.9, 0.8, 0.7, 0.6, and 0.5. Biases of the FpAUC estimates are also reported. Furthermore, the percentile method was applied to construct the 95% CI for the FpAUC value, by trimming 2.5% from each tail of the 10,000 estimations.
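The structure of one simulation scenario can be sketched in Python as follows. This is a simplified illustration under stated assumptions: healthy scores are drawn from N(0, 1) and diseased scores from N(a/b, 1/b^2), the group sizes n0 = n1 = 50 are an arbitrary choice, and the plug-in binormal AUC is used as a stand-in statistic so that the block stays short; an FpAUC function of the estimated parameters could be passed instead. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def simulate_estimates(a, b, n0, n1, reps, statistic, seed=0):
    """Draw `reps` binormal samples and return statistic(a_hat, b_hat) for each."""
    rng = random.Random(seed)
    mu1, sigma1 = a / b, 1.0 / b          # from mu0 = 0, sigma0 = 1
    estimates = []
    for _ in range(reps):
        x0 = [rng.gauss(0.0, 1.0) for _ in range(n0)]
        x1 = [rng.gauss(mu1, sigma1) for _ in range(n1)]
        s1 = pstdev(x1)                   # maximum likelihood standard deviation
        a_hat = (fmean(x1) - fmean(x0)) / s1
        b_hat = pstdev(x0) / s1
        estimates.append(statistic(a_hat, b_hat))
    return estimates


def summarise(estimates, true_value, level=0.95):
    """Mean, bias, standard deviation and percentile interval of the estimates."""
    ordered = sorted(estimates)
    tail = (1.0 - level) / 2.0
    lo = ordered[math.floor(tail * (len(ordered) - 1))]
    hi = ordered[math.ceil((1.0 - tail) * (len(ordered) - 1))]
    mean = fmean(ordered)
    return {"mean": mean, "bias": mean - true_value,
            "sd": pstdev(ordered), "ci": (lo, hi)}


def plug_in_auc(a_hat, b_hat):
    """Stand-in statistic: binormal AUC evaluated at the estimated parameters."""
    return NormalDist().cdf(a_hat / math.sqrt(1.0 + b_hat * b_hat))


true_auc, b = 0.85, 1.0
a = math.sqrt(1.0 + b * b) * NormalDist().inv_cdf(true_auc)
summary = summarise(simulate_estimates(a, b, 50, 50, 400, plug_in_auc, seed=1), true_auc)
```

Only 400 replicates are used here to keep the run short; the study in the text uses 10,000 per scenario.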
For the sake of brevity, Table 1 displays the results corresponding to AUC values equal to 0.75, 0.85 and 0.95 and b values equal to 0.5, 1, and 2, which were obtained from the simulation study for n = 100. The simulation results for n = 1000 can be found in Appendix A Table A1. Full tables are available in the Supplementary Materials.
For both sample sizes, n = 100 and n = 1000, the simulation results displayed in Tables 1 and A1 agree with the ones depicted in Figure 3. For all the 10,000 simulated random samples in each setting, the FpAUC index was always applicable, including the scenarios in which the fitted ROC curves had hooks and crossed the chance line. In general, the stochastic behaviour of the FpAUC estimates over each high sensitivity range was similar for both sample sizes. The biases of the FpAUC estimates remained relatively stable and smaller than the standard deviations. For the fitted ROC curves with high global accuracy (AUC ≥ 0.85), standard deviations and widths of the 95% CIs tended to decrease as the sensitivity threshold decreased; i.e., the precision of the FpAUC index increased as the high sensitivity range increased. However, for fitted curves with poor global accuracy (AUC ≤ 0.65), standard deviations and widths of the 95% CIs slightly increased as the sensitivity range increased, although remaining relatively small; i.e., the precision of the FpAUC index smoothly decreased as the sensitivity threshold decreased. In summary, the simulation studies showed reliable behaviour of the FpAUC index, making it a relatively accurate metric with which to evaluate diagnostic performance over a high sensitivity interval.

Table 1. Simulation results from 10,000 random samples with size n = 100 for each binormal ROC model. The first two columns correspond to the settings of each scenario, which were used to compute the FpAUC estimates for each $\mathrm{TPR}_0$, and to summarise its mean, bias, standard deviation, and 95% CI.

Applications to Genomic Data
To further examine the performance of the proposed FpAUC index for highly sensitive diagnostic tests, we analysed three experimental genomic datasets that are publicly available.
Before listing the results obtained, brief descriptions of the high-dimensional datasets are given next.

Ovarian Cancer Data
This dataset concerns the search for biomarkers of ovarian cancer in population screening [41] and is available at http://research.fhcrc.org/diagnostic-biomarkers-center (accessed on 20 February 2021). It consists of mRNA expression levels, measured with spotted glass arrays for 1536 gene clones, on 53 ovarian tissue samples: 23 healthy controls and 30 cases with ovarian cancer.

Acute Leukaemia Data
The leukaemia dataset was originally studied to show how gene expression monitored by DNA microarrays can support the diagnosis of two leukaemia types [42]: acute lymphoblastic leukaemia (ALL) and acute myeloid leukaemia (AML). The dataset consists of 72 patients (45 ALL, 27 AML) profiled on early Affymetrix Hgu6800 chips for 7129 gene expressions (Affymetrix probes). The dataset is available in the Bioconductor package "golubEsets" [43], and the genes were labelled by using the Bioconductor annotation package "hu6800" [44].
Furthermore, 70,803 (1.24%) out of 5,730,981 pairs of ROC curves reported the same pAUC over the high sensitivity range (0.9, 1). As an example, the genes U57721_at and X07743_at were chosen to illustrate the usefulness of the proposed FpAUC index (Figure 4b).

Colon Cancer Data
This colon cancer dataset consists of the expression levels of 2000 genes from 62 tissue samples (40 colon cancer and 22 normal tissues) analysed with an Affymetrix oligonucleotide Hum6000 array [46]. This dataset is publicly available in the R package "plsgenomics" [47].

Experimental Results
The nonparametric bootstrap resampling method [48] was applied to estimate the bias and standard deviation of the empirical FpAUC and its 95% bootstrap CI. These statistics were computed using 10,000 bootstrap replicates for TPR 0 = 0.5, 0.6, 0.7, 0.8, and 0.9.
For the two genes chosen from each dataset, Table 2 displays the FpAUC estimates over the high sensitivity range (TPR 0 , 1), along with biases, standard deviations, and the 95% CIs generated by bootstrap resampling. The calculation was carried out by using the R package "boot" [49]. Notice that the empirical ROC curves were not smooth and presented hooks in the middle (Figure 4), which might explain the slight jumps in the estimates, since the NLR boundary changes as the horizontal band (TPR 0 , 1) varies.
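The bootstrap procedure described above can be illustrated with a minimal Python sketch. Several points are assumptions rather than facts from the study: the resampling is done separately within controls and cases (stratified), the empirical AUC is used only as a simple stand-in for the FpAUC estimator, the data are synthetic, and all function names are hypothetical. It does not reproduce the authors' R computation with "boot".

```python
import random
import statistics


def empirical_auc(controls, cases):
    """Mann-Whitney estimate of P(X1 > X0) + 0.5 * P(X1 = X0)."""
    wins = sum((x1 > x0) + 0.5 * (x1 == x0) for x1 in cases for x0 in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_summary(controls, cases, statistic, n_boot=2000, level=0.95, seed=0):
    """Bootstrap bias, SD and percentile CI of statistic(controls, cases)."""
    rng = random.Random(seed)
    observed = statistic(controls, cases)
    replicates = []
    for _ in range(n_boot):
        # Assumption: resample with replacement within each group (stratified).
        boot_controls = rng.choices(controls, k=len(controls))
        boot_cases = rng.choices(cases, k=len(cases))
        replicates.append(statistic(boot_controls, boot_cases))
    replicates.sort()
    alpha = (1.0 - level) / 2.0
    lower = replicates[int(alpha * n_boot)]
    upper = replicates[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    return {
        "estimate": observed,
        "bias": statistics.fmean(replicates) - observed,
        "sd": statistics.stdev(replicates),
        "ci": (lower, upper),
    }


# Synthetic scores with the ovarian dataset's group sizes (23 controls, 30 cases).
rng = random.Random(1)
controls = [rng.gauss(0.0, 1.0) for _ in range(23)]
cases = [rng.gauss(1.0, 1.0) for _ in range(30)]
print(bootstrap_summary(controls, cases, empirical_auc))
```

Any function of the two score samples can be passed as `statistic`, so an implementation of the FpAUC over (TPR 0 , 1) could replace `empirical_auc` without changing the resampling code.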
In general, biases of the FpAUC estimates remained relatively stable and small for the chosen genes in the three datasets. For the ovarian cancer dataset, both the standard deviation and the width of the 95% CI of the FpAUC decreased as the high sensitivity range increased, i.e., the precision of the index increased as the threshold TPR 0 decreased. The slight difference at TPR 0 = 0.9 was caused by the truncation of the CI at 1. For the leukaemia dataset, the standard deviation decreased as TPR 0 decreased for both genes; the width of the 95% CI also tended to decrease as the high sensitivity range increased, although showing a small drop for high TPR 0 values in both cases. For the colon cancer dataset, the standard deviation also tended to decrease as the high sensitivity range increased, but showed a slight rise at TPR 0 = 0.5. The width of the 95% CI for the gene Hsa.40063 decreased as TPR 0 decreased, and it also presented a small increase at TPR 0 = 0.5 for the gene Hsa.549.
As mentioned above, each panel of Figure 4 displays the empirical ROC curves of the test scores of two genes with the same pAUC value in the horizontal band (0.9, 1). These applications also enabled us to illustrate the practical usefulness of the FpAUC index for solving such pAUC ties between competing biomarkers in a high sensitivity range. Concretely, the test scores of the genes g1243 and g1526 for the detection of ovarian cancer (Figure 4a) reported a pAUC value of A 0.9 = 0.06376812 for the sensitivity threshold TPR 0 = 0.9. Thus, the N pAUC index also provided the same value for both ROC curves, 0.6376812, which did not allow us to discriminate between the two genes. However, the FpAUC index found different diagnostic performances, since the gene g1243 (A * 0.9 = 0.8627451) reached a slightly better performance than the gene g1526 (A * 0.9 = 0.8585323) in the high sensitivity range (0.9, 1). Regarding the identification of the two leukaemia types, Figure 4b depicts the ROC curves for the genes U57721_at and X07743_at. Both of them achieved a pAUC of 0.06395745, so the N pAUC could not differentiate their diagnostic accuracy in the high sensitivity range (0.9, 1) either. In contrast, the FpAUC values were A * TPR 0 = 0.8857143 and 0.7948718, respectively, and consequently, U57721_at was better than X07743_at for distinguishing between the two leukaemia types above the threshold TPR 0 = 0.9. Analogously, the empirical ROC curves represented in Figure 4c for the genes Hsa.3331 and Hsa.40063 obtained the same pAUC value of A 0.9 = 0.02840909, and the same N pAUC value of 0.2840909. However, their FpAUC values were different, A * TPR 0 = 0.7362385 and A * TPR 0 = 0.78125, respectively, and so the FpAUC index detected that Hsa.40063 was a slightly better marker of colon cancer than Hsa.3331 when the high sensitivity range (0.9, 1) is required.
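The arithmetic behind these ties can be checked with a short Python sketch. It assumes that the N pAUC is the pAUC divided by the height of the band, 1 − TPR 0 , which is consistent with the values quoted above but is stated here as an assumption; the function names are hypothetical. The FpAUC values are copied from the text, not recomputed, since the FpAUC algorithm itself is not reproduced here.

```python
def normalised_pauc(pauc, tpr0):
    """pAUC over the band (TPR0, 1) divided by the band height 1 - TPR0 (assumed scaling)."""
    if not 0.0 <= tpr0 < 1.0:
        raise ValueError("tpr0 must lie in [0, 1)")
    return pauc / (1.0 - tpr0)


def break_tie(fpauc_by_gene):
    """Name of the gene with the larger reported FpAUC."""
    return max(fpauc_by_gene, key=fpauc_by_gene.get)


TPR0 = 0.9
# pAUC shared by each gene pair, and the FpAUC values reported in the text.
reported = {
    "ovarian": {"pauc": 0.06376812, "fpauc": {"g1243": 0.8627451, "g1526": 0.8585323}},
    "leukaemia": {"pauc": 0.06395745, "fpauc": {"U57721_at": 0.8857143, "X07743_at": 0.7948718}},
    "colon": {"pauc": 0.02840909, "fpauc": {"Hsa.3331": 0.7362385, "Hsa.40063": 0.78125}},
}

for dataset, values in reported.items():
    shared = normalised_pauc(values["pauc"], TPR0)  # identical for both genes of a pair
    print(dataset, round(shared, 7), break_tie(values["fpauc"]))
```

Because both genes in a pair share the same pAUC, any rescaling by a constant keeps them tied; only an index whose bounds depend on the shape of each curve, such as the FpAUC, can separate them.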

Discussion and Conclusions
The development of high-throughput technologies has allowed researchers and practitioners to screen hundreds of markers simultaneously in order to identify those which are key for diagnosis. Although this task is commonly addressed through AUC-based analyses, the costs associated with misdiagnosed samples have encouraged restricting the evaluation of a marker's discriminatory power to a clinically meaningful range, by using refined metrics such as the scaled pAUC indexes. Because they enhance the interpretation of the outcomes of pAUC analysis, these performance measures are currently gaining popularity in bioinformatics [29,50–53]. One of these meaningful approaches is the N pAUC provided in [23], which is focused on highly sensitive diagnostic tests. Nevertheless, it presents some limitations. This performance metric might not be useful for interpreting the pAUC, since the N pAUC value might be less than 0.5. Moreover, empirical studies showed that it is unable to distinguish between two crossing ROC curves with equal pAUC values in the high sensitivity range of interest (TPR 0 , 1), so such ties have remained unsolved until now.
The main contribution of this work is a new scaled pAUC index, the fitted pAUC index (FpAUC), for assessing the diagnostic performance of highly sensitive markers while addressing the issues associated with the N pAUC. The proposed metric is based on deriving bounds of the pAUC that are fitter than those involved in the transformation of the pAUC into the N pAUC, in order to handle efficiently situations in which the ROC curve lies below the chance line and/or has hooks. Under different sets of assumptions, which together cover all the possible ROC curve shapes, such bounds have been discussed in terms of the partial boundary of the NLR in the range of interest (TPR 0 , 1). Further, we have provided a comprehensive framework for evaluating a marker's discriminatory power in a high sensitivity interval (TPR 0 , 1), computing the FpAUC index through an algorithm applicable to any ROC curve.
In contrast to the N pAUC, the proposed FpAUC index varies between 0.5 and 1, restoring the property that a summary metric should have a suitable interpretation. Furthermore, we have shown that the FpAUC is also capable of distinguishing between two or more crossing ROC curves with the same pAUC value in the horizontal band (TPR 0 , 1). Thus, the proposed FpAUC extends the N pAUC and fills an important gap in the literature, one which might well have led to highly informative biomarkers being discarded over a high sensitivity range.
The performance of the novel FpAUC index has been examined by simulation and by case studies using three real-world, publicly available datasets concerning the diagnosis of leukaemia, and ovarian and colon cancers. Under the binormal ROC curve assumption, the variance was calculated in order to analyse the behaviour of the FpAUC estimates. In addition, test scores were generated so as to guarantee the presence of all the possible shapes of the underlying binormal ROC curve, i.e., both concave ROC curves (b = 1) and improper ROC curves (b ≠ 1). The results showed that the performance of the FpAUC was consistent across all the settings. A similar conclusion was drawn from the experimental results. In addition, the practical usefulness of the FpAUC was illustrated by solving ties between the pAUC measurements of biomarkers in a high sensitivity range for each one of the genomic datasets.
It is this ability to discriminate between highly sensitive biomarkers which encourages us to continue studying inferential issues, and to develop an R package to assist users in the identification of key biomarkers for biomedical decision making.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/math9212826/s1. Tables S1 and S2 summarise the full results of the simulation studies for both sample sizes, n = 100 and n = 1000, which can be downloaded from SupplMat-Tables-Simulation-HighSen.pdf. The R code file with the FpAUC function and the required internal functions for its computing can be downloaded from SupplMat_Rcode_FpAUC_in_TPR0.pdf.

Data Availability Statement: Publicly available datasets were used in this work. The gene expression array dataset used for biomarkers of ovarian cancer can be found at https://research.fredhutch.org/diagnostic-biomarkers-center/en/datasets.html (accessed on 20 February 2021). For the other applications, datasets are available in R packages "golubEsets" [43] and "plsgenomics" [47], respectively.

Acknowledgments:
The authors would like to thank the anonymous reviewers for their comments and suggestions, which have improved our manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: