Abstract
The receiver operating characteristic (ROC) curve is a valuable statistical tool in medical research. It assesses a biomarker’s ability to distinguish between diseased and healthy individuals. The area under the ROC curve (AUC) and the Youden index (J) are common summary indices used to evaluate a biomarker’s diagnostic accuracy. Simultaneously examining AUC and J offers a more comprehensive understanding of the ROC curve’s characteristics. In this paper, we utilize a semiparametric density ratio model to link the distributions of a biomarker for healthy and diseased individuals. Under this model, we establish the joint asymptotic normality of the maximum empirical likelihood estimator of $(\mathrm{AUC}, J)$ and construct an asymptotically valid confidence region for $(\mathrm{AUC}, J)$. Furthermore, we propose a new test to determine whether a biomarker simultaneously exceeds prespecified target values $\mathrm{AUC}_0$ and $J_0$, with the null hypothesis $H_0: \mathrm{AUC} \le \mathrm{AUC}_0$ or $J \le J_0$ against the alternative hypothesis $H_a: \mathrm{AUC} > \mathrm{AUC}_0$ and $J > J_0$. Simulation studies and a real data example on Duchenne Muscular Dystrophy are used to demonstrate the effectiveness of our proposed method and highlight its advantages over existing methods.
Keywords:
AUC; bootstrap method; confidence region; density ratio model; empirical likelihood; Youden index
MSC:
62G10; 62G15
1. Introduction
The ROC curve is a valuable statistical tool in medical research for evaluating the performance of binary classifiers across different thresholds. It finds wide applications in fields like radiology, oncology, and genomics [1,2]. In medical studies, ROC curves are particularly useful when evaluating a continuous biomarker to classify individuals as diseased or healthy. Graphically, the ROC curve plots sensitivity (the proportion of true positives) versus one minus specificity (the proportion of false positives) at all possible biomarker thresholds. Statistical inference for ROC curves has been studied extensively. For detailed reviews, refer to [3,4,5,6].
Let $F_0$ denote the cumulative distribution function (CDF) of the biomarker in the healthy population and $F_1$ denote that in the diseased population. Without loss of generality, let us assume that biomarker values are higher in the diseased group than in the healthy group, and an individual is classified as diseased when their biomarker value exceeds a given threshold $x$. Under this assumption, the sensitivity is $1 - F_1(x)$, and the specificity is $F_0(x)$. The ROC curve is then given by
$$\mathrm{ROC}(s) = 1 - F_1\{F_0^{-1}(1 - s)\}$$
for $s \in (0, 1)$, where $F_0^{-1}(t) = \inf\{x : F_0(x) \ge t\}$.
In ROC analysis, two common summary indices are used to assess a biomarker’s diagnostic accuracy: the AUC [7,8] and the Youden index (J) [9,10,11]. They are defined mathematically as
$$\mathrm{AUC} = \int_0^1 \mathrm{ROC}(s)\,ds \quad \text{and} \quad J = \max_x \{F_0(x) - F_1(x)\} = F_0(c) - F_1(c), \tag{1}$$
where $c$ is the “optimal” threshold. By definition, the AUC summarizes the overall performance of a classifier across all possible thresholds. While valuable, it does not directly provide an “optimal” threshold. On the other hand, J is the maximum value of the sensitivity plus the specificity minus 1. Not only does J quantify the biomarker’s effectiveness (with a value of 1 indicating complete separation of the biomarker’s distributions for diseased and healthy populations, and 0 indicating complete overlap), but it also offers a distinct advantage over the AUC by providing a criterion for selecting the “optimal” threshold c. However, J only measures the diagnostic accuracy at the “optimal” threshold c and not at other thresholds.
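As a concrete illustration of the two indices, the following minimal sketch computes fully nonparametric (empirical CDF) estimates of the AUC and the Youden index from two samples. It is not the estimator proposed in this paper. The function name `empirical_auc_youden` is hypothetical, and ties between the two samples are counted with weight one half, which is a common convention rather than something stated in the text.

```python
import numpy as np

def empirical_auc_youden(x0, x1):
    """Empirical AUC, Youden index, and cut-off from healthy (x0) and diseased (x1) samples."""
    x0 = np.sort(np.asarray(x0, dtype=float))
    x1 = np.sort(np.asarray(x1, dtype=float))
    # AUC = P(X0 < X1) + 0.5 * P(X0 = X1), the Mann-Whitney form.
    below = np.searchsorted(x0, x1, side="left")    # number of x0 strictly below each x1
    at_most = np.searchsorted(x0, x1, side="right")  # number of x0 at or below each x1
    auc = float(np.mean((below + at_most) / 2.0) / len(x0))
    # J = max_x {F0(x) - F1(x)}, searched over the pooled observed values.
    grid = np.unique(np.concatenate([x0, x1]))
    F0 = np.searchsorted(x0, grid, side="right") / len(x0)
    F1 = np.searchsorted(x1, grid, side="right") / len(x1)
    k = int(np.argmax(F0 - F1))
    return auc, float(F0[k] - F1[k]), float(grid[k])
```

For example, with healthy values `[1, 2, 3, 4]` and diseased values `[3, 5, 6, 7]`, the threshold 4 classifies three of four diseased individuals and all healthy individuals correctly, so the sketch returns J = 0.75 together with an AUC that accounts for the single tie at 3.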
In practical scenarios where medical practitioners encounter multiple biomarkers, they often use the AUC to choose the most diagnostically useful biomarker [12,13]. However, relying solely on the AUC has limitations. The biomarker with the highest AUC might not have the best overall accuracy at the “optimal” threshold. Similarly, focusing only on the Youden index selects the biomarker with the highest total accuracy at the “optimal” threshold. But this “best” biomarker by the Youden index may not perform well overall. If the threshold changes, the biomarker may no longer maintain satisfactory diagnostic accuracy. For real examples and further discussions, refer to [1,14]. In summary, both the AUC and the Youden index are valuable tools for evaluating a biomarker’s effectiveness, each emphasizing distinct aspects of its performance. Simultaneously examining AUC and J, which provide complementary information, may help us make better decisions [1]. This motivates us to develop joint inference procedures for AUC and J in this paper.
In the literature, ref. [1] considered both parametric and nonparametric methods for constructing confidence regions of AUC and J. Later, ref. [2] proposed both parametric and nonparametric tests to determine if a biomarker exceeds predefined target values with hypotheses $H_0: \mathrm{AUC} \le \mathrm{AUC}_0$ or $J \le J_0$ versus $H_a: \mathrm{AUC} > \mathrm{AUC}_0$ and $J > J_0$. For the parametric inference procedures, it is assumed that the original biomarkers or the biomarkers after the Box–Cox transformation follow normal distributions in both the healthy and diseased groups. For the nonparametric inference procedures, the empirical CDF or kernel method is used to estimate $F_0$ and $F_1$.
Generally, parametric joint inference procedures are highly efficient when the underlying parametric models are correct. This means that the resulting confidence region for $(\mathrm{AUC}, J)$ has a smaller area, and the joint testing procedure has greater power. However, these procedures may not be robust to misspecification of the models for $F_0$ and $F_1$. See Section 4 for more details. On the other hand, nonparametric methods are free from assumptions about the models of $F_0$ and $F_1$. In medical research, it has been observed that healthy and diseased populations often share certain common characteristics [4,15,16,17]. However, fully nonparametric methods ignore this information, potentially leading to inefficient inference procedures.
In this paper, we develop new semiparametric joint inference procedures for $(\mathrm{AUC}, J)$ based on a semiparametric density ratio model (DRM; refs. [18,19,20]), which effectively utilizes information from both healthy and diseased populations. Let $f_0$ and $f_1$ be the probability density functions of $F_0$ and $F_1$, respectively. The DRM assumes that
$$f_1(x) = \exp\{\alpha + \beta^\top q(x)\}\, f_0(x), \tag{2}$$
where $q(x)$ is a prespecified, p-variate vector-valued nontrivial function of x, and $\alpha$ and $\beta$ are unknown parameters. The unspecified baseline distribution $F_0$ makes the DRM a semiparametric model. This flexibility allows the DRM to encompass many distributions commonly used in studying ROC curves [21]. For instance, if we set $q(x)$ to $\log x$, the DRM encompasses the lognormal distributions (with equal variance on the log scale) and the beta distributions (sharing the same power parameter for $1 - x$). Similarly, setting $q(x)$ to x, it includes the normal distributions with the same variance and exponential distributions. The DRM has a close relationship with the logistic regression model. To illustrate this point, let us define D = 0 and 1 as indicators for individuals from the healthy and diseased populations, respectively. As shown by [18,19], the DRM is equivalent to the logistic regression model through the following equation:
$$P(D = 1 \mid X = x) = \frac{\exp\{\alpha^* + \beta^\top q(x)\}}{1 + \exp\{\alpha^* + \beta^\top q(x)\}},$$
where $\alpha^* = \alpha + \log\{P(D = 1)/P(D = 0)\}$.
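To make the claim about the normal family concrete, the short derivation below checks that two normal densities with a common variance satisfy a density ratio model of the form $f_1(x) = \exp\{\alpha + \beta q(x)\} f_0(x)$ with $q(x) = x$. The symbols $\mu_0$, $\mu_1$, and $\sigma^2$ are introduced here for illustration only and are not part of the original notation.

```latex
% Assumption: f_k is the N(\mu_k, \sigma^2) density, k = 0, 1.
\begin{align*}
\log\frac{f_1(x)}{f_0(x)}
  &= \frac{(x-\mu_0)^2 - (x-\mu_1)^2}{2\sigma^2} \\
  &= \frac{\mu_0^2 - \mu_1^2}{2\sigma^2} + \frac{\mu_1 - \mu_0}{\sigma^2}\, x ,
\end{align*}
% so that q(x) = x with
\[
\alpha = \frac{\mu_0^2 - \mu_1^2}{2\sigma^2}, \qquad
\beta = \frac{\mu_1 - \mu_0}{\sigma^2}.
\]
```

Applying the same calculation to $\log X$ gives the lognormal case with $q(x) = \log x$, since the Jacobian term $1/x$ cancels in the ratio.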
The DRM has proven itself as a valuable tool for inference on ROC curves and their summary indices [4,15,17,22]. Existing theoretical and numerical studies have shown that point estimators of AUC and J under the DRM are more efficient than fully nonparametric estimators. However, as far as we are aware, semiparametric joint inference procedures for $(\mathrm{AUC}, J)$, such as confidence regions and joint hypothesis testing procedures, remain uninvestigated under the DRM (2). This paper aims to fill this gap.
Our contributions are three-fold. First, we establish the joint asymptotic normality of the maximum empirical likelihood estimator (MELE) of $(\mathrm{AUC}, J)$ under the DRM (2). This allows us to construct an asymptotically valid Wald-type confidence region for $(\mathrm{AUC}, J)$. We further propose a nonparametric bootstrap procedure to improve the coverage accuracy of the Wald-type confidence region. Second, we develop a joint testing procedure for the null hypothesis $H_0: \mathrm{AUC} \le \mathrm{AUC}_0$ or $J \le J_0$ versus the alternative hypothesis $H_a: \mathrm{AUC} > \mathrm{AUC}_0$ and $J > J_0$. We introduce a novel bootstrap procedure to obtain its p-value. Finally, we evaluate the performance of our proposed methods through simulation studies and application to real data on Duchenne Muscular Dystrophy. The numerical studies demonstrate that the proposed method produces more precise confidence regions for $(\mathrm{AUC}, J)$ with smaller areas. Additionally, the newly proposed joint testing procedure maintains controlled type-I error rates while achieving satisfactory power.
The rest of the paper is structured as follows. In Section 2, we introduce the MELE of $(\mathrm{AUC}, J)$ and prove its joint asymptotic normality. Section 3 details the proposed joint inference procedures. This includes constructing confidence regions and conducting joint hypothesis tests for $(\mathrm{AUC}, J)$. Section 4 presents simulation results and Section 5 contains a real application. A summary and discussion of the findings are given in Section 6 and Section 7, respectively.
2. Methodology
Let and denote independent random samples from the healthy and diseased populations, respectively. We define a combined sample of size by setting for and for .
2.1. Maximum Empirical Likelihood Estimators of AUC and J
We begin by developing the empirical likelihood (EL) function. By the EL principle [23] and under the DRM (2), the likelihood function based on the observed data is
where for , and they satisfy
The MELEs of , denoted as , are defined as the maximizer of subject to the constraints in (3).
Let . Following [19,24], we obtain the MELE of by
where
is the dual empirical log-likelihood function.
Once we have , we calculate the MELEs of as
Subsequently, the MELEs of and are given by
where is the indicator function.
Recall the definition of in (1). It can be verified that
The MELE of is then given by
Again, recall the definition of the Youden index J in (1). Then, the optimal threshold c should satisfy [17]. This is equivalent to
With , the MELE of c solves
The MELE of J is then defined as
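The estimation pipeline of this subsection can be sketched numerically as follows. This is a minimal sketch under stated assumptions, not the authors' R implementation: it takes a scalar $q(x) = x$, fits the DRM through the equivalent logistic regression (the route described in [4]) with a hand-written Newton iteration, and then plugs the fitted weights into the AUC and Youden index. The function names, the half-weight treatment of ties in the AUC, and the closed form $c = -\alpha/\beta$ (valid only for $q(x) = x$) are all illustrative choices.

```python
import numpy as np

def fit_drm_logistic(x0, x1, n_iter=100, tol=1e-10):
    """Fit f1(x) = exp(alpha + beta*x) f0(x) via logistic regression on the pooled sample."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    n0, n1 = len(x0), len(x1)
    t = np.concatenate([x0, x1])                      # pooled sample
    d = np.concatenate([np.zeros(n0), np.ones(n1)])   # 0 = healthy, 1 = diseased
    Z = np.column_stack([np.ones(n0 + n1), t])        # intercept and q(x) = x
    theta = np.zeros(2)
    for _ in range(n_iter):                           # Newton-Raphson for the logistic MLE
        prob = 1.0 / (1.0 + np.exp(-(Z @ theta)))
        grad = Z.T @ (d - prob)
        hess = Z.T @ (Z * (prob * (1.0 - prob))[:, None])
        step = np.linalg.solve(hess, grad)
        theta += step
        if np.max(np.abs(step)) < tol:
            break
    alpha = theta[0] - np.log(n1 / n0)                # undo the sampling-ratio offset
    beta = theta[1]
    w = np.exp(alpha + beta * t)                      # fitted density ratio at each point
    p0 = 1.0 / (n0 + n1 * w)                          # fitted jumps of F0
    p1 = p0 * w                                       # fitted jumps of F1
    return alpha, beta, t, p0, p1

def drm_auc_youden(x0, x1):
    """Plug-in AUC, Youden index, and cut-off under the fitted DRM (assumes beta > 0, no ties)."""
    alpha, beta, t, p0, p1 = fit_drm_logistic(x0, x1)
    order = np.argsort(t)
    t, p0, p1 = t[order], p0[order], p1[order]
    F0 = np.cumsum(p0)                                # fitted F0 at the sorted pooled points
    auc = float(np.sum(p1 * (F0 - 0.5 * p0)))         # P(X0 < X1) + 0.5 P(X0 = X1)
    c = float(-alpha / beta)                          # solves alpha + beta*c = 0
    j = float(np.sum((p0 - p1)[t <= c]))              # F0(c) - F1(c)
    return auc, j, c
```

At the logistic maximum likelihood solution, the fitted jumps `p0` and `p1` each sum to one, which mirrors the constraints imposed on the empirical likelihood. A vector-valued or nonlinear $q(x)$ would require a root-finder for the cut-off instead of the closed form used here.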
2.2. Joint Asymptotic Normality of $(\widehat{\mathrm{AUC}}, \hat{J})$
In this section, we establish the joint asymptotic normality of . We begin by introducing some notation. Let denote the true value of ,
and
In Lemma A2 of Appendix A, we show that
where the detailed form of is given in Lemma A2.
We denote the true values of , Youden index J, and optimal threshold c as and , respectively. The asymptotic results in this section rely on the following regularity conditions.
- C1.
- for any .
- C2.
- and are continuous in the neighborhood of , with and .
- C3.
- The total sample size , and remains constant.
- C4.
- The DRM (2) is satisfied by and . Additionally, is positive definite, and for in a neighborhood of ,
We note that Conditions C1 and C2 are from [25]. These conditions ensure the identifiability of . Condition C4 ensures that the components of are linearly independent under both and . Conditions C3 and C4 guarantee the asymptotic normality of .
The following theorem establishes the joint asymptotic normality of . The proof is provided in Appendix A.
Theorem 1.
Suppose Conditions C1–C4 are satisfied. As the total sample size , we have
in distribution, where with
where
Remark 1.
Our method relies on the DRM assumption (refer to Equation (2)). To assess the validity of this model assumption in practice, we can use goodness-of-fit test statistics proposed by [19]:
and apply the bootstrap method to perform the test. Here,
It can be shown that . Therefore, the test results based on and are equivalent to each other. Consequently, we only need to consider one statistic, for example , for practical applications.
3. Joint Inference Procedures for $(\mathrm{AUC}, J)$ under the DRM
3.1. Confidence Region of $(\mathrm{AUC}, J)$
The variance–covariance matrix in Theorem 1 depends on . Replacing these by their MELEs leads to a variance estimator
It can be easily shown that is consistent with .
Theorem 2.
Under the conditions of Theorem 1, it follows that in probability as .
For notational convenience, let , , and . Building upon the asymptotic results in Theorems 1 and 2, we conclude that
in distribution as . Therefore, a th asymptotic Wald-type confidence region for is
where represents the quantile of the Chi-square distribution with two degrees of freedom. Our simulation results in Section 4 demonstrate that this approach yields liberal confidence regions when sample sizes are insufficiently large. To enhance coverage accuracy, we propose a bootstrap method. Throughout the subsequent discussion, we denote quantities derived from the l-th bootstrap sample with the subscript “”.
- Step 1. Calculate and the corresponding based on the observed data and .
- Step 2. For , draw a bootstrap sample of size $n_0$ with replacement from and another bootstrap sample of size $n_1$ with replacement from .
- Step 3. For , based on the l-th bootstrap two-sample data in Step 2, calculate the estimate for and the corresponding for , and compute
- Step 4. Obtain the th quantile of , which is denoted as .
- Step 5. The bootstrap confidence region of $(\mathrm{AUC}, J)$ is given by
In our simulation study, we set . The resulting bootstrap confidence region for offers improved coverage accuracy, as will be discussed in Section 4.
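The bootstrap calibration in Steps 1 to 5 can be sketched generically as below. This is a hedged illustration rather than the authors' code: `estimator` is a hypothetical callable returning a point estimate and an estimated covariance matrix of that estimate, the bootstrap statistic is assumed to be the Wald quadratic form centred at the original estimate, and `toy_estimator` (two sample means) merely stands in for the MELE of $(\mathrm{AUC}, J)$ so that the block runs on its own.

```python
import numpy as np

def wald_stat(est, cov, center):
    """Quadratic form (est - center)' cov^{-1} (est - center)."""
    diff = np.asarray(est, dtype=float) - np.asarray(center, dtype=float)
    return float(diff @ np.linalg.solve(cov, diff))

def bootstrap_cutoff(x0, x1, estimator, B=500, level=0.95, seed=0):
    """Return the original estimate and the bootstrap quantile replacing the chi-square cut-off."""
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    est, _ = estimator(x0, x1)                                # Step 1
    stats = np.empty(B)
    for l in range(B):
        b0 = rng.choice(x0, size=len(x0), replace=True)       # Step 2: resample each group
        b1 = rng.choice(x1, size=len(x1), replace=True)
        est_b, cov_b = estimator(b0, b1)                      # Step 3: re-estimate
        stats[l] = wald_stat(est_b, cov_b, est)
    return est, float(np.quantile(stats, level))              # Step 4

def toy_estimator(x0, x1):
    """Stand-in estimator: the two sample means and the covariance of those means."""
    est = np.array([x0.mean(), x1.mean()])
    cov = np.diag([x0.var(ddof=1) / len(x0), x1.var(ddof=1) / len(x1)])
    return est, cov
```

Under this sketch, the bootstrap region (Step 5) is the set of points `eta` with `wald_stat(est, cov, eta) <= cutoff`, where `cov` is the covariance estimate from the observed data.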
3.2. Joint Hypothesis Testing on $(\mathrm{AUC}, J)$
In this section, we examine a testing procedure to determine whether a biomarker simultaneously exceeds prespecified target values of and . We define the hypotheses as follows:
It is important to note that the null hypothesis is a multivariate order-restricted hypothesis whose parameter space is non-convex.
Our testing procedure is motivated by the results in Proposition 1. The proof is provided in Appendix B.
Proposition 1.
Suppose that is a bivariate normal random vector with unknown mean and known variance–covariance matrix . Let
- (a) The maximum likelihood estimator of μ in based on is given by
- (b) The likelihood ratio test statistic for testing versus is
  where is the positive part of x.
Define
With the asymptotic results in Theorems 1 and 2, we propose, according to Proposition 1 (b), to test (8) based on the following test statistic
We reject the null hypothesis in (8) when exceeds a critical value.
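One reading of a likelihood-ratio statistic of this type can be sketched as follows. For a bivariate normal vector with known covariance and a null hypothesis of the form "first mean at most its target, or second mean at most its target", minus twice the log likelihood ratio reduces to the smaller of the two squared, positive-part standardized excesses. The sketch below applies that form to the estimates. The function name, the argument names, and the exact scaling are assumptions and may differ from the statistic defined in this paper. Here `var_auc` and `var_j` denote estimated variances of the two estimators themselves.

```python
def joint_test_statistic(auc_hat, j_hat, auc0, j0, var_auc, var_j):
    """Smaller of the two squared positive-part standardized excesses over the targets."""
    z_auc = max(auc_hat - auc0, 0.0) ** 2 / var_auc
    z_j = max(j_hat - j0, 0.0) ** 2 / var_j
    return min(z_auc, z_j)
```

The statistic is zero whenever either estimate fails to exceed its target, so only data in which both indices clear their targets can produce evidence against the null hypothesis.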
The distribution of depends on the true null model, which is unknown. Motivated by Proposition 1, we suggest a bootstrap procedure based on for the hypothesis testing problem in (8). For ease of presentation, let and denote the MELEs of and subject to the constraint that the is fixed at . Similarly, let and denote the MELEs of and subject to the constraint that the Youden index J is fixed at . Their numerical calculations will be discussed in Appendix C. Let a be the significance level.
- Step 1. Calculate the test statistic based on the observed data and .
- Step 2. For , generate the l-th bootstrap two-sample data as follows:
  - (a) If or , draw a bootstrap sample of size $n_0$ from and another bootstrap sample of size $n_1$ from .
  - (b) If , , and , draw a sample of size $n_0$ from and another sample of size $n_1$ from .
  - (c) If , , and , draw a sample of size $n_0$ from and another sample of size $n_1$ from .
- Step 3. For , calculate the test statistic based on the l-th bootstrap two-sample data in Step 2 (using the same method as in Step 1).
- Step 4. Calculate the p-value of as
- Step 5. Reject the null hypothesis in (8) if the p-value is less than the significance level a.
We note that Cases (a)–(c) in Step 2 correspond to the three cases outlined in Proposition 1(a). For instance, consider Step 2(b), in which , , and . By the second case of Proposition 1(a), we set the MELE of J to be under the null model in (8). Hence, we generate the bootstrap two-sample data from and .
In our simulation study, we set . The simulation results in Section 4 demonstrate that the bootstrap procedure effectively controls the type-I error.
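The case selection in Step 2 and the p-value in Step 4 can be sketched as below. Both functions are hypothetical illustrations. The rule for choosing between cases (b) and (c) is inferred from the likelihood argument (the null model is placed on the nearer boundary in standardized distance, with case (b) fixing J at its target as described above), and the p-value is taken to be the proportion of bootstrap statistics at least as large as the observed one. Neither inequality is copied from the source, so both should be checked against the original formulas.

```python
import numpy as np

def null_case(auc_hat, j_hat, auc0, j0, var_auc, var_j):
    """Return 'a', 'b', or 'c' for the resampling scheme used in Step 2."""
    if auc_hat <= auc0 or j_hat <= j0:
        return "a"                       # estimate already lies in the null region
    z_auc = (auc_hat - auc0) / np.sqrt(var_auc)
    z_j = (j_hat - j0) / np.sqrt(var_j)
    # Assumed rule: constrain whichever index is closer to its target.
    return "b" if z_j <= z_auc else "c"  # 'b': J fixed at j0; 'c': AUC fixed at auc0

def bootstrap_p_value(t_obs, t_boot):
    """Proportion of bootstrap statistics at least as large as the observed statistic."""
    return float(np.mean(np.asarray(t_boot, dtype=float) >= t_obs))
```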
4. Simulation Study
This section employs simulation examples to compare the finite-sample performances of our proposed joint inference procedures and some existing competitors.
4.1. Simulation Parameter Settings
We consider two distributional settings:
- (1)
- and ;
- (2)
- and .
Here, denotes the lognormal distribution with mean and variance , both with respect to the log scale; denotes the beta distribution with power parameters for x and being a and b, respectively.
We comment that Setting (1) corresponds to the case where the model assumption for the Box–Cox method is satisfied, while Setting (2) pertains to the case where this assumption is violated. In both cases, we consider three true values 0.3, 0.5, and 0.7 for the Youden index to cover low, moderate, and high levels of diagnostic accuracy. The details of the parameter settings are given in Table 1.
Table 1.
Simulation settings.
When specifying the parameter settings in Table 1, we first specify the parameter values (Columns 4 and 5) of $F_0$ and keep them fixed. Then, for each $J$ (Column 3), we choose the parameter values (Columns 6 and 7) of $F_1$ for the designated distribution (Column 1) such that the value of $J$ is achieved. Finally, we calculate the corresponding AUC (Column 2) under the pair of $F_0$ and $F_1$. This approach of selecting parameter values has been used in [1,17,21].
Throughout this section, our proposed joint inference procedures use the correctly specified $q(x)$.
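The parameter-selection recipe can be illustrated for one special case. The sketch below assumes two lognormal distributions with a common log-scale standard deviation `sigma`, which may not match the actual values in Table 1. Because AUC and J are invariant to the monotone log transformation, the equal-variance normal formulas $J = 2\Phi\{\delta/(2\sigma)\} - 1$ and $\mathrm{AUC} = \Phi\{\delta/(\sigma\sqrt{2})\}$ apply, where $\delta$ is the difference in log-scale means. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def shift_for_youden(j, sigma=1.0):
    """Log-scale mean difference delta that yields Youden index j (equal variances assumed)."""
    return 2.0 * sigma * _STD_NORMAL.inv_cdf((j + 1.0) / 2.0)

def youden_from_shift(delta, sigma=1.0):
    """Youden index implied by a log-scale mean difference delta."""
    return 2.0 * _STD_NORMAL.cdf(delta / (2.0 * sigma)) - 1.0

def auc_from_shift(delta, sigma=1.0):
    """AUC implied by a log-scale mean difference delta."""
    return _STD_NORMAL.cdf(delta / (sigma * sqrt(2.0)))

# Example use: the AUC that accompanies each target value of J in this special case.
settings = {j: auc_from_shift(shift_for_youden(j)) for j in (0.3, 0.5, 0.7)}
```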
4.2. Simulation for Confidence Regions
We compare the proposed confidence regions for $(\mathrm{AUC}, J)$ with four methods from [1] using simulation studies. The six methods are listed below:
- Empirical likelihood method (proposed in (6)), which is denoted as “EL”;
- Bootstrap empirical likelihood method with (proposed in (7)), which is denoted as “BEL”;
- Parametric Box–Cox asymptotic delta method, which is denoted as “AD”;
- Parametric Box–Cox generalized inference approach, which is denoted as “GPQ”;
- Nonparametric bootstrap confidence region, which is denoted as “BTI”;
- Nonparametric bootstrap confidence region with the arcsin-square-root transformation, which is denoted as “BTAT”.
We consider three different combinations of sample sizes: $(n_0, n_1)$ = (50, 50), (100, 100), and (150, 50). This results in nine combinations of parameters and sample sizes for each of the two distributional settings of $F_0$ and $F_1$. We repeat the simulation 2000 times for each combination.
We now examine the behavior of the 95% confidence region of $(\mathrm{AUC}, J)$. The performance of a confidence region is evaluated by the coverage probability (CP) in percentage and area of the confidence region (ACR), which are computed as follows:
where is the confidence region of computed from the l-th simulation run, and is the area of the confidence region. The simulation results are presented in Table 2 and Table 3.
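The two performance measures can be computed as in the following sketch, which assumes every confidence region is an ellipse of the form $\{\eta : (\eta - \hat\eta)^\top C^{-1} (\eta - \hat\eta) \le r\}$, with $C$ an estimated covariance of the estimator and $r$ a cut-off. The area of such an ellipse is $\pi r \sqrt{\det C}$. Any truncation of a region to the valid range of $(\mathrm{AUC}, J)$ is ignored here, and the function names are hypothetical.

```python
import numpy as np

def in_region(est, cov, point, cutoff):
    """True if `point` lies inside the ellipse centred at `est`."""
    diff = np.asarray(point, dtype=float) - np.asarray(est, dtype=float)
    return float(diff @ np.linalg.solve(cov, diff)) <= cutoff

def ellipse_area(cov, cutoff):
    """Area of the ellipse {d : d' cov^{-1} d <= cutoff} in two dimensions."""
    return float(np.pi * cutoff * np.sqrt(np.linalg.det(cov)))

def summarize_regions(regions, truth):
    """CP (in %) and ACR over simulation runs; `regions` holds (est, cov, cutoff) triples."""
    covered = [in_region(est, cov, truth, r) for est, cov, r in regions]
    areas = [ellipse_area(cov, r) for _, cov, r in regions]
    return 100.0 * float(np.mean(covered)), float(np.mean(areas))
```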
Table 2.
Summary of CP (%) and ACR (×100, in parentheses) of 95% confidence regions for $(\mathrm{AUC}, J)$ under lognormal distributions.
Table 3.
Summary of CP (%) and ACR (×100, in parentheses) of 95% confidence regions for $(\mathrm{AUC}, J)$ under beta distributions.
Table 2 presents simulation results at the nominal level of 95% under the lognormal distributional setting. The BEL method improves the coverage of the EL method at the cost of a larger area. Nonetheless, the proposed BEL method demonstrates the most stable performance, with CPs reasonably close to 95% in nearly all scenarios. As sample sizes become larger, the AD method generally over-covers, resulting in the largest area of confidence regions among the six methods. The GPQ method is generally comparable to the BEL method, maintaining satisfactory coverage probabilities with a relatively small area, especially when the true value of J is large. This is likely because the model assumption for the Box–Cox method is satisfied in this case. Both the BTI and BTAT methods under-cover, especially when the true value of J is 0.3, and they produce similar areas in many cases.
Table 3 presents simulation results for the beta distributional setting at the nominal level of 95%. The EL, BEL, BTI, and BTAT methods exhibit trends similar to those in the lognormal setting. However, the AD and GPQ methods perform quite differently. Both have CPs that deviate from the nominal level when the true value of J is 0.5 or 0.7. The GPQ method exhibits particularly poor performance. For example, when the true value of J is 0.7 and , the CP of the GPQ method is only 39.05%.
In summary, when $q(x)$ is correctly specified and the Box–Cox model is satisfied, the BEL and GPQ methods have comparable performance and are better than other methods. When the Box–Cox model is not satisfied, the BEL method performs better than the parametric and nonparametric methods.
4.3. Simulation for Joint Hypothesis Testing
This section presents a simulation study comparing the performance of the proposed bootstrap joint test procedure with (denoted as “BELT”), introduced in Section 3.2, with two recommended joint test methods from [2]:
- Parametric bootstrap joint test method, which is denoted as “PBA”;
- Nonparametric kernel-smoothed-based joint test method, which is denoted as “NKS”.
All tests were carried out at the significance level . We consider three different combinations of sample sizes: = (50, 50), (75, 75), and (75, 50). For both lognormal and beta distributional settings, the model with is chosen as the null model and that with is chosen as the alternative model. The number of replications is 2000. The simulated type-I errors and powers of three tests are shown in Table 4 and Table 5.
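Table 4.
Simulated type-I errors and powers of three tests for testing $H_0: \mathrm{AUC} \le \mathrm{AUC}_0$ or $J \le J_0$ versus $H_a: \mathrm{AUC} > \mathrm{AUC}_0$ and $J > J_0$ under lognormal distributions at the 0.05 significance level.
Table 5.
Simulated type-I errors and powers of three tests for testing $H_0: \mathrm{AUC} \le \mathrm{AUC}_0$ or $J \le J_0$ versus $H_a: \mathrm{AUC} > \mathrm{AUC}_0$ and $J > J_0$ under beta distributions at the 0.05 significance level.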
The first block of Table 4 presents the simulated type-I errors of three tests under the lognormal distribution. We observe that all three methods maintain type-I error rates at 0.05, although the PBA and NKS methods tend to be conservative. The second block of Table 4 presents the simulated powers of the three tests. The proposed BELT method exhibits the largest power.
Table 5 presents the results for beta distributions. While BELT and NKS effectively control type-I errors, the PBA test’s errors significantly exceed 5%. When data come from the alternative model, BELT exhibits greater power compared to NKS.
In conclusion, the BELT method effectively controls type-I errors across both distributional settings (lognormal and beta) and demonstrates superior power compared to the nonparametric method.
5. Real Data Analysis
In this section, we evaluate the performance of the proposed methods using a dataset on Duchenne Muscular Dystrophy (DMD). DMD is a genetic disorder characterized by progressive muscle weakness and wasting. It is caused by mutations in the dystrophin gene, the largest human gene, located on the X chromosome (Xp21). DMD primarily affects males in early childhood. Interestingly, females with one copy of the mutated gene typically do not show symptoms. Therefore, identifying potential female carriers is crucial.
According to [26], individuals carrying the DMD gene mutation may not exhibit symptoms but often have elevated levels of specific biomarkers. The authors of [27] compiled a dataset encompassing four biomarkers: Creatine Kinase (CK), Hemopexin (H), Lactate Dehydrogenase (LD), and Pyruvate Kinase (PK). These biomarkers were measured in blood serum samples from a healthy control group ($n_0 = 127$) and a group of DMD carriers ($n_1 = 67$).
For illustration, we consider the biomarkers PK and H. We choose in the proposed methods for each biomarker. We perform the goodness-of-fit test suggested in Remark 1; the p-values based on 1000 bootstrap samples are 0.215 and 0.780 for PK and H, respectively. This suggests that the DRM in (2) with provides reasonable fits for both biomarkers PK and H.
Table 6 presents the point estimates (PEs) of $(\mathrm{AUC}, J)$ and the ACRs for $(\mathrm{AUC}, J)$ at the 95% confidence level based on the BEL, GPQ, and BTAT methods. We omit the results for the EL, AD, and BTI methods because, as shown in Section 4, the BEL method achieves better coverage than the EL method, the AD method has larger ACRs compared to the GPQ method, and the BTI method performs similarly to the BTAT method. The BEL method gives the confidence region with the smallest area. As an illustration, we further plot the 95% confidence regions of $(\mathrm{AUC}, J)$ for the biomarker H based on the BEL, GPQ, and BTAT methods in Figure 1, which leads to the same observations as Table 6.
Table 6.
Point estimates (PEs) of $(\mathrm{AUC}, J)$ and the ACRs (×100) for $(\mathrm{AUC}, J)$ at the 95% confidence level based on the BEL, GPQ, and BTAT methods.
Figure 1.
The 95% confidence regions of for biomarker H in the DMD dataset based on the BEL (black solid), GPQ (red dashed), and BTAT (blue dot–dashed) methods. The BEL point estimate of is , which is indicated by the black hollow point at the center of the ellipse.
To illustrate the proposed joint test method BELT, we assess whether the biomarker PK simultaneously exceeds the prespecified target values of and . These values represent the PE of $(\mathrm{AUC}, J)$ based on the BEL method for biomarker H. Our BELT method with gives the p-value 0.022. This result supports rejecting the null hypothesis $H_0: \mathrm{AUC} \le \mathrm{AUC}_0$ or $J \le J_0$ at the 5% significance level. In contrast, neither the PBA test nor the NKS test from [2] rejects the same null hypothesis. In conclusion, our BELT method provides stronger evidence against the null hypothesis. This suggests that the biomarker PK has better discriminatory ability than biomarker H in terms of both AUC and the Youden index J.
6. Summary
In this paper, we proposed a bootstrap confidence region for $(\mathrm{AUC}, J)$ and a bootstrap joint testing procedure for the hypothesis testing problem in (8) based on the MELE of $(\mathrm{AUC}, J)$. We conducted extensive simulations to evaluate the performance of our proposed semiparametric approaches. The results demonstrate that the BEL method accurately constructs confidence regions for $(\mathrm{AUC}, J)$ with the desired coverage probability. Additionally, the proposed bootstrap testing method, BELT, consistently maintains the type-I error rate and exhibits satisfactory power compared to existing joint tests.
Theoretically, we established the joint asymptotic normality of the MELE of $(\mathrm{AUC}, J)$, providing the theoretical foundation for the proposed confidence region and joint testing procedure for $(\mathrm{AUC}, J)$. Practically, we developed R functions to implement the proposed methods, which are available in the Supplementary Materials.
To use the proposed methods, it is necessary to specify $q(x)$ in (2). Common choices for $q(x)$ include and . We recommend that practitioners first use the goodness-of-fit test described in Remark 1 to assess the suitability of a prespecified choice of $q(x)$. The R function for implementing the goodness-of-fit test for the DRM with commonly used $q(x)$ is included in the Supplementary Materials. If practitioners do not have a suitable choice for $q(x)$, the nonparametric method outlined in Section 4 may be preferable.
7. Discussion
We observe that the proposed methods have the potential to be applied to other research problems. For example, we may extend them to compare paired or multiple markers based on both AUC and the Youden index J [28]. This paper considers two-sample/two-group data only. Multiple sample/group data are also commonly seen [29], and we may explore the proposed methods in this scenario. Furthermore, although widely used, the AUC has its limitations. A major drawback is that it summarizes the entire ROC curve, including regions that may not be directly relevant to clinical applications. To address this issue while retaining some of the AUC’s beneficial properties, one can use the partial area under the ROC curve (pAUC). Considering a clinically relevant range of false-positive or true-positive rates, the pAUC focuses on a specific portion of the curve [30,31,32,33,34]. We can extend the proposed method to study statistical inference for the pAUC. We leave these research problems for future investigation.
Supplementary Materials
The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/math12132118/s1.
Author Contributions
Conceptualization, S.L., Q.T., Y.L. and P.L.; methodology, S.L., Q.T., Y.L. and P.L.; software, S.L.; validation, Q.T., Y.L. and P.L.; formal analysis, S.L.; data curation, S.L.; writing—original draft preparation, S.L.; writing—review and editing, Q.T., Y.L. and P.L.; visualization, S.L.; supervision, Q.T., Y.L. and P.L.; project administration, Q.T., Y.L. and P.L.; funding acquisition, Q.T., Y.L. and P.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Key R&D Program of China (2021YFA1000100 and 2021YFA1000101), the National Natural Science Foundation of China (12171157 and 32030063), the 111 project (B14019), and the Natural Sciences and Engineering Research Council of Canada (RGPIN-2023-03479 and RGPIN-2020-04964).
Data Availability Statement
The R functions for implementing the proposed methods and the goodness-of-fit test for the DRM with commonly used $q(x)$, as well as the data supporting the findings of this study, are available in Supplementary Materials.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A. Proof of Theorem 1
Appendix A.1. Some Preparation
Recall that and . The following lemma is helpful for our subsequent calculation, in which we use to denote the expectation with respect to .
Lemma A1.
Let and be arbitrary functions of x such that the expectations below are all finite. Then, we have
Proof.
Equation (A2) can be similarly proved.
Next, we consider (A3). Note that
Equation (A4) can be similarly proved. □
Next, we rewrite the in (5) as
The next lemma presents the expectation and variance of .
Lemma A2.
Proof.
For , we show only that . The other parts can be verified using Lemma A1 directly. Recall that
Using (A2), we have that
For , we only verify
The other parts, again, can be similarly checked.
Note that
By Lemma A1, we have
and
This finishes the proof. □
Appendix A.2. Proof of Theorem 1
Proof.
Recall that the MELEs of are given by
Let
It follows that the MELEs of and are given by
The joint asymptotic normality of relies on linear approximations of . According to [4,17], we have
and
Using the weak law of large numbers and Lemma A1, we further obtain
Similarly, we obtain
Using the central limit theorem, Lemma A2, and Slutsky’s theorem, we have
in distribution, as claimed in Theorem 1. This finishes the proof. □
Appendix B. Proof of Proposition 1
Proof.
The log-likelihood function of based on , up to a constant not depending on , is
For (a), it is easy to see that if or , we have
Next, we concentrate on the cases when and . Denote
Then, and satisfy
It can be easily verified that
and
Therefore, when and , and further , equivalently, we have
When and , and further , equivalently, we have
This finishes the proof of Part (a).
For (b), the likelihood ratio test statistic for testing versus is
which is equal to
After some calculation, the likelihood ratio test statistic can be equivalently written as
This completes the proof of Part (b). □
Appendix C. Numerical Calculations of
For convenience, we write
where is the dual empirical log-likelihood function in (4). Further, let
and
which are the MELEs of and s with the fixed , respectively. Define
where .
For the fixed , the MELE of is
We use the following three steps to find and :
- Step 1.
- Find all s such that
- Step 2.
- Obtain where the maximization is over all s in Step 1.
- Step 3.
- Calculate and
Note that , and can be obtained similarly. For the fixed , we obtain by solving the following equation
With , the estimator of J for the fixed is defined as
We use the following three steps to find and :
- Step 1.
- Find all s such that
- Step 2.
- Obtain where the maximization is over all s in Step 1.
- Step 3.
- Calculate and
References
- Yin, J.; Tian, L. Joint confidence region estimation for area under ROC curve and Youden index. Stat. Med. 2014, 33, 985–1000.
- Yin, J.; Mutiso, F.; Tian, L. Joint hypothesis testing of the area under the receiver operating characteristic curve and the Youden index. Pharm. Stat. 2021, 20, 657–674.
- Pepe, M.S. Receiver operating characteristic methodology. J. Am. Stat. Assoc. 2000, 95, 308–311.
- Qin, J.; Zhang, B. Using logistic regression procedures for estimating receiver operating characteristic curves. Biometrika 2003, 90, 585–596.
- Zhou, X.H.; Obuchowski, N.A.; McClish, D.K. Statistical Methods in Diagnostic Medicine, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2011.
- Chen, B.; Li, P.; Qin, J.; Yu, T. Using a monotonic density ratio model to find the asymptotically optimal combination of multiple diagnostic tests. J. Am. Stat. Assoc. 2016, 111, 861–874.
- Bradley, A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997, 30, 1145–1159.
- Faraggi, D.; Reiser, B. Estimation of the area under the ROC curve. Stat. Med. 2002, 21, 3093–3106.
- Youden, W.J. Index for rating diagnostic tests. Cancer 1950, 3, 32–35.
- Fluss, R.; Faraggi, D.; Reiser, B. Estimation of the Youden Index and its associated cutoff point. Biom. J. 2005, 47, 458–472.
- Schisterman, E.F.; Perkins, N.J.; Liu, A.; Bond, H. Optimal cut-point and its corresponding Youden index to discriminate individuals using pooled blood samples. Epidemiology 2005, 16, 73–81.
- Lavrentieva, A.; Kontakiotis, T.; Lazaridis, L.; Tsotsolis, N.; Koumis, J.; Kyriazis, G.; Bitzani, M. Inflammatory markers in patients with severe burn injury: What is the best indicator of sepsis? Burns 2007, 33, 189–194.
- Bantis, L.E.; Nakas, C.T.; Reiser, B. Construction of confidence regions in the ROC space after the estimation of the optimal Youden index-based cut-off point. Biometrics 2014, 70, 212–223.
- Wotschofsky, Z.; Busch, J.; Jung, M.; Kempkensteffen, C.; Weikert, S.; Schaser, K.D.; Melcher, I.; Kilic, E.; Miller, K.; Kristiansen, G. Diagnostic and prognostic potential of differentially expressed miRNAs between metastatic and non-metastatic renal cell carcinoma at the time of nephrectomy. Clin. Chim. Acta 2013, 416, 5–10.
- Jiang, S.; Tu, D. Inference on the probability P(T1<T2) as a measurement of treatment effect under a density ratio model and random censoring. Comput. Stat. Data Anal. 2012, 56, 1069–1078.
- Wang, C.; Marriott, P.; Li, P. Testing homogeneity for multiple nonnegative distributions with excess zero observations. Comput. Stat. Data Anal. 2017, 114, 146–157.
- Yuan, M.; Li, P.; Wu, C. Semiparametric inference of the Youden index and the optimal cut-off point under density ratio models. Can. J. Stat. 2021, 49, 965–986.
- Anderson, J.A. Multivariate logistic compounds. Biometrika 1979, 66, 17–26.
- Qin, J.; Zhang, B. A goodness-of-fit test for logistic regression models based on case-control data. Biometrika 1997, 84, 609–618.
- Qin, J. Biased Sampling, Over-Identified Parameter Problems and Beyond; Springer: Singapore, 2017.
- Hu, D.; Yuan, M.; Yu, T.; Li, P. Statistical inference for the two-sample problem under likelihood ratio ordering, with application to the ROC curve estimation. Stat. Med. 2023, 42, 3649–3664.
- Zhang, B. A semiparametric hypothesis testing procedure for the ROC curve area under a density ratio model. Comput. Stat. Data Anal. 2006, 50, 1855–1876.
- Owen, A.B. Empirical Likelihood; Chapman and Hall/CRC: New York, NY, USA, 2001.
- Cai, S.; Chen, J.; Zidek, J.V. Hypothesis testing in the presence of multiple samples under density ratio models. Stat. Sin. 2017, 27, 761–783.
- Hsieh, F.; Turnbull, B.W. Nonparametric methods for evaluating diagnostic tests. Stat. Sin. 1996, 6, 47–62.
- Percy, M.E.; Andrews, D.F.; Thompson, M.W. Duchenne muscular dystrophy carrier detection using logistic discrimination: Serum creatine kinase, hemopexin, pyruvate kinase, and lactate dehydrogenase in combination. Am. J. Med. Genet. 1982, 13, 27–38.
- Andrews, D.F.; Herzberg, A.M. Data: A Collection of Problems from Many Fields for the Student and Research Worker; Springer: New York, NY, USA, 2012.
- Yin, J.; Samawi, H.; Tian, L. Joint inference about the AUC and Youden index for paired biomarkers. Stat. Med. 2022, 41, 37–64.
- Wang, J.; Yin, J.; Tian, L. Evaluating joint confidence region of hypervolume under ROC manifold and generalized Youden index. Stat. Med. 2024, 43, 869–889.
- McClish, D.K. Analyzing a portion of the ROC curve. Med. Decis. Mak. 1989, 9, 190–195.
- Jiang, Y.; Metz, C.E.; Nishikawa, R.M. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 1996, 201, 745–750.
- Zhang, D.D.; Zhou, X.H.; Freeman, D.H., Jr.; Free, J.L. A non-parametric method for the comparison of partial areas under ROC curves and its application to large health care data sets. Stat. Med. 2002, 21, 701–715.
- Dodd, L.E.; Pepe, M.S. Partial AUC estimation and regression. Biometrics 2003, 59, 614–623.
- Ma, H.; Bandos, A.I.; Rockette, H.E.; Gur, D. On use of partial area under the ROC curve for evaluation of diagnostic performance. Stat. Med. 2013, 32, 3449–3458.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).