1. Introduction
Laboratory assays for blood-based tumor markers often show considerable overlap between cancer and non-cancer cohorts, requiring a tradeoff between sensitivity and specificity when defining an appropriate cutoff. To directly compare different diagnostic tools, the so-called receiver operating characteristic (ROC) curve has traditionally been used, but it provides limited information about a single biomarker profile and does not include the cutoff distributions across the whole range of possible cutoffs.
Traditional ROC curves were only used for investigations of sensitivity/specificity. They were first developed during World War II to assess the ability of radar operators to differentiate signals (e.g., enemy aircraft) from noise (e.g., a flock of birds), and they were later applied in signal theory, leading to the name “receiver operating characteristic” [1,2,3].
In a traditional ROC curve, the relationship between the sensitivity and specificity is shown by plotting such parameters against one another; when increasing the cutoff concentration value, the sensitivity decreases while the specificity increases. Typically, both values range between 0 and 1, and the sensitivity (true positive (TP) rate) is plotted against 1 minus the specificity (false positive (FP) rate).
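As a minimal illustration of how the points of a traditional SS–ROC curve arise, the sketch below counts TP and FP results at each cutoff. It assumes that a concentration at or above the cutoff is called positive. The function name `roc_points` and the concentration values are hypothetical and are not the study data.

```python
from typing import List, Sequence, Tuple


def roc_points(
    cancer: Sequence[float],
    control: Sequence[float],
    cutoffs: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (cutoff, 1 - specificity, sensitivity) for each cutoff.

    A result is counted as positive when the concentration is >= cutoff.
    """
    points = []
    for c in cutoffs:
        tp = sum(x >= c for x in cancer)    # true positives
        fp = sum(x >= c for x in control)   # false positives
        sensitivity = tp / len(cancer)      # TP rate
        fp_rate = fp / len(control)         # FP rate = 1 - specificity
        points.append((c, fp_rate, sensitivity))
    return points


# Hypothetical concentrations in µg/L (illustration only)
cancer = [4.0, 12.0, 35.0, 60.0, 120.0, 280.0]
control = [1.0, 2.5, 3.0, 6.0, 9.0, 15.0, 40.0, 95.0]

for cutoff, fp_rate, sens in roc_points(cancer, control, [5, 10, 50, 100]):
    print(f"cutoff {cutoff:>3} µg/L: 1 - spec = {fp_rate:.3f}, sens = {sens:.3f}")
```

Raising the cutoff moves the point toward the origin: both the TP rate and the FP rate fall, which is the sensitivity/specificity tradeoff described above.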
The potential of ROC curves for medical diagnostics was first highlighted with regard to radiology by Lusted in 1960 [4]. However, their wider adoption came only after the publication of the work of Swets and Pickett [5]. During their later extension to other medical fields, these curves were used extensively in radiology to evaluate medical imaging devices [6].
The ROC curve is also known as the relative operating characteristic curve because it compares two operating characteristics (TP rate and FP rate) at varied cutoff values [7]. The first ROC curves for clinical biomarker diagnosis were introduced in 1981 [8]. They included sensitivity–specificity ROC curves for the diagnostic analysis of in vitro bioassays in patients with tumors (lung, urinary bladder and testicular carcinomas, in comparison to control groups with and without benign diseases). These curves were developed from inverse distribution functions to improve the comparison of biomarker assays by excluding the differing concentration parameters of the individual bioassays, and they were used to avoid the incorrect evaluation of two or more biomarkers when comparing their sensitivities at different specificities, a practice that was common at that time (the article is available online via ResearchGate under the authors Oehr, Derigs and Altmann). Initially, the curves were called sensitivity–specificity diagrams, but after encountering ROC curves in the literature, the author renamed them ROC curves as well. Today, these curves are also referred to as sensitivity/specificity ROC (SS–ROC) curves, because other types of ROC curves have been introduced in recent years.
In 2008, Shiu and Gatsonis were the first to present ROC curves for the joint study of the positive predictive values (PPVs) and negative predictive values (NPVs) of diagnostic tests. Using a mathematical approach, they defined the “Predictive Receiver Operating Characteristic Curve” (PROC) as the curve consisting of all possible pairs of PPVs and NPVs as the threshold for test positivity varies [9]. Measures of test performance, such as sensitivity, specificity or ROC curves, provide the type of information that is typically needed for technology assessment and health policy purposes. Measures of predictive value provide the type of information that is typically needed for clinical decision making, where clinicians and patients decide whether to use a test or how to assess the implications of a test result [9].
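The PPV/NPV pairs that make up a PROC curve can be computed from the same counts as an SS–ROC curve. The sketch below assumes the same positivity rule (concentration ≥ cutoff) and returns `None` where a predictive value is undefined because no subject falls on that side of the cutoff. The function `proc_points` and the data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def proc_points(
    cancer: Sequence[float],
    control: Sequence[float],
    cutoffs: Sequence[float],
) -> List[Tuple[float, Optional[float], Optional[float]]]:
    """Return (cutoff, 1 - NPV, PPV); None marks an undefined value."""
    points = []
    for c in cutoffs:
        tp = sum(x >= c for x in cancer)
        fp = sum(x >= c for x in control)
        fn = len(cancer) - tp
        tn = len(control) - fp
        ppv = tp / (tp + fp) if tp + fp else None   # undefined: no positive results
        npv = tn / (tn + fn) if tn + fn else None   # undefined: no negative results
        points.append((c, None if npv is None else 1 - npv, ppv))
    return points


# Hypothetical concentrations in µg/L (illustration only)
cancer = [4.0, 12.0, 35.0, 60.0, 120.0, 280.0]
control = [1.0, 2.5, 3.0, 6.0, 9.0, 15.0, 40.0, 95.0]

for cutoff, one_minus_npv, ppv in proc_points(cancer, control, [10, 50, 1000]):
    print(cutoff, one_minus_npv, ppv)
```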
In 2012 and 2020, Special Issues on the application of information theory to epidemiology were published, including further mathematical and the first empirical PV–ROC curves for bioassays [10,11,12,13].
These graphs were plotted without integrated cutoff distribution curves. This deficiency was overcome by integrating the cutoff value distributions into the sensitivity–specificity/predictive value (SS/PV) ROC curves and the newly established sensitivity–specificity–Youden/predictive value–predictive summary index (SS-J/PV–PSI) ROC curves [14]. While ROC curves without cutoff distribution curves are useful for the direct comparison of different markers and tests, the concentration distribution curves contribute additional valuable information when a biomarker is profiled with additional parameters such as precision and accuracy. This is particularly important when ROC curves that include both the specificity and the NPV on the x-axis are plotted in a single graph, because the threshold distribution curves run in opposite directions [14].
The novelty of this work compared to existing methodology lies in two new, previously unpublished ROC curves for the parameters of precision and accuracy. This work is not intended to provide a detailed clinical evaluation of a specific test or biomarker. Rather, the data are used to explain the construction of the novel ROC curves and to demonstrate the different characteristic distributions of all the mentioned ROC curves and their concerted applicability. First, definitions and formulas for the diagnostic parameters are given (Section 2); then, examples of different combinations of ROC curves are presented. This work considers how the sensitivity/specificity, accuracy, precision and predictive values are related to each other within a bioassay and how these relationships change at different cutoff values. The discussion focuses on the importance of using accuracy as a defined diagnostic parameter, instead of assessing “bioassay accuracy” through some combination of single-point PV or sensitivity/specificity values and a traditional ROC curve, with the maximal Youden index taken as the “optimal cutoff” value. Another section discusses the clinical value of the maximal Youden index as a “diagnostic parameter” compared with the maximal values of the indices for precision, accuracy and predictive values as an alternative. Finally, a new, diagnostically optimized cutoff value is derived, which takes all indices into account and, in contrast to the maximal Youden index alone, provides the type of information needed for clinical decision making.
3. Results
Table 1 shows the results after the TP/TN and FP/FN values were used to calculate the sensitivity/specificity, accuracy, precision and predictive values at concentrations of 5, 10, 30, 50, 90, 110, 250 and 300 µg/L.
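The calculations behind such a table can be sketched as follows, using the definitions given in this work for precision (TP/(TP + FP), identical to the PPV) and accuracy ((TP + TN)/(TP + TN + FP + FN)) together with the standard definitions of sensitivity, specificity and NPV. The `Counts` type, the function name and the example counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Counts:
    """2 x 2 classification counts at one cutoff."""
    tp: int
    fp: int
    tn: int
    fn: int


def diagnostic_parameters(n: Counts) -> Dict[str, float]:
    """Diagnostic parameters at one cutoff; assumes no denominator is zero."""
    total = n.tp + n.fp + n.tn + n.fn
    return {
        "sensitivity": n.tp / (n.tp + n.fn),
        "specificity": n.tn / (n.tn + n.fp),
        "precision_ppv": n.tp / (n.tp + n.fp),   # precision = PPV
        "npv": n.tn / (n.tn + n.fn),
        "accuracy": (n.tp + n.tn) / total,       # correctly classified / all subjects
    }


# Hypothetical counts at a single cutoff (not the study data)
params = diagnostic_parameters(Counts(tp=30, fp=20, tn=180, fn=20))
for name, value in params.items():
    print(f"{name}: {value:.2f}")
```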
Figure 1 presents a novel ROC curve that, in addition to the sensitivity/specificity, displays precision and accuracy as a function of 1 − specificity. Moving from the left side of the graph to the right, that is, from 300 µg/L to 0 µg/L, the sensitivity increases along the y-axis from 0 to 1, while the precision and accuracy, diametrically opposed to the sensitivity, decrease from 0.5 and 0.9, respectively, to 0 along the x-axis. In contrast to the sensitivity curve, which lies above the diagonal line and has an area under the curve (AUC), the precision and accuracy curves intersect the diagonal. While the precision curve declines steeply at the beginning, the accuracy curve is approximately linear.
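One way to see why the accuracy curve is nearly linear while precision drops steeply is sketched below; this is an illustrative aside under stated assumptions, not the author's derivation. If the study prevalence is p, then accuracy = p·sensitivity + (1 − p)·specificity and precision = p·sensitivity/(p·sensitivity + (1 − p)·(1 − specificity)). With a low prevalence such as the 0.073 quoted for this study, accuracy is dominated by the linear term (1 − p)·specificity, whereas precision is dominated by the FP term in its denominator as soon as 1 − specificity grows. At 1 − specificity = 1 (every subject positive), both expressions equal the prevalence. The concave `toy_sensitivity` curve is purely hypothetical.

```python
import math


def accuracy_from_rates(sens: float, fp_rate: float, p: float) -> float:
    """(TP + TN) / N expressed through rates: p*sens + (1 - p)*spec."""
    return p * sens + (1 - p) * (1 - fp_rate)


def precision_from_rates(sens: float, fp_rate: float, p: float) -> float:
    """TP / (TP + FP) with TP = p*sens*N and FP = (1 - p)*fp_rate*N."""
    tp = p * sens
    fp = (1 - p) * fp_rate
    return tp / (tp + fp) if tp + fp > 0 else math.nan


def toy_sensitivity(fp_rate: float) -> float:
    # Hypothetical concave ROC shape, for illustration only
    return fp_rate ** 0.3


PREVALENCE = 0.073  # prevalence value quoted in this work

for x in (0.01, 0.05, 0.2, 0.5, 1.0):
    s = toy_sensitivity(x)
    print(
        f"1 - spec = {x:.2f}: sens = {s:.2f}, "
        f"precision = {precision_from_rates(s, x, PREVALENCE):.3f}, "
        f"accuracy = {accuracy_from_rates(s, x, PREVALENCE):.3f}"
    )
```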
Figure 2 contains two different distribution curves. The values plotted on the y-axis are identical, both calculated as precision = PPV using the formula TP/(TP + FP), and they refer to the same varied cutoff concentrations. These values are plotted as a function of either 1 − specificity or 1 − NPV. In the first case, the curve is called a precision ROC (PR–ROC) curve; in the second case, it is a predictive value ROC (PV–ROC) curve. The “precision” curve decreases from the left- to the right-hand side of the graph, whereas the “PPV” values run in the opposite direction. The distributions of their cutoff values, shown at the bottom of the graph, run in opposite directions as well. For these reasons, the same cutoff is located at different points on the two curves, as demonstrated in the figure for a cutoff of 10 µg/L. The shapes and extents of the two curves are entirely different. The “precision–specificity” curve extends between 0 and 1 on the x-axis and has a maximal “precision” value of approximately 0.5, whereas the “PV–ROC” curve is located only between 0 and 0.075 on the x-axis, and its maximal “PPV” value is also approximately 0.5.
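The narrow x-range of the PV–ROC curve follows from the prevalence: at a cutoff above every measured value, all subjects test negative, so 1 − NPV = FN/(FN + TN) equals the prevalence, while at the lowest cutoff no subject tests negative and the NPV is undefined. The sketch below uses hypothetical data with 3 cases among 40 subjects (prevalence 0.075). In this example, 1 − specificity falls and 1 − NPV rises as the cutoff increases, so the cutoff distributions on the two x-axes run in opposite directions. Note that 1 − specificity is always non-increasing in the cutoff, whereas 1 − NPV is not guaranteed to be monotone for every data set. The function `pr_vs_pv_axes` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def pr_vs_pv_axes(
    cancer: Sequence[float],
    control: Sequence[float],
    cutoffs: Sequence[float],
) -> List[Tuple[float, Optional[float], float, Optional[float]]]:
    """Return (cutoff, PPV, 1 - specificity, 1 - NPV) for each cutoff."""
    rows = []
    for c in cutoffs:
        tp = sum(x >= c for x in cancer)
        fp = sum(x >= c for x in control)
        fn, tn = len(cancer) - tp, len(control) - fp
        ppv = tp / (tp + fp) if tp + fp else None
        fp_rate = fp / len(control)                          # PR-ROC x-axis
        one_minus_npv = fn / (fn + tn) if fn + tn else None  # PV-ROC x-axis
        rows.append((c, ppv, fp_rate, one_minus_npv))
    return rows


# Hypothetical data: 3 cases among 40 subjects (prevalence 0.075)
cancer = [20.0, 45.0, 80.0]
control = [float(v) for v in range(1, 38)]   # 37 controls, 1 to 37 µg/L

for row in pr_vs_pv_axes(cancer, control, [0, 10, 30, 50, 100]):
    print(row)
```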
Figure 3 presents the ROC curves investigated in this study and includes the distributions of the sensitivity, precision, accuracy and PV–ROC curves, including their corresponding UBC cutoff values, along the y-axis and at 1 − specificity and 1 − NPV on the x-axis.
Table 2 contains the values of the specificity, precision, sensitivity, accuracy and predictive indices for the UBC® Rapid test at the corresponding increasing cutoff values from 0.5 µg/L to 300 µg/L.
In order to achieve optimized scaling in the corresponding index diagram derived from Table 2, which is shown in Figure 4, a cutoff value of 1/1000 is applied.
Figure 4a,b show the curve distributions derived from Table 2. With increasing cutoff values, the Youden index curve rises immediately to a maximum value of 0.425 at 10 µg/L and then decreases continuously.
In contrast, at this cutoff of 10 µg/L, the precision and accuracy values show no diagnostic utility, while the predictive value–predictive summary index reaches only approximately half of its optimal value. Up to a cutoff of 100 µg/L, the index curves for precision, accuracy and PPV continue to increase; thereafter, the curves flatten and approach their maximum values at approximately 250 µg/L.
At this cutoff level, the SS–J index is only 0.0558, whereas the PV index is 0.56, the AC–SP index is 0.9 and the PV–PSI is 0.459.
Figure 4b illustrates the cutoff value for the intersection of the SS, PR and PPV curves, which is approximately 40 µg/L. This cutoff corresponds to the following estimated values at 38% sensitivity and 94.2% specificity: SS–J index = 0.32; PR–SP index = 0.31; AC–SP index = 0.84; and PV–PSI = 0.58.
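The SS–J values can be cross-checked against the standard definitions, assumed here: Youden index J = sensitivity + specificity − 1 and predictive summary index PSI = PPV + NPV − 1. Combining the 38% sensitivity and 94.2% specificity reported for approximately 40 µg/L reproduces the SS–J index of 0.32, and the sensitivity of 0.66 and specificity of 76% reported elsewhere in this work for 10 µg/L give J ≈ 0.42. The function names are hypothetical.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """SS-J index: sensitivity + specificity - 1 (range -1 to 1)."""
    return sensitivity + specificity - 1


def predictive_summary_index(ppv: float, npv: float) -> float:
    """PV-PSI: PPV + NPV - 1 (range -1 to 1)."""
    return ppv + npv - 1


# Sensitivity/specificity pairs quoted in the text
j_40 = youden_index(0.38, 0.942)   # cutoff of approximately 40 µg/L
j_10 = youden_index(0.66, 0.76)    # cutoff of 10 µg/L
print(round(j_40, 3), round(j_10, 2))  # 0.322 0.42

# Hypothetical predictive values, for illustration only
print(round(predictive_summary_index(0.6, 0.9), 2))  # 0.5
```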
4. Discussion and Conclusions
In 1960, the potential utility of SS–ROC curves for medical diagnostics was first observed. During their later extension to other medical fields, these curves became widely used in radiology to evaluate medical imaging devices [4,5,6]. According to Swift, the ROC curve is also known as the relative operating characteristic curve because it presents a comparison of two operating characteristics (TP rate and FP rate) at varied cutoff values [7].
In 2008, Shiu and Gatsonis presented a joint study of the PPVs and NPVs of diagnostic tests; this was the first publication in which, using a mathematical approach, the predictive receiver operating characteristic was described in the form of a PROC curve consisting of all possible pairs of PPVs and NPVs as the threshold for test positivity varies [9].
Within the present work, two new ROC curves were introduced, concerning precision and accuracy as a function of 1 − specificity, together with the curve for sensitivity and the common cutoff distribution curve. Here, precision is the proportion of positive test results that are true positives, TP/(TP + FP), and the accuracy of a test is its ability to correctly differentiate between patients and healthy cases, reflecting its overall correctness. Understanding these metrics is crucial for medical diagnostics. Moving from the left side of the graph to the right (Figure 1), with decreasing cutoff concentrations depicted along the x-axis, the sensitivity increases while the precision and accuracy decrease. In contrast to the sensitivity curve, which lies above the diagonal line and has an AUC, the precision and accuracy curves intersect the diagonal and do not have an AUC. Moreover, while the precision curve declines steeply at the beginning, the accuracy curve is approximately linear.
Figure 2 contains two different ROC curves that display the results of the TP/(TP + FP) calculations as a function of either 1 − specificity or 1 − NPV. Plotted against 1 − specificity, the value distribution is the same as that shown in Figure 1, and the curve is called the PR–SP–ROC curve. Plotted against 1 − NPV, it is the PV–ROC curve. While the PR–SP–ROC curve is described for the first time in this work, the PV–ROC curve is already established [9,12,13], and the PV–ROC curve shown in Figure 2 and Figure 3 has also been published as part of an SS–PV–ROC curve [14]. However, a comparison such as that in Figure 2 and Figure 3, which demonstrates the different properties of ROC curves when the precision (=PPV) is plotted as a function of either 1 − specificity or 1 − NPV, has not been presented before.
The curves for the sensitivity, specificity and predictive values, plotted in Figure 1 and Figure 2, have different values at different cutoff levels. As shown in Table 1, at a cutoff of 10 µg/L, the sensitivity is high (0.66) and the precision is low (0.19); at a cutoff of 50 µg/L, the sensitivity and precision are similar (0.33 and 0.39); and at a cutoff of 90 µg/L, the sensitivity is lower than the precision (0.51). The situation becomes even more complex when the cutoff values run in opposite directions in the graph (Figure 2 and Figure 3). For this reason, the cutoff values at which the sensitivity, precision, accuracy and predictive values reach their maxima differ, and four different maximal index values can be calculated. This indicates that the former method of selecting only the maximal Youden index as the optimal diagnostic cutoff value may be insufficient.
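A small sketch of this point uses hypothetical data, the standard definitions J = sensitivity + specificity − 1 and PSI = PPV + NPV − 1, and the accuracy formula used in this work. It shows that the cutoff maximizing the Youden index need not be the one maximizing accuracy or the PSI. The helper names `index_profile` and `best_cutoffs` are hypothetical.

```python
from typing import Dict, Optional, Sequence


def index_profile(
    cancer: Sequence[float], control: Sequence[float], cutoff: float
) -> Dict[str, Optional[float]]:
    """Youden index, accuracy and PSI at one cutoff (positive: value >= cutoff)."""
    tp = sum(x >= cutoff for x in cancer)
    fp = sum(x >= cutoff for x in control)
    fn, tn = len(cancer) - tp, len(control) - fp
    ppv = tp / (tp + fp) if tp + fp else None
    npv = tn / (tn + fn) if tn + fn else None
    return {
        "youden": tp / len(cancer) + tn / len(control) - 1,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "psi": None if ppv is None or npv is None else ppv + npv - 1,
    }


def best_cutoffs(
    cancer: Sequence[float], control: Sequence[float], cutoffs: Sequence[float]
) -> Dict[str, float]:
    """Cutoff at which each index is maximal (undefined values are skipped)."""
    profiles = {c: index_profile(cancer, control, c) for c in cutoffs}
    best = {}
    for name in ("youden", "accuracy", "psi"):
        defined = [(v[name], c) for c, v in profiles.items() if v[name] is not None]
        best[name] = max(defined)[1]
    return best


# Hypothetical data (not the study data)
cancer = [20.0, 45.0, 80.0]
control = [float(v) for v in range(1, 38)]
print(best_cutoffs(cancer, control, [0, 10, 30, 50, 100]))
```

In this toy example, the Youden index peaks at 30 µg/L, whereas accuracy and the PSI both peak at 50 µg/L.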
Figure 3 includes a summary of all ROC curves described in this study, demonstrating that any of the new ROC curves can be compared at various cutoff levels. This type of figure can be regarded as a diagnostic multi-parameter biomarker profile, which is given in a single graph, and can be used to obtain the necessary information according to the specific diagnostic goals within a clinical study.
Concerning the previous literature on the UBC® Rapid bioassay, authors often calculate the Youden index to determine the “optimal cutoff value”; in addition, they present some single-point determinations of the predictive values at the cutoff given by the Youden index.
Styrke et al. [17] calculated an optimal threshold value of ≥8.1 µg/L, resulting in a sensitivity of 70.8%, specificity of 61.4%, a PPV of 71.3% and an NPV of 60.8%. Ritter et al., “using the optimal threshold obtained by receiver operating characteristic analysis (12.3 mg/L)”, stated that the sensitivity, specificity, PPV and NPV of the quantitative UBC® Rapid test were 60.7%, 70.1%, 46.8% and 79.3%, respectively [18]. Pichler et al. [19] reported the best cutoff (highest Youden index; ≥6.7 ng/mL) for the quantitative UBC, which was determined using the ROC curves. For the quantitative UBC® Rapid test, the sensitivity, specificity, PPV and NPV were 64.5%, 81.8%, 71.4% and 76.6%, respectively. According to the data presented in the present work, the maximal Youden index at 10 µg/L leads to 76% specificity.
Comparing single-point determinations at selected cutoff values provides insufficient information about the diagnostic utility of a test because, as shown in this study, different curves have different distribution characteristics at various cutoff levels. Accordingly, single-point determinations of any diagnostic parameter at a fixed specificity or NPV cannot provide relevant information, regardless of whether a maximal Youden cutoff value or any other single-point determination is located within the optimal area of any of the mentioned ROC curves.
The calculation of the maximal Youden index also does not consider whether the resulting cutoff has low specificity. For the published “optimal cutoff values”, the result is suboptimal specificity. The low specificity of these cutoffs leads to an increased number of FP results, which could result in unnecessary invasive diagnostics during the clinical follow-up of patients suspected to have cancer. Furthermore, the PPV and NPV should not be reported at the optimal cutoff derived from the “receiver operating characteristic analysis for ROC curves describing the sensitivity as a function of the specificity”. Instead, it is appropriate to consider the PPV as a function of 1 − NPV, e.g., by calculating the PSI [16]. To provide an appropriate, unified threshold for the UBC® Rapid test, threshold estimations from an SS/PV–ROC plot were proposed in 2020 [13] and later published; these included the maximal unified SS–J/PV–PSI value of 0.32 at the cutoff of 43 µg/L [14].
A direct comparison and interpretation of several ROC curves within a single graph is complicated, especially when the cutoff distributions run in opposite directions. To solve this problem, the ROC curves shown in Figure 3 were transformed into their corresponding index ROC curves and plotted as an index cutoff diagram, distributed across the complete cutoff range of the quantitative assay. This new tool [14] is useful for the simultaneous comparison and selection of unified or separate optimal cutoff values for all mentioned diagnostic parameters; it can also be used to determine whether a common cutoff level can be applied to clinical procedures such as diagnosis, follow-up or screening at the optimal specificity and/or NPV. At present, quantitative evaluations are made by calculating the area “over the diagonal”, called the area under the curve (AUC), which is often used as a measure of a test’s performance. Because the ROC curves for accuracy, precision and positive predictive values cross the diagonal, no comparable AUC can be calculated for them. The proposed multi-parameter cutoff-index diagram includes novel index cutoff AOX curves, which can be used for comparative quantitative evaluations by calculating the area over the x-axis (AOX) for sensitivity, accuracy, precision and predictive values. This new method allows a quantitative comparison of results from multi-parameter ROC curves, which cannot be performed with the traditional AUC; the two methods differ but do not exclude each other. Complete or partial areas over the x-axis can be calculated for summarized quantitative evaluations of effectiveness, with respect to single and/or unified indices and single, separate or several unified cutoff thresholds. This offers an alternative to the AUC, which can only be derived from an SS–ROC curve.
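The integration scheme for the AOX is not spelled out in the text. A minimal sketch, assuming the trapezoidal rule over the sampled cutoff range and linear interpolation at the bounds of a partial area, could look as follows. The function `aox` is hypothetical; it returns a signed area, so index values below zero (possible for the Youden index or the PSI) reduce the total.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def _interpolate(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Linear interpolation of y at x; xs ascending, xs[0] <= x <= xs[-1]."""
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def aox(
    cutoffs: Sequence[float],
    index_values: Sequence[float],
    lo: Optional[float] = None,
    hi: Optional[float] = None,
) -> float:
    """Signed trapezoidal area between an index curve and the x-axis.

    Hypothetical helper: cutoffs must be ascending, and lo <= hi must lie
    within [cutoffs[0], cutoffs[-1]]; omitting lo/hi gives the complete area.
    """
    lo = cutoffs[0] if lo is None else lo
    hi = cutoffs[-1] if hi is None else hi
    pts = [(lo, _interpolate(cutoffs, index_values, lo))]
    pts += [(x, y) for x, y in zip(cutoffs, index_values) if lo < x < hi]
    pts.append((hi, _interpolate(cutoffs, index_values, hi)))
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical index curve sampled at four cutoffs (µg/L)
cutoffs = [0, 10, 40, 100]
index = [0.0, 0.4, 0.3, 0.1]
print(aox(cutoffs, index))           # complete area
print(aox(cutoffs, index, 10, 40))   # partial area between 10 and 40 µg/L
```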
The maximal values of the Youden index do not coincide with the maximal values of the other indices. However, based on the curve distributions, a common optimal cutoff value for all diagnostic parameters could be derived that offers higher specificity and an acceptable NPV. This value was found at the intersection of the SS, PR and PPV curves at approximately 40 µg/L and corresponds to the following estimated values at 38% sensitivity and 94.2% specificity: SS–J index, 0.32; PR–SP index, 0.31; AC–SP index, 0.84; and PV–PSI, 0.58 (Figure 4b).
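Locating such an intersection from index values tabulated at discrete cutoffs can be sketched with linear interpolation between neighboring cutoffs. The helper `crossings` and the two curves in the example are hypothetical; for three curves, the pairwise crossings can be compared to find a cutoff at which they nearly coincide.

```python
from typing import List, Sequence


def crossings(
    cutoffs: Sequence[float], a: Sequence[float], b: Sequence[float]
) -> List[float]:
    """Cutoffs at which two index curves, sampled at the same cutoffs, intersect.

    Between neighbouring cutoffs both curves are treated as straight lines.
    """
    found = []
    for i in range(len(cutoffs) - 1):
        d0 = a[i] - b[i]
        d1 = a[i + 1] - b[i + 1]
        if d0 == 0:
            found.append(cutoffs[i])          # curves meet at a sample point
        elif d0 * d1 < 0:                     # sign change: crossing in between
            t = d0 / (d0 - d1)
            found.append(cutoffs[i] + t * (cutoffs[i + 1] - cutoffs[i]))
    if a[-1] == b[-1]:
        found.append(cutoffs[-1])
    return found


# Hypothetical index curves: one falling, one rising with the cutoff
cutoffs = [10, 30, 50]
falling = [0.50, 0.375, 0.25]
rising = [0.0, 0.25, 0.50]
print(crossings(cutoffs, falling, rising))
```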
One of the goals of this study was to introduce new ROC curves for precision and accuracy in order to evaluate their relationships with the previously published SS– and PV–ROC curves, including various cutoff distribution values. This was intended to reveal whether they could share a common optimal cutoff value and whether a single maximum Youden index is still the optimal choice.
Further goals were to demonstrate that the current practice of evaluating and characterizing quantitative diagnostic assays with a ROC curve and single-point determinations derived from a clinical study cannot provide diagnostic information of the same quality as a multi-parameter diagnostic profile or a cutoff index diagram, and to propose a transparent method for identifying appropriate cutoffs for multiple diagnostic parameters.
The final goal was to demonstrate that the use of individualized definitions for the characterization of diagnostic parameters leads to confusion in the scientific literature and should be avoided in the future. For the data presented in this work, the term “accuracy” was examined, using the UBC® Rapid test for bladder cancer as an example.
In statistics, the definition of “accuracy” is governed by the International Organization for Standardization (ISO) and is as follows: “Accuracy is the proximity of measurement results to the true value” [20]. In the present work, “accuracy” refers to “diagnostic accuracy” and is expressed as the proportion of correctly classified subjects among all subjects, calculated using the formula (TP + TN)/(TP + TN + FP + FN). However, many scientific and medical publications, when defining “accuracy”, refer to the common everyday definition (i.e., “the quality of being correct or true to some objective standard”). In the more recent literature, the ISO definition is used to define “accuracy”, particularly in publications concerning bioassays related to urinary bladder cancer. Here, an example of the common use of this term is given from the publication “Evaluation of the diagnostic accuracy of UBC® Rapid in bladder cancer: a Swedish multicentre study” [17]. According to the authors, the aims of their study were to investigate the diagnostic accuracy of the UBC® Rapid test in patients with primary bladder cancer, patients with a history of bladder cancer, those with a benign urological disease and healthy controls, based on the optimal cutoff value for the study population. They also sought to compare the test results in high- and low-risk urothelial tumors. The accuracy was described in terms of sensitivity, specificity and predictive values. The diagnostic accuracy of the UBC® Rapid test in all cases was provided in tables based on cutoffs for the sensitivity, specificity, PPV and NPV. One of the published figures provided results for the PPV of the UBC® Rapid test at four different threshold concentrations. The “primary cutoff value” (also called the “optimal cutoff”) used for the calculation of the “diagnostic accuracy” and the predictive values was “based on an optimal cutoff (receiver operator characteristics curve analysis)”. However, according to ISO 5725-1 [20], the general term “accuracy” describes the closeness of a measurement to the true value, whereas indices such as the Youden index, derived from SS–ROC curves, measure the effectiveness of a diagnostic marker and permit the selection of an optimal sensitivity/specificity threshold value or cutoff point for the biomarker of interest. They cannot be used as a measure of accuracy, and the same applies to predictive values.
Although the new methods described in this manuscript represent progress in the evaluation of bioassays, prevalence should be mentioned as a limiting factor. PV–ROC curves from studies with different prevalences cannot be directly compared with respect to precision, accuracy and predictive values, because their curve distributions vary with prevalence [9,12]. According to preliminary, unpublished data collected by the author of this manuscript, this is also the case for the PRC– and AC–ROC curves. The prevalence value of 0.073 mentioned in this study reflects the daily routines of urological facilities and can already be regarded as valuable information for the practicing urologist.
PROC curves for different prevalence values may allow a preview of the likely extent of differences between the curves for subgroups [12]. For example, the prevalence of bladder cancer is known to differ between males and females [13]. In such a situation, an array of more than one prevalence value is deemed necessary. Because of the dependence of PROC curves on prevalence, Hughes et al. displayed an array of PROC curves for different prevalence values [12]. For this reason, the construction of such arrays for precision and accuracy ROC curves is an important task for extending the applications of the new methods described in this work.
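The prevalence dependence can be made concrete for a single operating point: with sensitivity and specificity held fixed, PPV and NPV follow from Bayes' theorem, so an array over prevalence values can be sketched as below. A full array of PROC curves would repeat this for every cutoff. The test characteristics and prevalences are hypothetical, and the function names are illustrative only.

```python
from typing import Dict, Sequence, Tuple


def predictive_values(sens: float, spec: float, prevalence: float) -> Tuple[float, float]:
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    tp = sens * prevalence
    fp = (1 - spec) * (1 - prevalence)
    fn = (1 - sens) * prevalence
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


def prevalence_array(
    sens: float, spec: float, prevalences: Sequence[float]
) -> Dict[float, Tuple[float, float]]:
    """(PPV, NPV) for one operating point across several prevalence values."""
    return {p: predictive_values(sens, spec, p) for p in prevalences}


# Hypothetical test characteristics; prevalences stand for illustrative subgroups
for p, (ppv, npv) in prevalence_array(0.8, 0.9, [0.03, 0.073, 0.15, 0.5]).items():
    print(f"prevalence {p:.3f}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

As prevalence rises, the PPV increases and the NPV decreases at the same sensitivity and specificity, which is why curves from populations with different prevalences cannot be compared directly.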
The mechanisms by which the curves may change their distributions remain to be investigated. Nevertheless, the new methods presented in this manuscript are applicable for the comparison of different biomarkers within the same study groups, and they are also applicable in other fields of science, e.g., plant epidemiology, machine learning algorithms and neural networks, AI and the economy.