Convergence Behavior of Optimal Cut-Off Points Derived from Receiver Operating Characteristics Curve Analysis: A Simulation Study

The area under the receiver operating characteristics curve is a popular measure of the overall discriminatory power of a continuous variable used to indicate the presence of an outcome of interest, such as disease or disease progression. In clinical practice, the use of cut-off points as benchmark values for further treatment planning is greatly appreciated, despite the loss of information that such a dichotomization implies. Optimal cut-off points are often derived from fixed sample size studies, and the aim of this study was to investigate the convergence behavior of optimal cut-off points with increasing sample size and to explore a heuristic and path-based algorithm for cut-off point determination that targets stagnating cut-off point values. To this end, the closest-to-(0,1) criterion in receiver operating characteristics curve analysis was used, and the heuristic and path-based algorithm aimed at cut-off points that deviated less than 1% from the cut-off point of the previous iteration. Such a heuristic determination stopped after only a few iterations, thereby implying practicable sample sizes; however, the result was, at best, a rough estimate of an optimal cut-off point that was unbiased for a prevalence of 0.5, positively biased for a prevalence smaller than 0.5, and negatively biased for a prevalence larger than 0.5.


Introduction
The search for biomarkers that indicate a clinical outcome of interest (such as disease presence or recurrence) has been an incessant endeavor in medical research during the past decades. Several performance measures quantify a biomarker's added value in predictive modeling, including both categorizing a continuous marker with cut-off points and using the whole of the information that a biomarker provides on a continuous scale [1]. The categorization of continuous variables has the advantage of straightforward implementation in clinical practice at the cost of information loss. The distance of a biomarker value to a given cut-off point may be small or large but still indicates the same classification of the subject, and functional relationships with the outcome of interest are easily disguised [2,3]. A clinical example is the Framingham Risk Score, used for estimating the 10-year cardiovascular risk of an individual, with low (less than 10%), moderate (10-19%), and high (20% or higher) risk categories [4][5][6]. Another example is the Agatston score for coronary calcification, which is classified, for instance, into 0, 1-9, 10-99, 100-399, and 400 or higher [7,8]. Based on the results of the Multi-Ethnic Study of Atherosclerosis, the Framingham Risk Score was extended using the Agatston score, illuminating the differences in the Framingham Risk Score when incorporating or disregarding the Agatston score for coronary calcification [9][10][11].
In diagnostic research, several criteria for cut-off point optimality have been proposed that are based on the receiver operating characteristics (ROC) curve [12,13]. The closest-to-(0,1) criterion [14] and the Youden index [15] indicate the optimal cut-off point as the one closest to perfect discrimination of subjects with or without the condition of interest and the point farthest from no discrimination, respectively [16]. Liu [17] introduced the concordance probability of the dichotomized measure at the optimal cut-off point, which geometrically represents the area of a rectangle below the ROC curve, with the optimal cut-off point as the top-left corner. In Stata, these three criteria are implemented in the command cutpt, with Liu's method used as the default. López-Ratón et al. [18] introduced the R package OptimalCutpoints to select optimal cut-off points. They included criteria based on sensitivity and specificity (e.g., Youden index and closest-to-(0,1) criterion), predictive values, diagnostic likelihood ratios, cost-benefit analysis of the diagnosis, and maximum chi-squared or minimum p-value criterion.
In cancer research, first-in-human dose-finding trials aim to determine a maximal tolerable dose, which is associated with a probability of observing dose-limiting toxicity of 33%. Traditionally, the rule-based 3 + 3 design was used; nowadays, more efficient but computationally more demanding model-based (especially the continual reassessment method) and model-assisted (such as Bayesian optimal interval design) designs are employed [19][20][21].
This study aimed to transfer the idea of up-and-down designs in cancer dose-finding trials (such as the traditional 3 + 3 dose-escalation rule) to cut-off point-finding endeavors in diagnostic research. To achieve this, we investigated the convergence behavior of optimal cut-off points with increasing sample size in a simulation study and explored a heuristic and path-based algorithm for cut-off point determination that targeted stagnating cut-off point values.

Simulation Set-Up
The distribution of scores in subjects with (D1) and without (D0) a target condition can take very different forms. Hypothetical distributions employ normal distributions [12,13,22] and right-skewed distributions [23]. In practice, the abovementioned Agatston scores for coronary calcification are an example of a variable that often follows a right-skewed distribution, as the calcification scores are nonnegative integers, often with an overexpression of zeros in disease-free subjects [10,24]. Four sets of distributions were assumed for D0 and D1. The prevalence of the disease was assumed to be 0.1, 0.3, 0.5, and 0.7, and the number of simulated trials was 1000. An optimal cut-off point according to the closest-to-(0,1) criterion was determined with a minimum sample size of 100 to ensure a minimum of approximately 10 cases. We chose 101 subjects instead of 100 as the starting point to increase the chance of identifying a unique, optimal cut-off point; as the empirical ROC curve is a step function, cutpt may identify more than one closest-to-(0,1) cut-off point, leading to ties and termination of the procedure.

Reproducible Stata codes for all results are available in Supplementary Materials S1, and Stata data files, including optimal cut-off points by trial number and n = 101-801 in increments of 50 subjects, are available in Supplementary Materials S2. All analyses were performed using Stata/MP 17.0 (StataCorp, College Station, TX 77845 USA).
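The set-up described in this section can be illustrated with a minimal Python sketch. It is not the authors' Stata code: the function names `simulate_trial` and `closest_to_01_cutoff`, the choice of D0 ~ N(2, 1) and D1 ~ N(4, 1), the classification rule (positive if the score is at least the cut-off), and the tie-breaking rule (smallest candidate wins) are all assumptions made for illustration only.

```python
import math
import random


def closest_to_01_cutoff(scores, labels):
    """Empirical cut-off whose ROC point lies closest to (0,1).

    A subject is classified as positive when score >= cut-off.
    Ties are resolved by keeping the smallest candidate; this is an
    assumption of the sketch (the source reports that ties stop cutpt).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need subjects with and without the condition")
    best_cut, best_dist = None, math.inf
    for c in sorted(set(scores)):  # every observed value is a candidate
        tp = sum(1 for s, d in zip(scores, labels) if d == 1 and s >= c)
        tn = sum(1 for s, d in zip(scores, labels) if d == 0 and s < c)
        se, sp = tp / n_pos, tn / n_neg
        dist = math.hypot(1.0 - se, 1.0 - sp)  # distance to the point (0,1)
        if dist < best_dist:
            best_cut, best_dist = c, dist
    return best_cut


def simulate_trial(n, prevalence, rng):
    """Draw n subjects; the two normal distributions are hypothetical."""
    labels = [1 if rng.random() < prevalence else 0 for _ in range(n)]
    scores = [rng.gauss(4.0, 1.0) if d else rng.gauss(2.0, 1.0) for d in labels]
    return scores, labels


rng = random.Random(2022)
scores, labels = simulate_trial(101, 0.5, rng)
print(closest_to_01_cutoff(scores, labels))
```

Repeating `simulate_trial` 1000 times per prevalence and sample size would mirror the structure of the simulation, under the stated assumptions about the distributions.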

Criterion for Optimality of a Cut-Off Point
The Stata package cutpt enables cut-off point determination according to the Youden index, closest-to-(0,1) criterion, and Liu's method. We employed the closest-to-(0,1) criterion because of its algorithmic stability when conducting the simulation study, as the non-identifiability of a unique optimal cut-off point, which leads to immediate termination of the algorithm, occurs less often with the closest-to-(0,1) criterion than with the other two methods.

Figure 1. The vertical, dashed line indicates the optimal cut-off point according to the Youden index, which was 3, 3.42, 3.6, and 3.45 for scenarios 1-4, respectively. The vertical, dotted line indicates the optimal cut-off point according to the closest-to-(0,1) criterion, which was 3, 3.18, 3.65, and 3.88 for scenarios 1-4, respectively.


With Se(c) and Sp(c) representing sensitivity (true-positives divided by the sum of true-positives and false-negatives) and specificity (true-negatives divided by the sum of true-negatives and false-positives), respectively, evaluated at cut-off point c, the optimal cut-off point is defined for each of these methods as follows [17]:

$$c_{\text{Youden}} = \arg\max_{c} \{ Se(c) + Sp(c) - 1 \},$$
$$c_{(0,1)} = \arg\min_{c} \sqrt{(1 - Se(c))^2 + (1 - Sp(c))^2},$$
$$c_{\text{Liu}} = \arg\max_{c} \{ Se(c) \cdot Sp(c) \}.$$

True Optimal Cut-Off Points
As the assumed distributions for D0 and D1 are given (Figure 1), the true optimal cut-off points in scenarios 1-4 were evaluated by grid search (Supplementary Materials S1). For the closest-to-(0,1) criterion, the true optimal cut-off points were 3, 3.18, 3.65, and 3.88 for scenarios 1-4, respectively. Notably, the true optimal cut-off points were identical for the Youden index, closest-to-(0,1) criterion, and Liu's method only for homoscedastic scenario 1, whereas these were different for the remaining heteroscedastic scenarios (Table 1). Figure A1 depicts the respective ROC curves for all scenarios.
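A grid search of this kind can be sketched in Python as follows. The distributions are hypothetical stand-ins (two normal distributions chosen so that the homoscedastic case has its optimum at 3); the parameters actually used for scenarios 1-4 are not restated here, and the function name `true_optimal_cutoffs` is an assumption of this sketch.

```python
from statistics import NormalDist


def true_optimal_cutoffs(d0, d1, lower, upper, step=0.001):
    """Grid search for the population-level optimal cut-offs.

    d0, d1: score distributions without / with the target condition.
    Se(c) = P(score >= c | D1) and Sp(c) = P(score < c | D0).
    """
    n_steps = int(round((upper - lower) / step))
    grid = [lower + i * step for i in range(n_steps + 1)]

    def se(c):
        return 1.0 - d1.cdf(c)

    def sp(c):
        return d0.cdf(c)

    youden = max(grid, key=lambda c: se(c) + sp(c) - 1.0)
    # Minimizing the squared distance gives the same cut-off as
    # minimizing the distance itself.
    closest = min(grid, key=lambda c: (1.0 - se(c)) ** 2 + (1.0 - sp(c)) ** 2)
    liu = max(grid, key=lambda c: se(c) * sp(c))
    return {"youden": youden, "closest_to_01": closest, "liu": liu}


# Hypothetical homoscedastic case: equal standard deviations.
homo = true_optimal_cutoffs(NormalDist(2, 1), NormalDist(4, 1), 0.0, 8.0)
# Hypothetical heteroscedastic case: D1 has twice the standard deviation.
hetero = true_optimal_cutoffs(NormalDist(2, 1), NormalDist(4, 2), 0.0, 8.0)
print(homo)
print(hetero)
```

In this sketch, the three criteria coincide for the equal-variance pair and separate once the variances differ, which is the pattern described above for scenario 1 versus scenarios 2-4.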

Convergence Behavior of Optimal Cut-off Points with Increasing Sample Size
For each setting and trial, the optimal cut-off points were determined for all sample sizes, n = 101, 151, 201, ..., 801. For every estimated optimal cut-off point, the bias (in %) and mean squared error (MSE) in relation to the true values were derived. A bias smaller than a 1% deviation from the true optimal cut-off point was considered reasonably close to the true value. Boxplots demonstrate the location and skewness of the cut-off point distributions. Values larger than the third quartile plus 1.5 times the interquartile range and values smaller than the first quartile minus 1.5 times the interquartile range are shown individually, in accordance with the definition of boxplot outliers in Stata.
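The two performance measures can be written down compactly. The sketch below assumes that the bias in % is the mean deviation of the estimates from the true value, expressed relative to the true value; the text does not spell out the formula, so this reading and the function names are assumptions.

```python
def percent_bias(estimates, true_value):
    """Mean deviation from the true value, in percent of the true value."""
    mean_estimate = sum(estimates) / len(estimates)
    return 100.0 * (mean_estimate - true_value) / true_value


def mean_squared_error(estimates, true_value):
    """Average squared deviation of the estimates from the true value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


def within_one_percent(value, true_value):
    """True if value deviates by less than 1% from the true value."""
    return abs(value - true_value) < 0.01 * abs(true_value)


# Toy estimates from three hypothetical trials with a true cut-off of 3.
estimates = [2.9, 3.1, 3.3]
print(percent_bias(estimates, 3.0))
print(mean_squared_error(estimates, 3.0))
print(within_one_percent(3.02, 3.0))
```

For a true value of 3, `within_one_percent` accepts values strictly between 2.97 and 3.03, which matches the target interval used in the Results.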

A Heuristic and Path-Based Algorithm for Cut-Off Point Determination
The optimal cut-off point estimate for ROC curves varies with increasing sample size and eventually converges to the true value. For each simulated trial, the search started with n = 101 subjects, and the cut-off point was estimated after increments of 50 (heuristic algorithm 1) and 100 (heuristic algorithm 2). The algorithm was stopped, and the cut-off point was identified, when the estimated cut-off point deviated by less than 1% from the preceding estimate. To this end, the simulations in the previous section were used. The bias (in %) and MSE of the identified optimal cut-off points, as well as the mean number of patients and their respective 95% confidence intervals (95% CI), are reported.
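The stopping rule can be sketched as a short Python function. The name `heuristic_stop`, the input format (a list of `(n, cut-off)` pairs in increasing order of n), and the toy path are assumptions for illustration; the deviation is taken relative to the preceding estimate, as described above.

```python
def heuristic_stop(cutoff_by_n, tolerance=0.01):
    """Return (n, cut-off) at the first look that deviates by less than
    `tolerance` (relative) from the preceding estimate, or None if the
    path never stagnates.

    cutoff_by_n: list of (sample size, estimated cut-off), increasing n.
    """
    for (_, c_prev), (n_cur, c_cur) in zip(cutoff_by_n, cutoff_by_n[1:]):
        if abs(c_cur - c_prev) < tolerance * abs(c_prev):
            return n_cur, c_cur
    return None


# Hypothetical path of estimates for one trial, in increments of 50.
path = [(101, 3.20), (151, 3.10), (201, 3.09), (251, 3.00), (301, 3.01)]

print(heuristic_stop(path))       # looks every 50 subjects (algorithm 1)
print(heuristic_stop(path[::2]))  # looks every 100 subjects (algorithm 2)
```

Taking every second element of a path recorded in increments of 50 reproduces the looks of heuristic algorithm 2, so both variants can reuse the same simulated estimates.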

Real-Life Example Data
The Agatston score for coronary calcification is a nonnegative marker based on a coronary computed tomography (CT) scan. It is the total calcium score across all calcific lesions detected on slices obtained from the proximal coronary arteries [7]. The Agatston score has become a cardiovascular risk factor in addition to those previously known (male sex, age, smoking, systolic blood pressure, and total cholesterol) [25] and was measured as part of two population-based cardiac CT screening cohorts [26][27][28]. These Danish samples comprised 17,252 participants aged 50 to 75 years, among which 15% had a history of cardiovascular disease and 11.2% were female [24]. The data were randomly sorted by using 20,221,019 as seed.
The real-life example data are available in Supplementary Materials S4, and the Stata codes are part of Supplementary Materials S1.

Fixed Sample Size
Figure 2 shows boxplots for optimal cut-off points for sample sizes n = 101, 151, 201, …, 801 and scenario 1 by prevalence. For a prevalence below 0.5, the optimal cut-off point was overestimated on average (Figure 2, top left corner: prevalence = 0.1; Figure 2, top right corner: prevalence = 0.3). The smaller the prevalence, the higher the likelihood that sampling would include more D0 subjects in the tails of the distribution and, therefore, to the right of the true optimal cut-off point (see vertical, dashed, and dotted lines in Figure 1, top left corner). At the same time, sampling included fewer D1 subjects in the tails of the distribution, leading to an overestimation of the optimal cut-off point. With increasing sample size, the convergence of the estimate to the true optimal cut-off point was visible by means of narrower boxes (i.e., smaller interquartile ranges) around the true value of 3 (Figure 2). However, only for a prevalence of 0.5 (Figure 2, bottom left corner) did the first and third quartiles close onto the interval of 1% deviation from the true value, which was 2.97 to 3.03. The same was true for scenarios 2-4 (Supplementary Materials S3).
It was also only for a prevalence of 0.5 that the mean bias fell short of a 1% deviation from the true value across all four scenarios (Table 2, see bold print), even with a sample size as small as 101. With sample sizes of at least 201 or 301, this held true for a prevalence of 0.3 or 0.7 in scenarios 1-3 as well as for a prevalence of 0.7 in scenario 4. For a prevalence of 0.1, the mean bias exceeded a 1% deviation from the true value in all scenarios and sample sizes. The MSE decreased with increasing sample size for every prevalence and increased from scenario 1 to 4. The MSE was considerably larger in scenario 4 than in scenarios 1-3, probably because of the extreme assumption of exponentially distributed D0 values.

Heuristic and Path-Based Algorithm for Cut-Off Point Determination
Starting with n = 101 subjects and using increments of n = 50 (heuristic algorithm 1) until a cut-off point deviated less than 1% from the preceding estimate, 189 to 203 subjects were used on average to arrive at an optimal cut-off (Table 3). The bias and MSE values were slightly larger than the respective values for a fixed sample size of n = 201 (Table 2).
Apparently, the heuristic and path-based search was most often completed with 151 or 201 subjects. Figure 3 shows the path of supposedly optimal cut-off points for the first nine simulated trials in scenario 1 with a prevalence of 0.5 when, for illustration purposes, the sample size increased from n = 101 to n = 1401 in increments of 50. In six out of nine trials, the cut-off point was chosen with n = 151 subjects (see vertical, dotted lines); in three trials, n = 351 (top middle), 201 (middle center), and 201 subjects (bottom right) were necessary.
Starting with n = 101 subjects and using increments of n = 100 instead (heuristic algorithm 2) led to cut-off point determination with n = 310 to 343 subjects on average (Table 4). As before, bias and MSE were slightly larger than the respective numbers for a fixed sample size of n = 301 (Table 2), and the heuristic and path-based search was most often already completed after a few "follow-up looks" as well. As shown in Figure 3, this was the case thrice with n = 201 subjects (top left, middle left, bottom middle; see vertical, dashed-dotted lines), twice with n = 301 (bottom left, bottom right), twice with n = 501 (top middle, top right), and once each with n = 401 (middle right) and n = 601 (middle center).
Finally, Figure 3 suggests that the chosen cut-off point with n = 1401 subjects was very close to or within a 1% deviation of the true value in five out of nine trials (top row and left column). In contrast, the chosen cut-off point deviated considerably from the true value of 3 for three of the remaining four simulated trials at n = 1401 (middle center, middle right, and bottom middle).

Real-Life Example
For the sake of this example, we assumed that the Agatston score could serve as a marker for previous cardiovascular disease in the subjects. Larger values for the Agatston score are associated with increased risk. Further, we declared the full dataset as a population from which we sampled. Then, the prevalence was 0.15, the area under the ROC curve was 0.73 (95% CI: 0.72-0.74), and the empirical optimal cut-off point based on the full dataset was 184.7 (Figure A2). The real-life example data were analyzed analogously to the simulated data before; that is, in consecutive order. For all sample sizes n = 101, 151, 201, ..., 17,251, optimal cut-off points were determined according to the closest-to-(0,1) criterion (Figure 4). The heuristic algorithms 1 and 2 stopped at n = 351 and n = 401, respectively. The chosen cut-off point oscillated heavily for smaller and still considerably for larger sample sizes. The smallest sample size at which the chosen cut-off point deviated less than 1% from the empirical optimal cut-off point of 184.7 was n = 5801; however, only for sample sizes of n = 9301 or larger did the chosen cut-off point consistently stay within 1% of the empirical value. This illustrates our findings of slow convergence. Moreover, most chosen cut-off points for sample sizes less than 9301 were larger than the empirical value of 184.7 in Figure 4. This positive bias of the chosen cut-off point was due to the small prevalence of only 0.15 (see also Figure 2, top left).

Main Findings
With a disease prevalence of 0.5, the optimal cut-off point estimation was, on average, unbiased for all sample sizes, but positively biased for a prevalence smaller than 0.5 and negatively biased for a prevalence larger than 0.5. For a prevalence of 0.5, the mean bias fell short of a 1% deviation from the true optimal cut-off point across all four scenarios. The MSE value was the worst in scenario 4, in which the D0 distribution was assumed to be exponential and highly right-skewed. The heuristic and path-based algorithm that looked for a deviation of up to 1% within two consecutive cut-off points stopped after only a few iterations, resulting in an imprecise cut-off point estimate. This was independent of whether increments of 50 or 100 subjects were used, leading to average sample sizes up to n = 203 (heuristic algorithm 1) and n = 343 (heuristic algorithm 2).

"Optimality" of a Cut-Off Point
According to Leeflang et al. [23], a prevalence of 50% is the most efficient to ensure that the combined uncertainty in sensitivity and specificity is the smallest. Perkins and Schisterman [16] pointed out that the Youden index and the closest-to-(0,1) criterion lead to the same chosen cut-off points in some situations but to different cut-off points in others. The Youden index reflects the intention of maximizing overall correct classification proportions and, thus, minimizing misclassification, whereas the closest-to-(0,1) criterion lacks such a clinical meaning. Thus, Perkins and Schisterman advised against the use of the closest-to-(0,1) criterion. In our simulation study, both the closest-to-(0,1) criterion and the Youden index identified the very same cut-off point as optimal only in scenario 1 (3) but different ones in scenarios 2 (3.18 vs. 3.42), 3 (3.65 vs. 3.6), and 4 (3.88 vs. 3.45). From Figure 1, it becomes apparent that the indicated optimal cut-off points according to the Youden index (vertical, dashed lines) are clearly those that maximize the overall correct classification, as they indicate where the D0 and D1 distributions cross.
López-Ratón et al. focused on the symmetry point (also known as the point of equivalence) in optimal cut-off point determination [29,30]. The symmetry point is defined by the intersection of the ROC curve and the line y = 1 − x and can be interpreted as the point that maximizes simultaneously both types of correct classifications; that is, true-positives and true-negatives. Liu [17] proposed an alternative criterion to the Youden index, and Schisterman et al. [31] discussed a generalized Youden index to integrate the costs of different types of errors (i.e., false-negatives and false-positives). Further, Schisterman et al. [31] proposed deriving bootstrapped 95% CIs to reflect the estimation uncertainty of the chosen cut-off point. However, these optimality criteria apply only if the sensitivity and specificity are weighted equally and any differential consequences are ignored. In contrast, Laking et al. [32] and Greiner et al. [33] also considered the impact of the cost of false-negative and false-positive results on the choice of the cut-off point. Pepe et al. [34] related the target values for the sensitivity and specificity of a diagnostic test to its clinical value. They argued that the necessary information comprises knowledge of the disease prevalence in the clinical population and the ratio of the benefit associated with the clinical consequences of a positive biomarker test in cases to the cost associated with a positive biomarker test in controls. In a practical application, the optimality criterion must be chosen with care; for the purpose of this simulation study, considering the straightforward closest-to-(0,1) criterion is sufficient to investigate its convergence pattern with increasing sample size. Peng et al. [35] proposed the broadest framework to categorize a continuous scale according to an ordinal outcome. They suggested a nonparametric cut-off point estimator that encompasses the Youden index in the context of ROC curve analysis.
The term "optimality" may suggest that a single, universal optimal cut-off point does actually exist. However, every criterion implies its "own" optimal cut-off point, leading to differences in "optimal" cut-off points across methods (Table 1). Any "optimal" cut-off point is, just as in any optimization problem, optimized according to the prespecified criterion.
Finally, several approaches have been proposed that refrain from dichotomization of a continuous marker at all [36][37][38] but allow for an interval of uncertainty or a gray zone of transition where D0 and D1 overlap [39][40][41][42][43][44], possibly heavily so (see, for instance, D0 and D1 for 2 < x < 4 in Figure 1, top left). Briggs and Zaretzki [45] proposed a graphical technique to evaluate continuous diagnostic tests, the skill plot. The skill plot gives insight into the interval of marker values for which peak diagnostic performance occurs. Moreover, the skill plot indicates clearly whether any threshold value offers diagnostic power beyond a naive forecast (of an always present or always absent target condition).

Transferability of a Path-Based Design from Early Phase Cancer Research
The idea of path-based cut-off point determination is different from fixed-sample cut-off point determination followed by validation endeavors [23]. In phase I cancer research, dose-finding studies serve the purpose of identifying a dose to be used for clinical development. As the risk of dose-limiting toxicity increases with increasing doses, caution is advised in dose escalation. In diagnostic trials, the choice of a cut-off point has indirect consequences on the subjects, as treatment planning may later depend on the biomarker value, with the inherent risks of false-positive (the chosen cut-off point was too small) and false-negative (the chosen cut-off point was too large) decisions. The chosen optimal dose represents the best estimate of the target dose level, implying a certain probability of dose-limiting toxicity. In contrast, cut-off point selection based on the heuristic rules shown here represents, at best, a rough estimate of an optimal cut-off point, although admittedly at moderate, and thereby practicable, sample sizes. However, the need for internal, temporal, and external validation of any chosen cut-off point remains [46,47].

Limitations of the Study
We employed the closest-to-(0,1) criterion despite its lack of clinical interpretability (in contrast to both Liu's method and the Youden index) because of its algorithmic stability in our Monte Carlo simulations. We believe, though, that the study of the convergence behavior in finding an optimal cut-off point according to the closest-to-(0,1) criterion is defensible, as we would expect similar patterns with Liu's method or the Youden index.
The syntax of the cutpt command in Stata is derived from the roctab command that provides nonparametric estimation of the ROC curve for a given classifier and true-status reference variable. The points on the nonparametric ROC curve are generated by using each possible outcome of the diagnostic test as a classification cut-off point and computing the corresponding sensitivity and specificity. These points are then simply connected by straight lines, and the area under the resulting ROC curve is computed using the trapezoidal rule. Generally, the estimation of cut-off points can vary significantly with the shape of the ROC curve that can result from nonparametric, semiparametric, or parametric estimation [12,13][48][49][50][51][52]. Especially when the ROC curve is estimated empirically (for smaller sample sizes or for cases with extreme marker distributions), the cut-off point could differ from the one obtained when the ROC curve is estimated as a smooth curve based on parametric or semiparametric estimation. The shape of the ROC curve (concave or nonconcave) can also impact the cut-off point estimation. In short, the estimation process of the ROC curve will affect the cut-off point estimate and, thus, the convergence pattern could also vary with respect to the ROC estimation. Our work is based on one specific criterion for optimality (closest-to-(0,1)) and one specific nonparametric ROC curve estimation.
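The nonparametric construction described above can be sketched in a few lines of Python. This is an illustration of the general procedure (each observed value as a cut-off, straight-line interpolation, trapezoidal rule), not a reimplementation of Stata's roctab; the function names and the convention that a subject is positive when the score is at least the cut-off are assumptions.

```python
def empirical_roc(scores, labels):
    """ROC points (1 - Sp, Se) from (0, 0) to (1, 1).

    Each distinct observed score serves as a cut-off; a subject is
    classified as positive when score >= cut-off.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for c in sorted(set(scores), reverse=True):
        tp = sum(1 for s, d in zip(scores, labels) if d == 1 and s >= c)
        fp = sum(1 for s, d in zip(scores, labels) if d == 0 and s >= c)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoidal_auc(points):
    """Area under the piecewise-linear curve through the ROC points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy data: four subjects, two with the condition (label 1).
roc = empirical_roc([1, 2, 3, 4], [0, 1, 0, 1])
print(roc)
print(trapezoidal_auc(roc))
```

Because the empirical curve is a step function with few distinct points in small samples, several cut-offs can lie equally close to (0,1), which is the source of the ties mentioned in the simulation set-up.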

Conclusions
The optimal cut-off points derived from the ROC curve analysis converged to the true but unknown optimal cut-off point beyond n = 1000 included subjects. Special attention should be paid to the prevalence of a disease in the cut-off point estimation. Simple heuristic rules may serve as a preliminary cut-off point estimate, which warrants further validation.

Figure 2. Boxplots of chosen cut-off points by sample size for scenario 1 and a prevalence of 0.1 (top left), 0.3 (top right), 0.5 (bottom left), and 0.7 (bottom right). Values smaller than the first quartile minus 1.5 times the interquartile range and values larger than the third quartile plus 1.5 times the interquartile range are shown individually as outliers. The vertical, solid lines indicate a maximum of 1% deviation from the true optimal cut-off point (target area).

Figure 3. Line plots of chosen cut-off points for the first nine trials of scenario 1 with a prevalence of 0.5. Cut-off points were determined for n = 101, 151, 201, ..., 1401 (dark blue lines). Horizontal, dashed lines indicate a maximum of 1% deviation from the true optimal cut-off point (target area). Vertical, dotted lines and vertical, dashed-dotted lines represent the points at which the heuristic and path-based algorithm hypothetically stops when using increments of n = 50 and n = 100 subjects, respectively, as these estimated cut-off points are within 1% of the corresponding preceding estimate.

Figure 4. Line plot of chosen cut-off points for the real-life example. Cut-off points were determined for n = 101, 151, 201, …, 17,251 (dark blue lines). Horizontal, dashed lines indicate a maximum of 1% deviation from the empirical optimal cut-off point (184.7; target area). Vertical, dotted lines and vertical, dashed-dotted lines represent the points at which the heuristic and path-based algorithm stops when using increments of n = 50 and n = 100 subjects, respectively.

Figure A2. Histograms of the real-life example for previous cardiovascular disease (upper panel) and the ROC curve for this example (lower panel).

Table 2. Bias and mean squared error (MSE) of cut-off points in fixed sample designs.
Bold print: Mean bias deviated less than 1% from the true optimal cut-off point.

Table 3. Bias, mean squared error (MSE), and mean number of patients (95% CI) of cut-off points derived by the heuristic algorithm 1.

Table 4. Bias, mean squared error (MSE), and mean number of patients (95% CI) of cut-off points derived by the heuristic algorithm 2.