A Statistical Method to Schedule the First Exam Using Chest X-Ray in Lung Cancer Screening

Rahman, Farhin; Wu, Dongfeng

doi:10.3390/math13162623


by Farhin Rahman ^† and Dongfeng Wu ^*,†

Department of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, University of Louisville, 485 E. Gray St., Louisville, KY 40202, USA

^* Author to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Mathematics 2025, 13(16), 2623; https://doi.org/10.3390/math13162623

Submission received: 14 June 2025 / Revised: 31 July 2025 / Accepted: 4 August 2025 / Published: 15 August 2025

(This article belongs to the Special Issue Statistical Analysis and Modeling in Medical Research)


Abstract

We applied a recently developed statistical method to the National Lung Screening Trial (NLST) chest X-ray data, to find the optimal time for initiating chest X-ray screening in asymptomatic individuals. Incidence probability was used to control the risk of clinical incidence before the first exam, constraining it to a small value, given one’s current age. The simulation study shows that the optimal screening age remains relatively consistent as the current age increases. Notably, male heavy smokers tend to have slightly later screening ages compared to females, which contrasts with findings from the NLST study using computed tomography (CT) scans. After the future screening time/age is found, we can estimate the lead time distribution and the probability of overdiagnosis if one would be diagnosed at this future time/age. The lead time is relatively consistent across incidence probability and sensitivity, with a slight decrease in the mean lead time as the current age increases, and it is positively correlated with the sojourn time. The probability of overdiagnosis exhibits positive correlations with the mean sojourn time, incidence probability, and the current age, with only slight changes in sensitivity. Overall, the probability of overdiagnosis is small and is not a concern at a younger age.

Keywords: lung cancer; incidence probability; overdiagnosis; lead time; sensitivity

MSC: 62N02; 62P10; 62F15

1. Introduction

Early detection of lung cancer is crucial for effective treatment. This study aims to determine the optimal timing for initiating chest X-rays for lung cancer screening in asymptomatic individuals, improving cure rates and survival. We will apply a probability method from Wu (2022) to the NLST chest X-ray data to identify the future first screening age [1]. After this future screening age is found, we will estimate the lead time and probability of overdiagnosis.

Although low-dose CT is preferred for lung cancer screening in the U.S., chest X-rays are still widely used, especially in developing countries. Both methods expose patients to radiation, which can increase cancer risk. Therefore, determining the right age and frequency for screening is essential to minimize unnecessary exposure and costs. By limiting the probability of incidence to a small value, we will identify the optimal screening timeline. This approach relies on previously estimated parameters of disease progression from Rahman and Wu (2021) [2].

We will use the commonly used disease-progressive model $S_{0} \to S_{p} \to S_{c}$ [3], where $S_{0}$ is the disease-free state or the state when the disease has just started and cannot be detected by any exam; $S_{p}$ is the preclinical state, in which an asymptomatic individual unknowingly has a disease that a screening exam can detect; and $S_{c}$ is the clinical state. There are three key parameters in the disease-progressive model: sensitivity, transition density, and sojourn time. The sensitivity $β$ is the probability of a positive test result given that one is in the preclinical state. The transition density $w(t)$ is the probability density function (PDF) of the time duration in $S_{0}$, that is, $T_{1} \sim w(t)$, where the random variable $T_{1}$ is the time duration in $S_{0}$. The sojourn time is the time duration in the preclinical state $S_{p}$, and a disease with a longer sojourn time is usually easier to detect by a screening exam. We use $T_{2}$ to represent the sojourn time with a PDF $q(x)$, i.e., $T_{2} \sim q(x)$. We let $Q(z) = \int_{z}^{\infty} q(x) \, dx$ be the survival function of the sojourn time. $T_{1}$ and $T_{2}$ are assumed to be independent. These three are called the key parameters because all other terms in cancer screening can be expressed as functions of them [1].

Two important aspects of cancer screening, the lead time and the probability of overdiagnosis, will be evaluated for individuals diagnosed with cancer in their initial screening exam using the probability method from Wu (2022) [1]. The lead time is the time interval between detection through screening and the onset of symptoms without screening. Liu et al. (2018) used Bayesian methods to simulate the lead times from the NLST low-dose CT data [4]. Overdiagnosis refers to the detection of cancer that would not have caused symptoms before death, with estimates ranging from 18% to 67% of screen-detected lung cancers [5]. Evaluating these factors is crucial once the optimal screening time is determined, as they provide predictive information for future exams.

Several studies have investigated the optimal scheduling of cancer screening exams, primarily focusing on frequency and interval timing to enhance early detection and cost-effectiveness. For instance, Zelen developed a foundational model for scheduling disease detection exams [6], which was later extended by Lee and Zelen with applications to breast cancer screening [7]. Lashner et al. explored the timing of colonoscopy for cancer screening in ulcerative colitis patients [8]. More recent methodological advancements include the use of partially observable Markov decision processes to guide screening policies [9] and restless bandit models to optimize hepatocellular carcinoma screening schedules [10]. Özekici and Pliska proposed a delayed Markov model that accounts for diagnostic errors when determining optimal inspection schedules [11]. The U.S. Preventive Services Task Force has also issued practice-oriented screening recommendations, such as those for lung cancer [12]. However, while all these studies address optimal screening schedules, none of them explicitly examine the timing of the first screening exam, highlighting a critical gap in the current literature.

The main contribution of this study lies in its application of a probability-based method to determine optimal initial screening age using chest X-ray data, an approach not previously applied to this modality. By incorporating age, risk, and tumor progression parameters, this model offers a data-driven framework to inform personalized lung cancer screening decisions, especially in resource-limited settings.

The layout of this paper is as follows. A brief review of the probability method is provided in Section 2. Section 3.1 presents simulation studies on the optimal screening strategies under different scenarios and estimates the lead time distribution and probability of overdiagnosis at the future first screening time. Bayesian inference to identify the optimal screening time based on one’s current age using the NLST chest X-ray data is presented in Section 3.2, together with the estimated lead time distribution and the probability of overdiagnosis. We end with a discussion of the study results in Section 4.

2. Materials and Methods

We use female lung cancer as an example to briefly review the methods developed by Wu (2022) to find the optimal screening time based on one’s current age and risk tolerance, and to estimate the lead time distribution and probability of overdiagnosis at the future screening time [1].

An asymptomatic woman at her current age $a_{0}$ plans to take her first screening at age $t_{0} = a_{0} + t_{x}$, where $t_{x} > 0$. The goal is to find the appropriate $t_{x}$, so that the probability of clinical incidence (i.e., clinical symptoms appear) is limited to a predetermined small value $p$ (i.e., the risk tolerance level), such as 10% or 20%. This ensures that with a 90% or 80% probability, one will not develop clinical symptoms before her first screening exam, keeping the probability of incidence before her first screening to 10% or 20%. We define a few events:

\begin{matrix} H_{0} & = & \{\text{One has no screening exam and is asymptomatic in } [0, a_{0}]\}; \\ I_{0} & = & \{\text{One is a clinical incident case in } (a_{0}, t_{0})\} \cap H_{0}; \\ D_{0} & = & \{\text{One is diagnosed with cancer at age } t_{0}\} \cap H_{0}. \end{matrix}

The probability of incidence among the people at risk (those included in events $I_{0}$ and $D_{0}$) is

P(I_{0} \mid I_{0} \cup D_{0}) = \frac{P(I_{0})}{P(I_{0} \cup D_{0})} = \frac{P(I_{0})}{P(I_{0}) + P(D_{0})},

(1)

where $P(I_{0})$ is the probability of incidence in $(a_{0}, t_{0})$:

P(I_{0}) = \int_{0}^{a_{0}} w(x) [Q(a_{0} - x) - Q(t_{0} - x)] \, dx + \int_{a_{0}}^{t_{0}} w(x) [1 - Q(t_{0} - x)] \, dx,

(2)

and $P(D_{0})$ is the probability of detection at the first exam:

P(D_{0}) = β \int_{0}^{t_{0}} w(x) Q(t_{0} - x) \, dx.

(3)

Since $P(I_{0} \mid I_{0} \cup D_{0})$ is monotonically increasing in $t_{x}$ (hence in $t_{0}$), for any given value $p$ in (0, 1), there exists a unique solution $t_{0}$ such that $P(I_{0} \mid I_{0} \cup D_{0}) = p$.
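As an illustration of how Equations (1)–(3) can be solved for $t_{0}$, the following is a minimal Python sketch, not the authors' implementation. It assumes the log-normal transition density and Weibull sojourn time of Equations (13)–(15) with the simulation values $(μ, σ^{2}) = (4.25, 0.015)$ and $(α, λ) = (1.56, 0.202)$; the midpoint integration rule, the upper search bound of 100 years, the tolerance, and all function names are illustrative choices.

```python
import math

def w(t, mu=4.25, sigma2=0.015, scale=0.3):
    """Log-normal sub-density of the time spent in S_0 (form of Equation (13))."""
    if t <= 0.0:
        return 0.0
    return scale / (math.sqrt(2.0 * math.pi * sigma2) * t) * math.exp(
        -(math.log(t) - mu) ** 2 / (2.0 * sigma2))

def Q(x, alpha=1.56, lam=0.202):
    """Weibull survival function of the sojourn time (form of Equation (15))."""
    return 1.0 if x <= 0.0 else math.exp(-lam * x ** alpha)

def integrate(f, lo, hi, n=2000):
    """Composite midpoint rule; adequate for a sketch."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def prob_incidence(a0, t0):
    """P(I_0) as in Equation (2)."""
    first = integrate(lambda x: w(x) * (Q(a0 - x) - Q(t0 - x)), 0.0, a0)
    second = integrate(lambda x: w(x) * (1.0 - Q(t0 - x)), a0, t0)
    return first + second

def prob_detection(t0, beta):
    """P(D_0) as in Equation (3)."""
    return beta * integrate(lambda x: w(x) * Q(t0 - x), 0.0, t0)

def incidence_ratio(a0, t0, beta):
    """P(I_0 | I_0 or D_0) as in Equation (1)."""
    p_inc = prob_incidence(a0, t0)
    return p_inc / (p_inc + prob_detection(t0, beta))

def first_exam_age(a0, p, beta, upper=100.0, tol=1e-4):
    """Bisection on t0 in (a0, upper); relies on the ratio increasing in t0."""
    lo, hi = a0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if incidence_ratio(a0, mid, beta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

t0_star = first_exam_age(a0=55.0, p=0.10, beta=0.95)
print(round(t0_star, 2))
```

Bisection is a natural choice here because the ratio in Equation (1) equals 0 at $t_{0} = a_{0}$ and increases with $t_{0}$, so the root is bracketed from the start.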

After $t_{0}$ is found, the PDF of the lead time at age $t_{0}$, if one were diagnosed with cancer at the first exam, is

f_{L}(z \mid D_{0}) = \frac{f_{L}(z, D_{0})}{P(D_{0})}, \quad \text{for } z \in (0, \infty),

(4)

where the numerator is

f_{L}(z, D_{0}) = β \int_{0}^{t_{0}} w(x) q(t_{0} + z - x) \, dx.

(5)
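Equations (4) and (5) can be evaluated on a grid of lead times. The sketch below is illustrative only: it assumes the parametric forms of Equations (13)–(15) with $(μ, σ^{2}) = (4.25, 0.015)$ and $(α, λ) = (1.56, 0.202)$, takes $t_{0} = 60.31$ as an example input, truncates the lead time at a hypothetical 30 years, and uses a simple midpoint rule. Function names are not from the source.

```python
import math

MU, SIGMA2 = 4.25, 0.015      # transition density parameters (simulation values)
ALPHA, LAM = 1.56, 0.202      # Weibull pair with a mean sojourn time near 2.5 years

def w(t):
    """Log-normal sub-density of the time spent in S_0."""
    if t <= 0.0:
        return 0.0
    return 0.3 / (math.sqrt(2.0 * math.pi * SIGMA2) * t) * math.exp(
        -(math.log(t) - MU) ** 2 / (2.0 * SIGMA2))

def q(x):
    """Weibull density of the sojourn time."""
    if x <= 0.0:
        return 0.0
    return ALPHA * LAM * x ** (ALPHA - 1.0) * math.exp(-LAM * x ** ALPHA)

def Q(x):
    """Weibull survival function of the sojourn time."""
    return 1.0 if x <= 0.0 else math.exp(-LAM * x ** ALPHA)

def integrate(f, lo, hi, n):
    """Composite midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def lead_time_curve(t0, beta, z_max=30.0, n_z=300, n_x=600):
    """Return midpoints z, the density f_L(z | D_0) at those points, and the step."""
    p_d0 = beta * integrate(lambda x: w(x) * Q(t0 - x), 0.0, t0, n_x)   # Equation (3)
    h = z_max / n_z
    zs = [(i + 0.5) * h for i in range(n_z)]
    dens = [
        beta * integrate(lambda x, z=z: w(x) * q(t0 + z - x), 0.0, t0, n_x) / p_d0
        for z in zs
    ]                                                                  # Equations (4)-(5)
    return zs, dens, h

zs, dens, h = lead_time_curve(t0=60.31, beta=0.95)
total_mass = h * sum(dens)                         # should be close to 1
mean_lead = h * sum(z * d for z, d in zip(zs, dens))
print(round(total_mass, 3), round(mean_lead, 2))
```

Note that $β$ appears in both the numerator (5) and the denominator (3), so it cancels in Equation (4); sensitivity affects the lead time density only through the value of $t_{0}$.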

The probability of overdiagnosis and the probability of true early detection at the first exam (at age $t_{0}$) are

P(\mathrm{OverD} \mid D_{0}, T > t_{0}) = \int_{t_{0}}^{\infty} P(\mathrm{OverD} \mid D_{0}, T = t) f_{T}(t \mid T > t_{0}) \, dt,

(6)

P(\mathrm{TrueED} \mid D_{0}, T > t_{0}) = \int_{t_{0}}^{\infty} P(\mathrm{TrueED} \mid D_{0}, T = t) f_{T}(t \mid T > t_{0}) \, dt,

(7)

where

P(\mathrm{OverD} \mid D_{0}, T = t) = \frac{P(\mathrm{OverD}, D_{0} \mid T = t)}{P(D_{0} \mid T = t)},

(8)

P(\mathrm{TrueED} \mid D_{0}, T = t) = \frac{P(\mathrm{TrueED}, D_{0} \mid T = t)}{P(D_{0} \mid T = t)}.

(9)

Here $P(D_{0} \mid T = t) = P(D_{0})$, and the two numerators were derived as

P(\mathrm{OverD}, D_{0} \mid T = t) = β \int_{0}^{t_{0}} w(x) Q(t - x) \, dx,

(10)

P(\mathrm{TrueED}, D_{0} \mid T = t) = β \int_{0}^{t_{0}} w(x) [Q(t_{0} - x) - Q(t - x)] \, dx.

(11)

The conditional probability density function of a human lifetime $f_{T}(t \mid T > t_{0})$ was defined in Wu et al. (2012) as follows [13]:

f_{T}(t \mid T > t_{0}) = \begin{cases} \frac{f_{T}(t)}{P(T > t_{0})} = \frac{f_{T}(t)}{1 - F_{T}(t_{0})}, & t > t_{0}, \\ 0, & \text{otherwise}. \end{cases}

(12)
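The structure of Equations (6)–(12) can be sketched numerically as follows. This is an illustration under strong assumptions: the lifetime density $f_{T}$ is replaced by a hypothetical Gompertz model (parameters `GOMP_A`, `GOMP_B` are placeholders, not the lifetime distribution used in [13]), the transition density and sojourn time use the forms of Equations (13)–(15) with $(μ, σ^{2}) = (4.25, 0.015)$ and $(α, λ) = (1.56, 0.202)$, the lifetime is truncated at 120 years, and integration uses a midpoint rule.

```python
import math

MU, SIGMA2 = 4.25, 0.015
ALPHA, LAM = 1.56, 0.202
GOMP_A, GOMP_B = 5e-5, 0.09   # hypothetical Gompertz hazard a * exp(b * t)

def w(t):
    """Log-normal sub-density of the time spent in S_0."""
    if t <= 0.0:
        return 0.0
    return 0.3 / (math.sqrt(2.0 * math.pi * SIGMA2) * t) * math.exp(
        -(math.log(t) - MU) ** 2 / (2.0 * SIGMA2))

def Q(x):
    """Weibull survival function of the sojourn time."""
    return 1.0 if x <= 0.0 else math.exp(-LAM * x ** ALPHA)

def integrate(f, lo, hi, n=600):
    """Composite midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def lifetime_survival(t):
    """P(T > t) under the placeholder Gompertz model."""
    return math.exp(-(GOMP_A / GOMP_B) * (math.exp(GOMP_B * t) - 1.0))

def lifetime_density_given_alive(t, t0):
    """f_T(t | T > t0), following the form of Equation (12)."""
    if t <= t0:
        return 0.0
    hazard = GOMP_A * math.exp(GOMP_B * t)
    return hazard * lifetime_survival(t) / lifetime_survival(t0)

def overdiagnosis_and_true_detection(t0, beta, t_max=120.0, n_t=300):
    p_d0 = beta * integrate(lambda x: w(x) * Q(t0 - x), 0.0, t0)        # Equation (3)

    def over_given_t(t):                                                # Equations (8), (10)
        return beta * integrate(lambda x: w(x) * Q(t - x), 0.0, t0) / p_d0

    def true_given_t(t):                                                # Equations (9), (11)
        return beta * integrate(
            lambda x: w(x) * (Q(t0 - x) - Q(t - x)), 0.0, t0) / p_d0

    over = integrate(
        lambda t: over_given_t(t) * lifetime_density_given_alive(t, t0),
        t0, t_max, n_t)                                                 # Equation (6)
    true_ed = integrate(
        lambda t: true_given_t(t) * lifetime_density_given_alive(t, t0),
        t0, t_max, n_t)                                                 # Equation (7)
    return over, true_ed

p_over, p_true = overdiagnosis_and_true_detection(t0=60.31, beta=0.95)
print(round(p_over, 4), round(p_true, 4))
```

Because the integrands of Equations (10) and (11) add up to the integrand of Equation (3), the two probabilities should sum to approximately 1, which gives a quick sanity check on any implementation.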

3. Results

To evaluate the optimal first screening time, lead time, and probability of overdiagnosis, we first conducted simulation studies using the method in Section 2. Then, we applied a similar approach to the NLST chest X-ray data to obtain predictive information regarding the optimal first screening time and risk.

3.1. Simulation

In the simulation study, we considered the following scenarios to estimate the optimal screening age, lead time, and probability of overdiagnosis:

Four values of the probability of incidence: p = 0.05, 0.10, 0.15, 0.20
Three different screening sensitivities: $β$ = 0.80, 0.90, 0.95
Four different mean sojourn times (MST): MST = 1.5, 2.5, 5, 10 years
Three different current ages: $a_{0}$ = 55, 60, 65 years
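The four factors above define a full factorial design. A short sketch of how the scenario grid could be enumerated is shown below; the variable names and dictionary layout are illustrative assumptions.

```python
from itertools import product

incidence_probs = [0.05, 0.10, 0.15, 0.20]    # p
sensitivities = [0.80, 0.90, 0.95]            # beta
mean_sojourn_times = [1.5, 2.5, 5.0, 10.0]    # MST in years
current_ages = [55, 60, 65]                   # a_0 in years

# One dictionary per scenario; the last factor in product() varies fastest.
scenarios = [
    {"p": p, "beta": beta, "mst": mst, "a0": a0}
    for mst, a0, beta, p in product(
        mean_sojourn_times, current_ages, sensitivities, incidence_probs)
]
print(len(scenarios))  # 4 * 3 * 4 * 3 = 144
```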

The transition density follows a log-normal probability density function multiplied by 30% (a sub-PDF):

w(t \mid μ, σ^{2}) = \frac{0.3}{\sqrt{2π} \, σ t} \exp\{-(\log t - μ)^{2} / (2σ^{2})\},

(13)

where $t$ represents the duration in the disease-free state $S_{0}$. The constant 0.3 in the numerator of the transition density function was empirically selected to scale the log-normal distribution such that the cumulative probability aligns with the age-specific incidence observed in the NLST population for heavy smokers. It reflects the expected lifetime probability of disease progression in high-risk individuals and is the upper limit for being diagnosed with lung and bronchus cancer. See Wu, Erwin, and Rosner for a justification [14]. The parameters $(μ, σ^{2}) = (4.25, 0.015)$ were chosen such that the mode of the transition density is approximately 70 years old.
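The stated properties of the transition density can be checked with a few lines of Python. This sketch uses the standard facts that a log-normal density has its mode at $\exp(μ - σ^{2})$ and cumulative distribution $Φ((\log t - μ)/σ)$; the function names and the grid used for the cross-check are illustrative.

```python
import math

MU, SIGMA2, SCALE = 4.25, 0.015, 0.3
SIGMA = math.sqrt(SIGMA2)

def w(t):
    """Transition sub-density of Equation (13)."""
    if t <= 0.0:
        return 0.0
    return SCALE / (math.sqrt(2.0 * math.pi) * SIGMA * t) * math.exp(
        -(math.log(t) - MU) ** 2 / (2.0 * SIGMA2))

def cumulative_w(t):
    """Integral of w from 0 to t, i.e. 0.3 * Phi((log t - mu) / sigma)."""
    if t <= 0.0:
        return 0.0
    z = (math.log(t) - MU) / SIGMA
    return SCALE * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mode_age = math.exp(MU - SIGMA2)                          # analytic mode
grid_mode = max((40.0 + 0.01 * i for i in range(6001)), key=w)  # numeric cross-check

print(round(mode_age, 2), round(grid_mode, 2))
print(round(cumulative_w(55.0), 4), round(cumulative_w(65.0), 4))
```

The cumulative function approaches 0.3 as $t$ grows, which is the sense in which $w(t)$ is a sub-PDF.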

The sojourn time distribution follows a Weibull distribution, as described by Rahman and Wu [2]:

q(x \mid α, λ) = α λ x^{α - 1} \exp(-λ x^{α}),

(14)

Q(x) = \exp(-λ x^{α}),

(15)

where $x$ represents the sojourn time in the preclinical state. The parameters $α$ and $λ$ represent the shape and scale parameters of the Weibull distribution, respectively. These values control the skewness and spread of the sojourn time distribution, which in turn affects lead time and overdiagnosis outcomes. Their interpretation is crucial for understanding how tumor progression rates impact optimal screening strategies. For the simulation study, specific parameter values were selected to achieve the desired mean sojourn times (MST) using the Weibull distribution. The chosen values for $(α, λ)$ are (3.47, 0.18), (1.56, 0.202), (2, 0.031), and (1.6, 0.021), with corresponding MSTs of 1.5, 2.5, 5, and 10 years, covering a range from fast-growing to slow-growing tumors.
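For the parameterization in Equations (14) and (15), integrating $Q(x)$ over $(0, \infty)$ gives the mean $λ^{-1/α} Γ(1 + 1/α)$. The short sketch below evaluates this for the four parameter pairs; the function name is illustrative.

```python
import math

def weibull_mean(alpha, lam):
    """Mean of the density alpha * lam * x**(alpha - 1) * exp(-lam * x**alpha)."""
    return lam ** (-1.0 / alpha) * math.gamma(1.0 + 1.0 / alpha)

pairs = [(3.47, 0.18), (1.56, 0.202), (2.0, 0.031), (1.6, 0.021)]
msts = [weibull_mean(alpha, lam) for alpha, lam in pairs]
for (alpha, lam), mst in zip(pairs, msts):
    print(alpha, lam, round(mst, 2))
```

The computed means land close to the nominal values of 1.5, 2.5, 5, and 10 years, with small differences due to rounding of the parameters.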

Table 1 shows the optimal initial screening age $t_{0}^{*}$ obtained using the method in Section 2 and binary search for different values of $p$. The analysis considered various sensitivities $β$, MST values, and current ages ($a_{0}$). From Table 1, when the MST is 2.5 years and $a_{0} = 55$ with $β = 0.95$, the optimal screening ages for $p$ = 0.05, 0.10, 0.15, and 0.20 are 55.16, 55.34, 55.56, and 55.82, respectively. This means that with a 95% probability of avoiding clinical incidence before the first exam, the first screening should occur at age 55.16 (approximately two months after turning 55); and with an 80% probability of no clinical incidence, the first screening should be carried out at age 55.82 (about ten months after turning 55).

The results also show that as screening sensitivity increases from 0.8 to 0.95, the optimal initial screening age slightly increases when other factors remain unchanged. Additionally, the optimal initial screening age increases with higher incidence probability $p$ and longer MST. The ideal first screening age $t_{0}^{*}$ is also influenced by one’s current age $a_{0}$, with the time interval ($t_{0}^{*} - a_{0}$) decreasing as $a_{0}$ increases, assuming other factors remain the same.

The primary objective is to find the optimal screening time $t_{0}$. After finding it, we can investigate the lead time distribution $f_{L}(z \mid D_{0})$ and the probability of overdiagnosis $P(\mathrm{OverD} \mid D_{0}, T > t_{0}^{*})$ at the future screening time $t_{0}^{*}$. Table 2 presents the estimated mean, median, mode, and standard deviation of the lead time at $t_{0}^{*}$ for individuals with a current age of 55, 60, or 65 years, respectively.

From the results, the lead time distribution $f_{L}(z \mid D_{0})$ is not directly influenced by $β$, $p$, and $a_{0}$. However, both the lead time distribution and the probability of overdiagnosis depend on factors like $t_{0}^{*}$, $w(t)$, and $Q(x)$. The findings across the tables exhibit similar patterns, suggesting consistency in the results:

When the MST increases, the mean, median, and mode of the lead time also increase, indicating a longer expected time from screening to symptom appearance.
The lead time distribution shows minimal dependence on the incidence probability p and the sensitivity $β$ when the optimal scheduling time $t_{0}^{*}$ is used.
As the current age $a_{0}$ increases, the mean, median, and mode of the lead time decrease, while the standard deviation remains relatively unchanged. This suggests that with increasing age, the expected time from screening to symptom onset becomes shorter, indicating potentially more rapid disease progression, but the variability in lead time remains consistent.

Figure 1 shows the lead time PDF curves under different factors: $p$, $β$, $a_{0}$, and MST. The figure consists of four panels, each depicting the estimated lead time density when the optimal first screening age $t_{0}^{*}$ is used. In each panel, three factors are fixed, and the fourth factor varies.

The results demonstrate that, given $t_{0}^{*}$, the lead time distribution exhibits minimal changes with respect to the incidence probability $p$ and sensitivity $β$. However, it varies notably with one’s current age $a_{0}$ and the MST. Specifically, as $a_{0}$ increases, the mean, median, and mode of the lead time slightly decrease. Conversely, as MST increases, the central location of the lead time distribution shifts towards higher values.

Table 3 displays the estimated probability of overdiagnosis (in percentage) when using the optimal initial scheduling age $t_{0}^{*}$. Specifically, if an individual undergoes the first screening exam at the age provided in Table 1 and is subsequently diagnosed with cancer, the probability of overdiagnosis is given by the corresponding value in Table 3.

The probability of overdiagnosis increases with higher MST, incidence probability ($p$), and older age ($a_{0}$). However, it shows minimal variation with sensitivity ($β$). Generally, when MST is less than or equal to one and a half years, the probability of overdiagnosis is typically less than 4%, which is considered negligible. Overall, the probability of overdiagnosis is very small, with the largest observed value less than 15% for an MST of 10 years.

3.2. Application

We applied the method to the NLST chest X-ray screening data. Rahman and Wu (2021) used a likelihood function and parametric link functions to estimate $β$, $w(t)$, and $q(x)$ [2]. The sensitivity was estimated by the epidemiologic method:

β(t) = β_{0} = \frac{\sum_{t_{0} = 55}^{74} \sum_{k = 1}^{K} s_{k, t_{0}}}{\sum_{t_{0} = 55}^{74} \sum_{k = 1}^{K} s_{k, t_{0}} + \sum_{t_{0} = 55}^{74} \sum_{k = 1}^{K} r_{k, t_{0}}}

(16)

Although denoted as $β(t)$, sensitivity is assumed to be a constant value $β_{0}$ in this model to simplify the analysis and reflect an average test performance across ages. This is consistent with common practice in screening modeling where age-specific estimates are not available or stable. Additionally, $K$ refers to the number of risk strata or groups (e.g., by age or smoking status) in the aggregated NLST chest X-ray data.

The parametric link functions of $w(t)$, $q(x)$, and $Q(x)$ were the same as presented in Equations (13)–(15). The unknown parameters $θ = (μ, σ^{2}, α, λ)$ were estimated using the Markov Chain Monte Carlo (MCMC) method with a Gibbs sampler and likelihood function [2]. Initially, 200,000 samples were generated. After discarding the first 30,000 samples as burn-in and applying thinning every 200 iterations, we obtained a posterior sample of 500 from each chain. Running three initially overdispersed chains resulted in a total of 1500 Bayesian posterior samples $θ_{j}^{*}$ for each gender.

For each gender (male and female), we took the 1500 posterior samples $θ_{j}^{*}$, $j = 1, 2, \dots, 1500$, and applied Equation (1) to determine the optimal scheduling time $t_{0}^{*}$ for each current age $a_{0} = 55, 60, 65$ and each incidence probability $p$ = 0.05, 0.10, 0.15, 0.20. Once scheduling times were obtained for the 1500 MCMC samples, we also estimated the lead time distribution, the probability of overdiagnosis, and the probability of true early detection.

Table 4 summarizes the mean, standard error, and the 95% highest posterior density (HPD) interval of the future screening age $t_{j}^{*}$ (in years) using the NLST X-ray data for male and female heavy smokers. The results show that optimal first screening times are very close for both genders under similar conditions, with the same current age $a_{0}$ and incidence probability $p$. However, male heavy smokers tend to have slightly longer optimal screening times compared to their female counterparts.

After determining the optimal first screening time, the posterior distribution of the lead time was obtained as the average distribution across the pairs $(θ_{j}^{*}, t_{j}^{*})$, where $j = 1, 2, \dots, 1500$, taking the following form:

f_{L}(z \mid \mathrm{NLST}) = \frac{1}{1500} \sum_{j = 1}^{1500} f_{L}(z \mid θ_{j}^{*})
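The averaging step amounts to a pointwise mean of per-draw density curves evaluated on a common grid. The sketch below shows only that mechanic: the three exponential curves and their rates are hypothetical placeholders standing in for $f_{L}(z \mid θ_{j}^{*})$, not NLST posterior draws, and the function name is illustrative.

```python
import math

def average_density(curves):
    """Pointwise mean of density curves evaluated on a shared grid."""
    n = len(curves)
    return [sum(values) / n for values in zip(*curves)]

# Shared grid of lead times z (midpoints on (0, 50) years).
h = 0.01
grid = [(i + 0.5) * h for i in range(5000)]

# Placeholder per-draw densities: exponential curves with hypothetical rates.
rates = [0.5, 0.6, 0.7]
curves = [[r * math.exp(-r * z) for z in grid] for r in rates]

mixture = average_density(curves)
total_mass = h * sum(mixture)                              # stays close to 1
mean_lead = h * sum(z * f for z, f in zip(grid, mixture))  # mean of the mixture
print(round(total_mass, 4), round(mean_lead, 3))
```

Since each per-draw curve integrates to 1, so does their average, and the mean of the averaged density equals the average of the per-draw means.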

Table 5 presents the mean, median, mode, and standard deviation of the lead time, calculated using $f_{L}(z \mid \mathrm{NLST})$. Generally, male heavy smokers exhibit slightly longer mean lead times compared to their female counterparts under similar conditions. Figure 2 displays the estimated lead time density curves using the NLST X-ray data, with different current ages ($a_{0}$) and incidence probabilities ($p$). The lead time curves barely change with the incidence probability $p$ when the optimal scheduling time $t_{0}^{*}$ is used. However, the density curves vary with current age $a_{0}$: larger $a_{0}$ values result in higher peaks in the density curve, leading to slightly smaller mode values.

Each pair $(θ_{j}^{*}, t_{j}^{*})$, with $j = 1, 2, \dots, 1500$, is used to estimate the probability of overdiagnosis. The posterior mean, standard error, and 95% highest posterior density (HPD) interval of this probability are listed in Table 6. The probability of true early detection is 1 minus the probability of overdiagnosis.
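The text does not state how the HPD intervals were computed. One common sample-based approach, shown here as an assumption rather than the authors' procedure, takes the shortest interval that contains 95% of the sorted posterior draws. The function name and the toy input are illustrative.

```python
import math

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing a fraction `mass` of the sorted draws."""
    xs = sorted(samples)
    n = len(xs)
    # Number of draws the interval must cover (small guard against float rounding).
    k = math.ceil(mass * n - 1e-9)
    if k < 1 or k > n:
        raise ValueError("mass must lie in (0, 1] and samples must be non-empty")
    # Slide a window of k consecutive draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

# Toy right-skewed sample: the shortest 95% window sits at the lower end.
lower, upper = hpd_interval([x ** 2 for x in range(100)])
print(lower, upper)  # 0 8836
```

For a unimodal posterior this window approximates the HPD interval; for right-skewed quantities such as lead time it is shifted toward smaller values relative to an equal-tailed interval.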

The probability of overdiagnosis at the first screening for heavy smokers, based on NLST X-ray data, is very low (less than 3%). This probability slightly increases with age and is slightly higher for male heavy smokers compared to females. It also increases slightly with higher incidence probability p. However, the maximum probability of overdiagnosis remains below 3%, indicating that overdiagnosis is not a significant concern at the first screening exam using chest X-rays for heavy smokers.

4. Discussion

This project aimed to determine the optimal timing for the first screening exam for asymptomatic individuals, based on their current age using X-rays for lung cancer. Using the NLST chest X-ray data, the optimal first screening time for male and female heavy smokers was estimated and found consistent with simulation results. The simulation showed that the interval between the current age and first screening time slightly increases with the screening sensitivity and the incidence probability. These findings align with the research by Wu (2022) using the NLST CT data [1]. This is the first time that the scheduling method has been applied to the chest X-ray data.

The NLST chest X-ray data showed that the optimal screening age is not significantly influenced by one’s current age, remaining relatively constant as the current age increases. Male heavy smokers tend to have slightly longer optimal screening ages compared to females in our chest X-ray–based analysis. However, Wu (2022) found that, using NLST CT data, female heavy smokers had later optimal screening ages [1]. This difference suggests that the choice of imaging modality (CT vs. X-ray) may influence gender-specific screening recommendations and should be carefully considered when developing future guidelines.

This research observed that if cancer is diagnosed at the first screening exam, the lead time does not significantly change with the incidence probability and the sensitivity. However, the mean, median, and mode of the lead time slightly decrease as the current age increases, compatible with the findings of Wu (2022) [1].

Previous research by Wu et al. (2007) showed that the mean lead time increases as the interval between screening exams shortens [15]. Benbassat (2021) reported a decrease in the mean lead time from 0.9 years with annual screening to 0.6 years with bi-annual screening [16]. Liu et al. (2018) found that women had longer mean lead times than men using the NLST CT data [4]. However, this study using the NLST X-ray data indicates that male heavy smokers have a longer mean lead time compared to females at the (future) first exam.

The sojourn time is crucial to the lead time distribution and is positively correlated with the mean lead time. A longer mean sojourn time results in a longer mean lead time. Jang et al. (2013) noted that lung cancer’s sojourn time distribution is heavily skewed to the right with a large variance, which also applies to the lead time variance [17], and it aligns with the findings of this research. The selected $α$ and $λ$ values were chosen to yield the desired mean sojourn times using the known formula for the mean of a Weibull distribution. As $α$ increases or $λ$ decreases, the mean sojourn time increases. Sensitivity analysis across this range (1.5 to 10 years) allowed us to assess how different tumor growth rates influence optimal screening age, lead time, and overdiagnosis probabilities. The results demonstrated consistent trends in Table 1 through Table 3, validating the robustness of the model.

The probability of overdiagnosis, calculated using the estimated first screening age, is positively correlated with the mean sojourn time, incidence probability, and current age, but shows slight changes with sensitivity, especially when the mean sojourn time is under two years. In general, the probability of overdiagnosis at the first screening is very small. This research highlights that overdiagnosis is more closely associated with a person’s lifetime (T). Since the first screening occurs at a comparatively younger age, overdiagnosis is expected to be minimal.

Future research will extend this method to incorporate individualized risk prediction models based on genetic, environmental, and clinical factors. Additionally, the framework can be adapted to evaluate cost-effectiveness and repeated screening strategies. Exploring other imaging modalities such as low-dose CT and their comparative impacts on optimal timing and overdiagnosis will further enhance practical applicability.

Author Contributions

Methodology, F.R. and D.W.; Formal analysis, F.R.; Writing–original draft, F.R.; Writing–review and editing, D.W.; Supervision, D.W.; Funding acquisition, D.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by NIH/NCI, grant number 1R15CA242482 (Wu, 1 September 2019–31 August 2023).

Data Availability Statement

This study utilized restricted-access data from the National Lung Screening Trial (NLST) chest X-ray dataset. Access to these data is restricted due to participant privacy protections and data use agreements. Researchers may request access through the Cancer Data Access System (CDAS) at https://cdas.cancer.gov/. Use of the data in this study was conducted in accordance with all applicable data use and ethical guidelines.

Acknowledgments

We thank the National Cancer Institute (NCI) for access to the NCI’s data collected by the NLST team. The statements contained herein are solely those of the authors and do not represent or imply concurrence or endorsement by the NCI.

Conflicts of Interest

The authors declare that they have no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

NLST	National Lung Screening Trial
CT	Computed Tomography
MST	Mean Sojourn Time
HPD	Highest Posterior Density
PDF	Probability Density Function

References

1. Wu, D. When to initiate cancer screening exam? Stat. Its Interface 2022, 15, 503–514.
2. Rahman, F.; Wu, D. Inference of Sojourn Time and Transition Density using the NLST X-ray Screening Data in Lung Cancer. Med. Res. Arch. 2021, 9, 10–18103.
3. Zelen, M.; Feinleib, M. On the Theory of Screening for Chronic Diseases. Biometrika 1969, 56, 601–614.
4. Liu, R.; Pérez, A.; Wu, D. Estimation of lead time via low-dose CT in the National Lung Screening Trial. J. Healthc. Inform. Res. 2018, 2, 353–366.
5. Lazris, A.; Roth, A.R. Lung cancer screening: Pros and cons. Am. Fam. Physician 2019, 99, 740–742.
6. Zelen, M. Optimal scheduling of examinations for the early detection of disease. Biometrika 1993, 80, 279–293.
7. Lee, S.J.; Zelen, M. Scheduling periodic examinations for the early detection of disease: Applications to breast cancer. J. Am. Stat. Assoc. 1998, 93, 1271–1281.
8. Lashner, B.A.; Hanauer, S.B.; Silverstein, M.D. Optimal timing of colonoscopy to screen for cancer in ulcerative colitis. Ann. Intern. Med. 1988, 108, 274–278.
9. Alagoz, O. Optimizing Cancer Screening Using Partially Observable Markov Decision Processes. In INFORMS Tutorials in Operations Research; INFORMS: Aachen, Germany, 2014; pp. 75–89.
10. Lee, E.; Lavieri, M.S.; Volk, M.L. Optimal Screening for Hepatocellular Carcinoma: A Restless Bandit Model. Manuf. Serv. Oper. Manag. 2018, 21, 198–212.
11. Özekici, S.; Pliska, S.R. Optimal Scheduling of Inspections: A Delayed Markov Model with False Positives and Negatives. Oper. Res. 1991, 39, 261–273.
12. U.S. Preventive Services Task Force. Lung Cancer Screening Draft Recommendation Statement. 2020. Available online: https://www.uspreventiveservicestaskforce.org/uspstf/recommendation/lung-cancer-screening (accessed on 23 August 2020).
13. Wu, D.; Kafadar, K.; Rosner, G.L.; Broemeling, L.D. The lead time distribution when lifetime is subject to competing risks in cancer screening. Int. J. Biostat. 2012, 8, 6.
14. Wu, D.; Erwin, D.; Rosner, G.L. Sojourn time and lead time projection in lung cancer screening. Lung Cancer 2011, 72, 322–326.
15. Wu, D.; Rosner, G.L.; Broemeling, L.D. Bayesian inference for the lead time in periodic cancer screening. Biometrics 2007, 63, 873–880.
16. Benbassat, J. Duration of lead time in screening for lung cancer. BMC Pulm. Med. 2021, 21, 4.
17. Jang, H.; Kim, S.; Wu, D. Bayesian lead time estimation for the Johns Hopkins Lung Project data. J. Epidemiol. Glob. Health 2013, 3, 157–163.

Figure 1. The PDF curves of the lead time under the four factors: in each panel, one factor varies while the other three are held fixed.

Figure 2. Lead time density curve for NLST X-ray.

Table 1. Optimal initial screening age $t_{0}^{*}$ (in years) found by binary search.

	MST = 1.5
p	$a_{0}$ = 55			$a_{0}$ = 60			$a_{0}$ = 65
	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95
0.05	55.08	55.09	55.09	60.07	60.08	60.08	65.07	65.07	65.08
0.10	55.16	55.19	55.20	60.15	60.17	60.18	65.14	65.16	65.17
0.15	55.26	55.30	55.32	60.24	60.27	60.28	65.22	65.25	65.26
0.20	55.38	55.43	55.46	60.34	60.38	60.41	65.31	65.35	65.37
	MST = 2.5
p	$a_{0}$ = 55			$a_{0}$ = 60			$a_{0}$ = 65
	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95
0.05	55.13	55.15	55.16	60.12	60.14	60.15	65.11	65.13	65.14
0.10	55.29	55.32	55.34	60.26	60.29	60.31	65.24	65.27	65.29
0.15	55.47	55.53	55.56	60.42	60.47	60.50	65.39	65.43	65.46
0.20	55.68	55.77	55.82	60.60	60.68	60.72	65.55	65.62	65.65
	MST = 5
p	$a_{0}$ = 55			$a_{0}$ = 60			$a_{0}$ = 65
	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95
0.05	55.39	55.45	55.48	60.32	60.36	60.38	65.27	65.30	65.32
0.10	55.89	56.02	56.08	60.69	60.79	60.83	65.57	65.64	65.68
0.15	56.52	56.75	56.86	61.14	61.29	61.37	65.91	66.03	66.09
0.20	57.32	57.69	57.88	61.66	61.89	62.01	66.31	66.48	66.56
	MST = 10
p	$a_{0}$ = 55			$a_{0}$ = 60			$a_{0}$ = 65
	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95
0.05	55.99	56.14	56.22	60.79	60.89	60.95	65.65	65.73	65.77
0.10	57.47	57.88	58.09	61.79	62.05	62.18	66.41	66.59	66.69
0.15	59.55	60.31	60.69	63.05	63.50	63.73	67.29	67.59	67.75
0.20	62.05	63.04	63.52	64.57	65.22	65.55	68.31	68.74	68.96
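
The binary search named in the Table 1 caption can be illustrated with a minimal sketch. It assumes only that the incidence probability is a nondecreasing function of the candidate screening age, so the age at which it reaches the target p can be bracketed and bisected. The function `toy_incidence` and its `rate` parameter are hypothetical stand-ins chosen so the answer can be checked by hand; they are not the incidence probability formula of the paper and do not reproduce the table entries.

```python
import math


def bisect_increasing(f, target, lo, hi, tol=1e-6, max_iter=200):
    """Find x in [lo, hi] with f(x) close to target, assuming f is nondecreasing."""
    if not (f(lo) <= target <= f(hi)):
        raise ValueError("target is not bracketed by f(lo) and f(hi)")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid   # risk still below the target: move the lower bound up
        else:
            hi = mid   # risk at or above the target: move the upper bound down
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def toy_incidence(t, a0=60.0, rate=0.02):
    """Hypothetical risk of clinical incidence between current age a0 and age t.

    Assumption for illustration only: a constant hazard, not the paper's model.
    """
    return 1.0 - math.exp(-rate * (t - a0))


# Age at which the toy risk reaches p = 0.10 for someone currently aged 60.
t_star = bisect_increasing(toy_incidence, target=0.10, lo=60.0, hi=100.0)
print(round(t_star, 3))
```

For this toy model the root has the closed form a0 - ln(1 - p)/rate, which gives a direct check on the search. With the actual incidence probability substituted for `toy_incidence`, the same loop would apply as long as the function remains monotone in the screening age.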

Table 2. Estimated mean, median, mode and standard deviation of the lead time at the optimal time $t_{0}^{*}$ when $a_{0}$ = 60. Each cell lists the four values in that order.

MST = 1.5 years
p	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95
0.05	0.85, 0.8, 0.56, 0.54	0.85, 0.8, 0.56, 0.54	0.85, 0.8, 0.56, 0.54
0.10	0.85, 0.8, 0.56, 0.54	0.85, 0.8, 0.56, 0.54	0.85, 0.8, 0.56, 0.54
0.15	0.84, 0.8, 0.56, 0.54	0.85, 0.8, 0.56, 0.54	0.85, 0.8, 0.56, 0.54
0.20	0.84, 0.8, 0.55, 0.54	0.85, 0.8, 0.55, 0.54	0.85, 0.8, 0.55, 0.54
MST = 2.5 years
p	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95
0.05	1.63, 1.54, 0.21, 1.24	1.63, 1.54, 0.21, 1.24	1.63, 1.54, 0.21, 1.24
0.10	1.63, 1.54, 0.21, 1.24	1.63, 1.54, 0.21, 1.24	1.63, 1.54, 0.21, 1.24
0.15	1.63, 1.54, 0.2, 1.24	1.63, 1.54, 0.2, 1.24	1.63, 1.54, 0.2, 1.24
0.20	1.63, 1.53, 0.2, 1.241	1.62, 1.53, 0.2, 1.24	1.62, 1.53, 0.2, 1.24
MST = 5 years
p	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95
0.05	1.77, 3.19, 1.82, 1.49	1.77, 3.19, 1.82, 1.49	1.77, 3.19, 1.82, 1.50
0.10	1.77, 3.17, 1.79, 1.50	1.77, 3.17, 1.78, 1.50	1.77, 3.17, 1.78, 1.50
0.15	1.77, 3.16, 1.75, 1.50	1.77, 3.15, 1.73, 1.51	1.77, 3.15, 1.72, 1.51
0.20	1.76, 3.14, 1.69, 1.51	1.77, 3.13, 1.67, 1.51	1.77, 3.12, 1.66, 1.51
MST = 10 years
p	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95
0.05	4.92, 5.01, 3.42, 1.59	4.92, 5.01, 3.4, 1.59	4.92, 5.01, 3.39, 1.59
0.10	4.93, 5.01, 3.28, 1.59	4.93, 5.01, 3.24, 1.59	4.93, 5.01, 3.22, 1.59
0.15	4.94, 5.01, 3.09, 1.59	4.94, 5.01, 3.02, 1.59	4.95, 5.01, 2.99, 1.59
0.20	4.95, 5.01, 2.85, 1.58	4.96, 5.01, 2.74, 1.58	4.96, 5.01, 2.69, 1.58
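
As an illustration of how the four summaries reported in Table 2 could be obtained from a density curve, the sketch below computes the mean, median, mode and standard deviation from density values tabulated on a grid, using the trapezoid rule. It assumes the density is purely continuous and available numerically. The function name `density_summary` and the example density x·exp(−x) are hypothetical and are not the lead time density of the paper.

```python
import math


def density_summary(xs, pdf):
    """Return (mean, median, mode, sd) of a density tabulated on an increasing grid."""
    # Cumulative trapezoid integral of the (possibly unnormalised) density.
    cdf = [0.0]
    for i in range(1, len(xs)):
        cdf.append(cdf[-1] + 0.5 * (pdf[i] + pdf[i - 1]) * (xs[i] - xs[i - 1]))
    total = cdf[-1]  # renormalise in case the grid truncates the tail

    def integrate(g):
        # Trapezoid rule for the expectation of g(X) under the normalised density.
        vals = [g(x) * p / total for x, p in zip(xs, pdf)]
        return sum(0.5 * (vals[i] + vals[i - 1]) * (xs[i] - xs[i - 1])
                   for i in range(1, len(xs)))

    mean = integrate(lambda x: x)
    var = integrate(lambda x: (x - mean) ** 2)

    # Median: locate the grid cell where the CDF crosses one half, then interpolate.
    half = 0.5 * total
    i = next(k for k in range(1, len(cdf)) if cdf[k] >= half)
    frac = (half - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
    median = xs[i - 1] + frac * (xs[i] - xs[i - 1])

    # Mode: grid point with the largest density value.
    mode = xs[max(range(len(xs)), key=lambda k: pdf[k])]
    return mean, median, mode, math.sqrt(var)


# Hypothetical density for illustration: f(x) = x * exp(-x) on [0, 40].
xs = [0.001 * k for k in range(40001)]
pdf = [x * math.exp(-x) for x in xs]
mean, median, mode, sd = density_summary(xs, pdf)
print(round(mean, 3), round(median, 3), round(mode, 3), round(sd, 3))
```

The example density has known mean 2, mode 1 and standard deviation √2, which makes the numerical output easy to verify. A mode read off a grid is only as precise as the grid spacing, which may matter when comparing modes reported to two decimal places.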

Table 3. Estimated probability of overdiagnosis (in percentage) at the initial screening age $t_{0}^{*}$.

	MST = 1.5
p	$a_{0}$ = 55			$a_{0}$ = 60			$a_{0}$ = 65
	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95
0.05	1.30	1.31	1.31	1.85	1.87	1.87	2.48	2.48	2.50
0.10	1.39	1.43	1.44	1.99	2.02	2.04	2.65	2.70	2.72
0.15	1.51	1.56	1.59	2.15	2.21	2.23	2.85	2.92	2.95
0.20	1.66	1.73	1.77	2.34	2.42	2.48	3.08	3.19	3.24
	MST = 2.5
p	$a_{0}$ = 55			$a_{0}$ = 60			$a_{0}$ = 65
	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95
0.05	2.39	2.41	2.43	3.33	3.37	3.38	4.45	4.50	4.52
0.10	2.58	2.61	2.64	3.57	3.62	3.65	4.75	4.83	4.88
0.15	2.80	2.87	2.91	3.85	3.94	3.99	5.12	5.22	5.30
0.20	3.06	3.18	3.25	4.18	4.33	4.41	5.53	5.72	5.80
	MST = 5
p	$a_{0}$ = 55			$a_{0}$ = 60			$a_{0}$ = 65
	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95
0.05	4.74	4.81	4.85	6.24	6.31	6.34	8.20	8.27	8.32
0.10	5.35	5.47	5.55	6.87	7.05	7.12	8.93	9.10	9.21
0.15	6.13	6.45	6.60	7.60	7.88	8.03	9.81	9.96	10.12
0.20	7.21	7.77	8.07	8.60	9.07	9.20	10.73	11.21	11.45
	MST = 10
p	$a_{0}$ = 55			$a_{0}$ = 60			$a_{0}$ = 65
	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95	$β$ = 0.8	$β$ = 0.9	$β$ = 0.95
0.05	9.39	9.52	9.65	10.30	10.53	10.67	11.41	11.65	11.77
0.10	10.71	10.46	10.72	11.04	11.09	11.11	12.04	12.18	12.31
0.15	11.46	11.92	11.76	12.06	12.16	12.49	13.81	13.96	14.53
0.20	12.56	12.84	12.11	13.07	13.53	13.61	14.10	14.34	14.61

Table 4. Estimated initial screening age $t_{0}^{*}$ and its 95% HPD interval using the NLST X-ray data.

	MALE
p	$a_{0}$ = 55			$a_{0}$ = 60			$a_{0}$ = 65
	mean	s.e.	95%CI	mean	s.e.	95%CI	mean	s.e.	95%CI
0.05	55.042	0.004	(55.035, 55.050)	60.042	0.004	(60.035, 60.050)	65.042	0.004	(65.035, 65.050)
0.10	55.089	0.009	(55.074, 55.105)	60.089	0.009	(60.074, 60.105)	65.089	0.009	(65.074, 65.105)
0.15	55.142	0.014	(55.117, 55.168)	60.142	0.014	(60.117, 60.168)	65.142	0.014	(65.117, 65.168)
0.20	55.201	0.020	(55.166, 55.237)	60.202	0.020	(60.166, 60.237)	65.202	0.020	(65.166, 65.238)
	FEMALE
p	$a_{0}$ = 55			$a_{0}$ = 60			$a_{0}$ = 65
	mean	s.e.	95%CI	mean	s.e.	95%CI	mean	s.e.	95%CI
0.05	55.041	0.004	(55.035, 55.049)	60.041	0.004	(60.035, 60.049)	65.041	0.004	(65.035, 65.049)
0.10	55.086	0.008	(55.074, 55.103)	60.086	0.008	(60.074, 60.103)	65.087	0.008	(65.074, 65.103)
0.15	55.137	0.013	(55.118, 55.164)	60.138	0.013	(60.118, 60.164)	65.138	0.013	(65.118, 65.164)
0.20	55.195	0.019	(55.167, 55.232)	60.195	0.019	(60.167, 60.232)	65.195	0.019	(65.167, 65.232)

Table 5. Estimated mean, median, mode and standard deviation of the lead time using NLST X-ray data.

MALE
p	$a_{0}$ = 55	$a_{0}$ = 60	$a_{0}$ = 65
0.05	0.68, 0.64, 0.46, 0.43	0.67, 0.63, 0.39, 0.43	0.66, 0.62, 0.31, 0.43
0.10	0.68, 0.64, 0.46, 0.43	0.67, 0.63, 0.39, 0.43	0.66, 0.62, 0.31, 0.43
0.15	0.68, 0.64, 0.46, 0.43	0.67, 0.63, 0.39, 0.43	0.66, 0.62, 0.31, 0.43
0.20	0.68, 0.64, 0.46, 0.43	0.67, 0.63, 0.39, 0.43	0.66, 0.62, 0.31, 0.43
FEMALE
p	$a_{0}$ = 55	$a_{0}$ = 60	$a_{0}$ = 65
0.05	0.67, 0.62, 0.35, 0.43	0.66, 0.61, 0.29, 0.43	0.66, 0.61, 0.22, 0.43
0.10	0.67, 0.62, 0.35, 0.43	0.66, 0.61, 0.29, 0.43	0.66, 0.61, 0.22, 0.43
0.15	0.67, 0.62, 0.34, 0.43	0.66, 0.61, 0.29, 0.43	0.66, 0.61, 0.22, 0.43
0.20	0.67, 0.62, 0.34, 0.43	0.66, 0.61, 0.29, 0.43	0.66, 0.61, 0.22, 0.43

Table 6. Estimated mean, standard error and 95% C.I. for probability of overdiagnosis at the first exam for the NLST X-ray data (in percentage).

	MALE
p	$a_{0}$ = 55			$a_{0}$ = 60			$a_{0}$ = 65
	mean	s.e.	95%CI	mean	s.e.	95%CI	mean	s.e.	95%CI
0.05	1.25	0.28	(1.05, 1.61)	1.78	0.29	(1.06, 2.21)	2.41	0.38	(2.06, 2.97)
0.10	1.32	0.25	(1.11, 1.63)	1.81	0.36	(1.60, 2.36)	2.52	0.39	(2.16, 3.09)
0.15	1.49	0.44	(1.13, 1.65)	1.85	0.38	(1.64, 2.40)	2.58	0.46	(2.21, 3.17)
0.20	1.53	0.52	(1.23, 1.69)	1.91	0.43	(1.71, 2.44)	2.67	0.58	(2.25, 3.22)
	FEMALE
p	$a_{0}$ = 55			$a_{0}$ = 60			$a_{0}$ = 65
	mean	s.e.	95%CI	mean	s.e.	95%CI	mean	s.e.	95%CI
0.05	0.74	0.08	(0.64, 0.90)	1.03	0.11	(0.89, 1.26)	1.48	0.16	(1.29, 1.82)
0.10	0.77	0.08	(0.67, 0.94)	1.08	0.11	(0.95, 1.32)	1.55	0.16	(1.35, 1.89)
0.15	0.81	0.08	(0.71, 0.98)	1.13	0.11	(0.99, 1.38)	1.63	0.17	(1.43, 1.99)
0.20	0.85	0.08	(0.75, 1.03)	1.19	0.12	(1.05, 1.45)	1.73	0.17	(1.51, 2.09)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

