Development and External Validation of an Improved Version of the Diagnostic Model for Opportunistic Screening of Malignant Esophageal Lesions

Simple Summary
Simple and effective risk stratification tools that predict the risk of malignant esophageal lesions are needed for the practice of opportunistic screening. The aim of the current study was to develop an improved version of the diagnostic model based on a large-scale outpatient cohort and to assess the robustness and generalizability of the model through external validation. The improved diagnostic model had seven predictors and generated an area under the receiver operating characteristic curve of 0.860 in the development set. In two external populations, the model also showed high discrimination power and increased the detection rate of malignant esophageal lesions. This questionnaire-based diagnostic model provides an easy-to-use tool to identify high-risk individuals and will be useful for the promotion of the opportunistic screening of esophageal cancer.

Abstract
We aimed to develop an improved version of the diagnostic model predicting the risk of malignant esophageal lesions in opportunistic screening and to validate it in external populations. The development set involved 10,595 outpatients receiving endoscopy at a hospital in Hua County, a high-risk region for esophageal squamous cell carcinoma in northern China. Validation set A enrolled 9453 outpatients receiving endoscopy in a non-high-risk region in southern China. Validation set B involved 17,511 residents in Hua County. The improved diagnostic model consisted of seven predictors, namely age, gender, family history of esophageal squamous cell carcinoma, smoking, body mass index, dysphagia, and retrosternal pain, with an area under the receiver operating characteristic curve (AUC) of 0.860 (95% confidence interval: 0.835–0.886) in the development set. High discrimination ability was achieved in external validations (AUC validation set A: 0.892, 95% confidence interval: 0.858–0.926; AUC validation set B: 0.799, 95% confidence interval: 0.705–0.894). This improved model also markedly increased the detection rate of malignant esophageal lesions compared with universal screening, demonstrating great potential for use in opportunistic screening of malignant esophageal lesions in heterogeneous populations.


Introduction
Esophageal cancer (EC) is one of the most common cancers and is the sixth leading cause of cancer death globally [1,2]. There were an estimated ~600,000 new cases and ~540,000 deaths worldwide in 2020 [1]. More than half of the world's new EC cases occurred in China [3]. Esophageal squamous cell carcinoma (ESCC) was the predominant histologic type [3].
Due to the lack of typical symptoms at an early stage, most ESCC cases are diagnosed at an advanced stage with low 5-year survival [4]. Great importance has therefore been attached to screening. Cancer screening programs can be classified into organized screening and opportunistic screening according to differences in decision-maker, implementation, and payers. Organized screening programs for EC have been implemented by the government in high-risk areas in China. However, this screening modality entails continuous massive investment in human and material resources and is difficult to expand and sustain on a large scale. In contrast, opportunistic screening, which is defined as screening for patients who present to healthcare professionals for any complaint, is more cost-effective and preferable for scaling up esophageal cancer screening [5].
For the implementation of opportunistic screening for ESCC, two prerequisites must be considered. First, there must be confirmed evidence supporting the effectiveness of screening. Observational studies have reported that early-stage EC patients can benefit from cancer screening [6–8], and large-scale randomized controlled trials (RCTs) have been initiated to provide the highest-grade evidence in the future [9,10]. Second, a simple and effective risk stratification tool to predict the risk of malignant esophageal lesions is needed to help patients and physicians decide whether to accept endoscopic examinations and to guide endoscopists.
In a previous study, we constructed the first model predicting the risk of prevalent esophageal malignant lesions for ESCC opportunistic screening by combining five easy-to-collect predictors [11]. That model showed good discrimination in the development set. More validations are needed to evaluate the performance of that model in real-world screening scenarios and heterogeneous populations. Therefore, in this study, we aimed to develop an improved version of the model using a larger development set, and to assess the robustness and generalizability of the model through external validation using two different cohorts. We further tested its ability to increase the detection rate of opportunistic screening.

In the current study, we expanded the development set of the original model [11] by additionally enrolling consecutive outpatients from 21 February 2019 to 31 December 2021 and constructed an improved version of the model.

Validation Set A
Validation set A was previously used for validation of the original model and enrolled outpatients undergoing endoscopy in Peking University Shenzhen Hospital from 19 June 2017 to 14 January 2019. This hospital is a tertiary hospital in Shenzhen, which is an economically dynamic city with a huge migrant population and a low incidence of ESCC in southern China [11]. In this study, validation set A was expanded by recruiting additional consecutive outpatients up to 18 November 2021.

Validation Set B
To assess the performance of the model in a general population, we further validated it in the control group of the Efficacy of endoscopic Screening for Esophageal Cancer in China (ESECC) trial (ClinicalTrials.gov identifier: NCT01688908) as validation set B. As described previously, ESECC is a randomized controlled trial conducted in Hua County to evaluate the efficacy and cost-effectiveness of endoscopic screening for EC [9]. A total of 668 villages were randomly selected and equally allocated to a screening group and a control group [9]. The participants in the control group did not receive endoscopic screening.

Inclusion Criteria
For all three datasets, inclusion criteria were: (1) age 45 to 69 years; (2) no history of cancer, mental disorder, or contraindications for endoscopy; and (3) completion of an adequate upper GI endoscopic examination (not applicable for validation set B).

Data Collection and Outcome Ascertainment
All participants in these three cohorts completed a one-on-one computer-aided standardized questionnaire to collect demographic variables and information regarding potential predictors of ESCC. Candidate predictors, which were selected based on literature review, included age, gender, socioeconomic status (education level and marital status), cigarette smoking, alcohol consumption, consumption of hot tea, source of drinking water, family history of ESCC, body mass index (BMI), type of fuel used for cooking, exposure to fumes in the kitchen, pesticide exposure, intake of fruit and vegetables, unhealthy dietary habits, and upper GI symptoms in the previous 1 month (including dysphagia, retrosternal pain, reflux or heartburn, loss of appetite or dyspepsia, nausea or vomiting, and epigastric pain) (Table S1).
The outcome was defined as the detection of severe dysplasia and above (SDA) of the esophagus, which included severe squamous dysplasia, carcinoma in situ, and ESCC. For the development set and validation set A, it was ascertained based on pathological diagnoses of the biopsy specimens, which were taken from all focal lesions during upper GI endoscopic examination and reviewed independently by two experienced pathologists. The outcome for validation set B was obtained through annual follow-up via active door-to-door interviews and passive linkage with local health insurance claims data, which have been shown to have a sensitivity of over 95% in identifying cancer cases and may be an ideal data source for cancer follow-up [12–14].

Statistical Analysis
The chi-squared test and the Kruskal-Wallis rank-sum test were used to compare characteristics of the participants across the three datasets for categorical and continuous variables, respectively.
We used a two-step approach to develop the prediction model. The association of each potential predictor with the presence of SDA was first assessed using univariable logistic regression in the development set. Predictors with p < 0.05, or with p < 0.5 and an odds ratio > 1.3, were initially selected for multivariable logistic regression. Backward elimination using the Akaike information criterion (AIC) was then adopted to determine the final multivariable model; that is, predictors were removed one at a time whenever their removal lowered the AIC, until the smallest AIC was reached. Patients with missing values or other upper GI cancers were excluded from the analysis.
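The two-step selection described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' R code: the function names, the `(p_value, odds_ratio)` input format, and the caller-supplied `aic_of` function are all assumptions, and the logistic regression fits themselves are left to whatever statistical package is used.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple


def screen_univariable(results: Dict[str, Tuple[float, float]]) -> List[str]:
    """Step 1: keep predictors with p < 0.05, or with p < 0.5 and OR > 1.3.

    `results` maps a predictor name to (p_value, odds_ratio) obtained from a
    univariable logistic regression fitted elsewhere (hypothetical input).
    """
    return [
        name
        for name, (p_value, odds_ratio) in results.items()
        if p_value < 0.05 or (p_value < 0.5 and odds_ratio > 1.3)
    ]


def backward_eliminate(
    candidates: Iterable[str],
    aic_of: Callable[[FrozenSet[str]], float],
) -> List[str]:
    """Step 2: drop one predictor at a time while doing so lowers the AIC.

    `aic_of` is a hypothetical callback returning the AIC of the
    multivariable model fitted on the given set of predictors.
    """
    current = frozenset(candidates)
    best_aic = aic_of(current)
    while current:
        # AIC of every model obtained by removing a single predictor.
        trials = {name: aic_of(current - {name}) for name in current}
        drop = min(trials, key=trials.get)
        if trials[drop] >= best_aic:
            break  # no single removal lowers the AIC any further
        current, best_aic = current - {drop}, trials[drop]
    return sorted(current)
```

Separating the selection logic from the model fitting keeps the stopping rule explicit: elimination ends as soon as every possible single removal would leave the AIC unchanged or higher.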
Receiver operating characteristic (ROC) curves were plotted to visually assess the discrimination of the final model. AUCs were calculated according to the observed and predicted values and compared using the DeLong test among the three datasets [15].
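For readers less familiar with the AUC, the following Python sketch computes it directly from observed outcomes and predicted risks using its rank interpretation: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. The function name is an assumption, the pairwise loop is chosen for clarity rather than speed, and the DeLong comparison of AUCs is not implemented here.

```python
from typing import Sequence


def auc(outcomes: Sequence[int], risks: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    `outcomes` holds 1 for a case (e.g., SDA detected) and 0 otherwise;
    `risks` holds the model's predicted probabilities in the same order.
    """
    cases = [r for y, r in zip(outcomes, risks) if y == 1]
    non_cases = [r for y, r in zip(outcomes, risks) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for case_risk in cases:
        for other_risk in non_cases:
            if case_risk > other_risk:
                wins += 1.0
            elif case_risk == other_risk:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from non-cases.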
We set different screening coverages in the development set to identify 'high-risk' individuals. The predicted-probability cutoff that achieved each desired screening coverage in the development set was then applied to the two validation sets to assess the application performance of the model. We calculated the sensitivity, the average number of endoscopies needed to detect one SDA case, the detection rate, and the detection rate ratio compared to universal screening in each of the three datasets.
All analyses in this study were conducted using R software (version 4.0.2, Ross Ihaka and Robert Gentleman, Auckland, New Zealand). All tests were two-sided and p < 0.05 was considered statistically significant.
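The application metrics listed above can be illustrated with a short Python sketch. It rests on assumptions that are not confirmed by the source: the cutoff is taken as the k-th highest predicted risk in the development set (one reading of "achieving the desired screening coverage"), individuals at or above the cutoff are treated as 'high-risk', and all function and key names are hypothetical.

```python
from typing import Dict, Sequence


def cutoff_for_coverage(dev_risks: Sequence[float], coverage: float) -> float:
    """Risk cutoff so that about `coverage` of the development set is flagged."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ranked = sorted(dev_risks, reverse=True)
    k = max(1, round(coverage * len(ranked)))  # number of people to screen
    return ranked[k - 1]


def screening_performance(
    outcomes: Sequence[int], risks: Sequence[float], cutoff: float
) -> Dict[str, float]:
    """Metrics when only individuals with risk >= cutoff receive endoscopy."""
    total_cases = sum(outcomes)
    flagged = [y for y, r in zip(outcomes, risks) if r >= cutoff]
    detected = sum(flagged)
    if total_cases == 0 or detected == 0:
        raise ValueError("metrics are undefined without detected cases")
    detection_rate = detected / len(flagged)
    universal_rate = total_cases / len(outcomes)  # everyone screened
    return {
        "coverage": len(flagged) / len(outcomes),
        "sensitivity": detected / total_cases,
        "detection_rate": detection_rate,
        "endoscopies_per_case": len(flagged) / detected,
        "detection_rate_ratio": detection_rate / universal_rate,
    }
```

In this sketch the cutoff would be derived once from the development set and then passed unchanged to `screening_performance` for each validation set, so the realized coverage in a validation set may differ from the nominal one.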

Ethics Statement
This study was approved by the Institutional Review Board of the Peking University School of Oncology, China. Written informed consent was obtained from all the participants.

Baseline Characteristics and Outcome
The three datasets showed statistically significant differences in selected characteristics (Table 1). Compared to the general population in validation set B, outpatients recruited from hospitals (development set and validation set A) were more likely to present with dysphagia and retrosternal pain but less likely to smoke cigarettes. Participants from Hua County (development set and validation set B) were more likely to have a family history of ESCC and low BMI than participants from Shenzhen (validation set A). We identified 154 (1.5%) SDA cases among 10,595 participants in the development set, and 49 (0.5%) SDA cases among 9453 participants in validation set A. In validation set B, there were 18 (0.1%), 39 (0.2%), and 66 (0.4%) SDA cases within the 1-, 3-, and 5-year periods of follow-up, respectively, among the 17,511 participants.

Model Development
Among the candidate variables, seven predictors were selected to construct the new version of the model (Table 2). We provide a simple and easy-to-use calculator (Excel S1) for use in clinical settings to obtain the risk for malignant esophageal lesions based on our model. The formula of the model was as follows:

$$Y_{\text{risk of malignant esophageal lesions}} = \frac{1}{1 + e^{-(-14.996 + 0.159 \times \text{age} + 0.531 \times \text{gender} + 0.639 \times \text{family history} + 0.412 \times \text{smoking} + 0.530 \times \text{BMI} + 1.547 \times \text{dysphagia} + 0.570 \times \text{retrosternal pain})}} \quad (1)$$

Notes to Table 2. Abbreviations: OR, odds ratio; CI, confidence interval.
a Predictors were selected by a 2-step selection method in which all candidate predictors were first evaluated in univariable logistic regression models and those with p < 0.05, or with p < 0.5 and odds ratio > 1.3, were subjected to multivariable logistic regression models. The Akaike information criterion was used to determine the final predictor pattern.
b Participants with missing values were excluded from the multivariable analysis.
c Odds ratio = exp (coefficient).
d Cigarette smoking was defined as a smoking history of at least 18 packs of cigarettes per year.
e Positive symptoms were defined as occasional or frequent self-reported symptoms in the previous 1 month.
As shown in Figure 1a [...]
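As a hedged illustration of how the published formula (1) turns questionnaire answers into a predicted risk, the Python sketch below hard-codes the reported intercept and coefficients. The variable coding is an assumption, since Table 2 is not reproduced here: age is taken in years and the remaining predictors as 0/1 indicators (1 meaning the risk category is present). The dictionary keys and the function name are hypothetical, and the supplementary calculator (Excel S1) remains the authoritative tool.

```python
import math

# Intercept and coefficients as reported in formula (1).
INTERCEPT = -14.996
COEFFICIENTS = {
    "age": 0.159,               # assumed: age in years
    "gender": 0.531,            # assumed: 1 = risk category per Table 2
    "family_history": 0.639,    # assumed: 1 = family history of ESCC
    "smoking": 0.412,           # assumed: 1 = cigarette smoking
    "bmi": 0.530,               # assumed: 1 = BMI risk category per Table 2
    "dysphagia": 1.547,         # assumed: 1 = symptom in previous month
    "retrosternal_pain": 0.570, # assumed: 1 = symptom in previous month
}


def predicted_risk(predictors: dict) -> float:
    """Predicted probability of a malignant esophageal lesion (formula 1)."""
    missing = set(COEFFICIENTS) - set(predictors)
    if missing:
        raise ValueError(f"missing predictors: {sorted(missing)}")
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * predictors[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-linear))  # logistic transform


if __name__ == "__main__":
    # Hypothetical respondent: aged 60, dysphagia only.
    person = dict.fromkeys(COEFFICIENTS, 0)
    person.update(age=60, dysphagia=1)
    print(f"predicted risk: {predicted_risk(person):.4f}")
```

Because the model is a logistic regression, each coefficient is a log odds ratio, so switching a 0/1 indicator on multiplies the odds of a malignant lesion by exp(coefficient).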

Evaluation of Application Performance of the Model
We set different screening coverages to select 'high-risk' individuals in the development set and evaluated the application performance of the model in the two validation sets. In all three datasets, as the cutoff rose and the proportion of individuals defined as at high risk decreased, the detection rate increased notably, reflecting substantial risk enrichment of the model for patients with malignant lesions in the esophagus (Table 3). For example, if only the top 5% of all individuals were referred for endoscopic screening, over 40% of all cases could be detected, and the detection rate would be more than eight times higher than that of universal screening in the development set and validation set A. When the screening coverage was shifted to 30%, over 80% of cases could be detected, with a three-fold increase in the detection rate in the development set and validation set A compared to universal screening. For validation set B, the model could also increase the detection rate by four times (from 0.1% to 0.4%) and two times (from 0.1% to 0.2%) when the top 5% and 30% of the population were screened, respectively.
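The link between coverage, the share of cases detected, and the detection rate ratio in the examples above follows from a simple identity, written out below as a worked sketch. The symbols N (population size), D (number of SDA cases), c (screening coverage), and s (sensitivity, i.e., the share of cases among those screened) are introduced here for illustration only, and the identity assumes that the realized coverage equals the nominal one.

```latex
\[
\mathrm{DR}_{\text{high-risk}} = \frac{sD}{cN}, \qquad
\mathrm{DR}_{\text{universal}} = \frac{D}{N}, \qquad
\frac{\mathrm{DR}_{\text{high-risk}}}{\mathrm{DR}_{\text{universal}}} = \frac{s}{c}
\]
\[
c = 0.05,\; s > 0.40 \;\Rightarrow\; \frac{s}{c} > 8; \qquad
c = 0.30,\; s > 0.80 \;\Rightarrow\; \frac{s}{c} > 2.67
\]
```

The second case is consistent with the roughly three-fold increase reported for 30% coverage.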

Discussion
In this study, we developed an improved version of the diagnostic model to predict individualized risk of malignant esophageal lesions in an opportunistic screening scenario. The final model containing seven predictors demonstrated high discrimination ability in the development set as well as in two external validation sets. This model exhibited remarkable generalizability and potential for application in opportunistic screening for ESCC.
For the new model in the present study, seven predictors were selected and most of them were well-recognized risk factors for ESCC. Among these predictors, we included two upper GI symptoms, dysphagia and retrosternal pain. In the traditional epidemiologic concept, screening targets asymptomatic individuals at a precancerous stage in a given population. However, disease-related symptoms cannot simply be classified as present or absent, since the transition from a completely asymptomatic phase to symptoms obvious enough to make patients seek medical services on their own is usually continuous and gradual. Patients who have not consulted healthcare providers for early warning signs, i.e., those at a preclinical stage, are not necessarily 'asymptomatic'. In rural China, for example, many ESCC patients delayed seeking medical services until symptoms were quite obvious because of limited socioeconomic status and poor health awareness [16]. In this study, about 5% of the general population (validation set B) had self-reported dysphagia and retrosternal pain. Hence, another fundamental goal of opportunistic screening in this population would be to diagnose patients with symptoms as early as possible to achieve 'downstaging' effects [17]. We further conducted a stratification analysis in patients with or without the two ESCC-related symptoms in the development set (Table S2), and significantly increased detection rates were observed in both subgroups, suggesting that this model would provide homogeneous performance in these two subgroups. Additionally, these symptoms may also occur in cases of adenocarcinoma of the esophagogastric junction (AEG). Therefore, we tried to use our model to predict the risk of AEG and the results showed good predictive accuracy (Figure S1). However, due to the differences in etiology, epidemiology, and histology between ESCC and AEG, a model specifically established for AEG would be warranted in the future.
Compared with the original version of the model, two new variables, namely family history of ESCC and gender, were introduced in the present model. Family history has been reported as an important risk factor in most previous epidemiological studies of ESCC, particularly in China. People with a positive family history of ESCC were 1.5 to 2.5 times more likely to have or develop ESCC, probably because of shared lifestyle and/or genetic susceptibility [9,18–21]. Family history was a key predictor in another model we constructed for the identification of high-risk individuals in the general population in high-risk areas of rural China [20]. Gender, the other new predictor, is also a widely recognized risk factor for ESCC. It has been reported that the risk of having malignant esophageal lesions for males was ~1.5 times higher than that for females in high-incidence areas in China, such as the Taihang Mountain area [9,13,22]. The gender difference may be even larger in non-high-risk areas, with the risk of malignant esophageal lesions in males 3 to 4 times higher than that in females [3,13]. After adding these two new predictors, we observed statistically significant improvement of the model in validation set A (Figure S2b), and slightly improved performance, although not statistically significant, in the development set (Figure S2a) and validation set B (Figure S2c) as compared with the original version. This demonstrates the contribution of these new predictors, especially in non-high-risk areas.
We completed multidimensional validation of the model in two independent external populations, which was essential for considering or recommending application of the model in clinical practice [23]. Validation set A was established in a clinical setting, the same as the development set, but in a geographically separated region markedly different from Hua County in population structure. The excellent discrimination ability of this model in this external validation dataset suggested the outstanding generalizability of the model when applied in regions and populations different from the settings where it was developed. Validation set B, which was the control group in a large-scale RCT, recruited participants from the general population in the same region as the development set. Since the outcome in validation set B was ascertained in follow-up, the cases were those which had finally progressed to the cancerous stage, the very target of screening efforts [17]. Although the AUCs when predicting long-term outcomes were slightly lower than those in the two clinical datasets, this model showed generally good discrimination in the general population. The performance of the model in these external populations demonstrated its great potential to be applied in real-world ESCC opportunistic screening programs.
Cancers 2022, 14, 5945
In real-world screening programs for ESCC, the optimal cutoff to define 'high-risk individuals' in a certain population should be carefully determined in advance. This decision must be made based on overall consideration of resource availability, population coverage, and the capacity of endoscopic examination in local healthcare facilities. To facilitate decision-making in varied scenarios, we set different cutoffs for risk stratification and evaluated the application performance of the model for each cutoff, rather than simply dividing the population into various risk groups using fixed cutoff points. In situations where the screening program aims to increase the detection rate of malignant lesions to achieve the highest possible public health benefit with limited resources, we would recommend a higher risk cutoff. For example, the risk cutoff of 0.0585372 may be used to identify the top 5% of the whole population that are predicted to be at the highest risk, for whom endoscopic screening would be recommended. With that cutoff, our model could achieve a detection rate of 12.1% (8.2 times higher than that in universal screening) and reduce the number of endoscopies needed to detect one SDA from 68 to 8. This would greatly relieve the burden on the local healthcare system. In contrast, a lower cutoff could be adopted if the human, medical, and financial resources are adequate; for example, a risk cutoff of 0.0185875 to identify 20% of the population for screening. In this case, the screening program could achieve a detection rate of 5.3%, 3.6-fold higher than that in universal screening, and could detect one SDA in every 19 endoscopic examinations on average. In the scenario where the priority of the screening program is to detect all cancer cases, we could still avoid over 30% of screening examinations compared with universal screening by adopting a cutoff of 0.0022627.
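The figures quoted for the two example cutoffs can be cross-checked with the relation between the detection rate (DR) and the average number of endoscopies needed to detect one SDA case (NNS, a label introduced here for illustration). Because the inputs are the rounded values given in the text, the results are approximate.

```latex
\[
\mathrm{NNS} = \frac{1}{\mathrm{DR}}: \qquad
\frac{1}{0.121} \approx 8.3, \qquad
\frac{1}{0.053} \approx 18.9
\]
\[
\mathrm{DR}_{\text{universal}} \approx \frac{0.121}{8.2} \approx 0.0148
\;\Rightarrow\;
\mathrm{NNS}_{\text{universal}} \approx \frac{1}{0.0148} \approx 68
\]
```

These values agree with the reported 8, 19, and 68 endoscopies per detected SDA case, respectively.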
As a pre-screening tool prior to the screening examinations, our model can be used, theoretically, in combination with any screening technique for esophageal cancer. Among the screening techniques currently available, upper gastrointestinal endoscopy is the gold standard and has been widely used in China and worldwide. Other screening techniques to detect esophageal cancer such as cytosponge have been reported to be simple and low-cost. The application performance of our model in combination with other screening techniques needs to be further investigated in the future based on real-world data.
A limitation of the present study should also be noted. Although this is a multi-center real-world study with a large sample size, more extensive validations and calibrations in other Chinese and non-Chinese populations are needed to confirm the generalizability and robustness of the model.

Conclusions
In summary, we developed and externally validated an improved version of the diagnostic model predicting the risk of malignant esophageal lesions for opportunistic screening of ESCC. This easy-to-use questionnaire-based risk stratification tool may be readily integrated into smart portable terminals and social media platforms, thereby greatly promoting the uptake of cancer screening programs through self-risk-assessment and self-health-management among the public. As such, high-risk individuals in the general population may for the first time be empowered to act as the initiator and decision-maker for cancer screening examination, in contrast to the traditional cancer screening strategy in which the leading role has long been taken by the government or healthcare facilities.