1. Introduction
Lung cancer remains the leading cause of cancer-related mortality worldwide, with non-small cell lung cancer (NSCLC) accounting for approximately 85% of all cases [1]. The widespread adoption of low-dose computed tomography (CT) has substantially increased the detection of early-stage lung cancer, particularly small peripheral nodules [2,3]. This shift toward earlier detection has transformed surgical treatment paradigms, with accumulating evidence supporting sublobar resection (anatomical segmentectomy or wedge resection) as a viable alternative to lobectomy for selected patients with small NSCLC [4,5,6].
Current clinical practice guidelines recommend lobectomy for tumors exceeding 20 mm in diameter, while anatomical sublobar resection may be considered for smaller lesions in patients with compromised pulmonary function or significant comorbidities [7,8]. However, the choice between lobectomy and anatomical sublobar resection depends heavily on preoperative CT measurements [9,10]. The critical assumption underlying this approach is that CT-measured tumor diameter accurately reflects pathological tumor size [11].
Despite the clinical significance of this issue, several key questions remain unanswered. The patterns of CT–pathology discordance across different tumor sizes have not been well defined in thoracoscopic surgery. The optimal CT cut-off for balancing sensitivity and specificity for surgical decision-making is undefined. Moreover, the real-world clinical consequences of overtreatment caused by CT overestimation have not been adequately quantified.
We hypothesized that CT-based tumor diameter systematically differs from pathological size near the 20 mm surgical boundary in a size-dependent manner, leading a significant proportion of patients to undergo more extensive resection than pathology would warrant, and that modestly raising the CT threshold could reduce this overtreatment without compromising identification of tumors that genuinely require lobectomy.
2. Materials and Methods
2.1. Study Design and Patients
This retrospective cohort study was conducted at a single tertiary care facility with the approval of the Institutional Review Board (IRB protocol number KY2025-103). The requirement for informed consent was waived by the Institutional Review Board due to the retrospective nature of the study and the use of de-identified clinical records. The study evaluated CT–pathology tumor size discordance and its clinical impact on surgical decision-making among patients undergoing robotic-assisted thoracoscopic surgery for primary lung cancer between January 2020 and December 2024.
Patients were included if they had a preoperative chest CT scan within 3 months before surgery and a complete pathological evaluation after thoracoscopic lobectomy or segmentectomy confirming primary lung cancer. Surgical patients were included regardless of preoperative CT tumor size, provided that paired CT and pathological maximal tumor diameter measurements were available.
Patients were excluded if they had received neoadjuvant chemotherapy or radiation, had multiple primary lung cancers, had detectable distant disease at the time of thoracoscopic surgery, had incomplete medical records, or if their preoperative chest CT images were unavailable, of poor quality, or inadequate for accurate tumor diameter measurement. Of the 1185 patients initially identified, 89 (7.5%) were excluded because of missing CT measurements or imaging issues. These comprised inaccessible images (n = 52, 58%), poor image quality (n = 24, 27%), and other technical problems (n = 13, 15%), leaving a final cohort of 1096 patients.
2.2. CT Measurement and Clinical Staging
All patients underwent preoperative thin-section multi-detector CT of the thorax (slice thickness ≤ 2 mm) according to standardized institutional lung cancer protocols (120 kVp, automated tube current modulation; scanner models: Siemens SOMATOM Definition Edge (Erlangen, Germany) and GE Revolution CT (GE HealthCare, Chicago, IL, USA)). CT measurements were performed by two experienced thoracic radiologists (each with >10 years of experience) who were blinded to the pathological results. Measurements were obtained in a mediastinal window (window width: 350–450 HU; window level: 40–60 HU) on the slice showing the largest tumor diameter. For ground-glass opacity (GGO) lesions, both mediastinal and lung windows were used, and the larger measurement was recorded. Discrepancies > 2 mm were resolved by consensus. Inter-reader agreement was excellent (ICC = 0.92; 95% CI, 0.89–0.94). The maximal total diameter was used for preoperative clinical staging according to the 8th edition of the TNM classification system.
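The text does not state which form of the inter-reader ICC was used. As a hedged illustration only, the sketch below computes the two-way random-effects, absolute-agreement, single-rater form (ICC(2,1) in the Shrout and Fleiss notation). Choosing this form is our assumption, and the function name `icc_2_1` and the example readings are hypothetical.

```python
import numpy as np


def icc_2_1(ratings) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an (n_tumors, k_readers) array of diameters in mm.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares for tumors (rows), readers (columns), and residual error.
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_total = np.sum((x - grand) ** 2)
    ss_error = ss_total - ss_rows - ss_cols
    # Mean squares.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # A systematic offset between readers (ms_cols) lowers absolute agreement.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical readings (mm): reader 2 measures every tumor 2 mm larger.
reader1 = [10, 15, 20, 25, 30]
reader2 = [12, 17, 22, 27, 32]
print(round(icc_2_1(np.column_stack([reader1, reader2])), 3))  # 0.969
```

Because this form penalizes a constant offset between readers, a fixed 2 mm disagreement keeps the ICC below 1. A consistency-type ICC would ignore that offset.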
2.3. Surgical Procedure
All procedures were performed using the da Vinci Robotic Surgical System (Intuitive Surgical, Sunnyvale, CA, USA). The choice of surgical approach was determined by the attending surgeon based on tumor size, location, patient cardiopulmonary reserve, and patient preferences. Patients with tumors > 20 mm or centrally located tumors were generally treated with lobectomy. Patients with clinical T1a–b tumors (≤20 mm) or clinical T1c tumors (>20 to ≤30 mm) with compromised lung function were treated with segmentectomy.
2.4. Pathological Evaluation
All surgical specimens were evaluated by experienced thoracic pathologists in accordance with the 8th edition of the TNM lung cancer classification system. Pathological tumor size was defined as the maximal total lesion diameter (total tumor size), measured on the resected specimen after routine fixation and sectioning. This definition was applied consistently to both lobectomy and segmentectomy specimens, including lesions with ground-glass components. Pathologists were blinded to the CT measurements.
2.5. Definition of Overtreatment
Patients were considered to have undergone size-threshold–discordant lobectomy (operationally termed “potential overtreatment”) if they underwent lobectomy despite having a maximal pathological total tumor diameter of ≤20 mm, under a size-only eligibility assumption. We acknowledge that lobectomy may still be clinically appropriate for some such tumors because operative strategy also depends on tumor location, multiplicity, anatomic constraints, margin feasibility, nodal assessment, and patient factors; these were not explicitly modeled in the present analysis.
For analyses evaluating CT decision thresholds, “CT-driven overtreatment” was further operationalized as lobectomy performed in patients with pathological total diameter ≤ 20 mm whose CT-measured diameter exceeded a given CT threshold T. “CT-driven undertreatment” was defined as sublobar resection in patients with pathological total diameter > 20 mm whose CT-measured diameter was ≤T. These terms are used strictly as decision-analytic labels relative to a size-only rule.
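The two decision-analytic labels above can be made concrete with a small classifier. This is a minimal sketch that assumes one record per patient holding the CT diameter, the pathological total diameter, and the procedure performed. The `Case` type and `classify` function are hypothetical names introduced here. The two "not CT-driven" labels are our own shorthand for the remaining discordant cells.

```python
from dataclasses import dataclass

PATH_CUTOFF_MM = 20.0  # size-only rule: pathological total diameter > 20 mm "warrants" lobectomy


@dataclass
class Case:
    ct_mm: float      # CT-measured maximal total diameter
    path_mm: float    # pathological maximal total diameter
    lobectomy: bool   # True = lobectomy, False = sublobar resection


def classify(case: Case, ct_threshold_mm: float) -> str:
    """Label one patient relative to a size-only rule at CT threshold T."""
    small = case.path_mm <= PATH_CUTOFF_MM
    if case.lobectomy and small:
        # Potential overtreatment; CT-driven only if CT exceeded T.
        if case.ct_mm > ct_threshold_mm:
            return "CT-driven overtreatment"
        return "potential overtreatment (not CT-driven)"
    if not case.lobectomy and not small:
        # Potential undertreatment; CT-driven only if CT was at or below T.
        if case.ct_mm <= ct_threshold_mm:
            return "CT-driven undertreatment"
        return "potential undertreatment (not CT-driven)"
    return "size-concordant"


# Hypothetical patient: CT 22 mm, pathology 18 mm, treated by lobectomy.
p = Case(ct_mm=22, path_mm=18, lobectomy=True)
print(classify(p, 20))  # CT-driven overtreatment
print(classify(p, 23))  # potential overtreatment (not CT-driven)
```

Tallying these labels across the cohort at T = 20 and T = 23 gives the counts that a threshold comparison would contrast.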
2.6. Statistical Analysis
Measurement bias was assessed using Bland–Altman methodology. The measurement difference was defined as Δ = CT diameter − pathological total diameter (mm), with positive values indicating CT overestimation. Mean bias, standard deviation, and 95% limits of agreement (LOA) were calculated per tumor size group (≤10 mm, 11–20 mm, 21–30 mm, >30 mm).
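A minimal sketch of the per-group Bland–Altman summary follows. It assumes the size groups are defined on pathological total diameter as half-open intervals, because the text does not say which diameter defines the strata or how non-integer values at the boundaries are handled. The limits of agreement use the conventional bias ± 1.96 SD. Function and variable names are illustrative.

```python
import numpy as np

# Assumed strata on pathological diameter, as half-open intervals (lo, hi] in mm.
SIZE_GROUPS = [("<=10 mm", 0.0, 10.0), ("11-20 mm", 10.0, 20.0),
               ("21-30 mm", 20.0, 30.0), (">30 mm", 30.0, np.inf)]


def bland_altman(ct_mm, path_mm) -> dict:
    """Mean bias, SD, and 95% limits of agreement for delta = CT - pathology."""
    delta = np.asarray(ct_mm, float) - np.asarray(path_mm, float)
    bias = delta.mean()
    sd = delta.std(ddof=1)  # sample standard deviation
    return {"n": delta.size, "bias": bias, "sd": sd,
            "loa_low": bias - 1.96 * sd, "loa_high": bias + 1.96 * sd}


def bland_altman_by_group(ct_mm, path_mm) -> dict:
    """Apply bland_altman within each assumed size stratum."""
    ct = np.asarray(ct_mm, float)
    path = np.asarray(path_mm, float)
    results = {}
    for name, lo, hi in SIZE_GROUPS:
        mask = (path > lo) & (path <= hi)
        if mask.sum() >= 2:  # an SD needs at least two paired measurements
            results[name] = bland_altman(ct[mask], path[mask])
    return results
```

A positive `bias` in a stratum corresponds to CT overestimation there, matching the sign convention Δ = CT − pathology.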
The optimal CT threshold was determined using a logistic regression model with pathological diameter > 20 mm as the binary outcome. In this model, CT diameter was represented by a restricted cubic spline (RCS) with four knots at the 5th, 35th, 65th, and 95th percentiles of the CT diameter distribution. Bootstrap resampling (B = 2000 replicates) was used to assess threshold stability under both the Youden index and net benefit maximization criteria.
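For readers reproducing the spline outside R, the sketch below builds a restricted cubic spline basis with knots at the 5th, 35th, 65th, and 95th percentiles. It follows Harrell's truncated-power formulation, in which the spline is linear beyond the outer knots. Scaling by the squared knot range is an assumption meant to mirror common practice. The published model was fitted with the rms package, and these hypothetical helpers are not its code. The resulting columns would enter an ordinary logistic regression for pathological diameter > 20 mm.

```python
import numpy as np


def harrell_knots(x, percentiles=(5, 35, 65, 95)):
    """Knot locations at the given percentiles of the CT diameter distribution."""
    return np.percentile(np.asarray(x, float), percentiles)


def rcs_basis(x, knots) -> np.ndarray:
    """Restricted cubic spline basis: x plus (k - 2) nonlinear columns.

    Each nonlinear column is cubic between knots and linear beyond the last knot.
    """
    x = np.asarray(x, float)
    t = np.sort(np.asarray(knots, float))
    k = t.size
    scale = (t[-1] - t[0]) ** 2  # assumed normalization by squared knot range

    def cube_pos(u):
        return np.clip(u, 0.0, None) ** 3  # truncated power (u)_+^3

    cols = [x]
    for j in range(k - 2):
        col = (cube_pos(x - t[j])
               - cube_pos(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
               + cube_pos(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2]))
        cols.append(col / scale)
    return np.column_stack(cols)


# Hypothetical CT diameters (mm); 4 knots give 1 linear + 2 spline columns.
ct = np.array([8, 12, 15, 18, 20, 22, 24, 27, 31, 40], float)
basis = rcs_basis(ct, harrell_knots(ct))
print(basis.shape)  # (10, 3)
```

The cubic and quadratic terms cancel beyond the last knot. This cancellation is the restriction that keeps tail behavior linear and stabilizes the fit where data are sparse.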
Decision curve analysis (DCA) compared four strategies: CT > 20 mm, CT > 23 mm, treat-all, and treat-none. Net benefit was calculated as NB = (TP/n) − (FP/n) × [Pt/(1 − Pt)]. Here, TP and FP are the numbers of true- and false-positive lobectomy recommendations, with pathological diameter > 20 mm as the reference. The cohort size is n, and Pt is the threshold probability. A 2 × 2 reclassification table was constructed comparing surgical decisions under the two thresholds.
To test formally for systematic CT–pathology differences, paired Wilcoxon signed-rank tests compared CT and pathological diameters within each size stratum, with Hodges–Lehmann estimates and 95% confidence intervals reported. Spearman’s rank correlation coefficient (ρ) between CT and pathological diameter was calculated overall and by stratum. A sensitivity analysis applied a 1 mm measurement tolerance, defining clinically meaningful discordance as |CT − pathological diameter| > 1 mm.
Internal validation of the RCS model was performed using bootstrap resampling (B = 200) with the rms package, yielding an apparent C-statistic of 0.881 and an optimism-corrected C-statistic of 0.880. Calibration was assessed with the Hosmer–Lemeshow test (χ² = 13.35, df = 8, p = 0.100). A multivariable logistic regression model for lobectomy versus sublobar resection was fitted with the following covariates: CT diameter, age, sex, body mass index, FEV1, smoking history, and modified Charlson Comorbidity Index. Its discrimination was assessed by the area under the receiver operating characteristic curve (AUC = 0.812). Statistical analyses were performed in R (version 4.3.2; R Foundation for Statistical Computing, Vienna, Austria) using the rms, dcurves, pROC, and ggplot2 packages. A two-tailed p-value < 0.05 was considered statistically significant.
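The net benefit formula can be checked on a toy example. In this sketch a patient is "treated" (recommended for lobectomy) when the CT diameter exceeds T, and a true positive is a treated patient with pathological diameter > 20 mm. Treat-none has a net benefit of 0 by definition. The function names and toy data are hypothetical, and this is not the dcurves implementation used in the study.

```python
import numpy as np


def net_benefit(ct_mm, path_gt20, ct_threshold_mm: float, pt: float) -> float:
    """NB = TP/n - FP/n * Pt/(1 - Pt) for the rule 'lobectomy if CT > T'."""
    ct = np.asarray(ct_mm, float)
    y = np.asarray(path_gt20, bool)
    n = y.size
    treat = ct > ct_threshold_mm
    tp = np.sum(treat & y)   # lobectomy recommended, pathology > 20 mm
    fp = np.sum(treat & ~y)  # lobectomy recommended, pathology <= 20 mm
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(path_gt20, pt: float) -> float:
    """Every patient receives lobectomy, so TP/n equals the prevalence."""
    prevalence = np.asarray(path_gt20, bool).mean()
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Toy cohort: CT diameters (mm) and whether pathology exceeded 20 mm.
ct = [25, 22, 18, 30]
large = [True, False, False, True]
for t in (20, 23):
    print(t, round(net_benefit(ct, large, t, pt=0.20), 4))
print("all", round(net_benefit_treat_all(large, pt=0.20), 4))
```

In this toy cohort, raising T from 20 to 23 mm removes the single false positive (CT 22 mm, pathology ≤ 20 mm) without losing a true positive, so net benefit rises from 0.4375 to 0.5. Treat-all gives 0.375.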
4. Discussion
This study evaluated CT–pathology size discordance in a contemporary cohort of 1096 patients undergoing thoracoscopic surgery, demonstrating three clinically consequential findings. First, CT measurement bias is size-dependent: CT systematically overestimates smaller (≤20 mm) tumors and underestimates larger (>30 mm) lesions. Second, under a 20 mm size-only decision rule, 15.8% of patients were potential overtreatment cases, and 3.4% were potential undertreatment cases. Third, shifting the CT threshold to 23 mm reduces CT-driven overtreatment by 51.4% at a 4.7:1 trade-off, with superior decision-analytic utility at threshold probabilities ≥ 0.17.
The crossover from overestimation in T1a tumors (+4.21 mm) to underestimation in ≥T2 tumors (−7.49 mm) supports the “crossover” phenomenon reported in prior imaging series, here quantified in a contemporary thoracoscopic surgery cohort [12,13]. Overestimation of small lesions is consistent with prior reports attributing inaccuracies to partial-volume averaging and difficulty delineating GGO–parenchyma interfaces [14,15]. Tissue processing contributes further: formalin fixation and post-resection deflation can induce 10–30% shrinkage [16,17], which is more pronounced in solid components than in lepidic growth patterns [18,19,20].
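A back-of-envelope calculation shows why specimen shrinkage matters at the 20 mm boundary. For illustration only, this sketch assumes that the reported 10–30% shrinkage applies uniformly to the maximal linear diameter. Real shrinkage is heterogeneous and component-dependent. Function names are hypothetical.

```python
def pathological_size(in_vivo_mm: float, shrinkage: float) -> float:
    """Specimen diameter after uniform fractional shrinkage (simplifying assumption)."""
    return in_vivo_mm * (1.0 - shrinkage)


def largest_in_vivo_at_or_below(cutoff_mm: float, shrinkage: float) -> float:
    """Largest in vivo diameter that still measures <= cutoff after shrinkage."""
    return cutoff_mm / (1.0 - shrinkage)


for s in (0.10, 0.20, 0.30):
    print(f"{s:.0%} shrinkage: up to {largest_in_vivo_at_or_below(20.0, s):.1f} mm in vivo "
          f"can measure <= 20 mm on the specimen")
```

Under this simplification, even 10% shrinkage lets a tumor of about 22 mm in vivo fall into the ≤20 mm pathological category. That margin is comparable in magnitude to the 3 mm threshold revision considered in this study.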
A 15.8% rate of potential overtreatment is clinically consequential. The revision from 20 mm to 23 mm is best justified by decision-analytic utility rather than accuracy alone, because bootstrap results under Youden optimization do not identify a single consistently dominant cut-off within 19–23 mm. A 3 mm adjustment is conceptually aligned with the landmark JCOG0802/WJOG4607L and CALGB 140503 trials. These trials demonstrated non-inferiority of segmentectomy and of sublobar resection, respectively, for tumors ≤ 20 mm [4,5]. Accumulating evidence also suggests the oncologic adequacy of sublobar resection for selected T1c tumors with sufficient margins [21,22]. In our cohort, CT > 23 mm improved specificity and positive predictive value (PPV) while preserving strong rule-out performance (high negative predictive value, NPV), primarily by reducing false-positive “lobectomy indications”.
The selection of 23 mm rather than 22 mm or 24 mm as the upper boundary of the decision revision zone rests on three converging lines of evidence. First, bootstrap-based net benefit analysis at Pt = 0.25 identifies 23 mm as both the median and the modal optimal cut-off (selected in 63.9% of replicates). At Pt = 0.20 the median optimal cut-off is 22 mm, although 23 mm remains the modal choice (38.0% of replicates). The 23 mm cut-off is thus the more robust choice across the plausible range of clinical decision weights. Second, 24 mm yields marginally higher specificity (91.8%) at the cost of lower sensitivity (65.4%), which may be clinically unacceptable in populations with a higher prevalence of pathological size > 20 mm. Third, the landmark JCOG0802/WJOG4607L and CALGB 140503 trials enrolled clinical T1a–b tumors (≤20 mm). A 23 mm cut-off therefore sits just above their enrollment boundary, and the 20–23 mm CT range becomes a zone of clinical uncertainty not fully addressed by those trials [4,5]. We acknowledge that 22 mm remains a statistically defensible alternative. The decision revision zone (20–23 mm) is therefore best understood as a range warranting heightened clinical scrutiny rather than a single mandatory decision cut-off.
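The bootstrap summaries above can be reproduced in outline: the median and modal optimal cut-off, and the share of replicates selecting it. This is a simplified sketch that assumes integer candidate cut-offs of 18–26 mm and net benefit maximization at a fixed Pt. The study's own procedure was run in R, and the names here (`bootstrap_optimal_cutoff`, `candidates`, `seed`) are hypothetical. Ties are broken toward the smaller cut-off, a choice the text does not specify.

```python
import numpy as np


def _net_benefit(ct, y, t, pt):
    """Net benefit of 'lobectomy if CT > t' on boolean outcome array y."""
    treat = ct > t
    return (np.sum(treat & y) - np.sum(treat & ~y) * pt / (1 - pt)) / y.size


def bootstrap_optimal_cutoff(ct_mm, path_gt20, pt, candidates=range(18, 27),
                             n_boot=2000, seed=0):
    """Resample patients with replacement and record the NB-maximizing cut-off."""
    ct = np.asarray(ct_mm, float)
    y = np.asarray(path_gt20, bool)
    cand = np.asarray(list(candidates), float)
    rng = np.random.default_rng(seed)
    picks = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, y.size, size=y.size)  # one bootstrap sample of patients
        scores = [_net_benefit(ct[idx], y[idx], t, pt) for t in cand]
        picks[b] = cand[int(np.argmax(scores))]  # argmax keeps the first (smallest) tie
    values, counts = np.unique(picks, return_counts=True)
    return {"median": float(np.median(picks)),
            "mode": float(values[np.argmax(counts)]),
            "mode_share": counts.max() / n_boot}
```

Calling it on cohort arrays at Pt = 0.20 and Pt = 0.25 gives the kind of summary quoted above. There, `mode_share` plays the role of "selected in X% of replicates".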
The apparent discrepancy between the Youden-optimal cut-off (median 20 mm; 95% CI, 19–23 mm) and the net benefit-optimal decision threshold (median 22–23 mm at Pt = 0.20–0.25) reflects a fundamental difference in optimization criteria rather than a methodological inconsistency. The Youden index weights sensitivity and specificity equally, treating the false-negative and false-positive rates as equally costly. Under this symmetric loss, the current 20 mm cut-off is near-optimal. However, in thoracic surgical decision-making, an unnecessary lobectomy carries substantially greater harm than a missed indication. The reasons are increased perioperative morbidity, long-term reduction in pulmonary reserve, and the absence of oncologic benefit for pathologically small tumors. Decision curve analysis incorporates this asymmetry explicitly through the threshold probability, Pt. At clinically plausible values of Pt ≥ 0.17, the CT > 23 mm rule yields higher net benefit, and bootstrap resampling supports the stability of this advantage (23 mm selected in 63.9% of replicates at Pt = 0.25). We therefore propose 23 mm not as the accuracy-optimal cut-off, but as the decision-analytic-optimal operating point under realistic clinical preferences in which overtreatment is weighted more heavily than undertreatment.
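The asymmetry argument can be made explicit by comparing the two rules directly with the net benefit formula from the Methods. Let ΔFP be the number of unnecessary lobectomies avoided and ΔTP the number of large tumors reclassified to sublobar resection when T moves from 20 to 23 mm. Both are non-negative, and the symbols are our own notation. The derivation shows that CT > 23 mm wins exactly when Pt exceeds ΔTP/(ΔTP + ΔFP). Suppose the 4.7:1 trade-off in the summary of findings is read as ΔFP : ΔTP = 4.7 : 1, which is our interpretation rather than a stated definition. The crossover is then 1/5.7 ≈ 0.175, close to the reported threshold of 0.17.

```latex
\begin{aligned}
\mathrm{NB}(T) &= \frac{\mathrm{TP}_T}{n} - \frac{\mathrm{FP}_T}{n}\cdot\frac{P_t}{1-P_t},\\
\mathrm{NB}(23) - \mathrm{NB}(20) &= \frac{1}{n}\left(\Delta\mathrm{FP}\cdot\frac{P_t}{1-P_t} - \Delta\mathrm{TP}\right),
\qquad \Delta\mathrm{TP} = \mathrm{TP}_{20}-\mathrm{TP}_{23},\quad \Delta\mathrm{FP} = \mathrm{FP}_{20}-\mathrm{FP}_{23},\\
\mathrm{NB}(23) > \mathrm{NB}(20) &\iff \frac{P_t}{1-P_t} > \frac{\Delta\mathrm{TP}}{\Delta\mathrm{FP}}
\iff P_t > \frac{\Delta\mathrm{TP}}{\Delta\mathrm{TP}+\Delta\mathrm{FP}} = \frac{1}{1+R},
\qquad R = \frac{\Delta\mathrm{FP}}{\Delta\mathrm{TP}},\\
R = 4.7 &\;\Rightarrow\; P_t^{*} = \frac{1}{5.7} \approx 0.175.
\end{aligned}
```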
Clinically, unnecessary lobectomy for small tumors exposes patients to short-term risks without clear oncologic benefit [23,24]. Preserving lung parenchyma is increasingly important for postoperative quality of life and physiologic reserve, particularly in older patients at risk for metachronous lung cancer [25]. These considerations underscore why a modest upward revision of the CT threshold, applied specifically to borderline measurements, may improve the balance between benefit and harm in real-world surgical decision-making.
Clinical Implications
These findings support a refined, precision-oriented surgical decision-making model applicable to contemporary minimally invasive thoracic surgery. For tumors measuring 20–23 mm on CT (the “decision revision zone”), several strategies may reduce CT-driven overtreatment: (1) multidisciplinary tumor board discussion with explicit attention to CT overestimation in borderline-size tumors and to the interpretive nuances of GGO; (2) anatomical segmentectomy as the default approach, paired with intraoperative frozen section to confirm margin adequacy and nodal status, with conversion to lobectomy reserved for inadequate margins or occult nodal involvement; and (3) enhanced preoperative risk stratification using solid-component measurement and radiomic signatures to improve staging reliability. Our DCA provides a practical anchor for shared decision-making. The threshold probability (Pt) represents the minimum probability of pathological size > 20 mm at which a clinician would recommend lobectomy. A Pt of 0.17 implies willingness to perform up to approximately five unnecessary lobectomies per correctly identified large tumor (odds of 0.17:0.83, or about 1:4.9). A Pt of 0.20 implies accepting up to four, and a Pt of 0.25 implies accepting no more than three. When Pt ≥ 0.17, CT > 23 mm yields higher net benefit; when Pt < 0.17, CT > 20 mm may be preferred. For patients who strongly value lung parenchyma preservation, a higher implicit Pt supports use of the 23 mm threshold.
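The conversion from threshold probability to an exchange rate is simple odds arithmetic. At threshold Pt, one correctly identified large tumor is weighed against (1 − Pt)/Pt unnecessary lobectomies. The hypothetical helper below computes this exchange rate for the three thresholds discussed above. At Pt = 0.17 it gives about 4.9 unnecessary lobectomies per correctly identified large tumor.

```python
def unnecessary_per_correct(pt: float) -> float:
    """Unnecessary lobectomies accepted per correctly identified large tumor.

    Net benefit weights each false positive by Pt/(1 - Pt) relative to a true
    positive, so the break-even exchange rate is the reciprocal, (1 - Pt)/Pt.
    """
    if not 0.0 < pt < 1.0:
        raise ValueError("pt must lie strictly between 0 and 1")
    return (1.0 - pt) / pt


for pt in (0.17, 0.20, 0.25):
    print(pt, round(unnecessary_per_correct(pt), 2))
```

Higher Pt values mean fewer unnecessary lobectomies are tolerated per correct one. This is why patients who prioritize parenchymal preservation map to higher Pt.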
5. Limitations
This study has several limitations. First, the retrospective, single-center design limits generalizability; external validation in multicenter cohorts is required, given inter-institutional variability in CT acquisition and reconstruction protocols [26]. Second, 7.5% of patients were excluded because of unavailable or inadequate CT imaging, potentially introducing selection bias. Third, CT–pathology discordance was not stratified by radiologic phenotype (pure GGO, part-solid, or solid nodules) or GGO proportion, nor were radiomic features incorporated. Fourth, although inter-reader reliability was excellent (ICC = 0.92), automated AI-based segmentation tools were not evaluated. Fifth, operative strategy is determined not by size alone but also by tumor location, multiplicity, anatomic constraints, margin feasibility, and nodal assessment. Because these determinants were not explicitly modeled, clinical translation may differ across practice settings. Sixth, long-term oncologic endpoints (overall survival and disease-free survival) were not assessed, so the oncologic safety of shifting from 20 mm to 23 mm cannot be inferred from our data.
The aim of this study was not to revise existing guideline recommendations but to identify, from real-world data, a statistically supported candidate CT threshold (23 mm) for subsequent independent validation. We therefore suggest heightened caution when the CT-measured diameter falls within a borderline range of approximately 18–25 mm. This range brackets the 20–23 mm decision revision zone, and within it reliance on a single linear measurement may be suboptimal. In such borderline cases, multimodal imaging assessment, multidisciplinary review, and individualized patient and tumor factors should be incorporated into decision-making.