Abstract
Because skeletal development is largely complete by adulthood, traditional methods struggle to capture subtle age-related structural changes in bones and surrounding tissues. Recent advances in deep learning have shown considerable potential for age estimation from medical images, and the cervical vertebrae, as captured in lateral cephalometric radiographs (LCR), have proven particularly valuable for this task. To systematically investigate how different vertebral representations contribute to age estimation, we developed four input modes: (1) Contour (C); (2) Mask (M); (3) Cervical Vertebrae (CV); and (4) Cervical Vertebrae Region (SR). Using a large-scale LCR dataset of 20,174 subjects aged 4–40 years, grouped into 5-year intervals, we evaluated these modes with deep learning models, measuring performance by mean absolute error (MAE). The SR mode achieved the lowest overall MAE, particularly for the C1–C4 combination, followed by CV; the C and M modes performed similarly to each other and worse than both. For subjects younger than 25 years, MAEs for individual vertebrae (C1–2, C3, C4) were below 5 years in all modes. In the 26–40 years group, however, MAEs for the C and M modes exceeded 10 years, whereas those for the CV and SR modes remained below 10 years for most combinations. Combining vertebrae consistently improved accuracy over individual vertebrae, with continuous combinations (e.g., C1–2 + C3) outperforming discontinuous ones (e.g., C1–2 + C4). Visualization of age-related salience showed that salient regions varied by input mode and expanded as the information content of the input increased. These findings underscore the importance of incorporating peripheral soft tissue and comprehensive vertebral context for accurate age estimation across a wide age range.
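
The per-age-group MAE evaluation described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names, the bin start of 4 years, and the exact bin boundaries are assumptions, since the abstract states only that subjects aged 4–40 years were grouped into 5-year intervals.

```python
from collections import defaultdict


def mae(true_ages, predicted_ages):
    """Mean absolute error, in years, between paired lists of ages."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


def mae_by_age_group(true_ages, predicted_ages, start=4, width=5):
    """Bin subjects by true age into fixed-width intervals and compute MAE per bin.

    `start` and `width` are illustrative assumptions; the source does not
    give the exact boundaries of its 5-year intervals.
    Returns a dict mapping (low, high) inclusive age bounds to the bin's MAE.
    """
    groups = defaultdict(lambda: ([], []))
    for t, p in zip(true_ages, predicted_ages):
        # Lower bound of the bin that contains the true age.
        low = start + ((int(t) - start) // width) * width
        trues, preds = groups[(low, low + width - 1)]
        trues.append(t)
        preds.append(p)
    return {label: mae(ts, ps) for label, (ts, ps) in sorted(groups.items())}


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the study.
    true_ages = [5, 7, 12, 30]
    predicted_ages = [6, 9, 10, 35]
    print(mae(true_ages, predicted_ages))
    print(mae_by_age_group(true_ages, predicted_ages))
```

Reporting MAE per age bin alongside the overall figure is what allows comparisons such as the one between younger subjects and the 26–40 years group, which a single pooled MAE would hide.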