The Reliability of Two- and Three-Dimensional Cephalometric Measurements: A CBCT Study

Cephalometry is a standard diagnostic tool in orthodontic and orthognathic surgery fields. However, built-in magnification from the cephalometric machine produces double images from left- and right-side craniofacial structures on the film, which poses difficulty for accurate cephalometric tracing and measurements. The cone-beam computed tomography (CBCT) images not only allow three-dimensional (3D) analysis, but also enable the extraction of two-dimensional (2D) images without magnification. To evaluate the most reliable cephalometric analysis method, we extracted 2D lateral cephalometric images with and without magnification from twenty full-cranium CBCT datasets; images were extracted with magnification to mimic traditional lateral cephalograms. Cephalometric tracings were performed on the two types of extracted 2D lateral cephalograms and on the reconstructed 3D full cranium images by two examiners. The intra- and inter-examiner intraclass correlation coefficients (ICC) were compared between linear and angular parameters, as well as between CBCT datasets of adults and children. Our results showed that overall, tracing on 2D cephalometric images without magnification increased intra- and inter-examiner reliability, while 3D tracing reduced inter-examiner reliability. Angular parameters and children’s images had the lowest inter- and intra-examiner ICCs compared with adult samples and linear parameters. In summary, using lateral cephalograms extracted from CBCT without magnification for tracing/analysis increased reliability. Special attention is needed when analyzing young patients’ images and measuring angular parameters.


Introduction
Dr. Broadbent first established reproducible head positioning with the cephalostat in 1930 [1], which set a precedent for using lateral cephalometric radiographs in orthodontics. Cephalometric analysis started to evolve when Dr. Brodie used cephalometric X-rays to investigate craniofacial growth factors affecting orthodontic treatment in 1941 [2] and Dr. Margolis evaluated the relationship between incisor inclinations and various craniofacial factors in 1943 [3]. In 1948, Dr. Downs developed the first cephalometric analysis method [4]. Since then, multiple methods have been established, and lateral cephalometric [...] CBCTs, and setting up a new 3D cephalometric tracing and analysis system [19]. There is no doubt that 3D cephalometric tracing and analysis can provide more information for comprehensive clinical evaluation and diagnosis, but an inevitable challenge will be comparing the results of new studies with the well-validated and accepted knowledge in the orthodontic field.
CBCT not only allows for 3D evaluation of craniofacial structures, but also enables the generation of 2D images with or without magnification. Although great efforts have gone into the comparison of conventional 2D and 3D tracings, no evaluation has been done using 2D lateral cephalometric images extracted from CBCTs without magnification. Additionally, whether patients' maturation stages affect tracing reliability has not been assessed. Thus, in the current study, we compared the intra- and inter-examiner reliability of three methods: 2D cephalometric analysis conducted with CBCT-extracted lateral cephalometric images with magnification, which mimic traditional lateral cephalograms taken with proper head position; 2D cephalometric analysis conducted with CBCT-extracted lateral cephalometric images without magnification; and 3D cephalometric analysis conducted with CBCT-reconstructed images. The intra- and inter-examiner correlations were calculated and compared between linear and angular parameters, as well as between adults' and children's CBCT datasets. Overall, the current study aims to provide more information about the factors influencing tracing reliability, which could potentially improve clinical data analysis and comparison.

Materials and Methods
CBCT scans for this study were derived from a pre-existing clinical database of preorthodontic treatment records, and the study protocol was approved by the institutional review board (protocol #848424). No additional radiographic images were taken for study purposes. Images were taken with a voxel size of 0.400 × 0.400 × 0.400 mm³. CBCTs from 20 patients without craniofacial syndromes or large facial asymmetry were included. Among these 20 patients, 10 were adults with permanent dentition (five females [mean age 22.4 years, range 19.3-25.3 years], five males [mean age 22.4 years, range 19.6-25.7 years]), and 10 were children with early mixed dentition (five females [mean age 8.8 years, range 8.2-9.2 years], five males [mean age 8.7 years, range 8.1-9.2 years]) (Figure 1).
Figure 1. The age information of the samples included in the current study. Data are presented as raw data overlaid with mean ± standard deviation (SD). A two-tailed t-test was used for statistical analysis. No statistically significant difference was detected between genders within each age group.
All CBCT DICOM files were imported into Dolphin 3D software (Dolphin Imaging; version 11.95 Premium, Chatsworth, CA, USA) and oriented using the Frankfort plane as the horizontal plane (Figure 2A). The orientation was adjusted axially so that the lateral borders of the orbits overlapped each other in a lateral view (Figure 2A), and coronally so that the inferior borders of both orbits sat on the same plane in a frontal view (Figure 2B). Using the "Build X-Rays Tool" in Dolphin 3D, lateral cephalometric X-rays were created using both sides of the volume. Under the "X-ray Building preferences" option, "Perspective" was selected to create an X-ray with the measured distortion and warping effects of a traditional X-ray (Figure 2C,E), and "Orthogonal" was selected to create a non-distorted X-ray (Figure 2D,F). According to the "Dolphin Imaging User's Guide (Version 11.95)", the settings for "Perspective" were set as follows to match the Bolton-Broadbent dimensions:
• Fictitious Magnification Factor: 9.7%;
• Emitter to Patient's Mid Plane (distance in millimeters from the emitter to the patient's mid-plane to use when generating X-rays using perspective projection): 1550 mm;
• Mid-Plane to Film (distance in millimeters from the patient's mid-plane to the film to use when generating X-rays using perspective projection): 150 mm.
Figure 2. [...] Under the "X-ray Building preferences" option, "Perspective" was selected to create an X-ray with the measured distortion and warping effects mimicking a traditional X-ray (C), and "Orthogonal" was selected to create a non-distorted X-ray (D). The red arrows represent the direction of the X-ray beams simulated by computer calculation. (E,F) The generated lateral cephalometric X-rays with (E) or without (F) magnification and distortion. Note the two mandibular posterior borders visible in (E), but not in (F). (G,H) Digital cephalometric tracing and analysis were performed with the extracted 2D lateral cephalometric images. (I) Tracings with 3D reconstructed images were performed by using the "Digitize/Measurement" function in Dolphin 3D.
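The "Perspective" settings listed above can be related to one another with the standard point-source projection formula (magnification = source-to-film distance / source-to-object distance). The short sketch below is only an illustration of that geometry, not a description of how Dolphin 3D renders images; the function name and the 50 mm lateral offset are hypothetical values chosen for the example.

```python
# Point-source (perspective) projection geometry, using the two distances
# quoted in the "Perspective" settings. The 50 mm lateral offset below is a
# hypothetical value used only to illustrate why bilateral structures can
# appear as double contours on a magnified lateral cephalogram.

SOURCE_TO_MIDPLANE_MM = 1550.0  # emitter to patient's mid-plane
MIDPLANE_TO_FILM_MM = 150.0     # patient's mid-plane to film


def magnification(offset_toward_film_mm: float = 0.0) -> float:
    """Geometric magnification of a structure located `offset_toward_film_mm`
    away from the mid-sagittal plane (positive = closer to the film)."""
    source_to_film = SOURCE_TO_MIDPLANE_MM + MIDPLANE_TO_FILM_MM
    source_to_object = SOURCE_TO_MIDPLANE_MM + offset_toward_film_mm
    return source_to_film / source_to_object


midplane = magnification(0.0)       # structure on the mid-sagittal plane
film_side = magnification(50.0)     # hypothetical structure 50 mm nearer the film
source_side = magnification(-50.0)  # its contralateral counterpart

for label, m in [("mid-plane", midplane),
                 ("film side", film_side),
                 ("source side", source_side)]:
    print(f"{label:12s} enlargement: {100 * (m - 1):.2f}%")
```

With the quoted distances, the mid-plane enlargement works out to 1700/1550, which matches the 9.7% magnification factor, while the film-side and source-side structures are enlarged by different amounts. Under an orthogonal (parallel-beam) projection this ratio is 1 everywhere, which is consistent with the absence of double mandibular borders in Figure 2F.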
Image extraction and tracing were performed by two examiners, both American Board of Orthodontics certified clinicians and full-time educators in an academic institution. Calibration between the two examiners was performed twice before formal analysis. The two examiners performed cephalometric analysis separately, and each image was analyzed twice by each examiner at least one month apart.
The values from the two sets of cephalometric analyses were used to calculate the intra-examiner intraclass correlation coefficient (ICC), and the average value of the two sets of cephalometric analyses from each examiner was used to calculate the inter-examiner ICC. The intra- and inter-examiner ICCs of each parameter were calculated using IBM SPSS software (Statistical Package for Social Sciences version 26.0, Chicago, IL, USA), and compared between linear and angular parameters, as well as between adults' and children's CBCT datasets. A Shapiro-Wilk normality test was performed with OriginPro 8 (Origin Lab Corp., Northampton, MA, USA). A two-tailed t-test was used for statistical analysis of the patients' age comparison. Since some of the ICC data did not pass the normality test, all ICC data are presented as violin plots. A Wilcoxon matched-pairs signed-rank test was used for statistical analysis of the overall comparison of intra- and inter-examiner ICCs, and for the comparison within each type of parameter. A Mann-Whitney U test was used for the comparison between the two types (linear and angular) of parameters. For all data presented in this manuscript, p < 0.05 (*) was considered a suggestive difference, while p < 0.005 (**) was recognized as a statistically significant difference based on a recent recommendation [33].
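To make the ICC calculation described above more concrete, the following is a minimal sketch of one common form, ICC(2,1) (two-way random effects, absolute agreement, single measurement), computed from a two-way ANOVA decomposition. The text does not state which ICC model was selected in SPSS, so the choice of ICC(2,1) is an assumption, and the function name and the sample measurements are hypothetical.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per image and one column per
    tracing session (intra-examiner) or per examiner (inter-examiner).
    Assumed model: two-way random effects, absolute agreement."""
    n = len(ratings)        # number of images (subjects)
    k = len(ratings[0])     # number of sessions or examiners
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-image mean square
    ms_cols = ss_cols / (k - 1)                 # between-session mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical example: one angular parameter (degrees) traced twice by the
# same examiner on five images, i.e. an intra-examiner comparison.
two_sessions = [
    [82.1, 82.4],
    [79.5, 79.0],
    [85.2, 85.6],
    [80.3, 80.1],
    [83.7, 84.0],
]
print(f"intra-examiner ICC(2,1): {icc_2_1(two_sessions):.3f}")
```

For the inter-examiner case described above, each column would instead hold one examiner's average of the two tracing sessions for that image.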

Comparisons of the Intra-and Inter-Examiner Reliability among the Tracings of Different Images
We first compared the intra- and inter-examiner ICCs for all measurement parameters with the 20 samples together. The intra-examiner ICCs for each parameter are listed in Table 1, and the inter-examiner ICCs are listed in Table 2.
For examiner 1 (Figure 3A), the median of the ICCs in the 2D tracings of extracted lateral cephalometric images with magnification was 0.968. Removing magnification statistically significantly increased the median of the ICCs to 0.978 (p < 0.001). For the 3D tracings of examiner 1, the median of the ICCs was 0.971, which was statistically significantly higher than that of 2D tracings with magnification (p = 0.002), but not different from that of 2D tracings without magnification.
For examiner 2 (Figure 3A), the median of the ICCs in the 2D tracings with magnification was 0.872, and the median of the ICCs in the 2D tracings without magnification was 0.854. No statistical significance was detected when comparing the intra-examiner ICCs of examiner 2 for the two types of 2D tracings. For the 3D tracings of examiner 2, the median of the ICCs was 0.850. There was no statistical significance between the intra-examiner ICCs of examiner 2 in the 2D tracings without magnification and the 3D tracings, but the 3D tracings had suggestively lower intra-examiner ICCs than the 2D tracings with magnification (p = 0.0461).
We then evaluated the inter-examiner reliability for each type of cephalometric analysis method ( Figure 3B). In the 2D tracings with magnification, the median of the ICCs was 0.824. In the 2D tracings without magnification, the median of the inter-examiner ICCs was 0.903. In the 3D tracings, the median of the inter-examiner ICCs was 0.780. Comparison among the three types of cephalometric analysis methods showed that 2D tracings without magnification had the highest inter-examiner reliability (p < 0.0001), in the range of excellent [34]. Both 2D tracings with magnification and 3D tracings had good inter-examiner ICCs [34], while 2D tracings with magnification suggested a higher inter-examiner ICC than that of 3D tracings (p = 0.0066).
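The wording "suggestive" versus "statistically significant" in these comparisons follows the two-tier threshold stated in the Methods (p < 0.05 and p < 0.005, respectively). As a reading aid only, the sketch below applies that rule to three of the exact p-values reported in this section; the function name `label_p` is hypothetical.

```python
def label_p(p: float) -> str:
    """Label a p-value using the manuscript's two-tier convention:
    p < 0.005 is statistically significant (**), p < 0.05 is suggestive (*)."""
    if p < 0.005:
        return "statistically significant (**)"
    if p < 0.05:
        return "suggestive (*)"
    return "not significant"


# Exact p-values quoted in this section of the Results.
reported = {
    "examiner 1, 3D vs 2D with magnification": 0.002,
    "examiner 2, 3D vs 2D with magnification": 0.0461,
    "inter-examiner, 2D with magnification vs 3D": 0.0066,
}

for comparison, p in reported.items():
    print(f"{comparison}: p = {p} -> {label_p(p)}")
```

Note that a value such as p = 0.0066 would count as significant under the conventional 0.05 cutoff, but is only suggestive under the stricter threshold adopted here.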

Comparison between Linear and Angular Parameters
Previous studies showed that different cephalometric tracing methods may affect the measured results of linear and angular parameters differently [18,35,36]. Thus, in the current study, we compared the intra- and inter-examiner reliabilities of linear and angular parameters for all three types of tracing methods.
For examiner 1 ( Figure 4A), the intra-examiner ICC of linear parameters was suggestively higher than that of angular parameters (p = 0.0052) in the 2D tracings of extracted lateral cephalometric images with magnification. Tracing on 2D images without magnification could improve the intra-examiner ICCs for both linear (p = 0.0210) and angular parameters (p = 0.0017), while linear parameters still had higher intra-examiner ICCs than angular parameters (p = 0.0146). 3D tracings did not alter the intra-examiner ICCs of the linear parameters when compared with the two types of 2D tracings, but significantly increased the intra-examiner ICCs of the angular parameters when compared with the 2D tracings with magnification (p = 0.0029). However, there was no statistical significance when comparing angular and linear parameters in 3D tracings.
For examiner 2 (Figure 4B), there was no statistical significance between linear and angular parameters. Changing tracing methods did not alter the intra-examiner ICCs for either linear or angular parameters.
Moving to the inter-examiner reliability (Figure 4C), in the 2D tracings with magnification, there was no statistical significance when comparing the inter-examiner ICCs of linear and angular parameters. In the 2D tracings without magnification, linear parameters had suggestively higher inter-examiner ICCs than angular parameters (p = 0.0240). 2D tracings without magnification had statistically significantly higher inter-examiner ICCs for both linear and angular parameters than 2D tracings with magnification (p < 0.0001). 3D tracings influenced the inter-examiner reliability differently: compared to 2D tracings with magnification, they decreased the inter-examiner ICCs of the linear parameters (p = 0.0017), but did not affect those of the angular parameters. 3D tracings had statistically significantly lower inter-examiner ICCs for both linear and angular parameters than 2D tracings without magnification (p < 0.0001).
Figure 4. [...] The inter-examiner ICC of the two examiners was calculated by comparing the mean value of the two sets of cephalometric tracing measurements from each examiner. All data are presented with violin plots. The solid black line in each violin plot indicates the median, and the colored dotted lines indicate the quartiles. A Wilcoxon matched-pairs signed-rank test was used for the comparison within each type of parameter, and a Mann-Whitney U test was used for the comparison between the two types of parameters. *: p < 0.05; **: p < 0.005.

Comparison between the Images from Adult Patients and the Images from Children Patients
In the current study, we included 10 young patients in early mixed dentition to represent the patient population seeking early orthodontic intervention, and 10 adult patients in permanent dentition to represent patients needing comprehensive orthodontic treatment. The intra-examiner ICCs for each parameter in the different age groups are listed in Supplemental Tables S1 and S2, and the inter-examiner ICCs are listed in Supplemental Table S3.
For examiner 1 (Figure 5A), in the 2D tracings of extracted lateral cephalometric images with magnification, the intra-examiner ICC for images from adult patients was higher than that for images from children (p = 0.0059). Tracing 2D images without magnification improved the intra-examiner ICCs for both adults (p = 0.0247) and children (p = 0.0002), while tracings from adult patients still had higher intra-examiner ICCs than those from children (p = 0.0015). 3D tracings further improved the intra-examiner ICCs for images from children, but not for images from adults. There was no statistical significance when comparing the intra-examiner ICCs of 3D tracings between images from adults and children.
For examiner 2 (Figure 5B), there was no statistical significance when comparing the intra-examiner ICCs of 2D tracings between images from adults and children. Interestingly, 3D tracings increased the intra-examiner ICCs of images from adults, but decreased those of images from children.
When looking at the inter-examiner reliability (Figure 5C), there was no statistical significance when comparing the inter-examiner ICCs of 2D tracings between images from adults and children. In both age groups, 2D tracings without magnification had higher ICCs than 2D tracings with magnification. 3D tracings had a higher ICC than 2D tracings with magnification in the adult group (p = 0.0035), but a lower ICC than 2D tracings with magnification in the children group (p = 0.0348).

Combined Effects of Parameter Types and Patients' Age on Tracing Reliability
Since both parameter types and patients' age affected the reliability of cephalometric analysis as mentioned above, we performed a more detailed comparison with the consideration of both factors in each type of tracing method.
For examiner 1 (Figure 6A), in the 2D tracings of extracted lateral cephalometric images with magnification, there was no statistical significance between linear and angular parameters of images from adult patients. However, statistical significance was detected between linear and angular parameters of images from children (p = 0.0024). In addition, angular parameters of images from children had suggestively lower intra-examiner ICCs than those from adult patients (p = 0.0383). Thus, angular parameters of images from children had the lowest intra-examiner ICCs in the 2D tracings with magnification. The same trends were also observed in the 2D tracings without magnification. In the 3D tracings, no statistical significance was detected among groups. Across the different types of tracings for examiner 1, angular parameters of images from children in 2D tracings with magnification had the lowest intra-examiner ICCs.

Figure 6. Comparison of intra-examiner intraclass correlation coefficient (ICC) of different measurement parameters for the radiographic images from different patient age groups. (A,B) The intra-examiner ICCs of examiner 1 (A) and examiner 2 (B) were calculated for each examiner based on two sets of cephalometric tracing measurements performed with at least a one-month interval. All data are presented with violin plots. The solid black line in each violin plot indicates the median, and the colored dotted lines indicate the quartiles. A Wilcoxon matched-pairs signed-rank test was used for the comparison within each type of parameter, and a Mann-Whitney U test was used for the comparison between the two types of parameters. *: p < 0.05; **: p < 0.005.
Similarly, for examiner 2 ( Figure 6B), in the 2D tracings with magnification, there was no statistical significance between linear and angular parameters of images from adult patients, and statistical significance was detected between linear and angular parameters of images from children patients (p = 0.0118). In the 2D tracings without magnification, examiner 2 also had the lowest intra-examiner ICC with angular parameters of images from children patients. In the 3D tracings, both linear and angular parameters of images from children patients had lower intra-examiner ICCs than did images from adult patients.
For inter-examiner reliabilities (Figure 7), linear parameters for the images from children had suggestively higher inter-examiner ICCs than those for the images from adult patients (p = 0.0445), while no statistical significance was detected for the other comparisons in the 2D tracings with magnification. 2D tracings without magnification had higher inter-examiner ICCs than 2D tracings with magnification in linear parameters of images from both adults and children, and in angular parameters of images from adult patients, but not in angular parameters of images from children. In fact, angular parameters of images from children had the lowest inter-examiner ICCs when compared to the other groups in the 2D tracings without magnification. 3D tracing could only increase the inter-examiner ICCs of linear parameters of images from adult patients, and these ICCs were significantly higher than those of linear parameters of images from children and of angular parameters of images from both age groups in 3D tracings.
Figure 7. Comparison of inter-examiner intraclass correlation coefficient (ICC) of different measurement parameters for the radiographic images from different patient age groups. The inter-examiner ICC of the two examiners was calculated by comparing the mean value of the two sets of cephalometric tracing measurements from each examiner. All data are presented with violin plots. The solid black line in each violin plot indicates the median, and the colored dotted lines indicate the quartiles. A Wilcoxon matched-pairs signed-rank test was used for the comparison within each type of parameter, and a Mann-Whitney U test was used for the comparison between the two types of parameters. *: p < 0.05; **: p < 0.005.

Discussion
In this study, we compared three types of digital tracings and evaluated the influence of parameter types and patients' age on the intra- and inter-examiner reliabilities.
When comparing the intra-examiner ICCs for each type of cephalometric analysis between the two examiners, examiner 1 had statistically significantly higher ICCs than examiner 2 (p < 0.0001, Figure 3A), which may be because examiner 1 has more imaging analysis experience. However, both examiners had ICC medians for all three types of cephalometric analysis methods higher than 0.75, indicating that all methods had good (between 0.75 and 0.9) to excellent (greater than 0.9) intra-examiner reliability [34]. The ICCs are similar to or higher than those reported in previous publications by other groups [37][38][39][40], demonstrating consistency and accuracy of the present study.
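The reliability bands quoted above (good: between 0.75 and 0.9; excellent: greater than 0.9) can be applied directly to the intra-examiner median ICCs reported in the Results. The sketch below does only that; the function name is hypothetical, and because the text does not name the bands below 0.75, they are labeled generically here.

```python
def reliability_band(icc: float) -> str:
    """Classify an ICC using only the two bands named in the text.
    Values below 0.75 are not subdivided because the text does not
    name those bands."""
    if icc > 0.9:
        return "excellent"
    if icc >= 0.75:
        return "good"
    return "below good"


# Intra-examiner median ICCs as reported in the Results (Figure 3A).
median_iccs = {
    ("examiner 1", "2D with magnification"): 0.968,
    ("examiner 1", "2D without magnification"): 0.978,
    ("examiner 1", "3D"): 0.971,
    ("examiner 2", "2D with magnification"): 0.872,
    ("examiner 2", "2D without magnification"): 0.854,
    ("examiner 2", "3D"): 0.850,
}

for (examiner, method), icc in median_iccs.items():
    print(f"{examiner}, {method}: {icc:.3f} -> {reliability_band(icc)}")
```

Treating a value of exactly 0.9 as "good" is a choice made for this sketch, following the phrase "greater than 0.9" for the excellent band.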
As expected, reducing double images on lateral cephalometric X-rays by eliminating magnification and distortion significantly improved the inter-examiner reliability from good (0.8240) to excellent (0.9030) (Figure 3B). Interestingly, 3D tracing had the lowest inter-examiner ICC among all three types of tracing methods, even though the median ICC was still in the range of good (0.7800) (Figure 3B). Both examiners experienced difficulty identifying certain landmarks during 3D tracings, such as orbitale, porion, DC point, and PT point. These landmarks are formed by overlapping craniofacial structures from different sagittal layers in 2D images. In addition, both examiners had low confidence identifying incisor root tips during 3D tracings. These difficulties are consistent with those encountered in a previous study that compared landmark identification errors on cone-beam computed tomography and conventional digital cephalograms. Its authors found that gonion, condylion, and porion were located on flat or curved surfaces and thus difficult to precisely reference/define on CBCT images [27]. Additionally, certain locations with lower densities, such as the mandibular incisor apex, had high measurement errors because they could not be visualized with 3D reconstruction [27]. The current study further supports the idea that traditional 2D landmarks do not completely map to 3D tracings, and new landmarks need to be identified in three axial planes to establish a more reliable 3D tracing system [19,41].
When looking at each parameter in detail, cranial deflection angle consistently had low intra- and inter-examiner ICCs in 2D tracings. Thus, the current study suggests that caution is needed when interpreting this measurement clinically.
Both linear and angular parameters have been evaluated by comparing 2D and 3D tracings, with more focus on angular parameters. However, the conclusion on whether angular parameters have similar ICCs in 2D and 3D tracing is controversial [37,42,43]. In the current study, no dramatic difference was found in the intra- or inter-examiner ICCs between linear and angular parameters. Using 2D tracings of extracted lateral cephalometric images without magnification could improve the intra- and inter-examiner ICCs for both linear and angular parameters when compared to 2D tracings with magnification. 3D tracings had low inter-examiner ICCs for linear and angular parameters relative to 2D tracings, but there was no difference between linear and angular parameters within 3D tracings (Figure 4). Thus, linear parameters are more reliable than angular parameters in 2D tracings.
Unerupted permanent teeth overlap with maxillary and mandibular alveolar bone, which increases difficulty when tracing lateral cephalometric X-rays of children with mixed dentition. The lower bone density of children compared to adults may also add difficulty to landmark identification during the cephalometric analysis of young patients. Thus, in the current study, we evaluated whether there is any difference in the intra- and inter-examiner reliabilities of cephalometric analysis on X-rays from patients of different ages and dentition types. To the best of our knowledge, this is the first study making this comparison. For intra-examiner reliability, a small but statistically significant difference was found between the adult (ICC median 0.9690) and children groups (ICC median 0.9400) in the 2D tracings with magnification in one examiner (Figure 5A). Removing the magnification and distortion from the 2D X-rays could improve both intra- and inter-examiner reliability of the cephalometric analysis (Figure 5), but the adult group still had higher intra-examiner ICC than the children group with the same examiner (Figure 5A). For the 3D tracing, opposite trends were observed with different age groups: compared to the 2D tracings with magnification, 3D tracing could increase the intra- (Figure 5B) and inter-examiner ICCs (Figure 5C) in the adult group, but decrease the intra-examiner ICC with the examiner who has less experience with 3D imaging analysis in the children group (Figure 5B), and further decrease the inter-examiner ICCs of the children group (Figure 5C). With all the medians of ICCs for each type of cephalometric analysis in both age groups higher than 0.75, all tracing methods were reliable for both children and adult patients. However, the current study suggests clinical caution is needed when evaluating the images from children patients.
With the consideration of tracing methods, parameter types, and patients' ages, a detailed comparison was performed. We found that angular measurements of the images from children patients had the lowest intra-examiner reliabilities for both examiners in all three types of cephalometric analysis methods (Figure 6). This subgroup also had the lowest inter-examiner reliability (Figure 7).
This study had limitations. First, the CBCT datasets used in the current study did not include patients with craniofacial syndromes or significant skeletal asymmetries. In a scenario with significant facial asymmetry, different borders of left- and right-side craniofacial structures may be distinguished on the extracted 2D lateral cephalometric X-rays even when using the setting of "orthogonal." Thus, whether 2D tracings without magnification have higher reliability than tracings with conventional lateral cephalometric images for such patients needs to be verified. Additionally, only two American Board of Orthodontics certified clinicians were evaluated as examiners in the current study. The tracing performance of less experienced postgraduate program trainees in all three types of cephalometric analysis methods is worth considering to guide future clinical education and training.
We would like to emphasize that the CBCT datasets used in the current study were obtained from a pre-existing database of patients who were prescribed a full-volume CBCT for the initial purpose of evaluating impacted tooth/teeth, temporary anchorage device placement, orthognathic-orthodontic treatment plan, periodontal lesion, or endodontic lesion based on the clinical observation during the initial orthodontic consultation. In other words, no participants had a full-volume CBCT taken solely to extract lateral cephalometric images. We believe that all the radiological images in the orthodontic field should be taken with strict adherence to the ALARA (as low as reasonably achievable) principle. It is worth noting that, worldwide, substantial efforts have gone into further reducing radiation to the patients who need radiographic evaluation for orthodontic purposes. For example, a biplanar low-dose X-ray imaging system has been developed to take anteroposterior and lateral 2D images simultaneously, which can be used for 3D reconstruction based on statistical models [42]. Primarily used in the orthopedic field, this biplanar low-dose X-ray imaging system is capable of reliable cephalometric analysis [43]. Thus, for the patients who already have biplanar images for orthopedic purposes, no additional regular lateral/posterior-anterior cephalometric X-rays are needed if the patients are also seeking orthodontic and orthognathic management [43]. In addition, an ultra-low-dose CBCT imaging system has also been introduced in the dental field [44,45]. Excitingly, outside the range of 2 mm or degrees, there is no statistical difference in performing cephalometric analysis on the full-volume CBCTs taken under the ultra-low-dose protocol and standard protocol [38,45]. However, it has been noted that measurements based on images taken with the standard protocol have significantly smaller standard deviations than those taken with the ultra-low-dose protocol [38].
Moreover, previous studies indicate that the patient's scanning position influences the accuracy of 3D cephalometric analysis to a degree that could warrant clinical attention [46,47]. Taken together, under current circumstances, commercially available 3D imaging systems cannot completely replace the 2D imaging systems regarding radiation exposure and analysis accuracy, and a large-field CBCT scan for the sole reason of extracting a 2D pseudo-teleradiographic image is not good practice at the moment.

Conclusions
In summary, all three types of cephalometric analysis methods were reliable, with 2D tracings of extracted lateral cephalometric images without magnification having the highest intra- and inter-examiner reliabilities. However, since the current cephalometric norms were established using conventional lateral cephalometric images with built-in magnifications and distortions, 2D tracings of extracted lateral cephalometric images without magnification may not directly replace conventional cephalometric analysis. Further studies are needed to compare the tracing values between these two cephalometric analysis methods. Additionally, new landmarks are needed for 3D cephalometric tracing to improve the reliability of the 3D cephalometric analysis. Along with magnification, types of measurement parameters and patients' ages are also influential factors in the accuracy and reliability of the cephalometric analysis. Clinical attention is needed when interpreting the angular measurements of images from children patients. Finally, radiological images in the orthodontic field should be taken by strictly following the ALARA principle. Exposing a patient to a large-field CBCT scan for the sole reason of extracting a 2D pseudo-teleradiographic image is not recommended.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/diagnostics11122292/s1, Supplemental Table S1: The intra-examiner intraclass correlation coefficient (ICC) of examiner 1 of all measurement parameters for the radiographic images from different age groups of the patients; Supplemental Table S2: The intra-examiner intraclass correlation coefficient (ICC) of examiner 2 of all measurement parameters for the radiographic images from different age groups of the patients; Supplemental Table S3: The inter-examiner intraclass correlation coefficient (ICC) of all measurement parameters for the radiographic images from different age groups of the patients.

Informed Consent Statement:
Patient consent was waived as CBCT scans for this study were derived from a pre-existing clinical database of pre-orthodontic treatment records, no additional radiologic images were taken for the current study, and no personal identification information was included in the current study.

Data Availability Statement:
The data presented in this study are contained within this article and Supplementary Materials.