Comparing a Fully Automated Cephalometric Tracing Method to a Manual Tracing Method for Orthodontic Diagnosis

Background: This study aims to compare an automated cephalometric analysis based on the latest deep learning method of automatically identifying cephalometric landmarks with a manual tracing method using broadly accepted cephalometric software. Methods: A total of 100 cephalometric X-rays taken using a CS8100SC cephalostat were collected from a private practice. The X-rays were taken in maximum image size (18 × 24 cm lateral image). All cephalometric X-rays were first traced manually using the Dolphin 3D Imaging program version 11.0 and then automatically using the artificial intelligence-based CS imaging V8 software. The American Board of Orthodontics analysis and the European Board of Orthodontics analysis were used for the cephalometric measurements. This resulted in the identification of 16 cephalometric landmarks, used for 16 angular and 2 linear measurements. Results: All measurements showed great reproducibility with high intra-class reliability (>0.97). The two methods showed great agreement, with an ICC range of 0.70–0.92. Mean values of the SNA, SNB, ANB, SN-MP, U1-SN, L1-NB, SNPg, ANPg, SN/ANS-PNS, SN/GoGn, U1/ANS-PNS, L1-APg, U1-NA, and L1-GoGn measurements had no significant differences between the two methods (p > 0.0027), while the mean values of FMA, L1-MP, ANS-PNS/GoGn, and U1-L1 were statistically significantly different (p < 0.0027). Conclusions: The automatic cephalometric tracing method using CS imaging V8 software is reliable and accurate for all cephalometric measurements.


Introduction
Since Broadbent developed the imaging technique in 1931, cephalometry has been used to investigate growth, identify malocclusions, and create treatment plans as well as to assess the outcomes of those treatments. After tracing anatomical features and identifying landmarks on acetate paper for the lateral cephalogram analysis, measurements were taken with rulers and protractors. The entire process was laborious, time-consuming, prone to mistakes, and largely dependent on operator skill. Developments in computer software have mostly served to automate the cephalometric measurements, but the doctor must still manually pinpoint the appropriate landmarks [1,2]. Nowadays, the advancement of different computer-based technologies such as artificial intelligence and 3D printing is giving a new perspective to the everyday orthodontic practice [3][4][5].
Artificial intelligence (AI) is the ability of a technology to mimic human intelligence or make decisions that are effective and ethical by predetermined criteria. AI now reaches many facets of contemporary life thanks to advances in analytic techniques, computing power, and data availability. We can already see its effects in our everyday lives on a global scale. It filters content for social media and web searches and is built into consumer goods such as cameras, cellphones, tablets, and even automobiles. Machine learning is a prominent branch of artificial intelligence. It uses the statistical patterns of previously learned data to predict new data and circumstances, and it therefore depends on training data to function. With this method, the computer model can learn from experience rather than through traditional explicit programming, improving over time. Deep learning models learn the features of the data through abstractions at several processing levels, which requires a large amount of data. Deep learning has the advantage that it does not require a lot of engineering work to preprocess the data, and its techniques have been employed most prominently in object identification and visual object recognition. As a result of recent technological advancements, machine learning has become increasingly important in the detection and classification of specific diseases found in medical imaging. There have been initiatives in orthodontics to use machine learning in various ways, one of which is the automated AI recognition of cephalometric landmarks. Even though research using three-dimensional imaging has garnered attention, the two-dimensional cephalometric image is still the most crucial and most often used tool in orthodontics for diagnosis, treatment planning, and result prediction [6,7].
There have been numerous attempts to include AI in cephalometric analysis. The International Symposium on Biomedical Imaging (ISBI) conferences, supported by the Institute of Electrical and Electronics Engineers (IEEE), launched global AI challenges in 2014 for precise AI measurements. The challenge has changed since 2015, becoming more clinically focused and providing success categorization rates. These challenges addressed the automatic recognition of cephalometric landmarks and presented 400 different lateral cephalograms. In addition, new algorithms have been created on the same open dataset. Some of these methods, including decision trees, random forests, and deep learning, have been used to increase the precision of landmark detection [8][9][10][11]. Additionally, when bilateral anatomic features do not overlap, the resultant point must be located at the average of the coordinates of the left and right landmarks. No previous research has successfully addressed this situation, which may compromise the precision of landmark placement and the reliability of the cephalometric analysis. The convolutional neural network (CNN), a deep learning network structure, is the algorithm that exhibits significant advantages in image processing. It has been applied to problems involving images such as target identification, character recognition, face recognition, posture assessment, and others. In medicine, and more specifically in medical imaging, CNN has been successfully used for lesion detection and classification, image segmentation, auxiliary diagnosis, etc. [12][13][14][15][16][17][18][19][20][21][22][23][24]. In 2019, Park et al., a year after Hwang et al., used two different kinds of CNN. The first one was "You-Only-Look-Once version 3" and the second one was the "Single Shot Multibox Detector". They used both CNN types to find 80 landmarks with good results. Some of these landmarks were applied to perform measurement analysis, whereas others detected contours or outlines or were used to predict treatment outcomes [8,25].
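The averaging rule for non-overlapping bilateral structures mentioned above amounts to taking the midpoint of the two traced points. A minimal Python sketch follows, assuming each landmark is stored as an (x, y) pixel coordinate; the function name and the sample coordinates are illustrative and are not taken from any of the cited software.

```python
from typing import Tuple

Point = Tuple[float, float]


def bilateral_midpoint(left: Point, right: Point) -> Point:
    """Return the point halfway between the left and right landmark."""
    return ((left[0] + right[0]) / 2.0, (left[1] + right[1]) / 2.0)


# Hypothetical pixel coordinates for a double-contoured bilateral landmark.
left_point = (412.0, 655.0)
right_point = (420.0, 661.0)
print(bilateral_midpoint(left_point, right_point))
```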
Carestream Dental is one of the companies that has shown high interest in automatic cephalometric tracing since the method first appeared. Nowadays, they offer cephalometric imaging software capable of fully tracing any cephalometric X-ray taken by a Carestream cephalostat as long as the X-ray is taken in the maximum image size (18 × 24 cm lateral image). Cephalometric tracing is based on artificial intelligence, using a deep convolutional neural network (CNN) for landmark detection, followed by an active shape model (ASM) for adjusting the position of the whole structure. Some classical image processing (mathematical morphology) is also involved in tracing the soft-tissue profile. The software does not acquire information from the machine; the network is pre-trained. The training set was collected from a collaborator orthodontist, then manually annotated. Many studies have examined the accuracy of different software programs but, to the best of our knowledge, none of them has examined CS imaging V8 software (Carestream Dental LLC, Atlanta, GA, USA), which is broadly used. This study aims to compare the accuracy of automatic cephalometric analysis using CS imaging V8 software to manual cephalometric analysis.

Materials and Methods
Subjects were recruited from a private practice that owns a CS8100SC Evo Edition X-ray machine (Producer: Carestream Dental LLC, Atlanta, GA, USA, Year: 2020). Inclusion criteria consisted of subjects seeking orthodontic treatment whose records included cephalometric X-rays. Subjects with existing intraoral appliances were excluded. Poor quality cephalograms with artifacts that could interfere with the anatomical point identification were excluded as well. There was no restriction on patients' gender, age, and ethnicity at the time that cephalometric X-rays were taken. A sample size calculation test was performed based on previous research. A minimum sample size of 79 patients was calculated as appropriate to detect a significant deviation in the intraclass correlation coefficient equal to or greater than 0.70 (moderate agreement and upwards) from 0.50 (poor agreement), with a power of 80%. A sample of 100 subjects was recruited and used in this project [23].
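The text does not state which formula produced the minimum of 79 patients. As an illustration only, the sketch below uses the large-sample approximation usually attributed to Walter, Eliasziw and Donner for testing an ICC of ρ0 against an alternative ρ1 with k measurements per subject. The choice of this formula, k = 2 (one manual and one automatic tracing), and a two-sided α = 0.05 are assumptions made here. Under them the result happens to equal the reported 79, which does not confirm that the authors used this approach.

```python
import math
from statistics import NormalDist


def icc_sample_size(rho0: float, rho1: float, k: int = 2,
                    alpha: float = 0.05, power: float = 0.80,
                    two_sided: bool = True) -> int:
    """Approximate number of subjects needed to detect ICC = rho1 against rho0.

    Assumed formula: n = 1 + 2 (z_a + z_b)^2 k / ((ln C0)^2 (k - 1)),
    with C0 = (1 + k*theta0) / (1 + k*theta1) and theta = rho / (1 - rho).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    theta0 = rho0 / (1 - rho0)
    theta1 = rho1 / (1 - rho1)
    c0 = (1 + k * theta0) / (1 + k * theta1)
    n = 1 + (2 * (z_alpha + z_beta) ** 2 * k) / (math.log(c0) ** 2 * (k - 1))
    return math.ceil(n)


# Values from the text: null ICC 0.50, alternative ICC 0.70, power 80%.
print(icc_sample_size(0.50, 0.70))
```

Switching `two_sided` to `False` gives a smaller number, so the result is sensitive to assumptions that the text leaves unstated.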
Pre-treatment lateral cephalometric radiographs of 100 patients (43 males, 57 females, mean age: 15.9 ± 4.8 years) were randomly selected. The cephalometric images were taken with the patient in the upright standing position with the Frankfort plane parallel to the floor, keeping the teeth in centric relation and the lips relaxed. All the lateral cephalometric radiographs were taken using the same lateral cephalometric machine (CS 8100 SC) by the same technician in the maximum image size (18 × 24 cm lateral image) (Figure 1).

Statistical Analysis
All cephalometric measurement data were imported into an Excel spreadsheet (Microsoft, Redmond, WA, USA), and statistical analysis was performed using SPSS software (version 27; IBM, Armonk, NY, USA). Normal distribution of the data was tested using the Kolmogorov-Smirnov test. Descriptive statistics (mean, standard deviation, and minimum and maximum values) were calculated for every parameter measured by each method. Differences between methods were assessed using the paired t-test or the Wilcoxon test, as appropriate. Furthermore, the Bonferroni method for multiple comparisons was applied for hypothesis testing of the equality of several parameters' means between the automatic and manual methods. Because we compared 18 parameters, the level of statistical significance (α = 0.05) was divided by the number of parameters and set to 0.0027 to avoid inflation of the type I error due to multiple comparisons. Intra-operator reliability and inter-method agreement were evaluated using the intraclass correlation coefficient (ICC) [26]. All tests were two-sided, with a nominal significance level of α = 0.05.
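The Bonferroni adjustment described above is simple arithmetic: the nominal α is divided by the number of comparisons. The Python sketch below reproduces that step. The function names and the p-values in the usage example are hypothetical and are not the study's results. Note that 0.05/18 is 0.00278 to five decimals, so the threshold of 0.0027 quoted in the text appears to be that value truncated.

```python
from typing import Dict


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / n_tests


def flag_significant(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Mark each measurement whose p-value falls below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


# 18 parameters at a nominal alpha of 0.05, as in the text.
print(f"{bonferroni_threshold(0.05, 18):.5f}")  # 0.00278

# Hypothetical p-values for illustration only (two tests, so threshold = 0.025).
print(flag_significant({"measurement_A": 0.001, "measurement_B": 0.030}))
```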

Results
The sample included 100 subjects, 43 males and 57 females with a mean age of 15.9 ± 4.8 years. The operator's reliability was calculated using intraclass correlation on 20 randomly selected subjects, whose data were re-measured 3 weeks apart. All measurements showed excellent intraclass correlation (Table 1).
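For readers unfamiliar with the statistic, the sketch below computes one common form of the intraclass correlation, ICC(2,1) (two-way random effects, absolute agreement, single measurement), from a subjects-by-sessions table. The text does not state which ICC form SPSS was configured to report, so this choice is an assumption, and the angle values in the usage example are invented for illustration.

```python
from typing import List, Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) for a table with one row per subject and one column per rating."""
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of ratings per subject
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means: List[float] = [sum(row) / k for row in ratings]
    col_means: List[float] = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, ratings, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical first and repeat tracings (degrees) for five subjects.
repeat_tracings = [[82.1, 82.4], [79.0, 78.7], [85.3, 85.1], [80.2, 80.6], [77.9, 78.0]]
print(round(icc_2_1(repeat_tracings), 3))
```

The same function applies to the inter-method comparison if the two columns hold the manual and automatic values instead of repeated sessions.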

American Board of Orthodontics Cephalometric Analysis
There was no significant difference between the two methods for the measurements of SNA, SNB, ANB, SN-MP, U1-SN, U1-NA, and L1-NB (p > 0.05), while there was a significant difference between the two methods for the measurements of FMA and L1-MP (p < 0.05). All measurements showed a high correlation between the two methods (ICC > 0.70) (Tables 2 and 3).

Table 2. Intraclass correlation coefficient (ICC) and 95% confidence interval (CI) for inter-method agreement (auto, manual).

European Board of Orthodontics Cephalometric Analysis
There was no significant difference between the two methods for the measurements of SNPg, ANPg, SN/ANS-PNS, SN/GoGn, U1/ANS-PNS, L1/GoGn, and L1/APg (p > 0.05), while there was a significant difference between the two methods for the measurements of ANS-PNS/GoGn and U1-L1 (p < 0.05). All measurements showed a high correlation between the two methods (ICC > 0.70) (Tables 2 and 3).

Discussion
This study compared a digital automatic method and a digital manual method of the cephalometric analysis of the skull.
When digital cephalometric analysis was first introduced, several studies compared the accuracy of digital tracing with manual tracing of X-rays on acetate paper. Their findings revealed no statistically significant differences between the two strategies for identifying landmarks. There is a strong correlation between the reproducibility of landmarks within examiners when using manual and computerized procedures. However, whereas the measurement errors were generally equal, the inter-examiner repeatability of landmarks was unsatisfactory. Computerized measures offer a sizable time benefit over the manual approach. When the time benefits are considered, computer-assisted cephalometric studies can benefit physicians more because they do not result in an increase in intra- and inter-examiner error. The results of the digital cephalometric tracing between the various programs were identical [27][28][29].
Later studies examined the accuracy of automatic landmark identification for digital cephalometric analysis using different software. In 2020, Meric and Naoumova found that fully automated solutions can perform cephalometric analyses more quickly and accurately. According to that study's findings, the manual correction of CephX landmarks produces results that are comparable to those of digital tracings made with CephNinja and Dolphin but takes much less time. A year later, Bulatova et al. found that, with the automatic digital cephalometric approach, only the identification of the U apex, L apex, Basion, Orbitale, and Gonion landmarks showed a statistically significant difference, whereas none of the other landmarks did [30,31].
Our study revealed significant differences in the FMA, L1-MP, ANS-PNS/GoGn, and U1-L1 measurements, while the rest showed no significant differences. These results agree with the research of Bulatova et al. It is important to mention that none of the measurements that differed statistically between the two methods appears to differ to a clinically significant degree. In cephalometrics, every measurement has a norm value with a standard deviation. For the statistically significant measurements, the means and standard deviations of the manual and automatic tracings differ only in decimal points or by a couple of degrees. As a result, our final diagnosis will not be affected: these differences are very small, and the diagnostic outcome in relation to the norms remains the same. Therefore, we can conclude that the automatic tracing method is reliable and accurate when used as a diagnostic method.

Conclusions
The automatic cephalometric tracing method using CS imaging V8 software is a reliable and accurate method for all cephalometric measurements. There was a high intraclass correlation coefficient between the two methods for all measurements. There were differences in FMA, L1-MP, ANS-PNS/GoGn, and U1-L1 measurements but they are not considered clinically significant.