Automated Adenoid Hypertrophy Assessment with Lateral Cephalometry in Children Based on Artificial Intelligence

Adenoid hypertrophy may lead to pediatric obstructive sleep apnea and mouth breathing. The routine screening of adenoid hypertrophy in dental practice is helpful for preventing relevant craniofacial and systemic consequences. The purpose of this study was to develop an automated assessment tool for adenoid hypertrophy based on artificial intelligence. A clinical dataset containing 581 lateral cephalograms was used to train the convolutional neural network (CNN). According to Fujioka's method for adenoid hypertrophy assessment, the regions of interest were defined with four keypoint landmarks. The adenoid ratio based on the four landmarks was used for adenoid hypertrophy assessment. Another dataset consisting of 160 patients' lateral cephalograms was used for evaluating the performance of the network. Diagnostic performance was evaluated with statistical analysis. The developed system exhibited high sensitivity (0.906, 95% confidence interval [CI]: 0.750–0.980), specificity (0.938, 95% CI: 0.881–0.973), and accuracy (0.919, 95% CI: 0.877–0.961) for adenoid hypertrophy assessment. The area under the receiver operating characteristic curve was 0.987 (95% CI: 0.974–1.000). These results indicated that the proposed system is able to assess adenoid hypertrophy accurately. The CNN-incorporated system showed high accuracy and stability in the detection of adenoid hypertrophy from children's lateral cephalograms, implying the feasibility of automated adenoid hypertrophy screening utilizing a deep neural network model.


Introduction
Located in the posterior and superior wall of the nasopharynx, the adenoids are part of the pharyngeal lymphoid ring. The adenoids, or pharyngeal tonsils, increase in size during childhood to twice their final adult size, following a particular pattern of growth. Under physiological conditions, the adenoids usually begin to shrink at around 6 years of age and involute by around 10 years of age. However, frequent upper airway infections can lead to pathological hypertrophy of the adenoids. The prevalence of adenoid hypertrophy (AH) in children and adolescents ranges from 42% to 70% [1]. AH is one of the most prevalent causes of upper airway obstruction and obstructive sleep apnea (OSA) in children [2].
Mouth breathing resulting from upper airway obstruction may lead to abnormal dentofacial development. Many previous studies have focused on the association between mouth breathing and dentofacial development, according to which mouth breathing can lead to a narrow upper arch, longer facial height, steeper mandibular plane angle, and a more retrognathic mandible [3,4]. In addition, failure to thrive, neurobehavioral problems, and depressive symptoms are also believed to be associated with pediatric OSA [5][6][7][8].
Children with AH usually present in the orthodontics department with malocclusion, so the routine screening of AH in dental practice is helpful for preventing relevant craniofacial and systemic consequences [9]. Nasal endoscopy is the current gold standard for diagnosing AH [10]. However, nasal endoscopy is painful, and some young children cannot cooperate adequately. Many studies have therefore sought other reliable diagnostic tools for the detection of hypertrophic adenoids. In orthodontic practice, the lateral cephalogram is a simple, economical, and routine examination, and many studies have shown that lateral cephalograms have high reliability in detecting AH [11,12]. Recently, a systematic review suggested that, despite a relatively high false-positive rate, the lateral cephalogram has good diagnostic accuracy (area under the receiver operating characteristic curve = 0.86) for the diagnosis of AH [13].
One of the most notable AH assessment methods based on cephalograms is Fujioka's adenoid-nasopharyngeal (AN) ratio [14]. In Fujioka's [14] assessment method, four relevant landmarks are manually marked on the cephalogram to measure the AN ratio, a process similar to cephalometric analysis. However, the entire assessment process, including landmark identification, is highly time-consuming and involves repetitive work. Moreover, the accuracy of landmark identification depends largely on the examiner's clinical experience, and inaccurate identification of cephalometric landmarks may lead to incorrect assessment results. Therefore, it is necessary to develop an accurate and efficient algorithm to automatically classify AH in lateral cephalograms.
Artificial intelligence (AI) refers to intelligence demonstrated by machines that imitate human knowledge and behavior. Deep learning is a subtype of machine learning that uses multi-layer mathematical operations to automatically learn from and make inferences about complex data, such as images [15]. Deep learning architectures, such as convolutional neural networks (CNNs), have been widely used for automatic image classification [16]. In dentistry, images play an important role in screening, diagnosis, and treatment planning. Moreover, the application of deep learning algorithms to cephalometric analysis and the diagnosis of skeletal classification has shown good performance [17][18][19][20]. However, research on deep-learning-based methods for radiographic AH assessment remains limited [21].
Therefore, the purpose of this study was to propose a deep learning method for automated AH assessment based on lateral cephalograms.

Materials and Methods
This study was approved by the Ethics Committee of the School and Hospital of Stomatology, Wuhan University (No. 2020-B55).

Samples and Identification of Landmarks
The pre-treatment digital lateral cephalograms of all outpatients (6 to 12 years old, n = 937) attending the Department of Orthodontics, Hospital of Stomatology, Wuhan University from April to August 2019 were collected. As determined a priori, 36 images with poor quality, including those with an unclear occipital slope, were excluded, resulting in a sample of 901 cephalograms (normal: 651, moderate hypertrophy: 197, severe hypertrophy: 53). The method used for AH assessment was based on Fujioka's A/N ratio [14]. As shown in Figure 1a, line segment L is drawn along the straight part of the anterior margin of the basiocciput; A' is the point of maximal convexity along the inferior margin of the adenoid; PNS is the posterior superior edge of the hard palate; line segment A indicates the size of the adenoid, and line segment N indicates the size of the nasopharyngeal space. A child is suspected of AH if the A/N ratio is greater than 60%.
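The four-landmark measurement can be sketched in code. This is a hypothetical reading of the geometry, not the authors' implementation: it takes the reference line through Ar and Ba (the simplification the study itself uses in place of the tangent to the occipital slope), A as the perpendicular distance from A' to that line, and N as the distance from PNS to the foot of that perpendicular.

```python
import numpy as np

def an_ratio(a_prime, pns, ar, ba):
    """Estimate the A/N ratio from four keypoints (hypothetical geometry:
    A = perpendicular distance from A' to the Ar-Ba line,
    N = distance from PNS to the foot of that perpendicular)."""
    a_prime, pns, ar, ba = (np.asarray(p, dtype=float)
                            for p in (a_prime, pns, ar, ba))
    d = ba - ar
    d = d / np.linalg.norm(d)            # unit vector along the Ar-Ba line
    foot = ar + np.dot(a_prime - ar, d) * d  # projection of A' onto the line
    a = np.linalg.norm(a_prime - foot)   # adenoid size A
    n = np.linalg.norm(pns - foot)       # nasopharyngeal space N
    return a / n

def suspected_ah(ratio, threshold=0.6):
    """A child is suspected of AH if the A/N ratio exceeds 60%."""
    return ratio > threshold
```

For example, with Ar = (0, 0) and Ba = (0, 10) defining a vertical reference line, A' = (7, 5) and PNS = (10, 5) give A = 7, N = 10, and a ratio of 0.7, which would flag the case as suspected AH.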
Among the 901 lateral cephalograms, 581 were randomly selected for training, 160 for validation, and the remaining 160 for testing. Given that the original dataset was relatively small, we augmented the training dataset to improve the performance and generalization ability of the neural network (Figure 1b) [22]. The original images were rotated from −20 to 20 degrees around the image center. In addition, these images were shifted by 10 pixels in the up, down, left, and right directions, and by 20 pixels in the diagonal directions. The rotation and translation were carried out such that the region of interest (ROI) always remained within the image, avoiding information loss. After this step, the training dataset grew from 581 images to 9877 images.

Table 1 demonstrates the overall architecture of our model, named HeadNet. It consisted of convolutional layers, attention residual modules [23,24], hourglass modules [25], and an integral regression layer [26]. The hourglass module, with its top-down and bottom-up design built from regular residual modules (Supplementary Figure S1), had the advantage of integrating multiscale information for further detection. The attention residual module (Supplementary Figure S2) evolved from a regular residual module by serially placing a channel attention part (Supplementary Figure S3a) and a spatial attention part (Supplementary Figure S3b) before the output, as this combination has been reported to achieve better results [23].
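The reported growth from 581 to 9877 training images corresponds to 17 variants per image (581 × 17 = 9877). One parameterization consistent with that count can be enumerated as below; the rotation step size is not stated in the text, so the 5-degree step is an assumption that happens to yield the right total.

```python
# Augmentation parameters consistent with the reported counts:
# the original image, 8 rotated copies, and 8 shifted copies per image.
ROTATIONS = [a for a in range(-20, 25, 5) if a != 0]    # degrees, assumed step
SHIFTS = ([(10, 0), (-10, 0), (0, 10), (0, -10)] +      # 10 px, cardinal
          [(20, 20), (20, -20), (-20, 20), (-20, -20)]) # 20 px, diagonal

def augmented_count(n_images):
    """Total dataset size after augmentation: original + rotations + shifts."""
    per_image = 1 + len(ROTATIONS) + len(SHIFTS)
    return n_images * per_image
```

With these assumed parameters, `augmented_count(581)` reproduces the reported 9877 images.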

Figure 2. Model architecture: the yellow rectangle represents the 2-d convolutional layer; the red rectangle represents the attention residual module; the blue hourglass-style rectangle represents the normal residual module; the green rectangle represents the integral regression layer, which converts heatmaps into keypoints. Each convolutional layer is followed by a ReLU operation.

For efficiency, all images (format: JPEG) were resized from 2300 × 2300 to a resolution of 256 × 256 without unduly compromising accuracy. An integral regression layer was applied over the feature heatmaps generated by the hourglass module to convert them into continuous coordinates [26]. Backpropagation was performed with different losses. The basic loss item was obtained by comparing the detection with the ground truth using the L1 loss, as it performed better than the L2 loss [26].
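The heatmap-to-coordinate conversion performed by the integral regression layer is commonly implemented as a soft-argmax: softmax-normalize the heatmap and take the expected pixel position under that distribution, which keeps the operation differentiable. A minimal NumPy sketch (not the authors' implementation) is:

```python
import numpy as np

def soft_argmax_2d(heatmap, beta=1.0):
    """Integral regression over one heatmap: return the expected (x, y)
    coordinate under the softmax-normalized heatmap."""
    h, w = heatmap.shape
    p = np.exp(beta * (heatmap - heatmap.max()))  # stabilized softmax
    p /= p.sum()
    ys, xs = np.mgrid[0:h, 0:w]                   # per-pixel coordinates
    return float((p * xs).sum()), float((p * ys).sum())
```

Unlike a hard argmax, the result is a continuous coordinate, so localization error can be backpropagated directly with an L1 loss as described above.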
By deploying prior knowledge in the neural network, the model could achieve higher performance [27]. A rotation case (Supplementary Figure S4a) would affect the perpendicular intersection between A' and the Ar–Ba line. A translation case (ideal case: Supplementary Figure S4b), which usually comes with rotation (Supplementary Figure S4c,d), would affect segments A and N. The distance from the detected Ar to the ground-truth line (formed by the ground-truth Ar and Ba) is denoted D_a; the distance from the detected Ba to the ground-truth line is denoted D_b. Intermediate supervision was adopted since it improves the accuracy of classification [22,28].
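A minimal sketch of the translation prior follows. The distances D_a and D_b are defined as above; how they are combined into a single loss term is not stated in the text, so the simple sum below is an assumption.

```python
import numpy as np

def point_to_line_distance(p, q1, q2):
    """Perpendicular distance from point p to the line through q1 and q2."""
    p, q1, q2 = (np.asarray(x, dtype=float) for x in (p, q1, q2))
    d = q2 - q1
    v = p - q1
    cross = d[0] * v[1] - d[1] * v[0]   # 2-d cross product (signed area)
    return abs(cross) / np.linalg.norm(d)

def translation_loss(ar_det, ba_det, ar_gt, ba_gt):
    """D_a + D_b: distances from the detected Ar and Ba to the
    ground-truth Ar-Ba line (the summation is an assumption)."""
    d_a = point_to_line_distance(ar_det, ar_gt, ba_gt)
    d_b = point_to_line_distance(ba_det, ar_gt, ba_gt)
    return d_a + d_b
```

The term is zero whenever both detected points lie anywhere on the ground-truth line, so it penalizes displacement of the reference line rather than displacement of the individual points along it.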
To evaluate the effect of the proposed losses, ablation experiments were performed: HeadNet was trained with and without the rotation loss, the translation loss, and the attention residual module.

Training Details
We trained HeadNet with a batch size of 10 using the SGD optimizer (momentum 0.9, weight decay 2 × 10⁻⁵), and all parameters of the convolutional layers were initialized randomly. The training process started with a warm-up (initial learning rate: 0.001), followed by an annealing strategy in which the learning rate was updated every 5 epochs.
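The schedule can be sketched as follows. The warm-up length and the annealing factor are assumptions; the text states only the initial learning rate (0.001) and that the rate is updated every 5 epochs.

```python
def learning_rate(epoch, base_lr=0.001, warmup_epochs=5, decay=0.5):
    """Warm-up followed by step annealing every 5 epochs.
    warmup_epochs and decay are assumed values, not from the paper."""
    if epoch < warmup_epochs:
        # linear warm-up from base_lr / warmup_epochs up to base_lr
        return base_lr * (epoch + 1) / warmup_epochs
    # halve the rate once per 5-epoch block after warm-up
    steps = (epoch - warmup_epochs) // 5 + 1
    return base_lr * (decay ** steps)
```

With these assumed settings, the rate ramps to 0.001 over the first 5 epochs and then halves every 5 epochs thereafter.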

Statistical Analysis and Evaluation
The absolute distance between the ground truth and the predicted point, the average precision (AP), and the average recall (AR) were the evaluation metrics for keypoint detection. The AN ratio error, the key indicator, was the absolute error between the predicted AN ratio and the actual value. Diagnostic accuracy, sensitivity, specificity, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC), with 95% CIs, were used to test the system's performance.
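These diagnostic measures all derive from a 2 × 2 confusion table. A sketch with illustrative counts follows (the study's actual confusion table is not reported, so the example numbers below are hypothetical):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, and likelihood ratios
    from confusion-table counts."""
    sens = tp / (tp + fn)                # true positive rate
    spec = tn / (tn + fp)                # true negative rate
    acc = (tp + tn) / (tp + fp + tn + fn)
    plr = sens / (1 - spec)              # positive likelihood ratio
    nlr = (1 - sens) / spec              # negative likelihood ratio
    return sens, spec, acc, plr, nlr
```

For example, hypothetical counts tp = 29, fp = 8, tn = 120, fn = 3 on a 160-image test set give a sensitivity of 29/32 ≈ 0.906 and a specificity of 120/128 ≈ 0.938.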

Results
The system showed high performance in AH assessment. The sensitivity, specificity, and accuracy were 0.906 (95% CI: 0.750–0.980), 0.938 (95% CI: 0.881–0.973), and 0.919 (95% CI: 0.877–0.961), respectively. The positive likelihood ratio was 10, and the negative likelihood ratio was 0.067. The ROC curve is provided in Figure 3, and the AUC was 0.987 (95% CI: 0.974–1.000). The AUC far exceeded 0.9, indicating that the proposed system was able to assess adenoid hypertrophy accurately.

The evaluation of the 160 test images by this diagnostic system took approximately 11 s on a GTX 1070 graphics card. Figure 4 shows the changes in the AN ratio error during 200 epochs of training, while Figure 5 shows the absolute distance between the ground truth and the predicted points (in pixels). As Figure 5 shows, although the average localization error was small, the localization error of A' was comparatively large, which might be due to unclear adenoid areas in the validation images. Figures 6 and 7 show the changes in validation AP and AR during 200 epochs, respectively. These curves suggest that the HeadNet model learned quickly and found the keypoint locations during the first 50 epochs; as the model converged, the validation error gradually decreased, while validation accuracy increased slowly.

Table 2 presents the performance details of HeadNet. HeadNet* indicates that the attention residual module was applied. The rotation (r) loss and translation (t) loss were applied in both HeadNet (r, t) and HeadNet* (r, t). HeadNet* (r, t) achieved the best performance among all the models, with an F1-score of 0.936 and an AN ratio error of 0.025. Table 3 shows the absolute localization error over keypoints for these models on the test dataset; as the table shows, HeadNet* (r, t) performed better than the other models.
Figure 8 shows that the keypoints predicted by HeadNet* (r, t) are located close to the manually landmarked ones.
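As a quick consistency check, the reported F1-score of HeadNet* (r, t) follows from the reported precision (0.919) and recall (0.954) as their harmonic mean:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported HeadNet* (r, t) values: precision 0.919, recall 0.954
f1 = f1_score(0.919, 0.954)  # ≈ 0.936, matching the reported F1-score
```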


Discussion
In children, AH is the most common etiology of partial or complete upper airway obstruction, which can further lead to mouth breathing. Increasing evidence has indicated that AH is associated with dentofacial anomalies [29,30]. In mouth-breathing patients, the physiological stimulus for maxillary growth and the subsequent lowering of the palatal vault could be suppressed due to the reduction of continuous airflow through the nasal passage [31]. Children with AH are expected to have narrow dental arches, a deep palatal vault, an increased mandibular angle, a retrognathic mandible, and a convex profile [29,30]. These facial features are collectively known as "adenoid facies".
Both the upper airway and dentofacial structures can be observed in lateral cephalograms, and lateral cephalometry was therefore considered to be a useful screening tool in the assessment of upper airway structures [32,33]. Children with AH usually present in orthodontic clinics with a chief complaint of malocclusion or dissatisfaction with their profile. Besides, the prevalence of pediatric sleep breathing disorder in the general orthodontic population was more than twice that reported in a healthy pediatric population [34]. As cephalometry is routinely performed in orthodontic practice, orthodontists are strongly recommended to screen their patients for sleep breathing disorders and AH in clinical practice [35]. Children with suspected AH based on lateral cephalograms could be referred by orthodontists to the ENT department for diagnosis and treatment [9].
In the present study, we developed an AI method that can assess children's AH using their lateral cephalograms. The model was trained with lateral cephalograms of pediatric patients and showed the ability to locate the keypoints for the AN ratio. If the AN ratio is greater than 0.6, a diagnosis of AH is made. Over the 160 test samples, the average keypoint localization error was 1.651 pixels, while the average precision, recall, F1-score, and AN ratio error were 0.919, 0.954, 0.936, and 0.025, respectively. The diagnostic accuracy, sensitivity, and specificity were 0.919, 0.906, and 0.938, respectively, and the AUC was 0.99, far exceeding 0.9. These results indicated that the model was accurate and stable. To our knowledge, only two studies so far have applied AI techniques to AH diagnosis. One of them proposed the VGG-Lite model for the automated evaluation of AH but eliminated the process of landmark identification [36]; the other [21] explored the use of AI in AH diagnosis based on magnetic resonance imaging (MRI), which is not routinely used in orthodontic practice. In contrast, the present study was based on lateral cephalometry, a routine examination conducted by orthodontists. Moreover, our AI model was adapted to lateral cephalograms and the AN ratio calculation. The attention residual modules used in this study markedly improved the performance of keypoint detection and reduced the final AN ratio error.
The significance of this study is that our work could assist clinicians or dentists in the screening of AH by eliminating possible human error and greatly reducing time consumption. Many experienced orthodontists and radiologists can estimate whether the adenoids are hypertrophic just by interpreting the image for a second, without measuring the AN ratio. However, manually evaluating the adenoids of a large sample would be time-consuming and error-prone. Therefore, this automated assessment tool can be used for relevant clinical/epidemiological studies, as well as health examinations at a community/population level.
However, this study has several limitations. Firstly, in order to simplify the labeling and learning process, we used the line connecting points Ar and Ba to replace the line tangent to the occipital slope; this is similar to the standard AN ratio measurement method but may yield slightly different results in some borderline cases. Secondly, despite the advantages of being a routine diagnostic tool in orthodontic practice, cephalograms cannot provide three-dimensional information for either the adenoids or the upper airway. A previous study using CBCT showed that an AN ratio >0.6 correlates with a lower nasopharyngeal airway volume but not with the upper airway in general [37]. Thirdly, similar to other dental studies based on cephalograms, we had to manually mark relevant landmarks on the cephalograms to construct the reference test [38]. The maximal convexity or deepest concavity on the contour was difficult to identify, which might be why the localization deviation of A' was relatively large [39].

Conclusions
The CNN-incorporated system in this study has high accuracy and stability in the detection of AH. AI can be used in the screening of AH among children in dental practice.

Institutional Review Board Statement: All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Informed Consent Statement: As a retrospective study using routinely collected data from healthcare activities, this study was approved by the Ethics Committee of School & Hospital of Stomatology, Wuhan University (No. 2020-B55) to be conducted without patients' informed consent.

Data Availability Statement:
The data underlying this article will be shared on reasonable request to the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.