Deep Learning Based Airway Segmentation Using Key Point Prediction

Abstract: The purpose of this study was to investigate the accuracy of airway volume measurement by a regression neural network-based deep-learning model. A set of manually outlined airway data was used to build an algorithm for fully automatic segmentation by deep learning. Manual landmarks of the airway were determined by one examiner using the mid-sagittal plane of cone-beam computed tomography (CBCT) images of 315 patients. Clinical dataset-based training with data augmentation was conducted. Based on the annotated landmarks, the airway passage was measured and segmented. The accuracy of our model was confirmed by measuring the following between the examiner and the program: (1) the difference in volume of the nasopharynx, oropharynx, and hypopharynx, and (2) the Euclidean distance. For the agreement analysis, 61 samples were extracted and compared. The correlation test showed a range of good to excellent reliability. The difference between volumes was analyzed using regression analysis. The slope of the two measurements was close to 1 and showed a linear regression correlation (r² = 0.975, slope = 1.02, p < 0.001). These results indicate that fully automatic segmentation of the airway is possible by training via deep learning of artificial intelligence. Additionally, a high correlation between manual data and deep learning data was estimated.


Introduction
Recently, artificial intelligence has been used in the medical field to predict risk factors through correlation analysis and genomic analyses, phenotype-genotype association studies, and automated medical image analysis [1]. Recent advances in machine learning are contributing to research on identifying, classifying, and quantifying medical image patterns in deep learning. Since the convolutional neural network (CNN) based on artificial neural networks has begun to be used in medical image analysis, research on various diseases is rapidly increasing [2,3]. The use of deep learning in the medical field helps diagnose and treat diseases by extracting and analyzing medical images, and its effectiveness has been proven [4].
However, studies related to deep learning in the areas of oral and maxillofacial surgery are limited [5]. For oral and maxillofacial surgery, radiology is used as an important evaluation criterion in the diagnosis of diseases, treatment plans, and follow-up after treatment. However, the evaluation process is performed manually and the assessment can be different among examiners, or even with the same examiner. This may result in an inefficient and time-consuming procedure [6]. In particular, the evaluation of the airway is difficult to analyze due to its anatomical complexity and the limited difference in gray scale between soft tissue and air [7][8][9]. Airway analysis is essential for diagnosis and assessment of the treatment progress of obstructive sleep apnea patients and for predicting the tendency of airway changes after orthognathic surgery [10][11][12][13][14][15][16][17][18][19][20][21].
In most previous studies, the airway was segmented semi-automatically using software systems for volumetric measurements using cone-beam computed tomography (CBCT) images [21][22][23]. These studies evaluated the reliability and reproducibility of the software systems on the measurement of the airway [7,24,25,26,27] and compared the accuracy between the various software systems [9,24,25,27]. However, in all cases, the software systems require manual processing by experts.
In this study, a regression neural network-based deep-learning model is proposed, which will enable fully automatic segmentation of airways using CBCT. The differences between the manually measured data and data measured by deep learning will be analyzed. Using a manually positioned data set, training and deep learning will be performed to determine the possibility of a fully automatic segmentation of the airway and to introduce a method and its proposed future use.

Sample Collection and Information
Images from 315 patients who underwent CBCT for orthognathic surgery were collected retrospectively from 2017 to 2019. The CBCT data were acquired using PaX-i3D (Vatech Co., Hwaseong-si, Korea) at 105-114 kVp and 5.6-6.5 mA, with a 160 mm × 160 mm field of view and a voxel size of 0.3 mm. The scanning conditions were automatically determined by the machine according to the patients' age and gender. The CBCT images were converted to DICOM 3.0 and stored on a Windows-10-based graphic workstation (Intel Core i7-4770, 32 GB). The patients were all placed in a natural head position. All image processing was performed using the MATLAB 2020a (MathWorks, Natick, MA, USA) programming language.

Coordinate Determination in the Mid-Sagittal Plane
Five coordinates for each original image were obtained manually in the midsagittal plane of the CBCT images (Figure 1). The definitions of the points and planes for the airway division are presented in Table 1, referring to Lee et al. [28]. These five coordinates were predicted by a 2D convolutional neural network for airway segmentation in the sagittal direction.

Table 1. Definitions of the planes and volumes for the airway division.

Plane
PNS-Vp plane: The plane perpendicular to the midsagittal plane passing through the PNS and the Vp.
CV1 plane: The plane parallel to the natural head position plane passing through CV1.
CV2 plane: The plane parallel to the natural head position plane passing through CV2.
CV3 plane: The plane parallel to the natural head position plane passing through CV3.
CV4 plane: The plane parallel to the natural head position plane passing through CV4.

Volume
Nasopharynx: From the PNS-Vp plane to the CV1 plane.
Oropharynx: From the CV1 plane to the CV2 plane.
Hypopharynx: From the CV2 plane to the CV4 plane.

Airway Segmentation
First, the image was binarized. The binarized image was then filled through a 3D close operation and hole filling, and the binarized image was subtracted from the filled image to obtain an airway image. Next, the image outside the area bounded by connecting the five reference points and the 1/4 and 3/4 points of the inferior border was erased. Finally, only the largest object was kept to obtain the airway image (Figure 2).
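The binarize, fill, subtract, and keep-largest sequence can be illustrated on a toy 2D slice. The following is a simplified pure-Python sketch and not the MATLAB implementation used in the study: a fixed threshold is assumed, the 3D close operation and the cropping by reference points are omitted, and the function names (`flood`, `fill_holes`, `largest_component`, `segment_airway`) and the toy image are hypothetical.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity (assumption)


def flood(mask, seeds):
    """Return the set of True cells in `mask` reachable from `seeds`."""
    rows, cols = len(mask), len(mask[0])
    seen, queue = set(), deque()
    for r, c in seeds:
        if mask[r][c] and (r, c) not in seen:
            seen.add((r, c))
            queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and mask[nr][nc] and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen


def fill_holes(tissue):
    """Fill background regions that are not connected to the image border."""
    rows, cols = len(tissue), len(tissue[0])
    background = [[not v for v in row] for row in tissue]
    border = [(r, c) for r in range(rows) for c in range(cols)
              if r in (0, rows - 1) or c in (0, cols - 1)]
    outside = flood(background, border)
    return [[(r, c) not in outside for c in range(cols)] for r in range(rows)]


def largest_component(mask):
    """Return the cells of the largest connected True region."""
    best, visited = set(), set()
    for r in range(len(mask)):
        for c in range(len(mask[0])):
            if mask[r][c] and (r, c) not in visited:
                comp = flood(mask, [(r, c)])
                visited |= comp
                if len(comp) > len(best):
                    best = comp
    return best


def segment_airway(image, threshold):
    """Binarize, fill holes, subtract, and keep the largest enclosed cavity."""
    tissue = [[v > threshold for v in row] for row in image]
    filled = fill_holes(tissue)
    cavities = [[f and not t for f, t in zip(frow, trow)]
                for frow, trow in zip(filled, tissue)]
    return largest_component(cavities)


# Toy slice: bright tissue (9) enclosing a 2 x 2 cavity and a 1-pixel cavity.
image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 9, 9, 9, 9, 0],
    [0, 9, 1, 1, 9, 1, 9, 0],
    [0, 9, 1, 1, 9, 9, 9, 0],
    [0, 9, 9, 9, 9, 9, 9, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
airway = segment_airway(image, threshold=5)
```

In this toy case the dark border is connected to the image edge and is therefore not treated as a cavity, and the single enclosed pixel is discarded because it is smaller than the 2 x 2 cavity.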

Training via Regression Neural Network and Metrics for Accuracy Comparison
The 315 midsagittal images obtained from the patients' cone-beam computed tomography (CBCT) data were split into training and test sets at a ratio of 4:1. During clinical dataset-based training, a separate validation set was not held out because the sample size was too small; instead, five-fold cross-validation was applied. The image size was set to 200 × 200 pixels, and 16 convolution layers were stacked for feature extraction.
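A 4:1 split combined with five-fold cross-validation means that each fold serves once as the test set while the other four folds are used for training. The sketch below shows one way to generate such splits for 315 samples; the shuffling, the seed, and the function name `five_fold_splits` are assumptions for illustration and are not taken from the study.

```python
import random


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


splits = list(five_fold_splits(315))
for train, test in splits:
    print(len(train), len(test))  # 252 training and 63 test images per fold
```

With 315 images, every fold holds 63 test images and 252 training images, which matches the 4:1 ratio.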
To generate the regression model, the regression layer was connected to a fully connected layer. Mean squared error was used as the loss function. Data augmentation was then conducted, including rotation from −6° to +6°, uniform (isotropic) scaling by a factor of 0.5 to 1, Poisson noise addition, and contrast and brightness adjustment. An NVIDIA Titan RTX GPU with CUDA (version 10.1) acceleration was used for network training. The models were trained for 243 epochs using an Adam optimizer with an initial learning rate of 1e-4 and a mini-batch size of 8.
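For key point regression, geometric augmentation has to move the landmark labels together with the image. The sketch below samples a rotation and an isotropic scale from the ranges listed above and applies them to a list of (x, y) landmarks. It assumes rotation and scaling about the image center of a 200 × 200 image and a particular sign convention for the angle; neither detail is stated in the text, and the function names are hypothetical. Warping the image itself with the same transform is not shown.

```python
import math
import random

IMAGE_SIZE = 200  # pixels, as stated in the text


def sample_augmentation(rng):
    """Draw a rotation angle (degrees) and an isotropic scale factor."""
    angle_deg = rng.uniform(-6.0, 6.0)
    scale = rng.uniform(0.5, 1.0)
    return angle_deg, scale


def transform_points(points, angle_deg, scale, size=IMAGE_SIZE):
    """Rotate and scale (x, y) landmarks about the image center."""
    cx = cy = (size - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        out.append((cx + scale * (cos_t * dx - sin_t * dy),
                    cy + scale * (sin_t * dx + cos_t * dy)))
    return out


rng = random.Random(0)
angle_deg, scale = sample_augmentation(rng)
landmarks = [(149.5, 99.5), (120.0, 60.0)]  # hypothetical landmark positions
augmented = transform_points(landmarks, angle_deg, scale)
```

Poisson noise and contrast or brightness changes alter only pixel intensities, so they leave the landmark coordinates unchanged.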
The prediction accuracy of the model was calculated using (a) the volume difference between the predicted and manually determined nasopharynx, oropharynx, and hypopharynx, and (b) the Euclidean distance between the predicted and manually determined points.
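Both metrics reduce to short computations once coordinates and segmentations are expressed on the original voxel grid. The sketch below assumes that landmark coordinates are given in voxels of the original 0.3 mm grid (not the resized 200 × 200 image), that volumes are obtained by counting voxels, and that the volume difference is taken as an absolute value; these are assumptions, and the function names are hypothetical.

```python
import math

VOXEL_SIZE_MM = 0.3  # isotropic voxel size reported for the CBCT scans


def euclidean_distance_mm(predicted, manual, spacing_mm=VOXEL_SIZE_MM):
    """Distance in mm between two landmarks given in voxel coordinates."""
    return spacing_mm * math.dist(predicted, manual)


def volume_mm3(voxel_count, voxel_size_mm=VOXEL_SIZE_MM):
    """Volume in cubic mm of a segmentation containing `voxel_count` voxels."""
    return voxel_count * voxel_size_mm ** 3


def volume_difference_mm3(predicted_voxels, manual_voxels):
    """Absolute volume difference in cubic mm between two segmentations."""
    return abs(volume_mm3(predicted_voxels) - volume_mm3(manual_voxels))


# Hypothetical values: landmarks 5 voxels apart, segmentations differing by 100 voxels.
print(euclidean_distance_mm((103, 54), (100, 50)))
print(volume_difference_mm3(1000, 900))
```

Each voxel contributes 0.3³ = 0.027 mm³, so a difference of 100 voxels corresponds to about 2.7 mm³.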

Measurements of the Differences between Manual Analysis and Deep Learning Analysis
The five coordinates manually pointed and predicted by the deep learning model are shown in Figure 3. The Euclidean distance between the predicted and manually determined points was largest at CV4 (4.156 ± 2.379 mm) and smallest at CV1 (2.571 ± 2.028 mm). The other Euclidean distances were 2.817 ± 1.806 mm at PNS, 2.837 ± 1.924 mm at Vp, and 2.896 ± 2.205 mm at CV2. When the volume was compared for each part, the hypopharynx showed the largest difference (50.010 ± 57.891 mm³), and the oropharynx showed the smallest difference (37.987 ± 43.289 mm³). The difference in the nasopharynx was 48.620 ± 49.468 mm³. The difference in total volume was 137.256 ± 146.517 mm³. All measurements of the differences are shown in Table 2. Volume differences among parts of the airway are shown in Figure 4. In the boxplots, 'x' within the box marks the mean of the volume differences.

Agreement Analysis
For the agreement analysis, 61 samples were extracted, and the manually measured values and the values predicted by the deep learning network were compared for both volumes and coordinates. Among the volumes, the intra-class correlation coefficient (ICC) was highest for the oropharynx (0.986), followed by the hypopharynx (0.964) and the nasopharynx (0.912). Among the coordinates, the ICC was highest for CV2(x) (0.963) and lowest for CV4(y) (0.868). All ICC values are presented in Table 3.

Linear Regression Scatter Plots and Bland-Altman Plot for the Total Volume Data Set
The total volume measured by deep learning was compared with the manually measured volume using regression analysis (Figure 5). The slope of the regression line was close to 1, and the two measurements showed a linear correlation (r² = 0.975, slope = 1.02, p < 0.001). Bland-Altman plots and analyses were used to compare the total volume of the two methods, and the results are presented in Figure 6. The Bland-Altman plot comparing the level of agreement between the manual and deep learning measurements indicates an upper limit of agreement of 0.261 cm³ and a lower limit of agreement of −0.207 cm³. The width of the 95% limits of agreement was 0.468 cm³.
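The two analyses can be sketched with standard formulas: Bland-Altman limits of agreement are the mean difference ± 1.96 standard deviations of the differences, and the regression slope and r² follow from ordinary least squares. The code below is a minimal sketch under those standard definitions; the direction of the difference (predicted minus manual), the function names, and the toy volumes are assumptions, not study data.

```python
import statistics


def bland_altman_limits(manual, predicted):
    """Return (lower, upper) 95% limits of agreement for paired measurements."""
    diffs = [p - m for m, p in zip(manual, predicted)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return bias - 1.96 * sd, bias + 1.96 * sd


def linear_fit(x, y):
    """Ordinary least squares fit of y on x: (slope, intercept, r_squared)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept, sxy ** 2 / (sxx * syy)


# Hypothetical paired total volumes in cubic cm (not study data).
manual = [10.0, 12.5, 15.0, 18.0, 21.0]
predicted = [10.2, 12.4, 15.3, 18.1, 21.4]
lower, upper = bland_altman_limits(manual, predicted)
slope, intercept, r_squared = linear_fit(manual, predicted)
```

Under these definitions, the reported limits of 0.261 cm³ and −0.207 cm³ imply a mean difference of (0.261 − 0.207) / 2 = 0.027 cm³ and a width of 0.261 + 0.207 = 0.468 cm³.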

Discussion
In the medical field, many studies have used artificial intelligence via deep learning in radiology [29,30]. There are studies on fully automated airway segmentation of the lungs from volumetric computed tomographic images using a convolutional neural network (CNN) [31] and on automatic segmentation and 3D reconstruction of the inferior turbinate and maxillary sinus in otorhinolaryngology [32]. Because of the complex anatomical structure of the airway, manual measurement is time-consuming and entails inter-examiner error, intra-examiner error, and a lack of certainty because of the small differences on a gray scale [23]. For these reasons, automated measurement and analysis are necessary, but fully automatic segmentation of the airway is challenging, and a study of airway segmentation using deep learning in the area of oral and maxillofacial surgery has not previously been reported.
Therefore, in this study, we performed fully automated segmentation of the airway using artificial intelligence to enable faster and more practical measurement and analysis in clinical practice. The correlation between the coordinates and volumes measured manually and by the deep learning network was evaluated. The mean distance between the manual and predicted coordinates of the five airway reference points ranged from approximately 2.6 mm to 4.2 mm, and the difference between the measured volumes was 48.620 mm³ in the nasopharynx, 37.987 mm³ in the oropharynx, and 50.010 mm³ in the hypopharynx. The difference in total volume was 137.256 mm³. The correlation for each coordinate and volume showed good to excellent reliability.
In this study, the threshold was defined by the Otsu method [33] and the binarized image was extracted; the deep learning model then enabled fully automatic segmentation of the airway, which was divided into the nasopharynx, oropharynx, and hypopharynx by the reference planes.
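The Otsu method selects the threshold that maximizes the between-class variance of the intensity histogram. The sketch below is a minimal pure-Python version that assumes intensities have been rescaled to integers in the range 0 to 255; the function name and the toy intensities are hypothetical, and the study itself used MATLAB.

```python
def otsu_threshold(pixels, levels=256):
    """Return t such that pixels <= t form one class and pixels > t the other."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    weight_low = 0   # number of pixels with intensity <= t
    sum_low = 0.0    # sum of intensities of those pixels
    best_t, best_var = 0, -1.0
    for t in range(levels):
        weight_low += hist[t]
        sum_low += t * hist[t]
        weight_high = total - weight_low
        if weight_low == 0:
            continue  # no pixels in the lower class yet
        if weight_high == 0:
            break     # all pixels are in the lower class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# Toy intensities: dark air around 10-12 and brighter tissue around 198-200.
pixels = [10] * 40 + [12] * 10 + [198] * 10 + [200] * 40
threshold = otsu_threshold(pixels)
binary = [p > threshold for p in pixels]
```

For a clearly bimodal histogram such as this one, any threshold between the two modes separates air from tissue; this sketch returns the lowest such value.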
The difference between the total volumes in this study (0.46 cm³) was considered acceptable when compared with the study by Torres et al. [25], which reported a difference of 0.2 cm³ to 1.0 cm³ between the water volume of an actual prototype and the volume measured with CT software. The difference in the volume of the oropharynx was the smallest, which agrees with the results of El et al. [34]. According to Alsufyani et al. [23], since the oropharyngeal airway is a completely empty space like a tube, it is straightforward to measure its volume. Regions where the airway has a more complex and narrow shape because of soft tissue structures, such as the epiglottis, have the highest error in volumetric measurements [35]. Therefore, it can be considered that a simpler anatomical structure will result in a smaller difference between the measurement methods.
When comparing the distance of each point, the result of this study is not clinically applicable. A clinically acceptable difference between the landmarks is approximately 2 mm, according to Lee et al. [36]. There are several reasons for a possible error, which include the limitation in the number of training data sets and the necessity for more precise data preparation, such as setting more reference points on each slice segmentation. In setting the reference points for precise training, the reference points were selected on the bony parts to reduce the error due to the variety of soft tissue shapes. This allows clear determination of the anatomical point aided by the large difference on a gray scale, and a simpler comparison of the relationship before and after surgery. Hence, this study applied the reference points of the Lee et al. study [28]. Nevertheless, in the present study, the distance of CV4 had a larger error, which may be due to the shape of the spine CV4 appearing in various ways in the sagittal plane compared to CV1 or CV2. It is necessary to set an additional reference point to define the hypopharynx that appears to be constant in the midsagittal plane.
The limitation of most airway segmentation research is possibly due to an inconsistent patient head position [23,27,37]. Since patients underwent CBCT in the natural head position in this study, errors may occur. It has been reported that the shape of the airway can vary greatly depending on the angle of the head [38]. However, as concluded in most research, it is not a significant error when comparing the volume of the airway rather than evaluating the volume itself [25]. When performing CBCT, the patient's head position is consistently adjusted to a natural head position by the examiner through the head strap, chin support, and guide light. In addition, the natural head position has been proven to be reproducible [39], and, hence, there should be no major error when comparing. Due to breathing and tongue position, errors may occur in volumetric measurements [35,37]. Therefore, for each variable, controlled and consistent scanning is required. This study divided the airway volume using 5 points in the 2D mid-sagittal image. The accuracy of these points affects the accuracy of airway segmentation. Therefore, bigger data is needed for clinical application of our algorithm to raise accuracy of coordinate determination.
In the agreement analysis, according to Koo et al. [40], "Based on the 95% confident interval of the ICC estimate, values less than 0.5, between 0.5 and 0.75, between 0.75 and 0.90, and greater than 0.90 are indicative of poor, moderate, good, and excellent reliability, respectively." In the present study, oropharynx, hypopharynx, total volume, PNS(y), CV1(y), CV2(x), and CV4(x) indicated excellent reliability, and all other variables indicated good reliability based on the Koo et al. report [40].
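The thresholds quoted from Koo et al. can be written as a small lookup. The sketch below is illustrative only: the function name is hypothetical, the handling of values exactly at 0.5, 0.75, and 0.90 is an assumption, and the guideline applies to the bounds of the 95% confidence interval rather than to the point estimate, so passing point estimates as done here is a simplification.

```python
def reliability_label(icc):
    """Map an ICC value to the reliability category of Koo et al."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# ICC point estimates reported in the agreement analysis.
for name, icc in [("oropharynx", 0.986), ("hypopharynx", 0.964),
                  ("CV2(x)", 0.963), ("CV4(y)", 0.868)]:
    print(name, reliability_label(icc))
```

Applying the function to the lower and upper bounds of the confidence interval gives the range of categories for one variable, which is why a point estimate above 0.90 can still be reported as good reliability.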
These results indicate that fully automatic segmentation of the airway is possible through training via deep learning. In addition, a high correlation between manual data and deep learning data was estimated. To improve the accuracy, validity, and reliability of auto-segmentation, and in particular the accuracy of coordinate determination, further data collection and training with larger datasets will be required for future clinical application. Transfer learning with other datasets, such as facial coordinates, can also be useful. We plan to develop more robust algorithms with larger datasets.

Conclusions
In this study, using a manually positioned data set, fully automatic segmentation of the airway was possible with artificial intelligence by training a deep learning algorithm, and a high correlation between manual data and deep learning data was estimated.
As the first study to utilize artificial intelligence to reach full auto-segmentation of the airway, this paper is meaningful in showing the possibility of a more accurate and quicker way of producing airway segmentation. For future clinical application, more robust algorithms with bigger and multiplex datasets are required.

Informed Consent Statement:
Patient consent was waived because of the retrospective nature of the study and because the analysis used anonymous clinical data.

Data Availability Statement:
The data presented in this study are openly available on GitHub at: https://github.com/JaeJoonHwang/airway_segmentation_using_key_point_prediction, accessed on 13 April 2021.

Conflicts of Interest:
The authors declare no conflict of interest.