Fully Automatic Segmentation, Identiﬁcation and Preoperative Planning for Nasal Surgery of Sinuses Using Semi-Supervised Learning and Volumetric Reconstruction

: The aim of this study is to develop an automatic segmentation algorithm based on paranasal sinus CT images, which realizes automatic identiﬁcation and segmentation of the sinus boundary and its inﬂamed proportions, as well as the reconstruction of normal sinus and inﬂamed site volumes. Our goal is to overcome the current clinical dilemma of manually calculating the inﬂammatory sinus volume, which is objective and ineffective. A semi-supervised learning algorithm using pseudo-labels for self-training was proposed to train convolutional neural networks, which consisted of SENet, MobileNet, and ResNet. An aggregate of 175 CT sets was analyzed, 50 of which were from patients who subsequently underwent sinus surgery. A 3D view and volume-based modiﬁed Lund-Mackay score were determined and compared with traditional scores. Compared to state-of-the-art networks, our modiﬁcations achieved signiﬁcant improvements in both sinus segmentation and classiﬁcation, with an average pixel accuracy of 99.67%, an MIoU of 89.75%, and a Dice coefﬁcient of 90.79%. The fully automatic nasal sinus volume reconstruction system was successfully obtained the relevant detailed information by accurately acquiring the nasal sinus contour edges in the CT images. The accuracy of our algorithm has been validated and the results can be effectively applied to actual clinical medicine or forensic research.


Introduction
The incidence of chronic rhinosinusitis (CRS) is growing even with medical treatment advances, and sufferers' quality of life continues to be influenced significantly. Clinically, the most frequently used tool for evaluating the condition of paranasal sinuses is computed tomography (CT) [1], and the Lund-Mackay Score (LMS) is the most universally accepted marking standard. The obstruction conditions can be divided into grades 0, 1, and 2 based on a 2D-CT image. A score of 0 points represents non-obstruction, 2 points is full obstruction, and 1 point is an intermediate condition. In terms of the advantages of conventional LMS, doctors with little clinical experience can handle it, and the marking standard is simple and objective. Nevertheless, people still attempt to improve this method, including using a 3D method to quantize the volume of inflamed mucosa. However, prior studies mostly used manual measurements when calculating the volume of paranasal sinuses [2][3][4][5][6][7][8]. These attempts are not extensively accepted in clinical medicine, as the complexity of grading is increased, and the objectivity is reduced.

Methods and Experimental Data
All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee. (Project identification code: C202105070).
This study collected data from 175 patients, comprised of 117 male and 58 female subjects, who received head CT in Tri-Service General Hospital from 2018 to 2019 and aged from 16 to 80 years. The datasets were 512 × 512 gray level JPG images, out of which 50 datasets were images jointly labeled by doctors, as shown in Figure 1. These images labeled data contain five different nasal sinus regions, namely maxillary sinus, frontal sinus, anterior ethmoid sinus, posterior ethmoid sinus, and sphenoid sinus. The datasets were split into training and validation datasets by k-fold cross-validation. The 125 groups of unlabeled data were split into training datasets for semi-supervised learning and testing datasets for the doctors to evaluate effectiveness. The overall data allocation is shown in Figure 2.    The datasets are detailed below: (1) Training dataset: For the labeled dataset, the number of images was doubled by the data augmentation method. The basis of the gradient descent backpropagation algorithm was calculated by the input image and actual labeling result. (2) Validation dataset: For the labeled dataset, the gradient descent backpropagation algorithm was not calculated. Only the difference between labeling and prediction was calculated in each iteration to evaluate the effect of the model in every iteration. (3) Semi-supervised learning dataset: For the unlabeled dataset, a dataset for generating pseudo-labels after a model was trained. (4) Testing dataset: The unlabeled dataset was a dataset for evaluating effectiveness after model training was completed. Doctors would give evaluation scores to confirm model accuracy.
This study proposed a fully automatic system for accurate segmentation of five kinds of paranasal sinuses in the head CT (i.e., maxillary sinus, anterior ethmoid sinus, posterior ethmoid sinus, frontal sinus, and sphenoid sinus) and volume reconstruction 3D printing. The system flow includes image data augmentation, deep learning semantic segmentation, k-fold cross-validation, morphology, marching cube 3D reconstruction method, and 3D printing technique.

Image Data Augmentation
This study employed data augmentation to increase the object feature learning difficulty of the model, reduce the overfitting problem, and overcome data deficiencies to improve model effectiveness [24].

Gaussian Blur
Some images are blurred in the imaging process. In order to overcome the problem and make sufficient data for the training, data augmentation was adopted to obtain more fuzzy characteristics such as brightness, deblur, rotation, etc. in the model training process, thereby increasing the difficulty level of model training. The Gaussian blur [23] function distribution is expressed as Equation (1): where γ is the blurred radius; σ is the standard deviation of normal distribution.
To equally allocate the weights of the images in two-dimensional space, the value of each pixel in the image is the weighted average of adjacent pixels. As the value of the original pixel has the maximum Gaussian distribution value, the weight is the largest. Further, the weight of the adjacent pixel decreases as the distance to the original pixel increases. Consequently, the edge effect is maintained better.

Gaussian Noise
As the CT imaging equipment employed in this study sometimes has image noise, the model adaptability can be enhanced by adding noise to the image data [25].

Mixup
This study adopted the Mixup Data Augmentation method [26] to combine the augmented data in order to accelerate the semantic segmentation model training process. Thus, the model learned images of different conditions, thereby learning different features, enhancing the model generalization effectiveness, and reducing the number of training iterations, defined as Equations (2) and (3): where x i and x j represent two different input image samples; y i and y j represent the one-hot label encoding corresponding to the two different input image samples respectively; λ is the ratio of the two image samples.

Semi-Supervised Learning
It was difficult to obtain the image segmentation data in this study and label all the obtained data. Thus, the unlabeled data were utilized using semi-supervised learning to enhance model effectiveness. Some labeled data were first employed in training, and then the trained model generated pseudo-labels for the unlabeled data to train the next model trained by a larger dataset.
In order to make the deep learning neural network model correctly learn the features of various nasal sinus regions-after the loss between neural network output layer value is correctly calculated-the neural network uses gradient descent and gradient to work out backpropagation to modify the neural network parameters. The gradient descent is expressed as Equation (4): where W is the weighting parameter; γ is the learning rate; L is the loss function.
This study adopted the gradient descent with momentum to solve the local optimal solution when the gradient was 0 so that the parameters cannot be updated continuously, expressed as Equations (5) and (6): where V is the directional velocity of momentum; β is the momentum.

Convolution Layer
Many convolution layers were employed in the semantic segmentation to learn the feature contours of different paranasal sinuses [17]. The convolution layer is a set of parallel feature maps and learning features from an input image (e.g., line or edge) and is composed of sliding different convolution kernels on the input image or feature map, executing certain operations [19]. Additionally, in each slide position, a neuron corresponding product and summing operation is executed between the convolution kernel and input image project Mathematics 2022, 10, 1189 6 of 32 the information in the receptive field to a neuron in the feature map. The output size after convolution kernel operation of the input image or feature map is expressed as Equation (7): where N is the length-width size of the output feature map; W is the length-width size of the input image or feature map; P is the length-width size of convolution kernel; is the padding quantity to avoid ignoring the farthest edge of an image or feature map when the convolution kernel performs an operation. To keep output and input feature maps of equal size, P = 1 was used in this study and S was employed as the step of convolution kernel (i.e., the sliding distance of each operation). The deep learning model has numerous parameters that slow down or fail training and prediction in the hardware facilities with small memory. In order to increase the instruction cycle and reduce parameters, MobileNet [27] proposed depthwise separable convolution.
It has two steps. First, the depthwise convolution performs a convolution operation in a width of 1 for each width dimension in the upper feature map, as shown in Figure 3, which is different from the general convolution operation in Figure 4. The general convolution operation calculates all convolution kernels (width larger than 1) for each width dimension in the feature map. This method performs operations separately and uses 1 × 1 pointwise convolution to fuse these independent feature maps. Thus, the parameters can be reduced effectively when the convolution kernel is large, and the computing time can be shortened greatly [28]. Therefore, the general convolution was replaced by depthwise separable convolution in this study.
Mathematics 2022, 10, x FOR PEER REVIEW 7 of 34 maps of equal size, P = 1 was used in this study and S was employed as the step of convolution kernel (i.e., the sliding distance of each operation). The deep learning model has numerous parameters that slow down or fail training and prediction in the hardware facilities with small memory. In order to increase the instruction cycle and reduce parameters, MobileNet [27] proposed depthwise separable convolution.
It has two steps. First, the depthwise convolution performs a convolution operation in a width of 1 for each width dimension in the upper feature map, as shown in Figure 3, which is different from the general convolution operation in Figure 4. The general convolution operation calculates all convolution kernels (width larger than 1) for each width dimension in the feature map. This method performs operations separately and uses 1 × 1 pointwise convolution to fuse these independent feature maps. Thus, the parameters can be reduced effectively when the convolution kernel is large, and the computing time can be shortened greatly [28]. Therefore, the general convolution was replaced by depthwise separable convolution in this study.  This study not only employed the depthwise separable convolution method to effectively reduce the amount of computation and lighten the weight of the architecture but also adopted squeeze-and-excitation networks [29] to train the influence weights among various feature maps. The important information of the feature map is further highlighted, as shown in Figure 5.
After the convolution operation, the squeeze-and-excitation network method employed global average pooling (GAP) to calculate the average value of different width dimensions and performed two fully connected layer neural networks to train the weights of different width dimensions. Finally, the learned weight was multiplied by the corresponding feature map width dimension, and the feature map was weighted, through which the importance of different width dimensions could be highlighted [30]. To make the output of the fully connected layer keep a normalized weight of 0~1, the Sigmoid function was used, expressed as Equation (8):   This study not only employed the depthwise separable convolution method to effectively reduce the amount of computation and lighten the weight of the architecture but also adopted squeeze-and-excitation networks [29] to train the influence weights among various feature maps. The important information of the feature map is further highlighted, as shown in Figure 5. After the convolution operation, the squeeze-and-excitation network method employed global average pooling (GAP) to calculate the average value of different width dimensions and performed two fully connected layer neural networks to train the weights of different width dimensions. Finally, the learned weight was multiplied by the corresponding feature map width dimension, and the feature map was weighted, through which the importance of different width dimensions could be highlighted [30]. To make the output of the fully connected layer keep a normalized weight of 0~1, the Sigmoid function was used, expressed as Equation (8): Additionally, this study adopted the residual connection of residual neural network (ResNet) [31][32][33][34]; the training degradation problem in the very deep neural network was solved successfully, as shown in Figure 6. Additionally, this study adopted the residual connection of residual neural network (ResNet) [31][32][33][34]; the training degradation problem in the very deep neural network was solved successfully, as shown in Figure 6.
where x l represents the input of the layer; x L represents the output of the layer. The effectiveness sometimes declines when the depth of the neural network is increased, leading to training degradation of the neural network. The residual connection guarantees at least identity mapping when computing gradient by backpropagation, and the gradient vanishing in multiple backpropagation processes is solved [35].

Activation Function
In this study, an activation function was added behind the convolution layer in the semantic segmentation model to increase the nonlinear characteristics of the model [19]. The common activation function in the convolutional neural network was rectified linear units (ReLU). The ReLU is more frequently used than the other activation functions because its functional operation is simple, where the instruction cycle of the neural network can be enhanced without a significant impact on the generalization accuracy of the model. Compared with the Sigmoid and Tanh functions used by conventional deep learning, the positive derivative of the ReLU function is 1. This special design alleviates the neural network gradient vanishing problem. Besides the ReLU function, the Mish function is extensively used in neural networks [36], expressed as Equation (11). When the input is a positive number, there is no limit value to the ReLU function. The main advantage is that the negative part is smoother than the ReLU function, which is different from the derivative 0 of ReLU in calculating the derivative by neural network gradient descent backpropagation. This activation function will not fail the update of corresponding neuron parameters.

K-Fold Cross-Validation
The K-fold cross-validation was employed to select the model for the next experiment. First, all the datasets were randomly divided into K equal parts; the data of the K − 1 part were used as a training dataset, and the residual part was used as a validation dataset. Afterward, one part was selected from the data, which was not used as a validation dataset. The data used as a validation dataset were added to the training dataset. Meanwhile, the K − 1 part remained for training and one part for validation. Then, the distribution was repeated until each part of the data had been used as a validation set, and the K iterative training and validations were performed. The single estimate could be obtained by averaging the K results. Further, similar to most deep learning methods, the data were randomly distributed into 80% training and 20% validation datasets. This study employed K = 5 to evaluate and select the optimal model [37].

Morphology
In this study, the region filter generated by the deep learning semantic segmentation model would be distorted by noise and texture so that the image might contain defects. This study also adopted morphology [38] to improve the filter. The shape of the structuring element was compared with the corresponding pixel neighborhood to generate dilation, erosion, opening, and closing results.

Erosion and Dilation
Erosion is the method that reduces the object image inward for several pixels. It has sets A and B, where B is the structuring element, A is eroded by B, as expressed in Equation (12).
The erosion shrinks or thins the object of the binary image and is most frequently used to remove the incoherent noise details in the binary image.
In contrast to erosion, dilation dilates the boundary of the object image outwards for several pixels. It has sets A and B, where B is the structuring element, A is dilated by B, as expressed in Equation (13).
The dilation expands the object of the binary image.

Opening
This study employed an opening to improve the left and right nasal sinus filters, which cannot be segmented. The opening made the object contour smoother, and the image was eroded before it was dilated. If Set A is opened by structuring element B, Set A is eroded by B, and B is dilated by the erosion result, as expressed in Equation (14):

Labeling
After the filters of different paranasal sinuses were predicted by the semantic segmentation model, this study adopted the labeling method to distinguish the left or right paranasal sinuses. The labeling of image processing meant that the adjacent pixels in the image were given the same label, and the connected pixels were regarded as the same region. The original image was divided into multiple regions based on this principle, and these regions were labeled to complete the labeling process [39].

Marching Cube 3D Reconstruction
This study employed deep learning semantic segmentation to obtain the nasal sinus region filter and adopted the Marching Cube 3D reconstruction algorithm [40] to reconstruct the image of various nasal sinus regions. The Marching Cube algorithm assumes the data to be discrete rule data points in 3D space. Taking the samples in this study as an example, the length and width of each head CT image are 512 × 512; if there are 10 slices of nasal sinus regions, one continuous function collects samples 512, 512, and 10 times in x, y, and z directions, respectively, continuous nasal sinus region slice images can be obtained. Therefore, based on this characteristic, this study adopted the Marching Cube algorithm for filter as the method for reconstructing various nasal sinus regions. Its main procedure is described below.
First, the nasal sinus region slice images in the 2D image were filled with unit blocks of equal size. The unit block had eight vertices, which were defined as voxels, as shown in Figure 7. The eight vertices of the unit block comprised the pixels of two layers of head CT scan slices, and the pixels of each layer were connected in this way to define the continuous function f (x i , y i , z i ) in the space. After the continuous function in the space was defined, the iso-value was give that ( )

, ,
i i i f x y z C = constructed a surface, and the intersecting region of the block and surface was obtained. Before the intersecting region of the unit block and face was obtained, the contour was described with 2D data, as shown in Figure 8 F pixels greater than or equal to the iso-value were defined as black dots, while t smaller than the iso-value were white dots. As shown in Figure 8b, four adjacent p form a square, and each vertex has a chance to be black or white. The pattern symm has three combinations, as shown in Figure 8a. When four vertices were all black o white, it implies that the square was inside or outside the isocontour, and the opp isocontour must have passed through a square with black dots at one end and white at the other end. For convenient demonstration, the pixels were represented only by 1 the iso-value was 6, and the isocontour was generated, as shown in Figure 8b. The 2D data were extended into the 3D dataset to find the iso-surface, the origina square was changed into a 3D unit block, and the four vertices were changed into e vertices of voxels. First, the iso-value was given; if the eight voxels of the unit block w greater than or equal to the iso-value, they were inside the iso-surface, otherwise out Similar to 2D data, the iso-surface would have passed through a unit block with one smaller than the iso-value and the other end greater than or equal to the iso-value. After the continuous function in the space was defined, the iso-value was given, so that f (x i , y i , z i ) = C constructed a surface, and the intersecting region of the unit block and surface was obtained. Before the intersecting region of the unit block and surface was obtained, the contour was described with 2D data, as shown in Figure 8 First, pixels greater than or equal to the iso-value were defined as black dots, while those smaller than the iso-value were white dots. As shown in Figure 8b, four adjacent pixels form a square, and each vertex has a chance to be black or white. The pattern symmetry has three combinations, as shown in Figure 8a. When four vertices were all black or all white, it implies that the square was inside or outside the isocontour, and the opposite isocontour must have passed through a square with black dots at one end and white dots at the other end. For convenient demonstration, the pixels were represented only by 1~10, the iso-value was 6, and the isocontour was generated, as shown in Figure 8b. After the continuous function in the space was defined, the iso-value was given, so that ( )

, ,
i i i f x y z C = constructed a surface, and the intersecting region of the unit block and surface was obtained. Before the intersecting region of the unit block and surface was obtained, the contour was described with 2D data, as shown in Figure 8 First, pixels greater than or equal to the iso-value were defined as black dots, while those smaller than the iso-value were white dots. As shown in Figure 8b, four adjacent pixels form a square, and each vertex has a chance to be black or white. The pattern symmetry has three combinations, as shown in Figure 8a. When four vertices were all black or all white, it implies that the square was inside or outside the isocontour, and the opposite isocontour must have passed through a square with black dots at one end and white dots at the other end. For convenient demonstration, the pixels were represented only by 1~10, the iso-value was 6, and the isocontour was generated, as shown in Figure 8b. The 2D data were extended into the 3D dataset to find the iso-surface, the original 2D square was changed into a 3D unit block, and the four vertices were changed into eight vertices of voxels. First, the iso-value was given; if the eight voxels of the unit block were greater than or equal to the iso-value, they were inside the iso-surface, otherwise outside. Similar to 2D data, the iso-surface would have passed through a unit block with one end smaller than the iso-value and the other end greater than or equal to the iso-value. If all The 2D data were extended into the 3D dataset to find the iso-surface, the original 2D square was changed into a 3D unit block, and the four vertices were changed into eight vertices of voxels. First, the iso-value was given; if the eight voxels of the unit block were greater than or equal to the iso-value, they were inside the iso-surface, otherwise outside. Similar to 2D data, the iso-surface would have passed through a unit block with one end smaller than the iso-value and the other end greater than or equal to the iso-value. If all the voxels were larger than or smaller than the iso-value, the iso-surface would not pass through the unit block, as shown in Figure 9. the voxels were larger than or smaller than the iso-value, the iso-surface would not pass through the unit block, as shown in Figure 9. Subsequently, the unit block was processed by line-search. As the eight voxels in the unit block were inside or outside the iso-surface, there were 28 combinations, which could be reduced to 15 according to the symmetric property of the method. The unit block illustration described the process of generating triangular iso-surface in the unit block.
Whether the iso-surface passes through the unit block was analyzed, the intersection point at the edge was obtained by linear interpolation, and the triangular iso-surface was generated online. The triangular facets of each unit block were connected to form a triangular mesh for 3D reconstruction.

Implementation Challenges
CT images are an important element to find sinusitis disease using the deep learning method. If the CT images are not good enough and labeled not properly then the deep learning method fails to learn. Therefore, acquiring CT images during data collection, the clinician must be careful. After collecting the data, the next challenge is to decide data augmentation technique. If the data augmentation doesn't meet the requirement based on CT images and based deep learning network, then also model fails to perform well. Therefore, the data augmentation technique must be done carefully and reasonably. Tuning hyperparameters is also a challenge; however, this can be overcome by training the model with a greater number of epochs and filters in the network.

Results
This experiment was divided into four major parts. Part 1 is the image data augmentation. Part 2 is the nasal sinus region segmentation, using different deep learning semantic segmentation models to train the datasets collected in this experiment. Part 3 employs the unlabeled image data for semi-supervised learning training, and the last part exhibits the volume reconstruction and analysis. The experimental system flow chart is shown in Figure 10. Subsequently, the unit block was processed by line-search. As the eight voxels in the unit block were inside or outside the iso-surface, there were 28 combinations, which could be reduced to 15 according to the symmetric property of the method. The unit block illustration described the process of generating triangular iso-surface in the unit block.
Whether the iso-surface passes through the unit block was analyzed, the intersection point at the edge was obtained by linear interpolation, and the triangular iso-surface was generated online. The triangular facets of each unit block were connected to form a triangular mesh for 3D reconstruction.

Implementation Challenges
CT images are an important element to find sinusitis disease using the deep learning method. If the CT images are not good enough and labeled not properly then the deep learning method fails to learn. Therefore, acquiring CT images during data collection, the clinician must be careful. After collecting the data, the next challenge is to decide data augmentation technique. If the data augmentation doesn't meet the requirement based on CT images and based deep learning network, then also model fails to perform well. Therefore, the data augmentation technique must be done carefully and reasonably. Tuning hyperparameters is also a challenge; however, this can be overcome by training the model with a greater number of epochs and filters in the network.

Results
This experiment was divided into four major parts. Part 1 is the image data augmentation. Part 2 is the nasal sinus region segmentation, using different deep learning semantic segmentation models to train the datasets collected in this experiment. Part 3 employs the unlabeled image data for semi-supervised learning training, and the last part exhibits the volume reconstruction and analysis. The experimental system flow chart is shown in Figure 10.

Image Data Augmentation
This study adopted the image data augmentation technology, making finite data generate more equivalent data to augment the training dataset. The data deficiencies were overcome, and the data augmentation method could increase the difficulty of the model for object features. In addition, the overfitting problem can be reduced [41], improving model effectiveness.

Image Data Augmentation
This study adopted the image data augmentation technology, making finite data generate more equivalent data to augment the training dataset. The data deficiencies were overcome, and the data augmentation method could increase the difficulty of the model for object features. In addition, the overfitting problem can be reduced [41], improving model effectiveness.

Gaussian Blur
The imaging blurred image samples often occur in the research on experimental data samples. Partial images were also blurred in the course of imaging in this study. Thus, the blurred image samples were intentionally made using Gaussian blur, and more fuzzy characteristics were obtained in the model training process, as shown in Figure 11.

Gaussian Blur
The imaging blurred image samples often occur in the research on experimental data samples. Partial images were also blurred in the course of imaging in this study. Thus, the blurred image samples were intentionally made using Gaussian blur, and more fuzzy characteristics were obtained in the model training process, as shown in Figure 11.

Gaussian Noise
After the CT image was blurred, the Gaussian noise was employed for the Gaussian distribution of the original blurred image, and the training difficulty of the semantic segmentation neural network model was further increased. Increasing noise in the input image is similar to using the Dropout method to reduce overfitting, as shown in Figure 12.

Mixup
Besides the two data augmentation methods of Gaussian blur and Gaussian noise, some common data augmentation methods for deep learning image recognition were adopted, such as random rotation, mirror reflection, contrast, and brightness control. Finally, a randomly selected original training dataset input image and another augmented similar data were combined using the Mixup method, forming a new data augmentation training dataset. The result is shown in Figure 13. The Mixup data refers to the superimposition of the original image and it is one of the augmented images. Figure 13a,b are the Mixup images which has original and random rotation augmented images in them. The purpose of this type of data is to feed images with more noise to the model for better identification.

Nasal Sinus Region Segmentation
To calculate the volume of each nasal sinus, the system should segment all kinds of nasal sinus contour regions for subsequent calculation. This section will detail the result of deep learning semantic segmentation used in this procedure.
for object features. In addition, the overfitting problem can be reduced [41], improving model effectiveness.

Gaussian Blur
The imaging blurred image samples often occur in the research on experimental data samples. Partial images were also blurred in the course of imaging in this study. Thus, the blurred image samples were intentionally made using Gaussian blur, and more fuzzy characteristics were obtained in the model training process, as shown in Figure 11.

Gaussian Noise
After the CT image was blurred, the Gaussian noise was employed for the Gaussian distribution of the original blurred image, and the training difficulty of the semantic segmentation neural network model was further increased. Increasing noise in the input image is similar to using the Dropout method to reduce overfitting, as shown in Figure 12. Besides the two data augmentation methods of Gaussian blur and Gaussian noise, some common data augmentation methods for deep learning image recognition were

Gaussian Noise
After the CT image was blurred, the Gaussian noise was employed for the Gaussian distribution of the original blurred image, and the training difficulty of the semantic segmentation neural network model was further increased. Increasing noise in the input image is similar to using the Dropout method to reduce overfitting, as shown in Figure 12. Besides the two data augmentation methods of Gaussian blur and Gaussian noise, some common data augmentation methods for deep learning image recognition were adopted, such as random rotation, mirror reflection, contrast, and brightness control. Fi- similar data were combined using the Mixup method, forming a new data augmentation training dataset. The result is shown in Figure 13. The Mixup data refers to the superimposition of the original image and it is one of the augmented images. Figure 13a,b are the Mixup images which has original and random rotation augmented images in them. The purpose of this type of data is to feed images with more noise to the model for better identification.

Nasal Sinus Region Segmentation
To calculate the volume of each nasal sinus, the system should segment all kinds of nasal sinus contour regions for subsequent calculation. This section will detail the result of deep learning semantic segmentation used in this procedure.

Loss Function Selection
The selection of the loss function is significant in optimizing the semantic segmentation model to accurately classify the paranasal sinuses of various pixels. For semi-supervised learning of probable pseudo labels, this study used Cross-Entropy as a loss function method.
where C is the number of classes; n is the number of data; , c i y is the actual answer; , c i p is the predicted probability of No. i data belonging to class C . The closer the predicted probability is to the actual answer, the closer the value is to 0. On the contrary, if the predicted probability is quite different from the actual answer, the value is close to 1. In order to calculate the class probability of each pixel, the softmax is used in the output layer, expressed as Equation (16). Since there are multi-class classifications involved in the model, softmax is used to classify the output with the probability of each class [42].
where i Z is the output of No. i output layer.

Loss Function Selection
The selection of the loss function is significant in optimizing the semantic segmentation model to accurately classify the paranasal sinuses of various pixels. For semi-supervised learning of probable pseudo labels, this study used Cross − Entropy as a loss function method.
where C is the number of classes; n is the number of data; y c,i is the actual answer; p c,i is the predicted probability of No. i data belonging to class C. The closer the predicted probability is to the actual answer, the closer the value is to 0. On the contrary, if the predicted probability is quite different from the actual answer, the value is close to 1. In order to calculate the class probability of each pixel, the softmax is used in the output layer, expressed as Equation (16). Since there are multi-class classifications involved in the model, softmax is used to classify the output with the probability of each class [42].
where Z i is the output of No. i output layer.

Semantic Segmentation Model Training
This study employed various famous deep learning semantic segmentation models, including UNet [43], PSPNet [44], DeepLab v3+ [45], and HRNet [46] to train and iterate the datasets 100 times. After preliminary training of these semantic segmentation models, the losses have converged. UNet has a minimum loss, followed by DeepLab v3+, shown in Figure 14. The training result prediction images are shown in Figure 15, where green is the maxillary sinus; blue is the anterior ethmoid sinus; red is the posterior ethmoid sinus; cyan is the frontal sinus; pink is the sphenoid sinus.
Besides basic convolution operation, the depthwise separable convolution, squeezeand-excitation networks, and residual connection were used in this study's deep learning semantic segmentation model. The entire model was divided into three main stages: high-resolution feature, low-resolution feature, and expanded receptive field feature, as shown in Figures 16 and 17. The expanded receptive field feature part employed dilated convolution and Atrous Spatial Pyramid Pooling to further enhance the receptive field of the model, which was integrated with the peripheral information of the nasal sinus region. The low-resolution feature was combined with the expanded receptive field feature information and combined with a high-resolution feature, respectively. Different detail features of different feature map scales were fused to make the nasal sinus contour more legible and complete. The up sampling adopted bilinear difference to guarantee a larger receptive field and maintain clear contour detail.

Semantic Segmentation Model Training
This study employed various famous deep learning semantic segmentation models, including UNet [43], PSPNet [44], DeepLab v3+ [45], and HRNet [46] to train and iterate the datasets 100 times. After preliminary training of these semantic segmentation models, the losses have converged. UNet has a minimum loss, followed by DeepLab v3+, shown in Figure 14. The training result prediction images are shown in Figure 15, where green is the maxillary sinus; blue is the anterior ethmoid sinus; red is the posterior ethmoid sinus; cyan is the frontal sinus; pink is the sphenoid sinus.

Semantic Segmentation Model Training
This study employed various famous deep learning semantic segmentation models, including UNet [43], PSPNet [44], DeepLab v3+ [45], and HRNet [46] to train and iterate the datasets 100 times. After preliminary training of these semantic segmentation models, the losses have converged. UNet has a minimum loss, followed by DeepLab v3+, shown in Figure 14. The training result prediction images are shown in Figure 15, where green is the maxillary sinus; blue is the anterior ethmoid sinus; red is the posterior ethmoid sinus; cyan is the frontal sinus; pink is the sphenoid sinus. Besides basic convolution operation, the depthwise separable convolution, squeezeand-excitation networks, and residual connection were used in this study's deep learning semantic segmentation model. The entire model was divided into three main stages: highresolution feature, low-resolution feature, and expanded receptive field feature, as shown in Figures 16 and 17. The expanded receptive field feature part employed dilated convo- lution and Atrous Spatial Pyramid Pooling to further enhance the receptive field of the model, which was integrated with the peripheral information of the nasal sinus region. The low-resolution feature was combined with the expanded receptive field feature information and combined with a high-resolution feature, respectively. Different detail features of different feature map scales were fused to make the nasal sinus contour more legible and complete. The up sampling adopted bilinear difference to guarantee a larger receptive field and maintain clear contour detail.  The common ReLU of the first fully connected layer activation function in the squeeze-and-excitation was changed to the latest activation function Mish [36]. The Mish activation function has been proven more accurate in other experiments [47]. The first 1 × 1 convolution dilated the feature map. The feature was extracted using depthwise separable convolution and multiplied by squeeze-and-excitation to obtain the weighted feature map. Then, a linear 1 × 1 convolution was adopted to compress the feature map, which was added by residual connection to deliver the neural network gradient to the next layer.

Effectiveness Evaluation Indexes
This study employed Pixel Accuracy, Mean Intersection-Over-Union, and Dice Coefficient as evaluation indexes.
(1) Pixel Accuracy (PA): Pixel Accuracy is the correct rate of all pixels in the image.
(2) Mean Intersection-Over-Union (MIoU) The IoU is the Intersection over Union. When the predicted image and the labeled The common ReLU of the first fully connected layer activation function in the squeezeand-excitation was changed to the latest activation function Mish [36]. The Mish activation function has been proven more accurate in other experiments [47]. The first 1 × 1 convolution dilated the feature map. The feature was extracted using depthwise separable convolution and multiplied by squeeze-and-excitation to obtain the weighted feature map. Then, a linear 1 × 1 convolution was adopted to compress the feature map, which was added by residual connection to deliver the neural network gradient to the next layer.

Effectiveness Evaluation Indexes
This study employed Pixel Accuracy, Mean Intersection-Over-Union, and Dice Coefficient as evaluation indexes.
(1) Pixel Accuracy (PA): Pixel Accuracy is the correct rate of all pixels in the image.
(2) Mean Intersection-Over-Union (MIoU) The IoU is the Intersection over Union. When the predicted image and the labeled image are not intersecting, the output is 0. On the contrary, when the predicted image and the labeled image are fully intersecting, the output is 1. The Mean IoU was adopted for the multi-class segmentation of images in this study.

(3) Dice Coefficient
The Dice Coefficient is similar to IoU, a double intersection of pixels of the predicted image and the labeled image.

Semi-Supervised Learning
After the semantic segmentation model training was completed, this study employed a semi-supervised learning training method for further training [48] to further optimize the semantic segmentation model. For the data in this study, the data volume of 6319 CT images was adequate; labeling all these image data to train the model would take a lot of time.

Pseudo Label Generation
Besides 1506 labeled image data, 1141 unlabeled image data were prepared for semisupervised learning. The pseudo label should be generated for these unlabeled image data before training. The CT images were imported into the trained semantic segmentation model so that the semantic segmentation model could predict the probability class of various paranasal sinuses of various pixels through softmax. In predicting the contour of the maxillary sinus region, as shown in Figure 18, the probability higher than a value was adopted as new label data through a predicted self-confidence threshold. This threshold is the optimum threshold selected by calculating the maximum Dice score of the validation dataset, as shown in Table 1. Figure 18d is derived from subtracting the pseudo label image from the predicted image, showing that the self-confidence threshold can filter the pixels without the confidence of the semantic segmentation model [49] to increase the Dice score of the corresponding validation dataset.

Semi-Supervised Learning Results
After the semantic segmentation model generated the pseudo label, the former could be trained for semi-supervised learning with labeled data. Finally, the difference between the minimum losses of the third and second training is only 0.0003, as shown in Table 2.
In the semi-supervised learning training process, more data augmentation noise was added to the unlabeled image to avoid overfitting in the semantic segmentation model. Each feature map of the low-resolution feature and the expanded receptive field was normalized using dropout [50]. The dropout was employed in the part of a high-resolution feature in the experimental process. As the Gaussian noise had been added in the CT image, the features of various nasal sinus region contours of model training would be damaged. Further, the convolution kernel width was modified to double VGG16 of the previous width whenever a depthwise separable convolution kernel in step size of 2 was passed by [51]. The new neural network feature map width trained three times was increased from 8 to 1.5, 2, and 2 times to train the features of more dimensions in each training process. The training result is shown in Figure 19, where the teacher represents the original semantic segmentation model, and the student represents the frequency of repetitive semi-supervised learning training. before training. The CT images were imported into the trained semantic segmentation model so that the semantic segmentation model could predict the probability class of various paranasal sinuses of various pixels through softmax. In predicting the contour of the maxillary sinus region, as shown in Figure 18, the probability higher than a value was adopted as new label data through a predicted self-confidence threshold. This threshold is the optimum threshold selected by calculating the maximum Dice score of the validation dataset, as shown in Table 1.  Figure 18d is derived from subtracting the pseudo label image from the predicted image, showing that the self-confidence threshold can filter the pixels without the confidence of the semantic segmentation model [49] to increase the Dice score of the corresponding validation dataset.

Volume Reconstruction
After the contours of five kinds of nasal sinus regions were predicted by the semantic segmentation model to further clarify the 3D nasal sinus contour to provide clinical refer-ence observation for doctors, the morphology and 3D reconstruction were adopted in this study for processing.

Volume Reconstruction
After the contours of five kinds of nasal sinus regions were predicted by the semantic segmentation model to further clarify the 3D nasal sinus contour to provide clinical reference observation for doctors, the morphology and 3D reconstruction were adopted in this study for processing.

Morphology
In terms of the contours of nasal sinus regions predicted by the semantic segmentation model, left or right paranasal sinuses cannot be recognized successfully because a small part of paranasal sinuses does not have an apparent edge. Therefore, this study employed the opening method of morphology and subsequently adopted the connection representation of morphology to distinguish left and right paranasal sinuses. The result is shown in Figures 20 and 21. Figure 19. Semi-supervised learning validation dataset loss.

Morphology
In terms of the contours of nasal sinus regions predicted by the semantic segmentation model, left or right paranasal sinuses cannot be recognized successfully because a small part of paranasal sinuses does not have an apparent edge. Therefore, this study employed the opening method of morphology and subsequently adopted the connection representation of morphology to distinguish left and right paranasal sinuses. The result is shown in Figures 20 and 21.

Volume Reconstruction
After the contours of five kinds of nasal sinus regions were predicted by the semantic segmentation model to further clarify the 3D nasal sinus contour to provide clinical reference observation for doctors, the morphology and 3D reconstruction were adopted in this study for processing.

Morphology
In terms of the contours of nasal sinus regions predicted by the semantic segmentation model, left or right paranasal sinuses cannot be recognized successfully because a small part of paranasal sinuses does not have an apparent edge. Therefore, this study employed the opening method of morphology and subsequently adopted the connection representation of morphology to distinguish left and right paranasal sinuses. The result is shown in Figures 20 and 21. The left and right paranasal sinuses can be distinguished effectively using the opening method. After the left and right paranasal sinuses were recognized using labeling and the air section (black) and mucosa inflammation region (gray) under the nasal sinus region were segmented by the threshold, the severity of sinus mucosa inflammation in the CT image could be clarified directly to calculate the mucosa inflammation condition of different paranasal sinuses. The final segmentation is shown in Figure 22, and the system execution output is shown in Figure 23. The left and right paranasal sinuses can be distinguished effectively using the opening method. After the left and right paranasal sinuses were recognized using labeling and the air section (black) and mucosa inflammation region (gray) under the nasal sinus region were segmented by the threshold, the severity of sinus mucosa inflammation in the CT image could be clarified directly to calculate the mucosa inflammation condition of different paranasal sinuses. The final segmentation is shown in Figure 22, and the system execution output is shown in Figure 23. The left and right paranasal sinuses can be distinguished effectively using the opening method. After the left and right paranasal sinuses were recognized using labeling and the air section (black) and mucosa inflammation region (gray) under the nasal sinus region were segmented by the threshold, the severity of sinus mucosa inflammation in the CT image could be clarified directly to calculate the mucosa inflammation condition of different paranasal sinuses. The final segmentation is shown in Figure 22, and the system execution output is shown in Figure 23.

3D Reconstruction
Furthermore, this study employed the Marching Cube algorithm to reconstruct the image of paranasal sinuses in order to clearly observe the shape or the mucosa inflammation condition of paranasal sinuses. First, the obtained 2D nasal sinus images were overlapped to form 3D matrix data. The unit block was added to each layer of the 2D matrix, and the voxels were defined to evaluate whether or not the voxels in each unit block were larger than the iso-value. Subsequently, the triangular mesh in each unit block was established. Finally, all the triangular structures were connected to reconstruct the image, as shown in Figure 24. The maxillary sinus, ethmoid sinus, frontal sinus, and sphenoid sinus were reconstructed, with the gold being the right nasal sinus, the blue being the left nasal sinus, and the red arrow being the orientation of the face. The sinus mucosa inflammation proportions are shown in Figure 25, with the gold being the right nasal sinus, the blue being the left nasal sinus, the gray being the uninflamed region, and the red arrow is the orientation of the face. Figure 26 shows the 3D reconstruction image of the overall paranasal sinuses. Mathematics 2022, 10, x FOR PEER REVIEW 24 of 34 (a) (b) (c) Figure 23. (a-c) are the system's execution output. Figure 23. (a-c) are the system's execution output.
x were reconstructed, with the gold being the right nasal sinus, the blue being the left nasal sinus, and the red arrow being the orientation of the face. The sinus mucosa inflammation proportions are shown in Figure 25, with the gold being the right nasal sinus, the blue being the left nasal sinus, the gray being the uninflamed region, and the red arrow is the orientation of the face. Figure 26 shows the 3D reconstruction image of the overall paranasal sinuses.

Discussion
Normal sinuses appear black in CT scan images because the sinuses are filled with air, and as long as there are symptoms of gray mucosal inflammation in the sinuses, it can be called sinusitis. Therefore, CT scans can evaluate the scope of sinus inflammation, severity and help diagnose the assessment and decide on the treatment of the disease [1]. The Lund-Mackay grading system is a method used by physicians to observe the severity of sinusitis using computed tomography, so it can be known that if the Lund-Mackay score is 0, it means that there is no sinusitis. However, if physicians are required to view continuous two-dimensional image CT images, they cannot accurately know how serious it is. Garneau et al. [3] used a grading system that calculated the proportion of mucosal inflammation as a standard for evaluating the severity of sinusitis. The study achieved better results than the general Lund-Mackay score. However, it takes more than an hour to manually select various sinus contours in all computed tomography slice images. Compared with the grading system that calculates the proportion of sinus mucosal inflammation in the same system and the experiment, there is a big difference of only 9.7 s. Therefore, the study provides a more rapid, objective, and accurate tool for assessing the severity of sinusitis, as the gold standard for measuring the severity of sinusitis in the future.
Despite the high incidence of chronic rhinosinusitis, the optimal strategy for treating chronic rhinosinusitis has so far been uncertain, so there is no specific relationship between the degree of sinus inflammation and the decision to perform sinus surgery or use of medication. In the Lund-Mackay study [52], it was also stated that the score is only a method of assessing and quantifying the degree of inflammation in sinusitis, so as a study of clinical criteria, there is no absolute score threshold for sinusitis surgery. Patients improved better after surgery, but there was no certain score threshold to tell patients how much they must have surgery. Therefore, in order to better recommend whether patients with chronic rhinosinusitis should undergo surgery, this study traced the past medical records of chronic rhinosinusitis and calculated a grading system for the proportion of sinus mucosal inflammation as a recommendation index for surgery. Through the Youden index and the optimal threshold to assess whether the need for surgery can be effectively diagnosed. The optimal threshold is 6.92 when the appropriate score is 99.67%, 90.79%, and 88.75% for the PA, Dice, and mIOU respectively, which proves that the system can provide good surgical advice to physicians.
The three effectiveness evaluation indexes were worked out according to the trained semantic segmentation models. The result is shown in Table 3. UNet has the best effect, followed by DeepLab v3+, PSPNet, and HRNet. The semantic segmentation models used in this study were compared using different prominent convolutional neural network backbone methods, such as MobileNet [53], ResNet [31], ResNext [27], and Inception [54][55][56]. According to the experimental results in Table 4, the nasal sinus region segmentation accuracy can be enhanced greatly by combining the semantic segmentation model with semi-supervised learning training. Besides accuracy, the difference from the UNet computing time was worked out. The time for predicting each of the 100 head CT images is shown in Table 5. The nasal sinus segmentation effectiveness has enhanced a lot. In order to analyze the predicted result in more detail, the confusion matrix was employed to analyze the predicted and actual results of different paranasal sinuses. The result of the confusion matrix is shown in Figure 27, where 0 represents the background; 1 is the maxillary sinus; 2 is the anterior ethmoid sinus; 3 represents the posterior ethmoid sinqualityus; 4 represents the frontal sinus; 5 represents the sphenoid sinus.
According to the experimental results in Table 4, the nasal sinus region segmentation accuracy can be enhanced greatly by combining the semantic segmentation model with semi-supervised learning training. Besides accuracy, the difference from the UNet computing time was worked out. The time for predicting each of the 100 head CT images is shown in Table 5. The nasal sinus segmentation effectiveness has enhanced a lot. In order to analyze the predicted result in more detail, the confusion matrix was employed to analyze the predicted and actual results of different paranasal sinuses. The result of the confusion matrix is shown in Figure 27, where 0 represents the background; 1 is the maxillary sinus; 2 is the anterior ethmoid sinus; 3 represents the posterior ethmoid sinqualityus; 4 represents the frontal sinus; 5 represents the sphenoid sinus. The final training result is the most effective in segmenting the contour of the maxillary sinus from the validation dataset, followed by the sphenoid sinus; their pixel segmentation accuracy is 98.53% and 98.03%, respectively. The anterior ethmoid sinus and posterior ethmoid sinus were the most difficult to distinguish. Of them, 0.31% and 0.32% were The final training result is the most effective in segmenting the contour of the maxillary sinus from the validation dataset, followed by the sphenoid sinus; their pixel segmentation accuracy is 98.53% and 98.03%, respectively. The anterior ethmoid sinus and posterior ethmoid sinus were the most difficult to distinguish. Of them, 0.31% and 0.32% were predicted to be each other, and 0.04% of anterior ethmoid sinus pixels were falsely segmented as frontal sinus.
Besides the labeled validation dataset, three otorhinolaryngologists from Tri-Service General Hospital were invited to accurately score the 100 additional groups of different patients' CT data (testing dataset) predicted by the semantic segmentation model. The evaluation method is that each nasal sinus region in each group of head CT images is given 0 to 2 points; 0 is a severe error, 1 is a partial error, and 2 is correct. As 100 groups of different patients' data were evaluated, the maximum score of each nasal sinus region was 200 points, and the final score was divided by 200 points to obtain the scoring rate. The result is shown in Table 6. According to the doctors' evaluation in Table 6, the average score is 94.6%. Anterior ethmoid sinus occurs when the patient is in a characteristic condition of Frontal Cells in the CT image. As it is connected to the frontal sinus region from three angles, the segmentation of the anterior ethmoid sinus and frontal sinus is likely to be false, leading to worse grading results.
The volume of different paranasal sinuses in the CT can be calculated rapidly in this study. In most paranasal sinus studies, the volume of an Asian's paranasal sinuses is seldom studied. Hence, statistical analysis was conducted based on the data collected in this study, and the relationships between Taiwanese subjects' age and sex and different nasal sinus volumes were worked out. This study investigated 144 subjects, composed of 95 male and 45 female subjects, whose ages ranged from 16 to 80 years old, body heights from 146 to 188 cm, body weights from 43 to 102 kg, and BMIs from 17.18 to 36.58, as shown in Table 7. The result shows that the males have larger body heights and weights than females and larger paranasal sinuses, showing a significant difference (p-value < 0.05). In addition, the symmetry of the left and right paranasal sinuses were analyzed and calculated as (1-left/right). When the left nasal sinus was larger than the right nasal sinus, the value was negative. The calculation result is shown in Table 8. It is observed that the left frontal sinus and sphenoid sinus were larger than their right counterparts. Additionally, many documents (e.g., forensic medicine) studied the relationship between the size of paranasal sinuses and physiological parameters. Therefore, this study adopted the p-value to analyze the correlation between the size of paranasal sinuses and the other physiological parameters. The correlation between the sex and the size of paranasal sinuses is most extensively studied. Some studies [7,8] indicated that there is not any relationship between the sex and the size of paranasal sinuses, while others indicated that females have larger [51] or smaller [57] paranasal sinuses. However, the relationship between an Asian person's sex and the size of paranasal sinuses was seldom mentioned. Females have smaller paranasal sinuses (p-value < 0.05) in terms of the Taiwanese subjects investigated in this study. According to the analyzed correlation coefficient, the sizes of various paranasal sinuses have a high significance (p-value < 0.001). It is noteworthy that the age and left maxillary sinus (p value < 0.05) and right sphenoid sinus (p-value < 0.05) also have significance; the larger the age is, the smaller is the volume. Notably, this result is similar to [58]. The body height and the size of all paranasal sinuses have a high significance (p-value < 0.001), the body weight and anterior, posterior, left and right ethmoid sinuses have significance (p-value < 0.05), and the BMI and right sphenoid sinus (p-value < 0.001) have a high significance, as shown in Table 9.
In the Lund-Mackay grading system, the complexity of sinus orifice score is still a controversial issue, and its interpretation will vary greatly with the thickness and number of slices in computed tomography. This is an indicator that has not yet reached a consensus. If there is a clearer way to define this indicator in the future, the system will definitely be more accurate. The slice thickness of the head computed tomography images used in this study is 3 mm, and the spacing is large. As shown in the 3D reconstruction results in Figures 24-26, the reconstruction results are relatively rough, and there may be sinus edges just right. It is ignored by slicing, which makes the system unable to obtain complete contour information accurately. On the other hand, if the slices of the head computed tomography scan just hit the edge of the sinus, it is difficult to distinguish between inflammation and nasal polyps due to the blurry gray on the image. In the same way, when the patient has frontal cells specific condition, the contours of the frontal sinus and ethmoid sinus are difficult to be well segmented in slices with a thickness of 3 mm, and since there are few cases with frontal cells specific conditions, 50 strokes have been marked. Only 1 data appeared in the data, the sample is too small, and 6 data out of 100 in the test data set had the problem of Frontal Cells. If a slice thickness of 1 mm is used to make more detailed observations and collect more such special cases, the accuracy of the segmentation of the anterior ethmoid and frontal sinuses will surely be improved.

Conclusions
The fully automatic nasal sinus volume reconstruction system in this study successfully obtained the relevant detailed information by accurately acquiring the nasal sinus contour edges in the CT images, expanding the model perception. The volume of important paranasal sinuses in clinical medicine can be quantized rapidly. The accuracy of this study has been validated, and the results can be effectively used by physicians in different departments or hospitals. At present, a doctor spends more than one hour accurately calculating the volume of one patient's paranasal sinuses. This study can shorten the time for the diagnosis to accelerate treatment and save resources. As the anatomical structure of paranasal sinuses and endonasal skull base surgery is a complex 3D space structure, this study can be combined with a 3D printing technique for doctors to print out nasal sinus samples through a computer or entity for observation. Further, the CT image can be converted into more visual 3D objects so that doctors can observe the shape or mucosa inflammation condition of various paranasal sinuses from the computer. In this way, the paths from the nostrils to the frontal sinus or maxillary sinus are obtained, assisting surgical judgment, and the shape and size of the patient's paranasal sinuses can be observed effectively. In addition, the paths from the nostrils to different paranasal sinuses are obtained, which helps doctors make detailed plans for dissection and surgery of paranasal sinuses. The normal volumes of Taiwanese patients were analyzed in this study, and the correlation between the volume of different paranasal sinuses and physiological parameters was analyzed. Furthermore, it is proved that males have larger nasal sinus volume than females, and the greater the body height, the greater the volume of paranasal sinuses. Moreover, this result can provide a more accurate assessment of forensic anatomy.
Author Contributions: C.-F.J.K. study design, critical article review/editing; S.-C.L. study design, data collection, data analysis and interpretation, literature search, images editing, article drafting, article submission. All authors have read and agreed to the published version of the manuscript.  (TSGH-A-111004, TCAFGH-E111044, TSGH-NTUST-111-03). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Institutional Review Board Statement:
The research protocol (NO: C202105070) has been reviewed and approved by the Institutional Review Board of Tri-Service General Hospital.