Midpalatal Suture CBCT Image Quantitative Characteristics Analysis Based on Machine Learning Algorithm Construction and Optimization

Background: Midpalatal suture maturation and ossification status is the basis for appraising maxillary transverse developmental status. Methods: We established a normalized midpalatal suture cone-beam computed tomography (CBCT) database of the growth population, including 1006 CBCT files from 690 participants younger than 24 years old. The midpalatal suture region of interest (ROI) labeling was completed by two experienced clinical experts. The CBCT image fusion algorithm and image texture feature analysis algorithm were constructed and optimized. The age range prediction convolutional neural network (CNN) was constructed and tested. Results: The midpalatal suture fusion images contain complete semantic information for appraising midpalatal suture maturation and ossification status during the fast growth and development period. Correlation and homogeneity are the two texture features with the strongest relevance to chronological age. The overall performance of the age range prediction CNN model is satisfactory, especially in the 4 to 10 years range and the 17 to 23 years range, while in the 13 to 14 years range, model performance is compromised. Conclusions: The image fusion algorithm can effectively show the overall perspective of the midpalatal suture in one fused image. Furthermore, clinical decisions for maxillary transverse deficiency should be based on midpalatal suture image features directly rather than on age, especially in the 13 to 14 years range.


Introduction
Maxillary deficiency is a type of craniofacial malformation with a high incidence, exceeding 20% of the global population [1]. Maxillary transverse deficiency plays an important role in maxillary deficiency and results in various malocclusions, including posterior crossbite and dentition crowding, and can even lead to obstructive sleep apnea [1][2][3][4]. Moreover, dentofacial deformities, including craniosynostosis and cleft lip/palate, can also be accompanied by maxillary transverse deficiency [1,5].
Compared with previous studies that extracted and analyzed midpalatal suture image characteristics through a single image section, we designed an image fusion algorithm to utilize the valuable multi-slice image information in CBCT. This image fusion algorithm avoids the influence of CBCT examination orientation and the convex palatal vault, therefore helping to show the overall perspective of the midpalatal suture in one fused image [40][41][42]. Furthermore, structure labeling by clinical experts improves the proportion of the midpalatal suture in the final images.
The remainder of this article is organized as follows: the automated processing techniques of CBCT, midpalatal suture region image fusion method, and the chronological age range prediction model are all covered in Section 2; the performance of the proposed methods is evaluated in Section 3. Finally, we present some discussions in Section 4 and conclude this article in Section 5. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of the Peking University Hospital of Stomatology Institutional Review Board (PKUSSIRB-202163037).

Materials and Methods
The sample collection was carried out at the Peking University Hospital of Stomatology. CBCT files from patients younger than 24 years old who underwent single or multiple CBCT examinations in the Department of Oral and Maxillofacial Radiology according to diagnosis or treatment needs (1 January 2015 to 31 December 2020) were screened. The examination field should include the supra-orbital arch (upper boundary) and the lower margin of the fourth cervical vertebra (lower boundary), and the examination interval for the same participant should be longer than 1 month. The exclusion criteria are shown in Table 1. The gender and clinical departments of the participants were not limited.
Table 1. Exclusion criteria for participants.

Exclusion Criteria
(1) History of severe systemic diseases;
(2) History of cranial and maxillofacial bone fracture;
(3) History of cranial and maxillofacial bone tumor;
(4) History of cleft lip and/or palate;
(5) History of syndromes or endocrine diseases affecting cranial and maxillofacial bone development.

CBCT Examination
CBCT images were taken with a NewTom VGi unit (Quantitative Radiology, Verona, Italy) at 2.81 mA, 110 kV, 3.6-s exposure, and a 15 × 15 cm field of view, with an axial slice thickness of 0.3 mm and isotropic voxels (Figure 1). The participants sat upright in a natural head position with the jaws immobilized by a chin holder, keeping the Frankfort plane horizontal to the ground. The teeth were occluded in the intercuspal position, with the facial muscles relaxed.

Region of Interest Labeling in Midpalatal Suture CBCT Images
The region of interest (ROI) labeling was completed by two experienced clinical experts. The upper and lower boundaries of the CBCT axial sections for each CBCT file were located by Dolphin Imaging software (11.8, Oakdale, CA, USA) and recorded by Excel software (2203, Redmond, WA, USA). The anterior and posterior boundaries of the CBCT axial sections for each CBCT file were located by MicroDicom software (2022.1, Sofia, Bulgaria) and recorded by Colabeler software (China) (Figure 2).
The upper boundary of the CBCT axial sections is the upper margin of the palatal vault, the lower boundary is the apical point of the upper central incisors (choosing the higher one when the two apical points are in different sections), the anterior boundary is the most anterior point of the midpalatal suture on the maxilla, and the posterior boundary is the most posterior point of the midpalatal suture on the palatine bone.

Image Analysis Algorithm
The algorithm in this study consists of two parts: the midpalatal suture CBCT image fusion algorithm (introduced in Section 2.3) and the image texture feature analysis algorithm (introduced in Section 2.4).
As the morphology of the midpalatal suture is complicated, complete information is difficult to obtain through single-section image analysis. In addition, the proportion of the midpalatal suture in the total CBCT field is small; thus, noise arises from the regions apart from the midpalatal suture. Therefore, the raw images cannot be used effectively to appraise the maturation and ossification status or to train a convolutional neural network (CNN) [43]. We therefore proposed a CBCT image fusion algorithm, which includes three parts: image processing, image fusion, and fused image optimization.

Image Processing
The CBCT files were read, converted into three-dimensional gray-scale matrices, and then converted into a series of axial images of 512 × 512 resolution. The normalized midpalatal suture ROIs of 50 × 200 resolution were extracted.
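A minimal sketch of this extraction step, assuming the volume has already been converted into a gray-scale matrix and that the boundary indices come from the expert labels (the function name and all index values below are hypothetical stand-ins):

```python
import numpy as np

def extract_roi_stack(volume, z_top, z_bottom, row0, col0, roi_h=50, roi_w=200):
    """Crop the labeled midpalatal suture ROI (roi_h x roi_w pixels) from
    every axial slice between the labeled upper and lower boundaries.
    `volume` is a (slices, 512, 512) gray-scale matrix; the boundary
    indices stand in for the expert labels."""
    rois = []
    for z in range(z_top, z_bottom + 1):
        rois.append(volume[z, row0:row0 + roi_h, col0:col0 + roi_w])
    return np.stack(rois)

# Synthetic stand-in for a converted CBCT gray-scale matrix.
cbct = np.random.randint(0, 256, size=(120, 512, 512), dtype=np.uint8)
roi_stack = extract_roi_stack(cbct, z_top=40, z_bottom=59, row0=180, col0=156)
print(roi_stack.shape)  # (20, 50, 200)
```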

Image Fusion
The fusion weights were calculated and adjusted by combining an existing pixel-level image fusion algorithm with the characteristics of the midpalatal suture region. Image fusion was carried out on every two sections of the midpalatal suture multi-slice ROI images for each CBCT file until all of the images were fused into one overall midpalatal suture image. The pixel value of each point in the fused image was calculated by the following formula, where A_ij refers to the average gray-scale value of point (i, j) in the two images to be fused, e refers to the total average gray scale of the images to be fused, and d refers to the adjustment factor based on the maximum gray-scale difference of the images to be fused.
It can be predicted that if all of the images were fused directly at once, all of the pixels would approach the average gray level, resulting in a blurred fused image. Therefore, we performed weighted fusion of the images in pairs and then continued to fuse the fused images in pairs until all of the images were fused into one. The computational complexity of the image fusion algorithm is O(n log n), since the algorithm has a merging structure.
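Under the variable definitions above (A_ij, e, d), the pairwise merge-style fusion might be sketched as follows; the exact weighting formula is not reproduced in the text, so the deviation-from-mean adjustment in `fuse_pair` is an illustrative assumption rather than the paper's formula:

```python
import numpy as np

def fuse_pair(img_a, img_b, d=0.5):
    """Fuse two ROI slices. A_ij is the point-wise average of the pair and
    e is the overall mean gray level; the d-weighted deviation term is an
    illustrative assumption standing in for the paper's adjustment."""
    a = np.mean([img_a, img_b], axis=0)  # A_ij
    e = a.mean()                         # e
    return a + d * (a - e)               # push pixels away from the mean

def fuse_all(images):
    """Merge-style pairwise fusion: fuse in pairs, then fuse the fused
    results, over log n rounds until one overall image remains."""
    imgs = [img.astype(np.float64) for img in images]
    while len(imgs) > 1:
        nxt = [fuse_pair(imgs[i], imgs[i + 1])
               for i in range(0, len(imgs) - 1, 2)]
        if len(imgs) % 2:        # carry an unpaired slice forward
            nxt.append(imgs[-1])
        imgs = nxt
    return imgs[0]

stack = [np.random.rand(50, 200) for _ in range(12)]
fused = fuse_all(stack)
print(fused.shape)  # (50, 200)
```

Fusing two identical constant images leaves them unchanged, since the deviation term vanishes when every pixel equals the mean.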

Fused Image Optimization
During the image fusion process, we used the convolution operator to optimize the fused image so as to improve the clarity of the midpalatal suture. The operator weight was adjusted according to the image fusion result to make the image textures clearer.
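A hedged sketch of the convolution-based optimization step: the paper tunes the operator weights to the fusion result, whereas the kernel below is a standard sharpening operator chosen only for illustration.

```python
import numpy as np
from scipy.ndimage import convolve

# A standard 3x3 sharpening operator; the center weight is an
# illustrative choice, not the paper's tuned operator.
KERNEL = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]], dtype=np.float64)

def sharpen(fused, kernel=KERNEL):
    """Convolve the fused ROI with a sharpening operator and clip the
    result back to the 8-bit gray-scale range."""
    out = convolve(fused.astype(np.float64), kernel, mode="nearest")
    return np.clip(out, 0, 255)

img = np.random.rand(50, 200) * 255
print(sharpen(img).shape)  # (50, 200)
```

Because the kernel weights sum to 1, a perfectly uniform region is left unchanged; only edges (such as the suture line) are amplified.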

Image Texture Feature Analysis Algorithm
The image texture feature analysis was then conducted to find the correlation between the midpalatal suture CBCT image texture features and chronological age. Compared with a CNN, image texture feature analysis is more intuitive. The preliminary texture feature analysis can also provide evidence for the effectiveness of CNN training, since a CNN lacks interpretability.

Correlation
Correlation reflects the consistency of the image texture. It is used to measure the similarity of spatial gray level co-occurrence matrix elements in the row or column direction.

Homogeneity
Homogeneity is used to measure how much the local texture changes. A large value indicates that there is less change between different regions of the image texture, and the parts are more uniform.

Energy
Energy is the sum of the squares for the values of each element in the gray level co-occurrence matrix. It is a measure of the stability of the gray level change of the image texture and reflects the uniformity of the image gray level distribution and the thickness of the texture. A larger energy value indicates that the current texture is stable, with regular changes.

Contrast
Contrast reflects the clarity of the image and the depth of the texture grooves. The deeper the texture grooves, the greater the contrast is, and the clearer the visual effect will be. On the contrary, if the contrast is small, the grooves are shallow; thus, the effect will be fuzzy.

Dissimilarity
The dissimilarity reflects the total amount of local gray changes in the image. However, different from contrast, the weight of dissimilarity increases linearly with the distance between matrix elements and diagonal.

ASM (Angular Second Moment)
ASM is used to describe the uniformity of the gray image distribution and the thickness of the texture. If all values of the gray level co-occurrence matrix (GLCM) are very close, the ASM value will be small; if the matrix elements differ greatly, the ASM value will be large.
The image texture features were extracted by Scikit-Image. Then, scatter diagrams of all samples were drawn by pyplot, with chronological age as the independent variable and image texture feature value as the dependent variable. Correlations between the image texture features and chronological age were evaluated to find out whether they are suitable for appraising midpalatal suture maturation and ossification status.

Age Range Prediction of Midpalatal Suture CBCT Image Features
The age range prediction CNN model was built to further clarify how efficiently midpalatal suture maturation status image features predict chronological age.

Datasets and Labels
Five age ranges were classified and labeled: 4 to 10 years old labeled as 0, 11 to 12 years old labeled as 1, 13 to 14 years old labeled as 2, 15 to 16 years old labeled as 3, and 17 to 23 years old labeled as 4. In addition, the data were expanded through random translation, tilt, contrast and brightness adjustment, small-amplitude cropping, and horizontal mirroring. The adjusted images were finally normalized to 50 × 200 pixels.
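The label mapping and a few of the listed augmentations (mirroring, translation, brightness/contrast) can be sketched in NumPy; the parameter ranges are illustrative assumptions, and the tilt and cropping steps are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(roi, rng):
    """Randomly expand one ROI sample with horizontal mirroring,
    small-amplitude translation, and brightness/contrast jitter."""
    out = roi.astype(np.float64)
    if rng.random() < 0.5:                # horizontal mirroring
        out = out[:, ::-1]
    shift = int(rng.integers(-3, 4))      # small-amplitude translation
    out = np.roll(out, shift, axis=1)
    gain = rng.uniform(0.9, 1.1)          # contrast adjustment
    bias = rng.uniform(-10, 10)           # brightness adjustment
    return np.clip(out * gain + bias, 0, 255)

def age_label(age):
    """Map chronological age to the five class labels used in the paper."""
    for label, (lo, hi) in enumerate([(4, 10), (11, 12), (13, 14),
                                      (15, 16), (17, 23)]):
        if lo <= age <= hi:
            return label
    raise ValueError("age outside 4-23")

roi = rng.random((50, 200)) * 255
print(augment(roi, rng).shape, age_label(13))  # (50, 200) 2
```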
(1) Validation set: Out of the total samples, 10 typical samples were selected from each age range, and these 50 images were used as the validation set. (2) Test set: Out of the total samples, 20 typical samples were selected from each age range, and these 100 images were used as the test set. (3) Training set: Out of the total samples, the remaining 856 samples, apart from those used in the validation set and test set, were used as the training set.
The optimized deep residual network (ResNet) 50 model was used to conduct the chronological age range prediction (Figure 3). Age range prediction by midpalatal suture image is a multi-classification task. The Softmax function was used in the output layer to make the total probability of the five age ranges equal 1. The cross-entropy loss function was then used to quantify the error between the model outputs and the labels. Grad-CAM was applied to generate heat maps for model prediction; the redder the color is, the more dependent the model is on the image features of that region.


CNN
As the most widely used deep learning method, a CNN was applied to the age range prediction task using midpalatal suture fused images [44].
The CNN in our study mainly consisted of an input layer, convolutional layers, pooling layers, a fully connected layer, and an output layer. The input was the raw image X, and X_i refers to the feature map of layer i (X_0 = X). In a convolution layer, X_i was generated by the following formula:
X_i = f(X_{i−1} ⊗ W_i + b_i)
W_i represents the weight vector for the convolution kernel of layer i, and the ⊗ symbol represents the convolution operation between the convolution kernel and the (i − 1)-layer image. The output of the convolution was added to b_i (the offset vector of layer i). Finally, X_i (the feature image of layer i) was obtained through the nonlinear excitation function f(X).
The convolutional layer was followed by the pooling layer. The pooling layer compressed the input feature image to reduce feature dimensions, thus simplifying the complexity of the CNN calculation, while maintaining certain invariance of the feature (rotation, translation, expansion-retraction, etc.).
Essentially, the CNN was a mathematical model mapping the original matrix to a new probability expression through data transformation or dimensionality reduction at multiple levels. After the alternating transmission of multiple convolutional layers and pooling layers, the extracted image features were classified, and the input-based probability distribution was obtained through a fully connected network.
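The layer-wise computation X_i = f(X_{i−1} ⊗ W_i + b_i) can be illustrated with a single-channel NumPy layer; as in most deep learning frameworks, the loop below actually computes a cross-correlation, and the shapes and weights are toy values:

```python
import numpy as np

def conv_layer(x, w, b):
    """One convolutional layer: slide the kernel w over x ('valid'
    positions only), add the offset b, and apply ReLU as the nonlinear
    excitation function f."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * w) + b
    return np.maximum(out, 0.0)  # ReLU

x = np.random.rand(50, 200)      # X_{i-1}: one fused-ROI-sized input
w = np.random.rand(3, 3) * 0.1   # W_i: single-channel kernel for brevity
y = conv_layer(x, w, b=0.05)     # X_i
print(y.shape)  # (48, 198)
```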

Deep Residual Learning
ResNet solves the problem of difficulty in CNN model training [45] and shows excellent performance in CNN [46][47][48]. Compared with other network structures, ResNet's learning results are more sensitive to the fluctuations of network weights and data, and it is one of the best model choices at present. The network structure in this study is the optimized ResNet50 network.
Residual blocks in ResNet are designed to learn the residuals of the underlying features rather than the underlying features themselves. In a residual block, if the learned features for input X are recorded as H(X), the expected residual is F(X) = H(X) − X. In this way, the original learning feature is F(X) + X.
Deep residual learning is easier than directly learning the original features. When the residual is 0, there is only identity mapping in the accumulation layer, and at least the network performance will not decline. In practice, the residual will not be 0; thus, deep residual learning enables the accumulation layer to learn new features based on the input features so as to improve performance. The residual learning process is a shortcut connection (Figure 4), similar to a short circuit in an electric circuit.
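The shortcut connection can be sketched in a few lines, where `residual_fn` stands in for the multi-layer residual branch F:

```python
import numpy as np

def residual_block(x, residual_fn):
    """Shortcut connection: the block learns the residual F(X) = H(X) - X,
    and its output is f(F(X) + X) with f = ReLU; when F(X) = 0 the block
    reduces to an identity mapping, so performance at least does not
    decline."""
    return np.maximum(residual_fn(x) + x, 0.0)

x = np.array([0.5, 1.0, 2.0])
out = residual_block(x, lambda t: 0.1 * t)  # a small learned residual
print(out)
```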
Intuitively, the learning content is reduced to residual learning. The residual is relatively small, making the learning process easy. The residual unit is expressed as the following formulas:
y_l = h(x_l) + F(x_l, W_l) (3)
x_{l+1} = f(y_l) (4)
x_l and x_{l+1} represent the input and output of residual unit l, respectively. Each residual unit is a multi-layer structure. F, as the residual function, represents the learned residual. h(x_l) = x_l represents the identity mapping, and f is the ReLU activation function. Based on Formulas (3) and (4), the learning features from a shallower layer l to a deeper layer L are:
x_L = x_l + Σ_{i=l}^{L−1} F(x_i, W_i) (5)
The gradient of the reverse process can be obtained by the chain rule:
∂loss/∂x_l = (∂loss/∂x_L) · (1 + ∂/∂x_l Σ_{i=l}^{L−1} F(x_i, W_i)) (6)
The first factor, ∂loss/∂x_L, represents the gradient of the loss function with respect to layer L. The "1" in the parentheses indicates that the short-circuit mechanism can propagate the gradient without loss, while the remaining residual gradient must pass through layers with weights and is not transmitted directly.

ResNet Structure
As shown in Figure 3, ResNet was divided into five stages, wherein stage 0 contains one convolution layer and one pooling layer, and stages 1 to 4 contain 3, 4, 6, and 3 convolution accumulation (residual) structures, respectively. Finally, the output results were obtained from the average pooling layer.
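The stage configuration above accounts for the "50" in ResNet50, since each bottleneck structure contains three convolution layers:

```python
# Stage 0 contributes one convolution layer (its pooling layer has no
# weights); stages 1-4 contain 3, 4, 6, and 3 bottleneck structures with
# 3 convolution layers each; the final fully connected layer completes
# the count of 50 weighted layers.
stem_convs = 1
blocks_per_stage = [3, 4, 6, 3]
convs_per_block = 3
fc_layers = 1
total = stem_convs + sum(blocks_per_stage) * convs_per_block + fc_layers
print(total)  # 50
```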

Hyperparameter Selection
In terms of hyperparameter selection, we first used parameters recognized for excellent performance for model training. Then, within the specified parameter range, we used the grid search method to adjust the parameters step by step. According to the performance of the saved models on the test set, the best set of hyperparameters was selected; the final selected hyperparameters are shown in Table 3. The training process of a CNN is generally considered a "black box", and the model lacks intuitive interpretability [49]. Therefore, the Grad-CAM [50] method was adopted to generate heat maps according to the degree of dependence on each midpalatal suture image feature region. The redder the color of a region, the stronger the dependence of the model on the image features of that region in the prediction process.
In Grad-CAM, the gradient of network back propagation was used to calculate the weight of each channel in the heat map. For category c, the weight α of each channel was first obtained; then the weighted sum of the data from all of the channels in feature layer A was calculated; finally, the heat map was obtained by the ReLU activation function:
α_k^c = (1/Z) Σ_i Σ_j ∂y^c/∂A_ij^k (7)
L^c = ReLU(Σ_k α_k^c A^k) (8)
In Formulas (7) and (8), c refers to the category, and y^c refers to the score forecasted by the neural network for category c before softmax processing. A represents the feature value of the last convolution output layer, k refers to the k-th channel of feature layer A, A^k refers to the calculation value of the k-th channel in feature layer A, and A^k_ij refers to the calculation value of coordinate point (i, j) in the k-th channel of feature layer A. Z refers to the size of the feature layer (e.g., width × height).
Each Grad-CAM heat map was superposed with the fused image for age range prediction so as to intuitively show the dependence degree of the model on that image region in the prediction process and help further evaluate the rationality of the model.
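Given precomputed feature maps and gradients, the Grad-CAM weighting and ReLU steps of Formulas (7) and (8) reduce to a few NumPy operations; the array shapes below are toy values:

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Formula (7): the weight of each channel is the spatial average of
    the gradient of the class score w.r.t. that channel. Formula (8): the
    heat map is the ReLU of the weighted sum over channels."""
    k = feature_maps.shape[0]
    alphas = gradients.reshape(k, -1).mean(axis=1)     # (7): (1/Z) sum_ij
    cam = np.tensordot(alphas, feature_maps, axes=1)   # sum_k alpha_k A^k
    return np.maximum(cam, 0.0)                        # (8): ReLU

A = np.random.rand(4, 7, 25)       # feature layer A, k = 4 channels
dYc_dA = np.random.rand(4, 7, 25)  # gradients of the class score y^c
heat = grad_cam(A, dYc_dA)
print(heat.shape)  # (7, 25)
```

The heat map would then be upsampled to the fused-image resolution before superposition, a step omitted here.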

Demographic Characteristics
The midpalatal suture CBCT normalized database, with a total of 1006 CBCT files (females: 610; males: 396), was obtained from 690 participants of the growth population (female: 403, male: 287). In the database, 414 participants had a single CBCT, 245 had two CBCTs, 23 had three, seven had four, and one had five.
The demographic characteristics of the total 1006 CBCT files are shown in Table 4. Figures 5 and 6 show the image processing results of the midpalatal suture region. After reading, the sagittal, coronal, and axial views of each selected CBCT file contain hundreds of sections. After labeling by clinical experts, the midpalatal suture ROI images were extracted from the multi-slice axial images (Figure 6).

Midpalatal Suture ROI Extraction and Image Fusion Algorithm
Then, direct image fusion, weighted optimization, and convolution operator optimization were carried out (Figures 7 and 8). Direct fusion shows the poorest performance: the image is blurred, and the morphological characteristics of the midpalatal suture region are not clear. By adjusting the fusion weights, the image contrast increases, and the midpalatal suture structure becomes clearer. Furthermore, after convolution operator optimization, the fused images show clear and distinct texture, which is more conducive to clinical evaluation and the subsequent model training process.

Image Feature Analysis
The image texture feature scatter diagrams show obvious positive correlat tween the correlation feature with chronological age and the homogeneity featu chronological age, respectively (Figures 9-12). The positive correlation trends ar among females and males.
Homogeneity is used to measure how much the local texture changes. A lar indicates that there is less change between different regions of the image texture, parts are more uniform. Correlation reflects the consistency of image texture. It is measure the similarity of spatial gray level co-occurrence matrix elements in a row umn direction. The homogeneity feature and the correlation feature both tend to with chronological age, which may be due to the increased maturation and oss degree of the midpalatal suture region.

Image Feature Analysis
The image texture feature scatter diagrams show obvious positive correlati tween the correlation feature with chronological age and the homogeneity featu chronological age, respectively (Figures 9-12). The positive correlation trends are among females and males.
Homogeneity is used to measure how much the local texture changes. A larg indicates that there is less change between different regions of the image texture, parts are more uniform. Correlation reflects the consistency of image texture. It is measure the similarity of spatial gray level co-occurrence matrix elements in a row umn direction. The homogeneity feature and the correlation feature both tend to i with chronological age, which may be due to the increased maturation and ossi degree of the midpalatal suture region.

Image Feature Analysis
The image texture feature scatter diagrams show obvious positive correlations between the correlation feature with chronological age and the homogeneity feature with chronological age, respectively (Figures 9-12). The positive correlation trends are similar among females and males.

Image Feature Analysis
The image texture feature scatter diagrams show obvious positive correlations between the correlation feature with chronological age and the homogeneity feature with chronological age, respectively (Figures 9-12). The positive correlation trends are similar among females and males.
Homogeneity is used to measure how much the local texture changes. A large value indicates that there is less change between different regions of the image texture, and the parts are more uniform. Correlation reflects the consistency of image texture. It is used to measure the similarity of spatial gray level co-occurrence matrix elements in a row or column direction. The homogeneity feature and the correlation feature both tend to increase with chronological age, which may be due to the increased maturation and ossification degree of the midpalatal suture region.


Model Evaluation
The evaluation parameters for the age range prediction model using the midpalatal suture image features include the precision ratio P, the recall ratio R, and the test set classification F1-score. P refers to the proportion of correctly classified positive samples among the positive samples determined by the classifier, and R refers to the proportion of correctly classified positive samples among the true positive samples. The F1-score is the harmonic average of P and R, and Acc refers to the proportion of correctly identified samples among all samples. The calculation formulas are as follows:
P = TP / (TP + FP)
R = TP / (TP + FN)
F1 = 2PR / (P + R)
Acc = (TP + TN) / (TP + TN + FP + FN)
For each age range X, TP_X refers to the number of samples of age range X correctly predicted as X; FP_X refers to the number of samples of other age ranges wrongly predicted as X; FN_X refers to the number of samples of age range X wrongly predicted as other age ranges; TN_X refers to the number of samples of other age ranges correctly predicted as other age ranges.
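These definitions can be checked with a small NumPy routine that derives P, R, F1, and Acc from a confusion matrix laid out as in Figure 14 (rows = actual labels, columns = predicted labels); the 3-class matrix below is a toy example, not the study's results:

```python
import numpy as np

def per_class_metrics(cm):
    """Compute per-class P, R, F1 and overall Acc from a confusion matrix
    whose rows are actual labels and whose columns are predicted labels."""
    cm = np.asarray(cm, dtype=np.float64)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # other ranges wrongly predicted as this one
    fn = cm.sum(axis=1) - tp  # this range wrongly predicted as others
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    acc = tp.sum() / cm.sum()
    return p, r, f1, acc

# Toy confusion matrix (not the paper's Figure 14 values).
cm = [[18, 1, 1],
      [2, 16, 2],
      [0, 3, 17]]
p, r, f1, acc = per_class_metrics(cm)
print(np.round(acc, 3))  # 0.85
```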
The accuracy of the test set verification results for models with different training times is shown in Figure 13, in which model accuracy reaches its maximum value in the 2000th round of training. This model was saved for further testing and analysis.
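The checkpoint selection described here (keeping the model from the round with the highest verification accuracy) can be sketched as follows; this is an illustrative helper of our own, not the study's training code.

```python
def best_round(acc_by_round):
    """Return (round, accuracy) for the training round with the highest
    verification accuracy; the corresponding checkpoint would be saved."""
    return max(acc_by_round.items(), key=lambda kv: kv[1])
```
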
The confusion matrix of the age range prediction model by midpalatal suture image features verified in the test set is shown in Figure 14. The sum of each row represents the number of actual samples of a certain label, and the sum of each column represents the number of samples predicted as this label. P, R, F1-score, and area under curve (AUC) values of the prediction model can be calculated from the confusion matrix, and the results are shown in Table 5.

The CBCT data set in this study is self-constructed, including a total of 1006 CBCT files from subjects of 4 to 23 years old, while most of the subjects belong to the middle age ranges. For this five-category classification task, clinicians paid more attention to sensitivity, specificity, and especially the AUC value, which is 0.7532, indicating that this model has reached the clinical auxiliary level. At present, the compromised classification accuracy is limited by the data set imbalance on the one hand and by the optimization of the sequence fusion algorithm on the other. The image fusion algorithm is very important in reflecting the image characteristics of midpalatal suture maturation status for subjects of different chronological age groups. Our future work will focus on optimizing and adjusting the image fusion algorithm to further support and improve classification accuracy.

Evaluation of Model Performance
Receiver operating characteristic (ROC) curves and area under curve (AUC) values are taken to evaluate the age range prediction model (Figure 15). The true positive rate refers to the proportion of samples of a certain age range correctly predicted to that age range; the false positive rate refers to the proportion of samples of other age ranges wrongly predicted to that age range. The AUC values for predicting all age ranges are above 0.65, in which the AUC values of the 4 to 10 years range (0.9106) and the 17 to 23 years range (0.7887) are the two best age ranges.
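The one-vs-rest AUC for each age range can be computed without tracing the full ROC curve, as the probability that a randomly chosen sample of that range receives a higher score than a randomly chosen sample of the other ranges. A minimal sketch (our illustration, not the study's code):

```python
def auc_ovr(labels, scores, positive):
    """One-vs-rest AUC: probability that a random sample of the positive
    class scores higher than a random sample of the rest, ties counted half.
    """
    pos = [s for l, s in zip(labels, scores) if l == positive]
    neg = [s for l, s in zip(labels, scores) if l != positive]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For instance, auc_ovr([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], positive=1) evaluates to 0.75, since three of the four positive/negative score pairs are ranked correctly.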

Feature-Based Visualization
The image feature heat maps of the midpalatal suture region show that the redder areas are all located in the midpalatal suture (Figure 16), indicating that the image features of the midpalatal suture region perform satisfactorily in maturation and ossification status appraisal, as well as in chronological age range prediction.

Discussion
Clinical effectiveness and treatment-induced trauma of various kinds of RME methods are distinctly different. Treatment timing is vital in determining the clinical effectiveness and severity of side effects for each RME method [12]. Expansion during inappropriate timing can cause unnecessary trauma, as well as increased side effects, including periodontal attachment level loss, buccal cortical bone fenestrations, and dental root resorption [13][14][15][16]. Therefore, accurate appraisal of maxillary transverse developmental status is critical to provide evidence for the appropriate timing of different methods in maxillary transverse deficiency treatment so as to optimize the treatment strategies.
The research conception of this study is to demonstrate the correlation between chronological age and the maturation status of the midpalatal suture and to provide evidence and theoretical support for our following study on establishing a staging standard for midpalatal suture fused images. Therefore, it is necessary to examine the relationship between chronological age and midpalatal suture maturation status through image characteristics from multiple perspectives.


Innovative Midpalatal Suture Image Fusion Algorithm
The ossification and maturation status of the midpalatal suture is complicated. Age-related morphological changes in the midpalatal suture of human and animal specimens indicate that the midpalatal suture can remain unfused for many years postnatally, even for a whole lifetime [51][52][53][54][55][56].
Given these histomorphological conclusions, appraising midpalatal suture maturation and ossification status by whether it is fused/obliterated is not reliable. However, the current imaging appraisal methods, especially the CBCT appraisal methods, are mainly based on whether the midpalatal suture is fused/obliterated in a single image section and rely mainly on qualitative human-eye appraisal, which leads to the loss of a large amount of valuable image information, high technical sensitivity, as well as low feasibility and simplicity [22,[27][28][29][30].
Therefore, quantitative imaging analysis not entirely reliant on the human eye is necessary to find valuable information related to midpalatal suture growth and development beyond its obliteration and absolute width. Image fusion has been used in this study to extract multi-section image information and then synthesize high-quality fused images. Although image fusion is a widely used medical image analysis method in studies of several diseases [33][34][35], it has not been applied in craniofacial growth and development studies. The combination of image fusion and craniofacial growth analysis, especially skeletal growth analysis, can help us utilize comprehensive image information of complicated structures effectively and reliably.
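The specific fusion algorithm is described elsewhere in this paper; purely to illustrate the general idea of condensing a multi-slice stack into one image, the following shows a mean-intensity projection across ROI slices, a common simple fusion baseline rather than the algorithm proposed here.

```python
def mean_projection(slices):
    """Fuse a stack of equally sized 2-D slices into one image by averaging
    each pixel position across the slice axis (simple fusion baseline)."""
    n = len(slices)
    rows, cols = len(slices[0]), len(slices[0][0])
    return [[sum(s[r][c] for s in slices) / n for c in range(cols)]
            for r in range(rows)]
```

Averaging preserves gray-level trends across slices, whereas a maximum-intensity projection would emphasize the densest ossified voxels; either way, one fused image summarizes the whole stack.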

Clinical Implications of Midpalatal Suture Image Texture Features
The correlation feature and the homogeneity feature are the two texture features with the strongest relevance to chronological age for midpalatal suture fused images.
Belonging to the gray-level co-occurrence matrix (GLCM) texture features, the correlation feature and the homogeneity feature basically reflect the uniformity of the image texture. Higher values of correlation and homogeneity indicate that a midpalatal suture fusion image has more uniform textures [57,58]. During growth and development, the gray levels of the midpalatal suture and its adjacent regions grow closer, and the image texture becomes more consistent, reflecting the increasing maturation and ossification process. The midpalatal suture, though not yet obliterated, changes considerably in morphological characteristics during this process [1,18,52,54,55]. However, much valuable image information was lost in previous studies, since the image texture features are difficult for human eyes to recognize directly.
The positive relevance between the midpalatal suture maturation process and the overall growth status represented by chronological age indicates that even though midpalatal suture may not fuse or obliterate for many years or even during a life-long period, its maturation and ossification status experiences significant changes during the fast growth and development period, for both females and males.

Clinical Significance of Age Range Prediction Model by Midpalatal Suture Image Features
As mentioned above, the maturation and ossification status of the midpalatal suture experiences significant changes during the fast growth and development period [1,18,52,54,55]. Our age range prediction model using the midpalatal suture image features proves that the overall prediction efficiency is satisfactory, especially for the youngest 4 to 10 years range (0.9106) and the oldest 17 to 23 years range (0.7887) (Figure 15).
Meanwhile, the prediction efficiencies for the 11 to 12 years range (0.6825), the 13 to 14 years range (0.6581), and the 15 to 16 years range (0.7262) are relatively lower, especially for the 13 to 14 years range (0.6581). This corresponds with the clinical dilemma of predicting the skeletal effectiveness of RME treatment for patients in this age range [1]. The midpalatal suture maturation and ossification process is sensitive in this age range, and individual differences are more obvious in this period than in other age ranges. If chronological age is not an efficient indicator of midpalatal suture maturation and ossification status for these patients, RME clinical effectiveness should then be appraised by midpalatal suture image features directly. Further studies should focus on identifying optimized image characteristics that appraise midpalatal suture maturation and ossification status more satisfactorily than chronological age, especially for the clinically sensitive 13 to 14 years range of RME treatment.
Compared with the previous methods that extract and analyze midpalatal suture image characteristics through a single image section, the image fusion algorithm in this study helps utilize multi-slice valuable image information to show the overall perspective of the midpalatal suture in one fused image [40][41][42]. Furthermore, structure labeling by clinical experts helps improve the proportion of midpalatal sutures in the final images. The chronological age prediction model in this study thus provides obvious indicative evidence for midpalatal suture maturation and ossification appraisal.

Conclusions
(1) We designed a midpalatal suture CBCT image fusion algorithm to utilize multi-slice valuable image information to improve the appraisal accuracy of midpalatal suture maturation and ossification status. This algorithm avoids the influence of CBCT examination orientation and the convex palatal vault, thus helping to show the overall perspective of the midpalatal suture in one fused image.
(2) The correlation feature and the homogeneity feature are the two texture features with the strongest relevance to chronological age. The midpalatal suture maturation and ossification status experiences significant changes during the fast growth and development period. Furthermore, the overall performance of the age range prediction CNN model by midpalatal suture image features is satisfactory, especially in the youngest 4 to 10 years range and the oldest 17 to 23 years range, while for adolescents in the 13 to 14 years range, the prediction performance is compromised, indicating that RME clinical effectiveness should be appraised by midpalatal suture image features directly rather than by chronological age for this age range.
(3) There are some limitations to this study. Sample representativeness and sample size should be further improved and expanded by the addition of multicenter samples. Furthermore, the relationship between the midpalatal suture fused image features and maxillary transverse developmental status needs to be further clarified to provide evidence for appraising suitable RME treatment timing.

Informed Consent Statement: Patient consent was waived because the CBCT images will not lead to privacy disclosure of participants, which has been approved by the Institutional Review Board of Peking University School of Stomatology.
Data Availability Statement: Not applicable.