Article

Evaluation of the Performance of an Artificial Intelligence-Based Classification Model for Pediatric Maxillofacial Morphology †

by Hiroki Sato 1, Akane Ueda 1, Camila Tussie 2, Sophie Kim 3, Yukinori Kuwajima 1, Emiko Kikuchi 1, Shigemi Nagai 4 and Kazuro Satoh 1,*

1 Division of Orthodontics, Department of Developmental Oral Health Science, School of Dentistry, Iwate Medical University, 1-3-27 Chuo-dori, Morioka 020-8505, Japan
2 Division of Orthodontics, Department of Developmental Biology, Harvard School of Dental Medicine, 188 Longwood Avenue, Boston, MA 02115, USA
3 Division of Orthodontics, Department of Regenerative and Reconstructive Sciences, UCLA School of Dentistry, 714 Tiverton, Los Angeles, CA 90095, USA
4 Department of Restorative Dentistry and Biomaterials Sciences, Harvard School of Dental Medicine, 188 Longwood Avenue, Boston, MA 02115, USA
* Author to whom correspondence should be addressed.
This manuscript is a revised and expanded version of the poster presentation entitled “Classification of Maxillofacial Morphology in Japanese Children Using Artificial Intelligence and Verification of Accuracy,” which was presented at the 84th Annual Meeting of the Japanese Orthodontic Society, held in Sapporo, Japan, on 29 September–1 October 2025.
Diagnostics 2025, 15(23), 2958; https://doi.org/10.3390/diagnostics15232958
Submission received: 17 October 2025 / Revised: 13 November 2025 / Accepted: 13 November 2025 / Published: 21 November 2025

Abstract

Background/Objectives: Accurate assessment of craniofacial morphology is essential for orthodontic diagnosis and treatment planning. The Sassouni classification provides a useful framework for categorizing craniofacial morphology into nine groups but lacks standardized clinical criteria. This study developed an AI model to classify pediatric craniofacial morphology based on the Sassouni classification using lateral cephalometric radiographs and evaluated its agreement with classifications made by orthodontists. Methods: Data from 300 pediatric patients aged 6 to 10 years were analyzed. Nine cephalometric measurements and patient sex were used as input features. Three orthodontists classified morphology based on the Sassouni classification. Random forest (RF), logistic regression (LR), and support vector classification (SVC) models were trained and evaluated using 10-fold cross-validation. Results: The RF model demonstrated the highest accuracy (RF: 0.907 ± 0.051, LR: 0.837 ± 0.057, SVC: 0.770 ± 0.055). It also outperformed the other two models in F1 score, sensitivity, and positive predictive value, showing the best overall classification performance. The most influential feature was the ANB angle, while sex had minimal impact. Conclusions: The RF-based AI model demonstrated high accuracy in pediatric maxillofacial classification. Performance may be further improved with larger datasets and more balanced case distributions.

1. Introduction

In recent years, artificial intelligence (AI) has increasingly been applied in healthcare to improve diagnostic accuracy [1,2].
Within orthodontics, research has focused on developing AI models for diagnostic and treatment planning purposes. One study used machine learning (ML) models trained on lateral cephalometric radiographs (lateral cephalograms) to determine the need for orthognathic surgery or tooth extraction [3].
In recent years, various applications of AI have been investigated, such as automatic landmark identification on cephalometric radiographs [4], prediction of three-dimensional facial morphology following orthognathic or orthodontic treatment [5], and other emerging approaches in orthodontic research [6,7].
Accurate diagnosis, the establishment of clear treatment goals, and effective treatment planning are all essential in orthodontics. For children in the growth and development phase, clinicians must account for jaw growth direction and treatment response in relation to the type of malocclusion. Accordingly, precise evaluation of maxillofacial morphology is critical when formulating a treatment plan. Lateral cephalometric analysis is widely used for assessing malocclusion and maxillofacial morphology and is considered a fundamental diagnostic tool in orthodontics. Various methods exist for categorizing maxillofacial morphology, most of which are based on lateral cephalometric analysis items.
The Sassouni classification defines maxillofacial morphology using horizontal and vertical relationships. Horizontally, skeletal Class I represents a balance between the upper and lower jaws, skeletal Class II reflects maxillary protrusion or mandibular retrusion, and skeletal Class III represents mandibular protrusion or maxillary retrusion. Vertically, patients are classified as short frame, medium frame, or long frame based on facial height. Combining these horizontal and vertical categories yields nine classification patterns [8].
Ricketts’ analysis also employs a three-way horizontal classification based on skeletal Class I, II, and III, sharing its basic diagnostic framework with the Sassouni classification. In the vertical direction, however, it classifies faces into three types, brachyfacial, mesofacial, and dolichofacial, which roughly correspond to Sassouni’s short frame, medium frame, and long frame types.
Therefore, the Sassouni classification is an effective framework for understanding maxillofacial morphology as a comprehensive analysis method that can integrate horizontal and vertical elements while maintaining consistency with other diagnostic systems such as Ricketts analysis. However, clear criteria and standardization for its application have not been established, and the thresholds and weightings for cephalometric analysis items have not been unified, so interpretations may differ among evaluators.
Furthermore, the Sassouni classification is based on adult populations and does not fully reflect the morphological changes specific to the growth period. When applying it to children, adjustments and standardization that take growth characteristics into account are required.
To address this issue, developing an AI classification model for pediatric craniofacial morphology and enhancing the objectivity and reproducibility of diagnosis is expected to enable adaptation to morphological changes associated with growth, contributing to early treatment intervention and optimization of treatment plans.
The objective of this study is to develop an AI model for classifying maxillofacial morphology in children using lateral cephalometric analysis and to evaluate its classification performance.
A part of this study was previously presented as a poster at the 84th Annual Meeting of the Japanese Orthodontic Society in 2025 [9].

2. Materials and Methods

2.1. Data Collection

This cross-sectional study included 300 children aged 6 to 10 years (134 boys: 8 years 6 months ± 12 months; 166 girls: 8 years 6 months ± 10 months) who were examined at the Orthodontic Department of Iwate Medical University Uchimaru Medical Center between January 2011 and July 2023. All subjects had complete initial examination records. All participants were in Hellmann’s developmental stage (tooth age) IIIA to IIIB. Patients who had previously undergone orthodontic treatment or who had conditions that could affect maxillofacial morphology during growth, such as cleft lip and palate, or chromosomal abnormalities, were excluded.
This study was approved by the Ethics Committee of Iwate Medical University (Approval Number: 01373).

2.2. Training Data Preparation

Lateral cephalograms obtained at the initial examination were analyzed using WinCeph software (Version 11; Rise Corporation, Sendai, Japan). From the 127 analysis items generated, two horizontal parameters (ANB angle and overjet) and seven vertical parameters (mandibular plane to FH, mandibular plane to SN, ramus plane to FH, ramus plane to SN, gonial angle, overbite, and N-Me/Cd-Go) [10,11] were selected as input features for maxillofacial morphology classification (Figure 1). The ANB angle, selected as a horizontal parameter, is the most widely used indicator and is employed in the general Class I, Class II, and Class III classifications.
In vertical classification, the mandibular plane to FH, mandibular plane to SN, ramus plane to FH, and ramus plane to SN accurately represent the position of the mandible. Together with the gonial angle, these are crucial for evaluating mandibular morphology and rotation direction.
Furthermore, N-Me/Cd-Go represents the ratio of anterior facial height to posterior facial height, serving as an auxiliary parameter for vertical classification.
Alongside these items, overjet and overbite, reflecting the occlusal state, were entered as input features.
Calibration was performed using lateral cephalograms with a metal ruler inserted. Scale correction was applied based on the known distance between two points on the ruler, and calibration was implemented in WinCeph. This converted all linear measurements on the cephalometric images to actual physical distances.
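The scale correction described above amounts to a simple ratio of known physical distance to measured pixel distance. A minimal sketch in Python, with illustrative ruler coordinates and a hypothetical 30 mm reference distance (WinCeph performs this internally):

```python
import math

def mm_per_pixel(p1, p2, known_mm):
    """Scale factor from two ruler points a known physical distance apart."""
    return known_mm / math.dist(p1, p2)

# Illustrative values only: two ruler marks 300 px apart, 30 mm in reality.
scale = mm_per_pixel((100.0, 200.0), (100.0, 500.0), known_mm=30.0)

# Every linear measurement in pixels is then converted to millimeters.
overjet_mm = 45.0 * scale
```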
Three orthodontists (two instructing physicians and one certified physician, all members of the Japanese Orthodontic Society) classified each patient’s maxillofacial morphology horizontally into skeletal Class I, Class II, or Class III and vertically into short frame, medium frame, or long frame. Combining the three horizontal and three vertical classifications yielded nine morphological patterns (skeletal Class I short frame, skeletal Class II short frame, skeletal Class III short frame, skeletal Class I medium frame, skeletal Class II medium frame, skeletal Class III medium frame, skeletal Class I long frame, skeletal Class II long frame, and skeletal Class III long frame).
The orthodontists classified each patient independently. If two orthodontists agreed on one category and the third differed, a majority decision was applied. If all three selected different categories, the case was discussed until a consensus was reached. These consensus-based classifications were then used as the gold standard classification. Model training and evaluation were performed using 10-fold cross-validation. The 300 patients were randomly divided into 10 subsets. At each iteration, nine subsets were used for model training, and one subset was used for evaluation. The average performance across all iterations was calculated as the final evaluation metric.
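The majority-vote rule used to build the gold standard can be sketched as follows; the label strings are illustrative, and unresolved three-way splits are flagged for the consensus discussion:

```python
from collections import Counter

def consensus(labels):
    """Return the majority label among three raters, or None if all differ."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None

# Two of three agree: the majority label is adopted.
majority = consensus(["Class II medium", "Class II medium", "Class I medium"])

# All three differ: the case goes to discussion until consensus is reached.
unresolved = consensus(["Class I medium", "Class II medium", "Class III medium"])
```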

2.3. AI Model Development

For the Sassouni classification, the Python scikit-learn (sklearn) package (version 1.6.1) was used. Data preprocessing and numerical computations were performed using pandas (version 2.2.2) and numpy (version 2.0.2).
Three machine learning (ML) models were trained and evaluated:
(1) Random forest classifier (RF);
(2) Logistic regression (LR);
(3) Support vector classification (SVC).
Each model was trained and evaluated using 10-fold cross-validation. Results from each iteration were averaged to assess overall performance. Model classification outputs were then compared with the orthodontists’ classifications (Figure 2).
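A minimal scikit-learn sketch of this evaluation pipeline, using synthetic stand-ins for the nine cephalometric measurements plus sex (the feature values, labels, and hyperparameters here are illustrative, not the study's):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))      # 9 cephalometric values + sex (synthetic)
y = rng.integers(0, 3, size=300)    # placeholder class labels

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
models = {
    "RF": RandomForestClassifier(random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "SVC": SVC(),
}
# Mean accuracy over the 10 folds, as in the paper's evaluation scheme.
scores = {name: cross_val_score(m, X, y, cv=cv).mean()
          for name, m in models.items()}
```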

2.4. Analysis of Input Feature Importance

The feature importance function in the scikit-learn package was applied to identify which input features most strongly influenced the output of the Sassouni classification model.
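In scikit-learn this corresponds to the fitted model's `feature_importances_` attribute (impurity-based for RF). A toy sketch, where the synthetic labels are deliberately driven by the first feature so that it ranks highest:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

features = ["ANB", "overjet", "MP-FH", "MP-SN", "RP-FH", "RP-SN",
            "gonial angle", "overbite", "N-Me/Cd-Go", "sex"]

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
y = (X[:, 0] > 0).astype(int)   # toy labels depend only on the "ANB" column

rf = RandomForestClassifier(random_state=0).fit(X, y)
# Importances sum to 1; sort to obtain a ranking like Figure 10.
ranking = sorted(zip(features, rf.feature_importances_), key=lambda t: -t[1])
```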

3. Results

3.1. Orthodontist Classification Results

The classification agreement between Observer 1 and Observer 2 was 0.589 according to Cohen’s κ, between Observer 2 and Observer 3 it was 0.603, and between Observer 1 and Observer 3 it was 0.871.
Fleiss’s κ, indicating overall agreement, was 0.684. All these values indicated agreement within an acceptable range.
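As a pointer for reproducing these statistics: pairwise agreement can be computed with scikit-learn's `cohen_kappa_score` (Fleiss's κ is available elsewhere, e.g., `statsmodels.stats.inter_rater`). The eight toy labels below are illustrative, not study data:

```python
from sklearn.metrics import cohen_kappa_score

# Horizontal classes assigned by two hypothetical observers.
obs1 = ["I", "II", "II", "III", "I", "I",  "II", "III"]
obs2 = ["I", "II", "I",  "III", "I", "II", "II", "III"]

kappa = cohen_kappa_score(obs1, obs2)   # chance-corrected pairwise agreement
```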
First, we present the individual classification results by three orthodontists (Observers 1–3, Figure 3).
The distribution of case numbers in the final integrated classification is as follows. Final training data: 4 skeletal Class II short frames, 9 skeletal Class I short frames, 4 skeletal Class III short frames, 86 skeletal Class II medium frames, 130 skeletal Class I medium frames, 31 skeletal Class III medium frames, 12 skeletal Class II long frames, 19 skeletal Class I long frames, 5 skeletal Class III long frames (Figure 4).

3.2. Comparison with ML Model Classification Performance

Three ML models (RF, LR, and SVC) from the Python scikit-learn package were tested. Their performance was compared, and the optimal model was identified. The RF model achieved a precision of 0.907 ± 0.051, an F1 score of 0.740 ± 0.160, a sensitivity of 0.750 ± 0.147, and a positive predictive value of 0.741 ± 0.170. Compared with the LR and SVC models, RF demonstrated the highest performance across all evaluation metrics (precision, F1 score, sensitivity, and positive predictive value) (Table 1).

3.3. Classification Performance for Horizontal Classification Alone

For horizontal classification (Skeletal Classes I, II, and III), the RF model achieved a macro-average accuracy of 0.963 ± 0.031, an F1 score of 0.948 ± 0.044, a sensitivity of 0.937 ± 0.053, and a positive predictive value of 0.971 ± 0.030 (Table 2). The concordance rate was highest for skeletal Class II (0.98), followed by skeletal Class III (0.97), and lowest for skeletal Class I (0.95) (Table 3). Misclassification occurred in 11 patients, most commonly involving skeletal Class III cases being classified as skeletal Class I (Figure 5).
In horizontal classification, misclassifications tended to occur frequently when the ANB angle was close to the classification boundary (e.g., near the threshold between Class I and Class III, Figure 6).
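Error patterns such as the Class III cases misread as Class I are easiest to read off a confusion matrix; a sketch with illustrative labels (not the study's 300 cases):

```python
from sklearn.metrics import confusion_matrix

gold = ["I", "I", "II", "II", "III", "III", "III", "I"]
pred = ["I", "I", "II", "II", "III", "I",   "III", "I"]

# Row i / column j counts gold-class-i cases predicted as class j;
# a III-misread-as-I error appears at cm[2, 0].
cm = confusion_matrix(gold, pred, labels=["I", "II", "III"])
```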

3.4. Classification Performance for Vertical Classification Alone

For vertical classification (Short frame, Medium frame, Long frame), the RF model achieved a macro-average accuracy of 0.973 ± 0.025, an F1 score of 0.937 ± 0.055, a sensitivity of 0.915 ± 0.075, and a positive predictive value of 0.983 ± 0.028 (Table 2). The concordance rate was highest for short frame (1.00), followed by medium frame (0.97) and long frame (0.97) (Table 4). Misclassification occurred in 8 patients, with the most frequent error being short frame cases misclassified as medium frame, followed by short frame cases misclassified as long frame (Figure 7).
In vertical classification, misclassification was more likely to occur in cases where there was a misalignment between the reference plane and the patient’s plane (Figure 8).

3.5. Classification Performance for Sassouni Classifications

Concordance rate, reproducibility, and F1 score were calculated for each of the nine Sassouni classifications using the RF model. The highest concordance rates were observed in skeletal Class II short frame, skeletal Class I short frame, skeletal Class II long frame, and skeletal Class III long frame. The lowest concordance rate was observed for the skeletal Class III short frame. The classifications with the highest reproducibility were skeletal Class II medium frame and skeletal Class I medium frame, while skeletal Class III short frame had the lowest reproducibility. The highest F1 score was observed in skeletal Class II medium frame, and the lowest in skeletal Class III short frame (Table 5). A total of 28 patients were misclassified. Of these, 2 patients were misclassified in both the horizontal and vertical components, 11 were misclassified only in the horizontal component, and 15 were misclassified only in the vertical component. Common errors included skeletal Class I short frame misclassified as skeletal Class I medium frame, skeletal Class III short frame misclassified as skeletal Class III medium frame, skeletal Class III medium frame misclassified as skeletal Class I medium frame, and skeletal Class I long frame misclassified as skeletal Class I medium frame (Figure 9).

3.6. Feature Importance

Feature importance analysis was performed to identify the influence of input variables on the Sassouni classification model (Figure 10). The ANB angle had the greatest impact, followed by the mandibular plane to FH and overjet. In contrast, the patient’s sex had the least impact on model predictions.

4. Discussion

4.1. AI Model Machine Learning Algorithm

In this study, we developed an AI model to classify the maxillofacial morphology of children in the growth and development phase and compared the performance of three ML algorithms: RF, LR, and SVC. Data from 300 children were used to train the AI model, and classification performance was evaluated. Results showed that the RF model outperformed the other models across all metrics—precision, F1 score, sensitivity, and positive predictive value (PPV). The RF model constructs multiple classification rules (decision trees) based on randomly selected features and integrates them for prediction. This method can handle non-linear data and complex patterns and is relatively tolerant to overfitting. In our study, the analysis items followed either normal or lognormal distributions, with some showing regression relationships with each other. If classification can be achieved using only a few analysis items, a linear model may suffice. However, because classification in this study required combinations of multiple items, the RF model, which is well-suited for complex data, demonstrated superior performance [12,13,14,15,16].
The LR model showed the second-best performance. It calculates probabilities from input features and applies them to binary classifications. LR is computationally efficient, accurate, and easy to interpret. However, it assumes linear relationships among input variables. In this study, the response variable was maxillofacial morphology, and cephalometric items were explanatory variables, which differed from standard binary classification settings. This may have contributed to lower performance than RF. Nevertheless, increasing the number of samples in smaller groups (e.g., skeletal Class III short frame, skeletal Class III long frame, skeletal Class II short frame) could potentially improve LR performance despite its binary-classification design. The small sample size in certain groups likely reduced their classification performance [17,18,19,20,21].
SVC performance was the weakest. SVC maps data into a higher-dimensional space and classifies it by identifying a separating hyperplane with the maximum margin. Although kernel functions can enhance performance with non-linear data, model accuracy is highly dependent on appropriate kernel and parameter selection, which can be difficult. Inappropriate settings may significantly reduce classification performance. Moreover, SVC relies heavily on support vectors near decision boundaries; thus, performance declines when there is high variability or class imbalance. Because our dataset included uneven group sizes, this likely contributed to the weaker classification results [22,23,24].
Although RF demonstrated high classification performance, this study only performed k-fold cross-validation as internal validation, and its reproducibility on external data remains unverified. Therefore, the possibility of overfitting cannot be completely ruled out, and future work requires confirmation of generalization performance through external validation using independent datasets.
On the other hand, while LR and SVC excel in computational efficiency and parameter interpretability, they have limitations in classifying pediatric maxillofacial morphology, which involves non-linear changes during growth and interactions among multiple factors. In contrast, the decision-tree-based RF is advantageous because it can flexibly capture complex non-linear relationships while suppressing overfitting, combining high performance with practicality for real-world clinical use.
Therefore, the results of this study suggest that RF may be the most practical and reproducible algorithm for automated jaw classification, and its further application and development are anticipated.

4.2. RF Model Horizontal and Vertical Classification

For horizontal classification, 11 cases were misclassified, indicating strong overall performance. However, skeletal Class III cases were frequently misclassified as skeletal Class I. This may be explained by the role of the ANB angle in horizontal classification. Borderline values of the ANB angle (around 2 ± 2°) are interpreted differently among researchers [25,26,27], and borderline values may have led to misclassification in this study. Additionally, the small number of skeletal Class III cases in the training dataset may have reduced the model’s ability to accurately recognize this group. Since participants were in Hellmann’s stages IIIA to IIIB (pre-adolescence), skeletal features, particularly mandibular protrusion, were not yet fully pronounced. Because mandibular growth accelerates during adolescence, ANB angle differences become more apparent later, but at this developmental stage, the feature distribution was relatively narrow. This developmental factor may have contributed to misclassifications.

For vertical classification, 8 cases were misclassified, with most errors involving short frames or long frames being classified as medium frames. Overall, vertical classification performance was superior to horizontal classification. This may be due to vertical classification relying on seven analysis items, compared with only two for horizontal classification. However, data imbalance was also present: 247 cases were classified as medium frame, while only 36 were short frame cases, and 17 were long frame cases. Such an imbalance may have influenced erroneous classifications.
In horizontal classification, misclassification tended to occur more frequently in cases where the ANB angle was located near the classification boundary (e.g., near the threshold between Class I and Class III). In these cases, the ANB angle is prone to slight variation due to the position of the S-point or N-point, or the rotation direction of the mandible, and may not always accurately reflect the actual skeletal relationship (Figure 6).
In contrast, in vertical classification, errors in setting the reference plane (FH plane or SN plane) and individual variations caused angle measurements like MP-FH and MP-SN to shift vertically. This resulted in classifications crossing the boundary line. It was confirmed that even slight differences in the tilt of the reference plane could be a factor in misclassification (Figure 8).

4.3. Sassouni Classification

Using the Sassouni classification, 28 cases were misclassified (11 horizontal, 15 vertical, and 2 in both). Overall classification performance decreased compared with horizontal- or vertical-only models. Common misclassifications included skeletal Class III as Class I and short frame as medium frame. Ueda et al. [28] applied AI classification to adult patients and reported clearer morphological distinctions, with an even distribution of 220 patients across groups. Their RF model achieved horizontal precision of 0.986 ± 0.011, vertical precision of 0.855 ± 0.037, and overall Sassouni precision of 0.823 ± 0.060. In contrast, our study targeting children achieved slightly lower horizontal precision (0.963 ± 0.031) but higher vertical and overall precision compared with the adult study. This may be attributable to our larger sample size (300 children), high-quality training data, and the large number of skeletal Class I medium frame cases (130 patients), which likely enhanced learning and improved classification accuracy.

4.4. Input Feature Importance

Feature importance analysis indicated that ANB angle, mandibular plane to FH, and overjet were the most influential features, reflecting their relevance to anteroposterior jaw relationships and mandibular rotation. In contrast, sex had the least impact, likely because boys and girls in the pre-adolescent phase show minimal differences in maxillofacial size. Additionally, distance measurements reflecting size were not included, and N-Me/Cd-Go was expressed as a proportion. Similarly to prior adult studies, the ANB angle was the most important feature, followed by the mandibular plane to FH. However, slight differences were observed in other feature rankings between children and adults, reflecting morphological variation by age group. Similarly to adults, the high feature importance of ANB angle and FMA is likely because the training data encapsulates the diagnostic reasoning patterns of clinicians. In actual diagnosis, a stepwise process is common: first, the ANB angle is used to assess the anterior–posterior relationship, followed by the FMA to evaluate the vertical tendency. Subsequently, consistency with supplementary measurements is verified, and if inconsistencies are found, the diagnosis is reconsidered. This conditional and hierarchical clinical decision-making structure was learned by the model, resulting in the high contribution of the primary indicators, ANB angle and FMA.
Furthermore, because most patients in our dataset were classified as skeletal Class I medium frame, the AI model may not have had sufficient training data for borderline cases. Expanding underrepresented categories through targeted data collection may help improve classification performance.
Future efforts should focus on balancing group sizes and incorporating additional feature quantities to enhance model accuracy and generalizability.

5. Conclusions

The RF-based AI model demonstrated high accuracy in classifying pediatric maxillofacial morphology, outperforming both LR and SVC. These results indicate that AI-based Sassouni classification can be performed with high precision in children, potentially providing objective and reproducible diagnoses for maxillofacial morphology during growth stages. AI-based automatic classification is considered useful for reducing inter-clinician variability in assessments and for early identification of growth direction and skeletal tendencies. Therefore, this approach supports the auxiliary use of AI in orthodontic diagnosis and has the potential to contribute to diagnostic standardization and improved early treatment planning.
Future studies should further verify the generalizability and clinical applicability of the proposed model through external validation using multi-center datasets.

Author Contributions

Conceptualization, H.S., Y.K., A.U., E.K. and K.S.; methodology, H.S., Y.K., A.U., S.N., S.K., C.T. and K.S.; software, S.K. and C.T.; validation, H.S., A.U., Y.K., E.K., S.N. and K.S.; formal analysis, H.S., A.U., Y.K., S.N., K.S., S.K. and C.T.; investigation, H.S., A.U., Y.K., S.N., E.K., K.S., S.K. and C.T.; resources, K.S.; data curation, H.S., A.U., Y.K., E.K. and K.S.; writing—original draft preparation, H.S., A.U., Y.K., E.K. and K.S.; writing—review and editing, Y.K. and K.S.; visualization, H.S., S.N., S.K. and C.T.; supervision, K.S.; project administration, K.S.; funding acquisition, K.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was approved by the Institutional Review Board at Iwate Medical University, School of Dentistry (approval number 01373, 10 August 2022).

Informed Consent Statement

All data used in this study were completely anonymized, and no personally identifiable information was included. In accordance with the “Ethical Guidelines for Life Sciences and Medical Research Involving Human Subjects” in Japan, this study adopted an opt-out approach instead of obtaining individual informed consent.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Acknowledgments

The authors acknowledge Ryoichi Tanaka for his invaluable advice.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
RF	Random Forest
LR	Logistic Regression
SVC	Support Vector Classification

References

  1. Davis, S.; Zhang, J.; Lee, I.; Rezaei, M.; Greiner, R.; McAlister, F.A.; Padwal, R. Effective hospital readmission prediction models using machine-learned features. BMC Health Serv. Res. 2022, 22, 1415. [Google Scholar] [CrossRef]
  2. Miotto, R.; Li, L.; Kidd, B.A.; Dudley, J.T. Deep patient: An unsupervised representation to predict the future of patients from the electronic health records. Sci. Rep. 2016, 6, 26094. [Google Scholar] [CrossRef]
  3. Choi, H.-I.; Jung, S.-K.; Baek, S.-H.; Lim, W.H.; Ahn, S.-J.; Yang, I.-H.; Kim, T.-W. Artificial intelligent model with neural network machine learning for the diagnosis of orthognathic surgery. J. Craniofac. Surg. 2019, 30, 1986–1989. [Google Scholar] [CrossRef] [PubMed]
  4. Kunz, F.; Stellzig-Eisenhauer, A.; Zeman, F.; Boldt, J. Artificial intelligence in orthodontics: Evaluation of a fully automated cephalometric analysis using a customized convolutional neural network. J. Orofac. Orthop. 2020, 81, 52–68. [Google Scholar] [CrossRef] [PubMed]
  5. Tanikawa, C.; Yamashiro, T. Development of novel artificial intelligence systems to predict facial morphology after orthognathic surgery and orthodontic treatment in Japanese patients. Sci. Rep. 2021, 11, 15853. [Google Scholar] [CrossRef] [PubMed]
  6. Li, Z.; Wang, L. Multi-task reinforcement learning and explainable AI-Driven platform for personalized planning and clinical decision support in orthodontic-orthognathic treatment. Sci. Rep. 2025, 15, 24502. [Google Scholar] [CrossRef]
  7. Olawade, D.B.; Leena, N.; Egbon, E.; Rai, J.; Mohammed, A.P.E.K.; Oladapo, B.I.; Boussios, S. AI-Driven Advancements in Orthodontics for Precision and Patient Outcomes. Dent. J. 2025, 13, 198. [Google Scholar] [CrossRef]
  8. Sassouni, V. A classification of skeletal facial types. Am. J. Orthod. 1969, 55, 109–123. [Google Scholar] [CrossRef]
  9. Hiroki, S.; Akane, U.; Tussie, C.; Kim, S.; Yukinori, K.; Emiko, K.; Shigemi, N.; Kazuro, S. Classification of Maxillofacial Morphology in Japanese Children Using Artificial Intelligence and Verification of Accuracy. In Proceedings of the 84th Annual Meeting of the Japanese Orthodontic Society, Sapporo, Japan, 29 September–1 October 2025. [Google Scholar]
  10. Downs, W.B. Variations in facial relationships: Their significance in treatment and prognosis. Am. J. Orthod. 1948, 34, 812–840. [Google Scholar] [CrossRef]
  11. Steiner, C.C. Cephalometrics in clinical practice. Angle Orthod. 1959, 29, 8–29. [Google Scholar] [CrossRef]
  12. Montella, A.; Aria, M.; D’aMbrosio, A.; Mauriello, F. Analysis of powered two-wheeler crashes in Italy by classification trees and rules discovery. Accid. Anal. Prev. 2012, 49, 58–72. [Google Scholar] [CrossRef]
  13. Morgan, E.W.; Perdew, G.H.; Patterson, A.D. Multi-Omics Strategies for Investigating the Microbiome in Toxicology Research. Toxicol. Sci. 2022, 187, 189–213. [Google Scholar] [CrossRef] [PubMed]
  14. Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J.C.; Sheridan, R.P.; Feuston, B.P. Random forest: A classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 2003, 43, 1947–1958. [Google Scholar] [CrossRef] [PubMed]
  15. Schonlau, M.; Zou, R.Y. The random forest algorithm for statistical learning. Stata. J. 2020, 20, 3–29. [Google Scholar] [CrossRef]
  16. Tong, T.; Li, Z. Predicting learning achievement using ensemble learning with result explanation. PLoS ONE 2025, 20, e0312124. [Google Scholar] [CrossRef]
  17. Gnardellis, C.; Notara, V.; Papadakaki, M.; Gialamas, V.; Chliaoutakis, J. Overestimation of relative risk and prevalence ratio: Misuse of logistic modeling. Diagnostics. 2022, 12, 2851. [Google Scholar] [CrossRef]
  18. Levy, J.J.; O’Malley, A.J. Don’t dismiss logistic regression: The case for sensible extraction of interactions in the era of machine learning. BMC Med. Res. Methodol. 2020, 20, 171. [Google Scholar] [CrossRef]
  19. Boateng, E.; Abaye, D. A Review of the Logistic Regression Model with Emphasis on Medical Research. J. Data Anal. Inform. Process. 2019, 7, 190–207. [Google Scholar] [CrossRef]
  20. Sperandei, S. Understanding logistic regression analysis. Biochem. Med. 2014, 24, 12–18. [Google Scholar] [CrossRef] [PubMed]
  21. Moral, R.D.A.; Díaz-Orueta, U.; Oltra-Cucarella, J. Logistic versus linear regression-based reliable change index: A simulation study with implications for clinical studies with different sample sizes. Psychol. Assess. 2022, 34, 731–741. [Google Scholar] [CrossRef]
  22. Haubold., A.; Peterson., B.S.; Bansal., R. Progress in using brain morphometry as a clinical tool for diagnosing psychiatric disorders. J. Child Psychol. Psychiatry 2012, 53, 519–535. [Google Scholar] [CrossRef] [PubMed]
  23. Amari, S.; Wu, S. Improving support vector machine classifiers by modifying kernel functions. Neural Netw. 1999, 12, 783–789. [Google Scholar] [CrossRef] [PubMed]
  24. Abukhait, J.; Mansour, A.M.; Obeidat, M. Classification based on Gaussian-kernel Support Vector Machine with Adaptive Fuzzy Inference System. Przegl. Elektrotech. 2018, 1, 16–24. [Google Scholar] [CrossRef]
  25. Eraydin, F.; Cakan, D.G.; Tozlu, M.; Ozdemir, F. Evaluation of buccolingual molar inclinations among different vertical facial types. Korean J. Orthod. 2018, 48, 333–338. [Google Scholar] [CrossRef]
  26. Fontenele, R.C.; Gomes, A.F.; Moreira, N.R.; Costa, E.D.; Oliveira, M.L.; Freitas, D.Q. Do the location and dimensions of the mental foramen differ among individuals of different facial types and skeletal classes? A CBCT study. J. Prosthet Dent. 2023, 129, 741–747. [Google Scholar] [CrossRef]
  27. Miralles, R.; Hevia, R.; Contreras, L.; Carvajal, R.; Bull, R.; Manns, A. Patterns of electromyographic activity in subjects with different skeletal facial types. Angle. Orthod. 1991, 61, 277–284. [Google Scholar] [CrossRef]
  28. Ueda, A.; Tussie, C.; Kim, S.; Kuwajima, Y.; Matsumoto, S.; Kim, G.; Satoh, K.; Nagai, S. Classification of maxillofacial morphology by artificial intelligence using cephalometric analysis measurements. Diagnostics 2023, 13, 2134. [Google Scholar] [CrossRef]
Figure 1. Diagram of cephalometric analysis items used for classification. The blue lines (1, 2) indicate the items used for horizontal classification, whereas the red lines (3–9) indicate the items used for vertical classification.
Figure 2. Input and output items of the AI model for horizontal, vertical, and Sassouni classifications. Horizontal classification generated three results from three input items, vertical classification generated three results from eight input items, and Sassouni classification generated nine results from ten input items.
Figure 3. Distribution of Sassouni Classification by Three Orthodontists. The vertical axis represents the nine jaw classifications based on the Sassouni classification, while the horizontal axis indicates each observer (1–3).
Figure 4. Distribution of orthodontists’ classification results. The vertical axis indicates the horizontal classification, and the horizontal axis indicates the vertical classification.
Figure 5. RF confusion matrix for horizontal classifications using k-fold (n = 10) cross-validation. The vertical axis represents the orthodontists’ classification results, and the horizontal axis represents the AI’s classification results. Orange to lemon-yellow cells indicate agreement between the orthodontists and AI classifications, whereas gray cells indicate disagreement.
Figure 6. Example of misclassification in horizontal classification. The red line shows the patient’s profilogram, while the blue line shows the average profilogram of individuals with normal occlusion at the same stage of growth and development. In this case, the ANB angle is located near the classification boundary (between Class I and Class III).
Figure 7. RF confusion matrix for vertical classifications using k-fold (n = 10) cross-validation. The vertical axis represents the orthodontists’ classification results, and the horizontal axis represents the AI’s classification results. Orange to lemon-yellow cells indicate agreement between the orthodontists and AI classifications, whereas gray cells indicate disagreement.
Figure 8. Example of misclassification in vertical classification. The red line shows the patient’s profilogram, while the blue line shows the average profilogram of individuals with normal occlusion at the same stage of growth and development. In this case, the patient’s profile is displaced relative to the reference plane.
Figure 9. Sassouni classification confusion matrices for the RF model using 10-fold cross-validation. The vertical axis represents the orthodontists’ classification results, and the horizontal axis represents the AI’s classification results. I, II, and III denote skeletal Class I, skeletal Class II, and skeletal Class III, respectively; S, M, and L denote short frame, medium frame, and long frame, respectively. Orange to lemon-yellow cells indicate agreement between the orthodontists and AI classifications, whereas gray cells indicate disagreement.
Figure 10. Feature importance of gender and nine cephalometric variables in Sassouni classification. The vertical axis lists the features, and the horizontal axis indicates their relative importance.
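A ranking like the one plotted in Figure 10 is a standard by-product of random-forest models, which expose impurity-based feature importances after fitting. The following is a minimal sketch only: the synthetic data and the placeholder feature names (one "gender" flag plus nine generic cephalometric variables) are assumptions for illustration, not the study's actual inputs.

```python
# Illustrative sketch: impurity-based feature importance from a fitted
# random forest, ranked as in a horizontal bar chart. Data and feature
# names are placeholders, not the study's variables.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

feature_names = ["gender"] + [f"cephalometric_{i}" for i in range(1, 10)]

# Synthetic stand-in for the ten-feature input table.
X, y = make_classification(n_samples=300, n_features=10, n_informative=8,
                           n_classes=3, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X, y)
importances = rf.feature_importances_  # non-negative, sums to 1.0

# Rank features from most to least important.
order = np.argsort(importances)[::-1]
for i in order:
    print(f"{feature_names[i]:<18s} {importances[i]:.3f}")
```

Note that impurity-based importances can be biased toward features with many split points; permutation importance on held-out data is a common cross-check.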
Table 1. Comparison of the performance of Sassouni classification using random forest classifier (RF), logistic regression (LR), and support vector classification (SVC) with k-fold (n = 10) cross-validation.
| | RF | LR | SVC |
|---|---|---|---|
| Accuracy | 0.907 ± 0.051 | 0.837 ± 0.057 | 0.770 ± 0.055 |
| F1 score | 0.740 ± 0.160 | 0.623 ± 0.132 | 0.507 ± 0.102 |
| Sensitivity | 0.750 ± 0.147 | 0.631 ± 0.116 | 0.534 ± 0.102 |
| Positive predictive value (PPV) | 0.741 ± 0.170 | 0.641 ± 0.155 | 0.499 ± 0.113 |
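Fold-wise summaries of the kind reported in Table 1 (mean ± SD over ten folds for accuracy, F1, sensitivity, and PPV) can be produced with scikit-learn's `cross_validate`. The sketch below is illustrative only: the synthetic data, the class count, and all hyperparameters are assumptions, not the study's actual pipeline.

```python
# Illustrative sketch (not the authors' pipeline): comparing RF, LR, and
# SVC with 10-fold cross-validation, reporting mean ± SD of accuracy,
# macro F1, macro sensitivity (recall), and macro PPV (precision).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

# Synthetic stand-in for the cephalometric feature table.
X, y = make_classification(n_samples=300, n_features=10, n_informative=8,
                           n_classes=3, random_state=0)

scoring = ["accuracy", "f1_macro", "recall_macro", "precision_macro"]
models = {"RF": RandomForestClassifier(random_state=0),
          "LR": LogisticRegression(max_iter=1000),
          "SVC": SVC()}

results = {}
for name, model in models.items():
    cv = cross_validate(model, X, y, cv=10, scoring=scoring)
    # Mean and standard deviation of each metric across the ten folds.
    results[name] = {s: (cv[f"test_{s}"].mean(), cv[f"test_{s}"].std())
                     for s in scoring}
    print(name, {s: f"{m:.3f} ± {sd:.3f}"
                 for s, (m, sd) in results[name].items()})
```

With macro averaging, each of the nine classes contributes equally regardless of support, which is why macro F1, sensitivity, and PPV can sit well below overall accuracy when small classes are misclassified.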
Table 2. Comparison of the performance of RF models for horizontal and vertical classifications.
| | Horizontal Classification | Vertical Classification |
|---|---|---|
| Accuracy | 0.963 ± 0.031 | 0.973 ± 0.025 |
| F1 score | 0.948 ± 0.044 | 0.937 ± 0.055 |
| Sensitivity | 0.937 ± 0.053 | 0.915 ± 0.075 |
| Positive predictive value (PPV) | 0.971 ± 0.030 | 0.983 ± 0.028 |
Table 3. Performance of horizontal classification outputs in the RF model.
| Classification | Precision | Recall | F1 Score | Support |
|---|---|---|---|---|
| Class I | 0.95 | 0.98 | 0.97 | 158 |
| Class II | 0.98 | 0.98 | 0.98 | 102 |
| Class III | 0.97 | 0.85 | 0.91 | 40 |
| Accuracy | | | 0.96 | 300 |
| Macro avg. | 0.97 | 0.94 | 0.95 | 300 |
| Weighted avg. | 0.96 | 0.96 | 0.96 | 300 |
Table 4. Performance of vertical classification outputs in the RF model.
| Classification | Precision | Recall | F1 Score | Support |
|---|---|---|---|---|
| Short | 1.00 | 0.82 | 0.90 | 17 |
| Medium | 0.97 | 1.00 | 0.98 | 247 |
| Long | 0.97 | 0.89 | 0.93 | 36 |
| Accuracy | | | 0.97 | 300 |
| Macro avg. | 0.98 | 0.90 | 0.94 | 300 |
| Weighted avg. | 0.97 | 0.97 | 0.97 | 300 |
Table 5. Performance of Sassouni classification outputs in the RF model.
| Classification | Precision | Recall | F1 Score | Support |
|---|---|---|---|---|
| Class II Short | 1.00 | 0.75 | 0.86 | 4 |
| Class I Short | 1.00 | 0.56 | 0.71 | 9 |
| Class III Short | 0.00 | 0.00 | 0.00 | 4 |
| Class II Medium | 0.97 | 0.98 | 0.97 | 86 |
| Class I Medium | 0.88 | 0.98 | 0.93 | 130 |
| Class III Medium | 0.84 | 0.84 | 0.84 | 31 |
| Class II Long | 1.00 | 0.92 | 0.96 | 12 |
| Class I Long | 0.83 | 0.79 | 0.81 | 19 |
| Class III Long | 1.00 | 0.20 | 0.33 | 5 |
| Accuracy | | | 0.91 | 300 |
| Macro avg. | 0.84 | 0.67 | 0.71 | 300 |
| Weighted avg. | 0.90 | 0.91 | 0.89 | 300 |
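The macro and weighted summary rows of Table 5 follow directly from the per-class entries: macro averages weight the nine classes equally, while weighted averages weight each class by its support. A minimal stdlib sketch recomputes both from the published per-class values; because those values are rounded to two decimals, the recomputed summaries agree with the table only to within about 0.01.

```python
# Consistency check: recompute Table 5's macro and weighted averages
# from its per-class (precision, recall, F1, support) rows.
rows = {
    "Class II Short":   (1.00, 0.75, 0.86, 4),
    "Class I Short":    (1.00, 0.56, 0.71, 9),
    "Class III Short":  (0.00, 0.00, 0.00, 4),
    "Class II Medium":  (0.97, 0.98, 0.97, 86),
    "Class I Medium":   (0.88, 0.98, 0.93, 130),
    "Class III Medium": (0.84, 0.84, 0.84, 31),
    "Class II Long":    (1.00, 0.92, 0.96, 12),
    "Class I Long":     (0.83, 0.79, 0.81, 19),
    "Class III Long":   (1.00, 0.20, 0.33, 5),
}

n = sum(s for *_, s in rows.values())  # total support: 300 cases

# Macro average: unweighted mean over the nine classes.
macro = [sum(v[i] for v in rows.values()) / len(rows) for i in range(3)]
# Weighted average: mean weighted by each class's support.
weighted = [sum(v[i] * v[3] for v in rows.values()) / n for i in range(3)]

print("macro    P/R/F1:", [round(x, 2) for x in macro])     # [0.84, 0.67, 0.71]
print("weighted P/R/F1:", [round(x, 2) for x in weighted])
```

The gap between macro recall (≈0.67) and weighted recall (≈0.91) reflects the class imbalance: the Short and Class III Long groups are tiny, so their poor recall barely moves the support-weighted summary.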