Article

Machine Learning-Based Prediction of Dental Caries Risk in Preschool Children Using Data from the CAMBRA-Kids Mobile Application

1 Carius Dental Clinic, 618th Dental Company (AS), Dental Health Activity, USFK Army, Pyeongtaek 17977, Republic of Korea
2 Department of Dental Hygiene, Namseoul University, Cheonan 31020, Republic of Korea
* Author to whom correspondence should be addressed.
Future 2026, 4(2), 15; https://doi.org/10.3390/future4020015
Submission received: 31 January 2026 / Revised: 11 April 2026 / Accepted: 16 April 2026 / Published: 20 April 2026

Highlights

What are the main findings?
  • Longitudinal change variables derived from CAMBRA-kids data were effective in predicting caries risk transitions, achieving a ROC-AUC of 0.773 and an average precision of 0.919.
  • SHAP analysis revealed that changes in light-induced fluorescence loss (ΔΔF), restored teeth status (ΔD3), and red-fluorescent plaque area (ΔΔR70) are the most influential predictors of risk escalation, indicating that it is driven by accumulated clinical and biological alterations.
What are the implications of the main findings?
  • Caries risk transitions reflect accumulated biological and clinical changes over time rather than transient oral hygiene fluctuations.
  • Change-based, explainable machine learning approaches may improve the early identification of children at risk for caries progression.

Abstract

Early childhood caries risk is dynamic and can change over relatively short periods, even in the presence of preventive interventions. This study aimed to predict caries risk transitions in preschoolers using longitudinal data from the CAMBRA-kids mobile application. Using machine learning, we identified children whose risk progressed to high or extreme categories over 12 months and clarified the key contributing factors. A Random Forest model was developed using a multidimensional dataset that integrated parent-reported behavioral data and clinical assessments. Model performance was evaluated through ROC and precision–recall (PR) analyses, while SHAP was employed to ensure model interpretability and identify influential variables. Despite improvements in disease indicators and risk factors overall following the intervention, a subset of children transitioned to high or extreme risk. The model demonstrated acceptable discriminative performance with high precision in an imbalanced dataset. Changes in quantitative light-induced fluorescence loss, restored teeth, and red-fluorescent plaque area were identified as key predictors. These findings suggest that caries risk escalation reflects cumulative biological and clinical changes rather than short-term behavioral fluctuations and support the use of longitudinal, explainable machine learning for early risk identification and targeted prevention.

1. Introduction

Advances in information and communication technology (ICT), artificial intelligence (AI), and data analytics have ushered in an era of precision medicine, in which healthcare is increasingly tailored to individual characteristics [1]. This paradigm shift has gradually extended to dentistry, evolving into precision dentistry. This new approach integrates biological, behavioral, and environmental information to predict disease risk and develop personalized preventive strategies. Accordingly, oral healthcare is transitioning from a treatment-oriented approach focused on disease occurrence to a data-driven, predictive, personalized prevention model [2].
Dental caries is a multifactorial disease influenced by the complex interplay of microorganisms, dietary habits, behavioral patterns, and socioeconomic factors [3]. To address the limitations of conventional preventive education that fails to account for such complexity, Featherstone proposed the Caries Management by Risk Assessment (CAMBRA) model, which comprehensively evaluates disease indicators, risk factors, and protective factors [4]. However, CAMBRA has inherent limitations in adequately capturing the complex interactions among risk factors and temporal transitions in caries risk over time [5].
The oral environment and health-related behaviors of preschool children change rapidly due to growth and caregiver involvement. This makes it difficult for single-time-point assessments to fully reflect dynamic changes in caries risk [6]. Continuous follow-up and data-driven approaches are therefore required. Given that caregivers’ behaviors directly influence children’s oral health, participatory mobile health (mHealth) strategies have been shown to be particularly effective [7]. The CAMBRA-kids application, developed based on a Korean-adapted caries risk assessment tool, has been used to promote self-care and behavioral modification in children. However, previous studies have primarily focused on usability evaluation and short-term effectiveness [8]. To date, limited research has examined long-term changes in caries risk categories or the determinants influencing such transitions.
Recently, machine learning-based predictive models have been increasingly applied across healthcare disciplines to elucidate complex relationships among multiple interacting factors [9,10]. Among these methods, Random Forest algorithms, which aggregate multiple decision trees, are widely used. They provide high predictive accuracy and enable a quantitative assessment of relative feature importance. This makes them particularly suitable for clinical prediction research [11]. Despite their strong predictive performance, ensemble models are often criticized as “black-box” systems due to their limited interpretability [12]. To address this limitation, explainable artificial intelligence (XAI) techniques, such as SHapley Additive exPlanations (SHAP), have gained attention. SHAP uses a game-theoretic framework to visualize the contribution of individual variables, thereby enhancing model interpretability and expanding the clinical applicability of complex predictive models [13].
This study aimed to apply the concept of precision dentistry to caries prevention in preschool children by utilizing machine learning techniques to predict caries risk based on data collected through the CAMBRA-kids mobile application and to analyze interactions among associated factors. This approach aims to provide foundational evidence for establishing a long-term, data-driven oral healthcare management framework.

2. Materials and Methods

2.1. Study Design and Ethical Approval

This retrospective cohort study with longitudinal follow-up used secondary data collected through a mobile-based caries management program to identify factors associated with the escalation of dental caries risk in preschool children. Reporting followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines for cohort studies (see Table S1). The analysis focused on changes in caries risk status from pre-intervention to post-intervention following a 12-month caries management program delivered via the CAMBRA-kids mobile application, which is based on the Caries Management by Risk Assessment (CAMBRA) framework [4].
The dataset used in this study originated from a previously implemented intervention evaluating the effectiveness of the CAMBRA-kids mobile application for children under 5 years of age [8]. The present study represents a secondary analysis of de-identified data collected during that intervention, and no additional data were collected.
The study protocol was approved by the Institutional Review Board of Namseoul University (IRB No. NSU-1041479-007; approval date: 11 June 2019). To ensure participant confidentiality, all personal identifiers were removed prior to analysis. These de-identified data were stored in a secure, password-protected environment accessible only to the investigators. The requirement for informed consent was waived due to the retrospective use of secondary data.

2.2. Study Population and Data Collection

Data were derived from a longitudinal clinical study [8] involving a one-year caries management program via the CAMBRA-kids mobile application. The original intervention targeted preschool children under 5 years of age and was designed to support caries risk management through individualized feedback and behavior modification [8].
Participants were included if caregivers provided informed consent and completed the CAMBRA-kids application with the required entries on their child’s risk and protective factors, and if the child was able to participate throughout the intervention period. Participants were excluded if caregivers did not provide consent or did not download or complete the application, if the child was absent from scheduled program sessions or was transferred during the study period, or if caregiver questionnaire responses were incomplete. For this secondary analysis, only children with complete data at both baseline (T0) and 12-month follow-up (T1) were included. A total of 119 preschool children were included in the final analysis.
Assessments were conducted before the intervention (at enrollment) and after the intervention (at the 12-month follow-up). They covered CAMBRA-based caries-related variables (disease indicators, risk factors, and protective factors); quantitative light-induced fluorescence (QLF) measures (ΔF, ΔR30, ΔR70, and ΔR120); oral hygiene status assessed using the Simple Hygiene Score (SHS); salivary flow rate for the assessment of severe dry mouth; and caregiver-related measures such as oral health knowledge and self-efficacy.

2.3. Outcome Definition

The primary outcome was defined as a ‘high-risk transition’, a binary variable indicating whether a child moved to or remained in the ‘high’ or ‘extreme high’ caries risk categories at the 12-month follow-up (T1). Children classified into these categories at T1 were coded as positive cases (event = 1), whereas those in the ‘low’ or ‘moderate’ risk categories were coded as negative cases (event = 0). This binary outcome was used for the subsequent predictive modeling.
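The outcome coding described above can be sketched as follows. This is a minimal illustration only: the label strings and the function name `code_event` are assumptions made for the example and are not identifiers from the CAMBRA-kids dataset.

```python
# Hypothetical label strings; the study does not report how categories were stored.
HIGH_RISK_CATEGORIES = {"high", "extreme high"}
LOW_RISK_CATEGORIES = {"low", "moderate"}


def code_event(risk_category_t1: str) -> int:
    """Return 1 if the T1 risk category is high or extreme high, otherwise 0."""
    label = risk_category_t1.strip().lower()
    if label in HIGH_RISK_CATEGORIES:
        return 1
    if label in LOW_RISK_CATEGORIES:
        return 0
    raise ValueError(f"Unknown risk category: {risk_category_t1!r}")


# Invented example categories at the 12-month follow-up (T1)
t1_categories = ["low", "High", "moderate", "extreme high"]
events = [code_event(c) for c in t1_categories]
```

Because the outcome depends only on the T1 category, a child who was already high-risk at baseline and remained so is coded as a positive case, in line with the definition above.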

2.4. Variables and Measurements

Caries-related variables were collected in accordance with CAMBRA guidelines and categorized into disease indicators, risk factors, and protective factors [4,5]. Variable definitions and assessment procedures followed those used in a prior CAMBRA-kids study [8] to ensure consistency and comparability across analyses.
Assessments were performed at baseline and repeated after 12 months. Risk and protective factors, including caregiver-related characteristics, dietary habits, oral hygiene behaviors, fluoride exposure, and preventive practices, were obtained from caregiver responses entered into the CAMBRA-kids mobile application.
Disease indicators were assessed by clinicians through clinical examination and included enamel defects or white spot lesions, cavitated caries lesions, and restorations due to caries. Quantitative light-induced fluorescence (QLF) was additionally used to evaluate enamel demineralization and plaque-related conditions. Plaque accumulation and plaque maturity were assessed using Qraycam™ with dedicated analysis software (QA2 v1.23; Inspektor Research Systems BV, Amsterdam, the Netherlands), whereas enamel demineralization was assessed using Qraypen™ (AIOBIO Co., Ltd., Seoul, Republic of Korea) and Q-ray v1.34 software (v1.34; AIOBIO Co., Ltd., Seoul, Republic of Korea), with fluorescence loss quantified as the average ΔF (%).
Plaque maturity was evaluated using red fluorescence plaque parameters (ΔR30, ΔR70, and ΔR120), with higher ΔR values indicating greater plaque maturity. All fluorescence images were obtained in a dark environment by a single trained dental hygienist to ensure measurement reliability. Oral hygiene status was assessed using the Simple Hygiene Score (SHS), scored from 0 to 5, and severe dry mouth was assessed by measuring salivary flow rate [8,14,15,16].

2.5. Pre–Post Change Variables

To capture longitudinal changes over the intervention period, change variables (delta variables) were calculated as the difference between the 12-month follow-up (T1) and baseline (T0) values (Δ = T1 − T0). Baseline (T0) values were also included as covariates to adjust for initial differences in caries risk status among participants.
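As a minimal sketch of the change-variable construction (Δ = T1 − T0), the snippet below builds one feature row that combines baseline covariates with delta variables. The measure names, the `t0_`/`delta_` prefixes, and all numeric values are invented for illustration.

```python
def delta(t0: dict, t1: dict) -> dict:
    """Compute change variables (T1 - T0) for measures present at both time points."""
    return {f"delta_{name}": t1[name] - t0[name] for name in t0 if name in t1}


# Invented values for one hypothetical child
baseline = {"dF": -8.0, "R70": 4.0, "SHS": 2}
follow_up = {"dF": -10.5, "R70": 6.5, "SHS": 1}

changes = delta(baseline, follow_up)

# Feature row: baseline (T0) covariates plus the change variables
features = {**{f"t0_{k}": v for k, v in baseline.items()}, **changes}
```

Keeping the T0 values alongside the deltas lets a model distinguish, for example, a small change from a high starting level from the same change at a low starting level.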

2.6. Data Preprocessing and Feature Selection

Categorical variables, including binary-coded risk factors, protective factors, and disease indicators (0 = absence, 1 = presence), were entered into the model as binary indicators, whereas continuous delta variables were standardized to have a mean of zero and a standard deviation of one to minimize scale-related bias. Changes in caregiver oral health knowledge and self-efficacy demonstrated minimal contribution to predictive performance and were excluded from the final model to improve model parsimony. All analyses were conducted using Python (version 3.12; Python Software Foundation, Wilmington, DE, USA) in the Google Colab environment (Google LLC, Mountain View, CA, USA), with the scikit-learn (version 1.4.2) and SHAP libraries (version 0.44.1).
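The standardization step for continuous delta variables can be illustrated with a short z-score sketch. It assumes the population standard deviation, which is what scikit-learn's `StandardScaler` uses; the text does not state which estimator was applied, and the input values are invented.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (population SD)."""
    mean = fmean(values)
    sd = pstdev(values)
    if sd == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Invented delta values for five hypothetical children
delta_values = [-2.5, 0.0, 1.5, 3.0, -2.0]
z = standardize(delta_values)
```

Tree-based models such as Random Forest are largely insensitive to monotonic rescaling, so this step mainly keeps variables on a comparable scale for inspection.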

2.7. Machine Learning Model Development

A Random Forest classifier consisting of 500 decision trees was used to predict post-intervention caries risk escalation [17,18]. Delta variables were used as primary predictors, and pre-intervention variables were included as covariates. Model development was conducted within a pipeline-based framework incorporating data preprocessing, class-weight adjustment, and cross-validation. Numerical variables were imputed using the median, whereas categorical variables were imputed using the most frequent value and one-hot-encoded. Class imbalance was addressed using class-weight adjustment. Model development and internal validation were performed using 5-fold stratified cross validation with shuffled splits (random state = 42). Feature importance and stability were examined using SHAP values, and variables showing stable contributions across models were retained for the final model.
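The modeling workflow described above could be assembled in scikit-learn roughly as follows. This is a sketch under stated assumptions, not the authors' code: the data are synthetic, the column layout is invented, and `class_weight="balanced"` and the forest's `random_state` are assumed settings (the text specifies only class-weight adjustment, 500 trees, and a random state of 42 for the cross-validation splits).

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data: three numeric delta columns and one binary column
rng = np.random.default_rng(0)
n = 119
X_num = rng.normal(size=(n, 3))
X_num[rng.random(size=(n, 3)) < 0.05] = np.nan  # a few missing values
X_cat = rng.integers(0, 2, size=(n, 1)).astype(float)
X = np.hstack([X_num, X_cat])
y = (rng.random(n) < 0.8).astype(int)  # invented binary outcome

numeric_cols = [0, 1, 2]
categorical_cols = [3]

# Median imputation for numeric columns; most-frequent imputation
# followed by one-hot encoding for categorical columns
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestClassifier(
        n_estimators=500,
        class_weight="balanced",  # assumed form of class-weight adjustment
        random_state=42,          # assumed; fixes the forest for reproducibility
    )),
])

# 5-fold stratified cross-validation with shuffled splits
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Out-of-fold predicted probabilities for the positive class
oof_proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
```

Placing imputation and encoding inside the pipeline means they are refit on each training fold, which avoids leaking information from the held-out fold into preprocessing.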

2.8. Model Evaluation and Interpretability

Model performance was evaluated using receiver operating characteristic (ROC) and precision–recall (PR) curves. Given the imbalanced nature of the outcome variable, PR curves were used alongside ROC curves to provide a more informative assessment of predictive performance [18]. Area under the ROC curve (AUC) and average precision (AP) were calculated [19]. Out-of-fold predicted probabilities obtained from the cross-validation procedure were used for ROC and PR analyses, and the optimal classification threshold was determined from the precision–recall curve using the threshold that maximized the F1 score.
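Threshold selection by F1 maximization can be sketched in plain Python as below. The labels and probabilities are invented, and the helper names `f1_at_threshold` and `best_f1_threshold` are hypothetical; in practice the same scan is usually done over the thresholds returned by a precision–recall curve routine.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 score when cases with probability >= threshold are predicted positive."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, y_prob) if p < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(y_true, y_prob):
    """Try each distinct predicted probability as a candidate threshold."""
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_prob, t))


# Invented out-of-fold predictions for eight hypothetical children
y_true = [1, 1, 1, 0, 1, 0, 1, 0]
y_prob = [0.9, 0.8, 0.75, 0.7, 0.65, 0.4, 0.35, 0.2]

threshold = best_f1_threshold(y_true, y_prob)
best_f1 = f1_at_threshold(y_true, y_prob, threshold)
```

Because the threshold is chosen on the same out-of-fold predictions used to report performance, the resulting metrics are likely to be somewhat optimistic compared with a threshold fixed in advance.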
To improve model interpretability, SHapley Additive exPlanations (SHAP) analysis was performed to quantify the contribution and directional impact of individual pre–post change variables on model predictions. SHAP is a game-theoretic approach that enables transparent interpretation of complex ensemble models [12,13].
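To make the game-theoretic idea concrete, the sketch below computes exact Shapley values for a toy three-feature scoring rule by enumerating every feature ordering. It does not use the SHAP library and is not the study's model: the SHAP tree explainer relies on a far more efficient algorithm and a background data distribution, whereas this example uses a single all-zero baseline and an invented scoring function.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    Features are switched from their baseline value to their observed value
    one at a time, and each feature's marginal contribution is averaged
    over all possible orderings.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]
            value = predict(current)
            totals[i] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]


def toy_risk_score(features):
    # Hypothetical scoring rule with one interaction term (illustration only)
    d_fluor, d_restored, d_plaque = features
    return 0.3 * d_fluor + 0.2 * d_restored + 0.1 * d_fluor * d_plaque


x = [1.0, 1.0, 2.0]         # invented standardized changes for one child
baseline = [0.0, 0.0, 0.0]  # reference point: no change in any variable
phi = shapley_values(toy_risk_score, x, baseline)
```

The contributions sum to the difference between the prediction for `x` and the prediction at the baseline, and the interaction term is shared equally between the two features involved. This additive property is what allows per-variable contributions to be ranked and plotted.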

3. Results

3.1. Distribution of Disease Indicators, Risk Factors, and Protective Factors at Pre- and Post-Intervention

The distributions of CAMBRA-based disease indicators, risk factors, and protective factors at pre- and post-intervention are presented in Table 1. Frequencies and percentages were calculated to describe changes over the 12-month period.
Among the disease indicators, the proportion of children with restorations present (past caries experience for the child) increased from 33.6% at pre-intervention to 67.2% at post-intervention, whereas the proportion with obvious white spots, decalcifications, or enamel defects decreased from 19.3% to 5.0%. The proportion of children with obvious plaque on the teeth and/or gums that bleed easily remained high at both pre-intervention (85.7%) and post-intervention (84.0%).
Among risk factors, caregiver caries experience and frequent sugar intake decreased from 38.7% to 28.6%. Among protective factors, the proportion of children brushing at least twice daily with fluoride toothpaste increased from 84.9% to 90.8%, fluoride varnish application within the previous 6 months increased from 32.8% to 65.5%, and caregiver xylitol use increased from 67.2% to 86.6%.

3.2. Transitions in Caries Risk Categories from Pre- to Post-Intervention

Transitions in caries risk categories over the 12-month period are illustrated in Figure 1. Among children classified as low-risk at pre-intervention, 28.6% transitioned to high-risk at post-intervention. Among those classified as moderate-risk at pre-intervention, 42.9% transitioned to high risk. Conversely, 22.2% of children classified as extreme high-risk at pre-intervention transitioned to high-risk at post-intervention.

3.3. Predictive Performance of the Random Forest Model

The Random Forest model demonstrated acceptable discriminative performance for predicting post-intervention caries risk escalation. The ROC curve yielded an AUC of 0.773. Precision–recall analysis showed an average precision (AP) of 0.919.
Using a PR-optimized classification threshold of 0.698, the model achieved an accuracy of 0.798 and a balanced accuracy of 0.701. Precision and recall were 0.892 and 0.856, respectively, while specificity was 0.545. The F1 score at this threshold was 0.874 (Figure 2).
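The reported F1 score and balanced accuracy follow from the reported precision, recall, and specificity, as the short check below illustrates. Only the published, rounded values are used, so agreement is expected to within rounding error rather than exactly.

```python
# Reported values at the PR-optimized threshold of 0.698
precision = 0.892
recall = 0.856       # recall is the same quantity as sensitivity
specificity = 0.545

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

# Balanced accuracy is the mean of sensitivity and specificity
balanced_accuracy = (recall + specificity) / 2
```

Both derived values agree with the reported 0.874 and 0.701 to within rounding of the inputs.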

3.4. SHAP-Based Importance of Pre–Post Change Variables

SHAP analysis was performed to evaluate the contribution of the pre–post change variables to model predictions (Figure 3). Among all delta variables, change in fluorescence loss (ΔΔF) showed the highest importance (0.074), followed by change in restored teeth status (ΔD3; 0.068) and change in red fluorescence plaque area at the 70% threshold (ΔΔR70; 0.061).
Increases in ΔΔF, ΔD3, and ΔΔR70 contributed to classification into the high or extreme caries risk group. Among protective factors, change in caregiver caries-free status (ΔP6) demonstrated a moderate contribution, with decreases associated with higher risk classification and increases associated with lower risk classification.

4. Discussion

This study used 12-month longitudinal follow-up data collected through the CAMBRA-kids mobile application to predict transitions to high- or extreme-risk categories for dental caries and identify the determinants of these transitions. A Random Forest machine learning model, combined with SHapley Additive exPlanations (SHAP), was used to quantify the contribution of changes in clinical indicators and behavioral factors to caries risk transitions. This model also provided a clinically interpretable explanation of the predictive process.
Following the intervention, the overall mean levels of disease indicators and risk factors improved; however, a subset of children continued to transition to high or extreme caries risk categories. These findings suggest that although caries risk in early childhood may be reduced at the population level through intervention, risk transitions can persist at the individual level. The oral environment during early childhood is highly dynamic and can change rapidly in response to external factors such as caregiver management behaviors, dietary habits, and fluoride exposure, leading to temporal fluctuations in caries risk [17]. This highlights the limitation of relying solely on average intervention effects to explain individual risk trajectories and supports the necessity of approaches that incorporate pre–post change variables (delta variables). Accordingly, this study moved beyond traditional CAMBRA score-based or cross-sectional analyses [8] by examining how longitudinal biological and clinical changes contribute to caries risk transitions over time.
The Random Forest-based predictive model demonstrated acceptable overall discriminative performance, achieving an ROC–AUC of 0.773. Notably, despite the pronounced class imbalance in the dataset, with relatively few cases exhibiting transitions to higher-risk categories, the area under the precision–recall (PR) curve was high, with an average precision (AP) of 0.919. This indicates that the model achieved high precision and reliability in identifying true cases of caries risk escalation. These findings support the use of a machine learning approach integrating clinical indicators with data collected through the CAMBRA-kids mobile platform as a digital healthcare tool for the early prediction of oral health deterioration and the support of preventive interventions in preschool children [20].
The classification threshold, selected by maximizing the F1 score on the precision–recall curve, favored recall over specificity. This emphasis is consistent with the clinical judgment that failing to identify children at high risk (false negatives) poses a greater potential for harm than overclassifying low-risk children as high-risk (false positives) [21]. The relatively lower specificity indicates an increase in false-positive classifications; however, sensitivity and specificity are inherently subject to a trade-off in clinical prediction, and previous studies have recommended prioritizing sensitivity when the clinical cost of false negatives exceeds that of false positives [22]. From this perspective, supplementary procedures, including closer follow-up and additional assessments, could mitigate the adverse effects of misclassification in real-world clinical applications.
Analysis of feature importance using SHAP revealed that ΔΔF, ΔD3, and ΔΔR70 were the most influential predictors of transitions to high-risk categories after the intervention. ΔΔF exerted the strongest impact on model predictions and may be considered a clinically meaningful indicator. ΔF denotes enamel fluorescence loss measured using quantitative light-induced fluorescence (QLF), and ΔΔF denotes its change from baseline to the 12-month follow-up. QLF can reflect subtle progression of early mineral loss with a high degree of sensitivity [23]. Previous studies have reported that ΔF can distinguish between progression and arrest of demineralization prior to cavitation [24]. In line with these findings, the present results suggest that caries risk transitions do not occur abruptly at the stage of clinically evident lesions but rather emerge along a biological continuum characterized by the cumulative progression of early demineralization over time.
Restored teeth indicate prior caries experience, which is captured here as the change in the corresponding disease indicator (ΔD3). The identification of ΔD3 as a key predictor suggests exposure to a high-risk environment in which caries has already developed. Previous cohort studies [16] and analyses of the CAMBRA-students mobile application in adolescent populations [5] have similarly reported that changes in disease indicators, such as prior caries or restoration experience, are strong predictors of subsequent transitions to high-risk categories. The findings of this study indicate that past caries experience functions not merely as a record of treatment history but as an accumulated marker of disease activity and sustained risk exposure.
The emergence of ΔΔR70 as a highly influential variable indicates that increases in caries risk are more closely associated with qualitative changes in dental plaque accumulated over time than with transient fluctuations in oral hygiene status [16]. Together with early demineralization changes (ΔΔF), this finding supports the notion that transitions to high-risk categories occur along a continuum of caries development, involving plaque maturation, increased acid production, mineral loss, and subsequent disease activation. This interpretation is consistent with Spatafora et al., who described caries development as a progressive biological process rather than a discrete clinical event [25].
This study has several limitations. As a retrospective cohort study involving preschool children who participated in a single intervention program, the sample size was limited, and the number of cases exhibiting upward risk transitions was relatively small, resulting in an imbalanced data structure. Consequently, the prioritization of recall in threshold selection led to a trade-off with lower specificity. Nevertheless, the study provides both academic and clinical value by moving beyond single-time-point scoring or cross-sectional analyses and by predicting and explaining caries risk transitions using pre–post change variables. Future longitudinal cohort studies with extended follow-up periods are needed to evaluate the external generalizability of change-based predictive models and to assess their applicability across diverse clinical settings.
Based on our findings, we propose several implications for clinical practice and future research. In clinical practice, oral healthcare providers should prioritize the longitudinal monitoring of clinical indicators such as ΔΔF, which reflects mineral loss of the tooth surface, and ΔΔR70, which reflects dental plaque maturity, as these measures may capture cumulative biological changes associated with caries risk progression. Future research should further investigate whether the provision of explainable artificial intelligence (XAI)-based feedback through mobile health platforms can influence caregiver behaviors and improve long-term oral health outcomes across diverse clinical settings.

5. Conclusions

This study used multidimensional data collected through the CAMBRA-kids mobile application to predict which preschool children would transition to high- or extreme-risk categories for dental caries following a 12-month intervention. The study also aimed to identify the key contributing factors underlying such transitions.
The Random Forest-based predictive model demonstrated acceptable discriminative performance, achieving an ROC–AUC of 0.773 and an average precision (AP) of 0.919 in identifying cases of caries risk escalation. The change-based approach, which focuses on pre–post delta variables, was particularly useful in explaining risk transitions at the individual level that are obscured by average intervention effects in conventional point-in-time classifications. SHAP analysis identified change variables such as ΔΔF, ΔD3, and ΔΔR70 as the most influential contributors to transitions to high-risk categories. These findings suggest that the escalation of caries risk reflects cumulative biological and clinical changes rather than transient behavioral fluctuations.
These results demonstrate that change-based approaches incorporating explainable artificial intelligence can facilitate the early identification of high-risk children and provide a robust foundation for developing personalized preventive strategies.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/future4020015/s1, Table S1: STROBE checklist.

Author Contributions

Conceptualization, S.-Y.L., Y.-M.K. and A.-N.Y.; methodology, Y.-M.K.; software, Y.-M.K.; validation, S.-Y.L., Y.-M.K. and A.-N.Y.; formal analysis, Y.-M.K.; investigation, Y.-M.K. and A.-N.Y.; resources, Y.-M.K. and A.-N.Y.; data curation, Y.-M.K. and A.-N.Y.; writing—original draft preparation, Y.-M.K. and A.-N.Y.; writing—review and editing, S.-Y.L., Y.-M.K. and A.-N.Y.; visualization, Y.-M.K.; supervision, S.-Y.L.; project administration, S.-Y.L. and A.-N.Y.; funding acquisition, S.-Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and was approved by the Institutional Review Board of the Namseoul University (protocol code: NSU-1041479-007, approved 11 June 2019).

Informed Consent Statement

Due to the retrospective nature of the study and the use of anonymized data, patient consent was waived, as there was no direct interaction with participants.

Data Availability Statement

The raw data supporting the conclusions of this article are available on request from the corresponding author. They are not publicly available due to ethical restrictions and the need to protect the privacy of pediatric participants.

Acknowledgments

The authors acknowledge the use of Google Colab, a cloud-based Python (version 3.12) computing environment, for data preprocessing, machine learning model implementation, and visualization. The platform supported the development of the Random Forest models and SHAP-based interpretability analyses used in this study.

Conflicts of Interest

Authors Yu-Min Kang and An-Na Yeo are employed by the company Carius Dental Clinic, 618th Dental Company (AS). The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
AP: Average Precision
AUC: Area Under the Curve
CAMBRA: Caries Management by Risk Assessment
ICT: Information and Communication Technology
IRB: Institutional Review Board
mHealth: Mobile Health
PR curve: Precision–Recall Curve
QLF: Quantitative Light-Induced Fluorescence
ROC curve: Receiver Operating Characteristic Curve
SHAP: SHapley Additive exPlanations
SHS: Simple Hygiene Score
XAI: Explainable Artificial Intelligence

References

  1. Collins, F.S.; Varmus, H. A new initiative on precision medicine. N. Engl. J. Med. 2015, 372, 793–795. [Google Scholar] [CrossRef]
  2. Kimon, D. Precision dentistry in early childhood: The central role of genomics. Dent. Clin. N. Am. 2017, 61, 619–625. [Google Scholar] [CrossRef]
  3. Pitts, N.B.; Zero, D.T.; Marsh, P.D.; Ekstrand, K.; Weintraub, J.A.; Ramos-Gomez, F.; Ismail, A. Dental caries. Nat. Rev. Dis. Primers 2017, 3, 17030.
  4. Featherstone, J.D.; Domejean-Orliaguet, S.; Jenson, L.; Wolff, M.; Young, D.A. Caries risk assessment in practice for age 6 through adult. J. Calif. Dent. Assoc. 2007, 35, 703–710.
  5. Kang, Y.M.; Yeo, A.Y.; Lee, S.Y. Analysis of predictive factors for dental caries risk among adolescents using the random forest algorithm. J. Korean Soc. Dent. Hyg. 2025, 25, 323–333.
  6. Zaborskis, A.; Razmienė, J.; Razmaitė, A.; Andruškevičienė, V.; Narbutaitė, J.; Bendoraitienė, E.A.; Kavaliauskienė, A. Twelve-year changes in preschoolers’ oral health and parental involvement in children’s dental care: Results from two repeated cross-sectional surveys in Lithuania. Children 2024, 11, 1380.
  7. Ajay, K.; Azevedo, L.B.; Haste, A.; Morris, A.J.; Giles, E.; Gopu, B.P.; Subramanian, M.P.; Zohoori, F.V. App-based oral health promotion interventions on modifiable risk factors associated with early childhood caries: A systematic review. Front. Oral Health 2023, 4, 1125070.
  8. Yeo, A.Y.; Lee, S.Y. Effect of dental caries management using ‘CAMBRA-kids’ mobile application for children under 5 years old. Int. J. Dent. Hyg. 2022, 20, 443–452.
  9. Lim, H.J. A step-by-step guide to random forest modeling using Orange data mining in the field of periodontitis. J. Korean Acad. Oral Health 2021, 45, 218–226.
  10. Lee, H.C.; Park, M.B.; Won, Y.J. Artificial intelligence–based machine learning prediction of diabetes in older adults in South Korea: A cross-sectional analysis. JMIR Form. Res. 2025, 9, e57874.
  11. Shin, S.B.; Cho, H.J. Correlated variable importance for random forests. Korean J. Appl. Stat. 2021, 34, 177–190.
  12. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4765–4774.
  13. Salih, A.M.; Raisi-Estabragh, Z.; Galazzo, I.B.; Radeva, P.; Petersen, S.E.; Lekadir, K.; Menegaz, G. A perspective on explainable artificial intelligence methods: SHAP and LIME. Adv. Intell. Syst. 2025, 7, 2400304.
  14. Lee, J.B.; Choi, D.H.; Mah, Y.J.; Pang, E.K. Validity assessment of quantitative light-induced fluorescence-digital (QLF-D) for the dental plaque scoring system: A cross-sectional study. BMC Oral Health 2018, 18, 187.
  15. Han, S.Y.; Kim, B.R.; Ko, H.Y.; Kwon, H.K.; Kim, B.I. Assessing the use of quantitative light-induced fluorescence-digital as a clinical plaque assessment. Photodiagnosis Photodyn. Ther. 2016, 13, 34–39.
  16. Hummel, R.; van der Sanden, W.; Bruers, J.; van der Heijden, G. The relationship between claimed restorations and future restorations in children and adolescents: An observational follow-up study on risk categories for dental caries. PLoS ONE 2021, 16, e0259495.
  17. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  18. Saito, T.; Rehmsmeier, M. The precision–recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 2015, 10, e0118432.
  19. Boyd, K.; Eng, K.H.; Page, C.D. Area under the precision–recall curve: Point estimates and confidence intervals. Mach. Learn. 2013, 92, 241–261.
  20. Cabral, M.B.B.S.; Mota, E.L.A.; Cangussu, M.C.T.; Vianna, M.I.P.; Floriano, F.R. Risk factors for caries-free time: A longitudinal study in early childhood. Rev. Saude Publica 2017, 51, 118.
  21. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6.
  22. Marioriyad, A.; Ramazi, P. Optimizing accuracy, recall, specificity, and precision using ILP. Mathematics 2025, 13, 1059.
  23. Bratthall, D.; Hänsel Petersson, G.; Sundberg, H. Reasons for the caries decline: What do the experts believe? Eur. J. Oral Sci. 1996, 104, 416–422.
  24. Pretty, I.A.; Ellwood, R.P. The caries continuum: Opportunities to detect, treat and monitor the remineralization of early caries lesions. J. Dent. 2013, 41, S12–S21.
  25. Spatafora, G.; Li, Y.; He, X.; Cowan, A.; Tanner, A.C.R. The evolving microbiome of dental caries. Microorganisms 2024, 12, 121.
Figure 1. Changes in the caries risk group.
Figure 2. Receiver operating characteristic (ROC) and precision–recall (PR) curves of the final Random Forest model. The dashed orange line represents the performance of a random classifier (AUC = 0.5).
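As a reading aid for the two panels, note that the chance-level reference of AUC = 0.5 applies to the ROC curve; for a PR curve, the corresponding chance level is the prevalence of the positive class. The following is a minimal sketch in plain Python, not the authors' analysis code. The synthetic labels, the sample size, and the prevalence of 0.8 are hypothetical values chosen only for illustration, and the helper names `roc_auc` and `average_precision` are assumptions of this sketch.

```python
import random


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Average precision: mean of the precision values at each positive's rank.

    Tied scores are ordered arbitrarily here, so results can differ slightly
    from library implementations when ties are present.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical imbalanced sample: about 80% positives, scored by pure noise.
rng = random.Random(0)
n, prevalence = 2000, 0.8
y = [1 if rng.random() < prevalence else 0 for _ in range(n)]
noise_scores = [rng.random() for _ in range(n)]

auc_random = roc_auc(y, noise_scores)
ap_random = average_precision(y, noise_scores)

print(f"random-score ROC-AUC:           {auc_random:.3f}")
print(f"random-score average precision: {ap_random:.3f}")
print(f"observed positive prevalence:   {sum(y) / n:.3f}")
```

With uninformative scores, the ROC-AUC stays close to 0.5 while the average precision stays close to the positive prevalence, so an average precision value on an imbalanced dataset is best judged against that prevalence rather than against 0.5.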
Figure 3. SHAP summary plot of delta variables.
Table 1. Distribution of disease indicators, risk factors, and protective factors for dental caries assessed at pre- and post-intervention.
| Category | Variable | Description | Pre n (%) | Post n (%) |
|---|---|---|---|---|
| Disease indicators | D1 | Obvious white spots, decalcifications, or enamel defects | 23 (19.3%) | 6 (5.0%) |
| | D2 | Obvious decay present on the child’s teeth | 54 (45.4%) | 44 (37.0%) |
| | D3 | Restorations present (past caries experience for the child) | 40 (33.6%) | 80 (67.2%) |
| | D4 | Plaque is obvious on the teeth and/or gums bleed easily | 102 (85.7%) | 100 (84.0%) |
| | D5 | Visually inadequate saliva flow | 0 (0.0%) | 4 (3.4%) |
| Risk factors | R1 | Mother or primary caregivers had active dental decay in the past 12 months | 46 (38.7%) | 34 (28.6%) |
| | R2 | Frequent (>3 times/day) between-meal snacks of sugars/cooked starch/sugared beverages | 46 (38.7%) | 34 (28.6%) |
| | R3 | Saliva-reducing factors are present, including medications (e.g., some for asthma [albuterol] or hyperactivity), medical (cancer treatment) or genetic factors | 5 (4.2%) | 6 (5.0%) |
| | R4 | Child has developmental problems/CSHCN (child with special health care needs) | 4 (3.4%) | 3 (2.5%) |
| | R5 | What kinds of insurance do you have currently? | 15 (12.6%) | 10 (8.4%) |
| Protective factors | P1 | Use of fluoridated water or fluoride supplements | 6 (5.0%) | 5 (4.2%) |
| | P2 | Teeth brushed with fluoride toothpaste (pea size) at least 2× daily | 101 (84.9%) | 108 (90.8%) |
| | P3 | Fluoride varnish in last 6 months | 39 (32.8%) | 78 (65.5%) |
| | P4 | Mother/caregiver chews/dissolves xylitol chewing gum/lozenges 2–4× daily | 80 (67.2%) | 103 (86.6%) |
| | P5 | Child has a dental home and regular dental care | 8 (6.7%) | 30 (25.2%) |
| | P6 | Mother/caregiver decay-free for last three years | 71 (59.7%) | 73 (61.3%) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kang, Y.-M.; Yeo, A.-N.; Lee, S.-Y. Machine Learning-Based Prediction of Dental Caries Risk in Preschool Children Using Data from the CAMBRA-Kids Mobile Application. Future 2026, 4, 15. https://doi.org/10.3390/future4020015

AMA Style

Kang Y-M, Yeo A-N, Lee S-Y. Machine Learning-Based Prediction of Dental Caries Risk in Preschool Children Using Data from the CAMBRA-Kids Mobile Application. Future. 2026; 4(2):15. https://doi.org/10.3390/future4020015

Chicago/Turabian Style

Kang, Yu-Min, An-Na Yeo, and Su-Young Lee. 2026. "Machine Learning-Based Prediction of Dental Caries Risk in Preschool Children Using Data from the CAMBRA-Kids Mobile Application" Future 4, no. 2: 15. https://doi.org/10.3390/future4020015

APA Style

Kang, Y.-M., Yeo, A.-N., & Lee, S.-Y. (2026). Machine Learning-Based Prediction of Dental Caries Risk in Preschool Children Using Data from the CAMBRA-Kids Mobile Application. Future, 4(2), 15. https://doi.org/10.3390/future4020015
