Artificial Intelligence for Color Prediction and Esthetic Design in CAD/CAM Ceramic Restorations: A Systematic Review and Meta-Analyses

Ardila, Carlos M.; Pulgarín-Medina, Diana María; Pineda-Vélez, Eliana; Vivares-Builes, Anny M.

doi:10.3390/prosthesis7060160

Open Access Systematic Review


by Carlos M. Ardila ^1,2,*, Diana María Pulgarín-Medina ^3, Eliana Pineda-Vélez ^2,4 and Anny M. Vivares-Builes ^2,4

^1 Department of Periodontics, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai 600077, India

^2 Biomedical Stomatology Research Group, Basic Sciences Department, Faculty of Dentistry, Universidad de Antioquia (UdeA), Medellín 050010, Colombia

^3 Faculty of Dentistry, Colegio Odontológico Colombiano, Cali 760033, Colombia

^4 Faculty of Dentistry, Institución Universitaria Visión de las Américas, Medellín 050040, Colombia

^* Author to whom correspondence should be addressed.

Prosthesis 2025, 7(6), 160; https://doi.org/10.3390/prosthesis7060160

Submission received: 28 October 2025 / Revised: 26 November 2025 / Accepted: 30 November 2025 / Published: 4 December 2025

(This article belongs to the Special Issue Shaping the Future: Artificial Intelligence in Prosthodontics and Prosthesis Innovation)


Abstract

Background/Objectives: Artificial intelligence (AI) is increasingly embedded in CAD/CAM workflows to address persistent challenges in restorative dentistry, including unpredictable color outcomes and time-intensive crown design steps. Yet, evidence on its accuracy and efficiency remains fragmented across heterogeneous study designs and metrics. This systematic review with meta-analyses aimed to evaluate the accuracy and performance of AI for color prediction and automated crown design in CAD/CAM ceramics. Methods: A systematic review with random-effects meta-analyses. The outcomes included design time, internal fit, finish-line accuracy, color-prediction acceptability using ΔE₀₀ (AT₀₀), morphology deviation, and occlusal and proximal contacts. Results: Fifteen studies met the inclusion criteria. The meta-analyses showed that AI-equipped CAD reduced crown design time compared to conventional CAD (MD −88.7 s; 95% CI −134.5 to −42.9; I² = 72%). Internal fit showed a small advantage for AI in the fixed-effect model (MD −17.1 µm; 95% CI −26.2 to −7.9; I² = 90%). For finish-line identification, the pooled mean Hausdorff distance was ~0.35 mm (95% CI 0.316–0.382; I² = 0%). For color prediction, the pooled proportion of predictions within each study’s prespecified acceptability threshold (AT₀₀) was near-universal (0.996; 95% CI 0.988–0.999; I² = 0%). Morphology and functional contacts were not pooled because of incompatible metrics and units. Narrative synthesis indicated that AI performance was comparable or superior to conventional/technician workflows in selected regions. Conclusions: AI for CAD/CAM dentistry shows practical promise, most clearly for design-time efficiency, with encouraging signals for internal fit, finish-line identification, and color-prediction acceptability under study thresholds. However, clinical translation should proceed cautiously.

Keywords:

artificial intelligence; CAD/CAM; crowns; CIEDE2000; morphology; occlusion; finish-line

1. Introduction

The pursuit of natural esthetics in restorative dentistry has driven the widespread adoption of computer-aided design and computer-aided manufacturing (CAD/CAM) technologies [1,2]. Ceramic materials fabricated through CAD/CAM have become the standard of care owing to their superior esthetics, mechanical properties, and biocompatibility [3,4]. However, two clinical challenges persist: accurately predicting the final color of restorations and efficiently designing crowns with harmonious morphology, occlusion, and proximal contacts [5,6]. Even with standardized ceramic blocks, the interplay of material translucency, restoration thickness, background substrate, and clinical handling frequently produces unpredictable shade outcomes, necessitating remakes or adjustments that increase cost and chairside time [7,8].

Artificial intelligence (AI), and particularly machine-learning (ML) approaches, have emerged as powerful tools to address these challenges. Recent research has demonstrated that ML algorithms can predict the final color outcome of ceramic restorations with higher accuracy than conventional visual or instrumental methods [9,10,11]. Studies using partial least squares regression, neural networks, and fusion models have reported prediction errors within clinically acceptable thresholds when predicting the final appearance of leucite-reinforced and lithium disilicate ceramics [9,11]. These errors are expressed as ΔE₀₀, the CIEDE2000 color-difference formula, which quantifies the perceptual difference between two colors; lower values indicate closer color matching, and clinically accepted thresholds commonly range from 1.0 to 1.8 depending on the context. For example, Kose et al. [9] demonstrated that ML models can predict the final color of leucite-reinforced CAD/CAM ceramic restorations with average ΔE₀₀ errors (~0.38) well below clinical acceptability thresholds (AT₀₀ ≈ 1.77–1.81). Similarly, Yang et al. [11] validated a fusion ML model that predicts color and minimal ceramic thickness across different backgrounds, confirming reliable accuracy against spectrophotometric standards.
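As a concrete reference for the metric, the CIEDE2000 difference between two CIE L*a*b* measurements can be computed as sketched below. This is a minimal Python sketch of the standard CIEDE2000 equations, assuming the default parametric factors kL = kC = kH = 1; the function name `ciede2000` is illustrative and is not taken from any of the included studies, which may have relied on other software.

```python
import math


def _cosd(deg):
    """Cosine of an angle given in degrees."""
    return math.cos(math.radians(deg))


def ciede2000(lab1, lab2, kL=1.0, kC=1.0, kH=1.0):
    """CIEDE2000 colour difference between two CIE L*a*b* triplets."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2

    # 1) Rescale a* with the chroma-dependent factor G, then recompute C' and h'.
    c_bar = (math.hypot(a1, b1) + math.hypot(a2, b2)) / 2.0
    g = 0.5 * (1.0 - math.sqrt(c_bar**7 / (c_bar**7 + 25.0**7)))
    a1p, a2p = (1.0 + g) * a1, (1.0 + g) * a2
    c1p, c2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360.0
    h2p = math.degrees(math.atan2(b2, a2p)) % 360.0

    # 2) Lightness, chroma, and hue differences (hue difference wrapped to [-180, 180]).
    d_lp = L2 - L1
    d_cp = c2p - c1p
    if c1p * c2p == 0.0:
        d_hp = 0.0
    else:
        d_hp = h2p - h1p
        if d_hp > 180.0:
            d_hp -= 360.0
        elif d_hp < -180.0:
            d_hp += 360.0
    d_big_hp = 2.0 * math.sqrt(c1p * c2p) * math.sin(math.radians(d_hp / 2.0))

    # 3) Mean values, weighting functions S_L, S_C, S_H, and the rotation term R_T.
    l_bar = (L1 + L2) / 2.0
    c_bar_p = (c1p + c2p) / 2.0
    if c1p * c2p == 0.0:
        h_bar = h1p + h2p
    elif abs(h1p - h2p) <= 180.0:
        h_bar = (h1p + h2p) / 2.0
    elif h1p + h2p < 360.0:
        h_bar = (h1p + h2p + 360.0) / 2.0
    else:
        h_bar = (h1p + h2p - 360.0) / 2.0

    t = (1.0 - 0.17 * _cosd(h_bar - 30.0) + 0.24 * _cosd(2.0 * h_bar)
         + 0.32 * _cosd(3.0 * h_bar + 6.0) - 0.20 * _cosd(4.0 * h_bar - 63.0))
    d_theta = 30.0 * math.exp(-(((h_bar - 275.0) / 25.0) ** 2))
    r_c = 2.0 * math.sqrt(c_bar_p**7 / (c_bar_p**7 + 25.0**7))
    s_l = 1.0 + 0.015 * (l_bar - 50.0) ** 2 / math.sqrt(20.0 + (l_bar - 50.0) ** 2)
    s_c = 1.0 + 0.045 * c_bar_p
    s_h = 1.0 + 0.015 * c_bar_p * t
    r_t = -math.sin(math.radians(2.0 * d_theta)) * r_c

    dl = d_lp / (kL * s_l)
    dc = d_cp / (kC * s_c)
    dh = d_big_hp / (kH * s_h)
    # max() guards against a tiny negative value from floating-point rounding.
    return math.sqrt(max(0.0, dl * dl + dc * dc + dh * dh + r_t * dc * dh))


# Hypothetical predicted vs. measured shade (not data from the included studies).
predicted, measured = (72.1, 1.5, 18.3), (71.4, 2.0, 19.6)
delta_e00 = ciede2000(predicted, measured)
print(f"dE00 = {delta_e00:.2f}; within an AT00 of 1.8: {delta_e00 <= 1.8}")
```

A prediction would count as clinically acceptable when its ΔE₀₀ falls at or below the chosen acceptability threshold (for example, the AT₀₀ ≈ 1.8 cited above).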

More recently, Mascaro et al. [10] confirmed the feasibility of regression models for predicting the basic color coordinates of CAD/CAM ceramics, namely lightness (L*) and the red–green (a*) and yellow–blue (b*) components, with deviations that remained within clinically acceptable thresholds.

Beyond color prediction, AI has been applied to automated crown design, with promising results in morphology reproduction, occlusal surface generation, proximal contact accuracy, and margin detection [12,13,14]. Cho et al. [6] reported that deep learning-based software achieved crown morphology, internal fit, and occlusal contact accuracy comparable to those of technician-designed CAD/CAM crowns. Ding et al. [12] introduced a 3D-DCGAN approach to crown design that achieved greater 3D similarity to natural teeth, and stress distributions closer to those of natural teeth, than conventional CAD systems. Wu et al. [13] compared AI-powered design software with conventional CAD in vitro and found significantly shorter design times, but accuracy that varied with the software tested. Similarly, Choi et al. [14] validated a hybrid deep learning method for finish-line detection that yielded lower errors than existing CAD software, specifically 3Shape Dental System 2021-1, exocad DentalCAD 3.1, and MEDIT Margin Lines 1.0. These findings indicate that AI applications extend beyond esthetics to the reproducibility, efficiency, and functional optimization of CAD/CAM restorations.

Despite this growing body of literature, the available evidence remains highly heterogeneous, with substantial variation in ceramic materials, AI methodologies, evaluation metrics, and comparator workflows. This variability hampers direct comparison and prevents clinicians from drawing robust conclusions about the true advantages of AI systems over conventional methods in color prediction or crown design.

Importantly, no systematic review or meta-analysis has comprehensively synthesized the evidence on AI applications in both color prediction and esthetic design for CAD/CAM ceramics. A structured synthesis is, therefore, needed to clarify the extent to which AI improves accuracy, efficiency, and overall predictability in CAD/CAM restorative workflows.

To address this gap, the present systematic review and meta-analysis aims to quantitatively synthesize the accuracy and clinical performance of AI methods for color prediction of CAD/CAM ceramic restorations and for automated esthetic crown design. By integrating evidence from eligible experimental and clinical investigations, this review seeks to clarify whether AI outperforms conventional techniques in terms of accuracy (ΔE₀₀, RMS, and RMSE), efficiency (design time), and functional outcomes such as occlusion, internal fit, and finish-line detection. The specific objectives are (1) to evaluate the accuracy of AI models in predicting the final color of ceramic restorations compared with conventional methods; (2) to assess the performance of AI-based crown design in reproducing morphology, occlusal contacts, proximal contacts, margin detection, and internal fit compared with technician-designed or conventional CAD crowns; and (3) to determine whether AI-powered systems shorten design times and improve efficiency. This review also aims to highlight existing gaps, limitations, and future research directions in AI-driven restorative dentistry, to guide clinicians, educators, and developers in integrating these technologies into daily practice.

2. Materials and Methods

This systematic review and meta-analysis was conducted following the PRISMA 2020 guidelines [15]. The review protocol was prospectively developed and registered in the International Prospective Register of Systematic Reviews (CRD420251139474).

2.1. Eligibility Criteria

Studies were included if they met the following PICOS-based criteria:

Participants (P): In vitro, in silico with experimental validation, and clinical studies involving CAD/CAM ceramic restorations.
Intervention (I): Application of artificial intelligence (AI) or machine learning (ML) for color prediction of CAD/CAM ceramic restorations or automated esthetic crown design (morphology, occlusion, proximal contacts, finish-line detection, or internal fit).
Comparisons (C): Conventional or reference methods, such as spectrophotometry, spectroradiometry, colorimetry, technician-designed CAD/CAM crowns, or established commercial CAD software.
Outcomes (O): For color prediction: L*, a*, and b* coordinates, ΔE/ΔE₀₀ values, RMSE, mean absolute error (MAE), R² values, and thresholds of perceptibility/acceptability. For crown design: morphological accuracy (3D similarity, RMS deviation, Hausdorff distance, or Intersection over Union), occlusal and proximal contact accuracy, finish-line detection error, internal fit, and design time.
Study design (S): Original experimental or observational studies with quantitative results.

2.2. Exclusion Criteria

We excluded case reports, reviews, conference abstracts without full text, editorials, and letters. Studies were also excluded if they did not involve CAD/CAM ceramic restorations, applied AI methods without validation on dental data, or failed to report quantitative outcomes related to color prediction or crown design.

2.3. Information Sources and Search Strategy

Electronic searches were conducted in PubMed/MEDLINE, Embase, Scopus, and SciELO to identify eligible articles published up to September 2025. No language restrictions were applied. Additional records were retrieved by manually reviewing the reference lists of included studies and through citation tracking. The search strategy combined controlled vocabulary (MeSH and Emtree) and free-text terms related to artificial intelligence, machine learning, CAD/CAM ceramics, color prediction, and crown design. A representative PubMed strategy was: (“Artificial Intelligence” [MeSH] OR “Machine Learning” [MeSH] OR AI OR ML OR “deep learning” OR “neural network”) AND (“Computer-Aided Design” [MeSH] OR CAD OR CAD/CAM) AND (ceramic OR ceramics OR zirconia OR “lithium disilicate” OR leucite OR “glass-ceramic”) AND (“color prediction” OR “shade matching” OR “crown design” OR morphology OR occlusion OR “proximal contact” OR “finish line” OR “internal fit”). Complete search strategies for all databases are provided in Supplementary Table S1 and as part of the publicly available dataset (Supplementary File S1). After duplicate removal, potentially eligible records were screened by title and abstract. Full texts of articles meeting the inclusion criteria or considered uncertain were retrieved for detailed assessment. Studies excluded after full-text review, along with specific reasons, are summarized in Supplementary Table S2.

2.4. Selection Process

All records were imported into the Rayyan software (Rayyan Systems Inc., Doha, Qatar; version 1.0) for duplicate removal and screening. Two reviewers independently screened titles and abstracts, followed by full-text assessment for eligibility. Disagreements were resolved by discussion or consultation with a third reviewer.

2.5. Data Collection Process

Data extraction was performed independently by two reviewers using a standardized, prepiloted form. Inter-reviewer agreement was assessed using Cohen’s kappa (κ). Agreement was κ = 0.86 for title/abstract screening, κ = 0.91 for full-text assessment, and κ = 0.88 for data extraction, all within the range conventionally interpreted as almost perfect (κ > 0.80). Discrepancies were resolved through discussion or adjudication by a third reviewer. The following details were recorded: author, year, country, study design, sample characteristics, type of ceramic material, AI/ML algorithm, comparator, primary and secondary outcomes, and main findings. All extracted variables used for the qualitative synthesis, quantitative pooling, and forest plot construction are provided in the open data-extraction spreadsheet (Supplementary File S1) and summarized in Supplementary Table S3.
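Cohen’s κ compares the observed agreement between two reviewers with the agreement expected by chance from each reviewer’s marginal frequencies. The sketch below assumes the two reviewers’ decisions are stored as parallel Python lists; the function name and the include/exclude labels are illustrative, not the review’s actual screening data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in categories) / n**2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical screening decisions for 50 records.
reviewer_1 = ["include"] * 25 + ["exclude"] * 25
reviewer_2 = ["include"] * 20 + ["exclude"] * 5 + ["include"] * 10 + ["exclude"] * 15
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # 0.4
```

Here the reviewers agree on 70% of records, but half of that agreement would be expected by chance alone, so κ is 0.4 rather than 0.7.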

2.6. Data Items

From each study, we collected the characteristics of the study and participants, descriptions of AI methodologies (e.g., regression models, convolutional neural networks, and generative adversarial networks), the training/validation approach, comparator methods, and quantitative outcome measures (ΔE, ΔE₀₀, RMSE, MAE, and R² for color prediction; RMS deviation, Hausdorff distance, occlusal and proximal contacts, finish-line accuracy, internal fit, and design time for crown design). We also noted whether clinical relevance thresholds (e.g., ΔE₀₀ acceptability or margin fit standards) were applied.
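The color-prediction error metrics listed above can be computed per coordinate (for example, on L*) as in the following sketch. It assumes R² is the coefficient of determination (1 − SS_res/SS_tot) rather than a squared correlation; individual studies may define it differently, and the function name is hypothetical.

```python
import math


def regression_metrics(measured, predicted):
    """Return (MAE, RMSE, R^2) for predicted vs. measured values of one coordinate."""
    n = len(measured)
    residuals = [m - p for m, p in zip(measured, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mean_m = sum(measured) / n
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    r2 = 1.0 - ss_res / ss_tot  # coefficient of determination
    return mae, rmse, r2


# Hypothetical spectrophotometric vs. predicted L* values.
measured_L = [70.2, 72.5, 68.9, 74.1]
predicted_L = [70.6, 72.1, 69.4, 73.8]
print(regression_metrics(measured_L, predicted_L))
```

MAE and RMSE share the coordinate’s units, with RMSE penalizing occasional large errors more heavily; R² is unitless.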

2.7. Outcome Definitions

Design time: The total time (in seconds or minutes) required for the software or operator to generate a complete crown design, including automated steps and operator-adjusted phases when applicable.
Internal fit: The discrepancy (µm) between the intaglio surface of the designed crown and the prepared abutment surface, typically quantified using root-mean-square (RMS) error or equivalent gap-measurement techniques.
Finish-line accuracy: The deviation (mm) between the AI-detected margin line and the reference margin determined by experienced technicians or validated gold standards, most commonly expressed as mean Hausdorff distance.
Color-prediction acceptability: The proportion of AI-generated color predictions falling within each study’s prespecified CIEDE2000 acceptability threshold (AT₀₀), reflecting clinically acceptable shade matching.
Morphology deviation: The three-dimensional geometric discrepancy between the AI-generated crown and the reference model, measured using RMS deviation, Chamfer distance, or other 3D surface comparison metrics.
Functional contacts (occlusal and proximal): The presence, number, position, or intensity of simulated occlusal and proximal contacts, typically assessed through intermesh distance thresholds, contact-point loss metrics, or virtual articulating papers.
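The geometric metrics named in these definitions can be illustrated on point sets sampled from two surfaces or margin lines, as sketched below. This sketch assumes the surfaces are already aligned (e.g., after best-fit registration) and approximates point-to-surface distances with brute-force nearest-neighbor point-to-point distances; inspection software typically works on meshes, and Chamfer distance has several variants (this one averages unsquared distances in both directions). All names are illustrative.

```python
import math


def nearest_distances(source, target):
    """Distance from each source point to its nearest target point (brute force)."""
    return [min(math.dist(p, q) for q in target) for p in source]


def rms_deviation(test, reference):
    """Root-mean-square of test-to-reference nearest-neighbour distances."""
    d = nearest_distances(test, reference)
    return math.sqrt(sum(x * x for x in d) / len(d))


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the worst nearest-neighbour gap in either direction."""
    return max(max(nearest_distances(a, b)), max(nearest_distances(b, a)))


def chamfer(a, b):
    """Chamfer distance variant: mean nearest-neighbour distance a->b plus b->a."""
    d_ab, d_ba = nearest_distances(a, b), nearest_distances(b, a)
    return sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba)


# Hypothetical margin-line samples (mm) from an AI design and a reference design.
ai_margin = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.1, 0.0)]
ref_margin = [(0.0, 0.05, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(rms_deviation(ai_margin, ref_margin),
      hausdorff(ai_margin, ref_margin),
      chamfer(ai_margin, ref_margin))
```

RMS summarizes typical deviation, whereas Hausdorff distance reports the single largest gap, which explains why a few outlying points can dominate finish-line errors.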

2.8. Study Risk of Bias Assessment

Risk of bias was independently assessed by two reviewers. For in vitro and in silico studies, we applied the Joanna Briggs Institute (JBI) critical appraisal checklist [16]. For observational studies, the ROBINS-I tool was used as appropriate [17]. Domains included selection bias, classification of interventions, measurement of outcomes, missing data, and selective reporting. Disagreements were resolved by consensus.

2.9. Certainty of Evidence (GRADE Assessment)

The certainty of evidence for each outcome was assessed using the GRADE framework [18]. The five domains considered were risk of bias, inconsistency, indirectness, imprecision, and publication bias. Based on these, the certainty of evidence was rated as high, moderate, low, or very low for each outcome.

2.10. Data Synthesis and Statistical Analysis

All included studies were first described narratively. A quantitative synthesis (meta-analysis) was conducted when at least two studies provided methodologically comparable and sufficiently reported data. Because the included studies reported methodologically distinct outcomes, we conducted several independent meta-analyses (one per eligible outcome) rather than a single unified synthesis. Design time and internal fit were pooled as mean differences (MD), and finish-line accuracy was pooled as a mean Hausdorff distance. For color prediction, we pooled the proportion of predictions falling within each study’s prespecified CIEDE2000 acceptability threshold (AT₀₀) using a logit transformation. Morphology deviation and functional contacts were synthesized narratively because of incompatible metrics/units and heterogeneous regions of interest. Between-study heterogeneity was assessed with the I² and τ² statistics. Preplanned subgroup analyses considered ceramic type, AI/ML approach, and study setting. Sensitivity analyses were performed by excluding studies judged at high risk of bias.
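A minimal sketch of the pooling steps described above follows. The text does not name the between-study variance estimator, so the DerSimonian–Laird method is assumed here. Applied to the summary data reported in Sections 3.2 and 3.5 (and assuming 0.5 is added to every cell for the proportions), it returns values that match the reported design-time and color-acceptability estimates to rounding. Function names are illustrative.

```python
import math

Z95 = 1.959964  # 97.5th percentile of the standard normal distribution


def dersimonian_laird(effects, variances):
    """Inverse-variance random-effects pooling with the DerSimonian-Laird tau^2.

    Returns (pooled, ci_low, ci_high, tau2, i2_percent)."""
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - Z95 * se, pooled + Z95 * se, tau2, i2


def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Mean difference (group 1 minus group 2) and its sampling variance."""
    return m1 - m2, sd1**2 / n1 + sd2**2 / n2


def logit_proportion(events, n, cc=0.5):
    """Logit of a proportion and its variance, adding cc to both cells."""
    x, y = events + cc, (n - events) + cc
    return math.log(x / y), 1.0 / x + 1.0 / y


def expit(z):
    """Back-transform a logit to a proportion."""
    return 1.0 / (1.0 + math.exp(-z))


# Design time (s), AI minus conventional, from the summary data in Section 3.2.
design = [mean_difference(277.6, 89.7, 30, 337.3, 93.5, 30),
          mean_difference(146.8, 23.5, 66, 254.3, 46.7, 33)]
md, md_lo, md_hi, tau2, i2 = dersimonian_laird(*zip(*design))

# Color-prediction acceptability, from the counts in Section 3.5.
color = [logit_proportion(632, 634), logit_proportion(108, 108)]
p, p_lo, p_hi, _, _ = dersimonian_laird(*zip(*color))

print(round(md, 1), round(i2, 1))  # -88.7 72.2
print(round(expit(p), 3), round(expit(p_lo), 3), round(expit(p_hi), 3))  # 0.996 0.988 0.999
```

Pooling on the logit scale keeps the back-transformed interval inside 0–1, which matters when proportions sit close to 100%.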

Forest plots were generated to present the results visually, and funnel plots with Egger’s test were planned to assess publication bias where at least ten studies were available. Statistical analyses were conducted with Python (v3.11; Python Software Foundation, Wilmington, DE, USA) and Review Manager (RevMan v5.4; Cochrane, London, UK). A p-value < 0.05 was considered statistically significant. The full analytic scripts (R and Python) used to generate pooled estimates and plots are openly provided in Supplementary File S2.
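Egger’s test can be sketched as an ordinary least-squares regression of each study’s standardized effect (effect/SE) on its precision (1/SE); an intercept far from zero suggests small-study asymmetry. The sketch below is illustrative only: the function name and example data are hypothetical, and obtaining the p-value would additionally require a t-distribution with k − 2 degrees of freedom, omitted here to keep the example within the standard library.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression of y/SE on 1/SE.

    Returns (intercept, slope, se_intercept)."""
    k = len(effects)
    if k < 3:
        raise ValueError("Egger's regression needs at least three studies")
    z = [y / s for y, s in zip(effects, std_errors)]  # standard normal deviates
    prec = [1.0 / s for s in std_errors]              # precisions
    mean_x, mean_z = sum(prec) / k, sum(z) / k
    sxx = sum((x - mean_x) ** 2 for x in prec)
    sxz = sum((x - mean_x) * (v - mean_z) for x, v in zip(prec, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    # Residual variance with k - 2 degrees of freedom, then the usual OLS intercept SE.
    ss_res = sum((v - intercept - slope * x) ** 2 for x, v in zip(prec, z))
    s2 = ss_res / (k - 2)
    se_intercept = math.sqrt(s2 * (1.0 / k + mean_x**2 / sxx))
    return intercept, slope, se_intercept


# Hypothetical effects that drift upward as studies get smaller (larger SE).
ses = [0.1, 0.2, 0.5, 1.0, 0.3]
effects = [2.0, 2.3, 2.9, 3.4, 2.5]
print(egger_test(effects, ses))
```

The slope estimates the underlying effect, while the intercept captures any systematic shift among less precise studies.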

3. Results

The electronic database search retrieved 69 records. After the removal of 21 duplicates, 48 unique records remained for screening. Records automatically categorized by the databases as non-eligible document types (e.g., reviews, editorials, and commentaries) were excluded prior to manual title/abstract screening and are, therefore, shown as removed before screening in Figure 1. Of the remaining records, 28 were excluded based on title and abstract evaluation, and 20 full texts were assessed for eligibility. Following full-text review, 5 studies were excluded with reasons (Supplementary Table S2), resulting in 15 articles being included in the final synthesis and meta-analyses [6,9,10,11,12,13,14,19,20,21,22,23,24,25,26]. Figure 1 depicts the PRISMA 2020 flow diagram, showing record numbers at each stage.

3.1. Characteristics of Included Studies

As summarized in Table 1, the 15 included studies comprised three investigations on color prediction for CAD/CAM ceramics [9,10,11] and twelve studies on automated esthetic crown design [6,12,13,14,19,20,21,22,23,24,25,26]. No randomized controlled trials were identified among the eligible studies. The color-prediction studies were all in vitro and used spectrophotometric/spectroradiometric ground truth with AI models to predict CIE L*a*b* values and color differences (ΔE/ΔE₀₀). Methodologies used included multiple ML regressors (e.g., decision trees) [9], partial least-squares regression [10], and a fusion/stacking approach (ExtraTrees + XGBoost) with internal/external validation [11]. Clinical benchmarks varied by study: ΔE₀₀ perceptibility/acceptability thresholds were applied in Kose [9] and Mascaro [10], while ΔE (CIE76) perceptibility/acceptability criteria guided Yang’s minimum-thickness recommendations [11].

The crown-design body of evidence encompassed in vivo clinical datasets (anterior single-unit cases) [22], in vitro or retrospective laboratory datasets using intraoral scans and milled crowns [6,13,19,20,21,23,24,25], and in silico frameworks assessing generative pipelines and similarity metrics [12,26]. Approaches ranged from deep learning–based commercial CAD systems (Automate or Dentbird) evaluated against technician-designed CAD comparators [6,13,19,20,21,22,25] to knowledge-based AI within commercial software [23], and included generative adversarial networks (GAN/3D-DCGAN or two-stage DCPR-GAN) for biomimetic morphology generation, tested against ground-truth or state-of-the-art baselines [12,24,26]. The reported quantitative outcomes included morphology deviation (typically root mean square [RMS] or Chamfer distance), occlusal and proximal contacts, finish-line detection accuracy (Hausdorff or Chamfer distances) [14,19], internal fit, and design time [6,12,13,14,19,20,21,22,23,24,25,26]. Several design studies prespecified operational tolerances or acceptability criteria, for example, ±50 µm bands for morphology deviation and cement-space-based parameters for internal fit in technician/AI comparisons [6,25], threshold-based acceptability for finish-line errors [14,19], and standard references for marginal fit in micro-CT assessments [20].

Figure 2 provides an evidence map that summarizes which outcomes each included study reported and whether an explicit clinical threshold was applied. Given the heterogeneity of designs and metrics across studies, this visualization clarifies coverage versus gaps at a glance. Color-prediction metrics (ΔE/ΔE₀₀ and/or RMSE/MAE/R²) were reported by 3/15 studies [9,10,11]. Within automated crown design, morphological deviation (RMS) was most frequent (9/15), followed by Hausdorff distance (3/15) and Chamfer distance (2/15) [6,12,13,14,19,20,21,22,23,24,25,26]. Functional endpoints included occlusal contacts (8/15) and proximal contacts (4/15). Finish-line accuracy appeared in 3/15 studies [14,19,25], internal fit in 3/15 [6,20,25], and design time in 3/15 [13,20,25]. In total, 9/15 studies stated clinical relevance thresholds (e.g., ΔE/ΔE₀₀ acceptability, ±50 µm morphology bands, ≤120 µm marginal fit, or predefined limits for finish-line error) [6,9,10,11,13,14,19,20,25].

Figure 3 displays the distribution of study settings by domain, highlighting the relative weight of clinical versus laboratory/simulation evidence. For color prediction, all three studies were in vitro [9,10,11]. For crown design, most investigations were in vitro (8/12) [13,14,19,20,21,23,24,25], with two in vivo [6,22] and two in silico [12,26]. This visualization is included to emphasize the degree of clinical translatability of the evidence base and to contextualize heterogeneity in subsequent quantitative syntheses.

3.2. Design Time for Crown Design

For clarity, all forest plots were grouped consecutively and explicitly labeled as forest plots in the figure titles and captions. Across two methodologically comparable studies (AI vs. conventional CAD), AI-assisted software reduced crown design time, although the magnitude varied across settings. In Cho et al. [25], the total elapsed time for the entire design workflow (T6) was 277.6 ± 89.7 s with AI versus 337.3 ± 93.5 s with conventional CAD (n = 30 per arm). Wu et al. [13] reported median (IQR) times, which we converted to mean ± SD using the Wan/Luo estimators: combined AI (Automate + Dentbird) ≈ 146.8 ± 23.5 s versus experienced technician ≈ 254.3 ± 46.7 s (n = 66 vs. 33) (details in Supplementary Table S4). In the primary random-effects meta-analysis (k = 2), the pooled mean difference (AI − control) was −88.7 s (95% CI −134.5 to −42.9; I² = 72.2%), favoring AI (Figure 4). The reported 95% prediction interval (−134.5 to −42.9 s) is identical to the confidence interval; because heterogeneity was substantial and only two studies were pooled, the range of effects expected in future studies is likely wider than this interval suggests, although both included studies favored AI.
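The median-to-mean conversion mentioned above can be sketched as follows, assuming the scenario in which the median, first and third quartiles, and sample size are reported (Luo et al.’s weighted estimator for the mean and Wan et al.’s quantile-based estimator for the SD). The exact variants applied to Wu et al.’s data are detailed in Supplementary Table S4, so this sketch is illustrative; the function names and example quartiles are hypothetical.

```python
from statistics import NormalDist


def luo_mean(q1, median, q3, n):
    """Estimate a sample mean from the median and quartiles (Luo et al., 2018)."""
    w = 0.7 + 0.39 / n  # weight on the quartile midpoint approaches 0.7 as n grows
    return w * (q1 + q3) / 2.0 + (1.0 - w) * median


def wan_sd(q1, q3, n):
    """Estimate a sample SD from the interquartile range (Wan et al., 2014)."""
    eta = 2.0 * NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    return (q3 - q1) / eta


# Hypothetical design times (s) for one arm: median 150, IQR 135-165, n = 33.
print(round(luo_mean(135, 150, 165, 33), 1), round(wan_sd(135, 165, 33), 1))
```

Both estimators assume approximately normal underlying data; for large n the SD estimate tends to IQR/1.35.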

Nagata et al. [20] reported a minutes-scale “total elapsed design time” per tooth (AI 98.5 ± 13.6 min vs. conventional 456.8 ± 92.1 min; ~5 technicians across two teeth). Given its broader workflow definition and time scale, this study was treated as non-comparable and included only in a sensitivity analysis. Including Nagata yielded an unstable pooled estimate (MD −242.5 s; 95% CI −542.9 to +57.9; I² = 98.7%; k = 3) (Figure 5). The 95% prediction interval ranged from −542.9 to +57.9 s, reflecting the substantial heterogeneity introduced when broader workflow definitions are included.
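For reference, a commonly used formulation of the random-effects prediction interval (following Higgins and colleagues) is shown below next to the confidence interval, where μ̂ is the pooled random-effects estimate, SE(μ̂) its standard error, τ̂² the between-study variance, and k the number of studies. Because τ̂² enters the square root and the t quantile exceeds 1.96, the prediction interval is wider than the confidence interval whenever τ̂² > 0. The t quantile has k − 2 degrees of freedom, so it is undefined for k = 2 and equals about 12.71 for k = 3.

```latex
\mathrm{CI}_{95\%} = \hat{\mu} \pm 1.96\,\widehat{\mathrm{SE}}(\hat{\mu})
\qquad
\mathrm{PI}_{95\%} = \hat{\mu} \pm t_{k-2,\,0.975}\,\sqrt{\hat{\tau}^{2} + \widehat{\mathrm{SE}}(\hat{\mu})^{2}}
```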

3.3. Internal Fit

Two studies were comparable for the overall internal gap measured on finalized crowns [6,25] (Figure 6). Cho et al. [25] favored AI (MD −30.2 µm; 95% CI −42.4 to −18.0), whereas Cho et al. [6] showed no clear difference (MD −0.2 µm; 95% CI −14.1 to 13.6). The fixed-effect pooled estimate indicated a 17 µm reduction with AI (MD −17.1 µm; 95% CI −26.2 to −7.9), but between-study heterogeneity was high (I² ≈ 90%), and the random-effects sensitivity estimate was imprecise (MD −15.4 µm; 95% CI −44.8 to 14.0). With only k = 2 studies, these results are exploratory; publication-bias diagnostics were not performed, and the findings should be interpreted with caution. The reported 95% prediction interval (−44.8 to +14.0 µm) coincides with the random-effects confidence interval; even this range spans a substantial reduction in internal gap with AI to a slightly larger gap, and given the high heterogeneity, future results may vary more widely still.
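The fixed-effect estimate above can be reproduced from the study-level results alone by back-calculating each standard error from its 95% CI, SE = (upper − lower)/(2 × 1.96), and applying inverse-variance weights, as in this sketch. It assumes symmetric, normal-theory CIs; function names are illustrative.

```python
import math

Z = 1.959964  # 97.5th percentile of the standard normal distribution


def se_from_ci(lower, upper):
    """Back-calculate a standard error from a symmetric 95% CI."""
    return (upper - lower) / (2 * Z)


def fixed_effect(effects, ses):
    """Inverse-variance fixed-effect pooling; returns (pooled, ci_low, ci_high, I2 %)."""
    w = [1.0 / s**2 for s in ses]
    sw = sum(w)
    pooled = sum(wi * y for wi, y in zip(w, effects)) / sw
    se = math.sqrt(1.0 / sw)
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled - Z * se, pooled + Z * se, i2


# Study-level MDs (µm) and 95% CIs for internal fit, as reported in Section 3.3.
mds = [-30.2, -0.2]
ses = [se_from_ci(-42.4, -18.0), se_from_ci(-14.1, 13.6)]
pooled, lo, hi, i2 = fixed_effect(mds, ses)
print(round(pooled, 1), round(lo, 1), round(hi, 1), round(i2))  # -17.1 -26.2 -7.9 90
```

The large I² arises because the two study estimates differ by about 30 µm while each has a standard error of only 6–7 µm.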

3.4. Finish-Line Accuracy (Hausdorff Distance, mm)

Two studies reported the Hausdorff distance (HD, mm) for automated finish-line detection/design using Dentbird software (Imagoworks Inc., Seoul, South Korea) [14,19]. In Choi et al. [14], subgroup means (digital set vs. intraoral scan; anterior vs. posterior) were combined into a study-level estimate (n = 168, mean 0.343 mm, SD 0.239). Sawangsri et al. [19] reported an overall HD for Dentbird of 0.380 mm (SD 0.431); because 97 of 100 reconstructions were successful, we used n = 97 for pooling. A random-effects meta-analysis (k = 2) yielded a pooled mean HD of 0.349 mm (95% CI 0.316–0.382) with I² = 0% and τ² = 0.00, indicating negligible between-study heterogeneity (Figure 7). The 95% prediction interval ranged from 0.316 to 0.382 mm, consistent with the narrow dispersion expected under negligible between-study heterogeneity. In a sensitivity analysis using n = 100 for Sawangsri et al., the pooled mean was virtually identical (0.349 mm; 95% CI 0.316–0.382). In Sawangsri et al. [19], Automate achieved a lower HD (0.132 ± 0.057 mm; 99/100 successes), but because Choi et al. [14] did not report results for Automate, it was not pooled.
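The text does not state exactly how the subgroup summaries of Choi et al. were merged. One standard approach is the formula for combining groups given in the Cochrane Handbook, sketched below under that assumption. Applied repeatedly, it merges any number of subgroups; the function names and example values are hypothetical.

```python
import math
from functools import reduce


def combine_two(g1, g2):
    """Merge two (n, mean, sd) summaries as if their raw data were pooled."""
    n1, m1, s1 = g1
    n2, m2, s2 = g2
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    # Within-group variation plus the extra spread from the difference in means.
    var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2
           + n1 * n2 / n * (m1 - m2) ** 2) / (n - 1)
    return n, mean, math.sqrt(var)


def combine_groups(groups):
    """Fold any number of (n, mean, sd) subgroups into one study-level summary."""
    return reduce(combine_two, groups)


# Hypothetical HD subgroups (mm): anterior and posterior cases.
print(combine_groups([(80, 0.31, 0.20), (88, 0.37, 0.27)]))
```

Simply averaging the subgroup SDs would ignore the spread contributed by differing subgroup means; the last term in the variance accounts for it.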

3.5. Color-Prediction Acceptability

Two in vitro studies enabled a meta-analysis of the proportion of AI predictions falling within each study’s CIEDE2000 acceptability threshold [9,10]. Kose et al. [9] reported 632/634 (99.7%) predictions ≤ AT₀₀, with two outliers > 1.77. Mascaro et al. [10] stated that all predictions across four CAD/CAM materials, three thicknesses, and nine backgrounds were within an acceptable ΔE₀₀ range (108/108; 100%). A random-effects, logit-transformed model (with a 0.5 continuity correction) yielded a pooled proportion of 0.996 (95% CI 0.988–0.999; I² = 0%), indicating that nearly all predictions in both datasets fell within the study-specific acceptability thresholds (Figure 8). Given only two studies and differences in modeling pipelines, these estimates should be interpreted cautiously.

Because the included studies applied slightly different acceptability thresholds (AT₀₀ values ranging from 1.77 to 1.81), we conducted a sensitivity analysis using a unified threshold of AT₀₀ = 1.80. Under this harmonized cut-off, the pooled proportion of acceptable predictions remained very high (0.992; 95% CI 0.982–0.997; I² = 0%), confirming the robustness of the findings (Figure 9).

To enhance transparency, we also provide a forest-of-proportions figure showing per-study acceptability distributions according to each study’s original threshold (Supplementary Figure S1). This allows readers to visually compare color-acceptability performance without harmonization.

3.6. Morphology Deviation

Across the included crown-design studies, morphology was quantified with different 3D deviation metrics, reference models, and regions of interest, which limited quantitative pooling. A detailed mapping of measurement metrics, units, regions of interest, and clinical acceptability thresholds for all included studies is provided in Supplementary Table S5. Several investigations reported root-mean-square (RMS) surface deviation in micrometers after best-fit alignment, typically comparing AI-generated crowns with conventional/technician workflows. Cho et al. [6] assessed RMS over the entire external, occlusal, and axial surfaces, and also classified deviations using a ±50 µm tolerance band. Depending on the region and software, AI showed lower or comparable deviations from the technician reference. Nagata et al. [20] analyzed the difference in occlusal morphology (RMS, µm) between an AI and a non-AI CAD system and found significantly smaller deviations for the AI design across total, intraoral-scan, and cast-scan datasets. Cho et al. [25] likewise reported the occlusal morphology RMS (µm) using a triple-scan workflow, again indicating an advantage for the AI-based design over a conventional CAD system.

Other studies quantified morphology using point-to-surface distances in millimeters (Chamfer/Hausdorff-type or global RMS) and often referenced a different “ground truth.” Ding et al. [12] evaluated crowns produced by a 3D-DCGAN against a standard/reference crown with 3D deviation metrics, alongside mechanical testing. Chau et al. [24] reported on the accuracy of AI-designed single-molar prostheses using 3D surface deviation (mm) in a feasibility setting. Broll et al. [21] investigated how training data quantity affects morphology fidelity, again using deviation from a standard design. In a clinical application study, Çakmak et al. [22] judged morphologic fidelity together with occlusion and proximal contacts. Where 3D deviations were reported, units and regions differed from the RMS (µm) framework used in [6,20,25]. Wu et al. [13] also compared AI versus conventional CAD performance, reporting morphology and functional outcomes, but with metric definitions and reference models that do not directly match those above. Finally, Chen et al. [23] contrasted human- and knowledge-based AI designs for lithium disilicate crowns, analyzing morphology relative to a reference and linking deviations to fracture behavior.

Despite methodological diversity, the direction of effect is broadly consistent: AI-based designs tend to produce equal or lower 3D morphology deviations relative to conventional/technician workflows in controlled comparisons [6,20,25], while studies using millimeter-scale surface metrics also support good geometric fidelity of AI-generated crowns in laboratory and early clinical contexts [12,21,22,24]. However, the lack of standardized outcome definitions and units currently limits the strength of quantitative synthesis.

A direction-of-effect map (Figure 10) summarizes these findings despite heterogeneous metrics and workflows. Comparator studies cluster at lower or similar deviations for AI versus conventional/technician designs [6,20,25], whereas benchmark and feasibility studies using millimeter-scale surface metrics support the geometric fidelity of AI-generated crowns in laboratory and early clinical contexts [12,21,22,24]. This pattern aligns with the qualitative synthesis, while the heterogeneity of metrics it displays explains why a global meta-analysis was not attempted for the morphology outcome.

3.7. Occlusal and Proximal Contacts

Across studies that reported functional contact outcomes, definitions and measurement frameworks varied. Some analyses defined occlusal contact via an intermesh distance threshold (<20 µm) [13], others quantified the number, position, or distance losses of contact points [21], and several reported proximal contact length or contact-intensity distributions [20,22]; one study also used virtual articulating papers (100/200 µm) to estimate contact number and area [12]. Collectively, these reports indicate that AI-assisted workflows can establish occlusal and proximal contacts, although comparative data versus technician/conventional designs were inconsistently defined, and the workflows often allowed limited manual refinement. Because thresholds, regions of interest, and units were not standardized across studies, no quantitative pooling was attempted for contact outcomes.
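To illustrate why contact counts obtained under different criteria cannot simply be pooled, the following Python sketch applies several distance cutoffs to the same crown. It assumes, hypothetically, that each sampled point on the occlusal surface carries a signed distance (µm) to the antagonist, with negative values meaning the surfaces intermesh. Treating the 20 µm intermesh criterion and the 100/200 µm virtual articulating papers as plain cutoffs on that distance is a simplification for illustration, not a description of how any included study's software works; the function names and data are invented.

```python
def count_contacts(distances_um, threshold_um):
    """Count sampled points classified as occlusal contacts.

    Hypothetical convention: distances_um are signed distances (µm) from
    points on the crown surface to the antagonist surface; negative values
    mean the surfaces intermesh, positive values mean a gap. A point counts
    as a contact when its distance is strictly below threshold_um.
    """
    return sum(1 for d in distances_um if d < threshold_um)


def contact_counts_by_criterion(distances_um, thresholds_um):
    """Apply several contact criteria to the same crown for comparison."""
    return {t: count_contacts(distances_um, t) for t in thresholds_um}


if __name__ == "__main__":
    # Hypothetical per-point distances (µm) for a single crown.
    sample = [-35.0, -5.0, 12.0, 18.0, 45.0, 90.0, 150.0, 240.0]
    # 20 µm mimics an intermesh criterion; 100/200 µm mimic virtual paper thicknesses.
    print(contact_counts_by_criterion(sample, (20.0, 100.0, 200.0)))
```

With these made-up distances, the same crown yields 4, 6, or 7 contacts depending only on the criterion, which is the kind of definitional spread that ruled out quantitative pooling.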

Preplanned subgroup analyses (ceramic type, AI/ML approach, and study setting) were not conducted owing to the limited number of methodologically comparable studies per outcome and the heterogeneity of outcome definitions and units.

3.8. Study Risk of Bias Assessment

Across in vitro and in silico investigations assessed with the JBI checklists [16], aims and procedures were generally well described, yet recurrent limitations included lack of blinded outcome measurement, incomplete reporting of operator/instrument calibration or repeatability, and absence of a priori sample-size justification. In modeling studies, additional concerns related to potential overfitting, limited or absent external validation, and unclear separation of training/validation/test sets.
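One common safeguard against the leakage behind unclear training/validation/test separation is to split data at the patient (or case) level rather than the scan level. The Python sketch below shows that idea under stated assumptions: records are dictionaries, `group_key` names a hypothetical patient identifier, and the 70/15/15 proportions are arbitrary. It is not a procedure reported by the included studies.

```python
import random
from collections import defaultdict


def grouped_split(records, group_key, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split records into train/validation/test sets at the group level.

    All records sharing the same group_key value (e.g., a hypothetical
    patient identifier) land in exactly one set, so scans of the same
    patient cannot leak from training into validation or test data.
    """
    if len(fractions) != 3 or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must be three values summing to 1")
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)
    ids = sorted(groups)  # deterministic order before the seeded shuffle
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    parts = (ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:])
    # Expand each list of group ids back into the records it contains.
    return tuple([rec for g in part for rec in groups[g]] for part in parts)
```

For example, 20 hypothetical patients with two scans each split into 14, 3, and 3 patients, and no patient identifier appears in more than one set.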

For nonrandomized in vivo comparisons evaluated with ROBINS-I [17], the predominant issues were confounding (case selection and operator effects), deviations from intended interventions, and outcome assessment without blinding. Missing data were infrequent, whereas selective reporting was difficult to rule out.

Table 2 synthesizes the study-level judgments by tool and design and indicates that most laboratory and modeling studies were judged as raising “some concerns,” while the in vivo comparisons were typically judged to be at “moderate” risk of bias.

3.9. Certainty of Evidence (GRADE)

Across domains, downgrading was applied primarily for indirectness—reflecting the predominance of in vitro/in silico evidence with limited direct clinical applicability—and for imprecision, owing to small sample sizes and low numbers of methodologically comparable studies per outcome [18]. Where substantial between-study heterogeneity was present (notably in morphology metrics and, to a lesser extent, design time), inconsistency contributed to additional downgrading. Assessment of publication bias (funnel plots or Egger’s test) was not undertaken because fewer than ten studies were available per meta-analysis.
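For readers unfamiliar with the omitted test, the Python sketch below shows what Egger's regression computes: the standardized effect (effect/SE) is regressed on precision (1/SE) by ordinary least squares, and an intercept far from zero suggests small-study asymmetry. The ten-study convention cited above is encoded as a guard. This is a minimal illustration with hypothetical names, not the review's analysis code, and it returns a t statistic (k − 2 degrees of freedom) rather than a p-value.

```python
import math


def egger_test(effects, ses, min_studies=10):
    """Egger's regression test for funnel-plot asymmetry (sketch).

    Regresses effect / SE on 1 / SE by ordinary least squares and returns
    (intercept, standard error of the intercept, t statistic with k - 2 df).
    Refuses to run below min_studies, mirroring the rule of thumb that the
    test is underpowered with fewer than ten studies.
    """
    k = len(effects)
    if k != len(ses):
        raise ValueError("effects and ses must have the same length")
    if k < min_studies:
        raise ValueError(f"need at least {min_studies} studies, got {k}")
    y = [e / s for e, s in zip(effects, ses)]  # standardized effects
    x = [1.0 / s for s in ses]                 # precisions
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (k - 2)
    se_int = math.sqrt(s2 * (1.0 / k + x_bar**2 / sxx))
    t = intercept / se_int if se_int > 0 else float("nan")
    return intercept, se_int, t
```

With the two or three studies available per outcome in this review, the guard raises an error instead of producing an unreliable intercept.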

Table 3 presents the outcome-specific ratings and their rationale, with overall certainty ranging from very low to low across the key endpoints.

4. Discussion

This review synthesizes emerging evidence on AI-assisted crown design and color prediction across in vitro, in silico, and nonrandomized in vivo settings. Within this heterogeneous literature, two consistent signals were observed: (i) AI-equipped CAD workflows tend to shorten design time compared with conventional CAD, and (ii) machine-learning models for color prediction achieve high proportions of ΔE₀₀ values within clinically acceptable thresholds under bench conditions. At the same time, performance for morphology fidelity, finish-line identification, and functional contacts varied by metric, region of interest, and software implementation.

Comparative studies consistently suggested that AI-equipped CAD can accelerate crown design relative to conventional tools. In an in vitro programmatic comparison, an AI-assisted workflow reduced overall design time across different tooth types and scan sources [20]; because that study defined design time on a minutes scale, it was included only in a sensitivity analysis rather than in the primary pooled estimate. Complementary clinical-dataset work also reported shorter operator time across specific design stages when deep-learning CAD was used instead of technician-directed CAD [25]. Taken together, these observations imply that automating repetitive subtasks (e.g., the initial anatomical proposal or automatic margin propagation) can translate into meaningful time savings at the bench and in preclinical pipelines. However, the magnitude of benefit is likely to vary with operator experience, case complexity, and software presets, and randomized chairside studies are needed to determine whether the same advantage holds under clinical constraints.
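The pooled design-time and internal-fit results are inverse-variance mean differences with a heterogeneity statistic (I²). The Python sketch below shows one standard way such estimates are produced, assuming the DerSimonian-Laird estimator of between-study variance (τ²); the review's own scripts (Supplementary File S2) may use a different estimator or small-sample adjustment, so this is an illustration, not a reproduction. It returns both the fixed-effect and random-effects estimates, matching the primary/sensitivity pairing used for internal fit. All inputs in the usage note are invented.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% normal quantile


def se_of_mean_difference(sd1, n1, sd2, n2):
    """Standard error of a difference in means between two independent groups."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)


def pool_mean_differences(md, se):
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects pooling.

    md: per-study mean differences (AI minus control; negative favors AI when
        lower values are better, e.g., design time in seconds).
    se: matching standard errors.
    """
    if len(md) != len(se) or len(md) < 2:
        raise ValueError("need at least two studies with matching md and se")
    w = [1.0 / s**2 for s in se]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, md)) / sw
    # Cochran's Q and the DerSimonian-Laird moment estimator of tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, md))
    df = len(md) - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (s**2 + tau2) for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w_re, md)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    se_fe = math.sqrt(1.0 / sw)
    return {
        "fixed": fixed,
        "fixed_ci": (fixed - Z_95 * se_fe, fixed + Z_95 * se_fe),
        "random": pooled,
        "random_ci": (pooled - Z_95 * se_re, pooled + Z_95 * se_re),
        "tau2": tau2,
        "I2": i2,
        "Q": q,
    }
```

For two hypothetical studies with mean differences of −100 s and −60 s (SE 10 s each), Q = 8, τ² = 700 s², and I² = 87.5%, so the random-effects interval is roughly twice as wide as the fixed-effect one. This is why a handful of discordant studies can produce the high I² values reported here.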

Across two in vitro comparisons, AI-assisted workflows achieved internal discrepancies smaller than or comparable to those of conventional CAD [6,25]. Cho and colleagues evaluated internal fit together with morphology and functional outcomes and reported tighter internal gaps for the deep-learning design in their bench protocol [6]; similar findings were noted in a standardized master-model study contrasting AI-equipped and non-AI CAD [20]. Because internal fit is sensitive to cement-space definitions, milling parameters, and scanning strategy, reproducibility across systems has yet to be demonstrated [27,28,29]. Moreover, the evidence remains indirect for clinical outcomes such as retention, marginal integrity, and longevity.

Finish-line identification is a pivotal step for downstream accuracy. A deep-learning detector achieved high pixel-level accuracy for finish-line delineation on curated datasets and introduced reference thresholds for mean Hausdorff and Chamfer distances expressed in millimeters [14]. A comparative evaluation against dental laboratory technicians further showed that acceptability and deviation profiles differed between two AI programs and human designs, indicating that software-specific implementations and presets influence the outcome [19]. Collectively, these data support the feasibility of automated or hybrid finish-line detection, but they also underscore the need for consensus metrics and cross-platform validation before routine adoption.

Two in vitro machine-learning studies, one on leucite-reinforced ceramics and the other across CAD-CAM materials and backgrounds, reported high proportions of predictions within their prespecified CIEDE2000 acceptability thresholds (AT₀₀ = 1.77 and 1.81, respectively) [9,10]. A fusion-model study expanded the paradigm by linking predicted color to the minimal ceramic thickness required over different clinical backgrounds [11]. The consistency of bench-level acceptability suggests that supervised models capture a clinically relevant mapping from substrate/background and ceramic parameters to optical outcomes. Nevertheless, acceptability thresholds differed between studies, and per-prediction ΔE₀₀ distributions were not available for harmonization, which limits pooled inference. Prospective clinical validation, ideally including operator-blinded spectrophotometric assessment and patient-reported esthetics, remains an unmet need.
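Because acceptability is judged by comparing each ΔE₀₀ value with an AT₀₀ threshold, the threshold itself changes the reported proportion. The Python sketch below implements the standard CIEDE2000 formula (with parametric factors kL = kC = kH = 1, an assumption, since the included studies' settings are not restated here) and a helper that computes the share of predictions at or below a chosen AT₀₀. Treating "within threshold" as ≤ rather than < is also an assumption. Function names are illustrative.

```python
import math


def ciede2000(lab1, lab2, kL=1.0, kC=1.0, kH=1.0):
    """CIEDE2000 color difference between two CIE L*a*b* colors."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    # Chroma-dependent rescaling of a* (the G factor).
    c_bar = (math.hypot(a1, b1) + math.hypot(a2, b2)) / 2
    g = 0.5 * (1 - math.sqrt(c_bar**7 / (c_bar**7 + 25**7)))
    a1p, a2p = (1 + g) * a1, (1 + g) * a2
    c1p, c2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360
    h2p = math.degrees(math.atan2(b2, a2p)) % 360

    # Lightness, chroma, and hue differences.
    dLp = L2 - L1
    dCp = c2p - c1p
    if c1p * c2p == 0:
        dhp = 0.0
    else:
        dhp = h2p - h1p
        if dhp > 180:
            dhp -= 360
        elif dhp < -180:
            dhp += 360
    dHp = 2 * math.sqrt(c1p * c2p) * math.sin(math.radians(dhp) / 2)

    # Mean values used by the weighting functions.
    L_bar = (L1 + L2) / 2
    c_bar_p = (c1p + c2p) / 2
    if c1p * c2p == 0:
        h_bar_p = h1p + h2p
    elif abs(h1p - h2p) <= 180:
        h_bar_p = (h1p + h2p) / 2
    elif h1p + h2p < 360:
        h_bar_p = (h1p + h2p + 360) / 2
    else:
        h_bar_p = (h1p + h2p - 360) / 2

    t = (1 - 0.17 * math.cos(math.radians(h_bar_p - 30))
         + 0.24 * math.cos(math.radians(2 * h_bar_p))
         + 0.32 * math.cos(math.radians(3 * h_bar_p + 6))
         - 0.20 * math.cos(math.radians(4 * h_bar_p - 63)))
    d_theta = 30 * math.exp(-(((h_bar_p - 275) / 25) ** 2))
    r_c = 2 * math.sqrt(c_bar_p**7 / (c_bar_p**7 + 25**7))
    s_l = 1 + 0.015 * (L_bar - 50) ** 2 / math.sqrt(20 + (L_bar - 50) ** 2)
    s_c = 1 + 0.045 * c_bar_p
    s_h = 1 + 0.015 * c_bar_p * t
    r_t = -math.sin(math.radians(2 * d_theta)) * r_c  # rotation term

    l_term = dLp / (kL * s_l)
    c_term = dCp / (kC * s_c)
    h_term = dHp / (kH * s_h)
    return math.sqrt(l_term**2 + c_term**2 + h_term**2 + r_t * c_term * h_term)


def proportion_within(delta_e00_values, at00):
    """Share of predictions whose ΔE00 is at or below an acceptability threshold."""
    if not delta_e00_values:
        raise ValueError("no predictions")
    return sum(1 for d in delta_e00_values if d <= at00) / len(delta_e00_values)
```

For three hypothetical prediction errors of 0.5, 1.79, and 2.0 ΔE₀₀ units, the acceptable share is one-third at AT₀₀ = 1.77 but two-thirds at AT₀₀ = 1.81. Per-prediction distributions are therefore needed before study-specific thresholds can be harmonized.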

Morphology fidelity showed software- and metric-dependent results. Several investigations reported RMS deviations in micrometers after best-fit alignment and found the AI proposal equal or superior to technician or conventional CAD references for selected surfaces or regions [6,13,20,21,25]. Other studies quantified point-to-surface distances in millimeters (mean Hausdorff) or reported unitless Chamfer distance, as well as feasibility metrics, such as Intersection over Union for 3D reconstructions [12,24,26]. Because units and regions of interest were not standardized, and because different “ground truths” were used (standard crowns, technician designs, or best-fit references), a global quantitative synthesis was not warranted. The direction of effect nonetheless suggests that data-driven anatomical priors can produce clinically plausible occlusal anatomy, particularly when trained with sufficient and diverse inputs [12,21,30,31].
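To show how the same geometric discrepancy can surface as different numbers in different units, the Python sketch below computes an RMS deviation, a symmetric (maximum) Hausdorff distance, a Chamfer distance, and the share of points within a tolerance for two point sets. It is a simplification: the included studies computed point-to-surface deviations after best-fit alignment in dedicated inspection software, whereas this sketch uses brute-force point-to-point distances on sets assumed to be already aligned. The Chamfer definition chosen here (average of the two directed mean distances) is one of several in use, and some variants are squared or normalized, which is one way unitless values arise.

```python
import math


def _nearest_distances(src, dst):
    """Distance from each point in src to its nearest neighbour in dst (brute force)."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def rms_deviation(test, reference):
    """Root-mean-square of test-to-reference nearest-point distances."""
    d = _nearest_distances(test, reference)
    return math.sqrt(sum(x * x for x in d) / len(d))


def hausdorff_distance(a, b):
    """Symmetric (maximum) Hausdorff distance between two point sets."""
    return max(max(_nearest_distances(a, b)), max(_nearest_distances(b, a)))


def chamfer_distance(a, b):
    """Symmetric Chamfer distance as the average of the two directed mean distances.

    Other definitions (sums instead of means, squared distances, or
    normalization by object size) are common and change the scale and units.
    """
    dab = _nearest_distances(a, b)
    dba = _nearest_distances(b, a)
    return (sum(dab) / len(dab) + sum(dba) / len(dba)) / 2


def fraction_within(test, reference, tol):
    """Share of test points whose nearest-point distance is within tol."""
    d = _nearest_distances(test, reference)
    return sum(1 for x in d if x <= tol) / len(d)
```

For a toy two-point "crown" displaced 0.03 mm and 0.04 mm from its reference (coordinates in mm), the RMS deviation is about 0.0354 mm (35.4 µm), the Hausdorff distance is 0.04 mm, and the Chamfer distance is 0.035 mm, so one discrepancy produces three different-looking figures.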

Functional contacts were variably defined across studies, ranging from intermesh distance thresholds and the number/position of contact points to proximal contact length or intensity and the use of virtual articulating papers. An in vitro comparison of 3D-DCGAN-generated crowns reported no significant differences in the number or area of occlusal contacts relative to comparator workflows [12], whereas dataset-level analyses suggested that the quantity and quality of input data influence the occlusal outcome of AI designs [21]. Bench studies integrating morphology, internal fit, and contacts indicated that deep-learning proposals can establish contacts within acceptable ranges, but superiority over technician-directed CAD was not consistent across regions or software [6,25]. Standardized functional endpoints, ideally including dynamic occlusion under load, would allow more definitive conclusions.

Study-level quality appraisals indicated that most laboratory and modeling investigations were judged as presenting “some concerns” using JBI checklists [16], commonly due to unblinded outcome measurement, incomplete reporting of calibration or repeatability, and the absence of a priori sample-size justification. In silico studies also faced risks related to overfitting and limited external validation. Nonrandomized in vivo comparisons were typically at “moderate” risk of bias on ROBINS-I [17], driven by potential confounding from case selection and operator effects, deviations from intended interventions, and unblinded assessments. These patterns, summarized in Table 2, temper the strength of the observed effects and delineate priorities for study design.

Outcome-level certainty was predominantly low to very low, according to GRADE [18]. Downgrading was mainly applied for indirectness—reflecting the predominance of bench or modeling data with limited direct clinical applicability—and for imprecision due to small sample sizes and low numbers of comparable studies per outcome. Where substantial heterogeneity across metrics and units existed, particularly in morphology and, to a lesser extent, design time, inconsistency contributed to further downgrading. Publication bias could not be meaningfully assessed because meta-analyses did not reach the conventional threshold of ten studies.

The overall low to very low certainty of evidence across outcomes was driven by several recurring methodological limitations. First, many in vitro and in silico investigations exhibited risk of bias due to unblinded outcome assessment, limited reporting of calibration or measurement repeatability, and absence of preregistration or sample-size justification. Second, imprecision was substantial, as most outcomes were derived from small sample sizes and only two or three methodologically comparable studies, resulting in wide confidence intervals and unstable pooled effects. Third, there was marked inconsistency and heterogeneity across studies, particularly in morphology and functional-contact metrics, because AI models differed widely in architecture, training datasets, and workflow implementation, while outcome units (e.g., RMS in micrometers vs. Hausdorff distance in millimeters) were not standardized. These factors collectively constrained the certainty of evidence and underscore the need for future primary studies using standardized metrics, preregistered protocols, adequate sample sizes, and externally validated AI systems.

This review is limited by heterogeneity in outcome definitions and units (RMS in µm versus mean Hausdorff distance in mm or unitless Chamfer distance), by disparate ground truths for morphology comparison, and by variability in scanner types, software presets, and operator workflows. In addition, several studies evaluated similar or identical AI platforms (e.g., Automate or Dentbird), which may introduce cross-study dependence and clustering by software. The limited number of studies per outcome and the partial overlap of platforms across domains prevented us from modeling this formally (e.g., via multilevel or software-stratified meta-regression), so some degree of over-precision cannot be excluded. No randomized controlled trials were identified, and the evidence base is, therefore, drawn exclusively from in vitro, in silico, or observational studies, which reduces the overall certainty and limits clinical generalizability. Moreover, the quality of reporting across studies was frequently suboptimal, with common omissions including lack of blinding, absence of sample-size justification, insufficient description of operator workflows, and limited or missing external validation. Meta-analyses were feasible only for the color-acceptability endpoint and selected design-time, internal-fit, and finish-line outcomes, and publication bias could not be formally assessed. Balanced against these limitations, the work offers several strengths: a protocolized approach with explicit handling of AT₀₀ heterogeneity for color endpoints; comprehensive capture of design, morphology, fit, and functional contact outcomes across multiple platforms; dual, independent quality appraisal using design-appropriate tools; and transparent GRADE judgments that align outcome-level inferences with the underlying evidence base [18].

Overall, although the quantitative synthesis suggests that AI may shorten design workflows, improve certain geometric parameters, and achieve consistently acceptable color predictions, these findings must be interpreted within the context of an evidence base that remains in its early developmental stage. The underlying data derive largely from controlled experimental or simulation environments, and the real-world performance of these systems has not yet been evaluated through rigorously designed clinical trials. As such, the advantages observed here should be considered preliminary signals of potential benefit rather than definitive clinical effects. Future research—including prospective, well-controlled clinical investigations—is essential to determine whether these improvements translate to routine prosthodontic practice.

For practitioners and laboratories, the synthesis indicates that AI-assisted CAD can plausibly shorten design time and achieve internal fit and morphology that are at least comparable to conventional CAD in bench settings. Automated or hybrid finish-line detection appears feasible, but accuracy depends on software implementation and image quality. In esthetics, supervised models for color prediction achieved high acceptability relative to ΔE₀₀ thresholds under controlled conditions; clinical deployment should pair these tools with robust spectrophotometric protocols and a clear policy for acceptability thresholds. Given the current certainty levels, AI proposals are best adopted as decision-support tools within a supervised workflow that preserves technician and clinician oversight, routine verification of internal fit and contacts, and quality assurance steps prior to milling or delivery.

An additional consideration for future clinical translation is the inherent ‘black-box’ nature of many AI models used in both color prediction and crown-design workflows. Many deep-learning systems can produce accurate outputs yet offer limited insight into the decision-making process behind color estimates, margin delineation, or morphological proposals. This lack of interpretability may reduce clinician trust, particularly when unexpected shade predictions or occlusal features are generated without a clear rationale. In color prediction, opacity regarding feature weighting (e.g., background substrate, ceramic thickness, and optical parameters) may hinder error analysis when ΔE₀₀ values fall outside acceptable thresholds. In crown design, non-transparent algorithms may obscure how specific anatomical features are inferred, complicating quality control. Future research should incorporate explainable-AI (XAI) techniques (such as feature-attribution maps, uncertainty quantification, and model-agnostic interpretability frameworks) to clarify how predictions are produced and to support safer, more accountable integration of AI systems into restorative workflows.
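As one concrete example of a model-agnostic technique, permutation importance measures how much a model's error increases when a single input column is shuffled, which breaks that feature's link with the outcome. The Python sketch below works with any prediction callable. The model, feature layout (for instance, a thickness column and a background code), and data in the usage note are hypothetical. Table 1 notes that one included study used SHAP, which is a different attribution method from the one shown here.

```python
import random


def permutation_importance(predict, X, y, error, n_repeats=20, seed=0):
    """Model-agnostic permutation importance (sketch).

    predict: any callable mapping one feature row (list) to a prediction.
    error:   callable(y_true, y_pred) -> float, lower is better.
    Returns, for each feature column, the mean increase in error after
    shuffling that column across rows.
    """
    rng = random.Random(seed)
    baseline = error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only column j replaced by its shuffled value.
            shuffled = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increases.append(error(y, [predict(r) for r in shuffled]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def mean_abs_error(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
```

If a hypothetical model uses only the first column, shuffling the second column leaves its error unchanged (importance 0), while shuffling the first raises it. Such a check can flag a color model that ignores an input, like background substrate, that clinicians expect to matter.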

Future studies should prioritize prospective, preferably randomized, clinical designs that compare AI-assisted and conventional CAD under standardized scanning, milling, and cementation protocols. External validation across centers, scanners, and software versions is essential, as is preregistration and a priori sample-size justification. Reporting should adopt common morphology and functional metrics with consistent units (e.g., RMS in µm; mean Hausdorff in mm), and provide per-prediction ΔE₀₀ distributions so that acceptability thresholds can be harmonized in meta-analyses. Dynamic occlusion under load, proximal contact integrity, marginal adaptation, and restoration longevity should be measured alongside efficiency outcomes and patient-reported esthetics. Open datasets and code, cross-platform benchmarks, and formal cost-effectiveness analyses will accelerate transparent translation to the clinic.

5. Conclusions

Artificial intelligence applied to CAD/CAM dentistry shows practical promise while requiring cautious clinical translation. Across comparative investigations, AI-equipped CAD consistently shortened crown design time relative to conventional workflows, and internal fit was generally comparable or modestly improved in bench settings. Automated or hybrid finish-line detection demonstrated sub-millimeter accuracy under curated conditions, and machine-learning models for color prediction yielded high proportions of ΔE₀₀ values within clinically acceptable thresholds when evaluated against each study’s prespecified AT₀₀. Morphology fidelity was frequently comparable to or better than technician/conventional CAD for selected surfaces, and occlusal and proximal contacts were generally established, although superiority over technician-directed CAD was not consistent across regions or software.

The strength of inference is tempered by study-level limitations and outcome-level uncertainty. Most evidence derives from in vitro or in silico designs with small samples, heterogeneous metrics, and non-standardized units and regions of interest, which precluded global pooling for morphology and functional contacts. These factors, together with variable reference standards and limited blinding and calibration reporting, support cautious interpretation of effect sizes and generalizability.

In practice, the most defensible near-term role for AI is as a decision-support adjunct: accelerating design workflows, providing structured assistance for finish-line delineation, and informing shade planning within explicit AT₀₀ policies and robust spectrophotometric protocols, while preserving technician/clinician oversight for internal fit, morphology, and functional contacts. Continued progress will depend on standardized outcome definitions and units, externally validated clinical studies under routine workflows, and transparent reporting that enables reliable synthesis and adoption.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/prosthesis7060160/s1, Table S1. Complete search strategies for each database. Table S2. Full-text articles excluded at the eligibility stage with reasons. Table S3. Data extraction. Table S4. Median to mean conversions. Table S5. Measurement metrics. Supplementary File S1. Data extraction. Supplementary File S2. Meta-analysis scripts. Figure S1. Original thresholds.

Author Contributions

C.M.A. performed the conceptualization, data curation, data analysis, manuscript writing, and revision of the manuscript; D.M.P.-M. performed the data curation, data analysis, and revision of the manuscript; E.P.-V. performed the data curation, data analysis, and revision of the manuscript; A.M.V.-B. performed the data curation, data analysis, and revision of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Miyazaki, T.; Hotta, Y.; Kunii, J.; Kuriyama, S.; Tamaki, Y. A review of dental CAD/CAM: Current status and future perspectives from 20 years of experience. Dent. Mater. J. 2009, 28, 44–56.
2. Miyazaki, T.; Nakamura, T.; Matsumura, H.; Ban, S.; Kobayashi, T. Current status of zirconia restoration. J. Prosthodont. Res. 2013, 57, 236–261.
3. Rexhepi, I.; Santilli, M.; D’Addazio, G.; Tafuri, G.; Manciocchi, E.; Caputi, S.; Sinjari, B. Clinical Applications and Mechanical Properties of CAD-CAM Materials in Restorative and Prosthetic Dentistry: A Systematic Review. J. Funct. Biomater. 2023, 14, 431.
4. Saravi, B.; Vollmer, A.; Hartmann, M.; Lang, G.; Kohal, R.J.; Boeker, M.; Patzelt, S.B.M. Clinical Performance of CAD/CAM All-Ceramic Tooth-Supported Fixed Dental Prostheses: A Systematic Review and Meta-Analysis. Materials 2021, 14, 2672.
5. Morsy, N.; Holiel, A.A. Color difference for shade determination with visual and instrumental methods: A systematic review and meta-analysis. Syst. Rev. 2023, 12, 95.
6. Cho, J.H.; Çakmak, G.; Yi, Y.; Yoon, H.I.; Yilmaz, B.; Schimmel, M. Tooth morphology, internal fit, occlusion and proximal contacts of dental crowns designed by deep learning-based dental software: A comparative study. J. Dent. 2024, 141, 104830.
7. Kelly, J.R.; Benetti, P. Ceramic materials in dentistry: Historical evolution and current practice. Aust. Dent. J. 2011, 56 (Suppl. S1), 84–96.
8. Heffernan, M.J.; Aquilino, S.A.; Diaz-Arnold, A.M.; Haselton, D.R.; Stanford, C.M.; Vargas, M.A. Relative translucency of six all-ceramic systems. Part II: Core and veneer materials. J. Prosthet. Dent. 2002, 88, 10–15.
9. Kose, C., Jr.; Oliveira, D.; Pereira, P.N.R.; Rocha, M.G. Using artificial intelligence to predict the final color of leucite-reinforced ceramic restorations. J. Esthet. Restor. Dent. 2023, 35, 105–115.
10. Mascaro, B.A.; Conejo, R.V.; Tejada-Casado, M.; Fonseca, R.G.; Reis, J.M.D.S.N.; Pérez, M.M. Machine learning regression models for color prediction of CAD-CAM materials against different tooth-colored backgrounds. J. Dent. 2025, 161, 105994.
11. Yang, J.; Hao, Z.; Xu, J.; Wang, J.; Jiang, X. Fusion machine learning model predicts CAD-CAM ceramic colors and the corresponding minimal thicknesses over various clinical backgrounds. Dent. Mater. 2024, 40, 285–296.
12. Ding, H.; Cui, Z.; Maghami, E.; Chen, Y.; Matinlinna, J.P.; Pow, E.H.N.; Fok, A.S.L.; Burrow, M.F.; Wang, W.; Tsoi, J.K.H. Morphology and mechanical performance of dental crown designed by 3D-DCGAN. Dent. Mater. 2023, 39, 320–332.
13. Wu, Z.; Zhang, C.; Ye, X.; Dai, Y.; Zhao, J.; Zhao, W.; Zheng, Y. Comparison of the Efficacy of Artificial Intelligence-Powered Software in Crown Design: An In Vitro Study. Int. Dent. J. 2025, 75, 127–134.
14. Choi, J.; Ahn, J.; Park, J.M. Deep learning-based automated detection of the dental crown finish line: An accuracy study. J. Prosthet. Dent. 2024, 132, 1286.e1–1286.e9.
15. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
16. Barker, T.H.; Stone, J.C.; Sears, K.; Klugar, M.; Leonardi-Bee, J.; Tufanaru, C.; Aromataris, E.; Munn, Z. Revising the JBI quantitative critical appraisal tools to improve their applicability: An overview of methods and the development process. JBI Evid. Synth. 2023, 21, 478–493.
17. Sterne, J.A.; Hernán, M.A.; Reeves, B.C.; Savović, J.; Berkman, N.D.; Viswanathan, M.; Henry, D.; Altman, D.G.; Ansari, M.T.; Boutron, I.; et al. ROBINS-I: A tool for assessing risk of bias in non-randomised studies of interventions. BMJ 2016, 355, i4919.
18. Guyatt, G.H.; Oxman, A.D.; Vist, G.E.; Kunz, R.; Falck-Ytter, Y.; Alonso-Coello, P.; Schünemann, H.J.; GRADE Working Group. GRADE: An emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008, 336, 924–926.
19. Sawangsri, K.; Bekkali, M.; Lutz, N.; Alrashed, S.; Hsieh, Y.L.; Lai, Y.C.; Arreaza, C.; Nassani, L.M.; Hammoudeh, H.S. Acceptability and deviation of finish line detection and restoration contour design in single-unit crown: Comparative evaluation between 2 AI-based CAD software programs and dental laboratory technicians. J. Prosthet. Dent. 2025, 134, 409–417.
20. Nagata, K.; Inoue, E.; Nakashizu, T.; Seimiya, K.; Atsumi, M.; Kimoto, K.; Kuroda, S.; Hoshi, N. Verification of the accuracy and design time of crowns designed with artificial intelligence. J. Adv. Prosthodont. 2025, 17, 1–10.
21. Broll, A.; Goldhacker, M.; Hahnel, S.; Rosentritt, M. Morphological effects of input data quantity in AI-powered dental crown design. J. Dent. 2025, 159, 105767.
22. Çakmak, G.; Cho, J.H.; Choi, J.; Yoon, H.I.; Yilmaz, B.; Schimmel, M. Can deep learning-designed anterior tooth-borne crown fulfill morphologic, aesthetic, and functional criteria in clinical practice? J. Dent. 2024, 150, 105368.
23. Chen, Y.; Lee, J.K.Y.; Kwong, G.; Pow, E.H.N.; Tsoi, J.K.H. Morphology and fracture behavior of lithium disilicate dental crowns designed by human and knowledge-based AI. J. Mech. Behav. Biomed. Mater. 2022, 131, 105256.
24. Chau, R.C.W.; Hsung, R.T.; McGrath, C.; Pow, E.H.N.; Lam, W.Y.H. Accuracy of artificial intelligence-designed single-molar dental prostheses: A feasibility study. J. Prosthet. Dent. 2024, 131, 1111–1117.
25. Cho, J.H.; Yi, Y.; Choi, J.; Ahn, J.; Yoon, H.I.; Yilmaz, B. Time efficiency, occlusal morphology, and internal fit of anatomic contour crowns designed by dental software powered by generative adversarial network: A comparative study. J. Dent. 2023, 138, 104739.
26. Tian, S.; Wang, M.; Dai, N.; Ma, H.; Li, L.; Fiorenza, L.; Sun, Y.; Li, Y. DCPR-GAN: Dental Crown Prosthesis Restoration Using Two-Stage Generative Adversarial Networks. IEEE J. Biomed. Health Inform. 2022, 26, 151–160.
27. Shim, J.S.; Lee, J.S.; Lee, J.Y.; Choi, Y.J.; Shin, S.W.; Ryu, J.J. Effect of software version and parameter settings on the marginal and internal adaptation of crowns fabricated with the CAD/CAM system. J. Appl. Oral Sci. 2015, 23, 515–522.
28. Tribst, J.P.M.; Hosseini, F.; Pilecco, R.O.; Serrano, C.M.; Kleverlaan, C.J.; Dal Piva, A.M.O. The Influence of Extra-Fine Milling Protocol on the Internal Fit of CAD/CAM Composite and Ceramic Crowns. Materials 2024, 17, 5601.
29. Farag, E.A.A.; Rizk, A.; Ashraf, R.; Emad Eldin, F. Effect of the scanner type on the marginal gap and internal fit of two monolithic CAD/CAM esthetic crown materials: An in vitro study. Dent. Med. Probl. 2024, ahead of print.
30. Litzenburger, A.P.; Hickel, R.; Richter, M.J.; Mehl, A.C.; Probst, F.A. Fully automatic CAD design of the occlusal morphology of partial crowns compared to dental technicians’ design. Clin. Oral Investig. 2013, 17, 491–496.
31. Uribe-Hernández, L.; Latorre-Correa, F.; Perea-Lowery, L.; Ardila, C.M. Effect of preheating and curing lamp distance on the degree of conversion of four nanohybrid resins: An in vitro study. J. Clin. Exp. Dent. 2024, 16, e975–e983.

Figure 1. PRISMA 2020 flow diagram.

Figure 2. Evidence map of outcomes and clinical thresholds reported across included studies. A yellow cell indicates that the domain was quantitatively reported in that study, while dark violet indicates no reporting. The rightmost column identifies whether an explicit clinical threshold was stated (e.g., ΔE/ΔE₀₀ acceptability, ±50 µm morphology tolerance, ≤120 µm marginal fit, predefined finish-line limits) [6,9,10,11,12,13,14,19,20,21,22,23,24,25,26].

Figure 3. Study setting breakdown by domain (in vitro, in vivo, and in silico). The stacked bars summarize the number of included studies in each domain (color prediction vs. crown design) and setting (in vitro, in vivo, and in silico). The figure highlights the predominance of laboratory evaluations in crown design and that, for color prediction, the current evidence is exclusively in vitro.

Figure 4. Forest plot of design time (primary analysis). Forest plot showing mean differences (AI − control); negative values favor AI [13,25]. Color coding: light blue and yellow squares = individual study estimates; blue line = pooled confidence interval; dark blue diamond = pooled effect; yellow dashed line = line of no difference (MD = 0).

Figure 5. Forest plot of design time (sensitivity analysis including Nagata et al.). Forest plot showing the pooled random-effects estimate of mean differences in design time (seconds) [13,20,25]. Color coding: blue = Cho et al.; yellow = Wu et al.; orange = Nagata et al.; purple diamond = pooled effect; yellow dashed line = null effect (MD = 0 s).

Figure 6. Forest plot of internal fit (RMS, µm). Forest plot showing mean differences in internal gap (AI − control); negative values favor AI. The fixed-effect pooled estimate is shown (solid diamond), with the random-effects estimate as a sensitivity analysis [6,25].

Figure 7. Forest plot of finish-line accuracy (Hausdorff distance, mm). Squares indicate study means with 95% CIs; the diamond shows the random-effects pooled mean [14,19].

Figure 8. Forest plot of color-prediction acceptability (CIEDE2000 threshold). Forest plot showing the pooled random-effects proportion of predictions within the acceptability threshold [9,10].

Figure 9. Forest plot of the proportion of AI-generated color predictions falling within the unified CIEDE2000 acceptability threshold (AT₀₀ = 1.80). The error bars represent 95% confidence intervals. A pooled estimate is displayed as a vertical line [9,10].

Figure 10. The direction of effect for crown morphology deviation across the studies. Each point represents one study, positioned according to its primary conclusion: AI lower deviation, similar, AI higher deviation, or benchmark/feasibility (supports fidelity). The map emphasizes a consistent pattern—AI designs generally achieve equal or lower 3D deviation than conventional workflows [6,12,20,21,22,23,24,25].

Table 1. Characteristics of included studies.

Author	Domain	Setting	Sample Size	AI Method	Comparator	Outcomes	Thresholds
Cho et al. (2024) [6]	Design	In vivo digital datasets; 30 partial-arch scans of prepared posterior teeth (16 maxillary, 14 mandibular) from a single dental laboratory; STL files; IRB-exempt.	30, 16, 14	Two deep-learning–based commercial CAD tools: Automate (3Shape; AA) and Dentbird Crown (Imagoworks; AD); AD employs GAN/CNN per prior work; designs auto-generated without technician intervention.	Technician-designed CAD comparator (Dental System, 3Shape; NC) by an experienced technician.	3D morphology deviation (RMS, positive/negative averages; % within ±50 µm), internal fit RMS (µm), margin-line location RMS (µm), cusp angle (°), occlusal contact count and intensity, proximal contact intensity.	Applied criteria/tolerances: ±50 µm for morphology; ±40 µm for internal fit (cement space); occlusal contact defined as intermesh distance < 20 µm; proximal contact preset −20 µm with intensity classes.
Kose et al. (2023) [9]	Color	In vitro (CAD/CAM leucite-reinforced ceramics; IPS Empress CAD HT/LT A1/A3; thicknesses: 0.3–1.2 mm; substrates: black, A1, A3, white).	Specimen count not given; HT/LT × A1/A3, 0.3–1.2 mm, 4 substrates	Multiple machine-learning regressors to predict L*, a*, b*, and ΔE₀₀.	Spectrophotometric reference.	ΔE₀₀ prediction error (and ΔL*/Δa*/Δb*), MAE, R².	ΔE₀₀ acceptability threshold (AT₀₀) = 1.77.
Mascaro et al. (2025) [10]	Color	In vitro (four CAD/CAM materials: Lava Ultimate, Grandio Blocs, VITA Enamic, VITA Mark II; thicknesses 0.5–1.5 mm with n = 3 per material/thickness; measured against white, black, and nine tooth-colored ND1–ND9 backgrounds using a spectroradiometer).	4 materials; n = 3 per material/thickness; 11 backgrounds (white, black, ND1–ND9)	Partial least squares (PLS) regression to predict L*, a*, and b* from material, thickness, and background.	Spectroradiometric ground truth (predicted vs. measured CIELAB L*a*b*).	ΔE₀₀ between predicted and measured values; RMSE for L*, a*, b* (e.g., overall mean ΔE₀₀ = 1.04 in approach 1; by-material mean ≈ 0.97: LU 1.01, GB 1.05, VE 1.10, VM 0.71; best RMSE for a* ≈ 0.14 in approach 3).	ΔE₀₀ thresholds applied: PT₀₀ = 0.80; AT₀₀ = 1.81.
Yang et al. (2024) [11]	Color	In vitro (4 CAD/CAM materials: IPS e.max CAD, IPS e.max ZirCAD, Upcera Li CAD, Upcera TT CAD; thicknesses: 0.5/1.0/2.0 mm; n = 10 per material and thickness, 120 specimens in total; 7 backgrounds: A1, A2, A3.5, ND2, ND7, cobalt–chromium [CC], medium-precious alloy [MPA]).	120 specimens (4 materials × 3 thicknesses × n = 10); 7 backgrounds	Fusion/stacking ML model combining ExtraTreesRegressor and XGBRegressor; SHAP used for feature importance/explainability.	Spectrophotometric ground truth (CIELAB values measured with a digital spectrophotometer; predictions benchmarked against measured values).	ΔE (CIE76) and ΔL prediction performance (R² and RMSE); external-test performance of the fusion model for ΔE: R² ≈ 0.906, RMSE ≈ 0.348; minimal thickness estimates per background to meet targets.	ΔE perceptibility = 2.6, acceptability = 5.5; ΔL threshold = 0.5 (used to derive minimal-thickness recommendations).
Ding et al. (2023) [12]	Design	In silico + experimental: training on 600 full-arch digital casts from healthy subjects (intraoral scans; teeth 44–46); independent test set of 12 mandibular second-premolar cases.	600 training casts; 12 test cases	3D-DCGAN (true 3D deep-convolutional GAN) implemented in PyTorch for automated crown design.	Natural tooth (NT), CEREC Biogeneric Individual design (BI), and technician CAD design (TD).	3D morphology discrepancy (RMS, mean positive/negative deviation), e.g., NT vs. AI RMS = 0.3611 (0.1160); cusp angle (means: NT 54.05°, AI 49.43°, BI 67.11°, TD 63.34°); occlusal contact number/area using virtual articulating papers (100 µm, 200 µm); dynamic FEA (maximum principal and shear stresses; fatigue lifetime under 100–400 N).	None explicit; operational parameters included virtual articulating paper thicknesses (100/200 µm) and an FEA loading range of 100–400 N.
Wu et al. (2025) [13]	Design	In vitro (33 intraoral scan datasets of posterior crowns; reference built from clinically adapted crowns; TRIOS 3 scans; STL files).	33 scan datasets	Two AI-powered design programs: Automate (3Shape; AA) and Dentbird Crown (Imagoworks; AD, reported to use GAN/CNN); fully automated crown generation.	Exocad DentalCAD used by an experienced technician (CE) and a novice technician (CN); Std crown set as reference.	Design time (seconds, median [IQR]); morphological accuracy as RMS deviation (mm) for occlusal, mesial, and distal surfaces and margin lines.	Applied criteria/tolerances: deviation map range ±500 µm with “green” tolerance ±50 µm; occlusal contact defined as intermesh distance < 20 µm; Std marginal fit noted as < 120 µm.
Choi et al. (2024) [14]	Design (finish-line)	Retrospective dataset of 182 jaw scans; evaluation on two sets: desktop-scanner trimmed casts (DS: 58 anterior, 83 posterior) and intraoral scans (IS: 35 anterior, 58 posterior).	182 jaw scans	Hybrid deep learning + CAD (Dentbird): CNN preparation-tooth detector (CenterNet with Stacked Hourglass), UneXt encoder–decoder finish-line segmentor, spline curve-on-mesh.	3Shape Dental System 2021-1, exocad DentalCAD 3.1, MEDIT Margin Lines 1.0.	Hausdorff distance (mm) and Chamfer distance (a.u.) with mean ± SD by group (e.g., IS posterior HD means: Dentbird 0.543, Dental System 0.712, DentalCAD 0.635, Medit 0.694); counts of clinically acceptable preparations.	Clinical thresholds applied: DS, HD ≤ 0.366 mm and CD ≤ 0.026; IS, HD ≤ 0.566 mm and CD ≤ 0.100.
Sawangsri et al. (2025) [19]	Design (finish-line detection and restoration contour)	In vitro/retrospective convenience sample; 100 single-crown abutment scans with antagonists (TRIOS 4), replicated 3× and assigned to dental technicians (DT), Dentbird (DB), and Automate (AM).	100 abutment scans (each designed by DT, DB, and AM)	Two fully automated AI-based CAD programs: Dentbird Crown 4.1.1 (hybrid deep learning + CAD) and Automate 1.02 (proprietary AI CAD) for autonomous finish-line detection and crown design.	Four certified dental technicians using 3Shape Dental Manager (manual margin registration; library of choice), serving as the reference designs.	Acceptability scores (finish-line; 8-criterion restoration design), Hausdorff distance (mm) for finish-line deviation, and RMS error (mm) for restoration design; key results: AM showed lower deviation than DB (overall HD 0.132 ± 0.057 vs. 0.380 ± 0.431 mm; overall RMS 0.195 ± 0.059 vs. 0.253 ± 0.060 mm); DT had the highest acceptability.	Clinical thresholds/criteria: authors propose an acceptability threshold of ~130 µm for AI-generated finish-line deviation based on study findings; design parameters included occlusal clearance 0.00 mm, interproximal distance −0.03 mm, and cement space 0.03 mm.
Nagata et al. (2025) [20]	Design	In vitro model study: master models for #15 (maxillary second premolar) and #26 (maxillary first molar); TRIOS 3 intraoral scanning to STL; five dental technicians each designed one crown per tooth with both systems (20 crowns in total); crowns milled from CAD/CAM blocks.	20 crowns (5 technicians × 2 teeth × 2 systems)	AI-equipped CAD (Dentbird; commercial deep-learning-assisted platform with automatic abutment/margin detection and automatic crown design).	Conventional CAD (3Shape Dental System); both systems used by the same five technicians.	Occlusal-surface “accuracy” (RMS-type misfit across points a–f), overall means (±SD): conventional 275.5 ± 116.8 µm vs. AI 25.7 ± 13.0 µm; design time (min): #15, 3Shape 397.2 ± 80.4 vs. Dentbird 99.4 ± 17.1; #26, 3Shape 516.4 ± 61.3 vs. Dentbird 97.6 ± 11.1; proximal contact intensity distribution; marginal fit by micro-CT (e.g., #15 sites 52–72 µm; #26 sites 60–76 µm; no significant differences between systems).	Clinical thresholds/criteria: desirable marginal fit ≤ 120 µm cited from prior literature; marginal fit for both systems remained < 120 µm.
Broll et al. (2025) [21]	Design	Retrospective in vitro; n = 30 patients (11 M/19 F, aged 22–31 years); CEREC Primescan intraoral scans; target lower first molar (#36/#46); input data groups: full jaw (full), quadrant (quad), adjacent teeth (adj); antagonists included.	30 patients	Commercial AI-based CAD: Dentbird Crown (Imagoworks Inc.); authors note a 2D deep-learning approach underlying the software.	Original tooth (ground truth) as reference; comparisons between full vs. quad vs. adj data quantities using the same software.	Morphology: Chamfer distance (L2), complemented by IoU; occlusion-specific: penetration loss, contact-point distance loss, contact-point position loss (Chamfer distance on contact points), contact-point number loss; failure rates for reconstruction/occlusion establishment reported.	None explicit; computational thresholds specified (e.g., occlusion threshold d_thresh = 0 mm; cluster convergence ε_conv = 0.45 mm; ground-truth extraction ε = 1 × 10⁻⁶ mm); lower values indicate better performance for normalized metrics.
Çakmak et al. (2024) [22]	Design (clinical; anterior crown)	In vivo datasets: 25 complete or nearly complete arch scans with a prepared maxillary central incisor abutment; STL files; retrospective laboratory collection with recorded maxillomandibular relations.	25 arch scans	Deep-learning-based CAD (Dentbird Crown; GAN + CNN): as-generated output (DB) and technician-modified output (DM) versus conventional CAD without DL (NC).	DL as-generated (DB), DL technician-modified (DM), and technician-designed conventional CAD (NC).	Morphology: TDV (mm³), TDV ratio (%), linear deviation (RMS, PA, NA) by region; function: incisal path deviation (RMS, µm), length, mean inclination; esthetics: width, height, width/height ratio, mesioincisal angle radius, proximal contact length, tooth axis.	Applied criteria/tolerances: operational design presets only; no explicit external clinical acceptability thresholds reported.
Chen et al. (2022) [23]	Crown design (occlusal morphology and fracture behavior)	In vivo-derived datasets; 12 participants with mandibular premolar #45; lithium disilicate crowns fabricated on 3D-printed casts.	12 participants	Knowledge-based AI (CEREC Biogeneric Individual, BI) vs. human CAD designs (experienced technician, TD; trained dental students, AD).	Human-designed CAD (TD, AD); original tooth used as morphology reference.	Occlusal profile discrepancy (mean ± SD), RMS_estimate, z-difference, volume/area discrepancy, cusp angle; load-to-fracture (N) and failure modes.	Clinical thresholds: fracture loads judged “clinically acceptable”; no explicit numerical fit tolerance reported.
Chau et al. (2024) [24]	Design	In vivo-derived casts and digital workflow; 169 participants recruited; 159 training casts and 10 validation pairs (maxillary right first molar removed/retained); evaluation on 10 × 10 comparisons.	169 participants (159 training casts; 10 validation pairs)	3D generative adversarial network (GAN) system to generate a biomimetic single-molar prosthesis from the remaining dentition.	The subject’s original natural tooth (ground truth) for morphology matching.	Mean Hausdorff distance (mm) and intersection over union (IoU) for “true reconstruction”; validation results: mean HD across matched pairs = 0.633 ± 0.961 mm; IoU = 0.600 (6/10 true reconstructions).	Clinical thresholds: none explicitly stated; authors note 0.500 (50%) as a commonly accepted IoU threshold in the segmentation literature; morphological error interpreted via HD (lower is better).
Cho et al. (2023) [25]	Design	In vivo-derived digital datasets; 30 partial-arch scans of prepared posterior abutments (12 intraoral, 18 cast) with jaw-relation records; retrospective single-lab STL collection; IRB-exempt.	30 scans (12 intraoral, 18 cast)	GAN-based deep-learning dental design software (Dentbird Crown): CNN modules for prepared-tooth detection and margin-line segmentation; StyleGAN-based occlusal-surface generator; inner-surface generator for the intaglio.	Conventional CAD (3Shape Dental System, non-AI) designed by an experienced technician (Auto Crown/Auto Placement used per the manufacturer's workflow).	Design time (T1–T6); 3D occlusal morphology deviation (RMS, µm) between initial and final CAD; internal fit RMS (µm) to the abutment; finish-line deviation (µm).	Applied criteria/tolerances: deviation maps with nominal ±50 µm for morphology and ±40 µm for internal fit (reflecting the 40-µm cement space); no explicit external clinical acceptability threshold.
Tian et al. (2022) [26]	Design	In silico (large 3D dental crown database of mandibular first molars #36/#46; depth-map representation; 780 samples in total: 700 for training, 80 for testing).	780 samples (700 training, 80 testing)	Two-stage GAN (DCPR-GAN): Stage I, a conditional GAN for global morphology; Stage II adds an occlusal-fingerprint constraint and a GroNet groove-parsing loss; TensorFlow implementation.	Ground-truth target crowns and DL baselines.	PSNR, RMSE, SSIM, FSIM; deviation (RMS, SD) between generated occlusal surface and target crown < 0.161 mm; deviation boxplots and runtime (~11–15 s).	None explicit (computational similarity metrics; functional target enforced via the occlusal-fingerprint constraint).

AA: Automate (3Shape), commercial AI-based crown design software; AD: context-dependent, (1) Dentbird Crown (Imagoworks) in design-software studies and (2) advanced/trained dental students in Chen 2022; AM: Automate; AT₀₀: acceptability threshold for CIEDE2000 color difference; BI: Biogeneric Individual (knowledge-based CAD mode in CEREC); CAD: computer-aided design; CC: cobalt–chromium; CD: Chamfer distance (geometry similarity metric); CE: experienced CAD operator/technician; CIELab: CIE L*, a*, and b* color space; CN: novice CAD operator/technician; CNN: convolutional neural network; CV: cross-validation; DB: Dentbird Crown (Imagoworks), commercial AI-based crown design software; DCPR-GAN: Dental Crown Prosthesis Restoration GAN (two-stage generative model); DL: deep learning; DM: Dentbird output modified by a technician; DS: desktop-scanner dataset; DT: dental technician(s); ΔE (CIE76): color difference under the 1976 formula; ΔE₀₀ (CIEDE2000): color difference under the 2000 formula; FEA: finite element analysis; FSIM: feature similarity index; GAN: generative adversarial network; GB: Grandio Blocs; HD: Hausdorff distance; HPC: high-performance computing; IoU: intersection over union; IQR: interquartile range; IRB: Institutional Review Board; IS: intraoral-scan dataset; LOO-CV: leave-one-out cross-validation; LU: Lava Ultimate; MAE: mean absolute error; MPA: medium-precious alloy; NC: non-AI conventional CAD (technician-designed); ND1–ND9: tooth-colored background set; NT: natural tooth; PA/NA: positive average/negative average surface deviation; PLS: partial least squares regression; PSNR: peak signal-to-noise ratio; PT₀₀: perceptibility threshold for CIEDE2000 color difference; R²: coefficient of determination; RMS: root-mean-square deviation; RMSE: root-mean-square error; SHAP: SHapley Additive exPlanations; SSIM: structural similarity index; Std: standard reference crown/model; STL: stereolithography file format; TD: technician-designed CAD; TDV: total deviation volume; VE: VITA Enamic; VM: VITA Mark II; XGBoost/XGBRegressor: Extreme Gradient Boosting; 3D-DCGAN: 3D deep-convolutional GAN.
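Several studies in Table 1 compare a generated geometry with a reference using the Hausdorff distance (HD) or the Chamfer distance (CD). HD was used for finish lines [14,19] and morphology [24], and CD was used in [14,21]. The two metrics behave differently: HD reports the single worst nearest-neighbor gap, whereas CD averages over all points. The sketch below is brute force on small point sets. CD definitions vary across papers (squared vs. unsquared distances, sums vs. means), and the squared-L2, mean-based form used here may not match any included study. For dense meshes, a k-d tree such as `scipy.spatial.cKDTree` would replace the nearest-neighbor search. All names and sample points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _nearest_distances(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Distance from each point in src to its nearest neighbour in dst (brute force, O(n*m))."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance: the worst nearest-neighbour gap in either direction."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return max(max(_nearest_distances(a, b)), max(_nearest_distances(b, a)))


def chamfer_l2(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Chamfer distance (squared-L2 variant): mean squared nearest-neighbour distance, summed over both directions."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    d_ab = _nearest_distances(a, b)
    d_ba = _nearest_distances(b, a)
    return sum(d * d for d in d_ab) / len(d_ab) + sum(d * d for d in d_ba) / len(d_ba)


# Hypothetical finish-line samples (mm): AI-detected vs. reference
ai_line = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ref_line = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(hausdorff(ai_line, ref_line), chamfer_l2(ai_line, ref_line))
```

In this example, a single reference point that the AI line misses by 2 mm sets HD to 2.0 mm. CD is diluted to about 1.33 (squared units) because that error is averaged over all points. This illustrates why the two metrics can rank systems differently.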

Table 2. Risk of bias by study (tool per design).

Study	Setting and Study Design	Tool	Overall RoB	Key Concerns
Mascaro et al. [10]	In vitro color prediction vs. spectroradiometry; AT₀₀ = 1.81.	JBI	Some concerns	Small n per condition; blinding of measurements unclear; internal cross-validation only.
Kose et al. [9]	In vitro ML regression for leucite-reinforced ceramics; spectrophotometer reference.	JBI	Some concerns	Specimen preparation/replicate independence unclear; no preregistration.
Yang et al. [11]	In vitro fusion ML for color/thickness across backgrounds; external test set.	JBI	Some concerns	Indirectness to clinical performance; assessor blinding not explicit.
Broll et al. [21]	In vitro retrospective dataset assessing morphology/occlusion from AI-CAD with varying input.	JBI	Some concerns	Convenience sample; custom metrics; analytical flexibility.
Sawangsri et al. [19]	In vitro comparison of two AI-CAD programs vs. technicians for finish-line detection and restoration design.	JBI	Some concerns	Potential conflicts of interest; mixed qualitative and quantitative outcomes; assessor blinding unclear.
Nagata et al. [20]	In vitro master models; AI-equipped vs. conventional CAD; accuracy and design time.	JBI	Low–some concerns	Standardized protocol; generalizability limited; learning-curve effects.
Cho et al. [6]	In vitro partial-arch scans; DL-CAD vs. technician; morphology, internal fit, occlusion, proximal contacts.	JBI	Some concerns	Multiple outcomes; ROI/threshold choices; blinding not explicit.
Çakmak et al. [22]	In vivo nonrandomized comparison; morphology and functional outcomes.	ROBINS-I	Moderate	Confounding and selection; outcome assessment without blinding; few missing data.
Chen et al. [23]	In vivo-derived designs (12 participants) compared across knowledge-based AI vs. human CAD; ex vivo tests.	ROBINS-I	Moderate	Nonrandomized; blinding unclear; small sample.
Chau et al. [24]	In vitro; 3D-GAN evaluation with Hausdorff distance/IoU.	JBI	Some concerns	Small validation set (n = 10 pairs); best-fit alignment; non-clinical thresholds.
Choi et al. [14]	In silico/retrospective; DL for finish-line detection with proposed HD/Chamfer thresholds.	JBI	Some concerns	Convenience sampling; outlier removal rules; generalizability.
Tian et al. [26]	In silico two-stage DCPR-GAN; large bench dataset.	JBI	Some concerns	Validation on bench data; indirect clinical transferability.
Ding et al. [12]	In vitro/in silico 3D-DCGAN with mechanical/FE analysis.	JBI	Some concerns	Small test set; multiple endpoints; blinding not reported.
Wu et al. [13]	In vitro; two AI-CAD vs. expert/novice CAD; surface-wise accuracy.	JBI	Some concerns	Nonrandomized; per-surface metrics; mixed unit reporting.
Cho et al. [25]	In vivo clinical datasets; AI-CAD vs. conventional CAD.	ROBINS-I	Moderate	Nonrandomized design; possible operator bias; small samples.

Table 3. Summary of findings (GRADE) by outcome.

Outcome	Evidence Base (k, Setting)	Summary of Findings	Main Limitations (GRADE Domains)	Certainty
Color-prediction acceptability (ΔE₀₀ within AT₀₀)	k = 2 in vitro ML (Kose et al. AT₀₀ = 1.77 [9]; Mascaro et al. AT₀₀ = 1.81 [10])	High proportion of predictions within each study’s prespecified acceptability threshold.	Indirectness (bench → clinical), imprecision (low k/n), threshold heterogeneity (1.77 vs. 1.81).	Low
Internal fit	k = 2 (Cho et al. 2023 [25]; Nagata et al. 2025 [20])	AI-assisted CAD showed better internal fit than conventional CAD in both studies.	Imprecision (small samples), indirectness (lab/retrospective), possible inconsistency across systems.	Very low–Low
Morphology deviation	k ≥ 7 (Chau et al. 2024 [24]; Tian et al. 2022 [26]; Ding et al. 2023 [12]; Cho et al. 2024 [6]; Nagata et al. 2025 [20]; Broll et al. 2025 [21]; Wu et al. 2025 [13])	AI often matches or exceeds comparators on RMS/HD, but effects depend on metric/ROI/software.	Serious inconsistency (RMS µm vs. HD/Chamfer mm; custom indices), indirectness, imprecision.	Very low
Finish-line detection/restoration design	k = 2 (Choi et al. 2024 [14]; Sawangsri et al. 2025 [19])	Hybrid DL and CAD comparable or better; software-specific differences; thresholds proposed for HD and Chamfer distance.	Single study per subtask, potential COI, indirectness.	Very low–Low
Design time	k = 2 (Cho et al. 2023 [25]; Nagata et al. 2025 [20])	AI-equipped CAD reduced design time compared with conventional CAD.	Imprecision (low k), indirectness.	Low
Functional contacts (occlusal and proximal)	k = 3 (Ding et al. 2023 [12]; Broll et al. 2025 [21]; Cho et al. 2024 [6])	Contact counts/intensity varied; AI generally comparable, not consistently superior.	Heterogeneous definitions/units, low k, indirectness.	Very low


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ardila, C.M.; Pulgarín-Medina, D.M.; Pineda-Vélez, E.; Vivares-Builes, A.M. Artificial Intelligence for Color Prediction and Esthetic Design in CAD/CAM Ceramic Restorations: A Systematic Review and Meta-Analyses. Prosthesis 2025, 7, 160. https://doi.org/10.3390/prosthesis7060160
