Evaluation of Large Language Model Performance in Answering Clinical Questions on Periodontal Furcation Defect Management
Abstract
1. Introduction
2. Materials and Methods
Statistical Analysis
3. Results
4. Discussion
- Google Gemini Advanced achieved the highest average scores, while Google Gemini and Microsoft Copilot received the lowest.
- The Kruskal–Wallis test showed no statistically significant differences in the average scores among the LLMs.
- Overall, no single LLM consistently outperformed the others across all assessed criteria. Google Gemini Advanced tended to score higher in comprehensiveness and clarity, and Microsoft CoPilot tended to score lower in relevance.
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
AI | Artificial Intelligence |
LLM | Large Language Model |
Question Number | Question Description |
---|---|
1 | How should molars with class II and III furcation involvement and residual pockets be best managed? |
2 | What is the best treatment option for residual deep pockets associated with mandibular Class II furcation involvement? |
3 | What is the best treatment option for residual deep pockets associated with maxillary buccal Class II furcation involvement? |
4 | What is the best choice of regenerative biomaterials for the regenerative treatment of residual deep pockets associated with Class II mandibular and maxillary buccal furcation involvement? |
5 | What is the best treatment option for maxillary interdental Class II furcation involvement? |
6 | What is the best treatment option for maxillary Class III furcation involvement? |
7 | What is the best treatment option for mandibular Class III furcation involvement? |
8 | Does adjunctive use of local drugs to subgingival instrumentation improve the clinical outcomes of furcation involvement? |
9 | Does adjunctive use of systemic antibiotics improve the clinical outcomes of furcation involvement? |
10 | What is the best imaging technique for assessing furcation defects? |
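Statistic | Score 1, ChatGPT 4.0, Evaluator 1 | Score 1, ChatGPT 4.0, Evaluator 2 | Score 1, Google Gemini, Evaluator 1 | Score 1, Google Gemini, Evaluator 2 | Score 1, Google Gemini Advanced, Evaluator 1 | Score 1, Google Gemini Advanced, Evaluator 2 | Score 1, Microsoft CoPilot, Evaluator 1 | Score 1, Microsoft CoPilot, Evaluator 2 | Score 2, ChatGPT 4.0, Evaluator 1 | Score 2, ChatGPT 4.0, Evaluator 2 | Score 2, Google Gemini, Evaluator 1 | Score 2, Google Gemini, Evaluator 2 | Score 2, Google Gemini Advanced, Evaluator 1 | Score 2, Google Gemini Advanced, Evaluator 2 | Score 2, Microsoft CoPilot, Evaluator 1 | Score 2, Microsoft CoPilot, Evaluator 2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|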
Mean | 6.0 | 6.0 | 5.7 | 5.7 | 6.9 | 6.7 | 5.7 | 5.7 | 5.8 | 6.0 | 5.7 | 5.7 | 6.9 | 6.7 | 5.7 | 5.6 |
Standard Error of Mean | 0.8 | 0.8 | 0.9 | 0.9 | 0.7 | 0.8 | 0.8 | 0.8 | 0.7 | 0.8 | 0.9 | 0.9 | 0.7 | 0.8 | 0.8 | 0.8 |
Median | 6.0 | 6.0 | 6.0 | 6.0 | 7.0 | 6.5 | 6.0 | 6.0 | 6.0 | 6.0 | 6.0 | 6.0 | 7.0 | 6.5 | 6.0 | 6.0 |
Minimum | 2.0 | 2.0 | 2.0 | 2.0 | 3.0 | 3.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 3.0 | 3.0 | 2.0 | 2.0 |
Maximum | 9.0 | 9.0 | 10.0 | 10.0 | 10.0 | 10.0 | 9.0 | 9.0 | 8.0 | 9.0 | 10.0 | 10.0 | 10.0 | 10.0 | 9.0 | 9.0 |
Standard deviation | 2.4 | 2.4 | 2.8 | 2.8 | 2.2 | 2.4 | 2.6 | 2.6 | 2.2 | 2.4 | 2.8 | 2.8 | 2.2 | 2.4 | 2.6 | 2.6 |
Variance | 5.6 | 5.6 | 7.6 | 7.6 | 5.0 | 5.6 | 6.9 | 6.9 | 4.6 | 5.6 | 7.6 | 7.6 | 5.0 | 5.6 | 6.9 | 6.5 |
Large Language Models (LLMs) [Evaluator 1–2] | Score 1: Pearson Correlation | Score 1: Spearman Rho | Score 2: Pearson Correlation | Score 2: Spearman Rho |
---|---|---|---|---|
ChatGPT 4.0 | 1.000 (p < 0.001) | 1.000 (-) | 0.987 (p < 0.001) | 0.965 (p < 0.001) |
Google Gemini | 1.000 (p < 0.001) | 1.000 (-) | 1.000 (p < 0.001) | 1.000 (-) |
Google Gemini Advanced | 0.985 (p < 0.001) | 0.975 (p < 0.001) | 0.985 (p < 0.001) | 0.975 (p < 0.001) |
Microsoft CoPilot | 1.000 (p < 0.001) | 1.000 (-) | 0.993 (p < 0.001) | 0.991 (p < 0.001) |
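Large Language Models (LLMs) | Score 1: Cronbach α | Score 1: Intraclass Correlation Coefficient (Single) | Score 1: Intraclass Correlation Coefficient (Average) | Score 2: Cronbach α | Score 2: Intraclass Correlation Coefficient (Single) | Score 2: Intraclass Correlation Coefficient (Average) | Pooled Scores 1 and 2: Cronbach α | Pooled Scores 1 and 2: Intraclass Correlation Coefficient (Single) | Pooled Scores 1 and 2: Intraclass Correlation Coefficient (Average) |
---|---|---|---|---|---|---|---|---|---|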
ChatGPT 4.0 | 1.000 | 1.000 (p < 0.001) | 1.000 (p < 0.001) | 0.991 | 0.983 (p < 0.001) | 0.991 (p < 0.001) | 1.000 | 1.000 (p < 0.001) | 1.000 (p < 0.001) |
Google Gemini | 1.000 | 1.000 (p < 0.001) | 1.000 (p < 0.001) | 1.000 | 1.000 (p < 0.001) | 1.000 (p < 0.001) | 1.000 | 1.000 (p < 0.001) | 1.000 (p < 0.001) |
Google Gemini Advanced | 0.992 | 0.983 (p < 0.001) | 0.992 (p < 0.001) | 0.992 | 0.983 (p < 0.001) | 0.992 (p < 0.001) | 1.000 | 1.000 (p < 0.001) | 1.000 (p < 0.001) |
Microsoft CoPilot | 1.000 | 1.000 (p < 0.001) | 1.000 (p < 0.001) | 0.996 | 0.993 (p < 0.001) | 0.996 (p < 0.001) | 1.000 | 1.000 (p < 0.001) | 1.000 (p < 0.001) |
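
For k raters, the average-measures intraclass correlation coefficient (ICC) can be derived from the single-measures value with the Spearman–Brown formula, ICC_avg = k·ICC_single / (1 + (k − 1)·ICC_single). The sketch below applies it with k = 2 (the two evaluators) to the single-measures values in the table. That the reported coefficients come from an ICC model for which this relation holds is an assumption, and differences in the third decimal are expected because the inputs are already rounded.

```python
def spearman_brown(icc_single: float, k: int = 2) -> float:
    """Average-measures reliability for k raters from the single-measures value."""
    return k * icc_single / (1 + (k - 1) * icc_single)


# (single, average) pairs copied from the table above; rows reported as
# 1.000 / 1.000 are omitted because they satisfy the relation trivially.
reported = {
    "ChatGPT 4.0, Score 2": (0.983, 0.991),
    "Google Gemini Advanced, Score 1": (0.983, 0.992),
    "Google Gemini Advanced, Score 2": (0.983, 0.992),
    "Microsoft CoPilot, Score 2": (0.993, 0.996),
}

for label, (single, average) in reported.items():
    predicted = spearman_brown(single)
    print(f"{label}: predicted {predicted:.4f}, reported {average:.3f}")
```

In every row of the table, Cronbach's α equals the average-measures value, which is what one would expect if a consistency-type ICC was reported.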
Large Language Models (LLMs) [Evaluator 1–2] | Wilcoxon Test: Score 1 | Wilcoxon Test: Score 2 | Friedman Test: Pooled Scores 1 and 2 |
---|---|---|---|
ChatGPT 4.0 | 1.000 | 0.157 | 1.000 |
Google Gemini | 1.000 | 1.000 | 1.000 |
Google Gemini Advanced | 0.157 | 0.157 | 1.000 |
Microsoft CoPilot | 1.000 | 0.317 | 1.000 |
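
The Wilcoxon p-values in this table take only three values (1.000, 0.317, 0.157). One scenario consistent with that pattern, offered as an illustration and not as the authors' data, is that the two evaluators gave identical scores on almost every question and that the test used the large-sample normal approximation with zero differences dropped and a tie correction. Under those assumptions, a single non-zero difference gives p ≈ 0.317 and two equal differences in the same direction give p ≈ 0.157. The difference vectors below are hypothetical, and returning 1.0 when no pair differs is a convention chosen for this sketch.

```python
import math


def wilcoxon_signed_rank_p(diffs):
    """Two-sided asymptotic p-value of the Wilcoxon signed-rank test.

    Zero differences are dropped, tied |differences| get midranks with the
    usual variance correction, and no continuity correction is applied.
    """
    d = [x for x in diffs if x != 0]
    n = len(d)
    if n == 0:
        return 1.0  # sketch convention: no differing pairs
    order = sorted(range(n), key=lambda idx: abs(d[idx]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current block of tied absolute values.
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        midrank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical evaluator-2 minus evaluator-1 differences over 10 questions.
one_disagreement = [0] * 9 + [1]
two_disagreements = [0] * 8 + [1, 1]
print(round(wilcoxon_signed_rank_p(one_disagreement), 3))
print(round(wilcoxon_signed_rank_p(two_disagreements), 3))
```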
Average Score | ChatGPT 4.0 | Google Gemini | Google Gemini Advanced | Microsoft CoPilot |
---|---|---|---|---|
Mean | 5.95 | 5.70 | 6.80 | 5.68 |
Standard Error of Mean | 0.73 | 0.87 | 0.72 | 0.82 |
Median | 6.00 | 6.00 | 6.75 | 6.00 |
Minimum | 2.00 | 2.00 | 3.00 | 2.00 |
Maximum | 8.75 | 10.00 | 10.00 | 9.00 |
Standard deviation | 2.30 | 2.75 | 2.29 | 2.60 |
Variance | 5.29 | 7.57 | 5.23 | 6.78 |
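
The rows of this table are linked by two standard identities: the standard error of the mean equals the standard deviation divided by √n, and the variance equals the squared standard deviation. The sketch below checks the reported values against both identities. It assumes n = 10, inferred from the ten questions in the question list and not stated in the table itself; the tolerances are illustrative choices that allow for the rounding of the published figures.

```python
import math

# Values copied from the average-score table above.
reported = {
    "ChatGPT 4.0": {"sd": 2.30, "sem": 0.73, "var": 5.29},
    "Google Gemini": {"sd": 2.75, "sem": 0.87, "var": 7.57},
    "Google Gemini Advanced": {"sd": 2.29, "sem": 0.72, "var": 5.23},
    "Microsoft CoPilot": {"sd": 2.60, "sem": 0.82, "var": 6.78},
}

N_QUESTIONS = 10  # assumption: one average score per question


def sem_from_sd(sd: float, n: int) -> float:
    """Standard error of the mean from a sample standard deviation."""
    return sd / math.sqrt(n)


def check(row, n=N_QUESTIONS, sem_tol=0.01, var_tol=0.03):
    """Return (sem_ok, var_ok) for one column of the table."""
    sem_ok = abs(sem_from_sd(row["sd"], n) - row["sem"]) <= sem_tol
    var_ok = abs(row["sd"] ** 2 - row["var"]) <= var_tol
    return sem_ok, var_ok


for name, row in reported.items():
    print(name, round(sem_from_sd(row["sd"], N_QUESTIONS), 2), check(row))
```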
Large Language Models (LLMs) [Average Scores] | Kruskal–Wallis (Adjusted p-Value by Bonferroni Correction for Multiple Tests) |
---|---|
ChatGPT 4.0 vs. Google Gemini | 0.870 (1.000) |
ChatGPT 4.0 vs. Google Gemini Advanced | 0.329 (1.000) |
ChatGPT 4.0 vs. Microsoft CoPilot | 0.938 (1.000) |
Google Gemini vs. Google Gemini Advanced | 0.254 (1.000) |
Google Gemini vs. Microsoft CoPilot | 0.931 (1.000) |
Google Gemini Advanced vs. Microsoft CoPilot | 0.292 (1.000) |
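
With four LLMs there are 4·3/2 = 6 pairwise comparisons, so a Bonferroni adjustment multiplies each unadjusted p-value by 6 and caps the result at 1. The sketch below applies that rule to the unadjusted values in the table, assuming this multiply-and-cap form is how the values in parentheses were obtained (the table does not name the software or procedure used).

```python
from itertools import combinations

models = [
    "ChatGPT 4.0",
    "Google Gemini",
    "Google Gemini Advanced",
    "Microsoft CoPilot",
]

# Unadjusted pairwise p-values copied from the table above.
unadjusted = {
    ("ChatGPT 4.0", "Google Gemini"): 0.870,
    ("ChatGPT 4.0", "Google Gemini Advanced"): 0.329,
    ("ChatGPT 4.0", "Microsoft CoPilot"): 0.938,
    ("Google Gemini", "Google Gemini Advanced"): 0.254,
    ("Google Gemini", "Microsoft CoPilot"): 0.931,
    ("Google Gemini Advanced", "Microsoft CoPilot"): 0.292,
}


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p * n_tests)


n_comparisons = len(list(combinations(models, 2)))
adjusted = {pair: bonferroni(p, n_comparisons) for pair, p in unadjusted.items()}

for (a, b), p_adj in adjusted.items():
    print(f"{a} vs. {b}: {p_adj:.3f}")
```

Even the smallest unadjusted value (0.254) exceeds 1/6, so every adjusted p-value is capped at 1.000, matching the table.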
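Statistic | ChatGPT 4.0, Criterion 1 | ChatGPT 4.0, Criterion 2 | ChatGPT 4.0, Criterion 3 | ChatGPT 4.0, Criterion 4 | Google Gemini, Criterion 1 | Google Gemini, Criterion 2 | Google Gemini, Criterion 3 | Google Gemini, Criterion 4 | Google Gemini Advanced, Criterion 1 | Google Gemini Advanced, Criterion 2 | Google Gemini Advanced, Criterion 3 | Google Gemini Advanced, Criterion 4 | Microsoft CoPilot, Criterion 1 | Microsoft CoPilot, Criterion 2 | Microsoft CoPilot, Criterion 3 | Microsoft CoPilot, Criterion 4 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|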
Mean | 6.0 | 6.2 | 6.0 | 5.2 | 5.7 | 5.8 | 5.9 | 5.0 | 7.3 | 7.0 | 7.0 | 6.8 | 5.6 | 5.4 | 5.8 | 5.9 |
Standard Error of Mean | 0.7 | 0.8 | 0.7 | 0.9 | 0.9 | 0.9 | 0.9 | 1.1 | 0.6 | 0.7 | 0.7 | 1.0 | 0.8 | 0.9 | 0.8 | 1.0 |
Median | 5.5 | 6.0 | 6.0 | 6.0 | 6.0 | 6.0 | 6.5 | 4.5 | 7.8 | 7.0 | 7.0 | 6.8 | 6.0 | 6.0 | 6.0 | 6.8 |
Minimum | 2.0 | 1.5 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 1.5 | 4.0 | 4.0 | 3.0 | 2.0 | 2.0 | 1.5 | 2.0 | 1.0 |
Maximum | 9.0 | 9.0 | 9.0 | 8.5 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 9.0 | 9.0 | 9.0 | 10.0 |
Standard deviation | 2.2 | 2.5 | 2.3 | 2.9 | 2.8 | 2.7 | 2.7 | 3.3 | 2.0 | 2.1 | 2.3 | 3.1 | 2.6 | 2.9 | 2.6 | 3.0 |
Variance | 4.9 | 6.0 | 5.4 | 8.2 | 7.6 | 7.3 | 7.4 | 11.0 | 4.0 | 4.2 | 5.3 | 9.8 | 6.5 | 8.7 | 6.9 | 9.0 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chatzopoulos, G.S.; Koidou, V.P.; Tsalikis, L.; Kaklamanos, E.G. Evaluation of Large Language Model Performance in Answering Clinical Questions on Periodontal Furcation Defect Management. Dent. J. 2025, 13, 271. https://doi.org/10.3390/dj13060271