Meta-Analysis on Comparison of Diagnostic Accuracy Between Artificial Intelligence and Healthcare Professionals
Abstract
1. Introduction
2. Materials and Methods
2.1. Criteria for Inclusion and Exclusion
- Inclusion criteria:
  - Articles published in peer-reviewed journals between 1 January 2015 and 30 August 2025.
  - Research studies focused on using AI to enhance the quality of healthcare.
  - Studies presenting data on healthcare outcomes such as diagnostic accuracy, streamlining of HCP tasks, and reduction of HCP workload.
- Exclusion criteria:
  - Editorials, reviews, discussion articles, conference papers, and abstracts lacking original data.
  - Research that does not specifically address the application of AI in healthcare.
2.1.1. Study Types
2.1.2. Participant Types
2.1.3. Intervention Types and Controls
2.1.4. Outcome Measures
2.2. Literature Searches
2.2.1. Electronic Searches
- PubMed (from 2015 to 2025).
- Google Scholar (from 2015 to 2025).
- Embase (from 2015 to 2025).
- Scopus (from 2015 to 2025).
- Web of Science (from 2015 to 2025).
- Science.gov beta (from 2015 to 2025).
- ClinicalTrials.gov (searched on 30 September 2025).
- Saudi Clinical Registry (searched on 3 October 2025).
- Cumulative Index to Nursing and Allied Health Literature (CINAHL).
2.2.2. Searching Other Resources
2.3. Data Selection
2.4. Data Extraction
- Reference.
- Year of Publication.
- Place of Study.
- Specialty.
- Same dataset for AI and HCPs.
- Comparison Type.
- AI Model.
- Samples.
- Correct diagnosis by AI and HCPs.
2.5. Quality/Risk of Bias Assessment of Included Studies
2.5.1. Risk of Bias
Risk of Bias Judgment
- Participants
- Predictors
- Analysis
- Outcome
Interpretation
2.5.2. Applicability Concern
Interpretation
2.6. Data Analysis
2.6.1. Synthesis Methods
2.6.2. Investigation of Heterogeneity and Subgroup Analysis
2.6.3. Sensitivity Analysis
2.6.4. Certainty of the Evidence Assessment
3. Results
3.1. Study Characteristics
3.1.1. Results of the Search
3.1.2. Included Studies
3.1.3. Excluded Studies
3.2. Quality/Risk of Bias of Included Studies
3.2.1. Risk of Bias Judgment
3.2.2. Applicability Concern
3.3. Intervention Effects
3.3.1. Artificial Intelligence vs. General Healthcare Professionals
3.3.2. Artificial Intelligence vs. Expert Healthcare Professionals
3.3.3. Artificial Intelligence vs. Non-Expert Healthcare Professionals
3.3.4. Use of AI Reducing Work Burden
3.3.5. Diagnostic Accuracy of Different Specialties and Overall Accuracy
3.3.6. Interpretation of Sensitivity Analysis
4. Discussion
4.1. Summary of Evidence
4.2. Agreements and Disagreements with Other Reviews
4.2.1. Strengths of the Review
4.2.2. Weaknesses of the Review
4.3. Limitations of Review
4.4. Clinical and Research Implications
4.4.1. Implications for Practice
4.4.2. Implications for Research
4.4.3. Evidence of Certainty and Generalizability
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| AI | Artificial Intelligence |
| HCP | Healthcare Professionals |
| DA | Diagnostic Accuracy |
| ML | Machine Learning |
| DL | Deep Learning |
| GPT | Generative Pre-Trained Transformer |
| ECG | Electrocardiogram |
| CNN | Convolutional Neural Network |
| AI-CDSS | Artificial Intelligence-Based Clinical Decision Support System |
| R-CNN | Region-based Convolutional Neural Network |
| ChatGPT | Chat Generative Pre-Trained Transformer |
References
- Staggering U.S. Misdiagnosis Statistics in Healthcare. Daniel Harwin. Freedland Harwin Valori. Available online: https://www.fhvlegal.com/blog/staggering-u-s-diagnostic-error-statistics-july-2024/ (accessed on 6 October 2025).
- Wheeler, T.; Barnes, J.; Shurland, T.; Shukla, M.; Malhotra, R.; Wagh, M. How CFOs Can Help Transform Healthcare Organizations Amid an Uncertain Economic Environment. Deloitte Insights. Available online: https://www.deloitte.com/us/en/insights/industry/health-care/health-care-cfos-help-transform-organizations.html (accessed on 14 September 2023).
- Shannon Germain Farraher. Generative AI Impact on Clinicians: Bringing the Fever Down. Forrester. Available online: https://www.forrester.com/blogs/generative-ai-impact-on-clinicians-bringing-the-fever-down/ (accessed on 16 August 2024).
- Markets and Markets. Artificial Intelligence in Health Care Markets. Growth, Size, Share, Trends. Markets and Markets May-2025. Available online: https://www.marketsandmarkets.com/Market-Reports/artificial-intelligence-healthcare-market-54679303.html (accessed on 5 September 2025).
- Wang, H.; Zu, Q.; Chen, J.; Yang, Z.; Ahmed, M.A. Application of Artificial Intelligence in Acute Coronary Syndrome: A Brief Literature Review. Adv. Ther. 2021, 38, 5078–5086.
- Bidenko, N.V.; Stuchynska, N.V.; Palamarchuk, Y.V.; Matviienko, M.M. Integrating artificial intelligence in healthcare practice: Challenges and future prospects. Wiad. Lek. 2025, 78, 1199–1205.
- Milan, T.; Jan, K. Artificial intelligence in healthcare. Klin. Mikrobiol. Infekcni Lek. 2025, 31, 22–26.
- Zhang, H.; Zhao, H.; Fu, Y.; Ma, J.; Xiang, Y. The class labels and spatial information based fault diagnosis of air handling unit via combining kernel Fischer discriminant analysis with an improved graph convolutional neural network. Measurement 2026, 257, 118622.
- Al-Obeidat, F.; Hafez, W.; Gador, M.; Ahmed, N.; Abdeljawad, M.M.; Yadav, A.; Rashed, A. Diagnostic performance of AI-based models versus physicians among patients with hepatocellular carcinoma: A systematic review and meta-analysis. Front. Artif. Intell. 2024, 7, 1398205.
- Shen, J.; Zhang, C.J.P.; Jiang, B.; Chen, J.; Song, J.; Liu, Z.; He, Z.; Wong, S.Y.; Fang, P.H.; Ming, W.K. Artificial intelligence versus clinicians in disease diagnosis: Systematic review. JMIR Med. Inform. 2019, 7, e10010.
- Takita, H.; Kabata, D.; Walston, S.L.; Tatekawa, H.; Saito, K.; Tsujimoto, Y.; Miki, Y.; Ueda, D. A systematic review and meta-analysis of diagnostic performance comparison between generative AI and physicians. npj Digit. Med. 2025, 8, 175.
- Page, M.J.; Moher, D.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. PRISMA 2020 explanation and elaboration: Updated guidance and exemplars for reporting systematic reviews. BMJ 2021, 372, n160.
- Boginskis, V.; Zadoroznijs, S.; Cernavska, I.; Beikmane, D.; Sauka, J. Artificial intelligence effectivity in fracture detection. Med. Perspect. 2023, 28, 68–78.
- Levine, D.M.; Tuwani, R.; Kompa, B.; Varma, A.; Finlayson, S.G.; Mehrotra, A.; Beam, A. The diagnostic and triage accuracy of the GPT-3 artificial intelligence model: An observational study. Lancet Digit. Health 2024, 6, e555–e561.
- Rauschecker, A.M.; Rudie, J.D.; Xie, L.; Wang, J.; Duong, M.T.; Botzolakis, E.J.; Kovalovich, A.M.; Egan, J.; Cook, T.C.; Bryan, R.N.; et al. Artificial intelligence system approaching neuroradiologist-level differential diagnosis accuracy at brain MRI. Radiology 2020, 295, 626–637.
- Delshad, S.; Dontaraju, V.S.; Chengat, V. Artificial intelligence-based application provides accurate medical triage advice when compared to consensus decisions of healthcare providers. Cureus 2021, 13, e16956.
- Gan, K.; Xu, D.; Lin, Y.; Shen, Y.; Zhang, T.; Hu, K.; Zhou, K.; Bi, M.; Pan, L.; Wu, W.; et al. Artificial intelligence detection of distal radius fractures: A comparison between the convolutional neural network and professional assessments. Acta Orthop. 2019, 90, 394–400.
- Gottlieb, M.; Patel, D.; Viars, M.; Tsintolas, J.; Peksa, G.D.; Bailitz, J. Comparison of artificial intelligence versus real-time physician assessment of pulmonary edema with lung ultrasound. Am. J. Emerg. Med. 2023, 70, 109–112.
- Cohen, M.; Puntonet, J.; Sanchez, J.; Kierszbaum, E.; Crema, M.; Soyer, P.; Dion, E. Artificial intelligence vs. radiologist: Accuracy of wrist fracture detection on radiographs. Eur. Radiol. 2023, 33, 3974–3983.
- Twinprai, N.; Boonrod, A.; Boonrod, A.; Chindaprasirt, J.; Sirithanaphol, W.; Chindaprasirt, P.; Twinprai, P. Artificial intelligence (AI) vs. human in hip fracture detection. Heliyon 2022, 8, e11266.
- Homayounieh, F.; Digumarthy, S.; Ebrahimian, S.; Rueckel, J.; Hoppe, B.F.; Sabel, B.O.; Conjeti, S.; Ridder, K.; Sistermanns, M.; Wang, L.; et al. An artificial intelligence–based chest X-ray model on human nodule detection accuracy from a multicenter study. JAMA Netw. Open 2021, 4, e2141096.
- Wang, L.; Song, H.; Wang, M.; Wang, H.; Ge, R.; Shen, Y.; Yu, Y. Utilization of Ultrasonic Image Characteristics Combined with Endoscopic Detection on the Basis of Artificial Intelligence Algorithm in Diagnosis of Early Upper Gastrointestinal Cancer. J. Healthc. Eng. 2021, 2021, 2773022.
- Tamai, K.; Terai, H.; Hoshino, M.; Tabuchi, H.; Kato, M.; Toyoda, H.; Suzuki, A.; Takahashi, S.; Yabu, A.; Sawada, Y.; et al. Deep Learning Algorithm for Identifying Cervical Cord Compression Due to Degenerative Canal Stenosis on Radiography. Spine 2023, 48, 519–525.
- Harada, Y.; Katsukura, S.; Kawamura, R.; Shimizu, T. Efficacy of artificial-intelligence-driven differential-diagnosis list on the diagnostic accuracy of physicians: An open-label randomized controlled study. Int. J. Environ. Res. Public Health 2021, 18, 2086.
- Liu, P.; Lu, L.; Chen, Y.; Huo, T.; Xue, M.; Wang, H.; Fang, Y.; Xie, Y.; Xie, M.; Ye, Z. Artificial intelligence to detect the femoral intertrochanteric fracture: The arrival of the intelligent-medicine era. Front. Bioeng. Biotechnol. 2022, 10, 927926.
- Gräf, M.; Knitza, J.; Leipe, J.; Krusche, M.; Welcker, M.; Kuhn, S.; Mucke, J.; Hueber, A.J.; Hornig, J.; Klemm, P.; et al. Comparison of physician and artificial intelligence-based symptom checker diagnostic accuracy. Rheumatol. Int. 2022, 42, 2167–2176.
- Liu, Y.; Liu, W.; Chen, H.; Xie, S.; Wang, C.; Liang, T.; Yu, Y.; Liu, X. Artificial intelligence versus radiologist in the accuracy of fracture detection based on computed tomography images: A multi-dimensional, multi-region analysis. Quant. Imaging Med. Surg. 2023, 13, 6424.
- van Doorn, W.P.T.M.; Stassen, P.M.; Borggreve, H.F.; Schalkwijk, M.J.; Stoffers, J.; Bekers, O.; Meex, S.J.R. A comparison of machine learning models versus clinical evaluation for mortality prediction in patients with sepsis. PLoS ONE 2021, 16, e0245157.
- Han, S.S.; Kim, Y.J.; Moon, I.J.; Jung, J.M.; Lee, M.Y.; Lee, W.J.; Won, C.H.; Lee, M.W.; Kim, S.H.; Navarrete-Dechent, C.; et al. Evaluation of Artificial Intelligence-Assisted Diagnosis of Skin Neoplasms: A Single-Center, Paralleled, Unmasked, Randomized Controlled Trial. J. Investig. Dermatol. 2022, 142, 2353–2362.e2.
- Garcia, P.; Ma, S.P.; Shah, S.; Smith, M.; Jeong, Y.; Devon-Sand, A.; Tai-Seale, M.; Takazawa, K.; Clutter, D.; Vogt, K.; et al. Artificial intelligence–generated draft replies to patient inbox messages. JAMA Netw. Open 2024, 7, e243201.
- Yamamura, Y.; Fujii, K.; Nakashima, C.; Otsuka, A. Evaluation of the accuracy of artificial intelligence (AI) models in dermatological diagnosis and comparison with dermatology specialists. Cureus 2025, 17, e77067.
- Herzog, L.; Kook, L.; Hamann, J.; Globas, C.; Heldner, M.R.; Seiffge, D.; Antonenko, K.; Dobrocky, T.; Panos, L.; Kaesmacher, J.; et al. Deep Learning Versus Neurologists: Functional Outcome Prediction in LVO Stroke Patients Undergoing Mechanical Thrombectomy. Stroke 2023, 54, 7.
- Choi, D.J.; Park, J.J.; Ali, T.; Lee, S. Artificial intelligence for the diagnosis of heart failure. npj Digit. Med. 2020, 3, 54.
- Fonseca, Â.; Ferreira, A.; Ribeiro, L.; Moreira, S.; Duque, C. Embracing the future-is artificial intelligence already better? A comparative study of artificial intelligence performance in diagnostic accuracy and decision-making. Eur. J. Neurol. Off. J. Eur. Fed. Neurol. Soc. 2024, 31, e16195.
- Çamkıran, V.; Tunç, H.; Achmar, B.; Ürker, T.S.; Kutlu, İ.; Torun, A. Artificial intelligence (ChatGPT) ready to evaluate ECG in real life? Not yet! Digit. Health 2025, 11, 20552076251325279.
- Misurac, J.; Knake, L.A.; Blum, J.M. The effect of ambient artificial intelligence notes on provider burnout. Appl. Clin. Inform. 2025, 16, 252–258.
- Faqar-Uz-Zaman, S.F.; Anantharajah, L.; Baumartz, P.; Sobotta, P.; Filmann, N.; Zmuc, D.; von Wagner, M.; Detemble, C.; Sliwinski, S.; Marschall, U.; et al. The Diagnostic Efficacy of an App-based Diagnostic Health Care Application in the Emergency Room: eRadaR-Trial. A prospective, Double-blinded, Observational Study. Ann. Surg. 2022, 276, 935–942.
- Olson, K.D.; Meeker, D.; Troup, M.; Barker, T.D.; Nguyen, V.H.; Manders, J.B.; Stults, C.D.; Jones, V.G.; Shah, S.D.; Shah, T.; et al. Use of ambient AI scribes to reduce administrative burden and professional burnout. JAMA Netw. Open 2025, 8, e2534976.
- Krusche, M.; Callhoff, J.; Knitza, J.; Ruffer, N. Diagnostic accuracy of a large language model in rheumatology: Comparison of physician and ChatGPT-4. Rheumatol. Int. 2024, 44, 303–306.
- Guermazi, A.; Tannoury, C.; Kompel, A.J.; Murakami, A.M.; Ducarouge, A.; Gillibert, A.; Li, X.; Tournier, A.; Lahoud, Y.; Jarraya, M.; et al. Improving radiographic fracture recognition performance and efficiency using artificial intelligence. Radiology 2022, 302, 627–636.
- Baek, G.; Cha, C. AI-Assisted Tailored Intervention for Nurse Burnout: A Three-Group Randomized Controlled Trial. Worldviews Evid.-Based Nurs. 2025, 22, e70003.
- Lyons, R.J.; Arepalli, S.R.; Fromal, O.; Choi, J.D.; Jain, N. Artificial intelligence chatbot performance in triage of ophthalmic conditions. Can. J. Ophthalmol. 2024, 59, e301–e308.
- Luo, Y.; Zhang, Y.; Liu, M.; Lai, Y.; Liu, P.; Wang, Z.; Xing, T.; Huang, Y.; Li, Y.; Li, A.; et al. Artificial intelligence-assisted colonoscopy for detection of colon polyps: A prospective, randomized cohort study. J. Gastrointest. Surg. Off. J. Soc. Surg. Aliment. Tract 2021, 25, 2011–2018.
- Keenan, T.D.L.; Clemons, T.E.; Domalpally, A.; Elman, M.J.; Havilio, M.; Agrón, E.; Benyamini, G.; Chew, E.Y. Retinal Specialist versus Artificial Intelligence Detection of Retinal Fluid from OCT: Age-Related Eye Disease Study 2: 10-Year Follow-On Study. Ophthalmology 2020, 128, 100–109.
- Review Manager, Version 9.14.0; The Cochrane Collaboration: London, UK, 2025. Available online: https://revman.cochrane.org (accessed on 16 October 2025).
- Chang, T.Y.; Chou, T.Y.; Jen, I.A.; Yuh, Y.S. Artificial intelligence algorithm improves radiologists’ bone age assessment accuracy. J. Chin. Med. Assoc. 2025, 88, 10–1097.
- Edström, A.B.; Makouei, F.; Wennervaldt, K.; Lomholt, A.F.; Kaltoft, M.; Melchiors, J.; Hvilsom, G.B.; Bech, M.; Tolsgaard, M.; Todsen, T. Human-AI collaboration for ultrasound diagnosis of thyroid nodules: A clinical trial. Eur. Arch. Oto-Rhino-Laryngol. 2025, 282, 3221–3231.
- Eng, D.K.; Khandwala, N.B.; Long, J.; Fefferman, N.R.; Lala, S.V.; Strubel, N.A.; Milla, S.S.; Filice, R.W.; Sharp, S.E.; Towbin, A.J.; et al. Artificial intelligence algorithm improves radiologist performance in skeletal age assessment: A prospective multicenter randomized controlled trial. Radiology 2021, 301, 692–699.
- Geneş, M.; Deveci, B. A clinical evaluation of cardiovascular emergencies: A comparison of responses from ChatGPT, emergency physicians, and cardiologists. Diagnostics 2024, 14, 2731.
- Gertz, R.J.; Dratsch, T.; Bunck, A.C.; Lennartz, S.; Iuga, A.I.; Hellmich, M.G.; Persigehl, T.; Pennig, L.; Gietzen, C.H.; Fervers, P.; et al. Potential of GPT-4 for detecting errors in radiology reports: Implications for reporting accuracy. Radiology 2024, 311, e232714.
- Gottlieb, M.; Schraft, E.; O’Brien, J.; Patel, D. Diagnostic accuracy of artificial intelligence for identifying systolic and diastolic cardiac dysfunction in the emergency department. Am. J. Emerg. Med. 2024, 86, 115–119.
- Hoppe, J.M.; Auer, M.K.; Strüven, A.; Massberg, S.; Stremmel, C. ChatGPT with GPT-4 outperforms emergency department physicians in diagnostic accuracy: Retrospective analysis. J. Med. Internet Res. 2024, 26, e56110.
- Janik, M.; Raad, G.; Nijmeh, G.; O’Steen, M.; Rasmussen, J. Diagnostic accuracy for detecting atrial fibrillation using a novel machine learning algorithm in a blood pressure monitor. Heart Rhythm. 2024, 21, 2023–2027.
- Jiao, C.; Rosas, E.; Asadigandomani, H.; Delsoz, M.; Madadi, Y.; Raja, H.; Munir, W.M.; Tamm, B.; Mehravaran, S.; Djalilian, A.R.; et al. Diagnostic performance of publicly available large language models in corneal diseases: A comparison with human specialists. Diagnostics 2025, 15, 1221.
- Johnson, S.; Kantartjis, M.; Severson, J.; Dorsey, R.; Adams, J.L.; Kangarloo, T.; Kostrzebski, M.A.; Best, A.; Merickel, M.; Amato, D.; et al. Wearable sensor-based assessments for remotely screening early-stage Parkinson’s disease. Sensors 2024, 24, 5637.
- Kücking, F.; Hübner, U.H.; Busch, D. Diagnostic accuracy differences in detecting wound maceration between humans and artificial intelligence: The role of human expertise revisited. J. Am. Med. Inform. Assoc. JAMIA 2025, 32, 1425–1433.
- Li, T.; Liu, Y.; Guo, J.; Wang, Y. Prediction of the activity of Crohn’s disease based on CT radiomics combined with machine learning models. J. X-Ray Sci. Technol. 2022, 30, 1155–1168.
- Luna, A.; Casertano, L.; Timmerberg, J.; O’Neil, M.; Machowsky, J.; Leu, C.-S.; Lin, J.; Fang, Z.; Douglas, W.; Agrawal, S. Artificial intelligence application versus physical therapist for squat evaluation: A randomized controlled trial. Sci. Rep. 2021, 11, 18109.
- Park, A.; Chute, C.; Rajpurkar, P.; Lou, J.; Ball, R.L.; Shpanskaya, K.; Jabarkheel, R.; Kim, L.H.; McKenna, E.; Tseng, J.; et al. Deep learning–assisted diagnosis of cerebral aneurysms using the HeadXNet model. JAMA Netw. Open 2019, 2, e195600.
- Pluym, I.D.; Afshar, Y.; Holliman, K.; Kwan, L.; Bolagani, A.; Mok, T.; Silver, B.; Ramirez, E.; Han, C.S.; Platt, L.D. Accuracy of automated three-dimensional ultrasound imaging technique for fetal head biometry. Ultrasound Obstet. Gynecol. 2021, 57, 798–803.
- Prinster, D.; Mahmood, A.; Saria, S.; Jeudy, J.; Lin, C.T.; Yi, P.H.; Huang, C.M. Care to explain? AI explanation types differentially impact chest radiograph diagnostic performance and physician trust in AI. Radiology 2024, 313, e233261.
- Richard, C.; Schriger, D.; Weingrow, D. Rapid Electroencephalography and Artificial Intelligence in the Detection and Management of Nonconvulsive Seizures. Ann. Emerg. Med. 2024, 84, 422–427.
- Surya, J.; Garima Pandy, N.; Hyungtaek Rim, T.; Lee, G.; Priya, M.N.S.; Subramanian, B.; Raman, R. Efficacy of deep learning-based artificial intelligence models in screening and referring patients with diabetic retinopathy and glaucoma. Indian J. Ophthalmol. 2023, 71, 3039–3045.
- Taloni, A.; Borselli, M.; Scarsi, V.; Rossi, C.; Coco, G.; Scorcia, V.; Giannaccare, G. Comparative performance of humans versus GPT-4.0 and GPT-3.5 in the self-assessment program of American Academy of Ophthalmology. Sci. Rep. 2023, 13, 18562.
- Turan, E.İ.; Baydemir, A.E.; Balıtatlı, A.B.; Şahin, A.S. Assessing the accuracy of ChatGPT in interpreting blood gas analysis results ChatGPT-4 in blood gas analysis. J. Clin. Anesth. 2025, 102, 111787.
- Yang, C.; Zhao, H.; Wang, A.; Li, J.; Gao, J. Comparison of lung ultrasound assisted by artificial intelligence to radiology examination in pneumothorax. J. Clin. Ultrasound 2024, 52, 1051–1055.
- Yu, G.; Liu, X.; Li, Y.; Zhang, Y.; Yan, R.; Zhu, L.; Wang, Z. The nomograms for predicting overall and cancer-specific survival in elderly patients with early-stage lung cancer: A population-based study using SEER database. Front. Public Health 2022, 10, 946299.









| AI | HCPs | AI | HCPs |
|---|---|---|---|
| GLEAMER BoneView-AI v2.0.2a [13] | Radiologist | GPT-3 [14] | General Physician |
| AI Software [15] | Radiologist | Maya-MD [16] | General Physician |
| CNNs [17] | Radiologist | AI Software [18] | General Physician |
| AI Software [19] | Radiologist | AI Software [20] | General Physician |
| AI Software [21] | Radiologist | Cascade-RCNN [22] | General Physician |
| Deep Learning [23] | Radiologist | AI Software [24] | General Physician |
| Faster-RCNN algorithm [25] | Radiologist | Ada App [26] | General Physician |
| AI Software [27] | Radiologist | Machine Learning [28] | General Physician |
| AI Software [29] | Dermatologist | AI Software [30] | General Physician |
| ChatGPT-4o, Claude 3.5, and Gemini 1.5 Pro [31] | Dermatologist | Deep Learning [32] | Neurologist |
| AI-CDSS [33] | Heart Specialist | GPT-3.5 [34] | Neurologist |
| GPT-ECG Reader, Analyzer, Interpreter [35] | Cardiologist | AI Software [36] | Healthcare Practitioner |
| Ada App [37] | ER Physician | AI Software [38] | Healthcare Practitioner |
| ChatGPT-4 [39] | Rheumatologist | Detectron2 [40] | Healthcare Practitioner |
| AI-assisted tailored intervention [41] | Nurses | GPT-4 [42] | Ophthalmology trainees |
| AI Software [43] | Endoscopists | Notal OCT Analyzer [44] | Retinal Specialist |
| Reference | Year | Place | Specialty | Same Dataset for AI and HCP | Comparison Type | AI Model |
|---|---|---|---|---|---|---|
| Radiology | ||||||
| Boginskis 2023 [13] | 2023 | Europe | Radiology | Yes | Paired | GLEAMER BoneView |
| Cohen 2023 [19] | 2023 | France | Radiology | Yes | Paired | AI Software |
| Gan 2019 [17] | 2019 | China | Radiology | Yes | Paired | CNN |
| Guermazi 2022 [40] | 2022 | Boston | Radiology | Yes | Paired | Detectron2 |
| Homayounieh 2021 [21] | 2021 | USA | Radiology | Yes | Paired | AI Software |
| Liu 2022 [25] | 2022 | China | Radiology | Yes | Paired | Faster-RCNN algorithm |
| Liu 2023 [27] | 2023 | China | Radiology | Yes | Paired | AI Software |
| Luo 2021 [43] | 2021 | China | Radiology | Yes | Paired | AI Software |
| Michael Gottlieb 2024 [18] | 2024 | USA | Radiology | Yes | Paired | AI Software |
| Rauschecker 2020 [15] | 2020 | San Francisco | Radiology | Yes | Paired | AI Software |
| Tamai 2023 [23] | 2023 | Japan | Radiology | Yes | Paired | Deep Learning |
| Twinprai 2022 [20] | 2022 | Thailand | Radiology | Yes | Paired | AI Software |
| Wang 2021 [22] | 2021 | China | Radiology | Yes | Paired | Cascade-RCNN |
| Cardiology | ||||||
| Camkıran 2025 [35] | 2025 | Turkey | Cardiology | Yes | Paired | GPT |
| Choi 2020 [33] | 2020 | Korea | Cardiology | Yes | Paired | AI-CDSS |
| Graf 2022 [26] | 2022 | Germany | Cardiology | Yes | Paired | Ada App |
| Krusche 2024 [39] | 2024 | Germany | Cardiology | Yes | Paired | ChatGPT-4 |
| Emergency | ||||||
| David M. Levine 2024 [14] | 2024 | USA | Emergency | No | Unpaired | GPT-3 |
| Delshad 2021 [16] | 2021 | USA | Emergency | Yes | Paired | Maya-MD |
| Faqar-Uz-Zaman 2022 [37] | 2022 | Germany | Emergency | Yes | Paired | Ada App |
| Van Doorn 2021 [28] | 2021 | Netherlands | Emergency | Yes | Paired | Machine Learning |
| Neurology | ||||||
| Fonseca 2024 [34] | 2024 | Portugal | Neurology | Yes | Paired | GPT-3.5 |
| Lisa Herzog 2023 [32] | 2023 | Switzerland | Neurology | Yes | Paired | Deep Learning |
| Dermatology | ||||||
| Han 2022 [29] | 2022 | Korea | Dermatology | Yes | Paired | AI Software |
| Yamamura 2025 [31] | 2025 | Japan | Dermatology | Yes | Paired | ChatGPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro |
| Pathology | ||||||
| Harada 2021 [24] | 2021 | Japan | Pathology | Yes | Paired | AI Software |
| Ophthalmology | ||||||
| Keenan 2020 [44] | 2020 | Maryland | Ophthalmology | Yes | Paired | Notal OCT Analyzer (NOA) |
| Lyons 2024 [42] | 2024 | Atlanta | Ophthalmology | Yes | Paired | GPT-4 |
| Burnout | ||||||
| Baek 2025 [41] | 2025 | South Korea | Burnout | | | AI-assisted tailored intervention |
| Garcia 2024 [30] | 2024 | California | Burnout | | | AI Software |
| Misurac 2025 [36] | 2025 | USA | Burnout | | | AI Software |
| Olson 2025 [38] | 2025 | Connecticut | Burnout | | | AI Software |
| Reference | Samples | Correct Diagnosis by AI and HCP | |||
|---|---|---|---|---|---|
| | | AI | Expert | Non-Expert | General |
| Boginskis 2023 [13] | Radiographs-100 | 85 | 78 | ||
| Cohen 2023 [19] | Radiographs-318 | 285 | 273 | ||
| Gan 2019 [17] | Radiographs-2340 | 2246 | 1989 | 2223 | |
| Guermazi 2022 [40] | Radiographs-480 | 451 | 422 | ||
| Liu 2022 [25] | Radiographs-57 | 50 | 47 | ||
| Liu 2023 [27] | Radiographs-191 | 165 | 159 | ||
| Twinprai 2022 [20] | Radiographs-1000 | 950 | 960 | 820 | |
| Krusche 2024 [39] | Patients-100 | 60 | 65 | ||
| Homayounieh 2021 [21] | Patients-100 | 80 | 86 | 86 | |
| Luo 2021 [43] | Patients-150 | 59 | 51 | ||
| Michael Gottlieb 2024 [18] | Patients-71 | 69 | 68 | ||
| Rauschecker 2020 [15] | Patients-86 | 78 | 73 | 48 | |
| Tamai 2023 [23] | Patients-42 | 34 | 28 | ||
| Choi 2020 [33] | Patients-1198 | 1174 | 1198 | 910 | |
| Wang 2021 [22] | Patients-80 | 71 | 35 | ||
| Faqar-Uz-Zaman 2022 [37] | Patients-450 | 391 | 364 | ||
| Van Doorn 2021 [28] | Patients-100 | 92 | 72 | ||
| Lisa Herzog 2023 [32] | Patients-50 | 36 | 32 | ||
| Han 2022 [29] | Patients-295 | 159 | 124 | ||
| Yamamura 2025 [31] | Patients-30 | 21 | 20 | ||
| Graf 2022 [26] | Vignettes-132 | 93 | 70 | ||
| David M. Levine 2024 [14] | Vignettes-48 | 42 | 46 | ||
| Delshad 2021 [16] | Vignettes-50 | 44 | 39 | ||
| Fonseca 2024 [34] | Vignettes-188 | 134 | 130 | ||
| Harada 2021 [24] | Vignettes-16 | 9 | 9 | ||
| Lyons 2024 [42] | Vignettes-44 | 41 | 42 | ||
| Camkıran 2025 [35] | ECG-107 | 63 | 93 | ||
| Keenan 2020 [44] | Eyes-1127 | 913 | 958 | ||
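
The counts above convert to a diagnostic accuracy percentage by dividing the number of correct diagnoses by the sample size. The following is a minimal sketch of that arithmetic for three rows of the table, not the review's own analysis code. It assumes that the number after the hyphen in the Samples column is the denominator and that the second count in each row belongs to the first HCP group listed. The helper names `sample_size` and `accuracy_percent` are hypothetical.

```python
# Counts copied from three rows of the table above:
# (Samples label, correct diagnoses by AI, correct diagnoses by the first HCP group listed)
counts = {
    "Boginskis 2023 [13]": ("Radiographs-100", 85, 78),
    "Gan 2019 [17]": ("Radiographs-2340", 2246, 1989),
    "Twinprai 2022 [20]": ("Radiographs-1000", 950, 960),
}


def sample_size(label: str) -> int:
    """Extract the denominator from a label such as 'Radiographs-100' (assumed format)."""
    return int(label.rsplit("-", 1)[1])


def accuracy_percent(correct: int, total: int) -> int:
    """Diagnostic accuracy as a whole-number percentage: correct / total * 100."""
    return round(100 * correct / total)


accuracies = {}
for study, (label, ai_correct, hcp_correct) in counts.items():
    n = sample_size(label)
    accuracies[study] = (accuracy_percent(ai_correct, n), accuracy_percent(hcp_correct, n))

for study, (ai_pct, hcp_pct) in accuracies.items():
    print(f"{study}: AI {ai_pct}%, HCP {hcp_pct}%")
```

Keeping the raw counts alongside the percentages makes it possible to recompute any accuracy figure directly from the extracted data.
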
| Reference | Diagnostic Accuracy | |||
|---|---|---|---|---|
| | A | B | C | D |
| Boginskis 2023 [13] | 85% | 78% | ||
| Cohen 2023 [19] | 83% | 76% | ||
| Gan 2019 [17] | 96% | 85% | 95% | |
| Guermazi 2022 [40] | 86% | 78% | ||
| Liu 2022 [25] | 88% | 84% | ||
| Liu 2023 [27] | 86% | 71% | ||
| Twinprai 2022 [20] | 95% | 96% | 82% | |
| Krusche 2024 [39] | 60% | 55% | ||
| Homayounieh 2021 [21] | 80% | 86% | 86% | |
| Luo 2021 [43] | 39% | 34% | ||
| Michael Gottlieb 2024 [18] | 97% | 96% | ||
| Rauschecker 2020 [15] | 91% | 86% | 56% | |
| Tamai 2023 [23] | 81% | 66% | ||
| Choi 2020 [33] | 98% | 100% | 76% | |
| Wang 2021 [22] | 89% | 44% | ||
| Faqar-Uz-Zaman 2022 [37] | 52% | 81% | ||
| Van Doorn 2021 [28] | 80% | 73% | ||
| Lisa Herzog 2023 [32] | 72% | 64% | ||
| Han 2022 [29] | 54% | 44% | ||
| Yamamura 2025 [31] | 70% | 65% | ||
| Graf 2022 [26] | 70% | 54% | ||
| David M. Levine 2024 [14] | 88% | 96% | ||
| Delshad 2021 [16] | 92% | 80% | ||
| Fonseca 2024 [34] | 71% | 69% | ||
| Harada 2021 [24] | 57% | 56% | ||
| Lyons 2024 [42] | 93% | 95% | ||
| Camkıran 2025 [35] | 63% | 93% | ||
| Keenan 2020 [44] | 85% | 81% | ||
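
Each percentage above is a point estimate from a finite sample, so its precision depends on the sample size. As a hedged illustration, the sketch below attaches a 95% Wilson score interval to the 85% reported for Boginskis 2023 [13], using the 85 correct diagnoses out of 100 radiographs listed in the samples table. The Wilson interval is an assumption of this sketch and is not stated as the review's method; the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.959964) -> tuple[float, float]:
    """Return the (lower, upper) Wilson score interval for the proportion correct / total.

    z = 1.959964 is the standard normal quantile for a two-sided 95% interval.
    """
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Boginskis 2023 [13]: 85 correct AI diagnoses out of 100 radiographs.
lower, upper = wilson_interval(85, 100)
print(f"AI accuracy 85% (95% CI {100 * lower:.1f}% to {100 * upper:.1f}%)")
```

The same function applies to any row once its count and sample size are known; smaller samples give visibly wider intervals.
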
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Kumar, P.; Alnaimi, N.A.; Soman, S.; Suansing, L.; Ryan Arriola, D., II; Jamea, L.A. Meta-Analysis on Comparison of Diagnostic Accuracy Between Artificial Intelligence and Healthcare Professionals. Sci 2026, 8, 73. https://doi.org/10.3390/sci8040073
Kumar P, Alnaimi NA, Soman S, Suansing L, Ryan Arriola D II, Jamea LA. Meta-Analysis on Comparison of Diagnostic Accuracy Between Artificial Intelligence and Healthcare Professionals. Sci. 2026; 8(4):73. https://doi.org/10.3390/sci8040073
Chicago/Turabian Style: Kumar, Prem, Nouf A. Alnaimi, Sumi Soman, Leda Suansing, Daniel Ryan Arriola, II, and Lamiaa Al Jamea. 2026. "Meta-Analysis on Comparison of Diagnostic Accuracy Between Artificial Intelligence and Healthcare Professionals" Sci 8, no. 4: 73. https://doi.org/10.3390/sci8040073
APA Style: Kumar, P., Alnaimi, N. A., Soman, S., Suansing, L., Ryan Arriola, D., II, & Jamea, L. A. (2026). Meta-Analysis on Comparison of Diagnostic Accuracy Between Artificial Intelligence and Healthcare Professionals. Sci, 8(4), 73. https://doi.org/10.3390/sci8040073

