A Comparative Assessment of ChatGPT, Gemini, and DeepSeek Accuracy: Examining Visual Medical Assessment in Internal Medicine Cases with and Without Clinical Context
Abstract
1. Introduction
2. Method
2.1. Study Design
2.2. Phases and Prompting
2.3. Data Source and Case Selection
2.4. Reference Answer
2.5. Models and Configurations
2.6. Outcomes
2.7. Statistical Analysis
2.8. Ethics and Copyright
3. Results
3.1. Overall Diagnostic Accuracy
3.2. Performance by Disease Nature Category
3.3. Performance by Organ System Category
3.4. Differential Diagnosis Precision
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
**Table 1.** Overall diagnostic accuracy of ChatGPT, Gemini, and DeepSeek in Phase 1 and Phase 2 (N = 138 cases per model; Δ = Phase 2 − Phase 1).

| Metric | ChatGPT Phase 1, N (%) | ChatGPT Phase 2, N (%) | ChatGPT Δ | Gemini Phase 1, N (%) | Gemini Phase 2, N (%) | Gemini Δ | DeepSeek Phase 1, N (%) | DeepSeek Phase 2, N (%) | DeepSeek Δ |
|---|---|---|---|---|---|---|---|---|---|
| Total cases | 138 | 138 | — | 138 | 138 | — | 138 | 138 | — |
| Correct diagnoses | 70 (50.72%) | 111 (80.43%) | +41 (+58.57%) | 55 (39.86%) | 100 (72.46%) | +45 (+81.82%) | 42 (30.43%) | 104 (75.36%) | +62 (+147.62%) |
| Incorrect diagnoses | 68 (49.28%) | 27 (19.57%) | −41 (−60.29%) | 83 (60.14%) | 38 (27.54%) | −45 (−54.22%) | 96 (69.57%) | 34 (24.64%) | −62 (−64.58%) |
| Overall accuracy | 50.72% | 80.43% | +29.71 ppt | 39.86% | 72.46% | +32.60 ppt | 30.43% | 75.36% | +44.93 ppt |

Differential diagnosis accuracy (Phase 2 only):

| Metric | ChatGPT | Gemini | DeepSeek |
|---|---|---|---|
| Average (%) | 6.99 | 36.39 | 32.74 |
| Median (%) | 6.47 | 37.50 | 33.33 |
| Range (min–max, %) | 0.00–21.43 | 0.00–92.85 | 0.00–87.50 |
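For readers reproducing the Δ columns, the arithmetic is straightforward but uses two different denominators: the relative change in parentheses is expressed against the Phase 1 count, while the "ppt" value is the simple difference between the two accuracies. The sketch below is a minimal illustration using the ChatGPT row; the function name and structure are ours, not taken from the study's analysis code.

```python
# Minimal sketch of how the delta columns in Table 1 are derived.
# Counts come from the table itself; helper names are illustrative.

def accuracy_deltas(phase1_correct: int, phase2_correct: int, total: int) -> dict:
    """Summarize the Phase 1 -> Phase 2 change in correct diagnoses."""
    p1_acc = 100 * phase1_correct / total           # Phase 1 accuracy (%)
    p2_acc = 100 * phase2_correct / total           # Phase 2 accuracy (%)
    abs_change = phase2_correct - phase1_correct    # additional correct cases
    rel_change = 100 * abs_change / phase1_correct  # change relative to Phase 1 (%)
    ppt_change = p2_acc - p1_acc                    # percentage-point difference
    return {"phase1_acc": p1_acc, "phase2_acc": p2_acc,
            "abs": abs_change, "rel_pct": rel_change, "ppt": ppt_change}

# ChatGPT row: 70 -> 111 correct diagnoses out of 138 cases.
print(accuracy_deltas(70, 111, 138))
# {'phase1_acc': 50.72..., 'phase2_acc': 80.43..., 'abs': 41,
#  'rel_pct': 58.57..., 'ppt': 29.71...}
```

The same computation applies row by row to Tables 2 and 3 below, which break the 138 cases down by disease nature and organ system.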
**Table 2.** Diagnostic accuracy by disease nature category (correct diagnoses with % accuracy per phase; Δ in percentage points).

| Disease Nature Category | Total Cases | ChatGPT Phase 1 Correct (% Acc.) | ChatGPT Phase 2 Correct (% Acc.) | ChatGPT Δ (ppt) | Gemini Phase 1 Correct (% Acc.) | Gemini Phase 2 Correct (% Acc.) | Gemini Δ (ppt) | DeepSeek Phase 1 Correct (% Acc.) | DeepSeek Phase 2 Correct (% Acc.) | DeepSeek Δ (ppt) |
|---|---|---|---|---|---|---|---|---|---|---|
| Cutaneous Inflammatory/Autoimmune | 20 | 8 (40.00%) | 16 (80.00%) | +40.00 | 15 (75.00%) | 16 (80.00%) | +5.00 | 2 (10.00%) | 17 (85.00%) | +75.00 |
| Systemic Inflammatory/Autoimmune | 12 | 6 (50.00%) | 10 (83.33%) | +33.33 | 10 (83.33%) | 10 (83.33%) | 0.00 | 6 (50.00%) | 10 (83.33%) | +33.33 |
| Bacterial and Fungal Infections | 12 | 6 (50.00%) | 11 (91.67%) | +41.67 | 10 (83.33%) | 11 (91.67%) | +8.33 | 4 (33.33%) | 8 (66.67%) | +33.33 |
| Viral and Parasitic Infections | 12 | 4 (33.33%) | 10 (83.33%) | +50.00 | 9 (75.00%) | 10 (83.33%) | +8.33 | 0 (0.00%) | 9 (75.00%) | +75.00 |
| Neoplastic and Proliferative | 20 | 10 (50.00%) | 16 (80.00%) | +30.00 | 14 (70.00%) | 16 (80.00%) | +10.00 | 6 (30.00%) | 15 (75.00%) | +45.00 |
| Metabolic and Toxic | 10 | 2 (20.00%) | 6 (60.00%) | +40.00 | 7 (70.00%) | 6 (60.00%) | −10.00 | 3 (30.00%) | 8 (80.00%) | +50.00 |
| Arrhythmic and Electrophysiological | 20 | 4 (20.00%) | 13 (65.00%) | +45.00 | 17 (85.00%) | 13 (65.00%) | −20.00 | 1 (5.00%) | 10 (50.00%) | +45.00 |
| Structural and Degenerative | 10 | 8 (80.00%) | 10 (100.00%) | +20.00 | 9 (90.00%) | 10 (100.00%) | +10.00 | 5 (50.00%) | 8 (80.00%) | +30.00 |
| Traumatic and Hemorrhagic | 8 | 6 (75.00%) | 8 (100.00%) | +25.00 | 7 (87.50%) | 8 (100.00%) | +12.50 | 2 (25.00%) | 7 (87.50%) | +62.50 |
| Others | 16 | 16 (100.00%) | 16 (100.00%) | 0.00 | 15 (93.75%) | 16 (100.00%) | +6.25 | 9 (56.25%) | 12 (75.00%) | +18.75 |
**Table 3.** Diagnostic accuracy by organ system category (correct diagnoses with % accuracy per phase; Δ in percentage points).

| Organ System Category | Total Cases | ChatGPT Phase 1 Correct (% Acc.) | ChatGPT Phase 2 Correct (% Acc.) | ChatGPT Δ (ppt) | Gemini Phase 1 Correct (% Acc.) | Gemini Phase 2 Correct (% Acc.) | Gemini Δ (ppt) | DeepSeek Phase 1 Correct (% Acc.) | DeepSeek Phase 2 Correct (% Acc.) | DeepSeek Δ (ppt) |
|---|---|---|---|---|---|---|---|---|---|---|
| Scaly Skin Disorders | 12 | 4 (33.33%) | 10 (83.33%) | +50.00 | 9 (75.00%) | 10 (83.33%) | +8.33 | 2 (16.67%) | 10 (83.33%) | +66.67 |
| Blistering and Nodular Skin Disorders | 28 | 12 (42.86%) | 22 (78.57%) | +35.71 | 21 (75.00%) | 22 (78.57%) | +3.57 | 8 (28.57%) | 21 (75.00%) | +46.43 |
| Ocular System | 10 | 5 (50.00%) | 9 (90.00%) | +40.00 | 7 (70.00%) | 9 (90.00%) | +20.00 | 0 (0.00%) | 5 (50.00%) | +50.00 |
| Oral and Mucosal System | 4 | 2 (50.00%) | 4 (100.00%) | +50.00 | 3 (75.00%) | 4 (100.00%) | +25.00 | 0 (0.00%) | 4 (100.00%) | +100.00 |
| Cardiovascular System | 24 | 6 (25.00%) | 15 (62.50%) | +37.50 | 20 (83.33%) | 15 (62.50%) | −20.83 | 1 (4.17%) | 13 (54.17%) | +50.00 |
| Neurological System | 12 | 9 (75.00%) | 11 (91.67%) | +16.67 | 11 (91.67%) | 11 (91.67%) | 0.00 | 4 (33.33%) | 11 (91.67%) | +58.33 |
| Abdominopelvic and Gastrointestinal System | 16 | 10 (62.50%) | 14 (87.50%) | +25.00 | 13 (81.25%) | 14 (87.50%) | +6.25 | 12 (75.00%) | 13 (81.25%) | +6.25 |
| Hematological and Fluid Systems | 14 | 4 (28.57%) | 12 (85.71%) | +57.14 | 11 (78.57%) | 12 (85.71%) | +7.14 | 6 (42.86%) | 13 (92.86%) | +50.00 |
| Pulmonary and Thoracic System | 4 | 4 (100.00%) | 4 (100.00%) | 0.00 | 4 (100.00%) | 4 (100.00%) | 0.00 | 2 (50.00%) | 3 (75.00%) | +25.00 |
| Others | 14 | 14 (100.00%) | 14 (100.00%) | 0.00 | 13 (92.86%) | 14 (100.00%) | +7.14 | 7 (50.00%) | 11 (78.57%) | +28.57 |
**Table 4.** Differential diagnosis precision by image category (Phase 2 only): average, median, and range (%) per model.

| Category | Total Cases | ChatGPT Avg (%) | ChatGPT Median (%) | ChatGPT Range (%) | Gemini Avg (%) | Gemini Median (%) | Gemini Range (%) | DeepSeek Avg (%) | DeepSeek Median (%) | DeepSeek Range (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Papulosquamous, Plaque, and Scaling Skin Photos | 12 | 10.23 | 10.49 | 3.12–16.52 | 44.03 | 43.75 | 20.00–61.66 | 43.75 | 37.49 | 16.66–87.50 |
| Vesicular, Ulcerative, Nodular, and Pigmented Skin Photos | 28 | 5.63 | 5.52 | 0.00–11.46 | 35.45 | 32.08 | 10.00–77.50 | 36.01 | 39.58 | 0.00–66.66 |
| Ocular and Conjunctival Photographs | 8 | 4.64 | 0.00 | 0.00–18.58 | 37.80 | 29.17 | 0.00–92.85 | 32.29 | 20.83 | 12.50–75.00 |
| Oral and Mucosal Photographs | 4 | 7.60 | 7.60 | 6.54–8.66 | 30.00 | 30.00 | 10.00–50.00 | 41.67 | 41.67 | 0.00–83.34 |
| Electrocardiography (ECG/Rhythm Strips) | 24 | 9.01 | 10.04 | 0.00–21.43 | 39.14 | 43.75 | 7.14–87.50 | 23.75 | 16.66 | 0.00–62.50 |
| Cross-Sectional Neuroimaging (CT/MRI Brain) | 12 | 6.01 | 6.84 | 0.00–11.66 | 34.23 | 39.58 | 14.58–50.00 | 35.42 | 39.59 | 16.66–50.00 |
| Cross-Sectional Imaging (CT Abdomen/Pelvis) | 12 | 6.78 | 5.32 | 0.00–18.34 | 36.90 | 39.64 | 12.14–55.00 | 27.64 | 26.66 | 0.00–58.34 |
| Microscopy (Blood Smears and Parasites) | 10 | 6.68 | 4.54 | 2.50–16.54 | 36.67 | 37.50 | 0.00–90.00 | 29.33 | 30.00 | 16.66–50.00 |
| Microscopy (Urine and Synovial Fluid) | 4 | 7.74 | 7.74 | 4.76–10.72 | 45.00 | 45.00 | 40.00–50.00 | 50.00 | 50.00 | 33.33–66.67 |
| Others | 24 | 6.20 | 6.08 | 0.00–14.36 | 30.28 | 33.33 | 12.50–62.50 | 30.68 | 33.33 | 12.50–50.00 |
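The summary statistics in Table 4 can be reproduced from per-case precision values. The sketch below assumes each case contributes a precision score of 100 × (differentials matching the reference answer) / (differentials listed), a definition inferred from the table headers rather than quoted from the Methods, and aggregates those scores with the mean, median, and min–max range; the example values are illustrative only, not the study's raw per-case data.

```python
# Hedged sketch of the differential-diagnosis precision summaries in Table 4.
# Assumption: each case yields p_i = 100 * matching_differentials / listed_differentials.
import statistics

def summarize_precision(per_case_precision: list[float]) -> dict:
    """Mean, median, and (min, max) range of per-case precision scores (%)."""
    return {
        "average": statistics.mean(per_case_precision),
        "median": statistics.median(per_case_precision),
        "range": (min(per_case_precision), max(per_case_precision)),
    }

# Illustrative per-case scores for one hypothetical category.
example_scores = [0.0, 12.5, 33.33, 50.0, 66.67]
print(summarize_precision(example_scores))
# {'average': 32.5, 'median': 33.33, 'range': (0.0, 66.67)}
```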