AI-Powered Physiotherapy: Evaluating LLMs Against Students in Clinical Rehabilitation Scenarios
Abstract
1. Introduction
- 1. Evaluate LLMs and student responses for quality and conceptual understanding.
- 2. Assess LLMs’ potential as educational and clinical tools in physiotherapy.
- 3. Explore whether GenAI could support physiotherapy, including the role of AI voice assistants and characters, with a focus on its potential to augment core clinical practices rather than disrupt them.
2. Methodology
2.1. Study Design
2.2. Participants
2.3. Question Development
- Basic knowledge (4–5 questions): these questions covered etiology, pathophysiology, and epidemiology (e.g., “How is knee osteoarthritis diagnosed clinically and radiographically (e.g., X-ray, MRI)?”).
- Diagnosis (3–4 questions): these questions focused on assessment techniques and diagnostic criteria (e.g., “Which standardized scales (e.g., EDSS, MSIS-29) do you use to quantify disability in MS patients?”).
- Alternative treatments (3–4 questions): these questions addressed complementary therapies, such as acupuncture or hydrotherapy (e.g., “What alternative treatments benefit frozen shoulder?”).
- Rehabilitation practices (3–4 questions): these questions emphasized evidence-based interventions, such as exercise or manual therapy (e.g., “Can Low Back Pain be prevented through lifestyle modifications or exercise?”).
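
The category blueprint above can be written down as a small data structure, which makes the implied size of one question set explicit. This is a minimal sketch: the names `BLUEPRINT` and `question_count_range` are illustrative and do not come from the study, and only the per-category ranges are taken from the list. Whether those ranges apply per condition or to the whole questionnaire is not stated in this excerpt, so the totals describe a single set built from the four categories.

```python
# Per-category question ranges as listed above: (minimum, maximum).
# The dictionary name and layout are illustrative assumptions.
BLUEPRINT = {
    "Basic knowledge": (4, 5),
    "Diagnosis": (3, 4),
    "Alternative treatments": (3, 4),
    "Rehabilitation practices": (3, 4),
}


def question_count_range(blueprint):
    """Return the (minimum, maximum) total number of questions in one set."""
    low = sum(minimum for minimum, _ in blueprint.values())
    high = sum(maximum for _, maximum in blueprint.values())
    return low, high


print(question_count_range(BLUEPRINT))  # (13, 17)
```

Under these ranges, one set contains between 13 and 17 questions.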
2.4. Data Collection
2.5. LLM Query Protocol
2.6. Evaluation
2.7. Statistical Analysis
3. Results
4. Discussion
4.1. Evidence-Based Findings
4.2. Implications for AI-Augmented Learning
4.3. Future Implications
4.4. Practical Safe Use Guidelines
4.5. Limitations
4.6. Future Directions
- AI Voice Assistants: evaluate their effectiveness in delivering real-time rehabilitation guidance, particularly for home-based programs, and their impact on patient adherence and outcomes.
- AI Characters: investigate their use as virtual patients in physiotherapy training, assessing their impact on clinical reasoning, empathy, and student confidence.
- Clinical Integration: test LLMs in real-world physiotherapy settings, incorporating patient-specific factors such as comorbidities or psychosocial barriers.
- Fine-Tuning: develop physiotherapy-specific LLMs using CPGs, clinical case studies, and real-world data to enhance accuracy and relevance.
- Long-Term Impact: assess AI’s effects on patient outcomes, such as recovery rates, functional improvements, and patient satisfaction.
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Calderone, A.; Perin, P.; Orsenigo, C.; Turolla, A. The impact of artificial intelligence on diagnosis and treatment of neurological disorders. Biomedicines 2024, 12, 2415.
- Safran, E.; Yildirim, S. A cross-sectional study on ChatGPT’s alignment with clinical practice guidelines in musculoskeletal rehabilitation. BMC Musculoskelet. Disord. 2025, 26, 411.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
- Davids, J.; Lidströmer, N.; Ashrafian, H. Artificial Intelligence for Physiotherapy and Rehabilitation; Springer eBooks; Springer: Berlin/Heidelberg, Germany, 2021; pp. 1–19. Available online: https://link.springer.com/rwe/10.1007/978-3-030-58080-3_339-1 (accessed on 15 October 2025).
- Mavrych, V.; Yousef, E.M.; Yaqinuddin, A.; Bolgova, O. Large language models in medical education: A comparative cross-platform evaluation in answering histological questions. Med. Educ. Online 2025, 30, 2534065.
- Salam, M.A.; Imtiaz, S.; Lucy, I.B. Artificial Intelligence in Medical Education: Opportunities and Challenges. Bangladesh J. Infect. Dis. 2025, 12, 189–194.
- Lowe, S.W. The role of artificial intelligence in Physical Therapy education. Bull. Fac. Phys. Ther. 2024, 29, 13.
- Gürses, Ö.A.; Özüdoğru, A.; Tuncay, F.; Kararti, C. The Role of Artificial Intelligence Large Language Models in Personalized Rehabilitation Programs for Knee Osteoarthritis: An Observational Study. J. Med. Syst. 2025, 49, 73.
- Bitterman, J.; D’Angelo, A.; Holachek, A.; Eubanks, J.E. Advancements in large language model accuracy for answering physical medicine and rehabilitation board review questions. PM R 2025, 17, 1091–1096.
- Koes, B.W.; van Tulder, M.; Thomas, S. Diagnosis and treatment of low back pain. BMJ 2006, 332, 1430–1434.
- Compston, A.; Coles, A. Multiple sclerosis. Lancet 2008, 372, 1502–1517.
- Kelley, B.J.; Rodriguez, M. Frozen shoulder: Evidence and a proposed model guiding rehabilitation. J. Orthop. Sports Phys. Ther. 2009, 39, 135–148.
- McAlindon, T.E.; Bannuru, R.R.; Sullivan, M.C. OARSI guidelines for the non-surgical management of knee osteoarthritis. Osteoarthr. Cartil. 2014, 22, 363–388.
- Wang, S.; Wang, Y.; Jiang, L.; Chang, Y.; Zhang, S.; Zhao, K.; Chen, L.; Gao, C. Assessing the clinical support capabilities of ChatGPT 4o and ChatGPT 4o mini in managing lumbar disc herniation. Eur. J. Med. Res. 2025, 30, 45.
- Arbel, Y.; Gimmon, Y.; Shmueli, L. Evaluating the Potential of Large Language Models for Vestibular Rehabilitation Education: A Comparison of ChatGPT, Google Gemini, and Clinicians. Phys. Ther. 2025, 105, pzaf010.
- Lai, X.; Chen, J.; Lai, Y.; Huang, S.; Cai, Y.; Sun, Z.; Wang, X.; Pan, K.; Gao, Q.; Huang, C. Using Large Language Models to Enhance Exercise Recommendations and Physical Activity in Clinical and Healthy Populations: Scoping Review. JMIR Med. Inform. 2025, 13, e59309.
- Hao, J.; Yao, Z.; Tang, Y.; Remis, A.; Wu, K.; Yu, X. Artificial Intelligence in Physical Therapy: Evaluating ChatGPT’s Role in Clinical Decision Support for Musculoskeletal Care. Ann. Biomed. Eng. 2025, 53, 9–13.
- Zhang, C.; Liu, S.; Zhou, X.; Zhou, S.; Tian, Y.; Wang, S.; Xu, N.; Li, W. Examining the Role of Large Language Models in Orthopedics: Systematic Review. J. Med. Internet Res. 2024, 26, e59607.
- Ermolina, A.; Tiberius, V. Voice-Controlled Intelligent Personal Assistants in Health Care: International Delphi Study. J. Med. Internet Res. 2021, 23, e25312.
- Khalid, U.B.; Naeem, M.; Stasolla, F.; Syed, M.H.; Abbas, M.; Coronato, A. Impact of AI-Powered Solutions in Rehabilitation Process: Recent Improvements and Future Trends. Int. J. Gen. Med. 2024, 17, 943–969.
- Hatem, R.; Simmons, B.; Thornton, J.E. A call to address AI “hallucinations” and how healthcare professionals can mitigate their risks. Cureus 2023, 15, e44720.
- Zidoun, Y.; Mardi, A.E. Artificial Intelligence (AI)-Based simulators versus simulated patients in undergraduate programs: A protocol for a randomized controlled trial. BMC Med. Educ. 2024, 24, 1260.
- O’Connor, S. Virtual Reality and Avatars in Health care. Clin. Nurs. Res. 2019, 28, 523–528.
- Foronda, C.L.; Fernandez-Burgos, M.; Nadeau, C.; Kelley, C.N.; Henry, M.N. Virtual Simulation in Nursing Education: A Systematic Review Spanning 1996 to 2018. Simul. Healthc. 2020, 15, 46–54.
- Buch, V.H.; Ahmed, I.; Maruthappu, M. Artificial intelligence in medicine: Current trends and future possibilities. Br. J. Gen. Pract. 2018, 68, 143–144.
- Rajpurkar, P.; Irvin, J.; Zhu, K.; Yang, B.; Mehta, H.; Duan, T.; Ding, D.; Bagul, A.; Langlotz, C.P.; et al. Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Med. 2018, 15, e1002686.
- Attia, Z.I.; Noseworthy, P.A.; Lopez-Jimenez, F.; Asirvatham, S.J.; Deshmukh, A.J.; Gersh, B.J.; Carter, R.E.; Yao, X.; Rabinstein, A.A.; Erickson, B.J.; et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: A retrospective analysis of outcome prediction. Lancet 2019, 394, 861–867.
- Plater, J.C.; Baxter, G.D.; Wood, L.C.; Mueller, J.; Fisher, T. Development of evidence-based standards for inpatient physiotherapy services: A systematic review and content analysis of clinical practice guidelines. BMJ Open 2024, 14, e088692.
- Topol, E.J. Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again; Basic Books: New York, NY, USA, 2023.
- Laranjo, L.; Dunn, A.G.; Tong, H.L.; Kocaballi, A.B.; Chen, J.; Bashir, R.; Surian, D.; Gallego, B.; Magrabi, F.; Lau, A.Y.S.; et al. Conversational agents in healthcare: A systematic review. J. Am. Med. Inform. Assoc. 2018, 25, 1248–1258.
- Plackett, R.; Kassianos, A.P.; Mylan, S.; Kambouri, M.; Raine, R.; Sheringham, J. The effectiveness of using virtual patient educational tools to improve medical students’ clinical reasoning skills: A systematic review. BMC Med. Educ. 2022, 22, 365.
- McComiskie, E. AI: The Future of Physio? The Chartered Society of Physiotherapy. 2023. Available online: https://www.csp.org.uk/frontline/article/ai-future-physio (accessed on 20 October 2025).
- Singh, S.; Bansal, S.; Saddik, A.; Saini, M. From ChatGPT to DeepSeek AI: A Comprehensive Analysis of Evolution, Deviation, and Future Implications in AI-Language Models. arXiv 2025, arXiv:2504.03219.
- Green, J. Artificial intelligence in communication sciences and disorders: Introduction to the forum. J. Speech Lang. Hear. Res. 2024, 67, 3093–3097.
- Zhang, Q.; Zhu, Y.; Cordeiro, F.; Chen, Q. PSSCL: A progressive sample selection framework with contrastive loss designed for noisy labels. Pattern Recognit. 2025, 161, 111284.
- Zhang, Q.; Chen, Q. A Two-Stage Noisy Label Learning Framework with Uniform Consistency Selection and Robust Training. Appl. Intell. 2026, 56, 21.
- Bulan, P.M.P.; Kuizon, D.A.Y.; Casaña, R.S.E.; Fuentes, C.G.; Pestaño, N.Y.; Suerte, J.R.O. A Scoping Review on Artificial Intelligence in Occupational Therapy. OTJR 2025. Online ahead of print.
- Masters, K. Submitting artificial intelligence in health professions education papers to Medical Teacher. Med. Teach. 2024, 46, 1256–1257.
- Vygotsky, L.S. Mind in Society: The Development of Higher Psychological Processes; Harvard University Press: Cambridge, MA, USA, 1978.
- Meskó, B.; Topol, E.J. The imperative for regulatory oversight of large language models (or generative AI) in healthcare. npj Digit. Med. 2023, 6, 120.
- Liu, W.; Hu, J.; Lv, F.; Tang, Z. A new method for long-term temperature compensation of structural health monitoring by ultrasonic guided wave. Measurement 2025, 252, 117310.
- European Union. General Data Protection Regulation (GDPR). 2016. Available online: https://gdpr.eu (accessed on 25 November 2025).


| Study | Focus | Methods | Key Findings | Distinction from Our Work |
|---|---|---|---|---|
| Safran and Yildirim (2025) [2] | CPG alignment in musculoskeletal rehabilitation | LLM evaluation on 50 queries | 80% adherence, gaps in radicular pain | Domain-specific (physiotherapy students vs. LLMs in multi-domain open questions) |
| Wang et al. (2025) [14] | Lumbar disc herniation management | ChatGPT-4o on vignettes | High accuracy in basics, low in personalization | Includes student comparison, broader rehabilitation domains |
| Mavrych et al. (2025) [5] | Histological Q&A in medical education | Cross-platform LLM evaluation | GPT-4o > students in accuracy | Physiotherapy-specific, conceptual depth focus |
| Gürses et al. (2025) [8] | Personalized rehabilitation programs for knee OA | Observational study on LLMs | Role in enhancing adherence and personalization | Student-LLM benchmark in education, not just programs |
| Bitterman et al. (2025) [9] | PM&R question accuracy | LLM vs. board review questions | LLMs 85% accurate in basics | Direct student comparison under exam constraints |
| Arbel et al. (2025) [15] | LLMs vs. clinicians in vestibular rehabilitation | Comparative response evaluation | LLMs comparable in routine cases | Focus on students as entry-level proxies |
| Lai et al. (2025) [16] | LLM-enhanced exercise recommendations | Scoping review of prompts | Improved personalization in clinical populations | Exploratory written quality in diverse domains |
| Hao et al. (2025) [17] | ChatGPT accuracy in PT decision support | Clinical query testing for musculoskeletal care | 78% alignment with guidelines | Multi-LLM and student-inclusive design |
| Zhang et al. (2024) [18] | LLMs in orthopedics | Exam question evaluation | 55–93% accuracy | Broader rehabilitation and conceptual understanding emphasis |
| Lowe (2024) [7] | AI in PT education | Review of integration | Potential for case-based learning | Empirical benchmarking vs. review |

| Domain | Group | Relevance | Accuracy | Clarity | Completeness | CPG Consistency | Global Quality |
|---|---|---|---|---|---|---|---|
| Low Back Pain | Students | 3.8 ± 0.6 | 3.7 ± 0.7 | 3.6 ± 0.6 | 3.5 ± 0.7 | 3.6 ± 0.7 | 3.65 ± 0.6 |
| Low Back Pain | ChatGPT | 4.6 ± 0.4 | 4.5 ± 0.5 | 4.8 ± 0.3 | 4.7 ± 0.4 | 4.5 ± 0.5 | 4.65 ± 0.4 |
| Low Back Pain | DeepSeek | 4.4 ± 0.5 | 4.3 ± 0.5 | 4.6 ± 0.4 | 4.5 ± 0.5 | 4.4 ± 0.5 | 4.45 ± 0.4 |
| Multiple Sclerosis | Students | 3.6 ± 0.7 | 3.5 ± 0.8 | 3.4 ± 0.7 | 3.3 ± 0.8 | 3.5 ± 0.8 | 3.45 ± 0.7 |
| Multiple Sclerosis | ChatGPT | 4.3 ± 0.5 | 4.2 ± 0.6 | 4.5 ± 0.4 | 4.4 ± 0.5 | 4.2 ± 0.6 | 4.35 ± 0.5 |
| Multiple Sclerosis | DeepSeek | 4.7 ± 0.3 | 4.6 ± 0.4 | 4.8 ± 0.3 | 4.7 ± 0.3 | 4.6 ± 0.4 | 4.70 ± 0.3 |
| Frozen Shoulder | Students | 4.0 ± 0.5 | 4.1 ± 0.6 | 3.8 ± 0.6 | 3.7 ± 0.6 | 4.0 ± 0.5 | 3.90 ± 0.5 |
| Frozen Shoulder | ChatGPT | 4.4 ± 0.4 | 4.3 ± 0.5 | 4.6 ± 0.4 | 4.5 ± 0.4 | 4.3 ± 0.5 | 4.45 ± 0.4 |
| Frozen Shoulder | DeepSeek | 4.3 ± 0.5 | 4.2 ± 0.5 | 4.5 ± 0.4 | 4.4 ± 0.5 | 4.2 ± 0.5 | 4.35 ± 0.4 |
| Knee Osteoarthritis | Students | 3.7 ± 0.6 | 3.6 ± 0.7 | 3.5 ± 0.6 | 3.4 ± 0.7 | 3.6 ± 0.7 | 3.55 ± 0.6 |
| Knee Osteoarthritis | ChatGPT | 4.7 ± 0.3 | 4.6 ± 0.4 | 4.8 ± 0.3 | 4.7 ± 0.3 | 4.6 ± 0.4 | 4.70 ± 0.3 |
| Knee Osteoarthritis | DeepSeek | 4.5 ± 0.4 | 4.4 ± 0.5 | 4.6 ± 0.4 | 4.5 ± 0.4 | 4.4 ± 0.5 | 4.50 ± 0.4 |
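
This excerpt does not state how the Global Quality column was derived. The sketch below tests one plausible reading, namely that it roughly tracks the unweighted average of the five criterion means. The names `ROWS` and `max_gap` are illustrative, and averaging rounded means can only approximate the authors' calculation, so this is a consistency check and not a reconstruction of their method.

```python
# Mean scores copied from the table above:
# ([relevance, accuracy, clarity, completeness, CPG consistency], reported global quality)
ROWS = {
    ("Low Back Pain", "Students"): ([3.8, 3.7, 3.6, 3.5, 3.6], 3.65),
    ("Low Back Pain", "ChatGPT"): ([4.6, 4.5, 4.8, 4.7, 4.5], 4.65),
    ("Low Back Pain", "DeepSeek"): ([4.4, 4.3, 4.6, 4.5, 4.4], 4.45),
    ("Multiple Sclerosis", "Students"): ([3.6, 3.5, 3.4, 3.3, 3.5], 3.45),
    ("Multiple Sclerosis", "ChatGPT"): ([4.3, 4.2, 4.5, 4.4, 4.2], 4.35),
    ("Multiple Sclerosis", "DeepSeek"): ([4.7, 4.6, 4.8, 4.7, 4.6], 4.70),
    ("Frozen Shoulder", "Students"): ([4.0, 4.1, 3.8, 3.7, 4.0], 3.90),
    ("Frozen Shoulder", "ChatGPT"): ([4.4, 4.3, 4.6, 4.5, 4.3], 4.45),
    ("Frozen Shoulder", "DeepSeek"): ([4.3, 4.2, 4.5, 4.4, 4.2], 4.35),
    ("Knee Osteoarthritis", "Students"): ([3.7, 3.6, 3.5, 3.4, 3.6], 3.55),
    ("Knee Osteoarthritis", "ChatGPT"): ([4.7, 4.6, 4.8, 4.7, 4.6], 4.70),
    ("Knee Osteoarthritis", "DeepSeek"): ([4.5, 4.4, 4.6, 4.5, 4.4], 4.50),
}


def max_gap(rows):
    """Largest absolute gap between the criterion average and the reported global score."""
    return max(
        abs(sum(criteria) / len(criteria) - reported)
        for criteria, reported in rows.values()
    )


print(round(max_gap(ROWS), 2))  # 0.03
```

Every row agrees with the simple average to within 0.03 points, which is compatible with the assumption but does not confirm it.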

| Domain | Students | ChatGPT | DeepSeek |
|---|---|---|---|
| Low Back Pain | 3.7 ± 0.6 | 4.6 ± 0.4 | 4.4 ± 0.5 |
| Multiple Sclerosis | 3.4 ± 0.7 | 4.3 ± 0.5 | 4.7 ± 0.3 |
| Frozen Shoulder | 3.9 ± 0.5 | 4.4 ± 0.4 | 4.3 ± 0.5 |
| Knee Osteoarthritis | 3.6 ± 0.6 | 4.7 ± 0.3 | 4.5 ± 0.4 |
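
To put the score differences in the table above on a common scale, a standardized effect size can be estimated from the summary statistics alone. The sketch below is a rough illustration under two assumptions that this excerpt does not confirm: the ± values are standard deviations, and the groups being compared are of equal size (which gives the simple pooled SD used here). The names `SCORES` and `cohens_d` are illustrative.

```python
import math

# (mean, SD) per domain from the table above: (students, ChatGPT, DeepSeek).
# Treating the "±" values as standard deviations is an assumption.
SCORES = {
    "Low Back Pain": ((3.7, 0.6), (4.6, 0.4), (4.4, 0.5)),
    "Multiple Sclerosis": ((3.4, 0.7), (4.3, 0.5), (4.7, 0.3)),
    "Frozen Shoulder": ((3.9, 0.5), (4.4, 0.4), (4.3, 0.5)),
    "Knee Osteoarthritis": ((3.6, 0.6), (4.7, 0.3), (4.5, 0.4)),
}


def cohens_d(group_a, group_b):
    """Cohen's d of group_b over group_a, pooling SDs as if group sizes were equal."""
    (mean_a, sd_a), (mean_b, sd_b) = group_a, group_b
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_b - mean_a) / pooled_sd


for domain, (students, chatgpt, deepseek) in SCORES.items():
    print(
        f"{domain}: ChatGPT d={cohens_d(students, chatgpt):.2f}, "
        f"DeepSeek d={cohens_d(students, deepseek):.2f}"
    )
```

If the actual group sizes differ, the pooled SD should be weighted by each group's degrees of freedom instead.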

| Domain | Subcategory | p-Value (ANOVA/Kruskal–Wallis) | Post Hoc (Students vs. ChatGPT) | Post Hoc (Students vs. DeepSeek) |
|---|---|---|---|---|
| Low Back Pain | Basic Knowledge | <0.001 | <0.001 | <0.001 |
| Low Back Pain | Diagnosis | 0.002 | 0.003 | 0.005 |
| Low Back Pain | Alternative Treatments | <0.001 | <0.001 | <0.001 |
| Low Back Pain | Rehabilitation Practices | <0.001 | <0.001 | <0.001 |
| Multiple Sclerosis | Basic Knowledge | <0.001 | <0.001 | <0.001 |
| Multiple Sclerosis | Diagnosis | 0.001 | 0.002 | <0.001 |
| Multiple Sclerosis | Alternative Treatments | <0.001 | <0.001 | <0.001 |
| Multiple Sclerosis | Rehabilitation Practices | <0.001 | <0.001 | <0.001 |
| Frozen Shoulder | Basic Knowledge | 0.001 | 0.002 | 0.003 |
| Frozen Shoulder | Diagnosis | 0.12 | 0.15 | 0.18 |
| Frozen Shoulder | Alternative Treatments | <0.001 | <0.001 | <0.001 |
| Frozen Shoulder | Rehabilitation Practices | 0.002 | 0.003 | 0.004 |
| Knee Osteoarthritis | Basic Knowledge | <0.001 | <0.001 | <0.001 |
| Knee Osteoarthritis | Diagnosis | 0.003 | 0.004 | 0.006 |
| Knee Osteoarthritis | Alternative Treatments | <0.001 | <0.001 | <0.001 |
| Knee Osteoarthritis | Rehabilitation Practices | <0.001 | <0.001 | <0.001 |
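
The p-values above can be screened programmatically to see which subcategories fail to reach significance. This is a minimal sketch that assumes a conventional threshold of 0.05, which this excerpt does not state. Entries reported as "<0.001" are treated as an upper bound of 0.001, and the names `P_VALUES`, `upper_bound`, and `non_significant` are illustrative.

```python
ALPHA = 0.05  # conventional threshold; an assumption, not taken from the study

# (domain, subcategory): (omnibus p, post hoc vs. ChatGPT, post hoc vs. DeepSeek)
P_VALUES = {
    ("Low Back Pain", "Basic Knowledge"): ("<0.001", "<0.001", "<0.001"),
    ("Low Back Pain", "Diagnosis"): ("0.002", "0.003", "0.005"),
    ("Low Back Pain", "Alternative Treatments"): ("<0.001", "<0.001", "<0.001"),
    ("Low Back Pain", "Rehabilitation Practices"): ("<0.001", "<0.001", "<0.001"),
    ("Multiple Sclerosis", "Basic Knowledge"): ("<0.001", "<0.001", "<0.001"),
    ("Multiple Sclerosis", "Diagnosis"): ("0.001", "0.002", "<0.001"),
    ("Multiple Sclerosis", "Alternative Treatments"): ("<0.001", "<0.001", "<0.001"),
    ("Multiple Sclerosis", "Rehabilitation Practices"): ("<0.001", "<0.001", "<0.001"),
    ("Frozen Shoulder", "Basic Knowledge"): ("0.001", "0.002", "0.003"),
    ("Frozen Shoulder", "Diagnosis"): ("0.12", "0.15", "0.18"),
    ("Frozen Shoulder", "Alternative Treatments"): ("<0.001", "<0.001", "<0.001"),
    ("Frozen Shoulder", "Rehabilitation Practices"): ("0.002", "0.003", "0.004"),
    ("Knee Osteoarthritis", "Basic Knowledge"): ("<0.001", "<0.001", "<0.001"),
    ("Knee Osteoarthritis", "Diagnosis"): ("0.003", "0.004", "0.006"),
    ("Knee Osteoarthritis", "Alternative Treatments"): ("<0.001", "<0.001", "<0.001"),
    ("Knee Osteoarthritis", "Rehabilitation Practices"): ("<0.001", "<0.001", "<0.001"),
}


def upper_bound(p):
    """Convert a reported p-value such as '<0.001' or '0.12' to a numeric upper bound."""
    return float(p.lstrip("<"))


def non_significant(p_values, alpha=ALPHA):
    """List the (domain, subcategory) pairs whose omnibus p-value is not below alpha."""
    return [key for key, ps in p_values.items() if upper_bound(ps[0]) >= alpha]


print(non_significant(P_VALUES))  # [('Frozen Shoulder', 'Diagnosis')]
```

Under this threshold, Frozen Shoulder diagnosis is the only subcategory in which the three groups do not differ significantly.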

| Domain | Subcategory | Students | ChatGPT | DeepSeek |
|---|---|---|---|---|
| Low Back Pain | Basic Knowledge | 3.8 (0.6) | 4.6 (0.4) | 4.4 (0.5) |
| Low Back Pain | Diagnosis | 3.9 (0.6) | 4.5 (0.4) | 4.3 (0.5) |
| Low Back Pain | Alternative Treatments | 3.4 (0.8) | 4.6 (0.4) | 4.5 (0.5) |
| Low Back Pain | Rehabilitation Practices | 3.6 (0.7) | 4.7 (0.3) | 4.5 (0.4) |
| Multiple Sclerosis | Basic Knowledge | 3.6 (0.7) | 4.3 (0.5) | 4.7 (0.3) |
| Multiple Sclerosis | Diagnosis | 3.7 (0.7) | 4.2 (0.6) | 4.6 (0.4) |
| Multiple Sclerosis | Alternative Treatments | 3.3 (0.8) | 4.4 (0.5) | 4.7 (0.3) |
| Multiple Sclerosis | Rehabilitation Practices | 3.5 (0.8) | 4.4 (0.5) | 4.7 (0.3) |
| Frozen Shoulder | Basic Knowledge | 4.0 (0.5) | 4.4 (0.4) | 4.3 (0.5) |
| Frozen Shoulder | Diagnosis | 4.1 (0.6) | 4.3 (0.5) | 4.2 (0.5) |
| Frozen Shoulder | Alternative Treatments | 3.7 (0.6) | 4.5 (0.4) | 4.4 (0.5) |
| Frozen Shoulder | Rehabilitation Practices | 3.8 (0.6) | 4.5 (0.4) | 4.4 (0.5) |
| Knee Osteoarthritis | Basic Knowledge | 3.7 (0.6) | 4.7 (0.3) | 4.5 (0.4) |
| Knee Osteoarthritis | Diagnosis | 3.8 (0.7) | 4.6 (0.4) | 4.4 (0.5) |
| Knee Osteoarthritis | Alternative Treatments | 3.4 (0.7) | 4.7 (0.3) | 4.5 (0.4) |
| Knee Osteoarthritis | Rehabilitation Practices | 3.5 (0.7) | 4.7 (0.3) | 4.5 (0.4) |
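
The subcategory scores above can be reduced to a single gap per row, the higher-scoring LLM minus the students, to show where the difference is widest and narrowest. This is a minimal sketch using only the first number in each cell, assumed here to be the mean. The names `MEANS` and `llm_gap` are illustrative and not part of the study.

```python
# First value of each cell in the table above: (students, ChatGPT, DeepSeek).
MEANS = {
    ("Low Back Pain", "Basic Knowledge"): (3.8, 4.6, 4.4),
    ("Low Back Pain", "Diagnosis"): (3.9, 4.5, 4.3),
    ("Low Back Pain", "Alternative Treatments"): (3.4, 4.6, 4.5),
    ("Low Back Pain", "Rehabilitation Practices"): (3.6, 4.7, 4.5),
    ("Multiple Sclerosis", "Basic Knowledge"): (3.6, 4.3, 4.7),
    ("Multiple Sclerosis", "Diagnosis"): (3.7, 4.2, 4.6),
    ("Multiple Sclerosis", "Alternative Treatments"): (3.3, 4.4, 4.7),
    ("Multiple Sclerosis", "Rehabilitation Practices"): (3.5, 4.4, 4.7),
    ("Frozen Shoulder", "Basic Knowledge"): (4.0, 4.4, 4.3),
    ("Frozen Shoulder", "Diagnosis"): (4.1, 4.3, 4.2),
    ("Frozen Shoulder", "Alternative Treatments"): (3.7, 4.5, 4.4),
    ("Frozen Shoulder", "Rehabilitation Practices"): (3.8, 4.5, 4.4),
    ("Knee Osteoarthritis", "Basic Knowledge"): (3.7, 4.7, 4.5),
    ("Knee Osteoarthritis", "Diagnosis"): (3.8, 4.6, 4.4),
    ("Knee Osteoarthritis", "Alternative Treatments"): (3.4, 4.7, 4.5),
    ("Knee Osteoarthritis", "Rehabilitation Practices"): (3.5, 4.7, 4.5),
}


def llm_gap(scores):
    """Gap between the higher-scoring LLM and the students, rounded to one decimal."""
    students, chatgpt, deepseek = scores
    return round(max(chatgpt, deepseek) - students, 1)


gaps = {key: llm_gap(scores) for key, scores in MEANS.items()}
smallest = min(gaps, key=gaps.get)
largest = max(gaps, key=gaps.get)

print(smallest, gaps[smallest])  # ('Frozen Shoulder', 'Diagnosis') 0.2
print(largest, gaps[largest])    # ('Multiple Sclerosis', 'Alternative Treatments') 1.4
```

The narrowest gap falls in Frozen Shoulder diagnosis, the same subcategory whose reported p-value (0.12) is the only one above 0.05.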

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Michou, I.; Fouras, A.; Chrysanthakopoulou, D.; Theodoritsi, M.; Mariettou, S.; Stellatou, S.; Koutsojannis, C. AI-Powered Physiotherapy: Evaluating LLMs Against Students in Clinical Rehabilitation Scenarios. Appl. Sci. 2026, 16, 1165. https://doi.org/10.3390/app16031165

