Semantic Evaluation of Nursing Assessment Scales Translations by ChatGPT 4.0: A Lexicometric Analysis
Abstract
1. Introduction
2. Materials and Methods
2.1. Database Building
2.2. Lexicometric Analysis
2.3. Semantic Subgroup Analysis
2.4. Advanced Embedding-Based Consistency Metrics
3. Results
3.1. Predictive Performance
3.2. Subgroup Analysis
3.3. Sentence Length
3.4. Presence of Negations
3.5. Presence of Intensity
4. Discussion
5. Limitations
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Public Involvement Statement
Guidelines and Standards Statement
Use of Artificial Intelligence
Conflicts of Interest
Name | Prompt |
---|---|
York (York University) | Translate the following document from English to Italian, ensuring that the translation maintains the original meaning and context. Pay special attention to technical terms and industry-specific jargon to ensure accuracy and consistency. |
Maastricht (Maastricht University) | Above, you see a text in English language. Please translate it to Italian language. Do not print the original text, just the translation. Follow the following instructions: Ensure the translation accurately reflects the original text’s meaning. The translation should have correct grammar, including proper sentence structure, verb conjugation, punctuation, and the correct use of articles. The translation should read naturally and fluently as if originally written in the target language. Avoid awkward phrasing or literal translations that sound unnatural. Pay special attention to proper nouns and specific terms. Names of people, places, organizations, and other terms that should not be translated must be handled with care to maintain their original meaning and recognition. Ensure that the translation maintains the original text’s tone and style. |
Metric | York | Maastricht | p-Value |
---|---|---|---|
SBERT M(SD) | 0.814 (0.137) | 0.819 (0.137) | 0.026 |
Jaccard M(SD) | 0.413 (0.238) | 0.430 (0.247) | <0.001 |
TF-IDF M(SD) | 0.478 (0.262) | 0.495 (0.269) | <0.001 |
Overlap M(SD) | 0.559 (0.246) | 0.574 (0.246) | <0.001 |
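The lexical metrics in the table above (Jaccard, Overlap, TF-IDF) can be illustrated with a minimal sketch. It assumes simple whitespace tokenization and a smoothed IDF fitted on the two sentences only. The helper names (`tokenize`, `jaccard`, `overlap`, `tfidf_cosine`) and the two Italian sentences are invented for illustration and do not come from the study, whose actual preprocessing and implementation may differ.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    stripped = (tok.strip(".,;:!?()\"'") for tok in text.lower().split())
    return [tok for tok in stripped if tok]

def jaccard(a, b):
    """Jaccard index on token sets: |A ∩ B| / |A ∪ B|."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def overlap(a, b):
    """Overlap coefficient on token sets: |A ∩ B| / min(|A|, |B|)."""
    sa, sb = set(a), set(b)
    smaller = min(len(sa), len(sb))
    return len(sa & sb) / smaller if smaller else 0.0

def tfidf_cosine(a, b):
    """Cosine similarity of TF-IDF vectors built from the two texts alone."""
    counts = [Counter(a), Counter(b)]
    vocab = sorted(set(a) | set(b))
    # Assumed weighting: smoothed IDF, ln((1 + N) / (1 + df)) + 1 with N = 2.
    idf = {t: math.log(3 / (1 + sum(t in c for c in counts))) + 1 for t in vocab}
    va = [counts[0][t] * idf[t] for t in vocab]
    vb = [counts[1][t] * idf[t] for t in vocab]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0

# Hypothetical sentence pair, not taken from the study data.
reference = tokenize("Il paziente non è in grado di camminare.")
candidate = tokenize("Il paziente non riesce a camminare.")

print(round(jaccard(reference, candidate), 3))   # 0.4
print(round(overlap(reference, candidate), 3))   # 0.667
print(round(tfidf_cosine(reference, candidate), 3))
```

The two sentences share four of ten distinct tokens, so the Jaccard index is 0.4. The overlap coefficient is higher (4/6) because it normalizes by the shorter sentence, which matches the ordering of the Jaccard and Overlap means in the table.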
Metric | York | Maastricht |
---|---|---|
Sensitivity | 0.93 | 0.932 |
Specificity | 1 | 1 |
Precision | 1 | 1 |
Accuracy | 0.96 | 0.964 |
F1 Score | 0.96 | 0.965 |
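As a reading aid for the performance table above, the sketch below shows how these five metrics follow from a binary confusion matrix. The counts passed to `classification_metrics` are toy values, not the study's data, and both function names are hypothetical. The final lines only check that the reported F1 scores are consistent with the reported precision and sensitivity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one positive,
    one negative, and one predicted positive).
    """
    sensitivity = tp / (tp + fn)          # recall, true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)            # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }

def f1_from(precision, sensitivity):
    """F1 as the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)

# Toy counts chosen for illustration only.
toy = classification_metrics(tp=40, fp=5, tn=45, fn=10)
print({name: round(value, 3) for name, value in toy.items()})

# Consistency check against the table: precision 1, sensitivity 0.93 / 0.932.
print(round(f1_from(1.0, 0.93), 2))    # 0.96  (York)
print(round(f1_from(1.0, 0.932), 3))   # 0.965 (Maastricht)
```

With precision and specificity both equal to 1, there are no false positives, so every misclassification in the table is a false negative.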
Group | SBERT (York) | SBERT (Maastricht) | SBERT > 85% (York) | SBERT > 85% (Maastricht) | Cohen’s d | Student t | p-Value |
---|---|---|---|---|---|---|---|
Length | | | | | | | |
Short (n = 539) | 0.799 | 0.805 | 0.443 | 0.456 | −0.110 | −2.546 | 0.011 |
Medium (n = 205) | 0.855 | 0.853 | 0.639 | 0.600 | 0.060 | 0.861 | 0.390 |
Long (n = 28) | 0.820 | 0.827 | 0.357 | 0.429 | −0.160 | −0.844 | 0.406 |
Negations | | | | | | | |
True (n = 37) | 0.853 | 0.856 | 0.676 | 0.595 | −0.070 | −0.425 | 0.674 |
False (n = 735) | 0.812 | 0.817 | 0.483 | 0.488 | −0.080 | −2.181 | 0.030 |
Intensity (false) (n = 772) | 0.814 | 0.819 | 0.492 | 0.494 | −0.080 | −2.223 | 0.027 |
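The effect sizes and t statistics in the subgroup table above can be related through a short sketch. It assumes a paired design, in which each item is translated under both prompts and the effect size is the mean difference divided by the standard deviation of the differences, so that t = d·√n. That assumption is not stated in the table itself, but the reported values are numerically consistent with it. The function name `paired_effect` and the toy scores are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_effect(x, y):
    """Cohen's d for paired samples and the matching paired t statistic.

    d = mean(diff) / sd(diff), t = d * sqrt(n), with diff = x - y.
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    d = mean(diffs) / stdev(diffs)
    return d, d * math.sqrt(n)

# Toy similarity scores for four items (illustration only).
york = [0.80, 0.85, 0.90, 0.70]
maastricht = [0.82, 0.84, 0.93, 0.74]
d, t = paired_effect(york, maastricht)
print(round(d, 3), round(t, 3))

# (n, Cohen's d, Student t) as reported in the subgroup table.
reported = [
    (539, -0.110, -2.546),
    (205, 0.060, 0.861),
    (28, -0.160, -0.844),
    (37, -0.070, -0.425),
    (735, -0.080, -2.181),
    (772, -0.080, -2.223),
]
for n, d_rep, t_rep in reported:
    # Under the paired assumption, d * sqrt(n) should approximate t.
    print(n, round(d_rep * math.sqrt(n), 3), t_rep)
```

The small gaps between d·√n and the reported t values are what would be expected from rounding d to two or three decimals. Negative values correspond to higher scores under the Maastricht prompt.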
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Parozzi, M.; Bozzetti, M.; Lo Cascio, A.; Napolitano, D.; Pendoni, R.; Marcomini, I.; Sblendorio, E.; Cangelosi, G.; Mancin, S.; Bonacaro, A. Semantic Evaluation of Nursing Assessment Scales Translations by ChatGPT 4.0: A Lexicometric Analysis. Nurs. Rep. 2025, 15, 211. https://doi.org/10.3390/nursrep15060211