Knee Osteoarthritis Severity Grading Using Contrastive Learning Image Pre-Training
Abstract
1. Introduction
2. Background
2.1. Deep Learning in Knee Osteoarthritis KL Severity Grading
2.2. Contrastive Language–Image Pre-Training (CLIP)
2.2.1. Architecture
2.2.2. Applications of CLIP in Medical Imaging
3. Methods
3.1. Datasets
3.2. Evaluation Metrics
3.3. CLIP Setup
3.4. Implementation and Fine-Tuning of the CLIP Model
3.5. Reproducibility and Training Environment
- Preprocessing and Augmentation: Input radiographs were resized to a fixed resolution of 224 × 224 pixels and normalized using the standard ImageNet statistics. Augmentations, applied only to the training folds, were random horizontal flips (p = 0.5) and random rotations of up to ±10°.
- Hyperparameters: Model weights were optimized with AdamW, using an initial learning rate of 1 × 10⁻⁴ and a weight decay of 1 × 10⁻⁵. Training ran for 30 epochs with a symmetric cross-entropy loss, a cosine annealing learning-rate scheduler, and a global batch size of 64.
- Hardware and Runtime: All scripts were executed on a high-performance computing node equipped with an NVIDIA GTX 1650 GPU (4 GB memory), running PyTorch 2.5.1 and Python 3.9. Training over all epochs took approximately 42 min. The global random seed was fixed to 42 to limit run-to-run stochastic variation.
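The schedule and loss named in the hyperparameter list can be illustrated with a minimal, dependency-free sketch. It rests on several assumptions that the text does not state: the scheduler is stepped once per epoch and decays to a minimum rate of zero, and the "symmetric cross-entropy" is the CLIP-style loss averaged over the image-to-text and text-to-image directions with matched pairs on the diagonal. The function names and toy logits are hypothetical and are not taken from the authors' code.

```python
import math

def cosine_annealed_lr(epoch, base_lr=1e-4, total_epochs=30, eta_min=0.0):
    """Closed-form cosine annealing: base_lr at epoch 0, eta_min at total_epochs.

    eta_min=0.0 and per-epoch stepping are assumptions, not stated in the text.
    """
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return eta_min + 0.5 * (base_lr - eta_min) * cos_term

def log_softmax(row):
    """Numerically stable log-softmax of a list of floats."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]

def symmetric_cross_entropy(logits):
    """CLIP-style loss for a square similarity matrix.

    logits[i][j] is the scaled similarity of image i and text j; the matched
    pair for image i is assumed to be text i (the diagonal).
    """
    n = len(logits)
    columns = [list(col) for col in zip(*logits)]
    image_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    text_to_image = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (image_to_text + text_to_image)

# Learning rate at the start of each of the 30 epochs.
schedule = [cosine_annealed_lr(epoch) for epoch in range(30)]

# Hypothetical 3 x 3 logit matrix with strong diagonal (well-matched pairs).
toy_logits = [[5.0, 1.0, 0.0], [0.5, 4.0, 1.0], [0.0, 1.5, 6.0]]
loss = symmetric_cross_entropy(toy_logits)
```

In a PyTorch training loop the same roles would normally be filled by `torch.optim.AdamW` and `torch.optim.lr_scheduler.CosineAnnealingLR`; the pure-Python version here only makes the arithmetic visible.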
4. Results
4.1. Fine-Tuned Model Evaluation
4.2. Comparative Analysis with Other Pre-Trained Models
4.3. Comparative Analysis with Other Related Works
4.4. Images to Queries Similarity Scores
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Hunter, H.; Ryan, M.S. Knee Osteoarthritis-Statpearls-NCBI Bookshelf. 4 August 2019. Available online: https://www.ncbi.nlm.nih.gov/books/NBK507884/ (accessed on 2 February 2023).
- Schiphof, D.; Boers, M.; Bierma-Zeinstra, S.M. Differences in descriptions of Kellgren and Lawrence grades of knee osteoarthritis. Ann. Rheum. Dis. 2008, 67, 1034–1036.
- Kellgren, J.H.; Lawrence, J.S. Radiological Assessment of Osteo-Arthrosis. Ann. Rheum. Dis. 1957, 16, 494–502.
- Chen, P.; Gao, L.; Shi, X.; Allen, K.; Yang, L. Fully automatic knee osteoarthritis severity grading using deep neural networks with a novel ordinal loss. Comput. Med. Imaging Graph. 2019, 75, 84–92.
- Roy, S.; Meena, T.; Lim, S.-J. Demystifying supervised learning in healthcare 4.0: A new reality of transforming diagnostic medicine. Diagnostics 2022, 12, 2549.
- Pi, S.W.; Lee, B.D.; Lee, M.S.; Lee, H.J. Ensemble deep-learning networks for automated osteoarthritis grading in knee X-ray images. Sci. Rep. 2023, 13, 22887.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Helwan, A.; Azar, D.; Abdellatef, H. An update on the knee osteoarthritis severity grading using wide residual learning. J. X-Ray Sci. Technol. 2022, 30, 1009–1021.
- Schmidt, J.E.; Amrami, K.K.; Manduca, A.; Kaufman, K.R. Semi-automated digital image analysis of joint space width in knee radiographs. Skelet. Radiol. 2005, 34, 639–643.
- Sekhri, A.; Kerkouri, M.A.; Chetouani, A.; Tliba, M.; Nasser, Y.; Jennane, R.; Bruno, A. Automatic diagnosis of knee osteoarthritis severity using Swin transformer. In Proceedings of the 20th International Conference on Content-Based Multimedia Indexing, Orleans, France, 20–22 September 2023; pp. 41–47.
- Wang, Y.; Wang, X.; Gao, T.; Du, L.; Liu, W. An automatic knee osteoarthritis diagnosis method based on deep learning: Data from the osteoarthritis initiative. J. Healthc. Eng. 2021, 2021, 5586529.
- Wang, Z.; Chetouani, A.; Jarraya, M.; Hans, D.; Jennane, R. Transformer with Selective Shuffled Position Embedding and key-patch exchange strategy for early detection of Knee Osteoarthritis. Expert Syst. Appl. 2024, 255, 124614.
- Jahan, M.; Hasan, M.Z.; Samia, I.J.; Fatema, K.; Rony, M.A.H.; Arefin, M.S.; Moustafa, A. KOA-CCTNet: An Enhanced Knee Osteoarthritis Grade Assessment Framework Using Modified Compact Convolutional Transformer Model. IEEE Access 2024, 12, 107719–107741.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Association for Computational Linguistics (ACL): Stroudsburg, PA, USA, 2019.
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2021; pp. 8748–8763.
- Antony, J.; McGuinness, K.; Moran, K.; O’Connor, N.E. Automatic detection of knee joints and quantification of knee osteoarthritis severity using convolutional neural networks. In Proceedings of the Machine Learning and Data Mining in Pattern Recognition: 13th International Conference, MLDM 2017, New York, NY, USA, 15–20 July 2017; Springer International Publishing: Cham, Switzerland, 2017; pp. 376–390.
- Tiulpin, A.; Thevenot, J.; Rahtu, E.; Lehenkari, P.; Saarakkala, S. Automatic knee osteoarthritis diagnosis from plain radiographs: A deep learning-based approach. Sci. Rep. 2018, 8, 1727.
- Alshamrani, H.A.; Rashid, M.; Alshamrani, S.S.; Alshehri, A.H. Osteo-net: An automated system for predicting knee osteoarthritis from x-ray images using transfer-learning-based neural networks approach. Healthcare 2023, 11, 1206.
- Wahyuningrum, R.T.; Anifah, L.; Purnama, I.K.E.; Purnomo, M.H. A new approach to classify knee osteoarthritis severity from radiographic images based on CNN-LSTM method. In Proceedings of the 2019 IEEE 10th International Conference on Awareness Science and Technology (iCAST); IEEE: New York, NY, USA, 2019; pp. 1–6.
- Rani, S.; Memoria, M.; Choudhury, T.; Sar, A. A Comprehensive Review of Machine Learning’s Role within KOA. EAI Endorsed Trans. Internet Things 2024, 10, 1.
- Zhang, C.; Chen, S.; Cigdem, O.; Rajamohan, H.R.; Cho, K.; Kijowski, R.; Deniz, C.M. MR-Transformer: Vision Transformer for Total Knee Replacement Prediction Using Magnetic Resonance Imaging. arXiv 2024, arXiv:2405.02784.
- Blumer, A.; Ehrenfeucht, A.; Haussler, D.; Warmuth, M.K. Occam’s razor. Inf. Process. Lett. 1987, 24, 377–380.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2016; pp. 770–778.
- Eslami, S.; Meinel, C.; De Melo, G. Pubmedclip: How much does clip benefit visual question answering in the medical domain? In Proceedings of the Findings of the Association for Computational Linguistics; European Chapter of the Association for Computational Linguistics: Rabat, Morocco, 2023; pp. 1151–1163.
- Holste, G.; Zhou, Y.; Wang, S.; Jaiswal, A.; Lin, M.; Zhuge, S.; Yang, Y.; Kim, D.; Nguyen-Mau, T.H.; Tran, M.T.; et al. Towards long-tailed, multi-label disease classification from chest x-ray: Overview of the cxr-lt challenge. Med. Image Anal. 2024, 97, 103224.
- Wang, Z.; Wu, Z.; Agarwal, D.; Sun, J. MedCLIP: Contrastive learning from unpaired medical images and text. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics: Abu Dhabi, United Arab Emirates, 2022; pp. 3876–3887.
- Zhao, Z.; Liu, Y.; Wu, H.; Li, Y.; Wang, S.; Teng, L.; Liu, D.; Cui, Z.; Wang, Q.; Shen, D. Clip in medical imaging: A comprehensive survey. arXiv 2023, arXiv:2312.07353.
- Martín Pérez, I.M.; Bourhim, S.; Martín Pérez, S.E. Artificial Intelligence-Based Models for Automated Bone Age Assessment from Posteroanterior Wrist X-Rays: A Systematic Review. Appl. Sci. 2025, 15, 5978.
- Luo, H.; Bao, J.; Wu, Y.; He, X.; Li, T. Segclip: Patch aggregation with learnable centers for open-vocabulary semantic segmentation. In Proceedings of the International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2023; pp. 23033–23044.
- Tanida, T.; Müller, P.; Kaissis, G.; Rueckert, D. Interactive and explainable region-guided radiology report generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2023; pp. 7433–7442.
- Müller, P.; Meissen, F.; Brandt, J.; Kaissis, G.; Rueckert, D. Anatomy-driven pathology detection on chest x-rays. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Cham, Switzerland, 2023; pp. 57–66.
- Mehta, S.; Mercan, E.; Bartlett, J.; Weaver, D.; Elmore, J.G.; Shapiro, L. Y-net: Joint segmentation and classification for diagnosis of breast biopsy images. In Proceedings of the Medical Image Computing and Computer Assisted Intervention MICCAI 2018: 21st International Conference, Granada, Spain, 16–20 September 2018, Proceedings, Part II 11; Springer: Cham, Switzerland, 2018; pp. 893–901.
- Rahimi, S.; Oktay, O.; Alvarez-Valle, J.; Bharadwaj, S. Addressing the exorbitant cost of labeling medical images with active learning. In Proceedings of the International Conference on Machine Learning in Medical Imaging and Analysis, Strasbourg, France, 27 September 2021; p. 1.
- Qu, C.; Zhang, T.; Qiao, H.; Liu, J.; Tang, Y.; Yuille, A.; Zhou, Z. Abdomenatlas-8k: Annotating 8000 ct volumes for multi-organ segmentation in three weeks. In Proceedings of the Thirty-Seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, New Orleans, LA, USA, 10–16 December 2023.
- Gornale, S.; Patravali, P. Digital Knee X-Ray Images. In Mendeley Data; V1; Elsevier: Amsterdam, The Netherlands, 2020.
- Ma’aitah, M.K.S.; Helwan, A.; Radwan, A.; Mohammad Salem Manasreh, A.; Alshareef, E.A. Multimodal model for knee osteoarthritis KL grading from plain radiograph. J. X-Ray Sci. Technol. 2025, 33, 608–620.
- Howard, A.G. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.








| Grade | Textual Description |
|---|---|
| 1 | Grade 1 (none): definite absence of X-ray changes of osteoarthritis |
| 2 | Grade 2 (doubtful): doubtful joint space narrowing and possible osteophytic lipping |
| 3 | Grade 3 (mild): definite osteophytes and possible joint space narrowing |
| 4 | Grade 4 (moderate): moderate multiple osteophytes, definite narrowing of joint space, some sclerosis, and possible deformity of bone ends |
| 5 | Grade 5 (severe): large osteophytes, marked narrowing of joint space, severe sclerosis, and definite deformity of bone ends |
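The image-to-query similarity scoring that these textual descriptions support can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the embeddings are made-up 3-dimensional vectors standing in for the outputs of CLIP's image and text encoders, `logit_scale=100.0` is an assumed scaling factor, and `grade_probabilities` is a hypothetical helper name.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def grade_probabilities(image_emb, text_embs, logit_scale=100.0):
    """Score one image embedding against one text embedding per grade query.

    logit_scale is an assumed value; it sharpens the softmax over similarities.
    """
    sims = [cosine_similarity(image_emb, t) for t in text_embs]
    return softmax([logit_scale * s for s in sims])

# Hypothetical embeddings: one per textual description, in table order.
text_embs = [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.8, 0.6],
    [0.0, 0.0, 1.0],
]
image_emb = [0.1, 0.9, 0.2]  # hypothetical radiograph embedding

probs = grade_probabilities(image_emb, text_embs)
# Table rows are labelled 1 to 5, so shift the zero-based index by one.
predicted_row = max(range(len(probs)), key=probs.__getitem__) + 1
```

The predicted grade is simply the query whose embedding is most similar to the image embedding; the softmax only turns the similarities into scores that sum to one.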
| Metric | Train (%) | Test (%) |
|---|---|---|
| Accuracy | 96.91 | 76.94 |
| Macro Precision | 97.10 | 79.57 |
| Macro Sensitivity | 96.97 | 76.85 |
| Macro F1-score | 97.02 | 76.66 |
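For readers who want to see how the four metrics in this table relate, the sketch below computes them from a multi-class confusion matrix. It assumes macro F1 is the unweighted mean of the per-class F1 scores; the reported test F1 (76.66) is lower than the harmonic mean of the reported macro precision and sensitivity (about 78.2), which is consistent with that convention but is not confirmed by the text. The 3 × 3 matrix is invented for illustration and is unrelated to the study data.

```python
def macro_metrics(confusion):
    """Accuracy and macro-averaged precision, sensitivity, and F1.

    confusion[t][p] is the number of samples of true class t predicted as p.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(k))
    precisions, sensitivities, f1_scores = [], [], []
    for c in range(k):
        tp = confusion[c][c]
        predicted_c = sum(confusion[t][c] for t in range(k))  # column sum
        actual_c = sum(confusion[c])                          # row sum
        precision = tp / predicted_c if predicted_c else 0.0
        sensitivity = tp / actual_c if actual_c else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        precisions.append(precision)
        sensitivities.append(sensitivity)
        f1_scores.append(f1)
    return {
        "accuracy": correct / total,
        "macro_precision": sum(precisions) / k,
        "macro_sensitivity": sum(sensitivities) / k,
        "macro_f1": sum(f1_scores) / k,  # mean of per-class F1 (assumed)
    }

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
toy_confusion = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
metrics = macro_metrics(toy_confusion)
```

Because macro averaging weights every grade equally, a grade with few test images influences these scores as much as a common one, which is why they can differ noticeably from plain accuracy.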
| KL Grade | Sensitivity (95% CI) | Specificity | AUC |
|---|---|---|---|
| Grade 0 (Normal) | 81.2% (78.4–83.9%) | 92.4% | 0.91 |
| Grade 1 (Doubtful) | 68.5% (65.1–71.8%) | 87.1% | 0.83 |
| Grade 2 (Mild) | 72.1% (69.3–74.8%) | 89.4% | 0.86 |
| Grade 3 (Moderate) | 79.4% (76.2–82.5%) | 93.1% | 0.90 |
| Grade 4 (Severe) | 83.1% (80.5–85.6%) | 96.8% | 0.94 |
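As a consistency check, the unweighted mean of the five per-grade sensitivities in this table is 76.86%, which agrees with the reported test macro sensitivity of 76.85% up to rounding. The sketch below reproduces that arithmetic and also shows one common way to attach a 95% interval to a proportion, the Wilson score interval. The text does not say which interval method or per-grade sample sizes were used, so `wilson_interval` and the counts passed to it are purely illustrative.

```python
import math

# Per-grade sensitivities (%) copied from the table.
per_grade_sensitivity = {0: 81.2, 1: 68.5, 2: 72.1, 3: 79.4, 4: 83.1}
macro_sensitivity = sum(per_grade_sensitivity.values()) / len(per_grade_sensitivity)

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for 95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width

# Hypothetical counts: 812 correctly identified out of 1000 images of one grade.
low, high = wilson_interval(812, 1000)
```

The interval narrows as the number of test images per grade grows, so grades with fewer examples would be expected to show wider intervals than those reported for well-represented grades.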
| Model | Accuracy |
|---|---|
| VGG19 | 63% |
| ResNet50 | 68.3% |
| MobileNet | 67.2% |
| ViT | 73.7% |
| Fine-tuned CLIP | 76.94% |
| Authors | Train Dataset | Test Dataset | Methods | Accuracy (%) |
|---|---|---|---|---|
| Antony et al. [17] | OAI | Internal OAI Split | Pre-trained CNN | 63.66 |
| Helwan et al. [9] | OAI | Internal OAI Split | Wide residual network (WRN) | 72 |
| Tiulpin et al. [18] | OAI | Internal OAI Split | Siamese CNN | 66.71 |
| Pi et al. [6] | OAI | Internal OAI Split | Ensemble deep learning | 76.93 |
| Chen et al. [4] | OAI | Internal OAI Split | CNN | 70.4 |
| Ours | OAI (train) | Dataset 2 (External) | Fine-tuned CLIP model | 76.94 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Bashir, S.A.; Altarhouni, R.S.; Milad, M.B.; Abuhtna, F.A.; Wafi, M.M.; Elbahri, E.A.; Alshareef, E.A.; Ma’aitah, M.K.S.; Alsariera, E.; Toigozhinova, A. Knee Osteoarthritis Severity Grading Using Contrastive Learning Image Pre-Training. J. Pers. Med. 2026, 16, 314. https://doi.org/10.3390/jpm16060314






