MedKnee: A New Deep Learning-Based Software for Automated Prediction of Radiographic Knee Osteoarthritis

In computer-aided medical diagnosis, deep learning techniques have been shown to offer performance similar to that of experienced medical specialists in the diagnosis of knee osteoarthritis. In this study, a new deep learning (DL) software, called “MedKnee”, is developed to assist physicians in the diagnosis of knee osteoarthritis according to the Kellgren and Lawrence (KL) score. To accomplish this task, 5000 knee X-ray images obtained from the Osteoarthritis Initiative (OAI) public dataset were divided into training, validation, and test sets in a ratio of 7:1:2 with a balanced distribution across each KL grade. The pre-trained Xception model is used for transfer learning and then deployed in a Graphical User Interface (GUI) developed with Tkinter and Python. The software was validated on an external public database, Medical Expert, and compared with a rheumatologist’s diagnosis on a local database, with the involvement of a radiologist for arbitration. MedKnee achieved an accuracy of 95.36% when tested on Medical Expert-I and 94.94% on Medical Expert-II. In the local dataset, the developed tool and the rheumatologist agreed on 23 images out of 30 images (74%). MedKnee’s satisfactory performance makes it an effective assistant for doctors in the assessment of knee osteoarthritis.


Introduction
Knee osteoarthritis (OA) is one of the most common chronic diseases and occurs when the cartilage in the knee joint breaks down. The resulting friction between the bones of the knee joint can cause discomfort, stiffness, or swelling in the knees. If left untreated, severe osteoarthritis of the knee can result in partial or complete disability. Although there is no known cure for knee osteoarthritis, if a timely and accurate diagnosis is made there are therapies that can lessen symptoms and halt the disease's progression. Knee OA affects mainly women, the elderly, and the obese. By 2050, 130 million people worldwide will suffer from knee osteoarthritis [1].
In recent years, Artificial Intelligence (AI) has attracted considerable interest in medical imaging [2][3][4][5][6][7]. Particularly, AI was utilized in the diagnosis of knee osteoarthritis based on validated using 5148 X-ray images obtained from the OAI dataset. The LSTM combined with VGG16 achieved the highest accuracy of 75.28%.
Both Chen et al. [15] and Wani and Saini [16] proposed a novel adjustable ordinal loss instead of the cross-entropy loss in the detection of knee OA using VGG, ResNet, DenseNet, and InceptionV3. To develop and compare the proposed models, 8260 knee X-ray images were used in [15] and 1656 X-ray images were used in [16], all collected from the OAI dataset. The VGG19 model with the proposed ordinal loss obtained the highest knee severity grading accuracy, 70.4% in [15] and 96.7% in [16]. The ordinal loss function-based approach was also used by Jain et al. [17] to develop an automated method of detecting knee osteoarthritis from X-ray images, based on the High-Resolution Network (HRNet). The developed model was combined with a convolutional block attention module (CBAM). HRNet is a multi-resolution deep CNN consisting of a 2D convolution layer followed by layers that add up the high-to-low resolution streams and then merge the multi-resolution streams in parallel for information exchange. The model was built on 8260 knee X-ray images from the OAI dataset. The method achieved an accuracy of 71.74% and a mean absolute error (MAE) of 0.311.
Yunus et al. [18] applied an approach based on Darknet-53 and AlexNet combined with local binary patterns (LBP) to extract deep features and identify knee OA severity from radiological images. The final classification was performed with a support vector machine (SVM) and K-nearest neighbors (KNN). Then the classified images were localized using a combination of YOLOv2 and an Open Neural Network Exchange (ONNX) model built in 24 layers, comprising (i) an input layer, (ii) two element-wise affine layers, (iii) four convolutional layers, (iv) four batch normalization (BN) layers, (v) three max-pooling layers, and (vi) four activation layers, while YOLOv2 was built using three convolutional layers, two BN layers, and two ReLU layers. This approach was developed using 3795 X-ray images from the OAI public dataset. The model achieved an accuracy of 90.6%.
Hu et al. [19] developed a novel deep learning architecture, Adversarial Evolving Neural Network (A-ENN), for the longitudinal progression of knee OA severity over 4 years. The deep learning model was built with ResNet-18 and three classifiers: VGG19, ResNet50, and a vision transformer model (ViT). The proposed model was trained and tested on 3294 labeled knee X-ray images belonging to the OAI dataset. The model combined with VGG19 achieved the best accuracy of 64.6%, 63.9%, 63.2%, 61.8%, and 60.2% for progression at baseline, 12 months, 24 months, 36 months, and 48 months, respectively.
Raisuddin et al. [20] proposed and evaluated Deep Active Learning (DAL) designed to classify knee OA severity. The proposed model was built with a Semi-Supervised Learning (SSL) deep Siamese network using VGG and a Consistency Regularization (CR) approach, which ensures the model's stability against input noise. This model was trained and validated using 8953 knee X-ray images from the OAI dataset. The developed DAL achieved a balanced accuracy of 64.13%.
Huu et al. [21] applied the transfer learning of VGG16 for the automated binary classification of KOA severity using a deep Siamese convolutional neural network. The proposed model consists of six convolutional layers with a stride of 1, three convolutional layers with a stride of 2, three dropout layers, a Separable Adaptive Max-pooling (SAM) layer, and a fully connected layer. The proposed model was built using 2874 X-ray images collected from the OAI dataset. The updated VGG16 model achieved an accuracy of 89%.
Yifan et al. [22] presented a knee OA classifier using the transfer learning of ResNet34 and DenseNet121 combined with a novel learning scheme that splits data into two categories based on reliability. The two models were developed using 8302 X-ray images from the OAI dataset and a hybrid loss function to handle the lower-reliability sets. DenseNet121 and ResNet34 achieved accuracies of 70.13% and 68.32%, respectively.
More recently, Alshamrani et al. [23] proposed transfer learning models based on sequential CNNs, VGG16, and ResNet-50 to identify normal and abnormal knees from X-ray images. The proposed models were trained using 3836 X-ray images collected from Kaggle. The best model was VGG16, which achieved a training accuracy of 99% and a testing accuracy of 92%.
Mohammed et al. [24] suggested a binary classification and a multiclass classification for the severity of KOA from radiographic images. This approach was built using six pre-trained DNN models: ResNet101, MobileNetV2, VGG16, VGG19, InceptionResNetV2, and DenseNet121. The designed models were trained and tested on 9786 knee images taken from OAI. The best-performing model was the pre-trained ResNet101 for three classes and five classes, with an accuracy of 89% and 69%, respectively.

Materials and Methods
Although radiography is fast, inexpensive, non-invasive, and easy to use, the quality of the radiographic image requires several treatments to improve contrast and brightness and remove noise. Therefore, before selecting and improving the classification network, image processing was carried out to enhance the identification of knee osteoarthritis severity based on KL grade.

Dataset Description
OAI is a longitudinal observational study conducted by the US National Institutes of Health (NIH) in men and women over ten years. It consists of 4446 X-ray images of knees labeled by Boston University according to the Kellgren and Lawrence (KL) scoring system. In this study, the knee X-ray images used to train the proposed model belong to the OAI dataset available on Mendeley Data [26]. The images are unilateral posteroanterior (PA) fixed-flexion views of uniform size and identical resolution. The raw images are bilateral PA fixed-flexion views with varied resolutions and sizes, which were pre-processed before being made available for download on Mendeley. The dataset used is labeled Healthy, Doubtful, Minimal, Moderate, and Severe, with classes of equal size to avoid the problem of an unbalanced dataset. The dataset was augmented to 5000 images and then divided into a training set (70%), a validation set (10%), and a testing set (20%).
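The balanced 7:1:2 split described above can be illustrated with a short sketch. This is not the authors' code: the file names, the `stratified_split` helper, and the assumption of 1000 images per grade (5000 images over five equally sized classes) are hypothetical and serve only to show how a per-class split keeps every subset balanced.

```python
import random
from collections import defaultdict

KL_LABELS = ["Healthy", "Doubtful", "Minimal", "Moderate", "Severe"]


def stratified_split(samples, ratios=(0.7, 0.1, 0.2), seed=0):
    """Split (path, label) pairs into train/valid/test while preserving class balance."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    train, valid, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)  # shuffle within each class before cutting
        n_train = round(len(items) * ratios[0])
        n_valid = round(len(items) * ratios[1])
        train += items[:n_train]
        valid += items[n_train:n_train + n_valid]
        test += items[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test


# Hypothetical file names: 1000 images per grade, 5000 in total (assumption).
samples = [(f"{label}_{i:04d}.png", label) for label in KL_LABELS for i in range(1000)]
train, valid, test = stratified_split(samples)
test_per_class = {label: sum(1 for _, l in test if l == label) for label in KL_LABELS}
print(len(train), len(valid), len(test))
```

Under these assumptions each grade contributes the same number of images to every subset, which is consistent with a test set containing an equal number of images per KL grade.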

Preprocessing
Histogram equalization is applied in image processing to spread and evenly distribute pixel values. This allows the image quality to be improved by increasing the dispersion of the highest frequency and decreasing the dispersion of other frequencies, allowing the low contrast of the source images to be improved [27]. In this work, we have applied the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm used in [21,25]. Then, to improve the performance of the osteoarthritis detection system, it was necessary to use images with variable contrast, brightness, and positions, as they can be captured by the camera used in our work. The original images were resized to 224 × 224, flipped horizontally and vertically, and rotated left and right. Then, the image intensity was normalized between 0 and 1, and the brightness was modified in the range of (0.1, 0.7). The preprocessing pipeline is shown in Figure 1.
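To make the idea of spreading pixel values concrete, the following is a minimal sketch of plain global histogram equalization on a tiny gray-level image, written with the standard library only. It is an illustration, not the CLAHE procedure used in this work: CLAHE additionally works on local tiles and clips each tile's histogram, and is usually taken from a library (for example OpenCV's `cv2.createCLAHE`). The function name and the sample image are assumptions.

```python
def equalize_histogram(image, levels=256):
    """Global histogram equalization for a 2D list of integer gray levels."""
    flat = [p for row in image for p in row]

    # Histogram of gray levels.
    hist = [0] * levels
    for p in flat:
        hist[p] += 1

    # Cumulative distribution function (CDF) of the gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:  # constant image: there is nothing to spread
        return [row[:] for row in image]

    # Map each gray level so that the CDF becomes approximately linear.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[p] for p in row] for row in image]


# Hypothetical low-contrast patch: all values lie between 100 and 104.
low_contrast = [[100, 101, 102], [101, 102, 103], [102, 103, 104]]
equalized = equalize_histogram(low_contrast)
print(equalized)
```

In this example the narrow input range is stretched over the full 0 to 255 range, which is the contrast improvement the paragraph above refers to.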

Network Architecture
Xception, also known as an advanced variant of the Inception architecture, is a deep convolutional neural network based entirely on depthwise separable convolution layers. It was developed in 2017 by the creator of the Keras library, François Chollet [28]. This model moderately surpassed Inception V3 on the ImageNet dataset in 2017. The pre-trained version of the network was trained on more than a million images from the ImageNet database. The pre-trained Xception model was selected in this work because it consumes few resources while maintaining acceptable accuracy, and its architecture is very easy to define and modify, making it a prime candidate for medical tasks [29]. It has been utilized in various medical tasks during the past two years, such as the assessment of benign and malignant gastric ulcer lesions based on gastrointestinal endoscopic images [30], the detection of COVID-19 from radiographic images [31], and the detection of knee osteoarthritis [32].
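The resource argument for depthwise separable convolutions can be illustrated by counting weights. The sketch below compares a standard convolution with a depthwise separable one (a per-channel spatial filter followed by a 1 × 1 pointwise convolution). Biases are ignored, and the kernel size and channel counts are illustrative assumptions rather than values taken from a specific Xception layer.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution (biases ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer size (assumption): 3 x 3 kernel, 128 -> 256 channels.
k, c_in, c_out = 3, 128, 256
standard = conv_params(k, c_in, c_out)
separable = separable_conv_params(k, c_in, c_out)
print(standard, separable, round(standard / separable, 1))
```

For this layer size the separable form needs roughly one ninth of the weights, which is why such networks can remain comparatively light.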

The Xception architecture is easy to define and modify. As shown in Figure 2, it contains 2 convolutional layers, 34 separable convolution layers, 4 max-pooling layers, and a global average pooling layer. Convolutional and separable convolution layers are followed by batch normalization and ReLU layers. The 36 convolutional layers used for feature extraction are arranged in 14 blocks, all of which are surrounded by linear residual connections, except for the first and last blocks. The convolutional base is followed by a logistic regression layer. In this study, the head of the pre-trained model was replaced with a global average pooling layer, a batch normalization layer, a dropout layer with a rate of 0.2, and a fully connected layer with a softmax activation function. The model was trained using categorical cross-entropy with five outputs corresponding to the Kellgren and Lawrence (KL) grading scale. The updated network was trained for 200 epochs using a batch size of 16 and an Adam optimizer with an initial learning rate of 0.0001. The best model was saved in an ".h5" file. The training hyperparameters are listed in Table 2.
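The role of the softmax output and the categorical cross-entropy loss for the five KL grades can be sketched as follows. This is a standard-library illustration of the two formulas only, not the training code of this study; the logits are hypothetical values chosen for the example.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, true_index):
    """Loss for one sample whose one-hot label is at true_index."""
    return -math.log(probs[true_index])


# Hypothetical outputs of the final dense layer for one knee image (KL0..KL4).
logits = [0.2, 0.1, 2.5, 0.3, -1.0]
probs = softmax(logits)
predicted_grade = max(range(len(probs)), key=probs.__getitem__)
loss = categorical_cross_entropy(probs, true_index=2)
print(predicted_grade, round(loss, 3))
```

The loss is small when the probability assigned to the true grade is close to 1 and grows as that probability decreases, which is what the optimizer minimizes during training.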

Software Requirements and Libraries
To carry out this work, a machine running the Windows 10 operating system with an i7 processor, 16 GB of memory, and an Nvidia Quadro M2200 5.2 graphics processing unit (GPU) was used. Python 3.9.16 (Anaconda3), TensorFlow-GPU 2.10.0, CUDA (Compute Unified Device Architecture) 11.2, and cuDNN 8.1.33 were used to implement the proposed method. Spyder 5 was used in these experiments as the development environment.

Experiment Results and Model Deployment
The saved model was deployed in a graphical user interface (GUI) using a desktop application. Then, to validate our study, the model was tested on the OAI labeled dataset and two other external datasets, including a local dataset.

Result on OAI Dataset
The proposed automatic system's performance was assessed through the use of the F1-score, precision, recall, and confusion matrix. The trained model was tested on 1000 images belonging to the OAI dataset, divided into 200 images for each KL grade. The use of batch normalization and dropout layers enabled us to avoid overfitting. The model achieved a validation accuracy of 99.39% and a test accuracy of 97.20%.
The test dataset consisted of 1000 knees, 800 knees with OA and 200 knees without OA. As illustrated in Figure 3, the confusion matrix of the Xception model indicates that KL3 was classified without error, followed by KL grade 1, where the method correctly classified 198 knees out of 200. In third place, we find grade KL4 with 195 correctly identified knees. Finally, grades KL2 and KL0 have the lowest rates, with the correct identification of 190 and 189 knees, respectively. We note that grade KL0 was classified as grade KL1 in 10 cases due to the minimal difference between the two grades and the difficulty in distinguishing between them.
As shown in Table 3, the model perfectly detects grade KL3 with a recall of 1 and an F1-score of 0.98. KL1 and KL4 were classified with a recall of 0.99 and 0.97, respectively. KL0 and KL2 were identified with an equal precision of 0.99 and an F1-score of 0.97. Table 4 compares the proposed model with works developed on datasets of comparable size collected from the OAI dataset. The proposed model has outperformed many research studies based on internal validation, with a validation accuracy of 99.39% and a test accuracy of 97.20%.
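The reported figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text: the count of correctly classified knees per grade (out of 200 each) and the precision of 0.99 quoted for KL0 and KL2. The variable and function names are illustrative, and the full confusion matrix is not reproduced here because its off-diagonal entries are given only in Figure 3.

```python
# Correctly classified knees per grade, as reported for the OAI test set.
CORRECT = {"KL0": 189, "KL1": 198, "KL2": 190, "KL3": 200, "KL4": 195}
PER_CLASS = 200  # test images per KL grade

# Overall accuracy: correct predictions divided by all test images.
accuracy = sum(CORRECT.values()) / (PER_CLASS * len(CORRECT))

# Recall per grade: correct predictions divided by the images of that grade.
recall = {grade: n / PER_CLASS for grade, n in CORRECT.items()}


def f1_score(precision, recall_value):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall_value / (precision + recall_value)


# Precision of 0.99 is the value quoted in the text for KL0 and KL2.
f1_kl0 = f1_score(0.99, recall["KL0"])
f1_kl2 = f1_score(0.99, recall["KL2"])
print(f"accuracy={accuracy:.3f}", f"F1(KL0)={f1_kl0:.2f}", f"F1(KL2)={f1_kl2:.2f}")
```

The per-grade counts sum to 972 correct predictions out of 1000, which matches the reported test accuracy of 97.20%, and the recomputed F1-scores for KL0 and KL2 round to the reported 0.97.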

Graphical User Interface (GUI): MedKnee
A desktop application, the Moroccan KOA diagnosing tool (MedKnee), has been created using the Tkinter and fpdf2 Python libraries to simplify the use of the knee osteoarthritis classification model. The interface allows the user to enter the patient's name, date, age, and sex. After entering the correct password, the user can select the left or right knee and obtain an individual diagnosis for each knee (Figure 4). As shown in Figure 5, the diagnostic report for each knee can be printed, with a field at the bottom for comments.

Results on External Validation
In the first step, to validate our study, the Medical Expert-I and Medical Expert-II public datasets were used. Then, in the second step, a local dataset was employed to compare the results with a specialist's diagnosis.

Medical Expert Public Dataset
The database is composed of 1650 digital radiographs of the knee joint collected from various hospitals and diagnostic centers in India [33]. Each X-ray image of the knee is manually labeled by two medical experts according to the Kellgren and Lawrence grades. Both experts are experienced orthopedic surgeons who review between 70 and 100 radiographs per day. To obtain images comparable to those in the MedKnee training dataset, preprocessing consistent with that applied to the training dataset had to be implemented. First, we excluded images with double knees, then we resized the selected images from 362 × 162 to 224 × 162, and finally, we applied contrast-limited adaptive histogram equalization (CLAHE) to improve the local contrast of the images. As shown in Figure 6, the confusion matrix was used to evaluate the performance of our model. The proposed model achieved an accuracy of 95.36% on 1464 images labeled by expert 1 and 94.94% on 1463 images annotated by expert 2.

Local Dataset
In the local dataset, we selected 60 adult patient files of men and women with knee disorders who were radiographically examined at the Radiology Department of the El Kelaa des Sraghna Provincial Hospital. Each file consists of one or more knee radiographs in DICOM format, as presented in Figure 7. The images were acquired using a standard ITALRAY radiology table. After analysis, 30 images with implants or non-posteroanterior examination views were excluded, and 30 left and right DICOM knees were retained to validate this work. To build the local validation dataset, the ROI was selected manually, as illustrated in Figure 8. Then, each patient's file was presented to a rheumatologist as PNG images. In case of disagreement between our diagnostic system and the rheumatologist, a radiologist was consulted for arbitration to make the final decision.

Out of the 30 images presented to the doctor, the DL model and the rheumatologist agreed on 23 images (74%). As shown in Table 5, in the seven remaining images where there was a disagreement, the referee confirmed four images, indicating agreement between the referee and the DL model. Three images were not confirmed, of which only one was not confirmed by the referee, indicating agreement between the referee and the rheumatologist. Based on the final decision of the arbitrating physician, our diagnostic model failed in 2 out of 30 images with a test accuracy of 90%.

Table 5. Arbitration by the radiologist to resolve disagreements between the software and the rheumatologist regarding the local dataset. Bold characters indicate agreement between the diagnosis of the developed software and the doctors' diagnosis.

Discussion
In this work, an automated approach to classify the severity of knee osteoarthritis from plain radiographic images according to the KL grade is presented and implemented in a desktop application using Tkinter and a DL model based on Xception. The pre-trained Xception was chosen due to its superior performance on the OAI dataset compared to other models evaluated during software development, such as EfficientNetV2M (93.15%) and MobileNetV2 (75.60%). Table 4 gives the multiclass classification values in comparison with similar works. The developed method achieved a validation multiclass accuracy of 99.39% and a test accuracy of 97.20%, which is a high performance compared to [11,12,14]. To validate our work, we tested the model on the Medical Expert database and a local database. The model achieved an accuracy of 95.36% on Medical Expert-I and 94.94% on Medical Expert-II. This minimal difference confirms the results obtained in [34]. On the local database, the model was tested on 30 images and compared with the diagnosis of a rheumatologist. In case of disagreement, a radiologist was consulted. Out of the 30 images, 28 were correctly identified (90%). It is worth noting that when four images were presented twice to the same doctor, the latter failed to give the same diagnosis for two images. However, our software's diagnosis coincided with the second diagnosis of the doctor for one of the images, which was then revised. Furthermore, it is often challenging to differentiate between images of grades KL-0 and KL-1, which explains the discrepancy in identifying these two grades, but there is no clinical benefit in distinguishing between KL grades 0 and 1. In binary classification of osteoarthritis (OA, KL ≥ 2) versus non-OA (KL < 2), the new software MedKnee tested on the local dataset achieved an accuracy of 100%. Indeed, as shown in Table 5, although one knee was classified as doubtful (KL = 1) by the rheumatologist, the referee (radiologist) confirmed the software's diagnosis by validating that the knee in question had minimal osteoarthritis (KL = 2). In 2023, similar software, MediAI-OA [35], was developed using the NASNet DL model, but its accuracy is limited to 83%, which is significantly lower than the performance of our new software, MedKnee.
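The difference between five-class grading and binary OA detection can be made explicit with a small sketch. It assumes the conventional threshold in which a knee is considered osteoarthritic when KL ≥ 2; the grade lists are hypothetical and are not the local dataset. The point of the example is that confusions between KL0 and KL1 lower the five-class accuracy but leave the binary decision unchanged.

```python
def has_oa(kl_grade):
    """Binary label under the assumed convention: OA when KL >= 2."""
    if kl_grade not in range(5):
        raise ValueError(f"KL grade must be 0..4, got {kl_grade}")
    return kl_grade >= 2


def accuracy(predicted, reference):
    """Fraction of positions where the two label lists agree."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical grades: the first two knees are KL0/KL1 confusions.
predicted = [0, 1, 2, 3, 4, 1]
reference = [1, 0, 2, 3, 4, 1]

five_class_acc = accuracy(predicted, reference)
binary_acc = accuracy([has_oa(g) for g in predicted], [has_oa(g) for g in reference])
print(round(five_class_acc, 2), binary_acc)
```

In this toy case two of six grades differ, yet every knee receives the same OA or non-OA label, so the binary accuracy stays at 1.0.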
However, several limitations of the proposed approach must be noted. First, the OAI data used in this study did not include lateral radiographs. As noted by Ahmed and Mustapha [1], the addition of lateral radiographs would have provided additional information. Second, the radiographic images used to train the proposed model are pre-processed: they consist of PA radiographs acquired with fixed flexion and have identical resolution and size, whereas raw images require pre-processing before they can be classified. The generalizability of the developed model to external databases is another major limitation. The proposed model is limited to images pre-processed by zooming, resizing, and region of interest (ROI) rearrangement. Since the model was developed using only the pre-processed images from the OAI dataset, its accuracy is not acceptable in the absence of these operations. Nonetheless, the model could generalize more broadly, without requiring significant pre-processing, if it were built using a combination of several external databases and local or other institutional datasets, including bilateral and unilateral images of varying sizes and resolutions. Furthermore, the developed application helps physicians to identify knee osteoarthritis with acceptable accuracy, but it relies on manual localization of the ROI. Adding a deep learning detection model, such as YOLO or Faster R-CNN, would be helpful for real-time detection of the ROI. Finally, implementing the proposed approach with enhanced computational resources, such as Nvidia GeForce GPUs, could improve the accuracy achieved by allowing more training epochs and images. In the future, we expect to improve our application by using a large local dataset and introducing automatic ROI selection, with the option of adding image segmentation. Nevertheless, the developed software achieved a high level of accuracy and can help physicians predict the exact severity grade of knee OA by analyzing radiographic images.
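The manual ROI step and the resizing mentioned above can be sketched as follows. This is only an illustration: the function names are hypothetical, the 299 × 299 target is the default Xception input size in common frameworks and is assumed here rather than taken from the paper, and the actual MedKnee pipeline may differ.

```python
def square_roi(center_x, center_y, side, image_width, image_height):
    """Return (left, top, right, bottom) of a square ROI kept inside the image.

    The square is centred on the selected point where possible and shifted
    inwards when it would cross an image border.
    """
    side = min(side, image_width, image_height)   # ROI cannot exceed the image
    left = center_x - side // 2
    top = center_y - side // 2
    left = max(0, min(left, image_width - side))  # clamp horizontally
    top = max(0, min(top, image_height - side))   # clamp vertically
    return left, top, left + side, top + side


def crop_and_resize(pixels, box, target=299):
    """Crop a 2-D list of grey levels to `box` and resize it to
    target x target with nearest-neighbour sampling (dependency-free)."""
    left, top, right, bottom = box
    width, height = right - left, bottom - top
    return [
        [pixels[top + (r * height) // target][left + (c * width) // target]
         for c in range(target)]
        for r in range(target)
    ]


# Synthetic 600 x 400 "radiograph": each pixel stores its column index mod 256.
image = [[x % 256 for x in range(600)] for _ in range(400)]
box = square_roi(center_x=550, center_y=200, side=300,
                 image_width=600, image_height=400)
patch = crop_and_resize(image, box)
print(box, len(patch), len(patch[0]))
```

The selected point lies close to the right border here, so the ROI is shifted inwards instead of being truncated, and the crop therefore keeps a constant aspect ratio before resizing. A production tool would more likely delegate cropping and interpolation to an imaging library.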

Conclusions and Future Work
This work presents MedKnee, a new DL software designed to automatically classify the severity of knee osteoarthritis from radiographic images. The software is based on transfer learning with a pre-trained Xception model and the public OAI dataset. The best DL model achieved a validation accuracy of 99.39% and a test accuracy of 97.20%, moderately better than the results reported in the recent literature. The developed tool was validated using the Medical Expert dataset and a local dataset. The model achieved an accuracy of 95.36% when tested on Medical Expert-I, and 94.94% on Medical Expert-II. On the local dataset, the model was tested on 30 images and compared with a rheumatologist's diagnosis. To resolve disagreements, a radiologist was consulted. The model achieved a multiclass accuracy of 90% and a binary classification accuracy of 100%. It can be concluded that the proposed software "MedKnee" can equip radiologists with the ability to quickly and accurately diagnose and predict knee osteoarthritis from X-ray images. The proposed

Figure 1. The preprocessing pipeline used in this work.

tains 2 convolutional layers, 34 separable convolution layers, 4 max-pooling layers, and a global average pooling layer. Convolutional and separable convolution layers are followed by batch normalization and ReLU layers. The 36 convolutional layers used for feature extraction are arranged in 14 blocks, all of which are surrounded by linear residual connections, except for the first and last blocks. The convolutional base is followed by a logistic regression layer.

Figure 3. The confusion matrix of the proposed approach tested on the OAI dataset.

Figure 5. Printable report on knee osteoarthritis diagnosis.

Figure 6. Model performance: (a) the confusion matrix of the proposed approach tested on 1464 images of Medical Expert-I; (b) the confusion matrix of the proposed approach tested on 1463 images of Medical Expert-II.

Figure 7. Example of knee X-ray images collected and filtered from patients with knee symptoms.

Figure 8. Manual ROI selection. (a) ROI selection of a man's knee. (b) ROI selection of a woman's knee.

Table 5.

Table 1. Overview of the methods used in recent literature with the average values of the metrics used.

Table 3. Performance of the proposed model in detecting each KL grade of knee osteoarthritis validated on the OAI dataset.

Table 4. Comparison between the proposed tool, MedKnee, and similar works in KOA's 5-class classification based on internal validation.