A Novel Hybrid Approach Based on Deep CNN Features to Detect Knee Osteoarthritis

In the recent era, various diseases have severely affected the lifestyle of individuals, especially adults. Among these, bone diseases, including Knee Osteoarthritis (KOA), have a great impact on quality of life. KOA is a knee joint problem mainly caused by decreased articular cartilage between the femur and tibia bones, producing severe joint pain, effusion, joint movement constraints, and gait anomalies. To address these issues, this study presents a novel method for the early detection of KOA using deep learning-based feature extraction and classification. Firstly, the input X-ray images are preprocessed, and then the Region of Interest (ROI) is extracted through segmentation. Secondly, features are extracted from the preprocessed X-ray images containing the knee joint space width using hybrid feature descriptors, that is, a Convolutional Neural Network (CNN) combined with Local Binary Patterns (LBP) and a CNN combined with the Histogram of Oriented Gradients (HOG). Low-level features are computed by HOG, while texture features are computed with the LBP descriptor. Lastly, multi-class classifiers, that is, Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbour (KNN), are used for the classification of KOA according to the Kellgren–Lawrence (KL) system, which consists of Grade I, Grade II, Grade III, and Grade IV. Experimental evaluation is performed on various combinations of the proposed framework. The experimental results show that the HOG feature descriptor provides approximately 97% accuracy for the early detection and classification of KOA across all four KL grades.


Introduction
Osteoarthritis (OA) is a severe disease of the joints, especially the knees, caused by loss of cartilage. It appears with age and is present mostly in the elderly population; excess body weight is also among the causes of the prevalence of OA [1,2]. The knee joint consists of two major bones, the femur and the tibia, between which a thick material called cartilage is present. This cartilage enables flexible and frictionless movement of the knee, and its volume may decrease due to aging or accidental loss [3]. With decreased cartilage volume, the tibiofemoral bones produce friction during movement, leading to knee osteoarthritis (KOA). Articular cartilage is composed of chondrocytes; it protects the underlying bone by distributing load and can function for a lifetime [4].
Kellgren-Lawrence (KL) is a grading system that describes the various stages of OA, based on the radiographic classification of KOA, and it is found to be the most widely used assessment method. In another approach, gait analysis is performed to predict joint deterioration [24]. Joint mechanics and function depend on the efficient working of the menisci, which enable load balancing at the tibiofemoral bones. They also assist the articular cartilage by reducing the load on it, and they regulate the lubrication and distribution of synovial fluid [25]. Knee bone is made of two types of material: one is known as Cancellous or Trabecular (spongy) bone, and the other as Cortical (compact) bone [26]. Bones have different shapes; some are long, some short, some flat, and others irregular [27]. A layered-graph approach has been presented for optimal segmentation that can be applied to single and multiple interacting surfaces [28].
Mosaicplasty is a self-cartilage transplantation method and one of the remedies for knee cartilage damage; it requires 3D image precision [29]. Osteoarthritis and rheumatoid arthritis are widespread diseases that tend to cause effusion, and even conditions such as gout or the formation of tumors and cysts can trigger fluid retention in and around the knee. A fully automated segmentation technique operating on MR images has been applied for the detection of osteoarthritis of the knee [14,30]. Image processing techniques, such as histogram quantization, thresholding, region of interest processing, and edge detection, are used to detect the breakdown of cartilage [11,12]. Image processing techniques for medical diagnosis have also been presented in which edge detection and contrast enhancement are based on a threshold that directly affects the results; these experiments were performed on the Linux platform using the C language, and the proposed algorithm helps in the case of noisy and blurred images [31]. A fully automated method for the segmentation of cartilage and bone has been performed on MRI images, in which cartilage volume, thickness, and surface area are derived from the knee segmentation and then used to track the progression of KOA [32].
Fully automated bone segmentation using a graph cut algorithm has also been used, with images taken from the publicly available OAI database. Here, MR images are classified to detect the bones, the background, and the fat present in the images. A two-phase approach was proposed: in the first phase, the areas of the bones (femur and tibia) are identified; the output of the first phase is given as input to the second phase, where bone segmentation is performed and other structures, such as fat and muscle, are separated. The detection accuracy in the first phase is 0.99, whereas in the second phase it is 0.95 mean DSI [33]. In addition, cartilage composition can be assessed by MR imaging; in that work, a direct segmentation technique (DST) was developed to detect knee osteoarthritis, with imaging data taken from the OAI database [34]. X-ray images have also been used for automatic detection of KOA, distinguishing normal, doubtful, minimal, and moderate KOA on a dataset of 350 X-ray images with manual KL classification. In that method, image features are first extracted in a transform domain. Experimental results showed that moderate and minimal grade KL OA were easily differentiated from normal knees, with accuracies of 91.5% and 80.4%, respectively, while doubtful OA was detected with an accuracy of 57% [5]. Another work used a fully automated three-label bone segmentation technique and applied convex optimization for the segmentation of knee cartilage; the proposed method provides more significant results than manual segmentation [35]. A new graph cut technique has also been used for the detection of KOA, in which MR images are used for the segmentation [33].
A semiautomatic technique is used on knee MRI images to obtain the cartilage segment, which is separated from the femur and tibia bones [11]. In another study, a computer-aided image analysis method is used to detect the early development of KOA by detecting texture and structural changes of the bone in knee and hip KOA. Radiographic images (X-rays) are used: the X-rays are taken and then digitized, the joint is detected automatically, and the relevant areas are separated from the image. Numerical image features, or content descriptors, are then extracted, and the images are finally classified based on the feature values. For moderate KOA (KL-3), the experimental results showed an accuracy of 72%, while for mild KOA (KL-2), the accuracy was 62%. A critical finding of that research is that the part of the tibia just beneath the joint is very informative for the early detection of KOA, producing substantially higher signals, while areas of the tibia and femur away from the joint did not produce any signal and hence are less helpful [36]. In [6], an automated technique for early KOA detection is proposed using X-ray images. Firstly, images are preprocessed using a circular Fourier filter. Then, multi-linear regression is applied to minimize the variations between healthy and OA parts, independent component analysis is used for feature extraction, and finally Naïve Bayes and random forest classifiers are used to detect KOA. The algorithm gave 82.98% accuracy and 80.65% specificity. Knee OA has also been detected using the knee joint space parameter [2]: the region of interest is separated through template matching using the HOG feature vector, then the knee joint space is calculated and compared with a reference knee joint space width. Details of related work on the detection of knee osteoarthritis are given in Table 1.
However, that method only detects the presence of KOA, showing an accuracy of 97%. In [46], a region-based technique was used to detect KOA. Histograms of gradient elements were calculated and classified using a multi-class support vector machine (SVM), and the results were categorized according to the Kellgren and Lawrence (KL) grading system. Accuracies of 98% for Grade-0, 93% for Grade-I, 87% for Grade-II, and 100% for Grade-III and Grade-IV were attained. In [47], Hu's moments were used to extract information by capturing the geometric transformation of the cartilage from distorted images; in total, seven invariants were calculated. Finally, classification was performed using K-nearest neighbor (KNN) and decision tree classifiers. KNN performed better than the decision tree, giving an accuracy of approximately 98.5%. However, that system required 2000 X-ray images for training and testing.
From the literature discussed above, it is observed that various studies proposed techniques that work on knee X-ray images, or on their own datasets, for the detection of KOA. To attain good results, many authors used morphological processing on the images together with feature extraction and classification algorithms such as HOG, Hu's moments, SVM, and KNN. The contributions of this study are as follows:
• A novel, robust algorithm is proposed for the early detection of KOA according to the KL grading. The proposed algorithm uses X-ray images for training and testing. Hybrid feature descriptors, that is, CNN with HOG and CNN with LBP, extract the features, and three multi-class classifiers, KNN, RF, and SVM, classify the disease according to the KL grading system (I, II, III, IV);
• Cross-validation with 420 images is used to evaluate the performance of the proposed technique; the results show 97% accuracy for overall detection and classification;
• A five-fold validation is used with the splits (50,50), (25,75), (30,70), (40,60), (20,80); here, an individual set represents the train and test data, respectively, for each Grade, and the last set is for the healthy class. Our proposed technique gives an accuracy of 98% for all grade classifications;
• The performance of individual grade detection during cross-validation is analyzed, revealing that the algorithm obtained 98% accuracy for Grade I, 97% accuracy for Grade II, and 98.5% accuracy for Grade III and Grade IV;
• Due to its robustness, the algorithm can be applied to the detection and classification of other diseases with significant results.

Proposed Framework
The first step of the proposed system is preprocessing, which detects the contours of the knee and removes noise. Then the region of interest is extracted and segmentation is carried out. In the third step, features are extracted using a Deep Convolutional Neural Network (DCNN) hybridized with the Histogram of Oriented Gradients (HOG) and with Local Binary Patterns (LBP); the extracted features capture texture, shape, scaling, rotation, and translation. These features are passed to the multi-class classifiers Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Random Forest (RF) to classify the images into four grades according to the KL grading system. The details of the proposed deep learned system are given below:

Pre-Processing
The aim of preprocessing is to prepare the data for further processing in the proposed system. Format conversion is performed in this step while improving image quality: images are converted into TIFF format because it stores the image information without loss, preserving the overall quality, and irrelevant information is removed during the conversion. In addition, a negative of each image is formed, as it enhances the visibility of the region of interest. Finally, images are downscaled in all dimensions using a bilinear approach to reduce the computational cost. This also suppresses noise, since in the bilinear approach the output pixel value is a weighted average of the pixels in a 2 × 2 neighborhood.
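The negative-image and downscaling steps can be sketched in a few lines. This is a minimal NumPy illustration under our own naming; the plain 2 × 2 averaging below is the special case of bilinear weighting at an exact factor of 2:

```python
import numpy as np

def to_negative(img):
    """Invert an 8-bit grayscale image to enhance ROI visibility."""
    return 255 - img

def downscale_2x2(img):
    """Halve each dimension by averaging 2x2 neighborhoods (the
    bilinear weighting reduces to a plain average at scale 2)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(np.float64)
    return (img[0::2, 0::2] + img[0::2, 1::2] +
            img[1::2, 0::2] + img[1::2, 1::2]) / 4.0

xray = np.arange(16, dtype=np.uint8).reshape(4, 4)  # stand-in for an X-ray
neg = to_negative(xray)
small = downscale_2x2(neg)
print(small.shape)  # (2, 2)
```

Each output pixel is the mean of a 2 × 2 block, so both the image size and the pixel-level noise are reduced.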

Region of Interest (ROI) and Segmentation
The key aspect of the algorithm is detecting early KOA from the space width of the knee joint; the disease advances as this joint space narrows with age. The region of interest (ROI) is the tibiofemoral joint. The ROI is located through a matching technique against a database of knee images: the database image moves over the input image pixel by pixel, and the similarity between image blocks is computed through histogram of gradients features. The block having maximum similarity is selected as the ROI. This similarity-based mechanism outperforms the traditional algorithms. Suppose an input knee image I of size I × J is fed to the system, and D represents the database image of size d_r × d_c; for each candidate block, the HOG feature vector v_d is computed, and the Mean Absolute Difference (MAD) between the block's vector and the template's vector is calculated. The block with the minimum MAD is selected as the ROI that contains the knee joint. The knee image used in the database is shown in Figure 1b, Figure 1a shows the original knee image, and Figure 1c shows the selected ROI. The selected region holds the essential features, as it shows the joint space width (JSW). After extracting the ROI, the cropped image is given as input for segmentation through the active contour algorithm; the image is dynamically segmented using 3 × 3 masks [49].
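The sliding-window MAD matching can be sketched as follows. For brevity this sketch compares raw pixel intensities instead of the HOG feature vectors the method actually uses; the function name and toy arrays are ours:

```python
import numpy as np

def mad_match(image, template):
    """Slide `template` over `image` pixel by pixel and return the
    top-left corner of the block with minimum Mean Absolute Difference.
    The paper compares HOG feature vectors of each block; comparing
    raw intensities keeps the sketch short."""
    I, J = image.shape
    dr, dc = template.shape
    best, best_pos = np.inf, (0, 0)
    for r in range(I - dr + 1):
        for c in range(J - dc + 1):
            block = image[r:r + dr, c:c + dc]
            mad = np.abs(block.astype(float) - template).mean()
            if mad < best:
                best, best_pos = mad, (r, c)
    return best_pos  # ROI corner; crop image[r:r+dr, c:c+dc]

img = np.zeros((8, 8))
img[3:5, 3:5] = 1.0          # toy "joint" region
tmpl = np.ones((2, 2))
print(mad_match(img, tmpl))  # (3, 3)
```

Replacing the block/template intensities with their HOG vectors turns this into the paper's feature-space matching without changing the search loop.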

Deep Learning
Deep learning performs nonlinear transformations hierarchy-wise. A Convolutional Neural Network (CNN) has a deep feed-forward architecture to which learning can be applied, and each layer in a CNN can see the features and show high variance [50]. During the testing phase, the deep convolutional network runs in the forward direction through all its layers. The main characteristic of a deep CNN is to perform each possible match among images: convolutional layers linearize manifolds, while pooling layers collapse them, and the output layer size depends upon the stride. The filter serves to sharpen the image. If the kernel size is K × K, the input size is S × S, the stride is 1, and the input and output have F and O feature maps respectively, then the input size is F @ S × S, the output size is O @ (S − K + 1) × (S − K + 1), the kernel has F × O × K × K coefficients that must be learned, and the cost is F × K × K × O × (S − K + 1) × (S − K + 1). The filter size should match the size of the output pattern to be detected. The stride between the pools is the factor on which the output size depends; for example, if independent pools have a size of K × K and the input size is S × S with F features, then the output will be F @ (S/K) × (S/K).
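The shape and cost arithmetic above can be checked with a small helper (a sketch; the function names are ours):

```python
def conv_output(S, K, F, O, stride=1):
    """Output side, learnable coefficients, and multiply cost for a
    convolution of F input maps into O output maps with a K x K
    kernel, following the formulas in the text."""
    out = (S - K) // stride + 1            # (S - K + 1) when stride = 1
    coeffs = F * O * K * K                 # kernel parameters to learn
    cost = F * K * K * O * out * out       # multiplies per forward pass
    return out, coeffs, cost

def pool_output(S, K, F):
    """Independent K x K pools shrink F @ S x S to F @ (S/K) x (S/K)."""
    return F, S // K

print(conv_output(S=50, K=5, F=1, O=20))   # (46, 500, 1058000)
print(pool_output(S=46, K=2, F=20))        # (20, 23)
```

The numbers confirm the quadratic dependence of the cost on the output side (S − K + 1).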
The output function is defined as x_L = F_L(x_{L−1}, ω_L), where F_L refers to the layer that takes the output x_{L−1} of the previous layer as input and calculates the output x_L depending upon the weights ω_L of that layer.

Feature Extraction
Features are extracted based on the shape, which depends upon the knee joint space. The extracted shape features include area, compactness, perimeter, maximum and minimum axial length, circularity, diameter, and convex area. Low-level features are computed through HOG, while texture features are computed using the LBP descriptor. On the other hand, high-level features, such as scaling, rotation, and translation, are computed through the ConvNet. In existing techniques, single or separate feature descriptors have been used, which fail to classify all KL grades of KOA with more than 95% accuracy [46]. In our proposed technique, both low- and high-level features are combined, which outperforms the state-of-the-art through complete matching with the trained knee features.

Convolutional Neural Network as Feature Descriptor
In our proposed system, the first layer of the 2-dimensional CNN is a convolutional layer with 20 filters and a stride of 1. It is followed by a max-pooling layer of size 2 × 2 with a stride of 1. The next layer is a convolutional layer with 32 filters and a stride of 1. The first six layers alternate between convolutional and max-pooling layers. The seventh layer is an activation layer with a Rectified Linear Unit (ReLU), followed by a convolutional layer with 40 filters of size 4 × 4 × 32. The final layer is a softmax layer. The weights of the convolutional layers and the values of the operators in the max-pooling layers should be steady for valuable computations. In our dataset, the image size is 50 × 50 × 1, which is converted into a size of 1 × 1 × 2 through forward propagation of all layers [51]. The deep neural network layers used for the proposed technique are shown in Figure 2. The convolutional layers take an image I as input and apply filters f of dimensions m × n with length l, weights w, and bias b, so the output can be written as in the equation below:
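The layer stack described above can be traced shape-by-shape with a short helper. The 3 × 3 kernel sizes of the first two convolutional layers are illustrative assumptions (the text only specifies the 4 × 4 kernel of the third), and the trailing ReLU/softmax stages that finish the reduction to 1 × 1 × 2 are omitted:

```python
def propagate_shapes(input_shape, layers):
    """Walk a (height, width, channels) shape through a stack of
    stride-1 valid convolutions and stride-1 poolings."""
    h, w, c = input_shape
    trace = [(h, w, c)]
    for kind, k, n in layers:
        if kind == "conv":          # stride-1 valid convolution
            h, w, c = h - k + 1, w - k + 1, n
        elif kind == "pool":        # stride-1 max pooling, channels kept
            h, w = h - k + 1, w - k + 1
        trace.append((h, w, c))
    return trace

# conv(20 filters) -> pool 2x2 -> conv(32) -> pool 2x2 -> conv(40, 4x4)
stack = [("conv", 3, 20), ("pool", 2, None),
         ("conv", 3, 32), ("pool", 2, None),
         ("conv", 4, 40)]
print(propagate_shapes((50, 50, 1), stack)[-1])  # (41, 41, 40)
```

Tracing shapes this way is a quick sanity check that kernel sizes and strides are mutually consistent before training.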

Histogram of Oriented Gradient
The images are divided from 28 × 28 into 6 × 6 co-occurring blocks, each block having a size of 2 × 2 with a stride of 4. A total of nine bins are formed, and a total of 1296 low-level features are computed. Normalization can be performed for better feature extraction, as pulmonary images show better intensity and shadow normalization; intensity is considered in blocks of a larger image size. The same orientation is computed for opposite directions of an image part, as they are grouped in the same bin, so the range of the angle remains from 0 to 180 degrees. The gradient magnitude M of the pixel (x, y) and its direction are given as M(x, y) = √(I_x² + I_y²) and ϑ(x, y) = tan⁻¹(I_y / I_x), where the angle ϑ varies from 0 to 2π, and I_x and I_y are the gradients in the x and y directions.
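The gradient magnitude and orientation that feed the HOG bins can be computed as below. This is a sketch: central differences via `np.gradient` stand in for whichever derivative filter the authors used, and the names are ours:

```python
import numpy as np

def gradients(img):
    """Per-pixel gradient magnitude and orientation, the quantities
    binned into HOG histograms."""
    img = img.astype(np.float64)
    Iy, Ix = np.gradient(img)                 # d/dy, d/dx
    M = np.sqrt(Ix**2 + Iy**2)                # magnitude
    theta = np.arctan2(Iy, Ix)                # direction in (-pi, pi]
    # HOG folds opposite directions into the same bin: range 0..180 deg
    unsigned = np.rad2deg(theta) % 180.0
    return M, unsigned

img = np.tile(np.arange(5.0), (5, 1))         # ramp increasing along x
M, ang = gradients(img)
print(M[2, 2], ang[2, 2])  # 1.0 0.0
```

The modulo-180 fold is what groups opposite directions into the same bin, as described above.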

Local Binary Pattern
An LBP descriptor is used for texture feature extraction. It works on the principle that each pixel compares itself with its neighboring pixels, encoding the local neighborhood using a threshold function [52]. If the gray value of a neighbor is greater than or equal to that of the center pixel, the corresponding bit is set to 1. Let k represent the total number of neighboring pixels; LBP then generates 2^k possible feature patterns, for example 2^16 = 65,536 for k = 16, and this number increases as the size of the neighborhood increases. The equation of the LBP is given below,
where p is a neighbor pixel and c refers to the center one. Pseudocode for the proposed framework is given in Algorithm 1.
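The thresholding described above can be sketched for the common 8-neighbor case (k = 8, giving 2^8 = 256 patterns); the function name and toy patch are ours:

```python
import numpy as np

def lbp_3x3(img):
    """Basic 8-neighbor LBP: each neighbor >= center contributes one
    bit, yielding a code in 0..255 per interior pixel."""
    img = img.astype(np.int32)
    c = img[1:-1, 1:-1]                        # center pixels
    # 8 neighbor offsets, clockwise from top-left
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2),
               (2, 2), (2, 1), (2, 0), (1, 0)]
    code = np.zeros_like(c)
    for bit, (dr, dc) in enumerate(offsets):
        nb = img[dr:dr + c.shape[0], dc:dc + c.shape[1]]
        code |= (nb >= c).astype(np.int32) << bit
    return code

patch = np.array([[5, 5, 5],
                  [1, 3, 1],
                  [5, 5, 5]])
print(lbp_3x3(patch))  # [[119]]
```

The histogram of these codes over an image region forms the LBP texture feature vector.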

Classification
For the classification, a Support Vector Machine, a supervised learning algorithm, is trained on the extracted features with four classes according to the KL grading system, Grade-I, Grade-II, Grade-III, and Grade-IV, plus a fifth class for the healthy knee. SVM is one of the most memory-efficient techniques. Random Forest (RF) builds multiple de-correlated decision trees and remains the best choice for large datasets. The K-Nearest Neighbour (KNN) classifier, the simplest of the three, is also used; it classifies the data using a voting system over the nearest neighbors. In KNN, if k = 1, the query object is simply assigned to the class of its nearest neighbor. The block diagram of the proposed model is given in Figure 3.
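The k = 1 behaviour noted above can be sketched with a tiny nearest-neighbor classifier; the toy feature vectors and labels are ours, standing in for the CNN+HOG features:

```python
import numpy as np

def knn_predict(train_X, train_y, x, k=1):
    """Majority vote among the k nearest training vectors; with k = 1
    the query simply takes the label of its closest neighbor."""
    d = np.linalg.norm(train_X - x, axis=1)   # Euclidean distances
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y = np.array(["healthy", "grade-1", "grade-4"])
print(knn_predict(X, y, np.array([0.9, 1.2])))  # grade-1
```

The same voting rule extends to the five-class healthy/Grade I–IV setting by simply labelling the training vectors with the five classes.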

Dataset
The Knee Osteoarthritis Severity Grading dataset, known as the Mendeley dataset IV [47], is used. The experiments were performed on a system with a Core-i7-7700K 4-core 4.2 GHz CPU and 32 GB RAM (Intel Corporation, Santa Clara, CA, USA) and an NVIDIA Titan V with 12 GB memory (Nvidia Corporation, Santa Clara, CA, USA). The dataset was collected from different hospitals; the X-ray images were taken with a PROTECT PRS 500E X-ray machine. All the images are in grayscale and were labeled manually according to the KL grading system. A total of 500 images were used to train the classifier, of which 100 images showed healthy knees without KOA, and 100 images were used for each KL grade. A five-fold validation was used with the splits (50,50), (25,75), (30,70), (40,60), (20,80); here, an individual set represents the training and testing data, respectively, for each Grade, and the last set is for the healthy class. The algorithm produces an accuracy of 98% for five-fold validation using the KNN algorithm with the combined CNN and HOG feature vector, while SVM with the CNN feature vector gives an accuracy of 97.6%. Sample images from the different classes are shown in Figure 4.

Results
The total execution times of the proposed hybrid system for each feature descriptor with all classifiers, that is, SVM, RF, and KNN, are shown in Tables 2-4, and the corresponding analysis graphs are shown in Figures 5-7. The time taken by the SVM classifier with LBP is 4.2 s, and SVM's shortest time, obtained with CNN features, is 3.84 s. However, even this shortest SVM_CNN time is longer than the shortest time of KNN-LBP, which is 2.3 s.
Hybridized Feature Descriptors: HOG and LBP extract low-level features, while CNN extracts high-level features such as shear, translation, scaling, and rotation. The aim is to obtain shape features by combining these hybrid features. The hybridized features, that is, CNN+LBP and CNN+HOG, are then classified using the SVM, KNN, and RF algorithms. The CNN features are combined analogously with LBP and with HOG; the combined equation for the convolutional neural network and the histogram of oriented gradients weights each convolution term by the local gradient orientation: O(x, y) = Σ_s Σ_t f_{s,t} · I(x+s−1, y+t−1) · e^{ϑ(x+s−1, y+t−1)}.
Analysis of Different Combinations of Methods: During the training phase, the first convolutional layer used 20 filters, the kernel size of the pooling layer was 2, and convolutional and pooling layers alternated three times. The activation layer was a ReLU layer, and at the output layer the softmax function was used to extract the features of the knee. The average accuracy attained was 93% within 1870 iterations. The classification algorithms used in the proposed system are SVM, KNN, and RF for five classes: healthy and the four non-healthy grades (Grade I, II, III, and IV). Firstly, an SVM classifier was trained on the dataset for the true positive and true negative classes. We used five-fold cross-validation in which 460 non-healthy and 130 healthy images were validated. Tables 5-8 show the results obtained for the proposed system. In addition to cross-validation, 420 images were used for testing purposes, and confusion matrices for all classifiers are shown in these tables. Table 5 shows the SVM classifier's confusion matrix for all feature combinations. For a total of 420 images, SVM_LBP shows 360 True Positives (TP), 15 True Negatives (TN), 37 False Positives (FP), and 8 False Negatives (FN). TP is the number of images that belong to the positive class and are classified as positive by the system; TN is the number of images that belong to the negative class and are classified as negative; FP is the number of images falsely classified as positive; and FN is the number of images falsely classified as negative. These confusion matrices are based on two classes, healthy and diseased, with the diseased images further divided into four grades. The detailed confusion matrix for SVM_LBP is shown in Figure 8, where the 360 true positive images are divided according to the grades of the disease.
The thirty-seven false-positive images are divided as (15,12,9,1); these images belonged to healthy knees but were falsely classified as diseased, that is, as Grade I, II, III, and IV, respectively. Of the eight false negatives, six Grade I and two Grade II images were classified as healthy. Confusion matrices for all three classifiers used in the proposed system are given in Tables 5-7.

Evaluation Metrics
Time: RF achieved its shortest time with the local binary pattern, 2.23 s, as shown in Tables 2-4. The time taken by the SVM classifier with LBP is 4.2 s, while SVM's shortest time, obtained with CNN features, is 3.84 s. The shortest time for KNN is obtained with LBP, 2.3 s.
Accuracy: Accuracy is computed to analyze the overall performance of the proposed algorithm on the data. The algorithm based on KNN with the combination of CNN and HOG attained 98% accuracy on five-fold validation and 97% accuracy on cross-validation. The algorithm obtained 91% accuracy for Grade I, 98% accuracy for Grade II, and 99.5% accuracy for Grade III and Grade IV. Accuracy is given by Accuracy = (TP + TN) / (TP + TN + FP + FN).
The true positives, TP, refer to the number of images correctly classified as the Non-Healthy class, while the false positives, FP, refer to images falsely classified as Non-Healthy that were actually healthy. The false negatives, FN, refer to images that the proposed algorithm failed to detect as Non-Healthy but were in fact Non-Healthy. After KNN with CNN and HOG, the Random Forest classifier performed best on the CNN-extracted features, providing an accuracy of 93.09%.
Recall: Recall R represents the percentage of Non-Healthy images that the system recalled, R = TP / (TP + FN). The highest recall, 97.95%, was obtained for the same combination. Precision: Precision P represents the percentage of images classified as Non-Healthy that were accurate, P = TP / (TP + FP). The highest precision value, 98.96%, was obtained for the KNN classifier using the Convolutional Neural Network with the Histogram of Oriented Gradients. A comparison of the different proposed combinations is given in Table 8 in order to select the best, and a graphical representation is shown in Figure 9. The averages and standard deviations of all the proposed combinations are also reported in Table 8.
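The three metrics can be computed directly from the confusion counts; the following sketch uses the SVM_LBP counts reported earlier (360 TP, 15 TN, 37 FP, 8 FN):

```python
def metrics(TP, TN, FP, FN):
    """Accuracy, recall, and precision from the confusion counts
    defined in the text."""
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    recall = TP / (TP + FN)        # share of Non-Healthy images found
    precision = TP / (TP + FP)     # share of positive calls that were right
    return accuracy, recall, precision

acc, rec, prec = metrics(360, 15, 37, 8)
print(round(acc, 3), round(rec, 3), round(prec, 3))  # 0.893 0.978 0.907
```

Note how recall stays high (few diseased knees are missed) while the 37 false positives pull both accuracy and precision down.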

Comparison with State-of-the-Art
A comparison with the state-of-the-art is performed to analyze the performance of our proposed algorithm; the analysis is shown in Table 9. All the existing algorithms are less robust and classify knee disease with lower accuracy. A random forest-based technique proposed in [53], using images of patients collected from a hospital, achieved 72.95% accuracy and 76.12% precision. CNN-based methods for knee disease detection proposed in [15,16,54], using the same datasets, that is, OAI and MOST, attained 61.78%, 66.68%, and 67.49% accuracy. A VGG-19 based technique used in [55] achieved an accuracy of 69.70% on the OAI dataset. CNN and LSTM based knee severity classification was performed in [56] using the OAI dataset, attaining 75.28% accuracy. The maximum accuracy among these was attained by [57], based on an LSVM classifier; they attached sensors to the patient's knee to collect data as VAG signals. However, our proposed algorithm attained 97.14% accuracy and 98.96% precision, the best result among the reported existing techniques. The accuracy for the different gradings of knee osteoarthritis (Grade I, II, III, IV) is reported in Table 10.

Conclusions
A novel knee osteoarthritis detection technique is presented for early-stage prediction. To achieve this goal, deep learning-based feature descriptors are applied to X-ray images taken from the Mendeley dataset for training and testing. The proposed model extracts features from the region of interest, using the joint space width, by CNN with LBP and CNN with HOG. The multi-class classifiers, that is, SVM, RF, and KNN, are used to classify KOA according to the KL system. Five-fold validation and cross-validation are performed on the images; the proposed algorithm gives 97.14% accuracy on cross-validation and 98% accuracy on five-fold validation. In the future, the proposed model can be merged with other models for the hybrid and diverse detection of diseases other than those of the knee, and feature fusion methods can further be exploited for the detection and classification of other diseases.

Data Availability Statement: Publicly available datasets were analysed in this study. This data can be found here: https://www.frontiersin.org/articles/10.3389/frobt.2020.591827/full (accessed on 23 June 2020) and https://data.mendeley.com/datasets/t9ndx37v5h/1#folder-18a3659a-1fa2-4340-b7bb-526fb81006f6 (accessed on 23 June 2020).

Conflicts of Interest:
The authors declare no conflict of interest.