Quantum-Inspired Interpretable AI-Empowered Decision Support System for Detection of Early-Stage Rheumatoid Arthritis in Primary Care Using Scarce Dataset

Rheumatoid arthritis (RA) is a chronic inflammatory autoimmune disease that can lead to joint and bone erosion and, if not treated in a timely manner, to disability. Early detection of RA in settings such as primary care, the first point of contact with patients, can play an important role in timely treatment of the disease. We aim to develop a web-based Decision Support System (DSS) to assist primary care providers in the early detection of RA. Using Sparse Fuzzy Cognitive Maps together with a quantum-learning algorithm, we developed an online web-based DSS that assists in early detection of RA and subsequently classifies disease severity into six levels. The development process was completed in collaboration with two specialists, a rheumatologist and an orthopedic surgeon. We developed our model using a sample of anonymous patient data collected from Shohada University Hospital, Tabriz, Iran. We compared the results of our model with other machine learning methods (e.g., linear discriminant analysis, Support Vector Machines, and K-Nearest Neighbors). In addition to outperforming these methods in terms of accuracy when all of the clinical features are used (accuracy of 69.23%), our model identified the relations among the features and offered higher explainability than the other methods. For future work, we suggest applying the proposed model in different contexts and comparing the results, as well as assessing its usefulness in clinical practice.


Introduction

The Importance of Early Diagnosis of Rheumatoid Arthritis in Primary Care
Rheumatoid arthritis (RA) is an autoimmune, chronic inflammatory disease [1,2]. The disease is characterized by persistent synovitis, systemic inflammation, and autoantibodies (particularly to rheumatoid factor and citrullinated peptide) [3]. The incidence of RA ranges between 0.5% and 1%, and the disease is more common among women and older adults [3]. Aside from the social burden, RA carries a substantial individual burden, resulting in "musculoskeletal deficits, with attendant decline in physical function, quality of life, and cumulative comorbid risk" [4].
Primary care physicians can contribute to improved outcomes of RA patients [1]. Primary care is the gateway into the health care system for all needs, problems, and conditions, including uncommon or unusual ones such as RA [5,6]. Primary care providers are expected to recognize RA patients as early as possible and refer them to a rheumatologist [7]. Early diagnosis of RA, and consequently early treatment, is essential to better management of RA and may reduce bone tissue loss and increase favorable outcomes, including remission [3,8,9]. However, diagnosis of RA is complex and difficult, and in many patients early diagnosis is not possible because the clinical indicators are not specific to RA. Indeed, in the early stages of the disease, the typical RA patient has "tender and swollen joints of recent onset, morning joint stiffness, and abnormal laboratory tests, such as elevated concentrations of C-reactive protein or erythrocyte sedimentation rate" [3], which can be indicative of RA or of other types of arthritis (e.g., reactive arthritis, osteoarthritis, psoriatic arthritis, infectious arthritis, or rarer autoimmune conditions such as connective tissue diseases) [3].
Our team's literature review showed that, among thousands of studies, only about 90 focus on AI in primary care, and of those only two studied the use of AI for diagnosis of RA [10]. Primary care providers need reliable RA diagnostic tools, as early diagnosis could reduce the negative impact of the disease. The goal of this study is to develop an interpretable intelligent decision support system based on specialists' knowledge (i.e., rheumatologists and orthopedic surgeons).
We organize the rest of this paper as follows: Section 2.1 reviews previous studies on the diagnosis of RA; Sections 2.2-2.4 briefly describe the methods we used and the previous work based on them; Section 3.1 gives details about the dataset used in our study; Section 3.2 describes the proposed algorithm; Section 4 presents the results of the proposed algorithm and its comparison with other algorithms; Section 5 highlights the limitations of this study; and Section 6 concludes the paper.

Previous Works on Diagnosis of RA
In previous work [11], we designed an RA diagnosis decision support system by training a 10-node fully connected Fuzzy Cognitive Map (FCM) with a particle swarm optimization (PSO) algorithm. Morita et al. [12] proposed a finger joint detection method for RA diagnosis using X-ray images of 45 Japanese RA patients and support vector machines (SVM). Singh et al. [13] used human knowledge as rules for a fuzzy logic controller (FLC) for diagnosis of RA, and Montejo et al. [14] extracted 594 features from optical tomography images and classified images of RA patients using five different classifiers.
Despite these attempts, improvements are still needed in this area. (a) The previous works introduced fully connected networks. Such models have a high number of parameters, so the model can memorize the samples on which it is trained. The added complexity increases the chance of overfitting [15] and reduces both the explainability of the network and its suitability for static analysis. (b) Previous works used simple objective functions, such as classification accuracy, in their classification process. Generalizability is low when dealing with small datasets, such as those used in the above-mentioned works. Additionally, accuracy might not be the best metric when the classes in the training data are imbalanced. It is therefore important to define an appropriate objective function. To overcome these limitations, in this study we propose a novel method based on FCM and a quantum learning algorithm [16] to classify the severity of RA into six classes in a way that is more interpretable and generalizable. The outcome of interest is detection of RA patients at early stages.

Fuzzy Cognitive Maps
Fuzzy Cognitive Maps (FCMs) were developed by Kosko [17] and are based on cognitive map theory [18]. Using causal models, they attempt to mimic human experts' cognitive processes in specific domains. An FCM models a system through a number of concepts and the causal relationships between them, and can be represented as a directed graph [19]. An FCM includes $n$ concepts whose values can be written as in Equation (1).
$$C = [C_1, C_2, \ldots, C_n] \qquad (1)$$
where $C$ is a state vector and $C_i \in [0, 1]$ represents the value of the ith concept. As the value of a concept approaches 1, its activation degree increases. The causal relationships between concepts can be stated in terms of a weight matrix, shown in Equation (2).
$$W = \begin{bmatrix} w_{11} & \cdots & w_{1n} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nn} \end{bmatrix} \qquad (2)$$
where $w_{ij} \in [-1, +1]$ is the weight of the link from the ith to the jth concept. When $w_{ij}$ is positive, the ith concept has a positive impact on the jth concept. In other words, any increase in the ith concept results in an increase in the jth concept. The ith concept has a negative impact on the jth concept when $w_{ij}$ is negative. When $w_{ij} = 0$, there is no causal relationship between the ith and jth concepts [19]. If A is causally related to B, it does not necessarily mean that B is causally related to A as well. Thus, $w_{ij}$ does not need to be equal to $w_{ji}$; in other words, the weight matrix does not need to be symmetric. Figure 1 shows a 4-node FCM with its associated weight matrix. The value of the ith node at iteration $t + 1$ can be calculated from the weight matrix and the values of the concepts at iteration $t$ using Equation (3):
$$C_i(t+1) = \Psi\Big(\sum_{j=1}^{n} w_{ji}\, C_j(t)\Big) \qquad (3)$$
where $\Psi(x)$ is a transfer function that limits the concept values to the desired range. Based on the conducted experiments [20], sigmoid transfer functions outperform other types of transfer functions; hence, we used this function, as stated in Equation (4).
$$\Psi(x) = \frac{1}{1 + e^{-\lambda x}} \qquad (4)$$
where $\lambda$ is a free parameter that determines the slope of the function. A typical value of $\lambda$ is 5 [21]. Equation (3) can be written as a matrix multiplication, Equation (5), where $A(t)^T$ represents the transpose of matrix $A$ at the tth iteration. Equation (5) illustrates that, in every iteration, an FCM computes a linear combination of the row vectors of the weight matrix, each weighted by the coefficient $C_i$, and applies a transformation to keep the values in the desired range. Owing to the use of a continuous transfer function, an FCM simulation can reach one of the following three cases [22]: (1) "Fixed point attractor", where after a limited number of iterations all concepts converge to a fixed pattern; (2) "Limit cycle", where after a number of iterations all concepts fluctuate between a limited number of fixed patterns; and (3) "Chaotic attractor", where concepts fluctuate between an unlimited number of patterns. Figure 2 shows a fixed-point attractor simulation for the FCM shown in Figure 1.
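The update rule described above can be illustrated with a minimal Python sketch. It assumes the standard FCM form in which concept j at iteration t + 1 is the sigmoid of the weighted sum of all concepts at iteration t. The 3-node weight matrix, the convergence tolerance, and the function names are hypothetical and serve only as an illustration; they are not taken from Figure 1 or Figure 2.

```python
import math

def sigmoid(x, lam=5.0):
    """Sigmoid transfer function with slope lam (5 is the typical value cited in the text)."""
    return 1.0 / (1.0 + math.exp(-lam * x))

def fcm_step(state, weights, lam=5.0):
    """One FCM iteration: concept j receives sum_i weights[i][j] * state[i]."""
    n = len(state)
    return [sigmoid(sum(weights[i][j] * state[i] for i in range(n)), lam)
            for j in range(n)]

def simulate(state, weights, lam=5.0, max_iter=100, tol=1e-6):
    """Iterate until successive states differ by less than tol or max_iter is reached."""
    history = [list(state)]
    for _ in range(max_iter):
        nxt = fcm_step(history[-1], weights, lam)
        converged = max(abs(a - b) for a, b in zip(nxt, history[-1])) < tol
        history.append(nxt)
        if converged:
            break
    return history

# Hypothetical 3-node map; W[i][j] is the influence of concept i on concept j.
W = [[0.0, 0.6, 0.0],
     [0.0, 0.0, -0.4],
     [0.5, 0.0, 0.0]]
trajectory = simulate([0.2, 0.7, 0.5], W)
final_state = trajectory[-1]
```

If the trajectory stops changing, the map has reached a fixed-point attractor; a limit cycle or chaotic behavior would instead show up as a trajectory that uses all `max_iter` iterations without settling.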

Particle Swarm Optimization
Kennedy and Eberhart [23] introduced Particle Swarm Optimization (PSO) based on behavior observed in nature. It is one of the most popular optimization algorithms and is used in various fields, such as finance [24], chemistry [25], and medicine [26]. PSO is a population-based search algorithm in which the particles comprising the population move through a multi-dimensional space to find the position that optimizes an objective function. Based on the values returned by the objective function at each iteration, gbest is the position with the best value found by any particle over all iterations, and $pbest_i$ is the position with the best value found by the ith particle over all iterations.
The position of the ith particle in a d-dimensional search space moves towards a position between gbest and $pbest_i$, guided by its velocity $v_i$, which is also a d-dimensional vector. The update equations are given in Equations (6) and (7).
where $\omega$ is a number chosen in the range [0.1, 0.5] and $c_1$, $c_2$ are two numbers in the range [1.5, 2]. The values are chosen such that the PSO algorithm balances exploration and exploitation. Too much exploration prevents the particles from converging to an optimum, whereas too much exploitation causes the particles to become stuck in a local optimum because they cannot explore most of the search space.
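As an illustration of the velocity and position updates, the following Python sketch implements the canonical PSO form (inertia term plus attraction towards the personal and global best positions). It is an assumption that Equations (6) and (7) follow this canonical form. The sphere objective, swarm size, iteration count, and function names are hypothetical; the values of ω, c1, and c2 are taken from the ranges stated above.

```python
import random

def pso_minimize(objective, dim, n_particles=20, n_iter=200,
                 omega=0.4, c1=1.5, c2=1.5, bounds=(-1.0, 1.0), seed=0):
    """Canonical PSO: each particle is pulled towards its own best and the global best."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive pull + social pull.
                vel[i][d] = (omega * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Position update.
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_pos, best_val = pso_minimize(sphere, dim=3)
```

A larger ω keeps particles moving along their previous direction (more exploration), while larger c1 and c2 pull them more strongly towards known good positions (more exploitation).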

The QFCM Algorithm
Fuzzy cognitive maps can be analyzed in two different ways: dynamically and statically. In dynamic analysis, what matters are the values obtained from an FCM simulation and the discrepancies between them and the test pattern. In static analysis, what matters is the presence or absence of weights. In contrast to conventional neural networks such as the multilayer perceptron (MLP), a non-zero weight in an FCM represents a causal relationship between concepts.
Designing algorithms that can form an FCM suitable for both dynamic and static analysis is not an easy task, and even conventional algorithms such as Non-linear Hebbian Learning (NHL) [27] are not able to do so. Recently, we proposed the QFCM algorithm [16] to tackle this problem. It outperformed some other newly developed algorithms such as dMAGA [28]. The foundation of the QFCM algorithm is that it models the existence of a weight as a Q-bit, the unit of information in the quantum evolutionary algorithm (QEA) [29], and models the value of a weight as a particle, the unit of information in the PSO algorithm. Equation (8) shows a simple Q-bit.
In Equation (8), $\alpha_i^2$ and $\beta_i^2$ denote the probabilities that $Q_i$ exists (i.e., one state) and does not exist (i.e., zero state), respectively. We combine QEA and PSO such that FCMs trained by QFCM not only contain the causal relationships between the components but can also be analyzed dynamically or statically. One limitation of QFCM is that it was developed for time series prediction and is therefore not directly applicable to classification problems. In this study, we overcame this limitation.
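To make the role of the Q-bits concrete, the sketch below represents each Q-bit as a pair (α, β) with α² + β² = 1 and collapses it to 1 (link exists) with probability α². The comparison against a uniform random number and the initialization with α = 1/√2 are assumptions based on common practice in quantum-inspired evolutionary algorithms, not a reproduction of the equations in this paper; the function names and the 4-node size are hypothetical.

```python
import math
import random

def make_qbit(alpha=1 / math.sqrt(2)):
    """Return (alpha, beta) satisfying alpha^2 + beta^2 = 1."""
    beta = math.sqrt(1.0 - alpha * alpha)
    return alpha, beta

def observe(qbits, rng):
    """Collapse each Q-bit to 1 (link exists) with probability alpha^2, otherwise 0."""
    return [1 if rng.random() < alpha * alpha else 0 for alpha, _ in qbits]

n = 4                                          # hypothetical number of concepts
rng = random.Random(42)
qbits = [make_qbit() for _ in range(n * n)]    # one Q-bit per candidate weight
mask = observe(qbits, rng)                     # flattened n x n structure of 0s and 1s
density = sum(mask) / len(mask)                # fraction of links that exist
```

Multiplying such a 0/1 mask element-wise with the weight values carried by a particle would yield a sparse weight matrix, which is what allows the map to be analyzed statically.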

Materials and Methods
In this paper, we report the development and validation of our method according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guideline. The TRIPOD guideline supports authors in reporting the development and validation of their prediction models and helps readers to critically appraise the different sections of the report [30]. See Appendix A for the TRIPOD checklist.

Dataset
To develop our web-based decision support system (DSS), we used a dataset with the information of 13 anonymous patients with RA who were randomly chosen from Shohada University Hospital in 2016. Table 1 shows the features that are used in the study along with the justification for their use. Table 2 shows some samples from the dataset and their associated severity or class label. We included all adults diagnosed with RA. The dataset has been used for training and validating. A subset of this dataset had been used for regression [11]. As with all artificial intelligence (AI) and machine learning (ML) empowered systems, the output of our DSS is highly related to the data with which it has been developed (input data). Given the complex and ambiguous nature of patient data, including clinical judgements, healthcare professionals may find it easier to express these data using linguistic variables rather than numerical ones [31]. In AI, fuzzy logic can help deal with these ambiguous, subjective, and imprecise judgments. Therefore, with the physicians, we chose six fuzzy variables with Gaussian membership functions (Extremely Severe, Very Severe, Severe, Minor, Very Minor, Extremely Minor) to describe the RA diagnostic criteria. The criteria and justifications for their selection are provided in Table 1. For further discussion regarding the selection of these criteria, refer to [11].

C1: Rest pain
Pain is one of the most common symptoms in patients with RA. While it is assumed to be interlinked with inflammation, in many cases the pain persists despite controlling the inflammation [32,33].

C2: Morning stiffness
This symptom is common among patients with RA. Clinical trials have shown that the duration of this symptom is associated with reduced quality of life [34].

C3: Symmetry of joint infection
Symmetrical joint involvement is a hallmark of RA. Patients usually have several infections in their joints [35].

C4: Redness
Due to inflammation, joints may become red and warm in comparison to the surrounding tissue [35].

Additionally, based on health professionals' opinions, we assigned six different severity levels to the patients with RA so that they can also help with a more subjective understanding. The level of each condition was recorded for each patient, and there were no missing data in our dataset. Some of the selected data from the initial hospital dataset are shown in Table 2. $C_i$ refers to the ith criterion defined in Table 1.

Proposed Method
Our proposed method trains an FCM with our QFCM algorithm [16], modified for classification problems and equipped with a new objective function. The modified version of the QFCM algorithm is a supervised learning method, presented in Algorithm 1.

3: for Q = [Q_1 Q_2 ... Q_{n^2}] do
4:     observe Q to produce a sparse network.
5:     update velocity and position of the particles.
6:     mutate particles.
7:     repair particles.
8:     classify the RA patients' data by using output concept's value and output fuzzy sets.
9:     calculate the value of the objective function.
10:    update best local and best global particles.
11: end for
12: update all Qs with H gate.
13: update the best quantum candidate.
14: if migration period reached then
15:     perform local as well as global migration.

[...] .05] is omitted because it cannot represent a causal relationship in an FCM [38]. In the observation process, either 1 (i.e., existence of a link) or 0 (i.e., absence of a link) is assigned to each Q-bit, based on Equation (9).
In Equation (9), $r_i$ is a uniformly distributed random number in the range [0, 1]. In the next step, the positions and velocities of the particles are updated according to Equations (10) and (11), which were proposed in [39] as a modified version of the PSO algorithm.
where $p_i(t)$ and $v_i(t)$ represent the position and velocity of the ith particle at the tth iteration; $\omega$, $c_1$, and $c_2$ are three random numbers in the ranges [0.1, 0.5], [1.5, 2], and [1.5, 2], respectively; and $lbest_i$ and gbest denote the best positions of the ith particle and of all particles, respectively. In step 6 of the QFCM algorithm, mutation occurs: elements from the latter half of each particle are sampled with probability $\mu$ and replaced with a random number in the range [−1, 1]. In the repair step (i.e., step 7), the values of all particles are confined to the range [−1, +1] using Equation (12). It is worth noting that if $p_i$ is in the range (+1, +∞) or (−∞, −1), the velocity of the ith particle is multiplied by −1 to reverse the search direction, given that saturation has occurred in the initial direction. This ensures that the search algorithm does not explore areas outside the search space.
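The mutation and repair steps described above can be sketched in Python as follows. The sketch assumes that Equation (12) is a simple clipping to [−1, +1] and that the velocity reversal is applied per component; both are assumptions, and the function names and example vectors are hypothetical.

```python
import random

def mutate(particle, mu, rng):
    """With probability mu, replace each element in the latter half by a uniform draw from [-1, 1]."""
    half = len(particle) // 2
    return [rng.uniform(-1.0, 1.0) if i >= half and rng.random() < mu else v
            for i, v in enumerate(particle)]

def repair(position, velocity):
    """Clip positions to [-1, 1] and reverse the velocity of every component that left the range."""
    new_pos, new_vel = [], []
    for p, v in zip(position, velocity):
        if p > 1.0 or p < -1.0:
            v = -v                          # reverse the search direction after saturation
            p = max(-1.0, min(1.0, p))      # confine the value to the search space
        new_pos.append(p)
        new_vel.append(v)
    return new_pos, new_vel

rng = random.Random(7)
mutated = mutate([0.1, 0.2, 0.3, 0.4], mu=0.25, rng=rng)
repaired_pos, repaired_vel = repair([1.3, -0.2, -1.7], [0.5, 0.1, -0.4])
```

In this example the first and third components leave the range, so they are clipped to +1 and −1 and their velocities change sign, while the second component is left untouched.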
In the classification step (step 8), all of the training samples are assigned to one of the six classes representing the severity of RA. To this end, the value of the FCM's output concept is calculated for a given sample, as is the membership degree of this value in each of the six fuzzy sets (Figure 3). A sample is assigned to the class in which its membership degree is highest. The centers and widths of the membership functions are design parameters. After the classification step, the value of the objective function, proposed in this study in the context of FCMs, is calculated by Equation (13).
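The classification rule can be illustrated with a short Python sketch that evaluates six Gaussian membership functions and returns the class with the highest membership degree. The evenly spaced centers, the common width, and the ordering of the labels along the output axis are hypothetical design choices made for this illustration; they are not the parameters behind Figure 3.

```python
import math

LABELS = ["Extremely Minor", "Very Minor", "Minor",
          "Severe", "Very Severe", "Extremely Severe"]
CENTERS = [i / 5 for i in range(6)]   # hypothetical centers: 0.0, 0.2, ..., 1.0
WIDTH = 0.08                          # hypothetical common standard deviation

def gaussian(x, center, width):
    """Gaussian membership degree of x in the fuzzy set centered at center."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def classify(output_value):
    """Return the label with the highest membership degree and all six degrees."""
    degrees = [gaussian(output_value, c, WIDTH) for c in CENTERS]
    best = max(range(len(degrees)), key=degrees.__getitem__)
    return LABELS[best], degrees

label, degrees = classify(0.43)
```

With equal widths, the class boundaries fall halfway between neighboring centers, which is where two adjacent membership functions intersect.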
In Equation (13), "#misclassified" is the total number of misclassified samples, "#samples" is the total number of samples in the training set, $x_i$ is the value of the FCM output concept for the ith sample in the training set, $b_i^1$ and $b_i^2$ are the two borders (i.e., intersections of fuzzy membership functions) nearest to $x_i$, and NF is the normalization factor defined in Equation (14). NF limits the second term of the objective function to the range [0, 1].
In Equation (14), $L_b$ represents the distance between the two successive borders that are farthest apart. As indicated by Equation (13), the objective function is designed so that, apart from the classification accuracy, it considers the distance of the training samples from the borders. The global minimum of the second term occurs when the FCM maps every training sample exactly to the midpoint between two successive borders, placing it at the farthest possible distance from the borders. Therefore, following the maximum-margin principle of SVMs [40], the model is more likely to generalize. In step 10 of the QFCM algorithm, the best local and best global particles within the quantum population are saved. Subsequently, in step 12, the Q-bits are updated using the H gate [41], which is defined in Equation (15).
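The idea of combining the misclassification rate with a border-distance term can be sketched as follows. This is one hypothetical form that is merely consistent with the verbal description (a second term in [0, 1] that vanishes when every sample sits midway between its two nearest borders); it is not a reproduction of Equations (13) and (14). The penalty definition, the normalization, the inclusion of 0 and 1 as outer borders, and all names are assumptions.

```python
import bisect

def margin_objective(outputs, predicted, actual, borders):
    """Misclassification rate plus a normalized distance-from-midpoint penalty (hypothetical form)."""
    n = len(outputs)
    misclassified = sum(p != a for p, a in zip(predicted, actual))
    longest = max(b - a for a, b in zip(borders, borders[1:]))
    nf = n * longest / 2.0                      # assumed normalization factor
    penalty = 0.0
    for x in outputs:
        # Locate the pair of successive borders that bracket x.
        k = min(max(bisect.bisect_right(borders, x), 1), len(borders) - 1)
        b1, b2 = borders[k - 1], borders[k]
        penalty += abs(x - (b1 + b2) / 2.0)     # zero when x is at the midpoint
    return misclassified / n + penalty / nf

# Hypothetical borders: range ends plus intersections of six evenly spaced fuzzy sets.
BORDERS = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]
perfect = margin_objective([0.2, 0.4], [1, 2], [1, 2], BORDERS)
one_error = margin_objective([0.2, 0.4], [1, 3], [1, 2], BORDERS)
```

Under this form, two candidate maps with the same number of misclassified samples are ranked by how far they keep the samples from the class borders, which is the margin-like behavior the text attributes to the objective function.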
In Equation (15), Rotate($Q_i$) indicates the rotation of the Q-bit; the amount of rotation is a design parameter with a typical value of 0.01. In step 15, local and global migration is performed as a mechanism for avoiding local optima: the values of the best quantum candidate are copied to other candidates locally or globally.
Fengmao et al. [42] showed that after several iterations, the Q-bit converges to either condition 1 or condition 3 of Equation (15). In a previous study [16], we proved that after convergence it is difficult for the algorithm to escape from the optimum into which it has converged. Since the present work extends that algorithm to classification, the same reasoning applies: after several iterations, there is a low probability of escaping from a local optimum.
The new objective function defined in Equation (13) uses the predicted labels and the true labels to assign a value to each position. The modified QFCM algorithm is a supervised learning algorithm that classifies the severity of RA in a patient. For a new patient, the patient's attributes are taken as the input concept values and the last attribute (the output concept) is initialized to $1/\sqrt{2}$. The attributes for the next iteration are obtained using Equation (5). The last attribute of the updated vector is then mapped onto the fuzzy membership functions shown in Figure 3 to classify the patient into one of the categories.
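The inference procedure for a new patient can be sketched in Python as below. The sketch assumes a sigmoid FCM update in which the output concept receives the weighted sum of all concept values; the 10 x 10 weight matrix, its non-zero entries, and the feature values are hypothetical and are not the trained weights of this study.

```python
import math

def sigmoid(x, lam=5.0):
    """Sigmoid transfer function with slope lam."""
    return 1.0 / (1.0 + math.exp(-lam * x))

def infer_output(features, weights, lam=5.0):
    """Append the output concept (initialized to 1/sqrt(2)), apply one update, return its new value."""
    state = list(features) + [1.0 / math.sqrt(2.0)]
    n = len(state)
    out = n - 1
    return sigmoid(sum(weights[i][out] * state[i] for i in range(n)), lam)

# Hypothetical weights: only a few criteria influence the output concept (last column).
W = [[0.0] * 10 for _ in range(10)]
W[0][9] = 0.6     # hypothetical influence of the first criterion
W[6][9] = 0.3     # hypothetical influence of the seventh criterion
W[8][9] = -0.2    # hypothetical influence of the ninth criterion

patient = [0.8, 0.1, 0.4, 0.0, 0.2, 0.5, 0.7, 0.3, 0.6]   # hypothetical criterion values
output_value = infer_output(patient, W)
```

The returned value lies in (0, 1) and would then be passed to the fuzzy membership functions to obtain the severity class.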

Experimental Results
In this section, we first present the results of our analysis using the proposed method, as well as its comparison with other machine learning methods. Then, we present the contribution of each diagnostic criterion to the results by illustrating the weight matrix obtained from training an FCM with our proposed method. To demonstrate the robustness of the proposed method against different parameter settings, we set the free parameters as shown in Table 3.

Classification Accuracy
We trained a 10-node FCM with one output concept using the data shown in Table 2 and the proposed method. The dataset consists of 13 patients taken randomly from Shohada University Hospital. Considering the sample size, the selection of a reliable evaluation procedure is important. In view of the scarce dataset, we evaluated the method using leave-one-out cross-validation (LOOCV). Tables 4-12 show the accuracies and confusion matrices obtained. Our modified QFCM algorithm (i.e., the proposed method) classified nine of the 13 samples correctly, an accuracy of 69.23%. Among the four misclassified samples, two belong to class 2, one belongs to class 1, and one belongs to class 4. In addition, based on the confusion matrix, in three of the four misclassified samples the predicted severity is higher than the actual severity. In other words, although these samples are misclassified, underdiagnosis of patients with RA is avoided in these cases. In clinical contexts, false negatives are more dangerous than false positives. A model that overestimates severity produces false positives (i.e., overdiagnosis) rather than false negatives (i.e., underdiagnosis), so there is a higher chance that the patient will be asked to see a specialist.
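The evaluation protocol can be illustrated with a generic leave-one-out loop in Python. The nearest-neighbor classifier and the five toy samples are hypothetical stand-ins used only to make the loop runnable; they are neither the QFCM model nor the patient records of Table 2. The last line reproduces the reported accuracy from the counts given in the text.

```python
def leave_one_out_accuracy(samples, labels, fit_predict):
    """Hold out each sample once, train on the rest, and return the fraction predicted correctly."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

def nearest_neighbor(train_x, train_y, query):
    """Toy stand-in classifier: label of the closest training sample (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[dists.index(min(dists))]

# Hypothetical toy data, not the patient data used in the study.
X = [[0.1, 0.2], [0.15, 0.25], [0.8, 0.9], [0.85, 0.8], [0.5, 0.1]]
y = [1, 1, 2, 2, 3]
toy_accuracy = leave_one_out_accuracy(X, y, nearest_neighbor)

# Accuracy reported in the text: 9 of 13 samples classified correctly.
reported_accuracy = round(100 * 9 / 13, 2)
```

With 13 samples, each held-out prediction changes the accuracy by about 7.7 percentage points, which is worth keeping in mind when comparing classifiers on a dataset of this size.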
In order to compare our results with other machine learning methods, we trained and evaluated different classifiers, namely linear discriminant analysis (LDA), linear SVM, quadratic SVM, cubic SVM, fine K-nearest neighbors (KNN), and weighted KNN, using LOOCV and the same dataset (Table 2). To find the highest achievable accuracy, we also reduced the number of features and reran the experiments. Since this reduces the search space, methods such as KNN are expected to perform better. However, based on domain knowledge (i.e., the clinical literature and our collaborating specialists' opinion), the features removed to increase accuracy are clinically important. Tables 4-12 present the results. The two models with fewer features were evaluated to see whether reducing the features would improve the accuracy. In one case it does, but at the cost of losing important clinical features that need to be considered in this clinical context. Among the remaining classifiers, LDA performed best with an accuracy of 53.8%, which is 15.4 percentage points lower than that of our QFCM (i.e., 69.23%).
Moreover, unlike our proposed method, LDA underestimates the severity of RA, which may result in misdiagnosis or underdiagnosis. Figure 4 presents a coweb [43] graphical representation of our proposed method and LDA to compare the two methods visually. It illustrates that the area under the curve for LDA is larger than that for the proposed method, reflecting its lower accuracy. We also randomly split the data into 5 folds and trained QFCM, LDA, linear SVM, quadratic SVM, cubic SVM, fine KNN, and weighted KNN on the data, saving the accuracies obtained. We repeated this procedure 10 times with random splits to obtain enough accuracy values to run a statistical test, and then applied a t-test. Table 13 shows the p-values obtained. The results indicate that, in all scenarios, there is a significant difference between the accuracies obtained by the proposed method (i.e., QFCM) and those of the other methods, suggesting that QFCM can significantly outperform the other methods in terms of accuracy.

Table 5. Linear discriminant analysis (LDA). Accuracy: 53.8%.

Weight Matrix of the FCM and Its Interpretability
Using our dataset (Table 2), we trained an FCM with the weight matrix shown in Equation (16). The density of this FCM is 50%, meaning that half of the 100 weights are zero. The first nine columns represent the nine criteria in the order presented in Table 1. Furthermore, an extra node has been added that is connected to all other nodes. This 10th node is used to determine the contribution of the other nodes to detecting the disease. The 10th column of the matrix shows the impact of each feature on the output concept. None of the weights associated with the RA diagnostic tests (i.e., C7, C8, C9) is 0, demonstrating the importance of these tests relative to the physical symptoms of RA.
Among the physical symptoms chosen as diagnostic criteria, rest pain made the most important contribution to the output, whereas the weights of morning stiffness, redness, and body pain were zero. Among the lab tests, ESR had a greater impact on the output. Regarding anti-CCP and RF, our QFCM algorithm assigned a larger weight to anti-CCP, indicating that it contributes more to the output than RF, which is compatible with a clinical study conducted on over 1025 patients [44].
Using Equation (16), the interactions between the criteria can be investigated. Weights with values close to 1 or −1 indicate strong relationships. For example, referring to the first column from the left and ignoring the self-feedback loop, our results indicate that ESR (i.e., C9) is the criterion most strongly related to rest pain (i.e., C1) and symmetry of joint infection (i.e., C3); likewise, according to the 5th column from the left, body pain and redness (i.e., C5 and C4) are interlinked.

Web Based DSS
Our DSS is freely available for academic purposes and can be accessed from the GitHub page https://github.com/rahimi-s-lab/RA-paper (accessed on 1 December 2021) or https://rahimislab.ca/ra-dss (accessed on 1 December 2021). It is coded in the Hypertext Preprocessor (PHP) language to make it easy to use. To calculate the severity of RA for a patient, the user enters the input data for each of the nine diagnostic criteria (Figure 5). The DSS performs all calculations based on the method proposed in this study and immediately displays the patient's severity of RA along with interpretations of the results (Figure 6). The DSS also contains information on RA and its symptoms.

Limitations
First, while we acknowledge the small sample size in our study, we believe this study is a good illustration of how hybrid interpretable AI methods can be developed with the support of domain knowledge for early detection of diseases using small datasets. Second, the developed web-based decision support system has not been tested or validated with different users. Future studies are needed to evaluate end users' perspectives on the developed tool.

Conclusions
Primary care providers are responsible for identifying patients with RA and referring them to a specialist. However, the diagnosis of RA is complex and, in many cases, early diagnosis by primary care providers is not an easy task because of the non-specific nature of the symptoms and clinical indicators. The aim of this study was to: (1) contribute to the existing methodology in the field by overcoming the current limitations, and (2) develop a web-based Decision Support System to aid primary care providers in early diagnosis of patients with RA. We developed this system based on a well-known soft computing method, Fuzzy Cognitive Maps, and a modified quantum learning algorithm. To develop the algorithm for this system, we consulted two specialists (i.e., a rheumatologist and an orthopedic surgeon) and integrated their knowledge into our model. We evaluated the accuracy of the proposed method and compared it with that of other machine learning methods such as LDA, quadratic SVM, and weighted KNN, which had accuracies of 53.8%, 46.2%, and 46.2%, respectively. Our proposed hybrid method obtained the highest accuracy when all the features of interest were considered and outperformed the other machine learning methods. Apart from its higher accuracy, one of the strengths of our proposed hybrid method is its interpretability: from the generated FCM weight matrix, one can see how the different features are related to each other and contribute to the final output. For future work, more investigation is required to evaluate the developed method and web-based decision support system at larger scale, adapt it to other clinical contexts, and link the knowledge obtained from the interpretability of the network with human knowledge.