Moderate-High Disease Activity in Patients with Recent-Onset Psoriatic Arthritis—Multivariable Prediction Model Based on Machine Learning

The aim was to identify patient- and disease-related characteristics predicting moderate-to-high disease activity in recent-onset psoriatic arthritis (PsA). We performed a multicenter observational prospective study (2-year follow-up, regular annual visits) in patients aged ≥18 years who fulfilled the CASPAR criteria and had less than 2 years since the onset of symptoms. The moderate-to-high activity of PsA was defined as DAPSA > 14. We trained a logistic regression model and random forest–type and XGBoost machine learning algorithms to analyze the association between the outcome measure and the variables selected in the bivariate analysis. The sample comprised 158 patients. At the first follow-up visit, 20.8% of the patients who attended the clinic had a moderate-to-severe disease. This percentage rose to 21.2% on the second visit. The variables predicting moderate-high activity were the PsAID score, tender joint count, level of physical activity, and sex. The mean values of the measures of validity of the machine learning algorithms were all high, especially sensitivity (98%; 95% CI: 86.89–100.00). PsAID was the most important variable in the prediction algorithms, reinforcing the convenience of its inclusion in daily clinical practice. Strategies that focus on the needs of women with PsA should be considered.


Introduction
Psoriatic arthritis (PsA) is a chronic inflammatory disease included in the group of spondyloarthritis, although the multifaceted nature of this entity has led it to be addressed in a differentiated and specific manner [1]. According to recent data, the prevalence of PsA in adults is approximately 0.6%, thus making this disease one of the most common in rheumatology clinics [2].
Evaluation of the activity of PsA has been mostly directed towards developing monodimensional instruments, such as the DAPSA (Disease Activity Index for Psoriatic Arthritis), which is a simple sum of the number of painful joints, the number of swollen joints, the pain, and the general estimate that the patient makes about his or her disease, to which the C-reactive protein value can be added (optional). However, given the multifaceted profile of the disease, multidimensional instruments have also been developed to ensure a broader assessment of patients [3]. The latter is focused on capturing the multiple facets of the disease (skin, joints, physical function, quality of life, etc.) as opposed to the onedimensional instruments designed only to assess the musculoskeletal component. The European League Against Rheumatism (EULAR) favors the former in its recommendations on the management of PsA, whereas the Group for Research and Assessment of Psoriasis and Psoriatic Arthritis clearly favors the latter [3,4].
In any case, an area of constant doubt requiring further research is that of identifying which characteristics of PsA can predict greater disease activity and/or severity. Traditional recommendations by EULAR on the management of PsA state the factors indicative of a poor prognosis to be female sex, elevated acute phase reactant values, dactylitis, early structural damage, and a high number of joints involved [4]. Nevertheless, it is difficult to discern whether these and other factors reported in the literature refer more to disease activity itself (inflammatory burden), cumulative damage (functional burden), or the overall impact of the disease, which brings together the previous factors. Indeed, the concept of severity should not be made homonymous with that of inflammatory load. The severity of the disease brings together not only the inflammatory load of the disease but also physical disability, structural damage, and the impact on quality of life, among others. The factor that perhaps weighs the most in this equation, especially at the beginning of the disease, is inflammatory activity, but with the passage of time, this loses its predictive capacity, and there are other aspects (structural damage, comorbidities, deterioration in physical function, etc.) more clearly associated with greater severity [3][4][5][6][7]. In addition, more than one study has shown how the outcome measures on disease activity and those on the negative impact on QoL are clearly discordant, which suggests that both aspects measure different things within the overall health of patients with PsA [3][4][5][6][7].
Therefore, being able to predict which aspects of PsA are associated with increased disease activity is not only essential for better management of the disease but also an unresolved issue. From a clinical practice viewpoint, the need for information on predictors of greater disease activity is immediate, given that a suitable approach to inflammatory activity could limit or prevent structural damage.
The objective of the present study was to evaluate which patient-and disease-related characteristics predict moderate-to-high disease activity in patients with recent-onset PsA during the first years of follow-up.

Materials and Methods
This work is part of the REAPSER study, a multicenter observational prospective study with a 2-year follow-up and regular annual visits, promoted by the Spanish Society of Rheumatology. The design of REAPSER has been described in detail elsewhere [5][6][7][8]. Patients of both sexes, aged ≥18 years, who met the Classification Criteria for Psoriatic Arthritis (CASPAR) [9] and had less than 2 years since the onset of symptoms attributable to the disease were included.
A baseline visit evaluated the patient's situation before disease progress was modified by the treatments prescribed by the rheumatologist. According to this, the time from treatment initiation to the baseline visit could not be longer than 3 weeks for methotrexate, leflunomide, or apremilast, and patients could not be on biologic disease-modifying antirheumatic drugs (DMARDs). These time gaps were defined according to the mean time between treatment initiation and the onset of the response (4 weeks for synthetic DMARDs and 1 week for biologic DMARDs). There were 9 patients with more than 3 weeks on synthetic DMARDs but less than 2 months; the investigating rheumatologists confirmed that these patients had not responded to treatment at the baseline visit.
Psoriatic patients treated with synthetic or biologic DMARDs that were referred to the rheumatologist because of recent onset PsA could be included in the study since this would not go against the criterion explained in the previous paragraph.
Patients were invited to participate consecutively when visiting the rheumatologist. Baseline visits took place between November 2014 and October 2016. 25 centers covering a wide area of Spain participated in the study.
Written informed consent was obtained from all participants. The study complies with the Declaration of Helsinki and was approved by the Clinical Research Ethics Committee of the Principality of Asturias (study number 14/2014).

Variables and Measurement
Variables included in REAPSER have been previously described [8]. For this work, we considered 43 variables: (a) Sociodemographic data: age; sex; educational level (none, primary, secondary, university). (b) Family history of PsA, other types of inflammatory arthritis, and psoriasis. (c) Comorbidities (based on a review of medical records): age-adjusted Charlson comorbidity index [10], cardiovascular risk factors (arterial hypertension, hyperlipidemia, diabetes mellitus (insulin and non-insulin-dependent)). (d) Anthropometric data: body mass index (BMI). (e) Lifestyle: smoking, alcohol consumption [11], and physical activity (low, moderate, and high) [12]. (f) The clinical situation at diagnosis of PsA: year of presentation of symptoms; clinical form (axial, peripheral, mixed); articular pattern (oligoarticular, polyarticular, distal, mutilans, spondylitis); the presence of dactylitis (yes/no). (g) Joint involvement and enthesitis: number of tender joints (NTJ68); the number of swollen joints (NSJ66); an extended version of the Maastricht Ankylosing Spondylitis Enthesitis Score (MASES) [13]. Polyarthritis was defined as NSJ66 ≥5. (h) Pain and global assessment of disease during the previous week: Global pain on a scale from 0 (no pain) to 10 (very intense); patient global assessment of disease (from 0 -feels very well-to 10 -feels very ill-); physician global assessment of disease (from 0 -minimal activity-to 10 -maximum activity-). (i) Cutaneous and nail involvement (evaluated by a dermatologist): cutaneous psoriasis (yes/no); year of onset of psoriasis; clinical type; specific locations; treatment of psoriasis at PsA diagnosis. Psoriasis Area and Severity Index (PASI) [14]; onychopathy (number of digits affected). Severe psoriasis was defined as PASI >10 [14]. (j) Functional situation and quality of life: Health Assessment Questionnaire (HAQ) [15], Psoriatic Arthritis Impact of Disease (PsAID) [16]. (k) Radiographic evaluation at baseline: Bath Ankylosing Spondylitis Radiology Index (BASRI) of the sacroiliac region [17], hand involvement according to the modified Steinbrocker method for PsA [18]. (l) Laboratory tests: C-reactive protein (CRP), uric acid, total cholesterol, LDL cholesterol, and triglycerides. For purposes of the analysis, a series of cut-off points were established to define high values: >0.5 mg/dL for standard CRP; >0.3 mg/dL for highsensitivity CRP; hyperuricemia if >7 mg/dL in men and >6 mg/dL in women; ≥200 mg/dL for total cholesterol; ≥100 mg/dL for LDL; ≥150 mg/dL for triglycerides. (m) Treatment of PsA with DMARDs: synthetic DMARDs, biologic DMARDs (the different drugs within these two groups were not considered individually). (n) The moderate-to-high activity of PsA (binary outcome variable) is defined as DAPSA >14 [19].
DAPSA is calculated as the sum of NTJ68, NSJ66, CRP (mg/dL), patient global assessment of disease, and global pain [19].
Rheumatologists assessing the patients did not know the objectives of this specific analysis.

Sample Size
The REAPSER study was a cohort intended to collect a large number of variables without prespecified hypotheses, so a sample size was not previously calculated for this work. The duration of psoriasis was imputed with the median in the remaining patients within the same age range (<41 years, 41-60 years, and >60 years). - Since during data monitoring we observed that cases in which systemic treatment of psoriasis was not available were patients with no treatment or topical treatment, systemic treatment was imputed with 0 (not receiving it). This imputation affected only two cases. -Radiological involvement of the hands was not imputed, except for those patients with both NTJ28 and NSJ28 values of 0, who were imputed with 0. -For patients who stopped attending the visits due to improvement in their condition, the missing values for the variables PsAID, HAQ, and moderate or severe disease activity were imputed with 0.

Generation of the Dataset
The analysis aimed to estimate predictive ability, seeking associations between the outcome measure (moderate-high activity of PsA) and values at the previous visit for the remaining variables. To do so, the dataset contained data for the independent variables from the baseline visit and from follow-up visit number 1, which were matched with the outcome measure from follow-up visits 1 and 2, respectively. Atemporal variables (such as sex) and those only collected at the baseline visit (e.g., clinical form at diagnosis) were matched with the outcome measure from follow-up visits 1 and 2; therefore, their values are the same for each one.

Bivariate Analysis
To identify informative variables, we used two different methods: significant Spearman correlation according to the threshold applied to the p correlation coefficient (|p| > 2 √ N , with N being the number of data items), and methods based on artificial intelligence, specifically the XGBoost algorithm with the SHAP technique (the twenty most important variables according to this technique were identified; see Appendix A for a detailed explanation of this approach).
After informative variables were identified, we applied the Mann-Whitney test for continuous or discrete variables and the χ 2 test for categorical variables to select those variables associated with the outcome measure (p < 0.05).

Multivariate Analysis
Considering that the dataset contained data from different annual visits for the same subjects, we included the number of visits (the different time points) as a variable in the multivariate analysis, so the association of the other variables with the outcome measure is adjusted for this variable.
To generate models where the independent variables do not share information and have a significant contribution to the model when adjusting for the rest of the variables included, we identified those variables with p < 0.05 in an iterative fashion using logistic regression models based on artificial intelligence. To avoid overfitting, the steps were performed in 75% of the sample (the training dataset, which was generated in a random and balanced manner, i.e., the categories of the outcome measure were equally distributed in that 75% of the sample and the remaining 25%) (see Appendix A for a detailed explanation).
Subsequently, random forest-type and XGBoost machine learning algorithms were used to analyze the association between the outcome measure and the variables selected (see Appendix A for more detail). To train the machine learning models, the sample is randomly split into two subsets: one to train the model and the other to evaluate its functioning. The split is also generated in a balanced way; the categories of the outcome measure are equally distributed in both subsets. When the subsamples are imbalanced, those data whose value for the outcome variable is a minority value are duplicated or triplicated to train the models (oversampling technique).
As different splits might result in different models, k-fold cross-validation was used to reduce this randomness. It consists in splitting the dataset into k subsets of the same size, and iteratively training the algorithm with k-1 of them while testing the model with the one left. After k iterations, the algorithm will have been trained and evaluated with all the partitions. In this analysis, a k-fold cross-validation with k = 5 was performed (models were trained with 80% of the data at each iteration and then evaluated with the remaining 20%). The subsets were the same for the random forest and XGBoost.
The contribution of the variables to the prediction of each iteration of the algorithms was calculated by the feature importance of each variable in the training data. To estimate the validity of the algorithms, we calculated the accuracy, sensitivity, specificity, positive predictive, and negative predictive values as the mean of the values obtained for each parameter in the five evaluations performed in the cross-validation.
Data analysis was performed with Python

Results
The sample comprised 158 patients. The baseline characteristics have been previously published [6]; 59.6% of the patients had moderate-to-high disease activity; their rheumatologists prescribed synthetic DMARDs to 60.8% of the patients and biologic DMARDs to 1.3%.
Thirty-three patients (20.9%) were lost to follow-up. For 10 of these patients, the investigating rheumatologists could confirm that they had not attended the visit because their PsA had improved. 20.8% and 21.2% of the patients who attended the clinic had moderate-high disease activity at the first and second follow-up visits, respectively. Table 1 shows the variables selected in the bivariate analysis.

Multivariate Analysis
The number of observations for this analysis was 319. Table 2 shows the results of the logistic regression. The variables predicting moderatehigh disease activity selected in this analysis were PsAID, tender joint count, level of physical activity, and sex. When the random forest-type and XGBoost machine learning algorithms were trained with these 4 variables, PsAID was the most important variable according to the values of feature importance in all the models (Table 3).  Table 4 shows the mean of the values of accuracy, sensitivity, specificity, positive predictive value, and negative predictive value in the different evaluations performed in the cross-validation.

Discussion
In this multicenter prospective study carried out in patients with recent-onset PsA, assessed at baseline before the potential modification of its natural history because of the treatment prescribed by a rheumatologist, an artificial intelligence-based analysis revealed 4 variables that could predict moderate-to-severe disease activity (according to DAPSA, a one-dimensional tool to assess the inflammatory activity of PsA in clinical routine) on the following annual visit during the first two years after diagnosis: PsAID, tender joint count, level of physical activity, and sex. The mean values of the measures of validity of the machine learning algorithms were all high, especially sensitivity and NPV.
PsAID, a 12-item questionnaire designed to capture the impact of PsA on a patient s daily life, was the most important variable in all the models. It is currently the gold standard for estimating the impact of PsA on a patient s health [20]. This 12-item instrument evaluates various aspects, such as pain, skin, physical function, sleep, psychosocial aspects, the ability to work, and the ability to enjoy leisure activities. PsAID could serve to evaluate not only disease activity but also functioning and aspects that do not necessarily reflect disease activity or inflammatory burden [20]. Therefore, to a certain extent, PsAID could be an all-in-one tool for the evaluation of PsA in clinical practice. Nevertheless, analysis of the metric characteristics of PsAID with respect to other evaluation tools or disease outcomes shows that the degree of agreement ranges from discrete to considerable [21][22][23]. Furthermore, PsAID shows good sensitivity to change in real-world studies on PsA [24]. Similarly, although the debate between the use of mono and multidimensional activity indices continues in PsA, a recent multicenter study showed better agreement between a low-impact PsAID and remission according to DAPSA (k: 0.58) than with very low disease activity (VLDA response) (k: 0.18) [22]. In summary, the finding of PsAID as an independent predictor of higher disease activity as measured by DAPSA in our study seems to indicate that when patients make a global assessment of their state and record this information using DAPSA, they could be already including, albeit indirectly, information on other types of involvement (e.g., axial, enthesis, or skin).
The tender joint count was the second variable in order of importance according to the results of the random forest algorithms. Based on the results of the bivariate and logistic regression analyses, the tender joint count was more closely associated with moderate-high disease activity than polyarthritis. We should remember that many patients with PsA and no obvious arthritis in the physical examination present with subclinical inflammation-both in joints and in entheses-when sensitive imaging techniques such as high-resolution ultrasound and magnetic resonance imaging are used [25]. Inflammatory activity according to ultrasound or magnetic resonance imaging in the joints of patients in clinical remission acts as a predictor of clinical flares in the short term (6 months) and of subsequent structural damage [25]. Therefore, it is possible that some PsA patients with joint pain and no apparent inflammatory activity do, in fact, have some subclinical inflammatory activity that can only be detected using sensitive imaging techniques. This result of our analysis would not be affected by the presence of fibromyalgia, as only 1 patient in the baseline visit and 2 patients in the first follow-up visit were diagnosed with this disease. As a practical corollary to our finding, patients with apparently good control of PsA who develop joint pain should be actively screened for signs of subclinical inflammation using sensitive imaging techniques and start appropriate therapy.
We detected an inverse relationship between physical activity and higher disease activity, according to DAPSA. It is known that a session of moderate physical exercise lasting 20 to 30 min can reduce the number of TNFα-releasing immune cells by 5% [26,27]. Furthermore, though intense and acute physical exercise increases IL-6 levels, IL-6 downregulates several proinflammatory cytokines such as TNFα, helping to improve glucose metabolism and increase insulin sensitization to insulin [26,27]. In summary, it seems that physical exercise has a direct anti-inflammatory effect that contributes to the patient's general well-being and to combating insulin resistance and obesity. The connection between obesity and inflammation is well documented [28]. Therefore, another clearly practical approach is to advise and encourage patients with PsA to adopt cardio-healthy lifestyles, since this could prevent not only the adverse cardiovascular events that are so common in PsA but possibly the development of a more active disease.
The last factor associated with greater disease activity, according to DAPSA, was female sex. Women with spondylarthritis or PsA usually have greater disease activity (as also shown by our results), a higher degree of pain (a component of DAPSA), a higher impact of the disease, and poorer persistence of biologics [29]. The association between sex and disease activity on the following year s visit is observed after adjusting for the tender joint count and PsAID in the logistic regression analysis, pointing out that this association would go beyond different disease activity or impact in the previous visit. Women with PsA perceive greater disease impact and activity, probably because of multifactorial reasons that are beyond the scope of this study; therefore, we should make every effort to design strategies that focus on the needs of women with PsA [30].
In this work, we have described predictors of moderate-high disease activity according to the DAPSA score. We had previously described severity predictors using machine learning algorithms, as in the present work [6]. In any case, it is necessary to highlight that activity and severity are not the same concept. Inflammatory activity can be part of what we understand as disease severity, but the latter concept is much broader and encompasses other aspects of PsA that go beyond the inflammatory load, which is what DAPSA collects. Therefore, we think that the information provided in this study should be understood as complementary to our previous work and not as a circular argument.
The main limitation of this work is its sample size and missing data for some variables. This could have influenced the power of the analysis and, as a consequence, the capacity to identify variables associated with moderate-to-severe disease activity. We tried to counteract this by using models based on artificial intelligence and machine learning. Random forests are "joint" algorithms in which decision trees are trained with different subsets of variables and data. Decision trees are very flexible since they can identify many types of associations between variables. Additionally, random forests add variability, which prevents the model from being overadjusted to the data and can be re-run with new data, increasing the robustness of the predictions. On the other hand, XGBoost algorithms use groups of decision trees in a sequential manner. In each tree, the observations that were wrongly classified in the previous one are given a larger weight, thus defining models with very little bias that commonly generate very accurate predictions. The risk of this phenomenon is a higher probability of the model being overfit to the training subset. Our results showed that the random forest models tended to perform better than XGBoost, which could be a consequence of the reduced number of observations in the dataset, causing the training and test subsets to be quite different. Consequently, we could conclude that for small datasets, an algorithm that overfits less to the training subset, such as random forest, is more suitable.
Due to the relatively small sample size and short follow-up period, independent replication of our results becomes very important.
The main strength of this study is its ability to record the course of PsA from an early phase; the baseline visit reflected the situation of the patients before the evolution was modified by treatment prescribed by the rheumatologist.
Part of the interest of our work is ensuring that there is no confounding effect affecting the associations found between the explanatory variables and moderate-to-high disease activity. According to this, the four final predictors have an association with disease activity once the effect of the rest of the variables initially selected in the bivariate analysis has been taken into account. Moreover, the observed association of each of the four predictors with disease activity in the following annual visit is adjusted for the other three variables. This would reflect an association that would go, at least in part, beyond disease activity at the present moment.

Conclusions
Our artificial intelligence-based analysis revealed that the PsAID score, tender joint count, physical exercise, and female sex could predict greater disease activity according to the DAPSA index during the first years of follow-up with very high sensitivity. Once again, PsAID has shown to be important in the evaluation of PsA, reinforcing the convenience of its inclusion in daily clinical practice. Strategies that focus on the needs of women with PsA should be considered.  Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A
1. Selection of variables using logistic regression models based on artificial intelligence: The steps were performed in the 75% of the sample (training dataset) as follows (stepwise backward regression): -A logistic regression model based on artificial intelligence was trained with all the variables selected in the bivariate analysis, and the p values of the Wald tests were obtained for each variable (the null hypothesis of this test is that the coefficient of the variable is zero, that is, it is not associated with the outcome measure). - The variable with the highest p value (Wald test) was withdrawn. -An additional logistic regression model was trained with the rest of variables. -These steps were repeated until no variable passed the Wald test (p < 0.05).

Random forest:
Random forests are extensions of decision trees. A decision tree model makes predictions by navigating nodes until a leaf node is reached. In order to construct a decision pathway, we must select the feature that best separates data items into 2 groups and construct an "X ≤ y" type condition. If a data item fulfills the condition when navigating through the tree, it takes the left branch; if it does not, it takes the right branch.
Random forests are supervised algorithms. A random forest involves training a set of decision trees in which each tree votes on the classification of data. The final decision in this model is that voted for by most trees. For the decision trees in the set to be different, each tree is trained with different subsets of data and different groups of variables.
Once a random forest is trained, the importance that the model gives to each variable can be obtained based on the number of times the model uses a variable to establish a condition of the type "X ≤ y" normalized by the number of data that those conditions separate.
3. XGBoost: XGBoost is another supervised decision tree-based algorithm. Unlike the previous approach, which used randomization to improve individual models, models based on boosting use the predictions of each individual tree to improve the results of the following individual tree. Instead of training all the trees in parallel and predicting based on a vote, XGBoost links the results of a tree for sequential training of the following tree.
XGBoost differs from other decision tree boosting algorithms in that, for training, it uses the gradient descent technique to minimize error, regularization (limiting the number of trees and their depth) to ensure that the fit between the model and the training subset is not too close, and other optimization parameters to improve performance.
Gradient descent is an optimization algorithm. Optimization algorithms are used to minimize functions (i.e., to find parameters in the model that minimize error between predictions and real data). Gradient descent progressively evaluates the function in an iterative fashion; it takes steps in the opposite direction to that of the gradient of the function at each point, since this is the steepest descent, until a minimum is found. 4. SHAP (SHapley Additive exPlanations): SHAP method is a technique for explaining the output of artificial intelligence models. It assigns a SHAP value to each value of each variable of each data item in the evaluation set according to the degree to which it affects the prediction of the model (the higher the SHAP absolute value, the more the data item affects the prediction) and to the direction in which it affects the prediction of the model (a positive SHAP value indicates a positive effect on the prediction, that is, it contributes to the prediction having a higher value, and the opposite with a negative SHAP value).
Specifically, the method involves running the data item through the model, removing it from its trees, and calculating the SHAP value based on the differences in the predictions with and without the data item. It is important to understand that SHAP explains the model and not the values of the variables directly. That is, for the same value of a variable in different subjects or at different visits, the SHAP values may differ depending on the values of the other variables in each subject or at each visit.
The importance of a variable in the model is calculated as the mean of the absolute SHAP values for that variable.