The Impact of Multiple Sclerosis Disease Status and Subtype on Hematological Profile

Multiple sclerosis (MS) is an immune-mediated, demyelinating disease of the central nervous system. In this study, an MS cohort and healthy controls were stratified into Caucasian and African American groups. Patient hematological profiles, composed of complete blood count (CBC) and comprehensive metabolic panel (CMP) test values, were analyzed to identify differences between MS cases and controls and between patients with different MS subtypes. Additionally, random forest models were used to determine the aggregate utility of common hematological tests in determining MS disease status and subtype. The most significant and relevant results were decreased bilirubin and creatinine in MS cases. The random forest models achieved some success in differentiating between MS cases and controls (AUC values: 0.725 and 0.710 for the CMP and CBC models, respectively) but were not successful in differentiating between subtypes. However, analyses of larger samples that adjust for possible confounding variables, such as treatment status, may reveal the value of these tests in differentiating between MS subtypes.


Introduction
Multiple sclerosis (MS) is a complex disease of the central nervous system in which the myelin sheaths of the neurons in the brain and spinal cord are damaged. As presentation of the disease varies widely between patients, several subtypes of MS have been defined based on patterns of its progression. Relapsing remitting multiple sclerosis (RRMS), the most common form of MS, is characterized by unpredictable attacks (with potentially permanent deficits) followed by periods of disease quiescence [1]. Over time, RRMS patients typically transition into secondary progressive multiple sclerosis (SPMS), which is characterized by steady disease progression. In contrast, a minority of MS cases are classified as primary progressive multiple sclerosis (PPMS), in which disability accrues from disease onset without relapses. PPMS patients constitute only about 10% of all MS patients [2].
Characterizing physiological differences present in MS and its subtypes has been of significant interest in the MS research community [3][4][5][6]. A number of case-control studies have reported cerebrospinal fluid (CSF) and hematological biomarkers associated with MS disease status [7][8][9][10]. Such biomarkers may also have utility as predictors of MS clinical features such as disease progression. Past work has uncovered CSF and hematological biomarkers that correlate with MS subtype [6][11][12][13]. In addition to fluid biomarkers associated with MS subtype, one study has found low-frequency genetic variants that influence MS subtype susceptibility [14]. While a number of subtype biomarkers have been reported in the literature, there are currently no predictive biomarkers for MS disease course in clinical use. In light of this scarcity of clinical predictors of disease course, and to further investigate physiological changes in MS patients in general, we performed a survey study of hematological profiles in MS patients and controls using commonly employed hematological panels. Additionally, we used random forest classifiers to determine the predictive potential of these blood panels in the context of MS case status and disease subtype. Random forest classifiers have been used to discover novel disease-biomarker relationships and to classify patients in a variety of settings [15][16][17], as well as in other clinically relevant applications [18,19].
In our analyses, we utilized two common hematological panels, the complete blood count (CBC) panel and comprehensive metabolic panel (CMP). Both panels are routine measurements used by clinicians to understand the overall health of a patient [20]. These tests are performed frequently on both ill and healthy patients. The CBC measures blood cell values such as hemoglobin, platelet count, and white blood cell count. The tests in the CMP quantify clinical chemistry values, such as blood serum levels of albumin, various ions, and several liver enzymes. Several values measured in these panels have been shown to be correlated with MS disease status, subtype, and disease progression [21][22][23]. In particular, studies have found an association between the neutrophil-to-lymphocyte ratio (NLR) and MS disease status, disability, and subtype [23][24][25][26][27]. Creatinine and bilirubin have also been implicated in similar studies [10,21,22,28]. In this study, we performed three analyses: first, to better understand the physiological differences present in MS patients in general, we investigated whether differences exist in CBC and CMP values between MS cases and controls. Al-Hussain et al. performed a similar analysis in 2017 using a small MS cohort and a subset of CBC and CMP tests [26]. We used a larger dataset to attempt to replicate their findings for this subset of tests and discover novel associations involving tests not included in their study. Second, due to the lack of clinical biomarkers of MS subtype and the widespread clinical use of the CBC and CMP, we investigated differences in hematological profiles between RRMS/SPMS and PPMS patients. Finally, to evaluate the overall utility of these blood panels in differentiating between MS cases and controls and between PPMS and RRMS/SPMS patients, we trained random forest classifiers using a subset of patients and tested the classifiers on the remaining patients.

Sample Population and Data Preprocessing
Laboratory values for MS patients and control patients were retrieved from deidentified health records in Vanderbilt University Medical Center's Synthetic Derivative (SD). The SD also provided the patient race (observer recorded) and sex data used in the analyses. Patients with missing demographic values, a reported race other than African American or Caucasian, or multiple reported races were excluded from the analysis. MS subtype was determined for each patient using previously published extraction algorithms [29]. Only patients with one of the three major MS subtypes (PPMS, RRMS, or SPMS) were included in the final dataset, with RRMS and SPMS patients grouped together to stratify the cohort into relapsing and progressive groups. Demographic characteristics of the groups are noted in Table 1.
The two blood panels (CBC and CMP) collectively contain patient lab values for 35 biomarkers. For each biomarker, the median of each patient's values was calculated and used in all analyses. Neutrophil-to-lymphocyte ratio (NLR) measurements were calculated for each patient on every date with a value for both neutrophil absolute count and lymphocyte absolute count, and the patient's median ratio was used. Patient age at the time of the most recent measurement was used as a covariate in the analyses. Lastly, the data were stratified into Caucasian and African American groups for statistical analysis. For both groups, the group averages of all lab values were within the reference ranges. Not all patients had data for each biomarker; the number of patients ultimately used for each analysis can be found in Supplementary Table S1.
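The per-patient summarization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the record layout, the test names (`neutrophil_abs`, `lymphocyte_abs`, `hemoglobin`), the helper names, and the sample values are all hypothetical, and the sketch assumes at most one measurement per test per date and skips dates with a zero lymphocyte count.

```python
from collections import defaultdict
from statistics import median

# Hypothetical long-format records: (patient_id, date, test_name, value).
records = [
    ("p1", "2015-01-10", "neutrophil_abs", 4.0),
    ("p1", "2015-01-10", "lymphocyte_abs", 2.0),
    ("p1", "2015-06-02", "neutrophil_abs", 6.0),
    ("p1", "2015-06-02", "lymphocyte_abs", 2.0),
    ("p1", "2016-03-15", "neutrophil_abs", 5.0),  # no lymphocyte count on this date
    ("p1", "2015-01-10", "hemoglobin", 13.5),
    ("p1", "2015-06-02", "hemoglobin", 14.1),
]


def patient_medians(records):
    """Return {(patient_id, test_name): median of all values for that test}."""
    values = defaultdict(list)
    for pid, _date, test, value in records:
        values[(pid, test)].append(value)
    return {key: median(vals) for key, vals in values.items()}


def patient_median_nlr(records, neut="neutrophil_abs", lymph="lymphocyte_abs"):
    """Return {patient_id: median NLR over dates that have both counts}."""
    by_day = defaultdict(dict)
    for pid, date, test, value in records:
        if test in (neut, lymph):
            # Assumption: one value per test per date (later rows overwrite).
            by_day[(pid, date)][test] = value
    ratios = defaultdict(list)
    for (pid, _date), tests in by_day.items():
        # Use only dates with both counts; skip zero denominators (assumption).
        if neut in tests and lymph in tests and tests[lymph] > 0:
            ratios[pid].append(tests[neut] / tests[lymph])
    return {pid: median(r) for pid, r in ratios.items()}


medians = patient_medians(records)
nlr = patient_median_nlr(records)
```

In this toy input, the third neutrophil measurement contributes to the neutrophil median but not to the NLR, because no lymphocyte count exists for that date.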

Statistical Methods
Logistic regression analysis was performed in R (R Core Team, Vienna, Austria, version 3.1.3). Each of the 36 lab tests was analyzed separately as an independent variable. For the case-control analysis, the patient median value for the given lab test was used to predict MS disease status. For the subtype analysis, the patient median value for the given lab test was used to predict MS subtype. Sex and patient age were included as covariates in all analyses. As seen in the results, some analyses were not performed in the African American group due to insufficient sample sizes.
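For orientation, one way to write the single-test model described above is given here. The notation is an assumption introduced for illustration and does not appear in the source: Y_i is the outcome for patient i (MS case versus control, or PPMS versus RRMS/SPMS), x_i is that patient's median value for the lab test, and the remaining terms are the two covariates.

```latex
\[
\log\frac{P(Y_i = 1)}{1 - P(Y_i = 1)}
  = \beta_0 + \beta_1 x_i + \beta_2\,\mathrm{age}_i + \beta_3\,\mathrm{sex}_i,
\qquad
\mathrm{OR} = e^{\beta_1}
\]
```

Under this parameterization, an odds ratio above 1 means that a one-unit increase in the lab value is associated with higher odds of the outcome coded as 1, with age and sex held fixed.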
To correct for multiple testing, a Bonferroni correction (α = 0.05) was applied separately within each group in each analysis, yielding four adjusted significance thresholds. The adjusted threshold for the Caucasian case-control analysis, the African American case-control analysis, and the Caucasian group in the subtype analysis was 0.0014. The adjusted threshold for the African American group in the subtype analysis (with fewer tests) was 0.0017. Analysis results are reported below. Multiple regressions were also performed for biologically related lab tests: mean platelet volume and platelet count were analyzed in one regression, and white blood cell count and absolute neutrophil count in another.
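The Bonferroni arithmetic behind these values is simply α divided by the number of tests. The short sketch below reproduces it. The count of 36 tests comes from the text, but the count of 30 tests for the African American subtype analysis is an assumption inferred from the reported value of 0.0017, since the source does not state it. The function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value, threshold):
    """A test passes only if its raw p-value falls below the threshold."""
    return p_value < threshold


# 36 lab tests, as stated in the text.
full_threshold = bonferroni_threshold(0.05, 36)
# ASSUMPTION: 30 tests, inferred from the reported 0.0017; not stated in the source.
reduced_threshold = bonferroni_threshold(0.05, 30)

print(round(full_threshold, 4), round(reduced_threshold, 4))
```

As an example, a raw p-value of 0.003 would be nominally significant at 0.05 but would not pass the 36-test threshold.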
To measure the utility of the CBC and CMP blood panels in classifying patients as MS cases or controls, as well as differentiating between MS subtypes, a random forest model was fitted to each of the previously described study populations. These analyses were restricted to include only individuals without missing values for each laboratory test, and tests with a significant number of missing values were excluded altogether. The analysis utilized the package "randomForest" in R (standard parameters were used). Receiver operating characteristic (ROC) curves and area under the curve (AUC) values were generated using the "ROCR" package to assess the performance of each random forest model in classifying subjects.
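To make the AUC metric concrete, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen case receives a higher model score than a randomly chosen control, with ties counted as one half. This is an illustrative pure-Python equivalent, not the "ROCR" computation used in the study. The function name and the example scores are hypothetical.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    scores: predicted scores, higher meaning more likely to be a case.
    labels: 1 for cases, 0 for controls.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0   # pair ranked correctly
            elif c == k:
                wins += 0.5   # ties count as half
    return wins / (len(cases) * len(controls))


# Hypothetical scores for two cases and two controls.
example_auc = auc_from_scores([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to the random-guess baseline, and 1.0 to perfect separation of the two groups.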

Subtype Analysis
In the subtype analysis, no test was significant after Bonferroni correction, but several lab values were nominally significant in both groups. For the odds ratios, a larger odds ratio represents an increased risk of PPMS relative to RRMS/SPMS. In the Caucasian group, increased MPV (p-value: 0.046) had a risk-increasing effect, while increased WBC (p-value: 0.029) and neutrophil absolute count (p-value: 0.003) lowered PPMS risk. In the African American group, increased calcium (p-value: 0.028) and anion gap (p-value: 0.009) carried increased PPMS risk, and higher levels of chloride (p-value: 0.014) and bilirubin (p-value: 0.029) corresponded to decreased risk. Creatinine levels approached significance (p-values: 0.051 and 0.066) in both groups and carried increased risk. In the subtype analysis, all nominally significant tests for the Caucasian group were found in the CBC, and all nominally significant tests for the African American group were found in the CMP.
Multiple regression was performed for nominally significant lab values with closely related biology. In the Caucasian group, in a multiple regression with mean platelet volume and platelet count as covariates, platelet count remained significant (p-value: 0.036), while mean platelet volume did not (p-value: 0.177). Similarly, in the African American group, in an analysis with anion gap, calcium, and chloride as covariates, anion gap remained significant (p-value: 0.032), while calcium (p-value: 0.331) and chloride (p-value: 0.153) did not. Additionally, in the same group, analyzing creatinine and bilirubin together resulted in bilirubin retaining its significance (p-value: 0.044), while creatinine remained nonsignificant (p-value: 0.083).

Random Forest Classification
ROC curves in Figure 1 summarize the predictive performance of the random forest model for each dataset. Both the CMP and CBC performed adequately at differentiating between MS cases and matched controls, with average AUC values of 0.725 and 0.710, respectively. However, when differentiating between PPMS and RRMS/SPMS patients, neither the CMP model nor CBC model achieved notably improved performance over the random guess baseline represented by the gray line, likely due to small sample sizes.

Discussion
In this study, we characterized differences in hematological profiles between MS cases and controls in both African American and Caucasian cohorts; furthermore, we performed a similar analysis to investigate differences between RRMS/SPMS and PPMS patients. A number of biomarkers differed between groups in our analyses, although we were unable to replicate the statistically significant relationships reported by Al-Hussain et al. Interestingly, increased MPV was not only associated with MS in both the Caucasian and African American groups, but it was also associated with PPMS in the Caucasian cohort in the subtype analysis and trended in the same direction for the African American group. Increased WBC had the opposite effect: in both groups of the case-control analysis and the Caucasian group in the subtype analysis, it carried a protective effect, and this directionality was also seen in the African American group during the subtype analysis, though not significantly. Possible explanations for the lack of significance in the African American subtype analyses include smaller sample sizes and physiological differences between the groups. In the African American analyses, higher anion gap was associated with both MS (case-control) and PPMS (subtype analysis). These results were not found in the Caucasian subtype analysis. To our knowledge, none of these relationships have been previously reported in the literature.
Previous studies have reported that bilirubin, creatinine, and the NLR have utility for distinguishing between MS cases and controls and between MS subtypes. Due to the MS subtypes present in our cohort, we were unable to compare our results to those previously reported by Ljubisavljevic et al. evaluating bilirubin as a biomarker of MS disease progression [21]. However, as both Ljubisavljevic et al. and Peng et al. reported, bilirubin levels were significantly reduced in MS patients in our dataset [10,21]. Despite the results of four previous studies that found that the NLR was elevated in MS patients, our case-control analysis found that higher NLR was associated with decreased risk of MS [24][25][26][27]. Two previous studies had reported inconsistent results regarding the ability of the NLR to predict disease course [23,25]. We found that the NLR was not associated with MS disease course. In our case-control analyses, creatinine was lower in MS patients, which contradicts the results in three other studies [22,28,30]. As with the NLR, conflicting evidence exists regarding the ability of creatinine to discriminate between MS subtypes [28,30]. While our results only approached significance, they indicated that creatinine was elevated in PPMS patients.
Given that the laboratory values in this study were selected based on availability rather than biological significance, it is encouraging that the case-control random forest models were able to perform notably better than the baseline. We expect that with sufficient samples improved performance from the subtype models would be observed, as well. Altogether, our results demonstrate that common laboratory values have utility in classifying MS cases and controls; larger samples are needed to assess their value in classifying patients based on MS subtype. This study has several limitations. First, we were unable to account for the effects of medication use and other clinical characteristics (age of MS onset, disease duration) on hematological profile due to a lack of data for these variables. It has been reported that MS treatments can affect hematological values [31][32][33], so our observed associations may correlate with treatments and not necessarily disease onset. Observed associations may also be due to changes as MS progresses, rather than onset. Second, as we used EHR data, we were dependent on previously ordered tests, which limited our sample size for some of the less common tests. Our sample sizes were also limited by the relative rarity of MS in African American populations [1].

Conclusions
This study highlights several compelling trends; however, future studies are needed to replicate these findings while controlling for possible confounding factors and ensuring adequate study power for the subtype analyses.
Institutional Review Board Statement: Ethical review and approval were waived for this study because it was non-human-subjects research. All data were deidentified prior to inclusion in the study.