Evaluation of Visual-Evoked Cerebral Metabolic Rate of Oxygen as a Diagnostic Marker in Multiple Sclerosis

A multiple sclerosis (MS) diagnosis often relies upon clinical presentation and qualitative analysis of standard, magnetic resonance brain images. However, the accuracy of MS diagnoses can be improved by utilizing advanced brain imaging methods. We assessed the accuracy of a new neuroimaging marker, visual-evoked cerebral metabolic rate of oxygen (veCMRO2), in classifying MS patients and closely age- and sex-matched healthy control (HC) participants. MS patients and HCs underwent calibrated functional magnetic resonance imaging (cfMRI) during a visual stimulation task, diffusion tensor imaging, T1- and T2-weighted imaging, neuropsychological testing, and completed self-report questionnaires. Using resampling techniques to avoid bias and increase the generalizability of the results, we assessed the accuracy of veCMRO2 in classifying MS patients and HCs. veCMRO2 classification accuracy was also examined in the context of other evoked visuofunctional measures, white matter microstructural integrity, lesion-based measures from T2-weighted imaging, atrophy measures from T1-weighted imaging, neuropsychological tests, and self-report assays of clinical symptomology. veCMRO2 was significant and within the top 16% of measures (43 total) in classifying MS status using both within-sample (82% accuracy) and out-of-sample (77% accuracy) observations. High accuracy of veCMRO2 in classifying MS demonstrated an encouraging first step toward establishing veCMRO2 as a neurodiagnostic marker of MS.


Introduction
Current procedures for diagnosing multiple sclerosis (MS) rely primarily upon clinical presentation and qualitative analysis of standard, medical-grade (e.g., lower resolution) magnetic resonance structural, brain images, e.g., [1]. It has been demonstrated that the diagnostic accuracy of MS can be improved when providers implement advanced neuroimaging techniques and analyses that are not presently common in clinical practice, e.g., [2], see also [3]. Further, research using advanced neuroimaging techniques has demonstrated that these techniques can be more sensitive than their traditional counterparts in detecting subtle changes associated with very early manifestations of MS, e.g., [4,5]. Here, we investigated the accuracy of an advanced neuroimaging technique never before used in MS, calibrated functional magnetic resonance imaging (cfMRI), to classify MS patients and closely age-and sex-matched healthy controls (HCs). Specifically, we focused our analyses upon the ability of a new neuroimaging marker, visual-evoked cerebral metabolic rate of oxygen (veCMRO 2 ), to accurately discriminate between MS patients and HCs.
cfMRI is a relatively new neuroimaging technique that capitalizes upon established relationships between blood-oxygen-level dependent (BOLD) signal and cerebral blood flow (CBF) in order to estimate steady-state, oxygen metabolism [6,7] see [8]. The technique gets its name from the use of a BOLD-calibration parameter, often acquired during a gas-inhalation challenge. The CMRO 2 metric permitted by cfMRI offers several advantages over the more commonly used BOLD signal. First, CMRO 2 offers physiological specificity. CMRO 2 represents a true physiological process, oxygen metabolism, whereas BOLD reflects a confluence of processes and as such, is physiologically non-specific. Second, calibration-derived CMRO 2 is strongly tied to electrical and chemical neural activity, e.g., [9][10][11][12][13][14][15], whereas an appreciable component of BOLD signal is unexplained by neural activity, e.g., [16][17][18][19][20], see [21], but see [9]. Finally, CMRO 2 measures are not dependent upon the hemodynamic assumptions of BOLD, making them optimal measures of brain function in populations with atypical hemodynamics, like MS, e.g., [22,23], see [24].
Evaluating CMRO 2 as a diagnostic marker of MS is particularly relevant for these patients because MS is associated with changes to neurometabolism. Neuroimaging research has produced considerable evidence of altered neurometabolism in MS, e.g., [25][26][27][28][29]. In one study, Ge and colleagues [30] demonstrated decreases in brain-wide resting CMRO 2 for MS patients relative to HCs. Some neuroimaging studies have shown that neurometabolic alterations were related to white matter macrostructural (i.e., lesions, e.g., [30]) or microstructural damage in MS, e.g., [27,28]. For example, magnetic resonance spectroscopy in centrum semiovale white matter has shown that N-acetylaspertate (NAA) and NAA: creatine ratios were strongly related to diffusion-weighted indices of white matter structural integrity in MS patients [27].
It is intuitive that MS patients would show differences in in vivo neurometabolism when considering that postmortem analyses have revealed extensive alterations to the mitochondria in lesioned and non-lesioned MS neural tissue [31][32][33], see [34][35][36]. For instance, Singhal and colleagues [33] found decreases in postmortem NAA, a partial marker of neuronal respiratory capacity, and decreases in electron transport subunit proteins across lesioned and non-lesioned MS grey matter, relative to matched control participants' grey matter. Taken together, the results of postmortem and in vivo neuroimaging studies demonstrate that neurometabolic alterations are generally featured in MS.
Evaluating veCMRO 2 should also be particularly relevant as a diagnostic marker of MS because MS is marked by alterations to the neural substrate of the visual system, see [37][38][39][40] see also [5]. The use of advanced imaging techniques such as high-resolution structural brain imaging, optical coherence tomography (OCT), functional magnetic resonance imaging (fMRI), and diffusion tensor imaging (DTI) has revealed that visual system alterations exist even in MS patients without visual disturbances or a history of optic neuritis (a clinical syndrome closely linked to MS and marked by visual impairment and visual pathway insult). Indeed, there are MS-related structural alterations to both early (e.g., retinae) and later (e.g., optic radiations) portions of the afferent visual pathway, and alterations to visuocortical activity in patients without a history of optic neuritis see [39]. For instance, Alshowaier and colleagues [41] used electroencephalogram recordings to show that MS patients without a history of optic neuritis demonstrated delayed inion channel, multifocal visual-evoked electrical potentials relative to age-and sex-matched HCs. Previous work in our laboratory has also revealed alterations to visual cortex BOLD signal during visual stimulation in MS patients with normal or corrected-to-normal vision compared to matched HCs [42], see also [43]. Together, structural and functional imaging results suggest that changes to the visual system are a robust marker of MS pathology.
MS is associated with changes to neurometabolism and alterations to the neural substrate of the visual system. Thus, visual-evoked oxygen metabolism signals in visual cortex (i.e., veCMRO 2 ) should be a diagnostically relevant marker of MS. We assessed the extent to which veCMRO 2 signals could be used to discriminate between MS patients and HCs. The classification accuracy of veCMRO 2 was examined in the context of other variables commonly assayed in MS, including measures of neurological insult (e.g., gross lesion volume, parenchymal atrophy), neuropsychological change (e.g., Brief Repeatable Battery of Neuropsychological Tests [44]), and self-report symptom measures (e.g., subjective fatigue). We tested the extent to which veCMRO 2 , and these other measures, could classify MS status using both within-sample and out-of-sample observations.

Participants
Participants between the ages of 18 and 65 were recruited for this study. Participants were required to be free of MR-contraindicators, concurrent substance abuse, have normal or corrected-to-normal vision, and speak fluent English. Because study procedures included a gas-inhalation challenge (see Section 2.4), participant selection was limited to non-smokers. Participants did not have histories of respiratory or pulmonary problems, cerebral vascular issues, or cardiac problems. Participants were required to have a score greater than 21 on the telephone interview for cognitive status [45]. Thirty-one participants in total met the inclusion criteria.
Twelve MS patients meeting the above criteria were recruited from the Clinical Center for Multiple Sclerosis at the University of Texas Southwestern Medical Center. Eleven patients had a diagnosis of relapsing-remitting MS and one patient had a diagnosis of secondary-progressive MS. Patients were required to be at least 1 month past their most recent exacerbation and their last corticosteroid treatment. Patients were recruited who did not report a history of optic neuritis. Patients without a history of optic neuritis were specifically selected so as to limit additional variability from attributed to severe, anterior visual pathway damage/dysfunction (e.g., such as that resulting from conduction block) and potential visual impairment. All MS patients' vision was normal or corrected-to-normal. Two patients withdrew or declined to undergo the gas challenge (total n = 10).
Nineteen HC participants were recruited from the Dallas-Fort Worth Metroplex via email, posted flyers, and word-of-mouth. These participants were evaluated for the general inclusion/exclusion criteria described above. Three HCs did not undergo the scanning protocol because of exclusions discovered after study enrollment (e.g., concussion history revealed after pre-screening, incidental MR finding). Two HCs withdrew or declined to undergo the gas challenge. During imaging processing (see Section 2.5), one HC's functional images failed to appropriately register to their anatomical image after multiple attempts, so this person was excluded. Thirteen HCs (n = 13) remained for subsequent analyses. These participants were closely age-and sex-matched to the MS patients (see Table 1).

Study Procedures
Study procedures were approved by the University of Texas Southwestern Medical Center Institutional Review Board. Recruitment numbers were approximated based upon previous research showing sufficient power to demonstrate group changes in calibrated fMRI (cfMRI) contrasts with similar sample sizes [22,23]. Participants meeting inclusion criteria were asked to refrain from caffeine use at least two hours before their scheduled appointment time, e.g., [47]. They were also asked not to consume alcohol on the same calendar day before their scheduled appointment. Participants gave written informed consent before undergoing procedures and were compensated for their time. Participants underwent functional and structural neuroimaging on a Philips 3-Tesla magnet (Philips Medical Systems, Best, The Netherlands) with an 8-channel SENSE radiofrequency head coil. Foam padding was placed around the head to minimize motion during MRI scan acquisition. Participants completed standard neuropsychological tests (e.g., Brief Repeatable Battery of Neuropsychological tests [44]) and self-report measures regarding their general health and symptomology (i.e., SF-36 [48], Modified Fatigue Impact Scale (MFIS, [49]); see Table 2 for a complete list of model variables).

cfMRI Parameters and Theory
Dual-echo pseudocontinuous arterial spin labeling (pCASL) and BOLD images (together referred to as dual-echo images) were acquired using an interleaved echo scanning protocol see [7,52]. Together, the perfusion (Echo 1) and BOLD-weighted (Echo 2) images along with biophysical modeling procedures allowed for estimation of CMRO 2 and a neural-vascular coupling coefficient (n, see [8]) associated with steady-state, neural stimulation [5,7]. One task run of dual-echo imaging data and one gas-challenge run of dual-echo imaging data were collected using the following parameters: Estimations of CMRO2 and n were based upon the Davis model of BOLD signal change [6,7]: where ∆x/x 0 denotes a change from baseline, α is an empirically derived constant linking cerebral blood flow and cerebral blood volume, and β is an empirically derived constant related to vascular exchange and susceptibility of deoxyhemoglobin at specific field strengths (e.g., [53][54][55]). We assumed α = 0.38 [56] and β = 1.3 [52]; these values were chosen because they have been shown to be sensitive to group differences in neurophysiology [22,23]. Also, these values have previously demonstrated group-equivalence in the estimation of M, e.g., [22,23]. M is a subject-specific scaling factor dependent upon the washout resting deoxyhemoglobin see [8]. M was estimated in each participant, using the gas challenge detailed below. The measurement of BOLD, CBF, and M allows for the estimation of CMRO 2 . Here, ∆CMRO 2 reflects the visual task-related change in neurometabolism of oxygen from resting baseline: where ∆x/x 0 reflects percent change of signal during task compared to resting baseline. With the estimation of ∆CMRO 2 , n, may also be estimated: thus, n reflects per unit output of ∆CBF per unit input of ∆CMRO 2 see [8].

cfMRI Task and Gas Challenge
Participants completed a visual stimulation task during dual-echo task imaging. This task was chosen for two reasons. First, differences in the functional response to visual stimulation have been observed in MS visual cortex see [42,57]. Second, because this task required minimal effort, group differences in performance were not expected to be a factor.
Participants were trained on the task before entering the MR environment. During the task, participants focused on a fixation cross at the center of their visual field. Participants were required to respond via bilateral, thumb-button press when a change in the luminance of the fixation cross occurred. This task was used in order to control the center of the participants' visual field [22,23,58]. Change in luminance was jittered and occurred every 2, 3, 4, or 6 s. Visual stimulation occurred in a block format. There were 6 visual stimulation task blocks consisting of 60 s of continual annulus flickering in the participants' near-foveal visual field. Annuli alternated at orthogonal orientations (0 to 90 • ) to avoid neural adaptation [58]. Alterations occurred at a constant frequency of 8 Hz because both electrochemical neural activity and BOLD signal have been shown to peak at this frequency, potentially yielding the greatest signal-to-noise estimates, e.g., [59,60]. Rest blocks were jittered at 32, 34, 36, 38, and 40 s intervals (see Figure 1).
thus, n reflects per unit output of ∆CBF per unit input of ∆CMRO2 see [8].

cfMRI Task and Gas Challenge
Participants completed a visual stimulation task during dual-echo task imaging. This task was chosen for two reasons. First, differences in the functional response to visual stimulation have been observed in MS visual cortex see [42,57]. Second, because this task required minimal effort, group differences in performance were not expected to be a factor.
Participants were trained on the task before entering the MR environment. During the task, participants focused on a fixation cross at the center of their visual field. Participants were required to respond via bilateral, thumb-button press when a change in the luminance of the fixation cross occurred. This task was used in order to control the center of the participants' visual field [22,23,58]. Change in luminance was jittered and occurred every 2, 3, 4, or 6 s. Visual stimulation occurred in a block format. There were 6 visual stimulation task blocks consisting of 60 s of continual annulus flickering in the participants' near-foveal visual field. Annuli alternated at orthogonal orientations (0 to 90°) to avoid neural adaptation [58]. Alterations occurred at a constant frequency of 8 Hz because both electrochemical neural activity and BOLD signal have been shown to peak at this frequency, potentially yielding the greatest signal-to-noise estimates, e.g., [59,60]. Rest blocks were jittered at 32, 34, 36, 38, and 40 s intervals (see Figure 1).

Figure 1.
Example of three-trial visual stimulation task. Participants viewed a fixation cross at the center of the screen. This cross changed color at jittered intervals throughout task. Rest periods were also jittered. Continuous stimulation blocks lasted 60 s with 0° to 90° flickering annuli (at 8 Hz). Note: fixation cross was presented during task and rest periods however it cannot be seen in the task example periods here.
Participants also completed a gas-challenge in order to estimate M. Participants breathed 4 min of room air (~0.03% CO2: 21% O2: 78% N2) and 6 min of an iso-oxic, CO2 solution (5% CO2: 21% O2: 74% N2) during dual-echo imaging. Each participant was fitted with a two-way, non-rebreathing valve/mouthpiece and a nose clip. Baseline end-tidal CO2 (EtCO2), O2 saturation, breath rate, and heart rate measures were collected. After the 4 min of room air breathing, a valve was opened to release the CO2 solution from a Douglas airbag which then flowed into the participants' breathing apparatus [22,23]. The CO2 inhalation lasted 6 min.
Hypercapnic challenge, via the inhaled 5% CO2 solution, increases global CBF, but probably has no or a minimal depressant effect on oxygen metabolism, e.g., [61][62][63]. Hypercapnia acts to wash out local baseline concentrations of deoxyhemoglobin, yielding a local maximum estimate of resting BOLD signal. Potential changes to oxygen metabolism due hypercapnic challenge have not been shown to appreciably alter the estimation of M as relationships between hypercapnia-derived M and M derived from non-hypercapnic techniques show high correspondence [64]. Example of three-trial visual stimulation task. Participants viewed a fixation cross at the center of the screen. This cross changed color at jittered intervals throughout task. Rest periods were also jittered. Continuous stimulation blocks lasted 60 s with 0 • to 90 • flickering annuli (at 8 Hz). Note: fixation cross was presented during task and rest periods however it cannot be seen in the task example periods here.
Participants also completed a gas-challenge in order to estimate M. Participants breathed 4 min of room air (~0.03% CO 2 : 21% O 2 : 78% N 2 ) and 6 min of an iso-oxic, CO 2 solution (5% CO 2 : 21% O 2 : 74% N 2 ) during dual-echo imaging. Each participant was fitted with a two-way, non-rebreathing valve/mouthpiece and a nose clip. Baseline end-tidal CO 2 (EtCO 2 ), O 2 saturation, breath rate, and heart rate measures were collected. After the 4 min of room air breathing, a valve was opened to release the CO 2 solution from a Douglas airbag which then flowed into the participants' breathing apparatus [22,23]. The CO 2 inhalation lasted 6 min.
Hypercapnic challenge, via the inhaled 5% CO 2 solution, increases global CBF, but probably has no or a minimal depressant effect on oxygen metabolism, e.g., [61][62][63]. Hypercapnia acts to wash out local baseline concentrations of deoxyhemoglobin, yielding a local maximum estimate of resting BOLD Brain Sci. 2017, 7, 64 7 of 23 signal. Potential changes to oxygen metabolism due hypercapnic challenge have not been shown to appreciably alter the estimation of M as relationships between hypercapnia-derived M and M derived from non-hypercapnic techniques show high correspondence [64].

cfMRI Processing
Task and gas-challenge Echo 1 and Echo 2 data were processed in analysis of functional neuroimages (AFNI [65]) and the Functional MRI of the Brain Software Library (FSL [66]). Data were transformed into cardinal planes. Anomalous data points in each voxel time series were then attenuated using an interpolation method based upon the average signal. Data were volume registered to correct for motion to the fourth functional volume of each dataset's (task or gas challenge) Echo 2 sequence using a heptic polynomial interpolation method. CBF was estimated from Echo 1 images using the surround subtraction method [67]. Dual-echo BOLD data were also interpolated by pairwise averaging of temporally adjacent images.
For the visual stimulation task, Echo 2 data were linearly registered (12 degrees-of-freedom) to each participant's anatomical data using AFNI's align_epi_anay.py program. The transformation matrix from this registration was then applied to Echo 1 data, placing these two datasets in the same space. For gas-challenge data, a binary mask was created for functional voxels in Echo 2 to aid in co-registration. This mask was then registered to the respective participant's anatomical space using the align_epi_anay.py program. Gas-challenge Echo 2 and Echo 1 data were also aligned to the mask which was registered in native anatomical space. After alignment, Echoes 1 and 2 data from both the visual task and gas challenge were visually inspected for registration errors. One HC participant failed to register correctly after multiple attempts and was discarded from further analyses. Echoes 1 and 2 data from the visual task and gas challenge were then spatially smoothed using a Gaussian kernel (FWHM = 8 mm) and high-pass filtered (0.0039 Hz).
Preprocessed data from Echoes 1 and 2 in the visual stimulation task were analyzed via generalized linear modeling of task versus rest periods using a boxcar reference function. This modeling quantified task-related CBF and BOLD changes from baseline. BOLD and CBF beta-values were scaled to each voxel's resting baseline signal and were multiplied by 100, yielding percent signal change estimates from baseline (∆BOLD and ∆CBF). Data were averaged from a visual (functional) region of interest (ROI) comprised of overlapping ∆BOLD and ∆CBF suprathreshold signals within occipital lobe (see Structural and Functional ROI; [22,23]). ∆BOLD, ∆CBF, ∆CMRO 2 , and n results extracted from the functional region of interest were taken as the visual-evoked signals (i.e., veBOLD, veCBF, veCMRO 2 , and ven).
For the gas challenge, resting baseline BOLD and CBF signals during room air breathing were averaged for each voxel time-series (BOLD 0 and CBF 0 ). The first two minutes of hypercapnia BOLD and CBF time-series were discarded to allow participants' blood flow to stabilize on the CO 2 solution, e.g., [22,23]. The last four minutes of hypercapnia BOLD and CBF time-series were averaged to yield BOLD hc and CBF hc respectively. Average values were extracted from a functional region of interest (see Structural and Functional ROI) using overlapping BOLD hc and CBF hc suprathreshold signals within occipital lobe, and were used to calculate M, using the following equation: where (x hc −x 0 )/x 0 reflects percent change in signal from normocapnic to hypercapnic states, normalized by the signals during normocapnia and multiplied by 100. Once M was estimated, ∆CMRO 2 and n were also estimated (see Equations (2) and (3); see Figure 2) within a functional region of interest (see Structural and Functional ROI).
A visual task functional ROI was created within the structural ROI described above to estimate veBOLD, veCBF, veCMRO2, and ven (see Figure 3). This procedure eschewed noise from inactive voxels, e.g., [68]. Voxels comprising each participant's functional ROI were the overlapping top 5% of BOLD and top 5% of CBF t-values obtained from the generalized model, within the structural ROI. This ensured that average veBOLD and veCBF estimates were being derived from the same, taskresponsive voxels and that veCMRO2 and ven were derived in voxels with both CBF and BOLD taskrelated increases (see Figure 3). veCMRO2 was calculated voxel-wise within the functional ROI using ∆BOLD, ∆CBF, M (which was extracted from functional ROI described below). ven was then calculated similarly. The final product of these analyses was average positive veBOLD, veCBF, and veCMRO2, and ven extracted from the functional ROI (see Figure 3).
Because the gas challenge data differed in occipital coverage compared to the visual task data, M was estimated ex situ. To create a functional ROI for the gas challenge, ∆BOLDhc/BOLD0 and

Structural and Functional ROIs
First, the magnetization-prepared rapid acquisition gradient-echo (MPRAGE) data were processed to create a native-space, occipital ROI. The skull was removed using an automated command, separating parenchyma and cerebral spinal fluid from the skull. An intensity based automated segmentation algorithm was used to delineate primarily white matter, grey matter, and cerebral spinal fluid voxels yielding a partial volume estimate of each tissue type, for each voxel. A grey matter mask was then created, retaining voxels with only a greater than or equal to grey matter partial volume estimate of 80%. A structural ROI of occipital lobe was manually delineated on each participant's MPRAGE image. These were drawn in native space because native space analyses tend to allow for more sensitive patient-control contrasts [68]. The structural ROI was drawn using gyral and sulcal landmarks and encompassed most of occipital cortex including calcarine sulcus, cuneus, and occipital portions of lingual gyrus. Several anatomical landmarks were used in the demarcation of this ROI (parieto-occipital sulcus, occipital pole, pre-occipital notch). Within the anatomically defined occipital lobe, only voxels with partial volume estimates of grey matter (≥80%) were retained. These final masks were down-sampled to the functional voxel size.
A visual task functional ROI was created within the structural ROI described above to estimate veBOLD, veCBF, veCMRO 2 , and ven (see Figure 3). This procedure eschewed noise from inactive voxels, e.g., [68]. Voxels comprising each participant's functional ROI were the overlapping top 5% of BOLD and top 5% of CBF t-values obtained from the generalized model, within the structural ROI. This ensured that average veBOLD and veCBF estimates were being derived from the same, task-responsive voxels and that veCMRO 2 and ven were derived in voxels with both CBF and BOLD task-related increases (see Figure 3). ∆CBFhc/CBF0 maps were thresholded and extracted from the structural ROI detailed above. The criteria for retention of a voxel within these maps required that the voxel was within the top 15% (top 20% for one participant) of ∆BOLDhc/BOLD0 and ∆CBFhc/CBF0 voxels in the structural ROI, and that these ∆BOLDhc/BOLD0 and ∆CBFhc/CBF0 voxels overlapped. This procedure ensured complementary maximum ∆BOLDhc/BOLD0 and ∆CBFhc/CBF0 signals in the retained voxels. Average ∆BOLDhc/BOLD0 and ∆CBFhc/CBF0 signals were extracted from this ROI and M was calculated (see Equation (4)).

Structural Images
One T1-weighted MPRAGE image was acquired for each participant: 160 slices, TE = 3.7 ms, repetition time TR = 8.1 ms, sagittal slice orientation, 1 × 1 × 1 mm 3 voxel, 12° flip angle. SIENAX [15,69] was used to obtain measures of grey matter, white matter, and total brain volume normalized by participant's head size. This technique uses partial volume estimation to calculate volume of differing tissue types (see Figure 4B,C). Further, this technique takes into account lesioned tissue, as demarcated by lesion masks (see below), in order to avoid misclassification of this tissue. The final products of these analyses were scaled estimates of each participant's grey matter, white matter, and total brain volume (mm 3 ).
A T2 fluid attenuated inversion recovery (FLAIR) scan was also acquired for each participant: 33 slices, TE = 125 ms, TR = 11,000 ms, no slice gap, transverse slice orientation, 0.45 × 0.45 × 5.00 mm 3 voxel, 120° refocusing angle. FLAIR images were used to estimate the extent of gross lesion burden for each participant. Hyperintense voxels were demarcated using in-house MATLAB code based upon slice-wise, signal intensity (i.e., voxels that were ≥1.25 SD over the slice mean intensity). Next, lesions were manually delineated from the hyperintense tissue by two trained researchers (L.H., S.F.). Manual delineation ruled out false positives in lesion classification due to fat signals, motion, ventricular edge effects, skull, or signal inhomogeneites [70]. Lesion burden was estimated by extracting the number of voxels that were demarcated by the automated and manual procedures. Inter-rater agreement of lesion burden was calculated using a Dice ratio (κ) of the lesion burden estimates made by the two researchers on a sample of several subjects [71]. After the researchers were trained on lesion classification, inter-rater agreement was found to be high, κ = 0.89; where κ > 0.70 is generally thought to reflect excellent inter-rater agreement [72]. Lesion burden was quantified using veCMRO 2 was calculated voxel-wise within the functional ROI using ∆BOLD, ∆CBF, M (which was extracted from functional ROI described below). ven was then calculated similarly. The final product of these analyses was average positive veBOLD, veCBF, and veCMRO 2 , and ven extracted from the functional ROI (see Figure 3).
Because the gas challenge data differed in occipital coverage compared to the visual task data, M was estimated ex situ. To create a functional ROI for the gas challenge, ∆BOLD hc /BOLD 0 and ∆CBF hc /CBF 0 maps were thresholded and extracted from the structural ROI detailed above. The criteria for retention of a voxel within these maps required that the voxel was within the top 15% (top 20% for one participant) of ∆BOLD hc /BOLD 0 and ∆CBF hc /CBF 0 voxels in the structural ROI, and that these ∆BOLD hc /BOLD 0 and ∆CBF hc /CBF 0 voxels overlapped. This procedure ensured complementary maximum ∆BOLD hc /BOLD 0 and ∆CBF hc /CBF 0 signals in the retained voxels. Average ∆BOLD hc /BOLD 0 and ∆CBF hc /CBF 0 signals were extracted from this ROI and M was calculated (see Equation (4)).

Structural Images
One T 1 -weighted MPRAGE image was acquired for each participant: 160 slices, TE = 3.7 ms, repetition time TR = 8.1 ms, sagittal slice orientation, 1 × 1 × 1 mm 3 voxel, 12 • flip angle. SIENAX [15,69] was used to obtain measures of grey matter, white matter, and total brain volume normalized by participant's head size. This technique uses partial volume estimation to calculate volume of differing tissue types (see Figure 4B,C). Further, this technique takes into account lesioned tissue, as demarcated by lesion masks (see below), in order to avoid misclassification of this tissue. The final products of these analyses were scaled estimates of each participant's grey matter, white matter, and total brain volume (mm 3 ).
A T 2 fluid attenuated inversion recovery (FLAIR) scan was also acquired for each participant: 33 slices, TE = 125 ms, TR = 11,000 ms, no slice gap, transverse slice orientation, 0.45 × 0.45 × 5.00 mm 3 voxel, 120 • refocusing angle. FLAIR images were used to estimate the extent of gross lesion burden for each participant. Hyperintense voxels were demarcated using in-house MATLAB code based upon slice-wise, signal intensity (i.e., voxels that were ≥1.25 SD over the slice mean intensity).
Next, lesions were manually delineated from the hyperintense tissue by two trained researchers (L.H., S.F.). Manual delineation ruled out false positives in lesion classification due to fat signals, motion, ventricular edge effects, skull, or signal inhomogeneites [70]. Lesion burden was estimated by extracting the number of voxels that were demarcated by the automated and manual procedures. Inter-rater agreement of lesion burden was calculated using a Dice ratio (κ) of the lesion burden estimates made by the two researchers on a sample of several subjects [71]. After the researchers were trained on lesion classification, inter-rater agreement was found to be high, κ = 0.89; where κ > 0.70 is generally thought to reflect excellent inter-rater agreement [72]. Lesion burden was quantified using absolute (total mm 3 of lesioned tissue; see Figure 4E) and relative scales (percent of total mm 3 of lesioned tissue scaled by uncorrected white matter volume in mm 3 ). Spatially distinct lesion count was also obtained by counting the number of non-touching lesions for each subject (see Figure 4F), e.g., [73]. A lesion was required to have at least 3 mm 3 volume in order to be added to the total lesion count. Thus, the final products of these analyses were absolute lesion volume, relative lesion volume, and spatially distinct lesion count.
Brain Sci. 2017, 7, 64 10 of 22 absolute (total mm 3 of lesioned tissue; see Figure 4E) and relative scales (percent of total mm 3 of lesioned tissue scaled by uncorrected white matter volume in mm 3 ). Spatially distinct lesion count was also obtained by counting the number of non-touching lesions for each subject (see Figure 4F), e.g., [73]. A lesion was required to have at least 3 mm 3 volume in order to be added to the total lesion count. Thus, the final products of these analyses were absolute lesion volume, relative lesion volume, and spatially distinct lesion count.
Automatic Image Registration [75] was performed on raw diffusion-weighted images to correct distortion caused by eddy currents. Six elements of the 3 × 3 diffusion tensor were determined by multivariate least-squares fitting. The tensor was diagonalized to obtain three eigenvalues (λ1-3) and eigenvectors (v1-3). Standard tensor fitting was conducted with DTIStudio [76] to generate the most common DTI-derived diffusion characteristics, fractional anisotropy (FA), axial diffusivity (AD), mean diffusivity (MD), and radial diffusivity (RD).
DTI measurements were obtained at the skeletons of the white matter using FSL [77] to alleviate partial volume effects with tract-based spatial statistics (see Figure 4F-H) [77]. Participant FA maps were registered nonlinearly to the EVE single-subject FA template [78][79][80] for better alignment with a digital white matter atlas (JHU ICBM-DTI-81) [81]. Registered FA maps of all subjects were
DTI measurements were obtained at the skeletons of the white matter using FSL [77] to alleviate partial volume effects with tract-based spatial statistics (see Figure 4F-H) [77]. Participant FA maps were registered nonlinearly to the EVE single-subject FA template [78][79][80] for better alignment with a digital white matter atlas (JHU ICBM-DTI-81) [81]. Registered FA maps of all subjects were averaged to generate a mean FA map, from which an FA skeleton mask was created. Skeletonized FA images of all subjects were obtained by projecting the registered FA images onto the mean FA skeleton mask. Skeletonized AD, MD, and RD metrics were obtained by applying the same registration, projection, and skeletonization procedures. We extracted skeleton-wide averages of each DTI metric (i.e., AD, FA, MD, RD), wherein an average of each metric is calculated across all voxels within the white matter skeleton (see Figure 4A).

Statistical Analyses
All analyses were performed on distributions free of outliers (≥±2 SD from group mean for simple group comparisons, ≥±3 MAD from group median for classification modeling see [82]). Binary logistic regression was used for classifying MS status. A description of model variables can be found in Table 2. The accuracies of these models were computed as the proportion of correct classification outcomes over all outcomes. Accuracy was chosen as the metric of interest because it combines sensitivity and specificity in binary classification analysis by taking into account both true positives and true negatives relative to all outcomes. We used resampling-based hypothesis testing to examine both within-sample and out-of-sample classification of patient status see [83]. Because we used relatively conservative analytic techniques, inherently reducing the likelihood of Type I error and increasing the generalizability of our results, the criterion for a rejection of the null hypothesis was not corrected for multiple comparisons and all models were evaluated at the field-standard α = 0.05. We also denote which hypothesis tests survived Benjamini-Hotchberg correction (Table 4; Figure 7).
Within-sample classification analyses obtained bias-corrected and accelerated (BCa) bootstrappedresampled (B = 10,000) 95% confidence intervals of the accuracy of binary logistic regression models. The BCa procedure was used because it is robust to both skewness and sampling bias in the bootstrap distribution [84]. To avoid unstable classification, we stratified all resamples to match the original sample's constitution of patients and controls, 56.5% and 43.5%, respectively. If the BCa-derived 95% confidence interval did not contain a value at or below 0.50 (binary chance), this would demonstrate the measure's accuracy was significantly greater than chance to classify MS patients and HCs.
Out-of-sample classification analyses used a leave-one-out cross-validation approach [85]. This technique used training and sample iterations to test the ability of the model derived from the training set to predict an observation in the test (out-of-sample) set, thus, circumventing problems of sample bias, model over fitting, and lending a true predictive element to these analyses. Briefly, the leave-one-out cross validation (LOOCV) approach fitted N models, where N was proportional to our sample size. Each model was trained on N-1 samples and then the accuracy of the training model was assessed on the left-out sample. The N accuracies were then averaged to attain a representative and generalizable measure of the average out-of-sample classification accuracy. Permutation based p-values (5000 permutations) were computed to assess the significance of the LOOCV-derived accuracy statistics. The test permuted patient status labels and recomputed the accuracy of the model at each iteration, thus building the null distribution. The p-values were calculated from the percentage of the accuracy estimates of the permuted samples that were better than actual LOOCV-derived accuracy statistic of each model. This procedure was slightly modified according to Ojala and Garriga [86].

Within-Sample Classification Analyses
Measures are ranked on original accuracy and presented in Table 4. Accuracy and smoothed density distributions for the significant and bottom 5 measures can be found in Figure 6.

Out-of-Sample Classification Analyses
Predictors presented in Figure 7 are ranked on LOOCV-derived accuracy.

Discussion
In the present study, we used a neuroimaging approach novel to MS research (cfMRI) to assess the accuracy of veCMRO 2 in classifying MS patients and closely age-and sex-matched HC participants. MS patients showed similar responses to HCs in veBOLD and ven, however showed decreased veCBF and a pronounced decrease in veCMRO 2 relative to HCs. Groups were similar on visual task performance and on physiological measures pertaining to the CO 2 challenge, indicating that potential MS-related changes in physiological response to carbon dioxide, e.g., [87] or visual attention were not likely contributors to group CMRO 2 differences. Within-sample classification analyses demonstrated that veCMRO 2 was significant and one of the top measures to accurately classify MS status, discriminating between MS patients and HCs with exceptional accuracy (82%). Results also showed that within-sample classification accuracy by veCMRO 2 was comparable to neuroimaging measures often used to gauge MS pathology, such as T 2 -FLAIR lesion burden (80% accuracy) and T 1 grey matter volume (81% accuracy). veCMRO 2 was also significantly accurate in MS classification using out-of-sample observations (77% accuracy). The use of such out-of-sample modeling afforded a predictive element to this study and demonstrated that veCMRO 2 can accurately classify new observations of MS and HC participants, offering support for its potential diagnostic utility.
One question that arises from these results is whether veCMRO 2 can add predictive value over other advanced imaging techniques not studied here. For instance, measurements of multifocal visual-evoked potentials have been of great interest to the MS research community. This technique, which uses visual stimulation and electroencephalogram signals in occipital channels proximal to the inion has been demonstrated to (1) more sensitively and specifically detect visual abnormalities in MS eyes relative to other visual-system measurements [88], (2) predict conversion to an MS diagnosis in persons with optic neuritis [89], and (3) relate to the extent of MS-related damage to visual white matter tracts [41]. Not surprisingly, this technique can also accurately discriminate between MS patients and HCs, e.g., [90]. For example, one study showed that measurements gathered from multifocal visual-evoked potentials were on average 74.76% accurate (range: 62.7%-96.1%) in classifying within-sample observations of MS patients without optic neuritis and HCs ( [90], average calculated from Figures 5 and 6, pp. 910-911). We can compare these figures with the within-sample accuracy of veCMRO 2 observed here (82%). This suggests that veCMRO 2 accuracy is in about the same range as multifocal evoked potentials. However, it performs appreciably better than the average multifocal evoked potential measure. Future research directly comparing veCMRO 2 to electroencephalogram and other measures is necessary to more faithfully adjudicate claims about the relative performance of this technique.
A second avenue for future research could involve examining whether the integration of evoked CMRO 2 from other neural systems could maximize MS classification accuracy. Here, we showed significant decreases in MS patients' veCMRO 2 relative to HCs. This variable was also largely accurate in the prediction of MS status. We looked at veCMRO 2 specifically because of robust alterations to the visual system in MS see [37][38][39][40]. However, because (1) mitochondrial alterations are found in multiple forms of neural tissue in MS [31,33] and (2) global brain decreases in oxygen metabolism have been found in MS patients relative to HCs [30], it is likely that evoked CMRO 2 is affected in other neural systems as well. Our work and others' have shown altered patterns of brain activity in MS patients in motor, e.g., [42,91,92] and association cortices [43,[93][94][95], see [96]. It is possible that the addition of measures of evoked CMRO 2 in these areas could lend improvements in the accuracy of MS classification. One advantage of the cfMRI approach over other advanced imaging approaches in MS, like OCT or visual-evoked potentials, is that this technique can specifically and simultaneously assay multiple neural systems. Work underway in our laboratories is examining the extent to which evoked motor and executive system CMRO 2 differs between MS patients and age-and sex-matched healthy HCs, and whether these changes, along with veCMRO 2 , can help build optimal neurodiagnostic models of MS.
The utility of imaging biomarkers in MS is not limited to assisting in diagnosis see [97]. For instance, OCT measures have been shown to be effective in predicting brain atrophy and visual acuity loss in MS see [38]. The retinal nerve fiber thickness and macular volume measures from OCT might also be useful in differentiating different subtypes of MS [98]. Other imaging-based measures, such as T 2 -lesion burden, have shown prognostic ability by prediction of future MS disability, e.g., [99], see also [100][101][102]. One potential avenue for future research is to evaluate the use of oxygen metabolism signals in MS prognosis. For example, Ge and colleagues' [27] research showed that lower resting brain-wide levels of oxygen metabolism were associated with both increased neurological disability and increased lesion burden in MS patients. Although these findings were cross-sectional, they suggested that oxygen metabolism could be a marker of the trajectory of disease course. To wit, future longitudinal work should examine whether measures of oxygen metabolism in early MS can predict future disease progression cf. [89]. veCMRO 2 or resting oxygen metabolic markers could also be evaluated for their abilities to predict the transition from risk states (such as clinically or radiologically isolated syndrome) to clinically definite MS see [100,102,103].
A recent wave of findings related to metabolic dysfunction in MS has led to metabolic hypotheses to explain the pathophysiology of MS see [34][35][36]. For instance, Paling and colleagues furthered an energy failure hypothesis of the pathophysiology of MS [35,104]. These authors postulated a link between white matter damage and energy demand in MS, wherein this damage causes neuroenergetic demand to exceed the supply of metabolic substrate. This hypothesis is largely consistent with the findings of the present study, wherein the observed relative decrease in veCBF (the supply of oxygen and glucose) in MS might have limited the neurometabolic response (veCMRO 2 ) relative to HCs. Further, issues of oxygen extraction due to mitochondrial damage/dysfunction could have also contributed to the relative decrease in veCMRO 2 for MS patients relative to HCs see [34][35][36].
Imaging techniques here and elsewhere have produced convincing biomarkers of MS see [38,97,100]. However, MS is a complex, multifaceted disease. Thus, it is not surprising that our results revealed a diverse array of measures that were accurate in classifying MS patients and HCs. The goal of this work was to examine the ability of a new marker (veCMRO 2 ) to accurately classify MS. However, a truly prodigious advance in MS diagnostics will likely evolve from models that combine many relevant factors. It is possible that a "gold-standard" model of MS diagnostics would contain information about evoked CMRO 2 , along with other information like lesion count, self-reported symptomology, neuropsychological performance, and potentially other strong associates of MS not examined here (e.g., low-contrast letter acuity performance see [105], oligoclonal band status [106], retinal nerve fiber layer thickness see [38]). For instance, research from the Alzheimer's Disease Neuroimaging Initiative showed that a complement of multimodal neuroimaging, cerebrospinal fluid proteins, along with standard clinical evaluations allow for optimal prediction of conversion from mild cognitive impairment to Alzheimer's disease [107]; see also [108] for application in psychiatry.

Conclusions
This study was the first to apply cfMRI in an MS sample. Presently, the intricacies of cfMRI acquisition and post-acquisition processing probably hinder it from having an immediate impact upon routine diagnosis or tracking of MS. However, acquisition continues to be optimized and research is showing promise toward eliminating the gas-challenge component of this method, see [8], which should increase the ease of cfMRI administration and the diversity of patients in which it can be applied. With contemporary research highlighting the importance of neurometabolism in the pathophysiology of MS and continued optimization of this technique, cfMRI shows promise as a translational diagnostic/prognostic tool for MS.
Our findings demonstrated that veCMRO 2 was accurate in classifying both within-and out-of-sample observations of MS patients and HCs. Out-of-sample analyses suggested that predictive models using veCMRO 2 could be useful in MS diagnostics and potentially new cases of MS. Although out-of-sample analyses provide confidence in the generalizability of our findings, larger, independent samples are desirable to confirm the robustness of these effects. However, the present findings represent an encouraging first step in realizing the diagnostic relevance of veCMRO 2 in MS.