Assessing Multi-Site rs-fMRI-Based Connectomic Harmonization Using Information Theory

Several harmonization techniques have recently been proposed for connectomics/networks derived from resting-state functional magnetic resonance imaging (rs-fMRI) acquired at multiple sites. These techniques have the objective of mitigating site-specific biases that complicate its subsequent analysis and, therefore, compromise the quality of the results when these images are analyzed together. Thus, harmonization is indispensable when large cohorts are required in which the data obtained must be independent of the particular condition of each resonator, its make and model, its calibration, and other features or artifacts that may affect the significance of the acquisition. To date, no assessment of the actual efficacy of these harmonization techniques has been proposed. In this work, we apply recently introduced Information Theory tools to analyze the effectiveness of these techniques, developing a methodology that allows us to compare different harmonization models. We demonstrate the usefulness of this methodology by applying it to some of the most widespread harmonization frameworks and datasets. As a result, we are able to show that some of these techniques are indeed ineffective since the acquisition site can still be determined from the fMRI data after the processing.


Introduction
Magnetic resonance imaging (MRI) is an imaging modality that allows, among other things, the monitoring and sensing of the neuronal activity in the brain. Several drivers are raising the application horizons of MRI imaging. To mention the main ones, MRI is innocuous and non-invasive, resonators are steadily becoming less expensive, and with increasing capabilities, the knowledge of the human brain's anatomy and physiology is turning more definite and precise, and new analysis techniques are able to extract subtle, latent information that is fundamental to several research and clinical purposes. In particular, resting-state functional magnetic resonance imaging (rs-fMRI) is able to capture interactions between brain regions that, in turn, can lead to evaluating several biomarkers of interest. Currently, acquisitions obtained from rs-fMRI data make it possible to thoroughly study several aspects of the human brain function, both in healthy subjects and those diagnosed with neurological or even psychiatric conditions, including Alzheimer's disease [1,2], schizophrenia [3,4], and autism spectrum disorder [5,6]. In order to carry out these studies correctly and obtain an adequate statistical significance, it is necessary to have a high number of acquisitions, which is not always possible in a single site and session. Therefore, the number of studies using images acquired at multiple sites has been increasing over the years [7][8][9][10]. This allows the speeding up the data collection and analysis process, and increasing the sample size naturally leads to greater predictive power and more sophisticated studies. Multi-site acquisition is also critically important Overview of our technique to analyze, with the help of Information Theory, the effectiveness of the harmonization techniques for multi-site acquired data. From a set of multi-site fMRI data, we generate the Phase Interaction Matrix and, from there, we verify in the Shannon-Fisher plane the quality of the application of different harmonization techniques.

Datasets
In this section, we describe the different datasets used for testing. Since the aim is to compare the sites within each multi-site dataset, we can see that it does not matter that each downloaded dataset had different processing steps already applied. Further, it is worth noting that three of the datasets are given as sets of time series (IMPAC, ABIDE, and ADHD-200), while one was given as a set of connectivity matrices of Pearson correlation values (SRPBS). See below.

ABIDE
ABIDE [41] is a publicly accessible repository of 20-site rs-fMRI data from 17 different international institutions. The dataset is composed of scans of 539 individuals diagnosed with autism spectrum disorder and 573 healthy control individuals. However, because some data had downloading problems, the number of total subjects was reduced from 1112 to 884. The image acquisition parameters can be found at https://fcon_1000.projects.nitrc. org/indi/abide/, (accessed on 15 April 2022). The data were preprocessed using the standard pipeline DPARSF (Data Processing Assistant for Resting-State fMRI Toolbox), which is based on the parametric statistical mapping. Finally, the time series were extracted following the definitions of seven different atlases: Talaraich

ADHD-200
As part of the International Neuroimaging Datasharing Initiative (INDI), the ADHD-200 dataset [42] is a collaboration of 8 imaging sites, composed of neuroimaging data from 362 children and adolescents diagnosed with Attention Deficit Hyperactivity Disorder (ADHD) and 585 typically developing controls (total 947 subjects). As some scans had downloading problems or missing data, the number of subjects used in this work was reduced to a total of 768 subjects. The data were preprocessed using the Athena pipeline, which is based on tools from the AFNI and FSL software packages. The time series extracted from the images are based on six different atlases: Talaraich-Tournoux (97 ROIs), Harvard-Oxford (111 ROIs), AAL (116 ROIs), Eickhoff-Zilles (116 ROIs), Craddock (190 ROIs), and Craddock (351 ROIs). The data can be found at https://www.nitrc.org/frs/?group_id=383, (accessed on 15 April 2022), and details of the pipeline at https://www.nitrc.org/plugins/mwiki/index.php/ neurobureau:AthenaPipeline, (accessed on 15 April 2022). The imaging parameters are detailed at http://fcon_1000.projects.nitrc.org/indi/adhd200/, (accessed on 15 April 2022).

SRPBS
This dataset [43,44] includes data from subjects with four different diagnoses and healthy control subjects who were examined at nine sites corresponding to eight institutions. Of the 805 participants, 482 are healthy, 161 have a major depressive disorder, 49 have autism spectrum disorder, 65 have obsessive-compulsive disorder, and 48 have schizophrenia. Each participant underwent a single session of rs-fMRI for 5 to 10 min. The time series extraction procedure is detailed in the work by Yamashita and co-authors [31], where 268 regions of interest were delineated. Of note, participants who reported high levels of head movement were excluded, resulting in a reduction in the size of the dataset to 637 subjects. It is worth mentioning that this dataset also includes traveling-subject data, and it thus could be applicable to the traveling-subject harmonization methodology. The data are available at https://bicr.atr.jp/dcn/en/download/harmonization/, (accessed on 15 April 2022), while scan parameters can be found at https://bicr.atr.jp/rs-fmri-protocol-2/, (accessed on 15 April 2022).

Traveling-Subject Dataset
As will be explained in a later section, the traveling-subject dataset [31] is necessary to estimate measurement bias across sites in the SRPBS dataset. It is composed of data of 9 healthy participants that were scanned at each of 12 sites, which included the 9 sites in the SRPBS dataset, producing a total of 411 scan sessions.

Connectivity
We started by applying a bandpass filter of 0.04-0.07 Hz to the BOLD time series to select adequate low frequencies. Then, the instantaneous phases φ i (t) of each region i were estimated by applying the Hilbert transform to the filtered signals. The phase coherence P ij (t) between two regions i and j at a time t was calculated using the cosine of the phase difference, as shown in Equation (1).
Since the Hilbert transform expresses signals in polar coordinates, using the cosine function makes two regions have a phase coherence close to 1 when their time series are in phase, 0 when they are orthogonal, and −1 when they are out of phase. In this way, the phase interaction matrix P(t) represents the instantaneous phase synchrony between each of the regions [45,46]. This procedure results in a matrix of size N × N × T for each subject, where N is the number of regions and T is the total number of points in the time series: Since T is a different value for each subject, the temporal dimension was eliminated by averaging each of the matrices over time. The procedure detailed above was applied to the IMPAC, ABIDE, and ADHD-200 datasets. For the SRPBS dataset, on the other hand, we worked with correlation matrices since the data provided by the dataset correspond to the Pearson correlation coefficient's values are widely used in previous studies.

Harmonization
As introduced in the previous section, to reduce potential biases and non-biological variances introduced by different acquisition sites and scanners, there are various methods of data harmonization. In this work, the effectiveness of three of them is evaluated. One is ComBat [21][22][23], which is probably the most accepted and widely used in the literature. The second is CovBat [28], which is a refined and improved version of ComBat. The last one is the traveling-subject method, whose authors have shown that it can be more effective than ComBat in certain cases [31]. Below we briefly describe their main features. It is important to emphasize that in this work, for both ComBat and CovBat, the biological covariates to be protected during the removal of scanner/site effects were defined as the gender, age, and diagnosis of each of the subjects.

ComBat
The ComBat technique, originally created to be used in Genomics analysis, is perhaps the most commonly used for the harmonization of brain connectivity data. It is based on Bayes' empirical method, assuming that errors in the data can be corrected by adjusting the means and variances of the different acquisition sites. It has been shown to be able to eliminate site differences while adequately maintaining biological variability [24,26]. Defining y tjv as the evaluation at site t, participant j. and characteristic v, the ComBat regression model can be written as where α v is the average connectivity of the feature v, X tj is the design matrix for the covariances of interest, and β v is the regression vector of coefficients corresponding to X. In turn, γ tv and δ tv represent the additive and multiplicative terms of the site i for the feature v, respectively, while tjv represents an error term assumed to arise from a normal distribution with zero mean and variance σ 2 v . The values harmonized by ComBat are then defined as: where γ * tv and δ * tv are the empirical Bayesian estimates of the parameters γ tv and δ tv . Therefore, biological and non-biological terms are modeled and estimated to algebraically eliminate the additive and multiplicative effects of the sites. The calculations were made using the library available at https://github.com/Jfortin1/ComBatHarmonization, (accessed on 15 April 2022).

CovBat
The method called Correcting Covariance Batch Effects (CovBat) was proposed to remove site effects in mean, variance, and covariance. It was built on top of the ComBat framework, assuming that the features follow Equation (3). However, the error vectors tjv may be spatially correlated and differ in covariance across sites, so this method modifies principal component scores to shift each within-site covariance to the pooled covariance structure. This means that the first term of Equation (4) is assumed to have a mean of 0, but its covariance matrix Σ may differ between sites. Therefore, principal component analysis (PCA) is performed to obtain an estimation of the eigenvalues λ and eigenvectors φ of Σ. The principal component scores are defined as where tjk is a zero-mean normal distribution and µ tk and ρ tk are the center and scale parameters corresponding to principal components k = 1, 2, · · · , K where K is a hyperparameter selected to capture the desired proportion of the variation in the observations. The parameters are estimated by finding the values that bring each site's mean and variance in scores to the pooled mean and variance. Then, the site effects are removed via The CovBat-adjusted residuals are obtained as Adding the intercepts and covariates' effects, the harmonized values result in

Traveling-Subject Method
This method is based on the identification of measurement biases, sampling biases, disorder factors, and subject factors. The measurement bias m for each site is defined as the deviation of the connectivity value between each pair of regions of interest from its average over all sites and is due to the differences between the properties of the scanners involved. The sampling bias s, introduced due to differences in participant groups between sites, is assumed to be different for subjects diagnosed with different disorders. Disorder factors d are defined as deviations from control subjects. In turn, the factors of the subjects p are calculated as the deviation of the connectivity from the average of the participants.
All the biases and factors mentioned above are estimated by fitting a linear regression model using ordinary least squares with L 2 regularization. The connectivity value v for subject j is then given by where const represents the average connectivity between all participants and e the noise.
The vectors x m , x s , x d , and x p are one-hot encoded. It is difficult to separate differences between sites using a single dataset because the two types of biases defined are correlated across sites. Therefore, in order to use this method, it is necessary to have an extra dataset (the so-called traveling-subject dataset), where the participants are constant; that is, it must be composed of scans of a constant set of healthy subjects in each of the sites in the original dataset. This is why the measurement bias is estimated only using the traveling-subject dataset. It is precisely the acquisition of this traveling-subject dataset that presents the main problem of the technique, since it requires that the same reference subjects physically travel to all the acquisition sites, with the consequent logistic problem that this implies. Finally, matching is achieved by subtracting the estimated measurement biases m v from the connectivity values, resulting in y Traveling jv

Assessment of Harmonization Quality
In this subsection, we will introduce the specific theoretic information foundations that we use in this work. As already mentioned, the measures and methods derived from this background are capable of assessing the actual effectiveness of the different harmonization measures. Readers familiar with Information Theory, Shannon entropy, Fisher Information, and the causality-complexity plane can skip the reading of this subsection.

Information Theory Measures
A key aspect of Information Theory is the concept of entropy as a measure of the uncertainty involved in the outcome of a random variable or process. These outcomes, in turn, are related to the probabilities (or relative frequencies) of the possible values that the variable or process may hold. Then, as the first step in our application of Information Theory in the context of rs-fMRI, it is necessary to define the probability distribution for the data. Our strategy was to binarize both the averaged phase interaction matrices and the correlation matrices using a threshold value of 0.5. Any other nontrivial threshold can be applied, and experiments show that the results to be exposed below are robust with respect to this choice in a wide range (f.e., from 0.2 to 0.8). Connections with values greater than the threshold were assigned to 1, while the others were assigned to 0. In this way, each of the new matrices results in an adjacency matrix A that represents the neural graph of a subject model of the given threshold. Once obtained, the probability that a random walk goes from a node i of a graph to any other node j is calculated. This probability distribution p i→j is defined for each node as where k i is the degree of the node, and its value is obtained as k i = ∑ j a ij .

Shannon Entropy
Based on the P (i) distribution, the Shannon Entropy for each node can be defined as where P (i) = {p i→j : j = 1, ..., N} is the probability distribution vector associated to node i. In turn, the Normalized Nodal Entropy for node i is obtained as Finally, the Normalized Network Shannon Entropy (SE) is calculated by averaging the Normalized Nodal Entropy over the entire network, resulting in SE is a global disorder measure commonly used in various applications of Information Theory. An advantage of SE in networks is that it is relatively insensitive to substantial changes in distributions that are concentrated in a small region of space. Therefore, it is able to quantify the heterogeneity of networks: H → 0 for sparse networks and H → 1 for fully connected networks.

Fisher Information
Fisher Information is a statistical model aimed at measuring how much information about an unknown parameter can be obtained from a sample. In other words, it assesses the amount of information that an observable random variable within a population carries about an unknown parameter of the distribution that models the population. Using the notation of the previous subsection, the Normalized Fisher Information for a node i is given by Then, the Normalized Network Fisher Information (FI) is defined as This measure can be interpreted in various ways, for instance, the ability to precisely estimate a parameter, the amount of information that can be extracted from a set of measures, or the state of disorder of a system. Unlike Shannon Entropy, Fisher Information is a local measure based on the gradient of the underlying distribution, so it is significantly sensitive to localized disturbances in small regions.

Shannon-Fisher Plane
The use of the Shannon-Fisher plane was originally proposed by Vignat and Bercher [47], who defined it to show that through the simultaneous examination of both Shannon Entropy and Fisher Information, the non-stationary behavior of a complex signal may be characterized. Without any assumption on the nature of the data, the Shannon-Fisher area can be simply defined as: Using this plane, we can find that our system lies in a very ordered state when the Shannon Entropy H ∼ 0 and the Fisher Information F ∼ 1. However, when the system stays in a very disordered state, we obtain that H ∼ 1 and F ∼ 0 [48]. In general, it is widely accepted that the Shannon-Fisher Information plane is an effective tool to contrast global and local characteristics of a given probability distribution. In our case, and as performed by Freitas et al. [37], each of the networks is placed in the Shannon-Fisher plane that arises from the two measurements that have been explained in the previous sections.

Quantification Measures
We define the null hypothesis as that the population median of all of the sites are equal, the Kruskal-Wallis [49] test was used twice to quantify the magnitude of the effects of the acquisition sites for SE and FI. All possible combinations of datasets, atlases, and harmonization methods were analyzed: 21 cases with no harmonization, 21 cases harmonized with ComBat, 19 cases with CovBat, and 1 case with the traveling-subject method. It is worth noting that two cases (ABIDE/Craddock400/CovBat and ADHD-200/Craddock351/CovBat) are missing due to limitations in computing resources. As stated in a previous section, for the traveling-subject method, an extra dataset is needed, so the only possible case to evaluate that method is the one from the SRPBS dataset. The transformation p = − log p was applied to the p-values obtained with the test to facilitate the comparison and interpretation of the results. Table 1 shows the comparison of the harmonization methods for the different datasets, atlases, and Information Theory measures, while the figures in the next subsection visually illustrate the performance of each method described in this manuscript. Analyzing the results for the case without the harmonization stage, it is observed that the influence of the acquisition site is considerable, even more so in the IMPAC database. In addition, the impact is greater in the SE measure than in the FI measure for all cases and in the atlases with a larger number of ROIs for most cases (ABIDE/Dosenbach is the largest exception). When applying the ComBat method, substantial improvements are obtained in all cases, i.e., the site influences are unnoticeable and not statistically significant, achieving p ≥ 0.05 in 12 cases. The harmonization is still better for the atlases with lower amounts of ROIs, but now the impact on the Information Theory measures is reversed: for all cases, the site effects are stronger for FI than for SE. The data for which the quality of the harmonization obtained are the lowest comes from the ADHD-200 dataset. This is due, as can be seen in Figure 2, to the existence of an outlier that may come from severe misalignments present in the scan of that particular subject. The CovBat method achieves similar results, slightly outperforming ComBat in most cases, especially when applied to atlases with no more than 200 ROIs. Finally, the traveling-subject method also represents an improvement over the case without harmonization, but it is almost insignificant compared to ComBat and CovBat.

Visualization
The relationships observed by analyzing Table 1 can also be visualized by plotting the Shannon-Fisher plane for the different cases. In this subsection, the most relevant results for the analysis are presented, while the others can be found in the Supplementary Material. The first row of Figure 3 shows the distribution of all the subjects of the IMPAC dataset within the Shannon-Fisher plane without previously applying a harmonization technique. Acquisition sites 28 and 31 present a significant distance from the center, so they are shown separately in the figures. In the second row, the planes corresponding to the ComBat method are presented, where the distribution is grouped in a much more uniform way. The same happens in the third row for the CovBat method.
In Figure 4, both the effectiveness of ComBat/CovBat and the inability of the travelingsubject method to harmonize the measures extracted from the SRPBS dataset are evident. For the unharmonized case, we can observe that sites 5 and 6 lay outside the "cloud" of the rest of the measures, which can be attributed to acquisition differences due to already mentioned factors other than the biological ones (e.g., differences in the actual equipment, differences in the used functional BOLD MRI sequence settings, improper calibration, etc.) This separation is no longer appreciable after the application of ComBat or CovBat but remains practically unchanged with the traveling-subject method. Figures 2 and 5 show the resulting planes after applying the ComBat and CovBat methods to, respectively, the ABIDE dataset and the ADHD-200 dataset. As can be seen at the bottom left of the planes corresponding to the ADHD-200 dataset, there is an outlier that could not be removed with any harmonization method, probably caused by some kind of misalignment in the scanning device.

Discussion and Conclusions
These results provide new evidence about the importance of having techniques capable of removing unwanted biases caused by the acquisition sites from rs-fMRI images in order to merge data obtained by means of different scanners without incurring methodological errors. A set of tools based on Information Theory was presented to discern the quality of harmonization techniques for multi-site rs-fMRI measurements, allowing the quality of such techniques to be verified. This set of tools, when applied to data harmonized with different techniques, makes it possible to determine if there are still traces of the original locations or if the data are reliable for the subsequent specific treatment depending on the problem to be treated. We should mention that part of inter-site heterogeneity might be caused by differences in sample demographics (e.g., race, education, background), which cannot be addressed with the harmonization methods presented here. On the other hand, we observed that the tools performed best with a low number of ROIs, as shown in the Results Section. This is probably due to the finer granularity and higher detail associated with a higher-order (i.e., with more ROIs) atlas, which allows capturing more inter-subject features (i.e., biological features that do not depend on the site). Further, variability in fMRI imaging parameters across sites may affect the quality of harmonization. In general, we observed that ABIDE, IMPAC, and ADHD-200 MRI scan parameters vary a lot from site to site. As a consequence, this could induce a lot of variability in the BOLD SNR obtainable from each site. Given that the rs-fMRI connectivity metrics have sensitive fMRI BOLD SNR, this, in turn, could have an impact on the effectiveness of harmonization. For instance, as we mentioned, the disalignment in some individual cases may force ComBat and CovBat to fail to fully harmonize the full ADHD dataset, as can be seen in Figure 2, which could be due to differences in MRI imaging parameters between that site and other sites in the ADHD dataset, and between individual runs.
In general, we observe that the theoretical information analysis reveals that ComBat and CovBat provide better results than the traveling-subject method, showing that these tools are effective in discerning the subtle details of the registration site that were not removed by the harmonization method. As a rule of thumb, we could say that CovBat provides the best results for datasets with less than 200 ROIs, while ComBat excels otherwise. The harmonization effectiveness of both Normalized Network Shannon Entropy (SE) and Normalized Network Fisher Information (FI) measures seem to deteriorate as the number of ROIs in the parcellation schemes increases. This could be due to variations between sites in the fMRI SNR. SE measures uncertainty, and thus a larger number of ROIs implies smaller sizes, which makes them more sensitive to variations in fMRI SNR and hence variability in scanning parameter-related site effects. FI, in turn, measures information, and thus a larger number of ROIs will be more sensitive to site effects as they yield more information about the site.
It is worth noticing that the results obtained in this work do not correspond to those obtained by Yamashita and co-authors [31], where they achieved a greater reduction in measurement bias using their traveling-subject method than with other techniques, i.e., ComBat. This dichotomy could be an indication that their harmonization technique is useful in specific analyses, as performed in their paper, but is not robust to other processing methodologies, such as Information Theory measures. In particular, our analysis of the original data with the Shannon-Fisher plane revealed that the site-related information was not completely removed by the harmonization process, which may render the method inadequate for further comparative analysis, while ComBat presents a more general and robust performance. Another disadvantage of the traveling-subject method is its high cost and time consumption due to the need for a large group of participants to travel to all the sites involved. Therefore, applying this method to correlation matrices requires a wider logistic basis to achieve significant harmonization, while both ComBat and CovBat present very good results.
Finally, it is worth mentioning that the workflow presented in this paper cannot be applied directly to images obtained with rs-fMRI studies; it is necessary to have the corresponding BOLD time series. Hence, for this study, from the wide spectrum of existing harmonization methods [20], we used the three compatible with this type of data. One interesting avenue for further research is to extend the workflow to more general types of information, which could be performed by properly defining the respective information theoretical measures. Further, for future work, we consider it relevant to be able to extend the analysis through the Shannon-Fisher plane to graphs and probabilistic networks (without requiring prior thresholding). Finally, we will investigate if these results can also be reproduced using the Pearson correlation matrix instead of the phase interaction matrix.