Cognitive neuroscience research aims to explore relationships between various neural and behavioral measures in order to examine the underlying peripheral and central neural mechanisms across testing conditions and subject populations. For this purpose, the bivariate Pearson correlation is commonly used to examine the strength of the linear relationship between two continuous variables of interest, which can be graphically represented by fitting a least-squares regression line in a scatter plot [1]. If the variables do not represent continuous data or if the relationship between the two variables is non-linear, other types of bivariate correlation tests, such as Spearman or point-biserial correlations, can be used. However, when a study involves multivariate data, the conventional correlation method only allows for the examination of one predictor and one outcome variable at a time. Even if the Pearson correlation results are adjusted for multiple comparisons or a simple multiple regression model is applied, the statistical treatment may not take into account the complex relationships and categorical grouping terms that likely exist among the multiple within-subject predictor variables [2].
In consideration of the violation of the assumption of sample independence required by bivariate Pearson correlations and the like, researchers have long argued for the necessity of more sophisticated statistical techniques to handle repeated measures from the same subjects [3]. The use of mixed-effects (or multilevel) models has recently captured attention in longitudinal medical research [6], behavioral and social sciences research [15] (including speech and hearing research [20]), and neurophysiological and neuroimaging research [42]. Its increasing popularity is shown by the exponential growth over the last three decades in the number of publications in the scientific literature (Figure 1).
Data analysis using mixed-effects regression models allows for the examination of how multiple variables predict an outcome measure of interest beyond what a simple multiple regression model can handle [2]. In addition to the fixed effects in a conventional multiple regression model, a mixed-effects model includes random effects associated with individual experimental units that have prior distributions. Thus, mixed-effects models are able to represent the covariance structure that is inherent in the experimental design. In particular, the linear and generalized linear mixed-effects models (LME or GLME), as implemented in popular software packages such as R, prove to be a powerful tool that allows researchers to examine the effects of several predictor variables (or fixed effects) and their interactions on a particular outcome variable while taking into account grouping factors and the existing covariance structure in the repeated measures data. For instance, adding research participants as a random effect in an LME model allows investigators to resolve the issue of independence among repeated measures by controlling for individual variation among participants. Essentially, the inclusion of subject as a random effect in the model assumes that each participant has a unique intercept, or “baseline”, for each variable. Linear mixed-effects models also allow for an understanding of how changes in an individual predictor variable, among other co-existing variables, impact the outcome measure. The resulting regression coefficients provide more detailed information about relationships among predictors and outcome variables than Pearson correlation coefficients, as the Pearson correlation coefficient simply measures the strength of the linear relationship between each selected pair of variables independent of the others. Additionally, driven by the research questions and the nature of the independent and dependent variables, researchers can build and compare LME models differing in complexity to best summarize findings.
Many possibilities regarding appropriate types of models, necessary data transformations to achieve linearity for each variable, and the inclusion of interaction terms as well as random slopes or intercepts can be considered.
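The random-intercept structure described above can be written compactly. As a minimal sketch (the symbols below are illustrative, not taken from the published models), with a single predictor x measured on trial i for subject j:

```latex
% Random-intercept LME: each subject j has a unique baseline u_j
y_{ij} = \beta_0 + u_j + \beta_1 x_{ij} + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $\beta_0$ and $\beta_1$ are fixed effects shared across subjects, while the random intercept $u_j$ captures each participant's deviation from the group baseline; it is this term that accounts for the non-independence of repeated measures within a subject.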
Despite the wide acceptance of the LME method and similar approaches for multivariate data analysis, researchers do not necessarily take into account the differences between Pearson correlation and LME models when choosing the proper statistical treatment of their data. The current side-by-side comparison was prompted by the successive publication of two recent studies from our lab that respectively used conventional Pearson correlations and the more sophisticated linear mixed-effects regression models. In particular, our first study investigated whether noise-induced trial-by-trial changes in cortical oscillatory rhythms in the ongoing auditory electroencephalography (EEG) signal could account for the basic evoked response components in the averaged event-related potential (ERP) waveforms for speech stimuli in quiet and noisy listening conditions [54]. When the first study was submitted, we were not aware of the importance and relevance of the LME approach to the analysis of our data set. Even though the paper went through two rounds of revisions, the two anonymous peer reviewers did not raise any concerns about the use of Pearson correlation in our analysis. Our second study further examined whether the noise-induced changes in trial-by-trial neural phase locking, as measured by inter-trial phase coherence (ITPC) and spectral EEG power, could predict averaged mismatch negativity (MMN) responses for detecting a consonant change and a vowel change, and whether the cortical MMN response itself could predict speech perception in noise at both the syllable and sentence levels [57]. In the publication process of the second study, reviewers questioned the validity of the Pearson correlation analysis for the multiple measures obtained for the same speech stimuli from the same group of subjects, which led to a major revision adopting the LME regression analysis. In hindsight, as the trial-by-trial oscillations and the averaged ERPs are different analysis techniques applied to the same EEG signal, it would have been appropriate to choose the LME models to report the statistical results in our first publication.
As these two previous publications in auditory neuroscience each reported correlation results using only one statistical approach, a direct comparison of the Pearson correlation and LME approaches can be helpful to highlight the differences in the statistical results. Although our examples here are exclusively focused on speech perception research, the comparisons of the statistical results are presented to advocate for proper implementation of statistical modeling and interpretation of multivariate data analysis in future studies of cognitive neuroscience and experimental psychology.
2. Study 1
Koerner and Zhang [54] aimed to determine whether noise-induced changes in trial-by-trial neural synchrony in the delta (0.5–4 Hz), theta (4–8 Hz), and alpha (8–12 Hz) frequency bands in response to the syllable /bu/, presented in quiet and in speech babble background noise at a −3 dB SNR (signal-to-noise ratio), were predictive of variation in the N1–P2 ERPs across participants.
2.1. Statistical Methods
In the published data [54], Pearson correlations were used to examine the strength of linear relationships between ITPC and the N1–P2 amplitude and latency measures pooled across the two listening conditions for each participant and frequency band, resulting in 12 correlations. The reported p-values were adjusted for multiple comparisons. Prior to this analysis, scatterplots were used to check the linearity of each pair of continuous variables. The ITPC values ranged from 0 to 1, where 1 represents perfect synchronization across trials and 0 represents no synchronization at all. Separate repeated measures analyses of variance (ANOVAs) were also used to examine the effects of background noise on ITPC and the N1–P2 latency and amplitude measures; the resulting p-values were likewise adjusted for multiple comparisons. For the current comparative report, linear mixed-effects models were developed using R [55] and the nlme package. Participants were treated as a by-subject random effect, and listening condition (quiet vs. noise) was included as a blocking variable in each linear mixed-effects model. ITPC values at the time points associated with the N1 and P2 responses in the delta, theta, and alpha frequency bands were included as fixed effects. For each Pearson correlation and linear mixed-effects model, the significance of each variable in predicting the outcome measure was assessed at the 0.05 significance level.
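The published models were fit in R with the nlme package; as an illustration only, an analogous random-intercept model can be sketched in Python with statsmodels' MixedLM. All variable names and the simulated data below are hypothetical, not the published measurements.

```python
# Illustrative sketch, NOT the published analysis (which used R's nlme).
# A random-intercept model with condition as a fixed blocking factor and
# band-specific ITPC values as fixed-effect predictors, on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for subj in range(20):                      # hypothetical participants
    baseline = rng.normal(0, 0.5)           # subject-specific intercept
    for cond in ("quiet", "noise"):
        itpc_delta = rng.uniform(0.2, 0.9)
        itpc_theta = rng.uniform(0.2, 0.9)
        itpc_alpha = rng.uniform(0.2, 0.9)
        # Simulated N1 amplitude driven mainly by delta-band ITPC
        n1_amp = baseline - 1.0 * itpc_delta + rng.normal(0, 0.3)
        rows.append(dict(subject=subj, condition=cond,
                         itpc_delta=itpc_delta, itpc_theta=itpc_theta,
                         itpc_alpha=itpc_alpha, n1_amp=n1_amp))
df = pd.DataFrame(rows)

# groups=subject gives each participant its own random intercept
model = smf.mixedlm("n1_amp ~ itpc_delta + itpc_theta + itpc_alpha + condition",
                    df, groups=df["subject"])
result = model.fit()
print(result.summary())
```

In this sketch, the fixed-effect coefficient for `itpc_delta` recovers the simulated negative relationship while the random intercept absorbs the repeated measures per subject, mirroring the model structure described above.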
2.2. Results

Koerner and Zhang [54] provided detailed results from the repeated measures ANOVAs and the Pearson correlations (see the replicated Table 1 for a summary of correlation coefficients). The repeated measures ANOVA revealed significant noise-induced delays in N1 (F(1, 10) = 53.71, p < 0.001) and P2 (F(1, 10) = 22.27, p < 0.001) latency as well as a significant reduction in N1 amplitude (F(1, 10) = 13.85, p < 0.01). Additionally, the repeated measures ANOVA revealed significant noise-induced reductions in ITPC for N1 in the delta (F(1, 10) = 20.68, p < 0.01), theta (F(1, 10) = 18.51, p < 0.01), and alpha (F(1, 10) = 23.45, p < 0.001) frequency bands as well as for P2 in the delta (F(1, 10) = 13.27, p < 0.01), theta (F(1, 10) = 14.86, p < 0.01), and alpha (F(1, 10) = 14.57, p < 0.001) frequency bands.
Results from the Pearson correlation tests showed that ITPC was significantly correlated with N1 latency in delta (r = −0.586, p < 0.01), theta (r = −0.521, p < 0.05), and alpha (r = −0.510, p < 0.05) frequency bands. Similarly, significant correlations were found between ITPC and N1 amplitude in delta (r = 0.780, p < 0.001), theta (r = −0.765, p < 0.001), and alpha (r = −0.720, p < 0.001) frequency bands. Correlational analysis also revealed significant correlations between ITPC and P2 latency in delta (r = −0.468, p < 0.05), theta (r = −0.575, p < 0.01), and alpha (r = −0.586, p < 0.01) frequency bands as well as between ITPC and P2 amplitude in delta (r = 0.666, p < 0.01), theta (r = 0.612, p < 0.01), and alpha (r = 0.599, p < 0.01) frequency bands.
Results from the linear mixed-effects models showed that ITPC in the delta frequency band was a significant predictor of N1 (F(1, 7) = 16.12, p < 0.01) and P2 (F(1, 7) = 10.72, p < 0.05) amplitude across listening conditions. Neural synchrony in the alpha frequency band was a significant predictor of N1 latency (F(1, 7) = 12.51, p < 0.05) across listening conditions. Potential interaction effects were statistically nonsignificant when examined in a full LME model and were therefore removed from the report. An examination of the regression coefficients allows for an interpretation of how each fixed effect is related to the outcome measure of interest. For example, a one-point decrease in delta-band ITPC is associated with a 1.05 unit increase in N1 amplitude (see Table 2 for a summary of F-statistics and regression coefficients (B)). The residuals from each linear mixed-effects model were normally distributed, and the residual plots did not reveal heteroscedasticity or significant trends. Therefore, generalized linear models would not be expected to provide better results.
This report compared results from Pearson correlations and linear mixed-effects regression models using data from two published ERP studies. It was determined that Pearson correlations were not appropriate for examining relationships in our data, which contained built-in differences across within-subject repeated measures. The results showed how linear mixed-effects regression models (after verification of normality of residuals and homogeneity of variance) are able to depict relationships between the predictor and outcome variables while taking into account repeated measures across participants. While the LME models were able to confirm the basic conclusions gained from the Pearson correlation analyses for both studies [54,57], a comparison of the methods and results for each model highlighted differences between the two approaches.
The repeated measures ANOVA indicated that background noise had a significant effect on N1 and P2 latencies as well as N1 amplitudes in response to the syllable /bu/ [54]. Similarly, the repeated measures ANOVA revealed that MMN latency, amplitude, and spectral power were significantly impacted by background noise [57]. These results support the possibility that pooling data from the quiet and noise listening conditions created a built-in contrast and bias between data points when Pearson correlations were used, which partly led to the overestimation of association strength in the reported results (Table 1 and Table 3). In other words, the Pearson correlation analysis ignores these built-in differences and treats this type of data as if each variable in the repeated measures design were independent and normally distributed across the two listening conditions. The resulting p-values represent the probability of observing an effect as large as, or larger than, the one observed if there were no covariance structure in the repeated measures. In contrast, the LME regression analysis was able to account for the covariance structure and grouping factors in the repeated measures. Tests of significance from the LME models examined whether each predictor variable, or fixed effect, was significantly different from zero while taking into account the other fixed and random effects in the model.
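The pooling artifact described above can be illustrated with a small simulation. The condition means and spreads below are invented for illustration (they are not the published data): when two conditions differ in their means on both variables, the pooled Pearson correlation can be large even though the variables are uncorrelated within each condition.

```python
# Hypothetical demonstration of correlation inflation from pooling two
# conditions with built-in mean differences; all numbers are simulated.
import math
import random

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)
n = 11  # hypothetical number of participants
# Within each condition, "ITPC" and "amplitude" vary independently:
quiet_itpc = [random.gauss(0.8, 0.05) for _ in range(n)]
quiet_amp  = [random.gauss(5.0, 0.5) for _ in range(n)]
noise_itpc = [random.gauss(0.4, 0.05) for _ in range(n)]
noise_amp  = [random.gauss(2.0, 0.5) for _ in range(n)]

r_quiet  = pearson(quiet_itpc, quiet_amp)     # no true relationship
r_pooled = pearson(quiet_itpc + noise_itpc,   # inflated by the mean shift
                   quiet_amp + noise_amp)
print(f"within-quiet r = {r_quiet:.2f}, pooled r = {r_pooled:.2f}")
```

The pooled coefficient is driven almost entirely by the between-condition separation, which is exactly the built-in contrast that the random and blocking terms of an LME model are meant to absorb.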
One issue common to regression analysis concerns the possible existence of multicollinearity (i.e., high correlations) among the predictor variables and how it may destabilize the estimates of the regression coefficients, yielding, for example, an overall significant model with no significant predictors [2]. In mixed-effects (or multilevel) models, the implementation of fixed and random effects allows control of the within-subject factor for repeated measures, and a stepwise approach allows predictor variables to be removed in a systematic fashion, for instance, by calculating a variance inflation factor (VIF) to identify collinear predictors and guide their removal from the LME models. The VIF for a given predictor is computed as 1/(1 − R²), where R² is the proportion of variance in that predictor accounted for by all the other predictors in the model. Estimating the VIF for each predictor and progressively dropping the predictor with the largest VIF beyond a cutoff criterion can be helpful in dealing with the collinearity of interaction terms. By contrast, Pearson correlation analysis assumes independence of the variables, and only fixed effects are examined piecewise, without elaborate procedures to take into account how the existing associations and differences among the predictor variables may contribute to (and oftentimes inflate) the correlation coefficients. The bivariate Pearson correlation analysis disregards potential correlations and data groupings among variables, which makes it inappropriate for research questions that aim to examine associations between variables that contain built-in differences between experimental conditions or subject groups.
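As a minimal numerical sketch of the VIF (with made-up data): for a model with exactly two predictors, the VIF reduces to 1/(1 − r²), where r is the Pearson correlation between them.

```python
# Toy VIF check for two nearly redundant predictors; values are simulated.
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(2)
x1 = [random.gauss(0, 1) for _ in range(30)]
x2 = [a + random.gauss(0, 0.3) for a in x1]   # nearly redundant with x1

r = pearson(x1, x2)
vif = 1.0 / (1.0 - r ** 2)   # VIF for either predictor in a two-predictor model
print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

A common rule of thumb flags predictors with a VIF above roughly 5 or 10 as problematically collinear; in a stepwise procedure such a predictor would be the first candidate for removal.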
Although the flexibility in model selection can be considered a strength of LME regression analysis, the number of educated choices a researcher must make while developing and implementing models can be a challenge. For instance, the inclusion of interactions or random effects in LME models affects the regression coefficients and interpretation of fixed effects, which cannot properly be taken into account in the bivariate Pearson correlation analysis. Although stepwise regression methods are available as a systematic approach to choose an appropriate model, it is important for researchers to think deeply about the subject matter in order to determine whether the inclusion and interpretation of specific fixed and random effects are appropriate for the specific research question and study objective.
While the two ERP studies reported here are clearly limited in scope and depth of analysis, the side-by-side comparisons demonstrate the limitations and inappropriateness of the Pearson approach, as well as its inflated correlation estimates, for these data sets. Given that multiple analysis techniques (for example, waveform analysis, source localization, and time-frequency analysis) can be applied to the same neurophysiological data in cognitive neuroscience research [54], a cautionary note against the convenient use of the simple Pearson correlation test is necessary when selecting and applying statistical models to interpret brain-behavior correlations (e.g., biomarkers of various diseases and disorders) or correlations among the various brain measures with prior distributions and covariance structure for repeated measures.