Comparative Analysis of Feature Extraction Methods for Intelligence Estimation Based on Resting State EEG Data

Abstract: This paper presents a comparative study of estimating the relationship between intelligence indicators and single-channel and multi-channel feature sets extracted from resting-state EEG data. In the first case, the power in four frequency bands (alpha, theta, beta, delta), calculated both with the discrete Fourier transform (DFT) and from the power spectral density (PSD) estimated with Welch's method, was extracted for each channel. In the second case, imaginary coherence (iMOCH) values for pairs of channels in each frequency band were extracted, and graph-theoretical connectivity metrics were calculated from them. In the experimental part of the study, resting-state EEG recordings of 79 subjects were analyzed together with their scores on four indicators of the intelligence structure measured by the Amthauer method (IQ2: ability to abstract; IQ3: verbal analogies and combinatorial abilities; IQ7: figure detecting, combinatorial abilities; IQ8: spatial imagination). Relationships were estimated with principal component regression, and performance was evaluated with nested Monte Carlo cross-validation. The single-channel feature set gives the smallest standard deviation of the mean absolute error. For non-verbal intelligence, the multi-channel approach performs better; for verbal intelligence, on the contrary, the single-channel approach gives the best result.


Introduction
Differences in human intelligence have been a focus of research for many years. Intelligence quotient (IQ) indicators are used as a measure of intelligence: an IQ score obtained through a series of tests shows a subject's intelligence relative to that of an average individual in a cohort. To obtain a reliable IQ assessment with such tests, the questionnaires must be localized for different countries, taking into account their linguistic and cultural backgrounds. In recent years, new methods for assessing the level of intelligence, such as electroencephalography (EEG) [1][2][3][4][5], have attracted particular interest from researchers around the world. Because EEG captures individual differences in brain function, it has come to be considered a powerful tool for studying the biological basis of intelligence. The first works in this area studied the correlation between IQ test scores and brain rhythms. In [6], it was shown that scores on Raven's Progressive Matrices correlated positively with alpha frequency in the prefrontal and frontal areas, while scores on the verbal subtests of the Amthauer Intelligence Structure Test correlated positively with the mean and peak alpha frequency factors.
Numerous studies show that IQ is related to characteristics of EEG signals in the time and frequency domains [7,8]. In [9], IQ is predicted by analyzing the power ratios of brain-wave subbands with an artificial neural network. In [10], aspects of quantitative EEG prediction of intelligence are considered, yielding moderate to strong effect-size estimates for cognitive functioning. In [11], the relationship between individual resting-state EEG characteristics and the level of non-verbal intelligence is studied. In [12], a new non-invasive method for measuring human intelligence is proposed that directly classifies EEG data. In [13], an approach to predicting the IQ level from EEG data based on cross-relational analysis and an SVM classifier is presented. A promising method for detecting relationships between IQ and resting-state EEG is the graph-theoretic approach to brain network analysis [14]. The purpose of this study is to compare approaches to identifying the relationship between intelligence measures and resting-state EEG data using single-channel and multi-channel (channel-pair) feature sets. We considered four components of the intelligence test, two of which can be attributed to verbal intelligence (using words) and two to non-verbal intelligence (using geometric objects).
The paper is organized as follows. Section 1 substantiates the relevance of the topic. Section 2 presents the theoretical background. Section 3 contains a description of materials and methods. Section 4 describes the experimental results. Section 5 summarizes the work performed in this study.

EEG Signals
An EEG signal reflects integrated neuronal activity across various spatial and temporal scales. To record an EEG, scalp electrodes are placed on the head according to a standardized electrode placement scheme. Figure 1 shows the electrode layout of the international 10–20 system. The electrode identifiers consist of letters indicating the brain region and either a number (odd for the left hemisphere, even for the right) or the letter "z" (for the midline). Electrodes A1 and A2 are attached to the left and right earlobes and are used as reference electrodes [15]. There are five major frequency bands. From low to high frequencies, they are delta (below 4 Hz), theta (4 to below 8 Hz), alpha (8 to below 13 Hz), beta (13 to below 30 Hz), and gamma (above 30 Hz) [16].
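The band limits above can be captured in a small lookup table. The sketch below is a hedged illustration: `EEG_BANDS` and `band_of` are hypothetical names, the intervals are assumed to be half-open ([low, high)) to match the wording "from 4 to below 8 Hz", and other studies place the edges slightly differently.

```python
import math

# Conventional EEG bands from low to high frequency, as half-open [low, high) in Hz.
# Edges follow the text above; other studies use slightly different limits.
EEG_BANDS = {
    "delta": (0.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, math.inf),
}


def band_of(freq_hz: float) -> str:
    """Return the name of the band that contains freq_hz."""
    if freq_hz < 0:
        raise ValueError("frequency must be non-negative")
    for name, (low, high) in EEG_BANDS.items():
        if low <= freq_hz < high:
            return name
    # Only reached for NaN or infinite input.
    raise ValueError(f"no band defined for {freq_hz} Hz")
```

For example, `band_of(10.0)` returns `"alpha"` and `band_of(4.0)` returns `"theta"`, since each lower edge belongs to the higher band.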

EEG Signal Features
EEG signals are complex, so extracting the information needed to solve a particular problem is difficult. Many features can be extracted from EEG data, and they can be divided into two categories: measures calculated from the data of a single channel and measures that take into account the data of two channels. The first category includes transform-based methods (for example, the Fourier transform), time-domain methods (for example, Hjorth parameters), frequency-domain methods (for example, band power), and complexity measures (for example, the Petrosian fractal dimension) [17][18][19]. The second category includes measures such as cross-correlation and coherence [20].
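As an illustration of a single-channel time-domain measure, the Hjorth parameters mentioned above can be computed from the variances of a signal and of its successive derivatives. This is a minimal NumPy sketch, assuming first differences as the discrete derivative (so sampling-rate scaling is omitted); `hjorth_parameters` is a hypothetical helper and not part of the feature set used in this study.

```python
import numpy as np


def hjorth_parameters(x):
    """Return Hjorth activity, mobility and complexity of a 1-D signal.

    Derivatives are approximated by first differences (np.diff), so mobility
    is expressed per sample rather than per second.
    """
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)    # first derivative (approximate)
    ddx = np.diff(dx)  # second derivative (approximate)
    var_x, var_dx, var_ddx = np.var(x), np.var(dx), np.var(ddx)
    activity = var_x                                   # signal power
    mobility = np.sqrt(var_dx / var_x)                 # proxy for mean frequency
    complexity = np.sqrt(var_ddx / var_dx) / mobility  # deviation from a pure sine
    return activity, mobility, complexity
```

For a pure sinusoid the complexity is close to 1, while broadband signals such as white noise give larger values.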
The measure most widely used by researchers is the power in various frequency bands. This feature extraction method requires transforming the EEG time series from the time domain to the frequency domain. The power spectral density (PSD) is most often estimated with Welch's method, but the discrete Fourier transform (DFT) can also be used. EEG coherence can provide information about the functional integration of brain regions: a coherence measure indicates whether two or more sensors or brain regions share the same oscillatory neuronal activity [21].
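Band power can be obtained either by integrating a Welch PSD estimate over each band or from a single DFT of the whole segment. The sketch below shows both under stated assumptions: `BANDS`, `band_power_welch` and `band_power_dft` are hypothetical names, the 2-s Welch segments and the 0.5 Hz lower delta edge are illustrative choices, and the windowing and scaling used in the study are not specified here. It relies on `scipy.signal.welch` and NumPy's `rfft`.

```python
import numpy as np
from scipy.signal import welch

# Band edges in Hz; the 0.5 Hz lower delta edge (excluding DC and slow drift)
# is an assumption made for this sketch.
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def band_power_welch(x, fs, bands=BANDS, seg_seconds=2.0):
    """Band power from Welch's PSD estimate (Hann window, 50% overlap by default)."""
    nperseg = min(int(seg_seconds * fs), len(x))
    freqs, psd = welch(x, fs=fs, nperseg=nperseg)
    df = freqs[1] - freqs[0]
    # Integrate the power spectral density over each band (rectangle rule).
    return {name: float(psd[(freqs >= lo) & (freqs < hi)].sum() * df)
            for name, (lo, hi) in bands.items()}


def band_power_dft(x, fs, bands=BANDS):
    """Band power from one DFT of the whole (mean-removed) signal."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    # One-sided periodogram scaled so that band powers approximately sum to the
    # signal variance (DC and Nyquist bins are not treated specially here).
    power = 2.0 * np.abs(spectrum) ** 2 / x.size ** 2
    return {name: float(power[(freqs >= lo) & (freqs < hi)].sum())
            for name, (lo, hi) in bands.items()}
```

For a 10 Hz sine, nearly all power lands in the alpha band with both estimators. Welch's averaging over segments trades frequency resolution for a lower-variance estimate, which is why it is usually preferred for noisy EEG.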

Amthauer Intelligence Structure Test
One of the most popular tests for measuring general intelligence is the Amthauer Intelligence Structure Test. In Russia, an adapted version of this test is used [22]. The test yields an intelligence profile consisting of nine components:
• IQ1: sentence completion (logical thinking, language skills);
• IQ2: word exclusion (ability to abstract);
• IQ3: verbal analogies (combinatorial abilities);
• IQ4: conceptualization (ability for abstract verbal thinking);
• IQ5: calculations (mathematical abilities);
• IQ6: number series completion (ability to operate with numbers, inductive thinking);
• IQ7: figure detecting (combinatorial abilities);
• IQ8: identification of cubes (spatial imagination);
• IQ9: remembering words (ability for short-term storage of information).
We used four of these subtests. Two of them, IQ2 and IQ3, can be attributed to verbal intelligence, since they use words, while the other two, IQ7 and IQ8, can be attributed to non-verbal intelligence, since they use geometric shapes. Non-verbal intelligence was studied in [14], so we aim to check whether the previous results are confirmed and to compare the results for verbal and non-verbal intelligence.

Principal Component Regression
Principal component regression (PCR) [23] is an approach to regression analysis used when the input variables are correlated with each other. The idea is to use principal component analysis to decompose the input variables into orthogonal factors.
Let the data matrix $X$ contain $n$ rows (observations) and $m$ columns (input variables), each column normalized to zero mean and unit variance. Then the correlation matrix, up to the constant factor $n - 1$, can be represented as $R = X^{T}X$, where $T$ denotes transposition. The matrix $R$ can be expressed through the matrix $\Lambda$ of its eigenvectors (factor loadings) as follows:
$$R = \Lambda \Phi \Lambda^{T}, \tag{1}$$
where the eigenvalues of $R$ on the diagonal of the diagonal matrix $\Phi$ are proportional to the variances of the corresponding principal components.
Usually, only $q \ll m$ principal components are used to represent the correlation matrix $R$. Then, (1) can be rewritten as
$$R \approx \Lambda_q \Phi_q \Lambda_q^{T},$$
where the index $q$ means that only $q$ components are taken instead of all $m$. The principal components are ordered by decreasing explanatory power with respect to $R$. The principal component scores are calculated as $F_q = X\Lambda_q$.
If the matrix of principal component scores $F_q$ is used as the input matrix of the model, the collinearity problem is weakened because the columns of $F_q$ are orthogonal. To return from the $q$-dimensional space of principal components to the original $m$-dimensional space, the resulting vector of estimates is multiplied by the matrix $\Lambda_q$. The final expression for the principal component regression estimates then has the form
$$\hat{\beta}_q = \Lambda_q \Phi_q^{-1} \Lambda_q^{T} X^{T} y, \tag{2}$$
where $y$ is the response vector of dimension $n \times 1$.
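The step from least squares on the component scores to (2) can be spelled out using only the definitions above, assuming, as in the text, $R = X^{T}X$ and orthonormal eigenvectors ($\Lambda_q^{T}\Lambda_q = I_q$). The symbol $\hat{\delta}_q$ for the coefficient vector in component space is introduced here for illustration only:

```latex
\begin{aligned}
\hat{\delta}_q &= \left(F_q^{T}F_q\right)^{-1} F_q^{T} y,
  \qquad F_q = X\Lambda_q,\\
F_q^{T}F_q &= \Lambda_q^{T}X^{T}X\Lambda_q = \Lambda_q^{T}R\,\Lambda_q = \Phi_q
  \qquad (\text{since } R\Lambda_q = \Lambda_q\Phi_q),\\
\hat{\beta}_q &= \Lambda_q\hat{\delta}_q = \Lambda_q\Phi_q^{-1}\Lambda_q^{T}X^{T}y,\\
\operatorname{Cov}(\hat{\beta}_q) &= \Lambda_q \operatorname{Cov}(\hat{\delta}_q)\Lambda_q^{T}
  = \sigma^{2}\Lambda_q\Phi_q^{-1}\Lambda_q^{T}.
\end{aligned}
```

Because $\Phi_q$ is diagonal and holds only the $q$ largest eigenvalues, the inversion avoids dividing by the near-zero eigenvalues that collinearity produces.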
The covariance matrix of the estimates (2) is
$$\operatorname{Cov}(\hat{\beta}_q) = \sigma^{2} \Lambda_q \Phi_q^{-1} \Lambda_q^{T},$$
where $\sigma^{2}$ is the error variance.
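Equation (2) translates directly into a few lines of linear algebra. This NumPy sketch (`pcr_fit` and `pcr_predict` are hypothetical helpers) standardizes the inputs, keeps the $q$ eigenvectors of $R = X^{T}X$ with the largest eigenvalues, and applies $\hat{\beta}_q = \Lambda_q \Phi_q^{-1} \Lambda_q^{T} X^{T} y$ to a centered response. Handling the intercept as the mean of $y$ is an assumption, since the text works with centered data.

```python
import numpy as np


def pcr_fit(X, y, q):
    """Principal component regression estimates following Eq. (2).

    Columns of X are standardized (zero mean, unit variance) and y is centered,
    so the intercept is the mean of y. R = Xs^T Xs as in the text.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, std_x = X.mean(axis=0), X.std(axis=0)
    std_x = np.where(std_x > 0, std_x, 1.0)       # leave constant columns unscaled
    Xs = (X - mean_x) / std_x
    mean_y = y.mean()
    R = Xs.T @ Xs
    eigvals, eigvecs = np.linalg.eigh(R)           # eigh returns ascending order
    top = np.argsort(eigvals)[::-1][:q]            # indices of the q largest
    lam_q, phi_q = eigvecs[:, top], eigvals[top]   # Lambda_q and diag(Phi_q)
    # beta_q = Lambda_q Phi_q^{-1} Lambda_q^T Xs^T y
    beta = lam_q @ ((lam_q.T @ (Xs.T @ (y - mean_y))) / phi_q)
    return beta, mean_x, std_x, mean_y


def pcr_predict(model, X_new):
    """Predict responses for new observations with a fitted PCR model."""
    beta, mean_x, std_x, mean_y = model
    Xs_new = (np.asarray(X_new, dtype=float) - mean_x) / std_x
    return mean_y + Xs_new @ beta
```

With $q = m$ the estimate coincides with ordinary least squares on the standardized inputs; a smaller $q$ discards the low-variance directions that make least squares unstable under collinearity.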
The value of $q$ can be chosen by cross-validation. Because $q$ is a hyperparameter, its selection must be kept separate from the evaluation of predictive performance; therefore, nested cross-validation [24] is the more correct way to evaluate the considered approaches. To do this, the sample is divided into three parts: training, validation, and testing, here in the ratio 60%, 20%, and 20%, respectively. On the training sample, principal component regressions are estimated for various values of $q$ (from one to five). The validation sample is used to estimate the prediction error for each value of $q$ and to choose the value that gives the smallest average error. This optimal value is then used to re-estimate the principal component regression on the training and validation samples combined, and the resulting estimates are used to predict the test sample. Importantly, the test sample plays no role in choosing $q$. The procedure is based on Monte Carlo cross-validation [25]; that is, the division into three parts is carried out randomly and repeatedly.
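The nested Monte Carlo procedure described above can be sketched as follows. Assumptions: 100 repetitions (the number is not stated here), splits drawn over individual observations (the text does not say whether the per-minute observations of one subject were kept together), and a compact copy of a PCR helper so that the block runs on its own. All function names are hypothetical.

```python
import numpy as np


def pcr_fit(X, y, q):
    """Compact PCR (Eq. (2)) on standardized X and centered y."""
    mean_x, std_x = X.mean(axis=0), X.std(axis=0)
    std_x = np.where(std_x > 0, std_x, 1.0)
    Xs = (X - mean_x) / std_x
    eigvals, eigvecs = np.linalg.eigh(Xs.T @ Xs)
    top = np.argsort(eigvals)[::-1][:q]
    lam_q, phi_q = eigvecs[:, top], eigvals[top]
    beta = lam_q @ ((lam_q.T @ (Xs.T @ (y - y.mean()))) / phi_q)
    return beta, mean_x, std_x, y.mean()


def pcr_predict(model, X):
    beta, mean_x, std_x, mean_y = model
    return mean_y + ((X - mean_x) / std_x) @ beta


def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


def nested_mc_cv(X, y, q_grid=(1, 2, 3, 4, 5), n_repeats=100, seed=0):
    """Nested Monte Carlo cross-validation of PCR with a random 60/20/20 split.

    Returns the mean test MAE, its standard error over repetitions, and the
    q chosen in each repetition.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y)
    n_tr, n_va = int(0.6 * n), int(0.2 * n)
    errors, chosen = [], []
    for _ in range(n_repeats):
        idx = rng.permutation(n)
        tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
        # Inner step: fit on training data only, pick q on the validation part.
        val_err = [mae(y[va], pcr_predict(pcr_fit(X[tr], y[tr], q), X[va])) for q in q_grid]
        q_best = q_grid[int(np.argmin(val_err))]
        # Outer step: refit on training + validation, score on the unseen test part.
        tv = np.concatenate([tr, va])
        model = pcr_fit(X[tv], y[tv], q_best)
        errors.append(mae(y[te], pcr_predict(model, X[te])))
        chosen.append(q_best)
    errors = np.asarray(errors)
    return errors.mean(), errors.std(ddof=1) / np.sqrt(n_repeats), chosen
```

The standard error over repetitions is one way to obtain a standard error of the mean absolute error, and the list of chosen q values shows how stable the inner selection is from split to split.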

Data Description and Model Structure
Resting-state EEG recordings of 79 subjects were analyzed, together with their scores on the IQ2, IQ3, IQ7, and IQ8 components of the intelligence structure according to the Amthauer method. The sample included only female participants aged 17 to 21. Participants were grouped by age: 17 years old, 18 years old, 19 years old, and older. The distribution of the IQ2 component was found to differ across age groups. Table 1 shows the p-values of the two-sample Kolmogorov–Smirnov test comparing IQ values in the age groups 18 years old, 19 years old, and older. From the EEG data, band power was extracted for the various channels and frequency bands (alpha, theta, beta, delta), calculated both with the DFT and from the PSD, giving 152 features in total. Because the recordings differed in duration (from 2 to 6 min), the features were extracted for each minute, and observations from different minutes of the same subject were treated as repeated measurements. As a result, the sample size was 222.
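The per-minute scheme can be sketched as splitting each recording into one-minute epochs and tagging every resulting feature row with its subject. The helper names are hypothetical, dropping a trailing partial minute is an assumption (the text does not say how partial minutes were handled), and `feature_fn` stands for any epoch-level feature extractor. As a side note, the total of 152 features would be consistent with, for example, 19 channels × 4 bands × 2 estimation methods, but the channel count is not stated here, so that breakdown is only an assumption.

```python
import numpy as np


def split_into_minutes(signal, fs):
    """Split a (channels x samples) recording into consecutive 1-minute epochs.

    A trailing remainder shorter than one minute is dropped (an assumption).
    """
    signal = np.asarray(signal)
    per_minute = int(round(60 * fs))
    n_minutes = signal.shape[-1] // per_minute
    return [signal[..., k * per_minute:(k + 1) * per_minute] for k in range(n_minutes)]


def build_observations(recordings, fs, feature_fn):
    """Stack per-minute feature vectors and remember which subject each came from.

    recordings: dict {subject_id: (channels x samples) array};
    feature_fn: hypothetical callable mapping one epoch to a 1-D feature vector.
    """
    rows, subjects = [], []
    for subject_id, recording in recordings.items():
        for epoch in split_into_minutes(recording, fs):
            rows.append(np.asarray(feature_fn(epoch), dtype=float))
            subjects.append(subject_id)  # repeated observations share this ID
    return np.vstack(rows), np.asarray(subjects)
```

Keeping the subject IDs makes it possible to treat a subject's minutes as repeated measurements, for example when deciding how to split the data for validation.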
To estimate synchronization between pairs of signals, imaginary coherence (iMOCH) was used, calculated with the MNE-Python software. Quantiles of each subject's connection strengths (from 10% to 80%) were used as thresholds, and connections weaker than the threshold were removed. For the four IQ indicators, correlations were calculated with each of seven graph connectivity indicators (average and characteristic path length, transitivity, network modularity, diameter, eigenvector centrality, closeness centrality) in five frequency bands (alpha, theta, beta, delta, and gamma), giving a total of 35 extracted features.
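Imaginary coherence is defined as $\operatorname{Im}(S_{xy}) / \sqrt{S_{xx} S_{yy}}$, where $S_{xy}$ is the cross-spectrum and $S_{xx}$, $S_{yy}$ are the auto-spectra. The study used MNE-Python; the sketch below is instead a stand-in built on SciPy's `welch` and `csd`, so details such as windowing, band averaging and sign conventions may differ from MNE's. Averaging the spectra over the band before normalizing, and taking the absolute value so the result can serve as an edge strength, are assumptions; `imaginary_coherence_matrix` is a hypothetical name.

```python
import numpy as np
from scipy.signal import csd, welch


def imaginary_coherence_matrix(data, fs, band, seg_seconds=2.0):
    """Band-level |imaginary coherence| for all channel pairs of one recording.

    data: (channels x samples). For each pair, the cross-spectrum S_xy and the
    auto-spectra S_xx, S_yy are averaged over the band, and
    |Im(S_xy)| / sqrt(S_xx * S_yy) is returned.
    """
    data = np.asarray(data, dtype=float)
    n_ch = data.shape[0]
    nperseg = min(int(seg_seconds * fs), data.shape[1])
    freqs, psd = welch(data, fs=fs, nperseg=nperseg)   # psd: channels x freqs
    mask = (freqs >= band[0]) & (freqs < band[1])
    auto = psd[:, mask].mean(axis=1)
    conn = np.zeros((n_ch, n_ch))
    for i in range(n_ch):
        for j in range(i + 1, n_ch):
            _, s_xy = csd(data[i], data[j], fs=fs, nperseg=nperseg)
            value = abs(s_xy[mask].mean().imag) / np.sqrt(auto[i] * auto[j])
            conn[i, j] = conn[j, i] = value
    return conn
```

Zero-lag coupling (for example, two copies of the same signal, as produced by volume conduction) gives a purely real cross-spectrum and therefore zero imaginary coherence, while a phase-lagged oscillation gives a value close to one; this insensitivity to zero-lag mixing is the usual motivation for the measure.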
The analysis revealed that the interaction between age and EEG features has a significant effect on the IQ components. Therefore, the following model was taken as the basis:
$$y_i = \theta_0 + \sum_{j=1}^{J} \theta_j z_{ij} + \sum_{l=1}^{L} \alpha_l x_{il} + \sum_{l=1}^{L} \sum_{j=1}^{J} \gamma_{lj} x_{il} z_{ij} + \varepsilon_i, \tag{3}$$
where $y_i$ is the IQ value for the $i$-th observation, $z_{ij}$ is a binary variable equal to 1 if the subject of the $i$-th observation belongs to age group $j$, $x_{il}$ is the value of the $l$-th EEG feature for the $i$-th observation, $L$ is the number of EEG features, $\varepsilon_i$ is a random error, and $\theta_0, \ldots, \theta_J, \alpha_1, \ldots, \alpha_L, \gamma_{11}, \ldots, \gamma_{LJ}$ are the parameters to be estimated. Model (3) is estimated using the principal component regression described above.
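Model (3) becomes an ordinary PCR input once the age dummies and interaction terms are laid out as columns. A minimal sketch with hypothetical names: one age group is assumed to serve as the reference level (absorbed by $\theta_0$, so only $J$ dummies are created), and the intercept is left out because PCR works with centered data. The group labels used in the check are placeholders, not the study's coding.

```python
import numpy as np


def design_matrix(features, age_group, groups):
    """Columns of model (3): J age dummies, L EEG features, L*J interactions.

    features: (n x L) array; age_group: length-n labels; groups: the J labels
    that get a dummy (any other label acts as the reference level).
    """
    x = np.asarray(features, dtype=float)
    labels = np.asarray(age_group)
    z = np.column_stack([(labels == g).astype(float) for g in groups])  # z_ij
    # gamma_lj multiplies x_il * z_ij; columns ordered l = 1..L, then j = 1..J.
    inter = np.column_stack([x[:, l] * z[:, j]
                             for l in range(x.shape[1]) for j in range(z.shape[1])])
    return np.hstack([z, x, inter])
```

The resulting matrix, with $J + L + LJ$ columns, can be passed to a PCR routine in place of the raw features. Because the interaction columns are strongly correlated with the feature columns, this is exactly the collinear setting PCR is meant for.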

Results
The results of the multi-channel approach depend on how the brain network graph is constructed. Three alternatives were used: unweighted, weighted by the coherence values, and weighted by the reciprocal coherence values (weighted reversed). An unweighted graph has unit weights. As in [26], the iMOCH values were divided by the maximum value within each matrix to remove group bias [27]. Figure 2 shows that weighted graphs offer no significant advantage: a mean absolute error of similar magnitude is obtained with unit weights. The results do not allow a recommendation on the choice of the threshold value. For IQ2 (Figure 2a), the minimum mean absolute error is reached at a small threshold value, whereas for IQ7 (Figure 2c) the error increases sharply at a threshold value of 0.3. For IQ3 and IQ8 (Figure 2b,d), the minimum mean absolute error corresponds to a threshold value of 0.7.
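The graph construction can be sketched as three steps: normalize the connectivity matrix by its maximum, drop connections below a quantile of that matrix's strengths, and weight the surviving edges by one of the three schemes. The function names are hypothetical, and using reciprocal strengths as edge lengths for distance-based metrics is the usual motivation for the "reversed" scheme rather than something stated here. Transitivity, one of the listed metrics, is computed for the unweighted case via the identity $\operatorname{tr}(A^{3}) = 6 \times$ (number of triangles); libraries such as NetworkX provide the other metrics.

```python
import numpy as np


def build_graph(conn, quantile, scheme="unweighted"):
    """Adjacency matrix from a symmetric connectivity matrix such as |iMOCH|.

    1. Divide by the largest value in the matrix.
    2. Drop connections below the given quantile of this matrix's
       off-diagonal strengths.
    3. Weight the surviving edges: "unweighted" (1), "weighted" (normalized
       strength) or "reversed" (1 / normalized strength).
    """
    conn = np.asarray(conn, dtype=float)
    norm = conn / conn.max()
    upper = norm[np.triu_indices_from(norm, k=1)]
    threshold = np.quantile(upper, quantile)
    keep = (norm >= threshold) & (norm > 0)
    np.fill_diagonal(keep, False)  # no self-loops
    if scheme == "unweighted":
        return keep.astype(float)
    if scheme == "weighted":
        return np.where(keep, norm, 0.0)
    if scheme == "reversed":
        return np.where(keep, 1.0 / np.where(keep, norm, 1.0), 0.0)
    raise ValueError(f"unknown scheme: {scheme!r}")


def transitivity(adj):
    """Transitivity of an undirected graph, ignoring weights.

    trace(A^3) counts each triangle six times and sum_i k_i (k_i - 1) counts
    each connected triple twice, so their ratio is 3 * triangles / triples.
    """
    a = (np.asarray(adj) > 0).astype(float)
    k = a.sum(axis=1)
    denom = float(np.sum(k * (k - 1)))
    if denom == 0:
        return 0.0
    return float(np.trace(a @ a @ a) / denom)
```

In this sketch the threshold is the quantile level (for example, 0.3 or 0.7), applied separately to each subject's matrix, so every subject keeps roughly the same fraction of connections.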
For the comparison with the single-channel approach, the minimum errors achieved with the multi-channel approach were used. Table 2 shows the comparison results; the standard errors of the mean are given in parentheses.
In all cases, the standard errors of the mean for the single-channel approach are almost half as large, which suggests that this approach gives more stable results. This is because, with the single-channel approach, one principal component (q = 1) is always selected for PCR during nested cross-validation, whereas with the multi-channel approach the number of selected components varies more, especially for IQ2.
Table 2. The mean absolute error using nested cross-validation (columns: IQ Component, Single-Channel, Multi-Channel).
For verbal intelligence (IQ2 and IQ3), the single-channel approach gives the best result, whereas for non-verbal intelligence (IQ7 and IQ8), the multi-channel approach performs better. The difficulty with the single-channel approach is the very large number of extracted features, which is a multiple of the number of channels. In the multi-channel approach, the number of features does not depend on the number of channels, since the features describe the network of connections between them. However, it is difficult to choose a method for constructing the brain network graph and an appropriate threshold for connection strength.
A reasonable compromise may be to include both single-channel and multi-channel features in the initial feature set for principal component regression.

Conclusions
A single-channel feature set is easier to extract, but it suffers from the high dimensionality of the feature space and from strong correlations between features. To build a multi-channel feature set, the weights of the graph edges must be chosen and the coherence threshold must be set. The choice of these parameters affects the error of the IQ model built on the multi-channel feature set, and the empirical results do not allow us to recommend an optimal threshold. Nevertheless, an unweighted graph can be used: its accuracy is no worse than that of a weighted graph. For verbal intelligence, the single-channel approach gives the best result; for non-verbal intelligence, on the contrary, the multi-channel approach performs better. This is consistent with the results of [14], where the graph-theoretic approach to brain network analysis was used for non-verbal intelligence. A possible compromise is to combine single-channel and multi-channel features in a single principal component regression.