Application of Machine Learning to Electroencephalography for the Diagnosis of Primary Progressive Aphasia: A Pilot Study

Background. Primary progressive aphasia (PPA) is a neurodegenerative syndrome in which diagnosis is usually challenging. Biomarkers are needed for diagnosis and monitoring. In this study, we aimed to evaluate Electroencephalography (EEG) as a biomarker for the diagnosis of PPA. Methods. We conducted a cross-sectional study with 40 PPA patients categorized as non-fluent, semantic, and logopenic variants, and 20 controls. Resting-state EEG with 32 channels was acquired and preprocessed using several procedures (quantitative EEG, wavelet transformation, autoencoders, and graph theory analysis). Seven machine learning algorithms were evaluated (Decision Tree, Elastic Net, Support Vector Machines, Random Forest, K-Nearest Neighbors, Gaussian Naive Bayes, and Multinomial Naive Bayes). Results. Diagnostic capacity to distinguish between PPA and controls was high (accuracy 75%, F1-score 83% for kNN algorithm). The most important features in the classification were derived from network analysis based on graph theory. Conversely, discrimination between PPA variants was lower (Accuracy 58% and F1-score 60% for kNN). Conclusions. The application of ML to resting-state EEG may have a role in the diagnosis of PPA, especially in the differentiation from controls. Future studies with high-density EEG should explore the capacity to distinguish between PPA variants.


Introduction
Primary progressive aphasia (PPA) is a clinical syndrome secondary to the neurodegeneration of language brain regions and networks [1]. There are currently three main variants of PPA recognized in the literature: non-fluent (nfvPPA), semantic (svPPA), and logopenic variants (lvPPA) [2]. Diagnosis of PPA in the early stages is usually challenging. On the one hand, very mild word-finding difficulties may be present in aging, and the insidious onset of PPA symptoms limit an early identification [3]. On the other hand, there is a certain overlap between the PPA variants, especially between nfvPPA and lvPPA [4,5]. Neuropsychological batteries and language assessments are usually time consuming, and a high level of expertise is necessary for an adequate interpretation of the clinical findings. Although some novel and brief cognitive tests are being developed [6,7], neuroimaging and cerebrospinal fluid biomarkers are usually performed to confirm PPA diagnosis and the specific variant in each case. Structural magnetic resonance imaging (MRI) have shown adequate values of sensitivity and specificity for the diagnosis of svPPA, but diagnostic properties for the other variants are poorer [8,9]. Other more advanced MRI sequences show different patterns between PPA variants, but are not generally applicable routinely. Regarding positron emission tomography imaging, the 18F-FDG tracer has shown adequate values for diagnosis of the three variants of PPA, especially svPPA and lvPPA [10]. Amyloid tracers may distinguish between patients with amyloid deposition (generally associated with lvPPA) or not, but it does not have enough sensitivity to discriminate between subtypes of non-Alzheimer's disease variants. Novel tracers, such as tau tracers, are still under investigation and are not usually available beyond research settings [11]. Consequently, the combination of several tools is often necessary to conduct an adequate diagnosis. However, some of these techniques are not available in all clinical settings, which jeopardizes the equality of opportunities. In recent years, there is an increasing interest in an accurate diagnosis of neurodegenerative disorders. Furthermore, early diagnosis may imply early access to language therapies, which have shown positive effects in PPA [12,13]. In addition, the classification of PPA into three clinical variants improves the prediction of the underlying pathology [14]. Thus, novel and cost-effective biomarkers are necessary for early detection and differential diagnosis between PPA variants.
One of the key processes in neurodegenerative disorders comprises the alterations in brain activity and network disruptions [15]. There are several methods for measuring brain activity, with differences in the spatiotemporal resolution and applicability. Some methods, such as single-unit recordings, have high spatial and temporal precision, but are invasive and are not applicable to large networks and clinical practice. Among the non-invasive methods, functional MRI, magnetoencephalography and electroencephalography (EEG) permit the assessment of brain activity across the entire brain [16]. These methods are generally well tolerated and applicable in clinical practice, and evaluate brain activity with a resolution on the scale of millimeters and centimeters. This means that each voxel of a conventional MRI or a channel of an EEG reflect the activity of thousands of neurons and billions of synapses [17]. In comparison to functional MRI, EEG shows lower spatial but higher temporal resolution. However, both techniques are regarded as useful for the assessment of brain activity and connectivity. As advanced computational algorithms promise to improve signal processing and filtering, noninvasive recording devices are increasingly being investigated and applied. Some approaches record neural potentials from the scalp, and depending on the intensity of recording, they can capture the activity of thousands of neurons. Multiple layers obstruct information transmission from the cerebral cortex to the scalp, resulting in signal amplitudes and spatial resolution that are reduced. The electrodes are also sensitive to external interferences such as eye movements, face movements, chewing, or swallowing, among others [18].
EEG is a widely available technique very useful for the diagnosis of epileptic disease. In the last years, quantitative EEG has also been confirmed as a helpful biomarker in the assessment of several neurodegenerative disorders [19]. These studies suggest a potential clinical application of EEG in the assessment of neurodegenerative disorders, either in the differential diagnosis between them or with other non-neurodegenerative causes, including psychiatric conditions [20]. Data regarding the application of EEG signal in PPA are scarce. In a recent study [21], three patients with nfvPPA and five with primary progressive apraxia of speech (two of them also showing aphasia) underwent EEG. A theta slowing was detected in almost all patients with nfvPPA, suggesting a potential clinical application. Another recent study has detected some particular findings in the analysis of EEG microstates in 8 patients with svPPA in comparison with controls and Alzheimer's dementia [22].
Machine learning (ML) techniques may be helpful in improving the diagnostic performance of EEG, as has been shown in predicting epileptic seizures [23][24][25][26], Alzheimer's disease [27], or depression [28,29]. The rationale for the application of ML to EEG is based on the following factors. First, the visual analysis of EEG is time-consuming and requires high levels of expertise. Second, changes in neurodegenerative disorders may be less visually evident than epileptiform activity, which also limits the inter-rater reliability. Third, filter settings, frequency bands and criteria for thresholds are not clearly defined in the setting of neurodegenerative disorders [30].
ML for EEG analysis may be divided into two approaches: feature-based and endto-end. On the one hand, feature-based decoding algorithms have a long track record of effectiveness in various EEG decoding challenges [20,31]. The data are often represented by handcrafted and previously selected features in this approach. End-to-end decoding algorithms, on the other hand, allow raw or minimally pre-processed data as inputs [32,33]. To date, end-to-end deep learning has gotten much interest due to its success in other disciplines of research. At least for the extraction of the features, this technique might lead to better solutions or the discovery of unexpectedly informative characteristics, and it does not involve handcrafting. In terms of learning features, end-to-end models have a reputation for being "black boxes".
In this study, we aimed to evaluate the potential of EEG as a biomarker for the diagnosis of PPA. For that purpose, the EEG raw signal was pre-processed in terms of feature transformations to enlarge the representation domain. We evaluated the diagnostic performance of EEG for the diagnosis of PPA, and the differential diagnosis between the three PPA Variants, applying ML models.

Participants
Forty patients with PPA were enrolled in this study. All patients met the current diagnostic criteria for PPA [1]. Patients were evaluated with a comprehensive language and neuropsychological protocol, which has been described elsewhere [34]. Structural MRI and FDG-PET were performed in all cases supporting the clinical diagnosis. Accordingly, patients were categorized as nfvPPA (n = 18), svPPA (n = 10), and lvPPA (n = 12). Twenty controls (control group, CG) were also included for comparison. Table 1 shows the details of the groups participating in the study.
The CG was obtained from patients that underwent EEG because of a previous history of syncope, but visual analysis of EEG, neuroimaging, and clinical follow-up were normal, excluding potential neurological disorders.

EEG Acquisition
EEGs were recorded in a resting state condition with the eyes closed and under the supervision of trained personnel. EEGs were acquired on a NicoletOne device of 32 channels, using the standard 10/20 system and referenced to A1. Time of acquisition was 20 min.

Preprocessing
Different signal transformations were applied, aiming to expand the amount of information. Signal preprocessing was performed following the pipeline implemented by [35] in EEGLAB Software (Matlab). These procedures try to minimize the external noise and artefacts that are usually present in a raw EEG signal. The following steps were conducted, which are also summarized in Figure 1: Time ranges selection. Original signals are too long to be analyzed and can contain some additional noise, so we manually selected those time ranges with higher quality in the signal representation to get the most accurate and clean signal. This process also considered the labels recorded during the EEG acquisition and clinical assessment, which notify about the state of the patient, unexpected events, or activities that could impact on the signal.

2.
High-pass filtering at 1 Hz. This filter was applied to remove baseline noise, remove noise introduced by sweating, and prepare the signal for ICA analysis.

3.
Apply CleanLine process with the following configuration: 10 Hz of bandwidth at 50 Hz line frequency. This preprocessing step removes line noise and related harmonics from each one of the scalp channels using a novel approach, as described in [36]. For that purpose, and for each sliding window over the original data, a multi-taper FFT is applied to transform the signal to the frequency domain; after that, the complex amplitude of the desired frequency is extracted. With that information, a noise signal in the frequency domain is generated and, finally, the time-domain associated noise signal that needs to be extracted from the original one is also created.

4.
Re-reference data to average. This is the most effective and easiest way to re-reference EEG data because it establishes that the summed up power across the scalp topography should sum zero. In other words, we removed the mean over all scalp channels to every single channel to make sure that all channels contribute with the same weight.

5.
Low pass filter at 40 Hz. This step was applied in order to remove any possible undesired high-frequency signal that was not removed by CleanLine. Although other investigations are looking for biomarkers in higher frequency ranges of EEG signal, most recent research works are focusing in lower frequency ranges [37]. For simplicity of our analysis and control of error sources, we have limited our work to the lower frequency bands. 6.
To apply ICA (Independent Component Analysis) to the signal. This method is a linear decomposition technique which aims to find the source signals from a set of mixed signals, as it occurs with EEG. Unlike PCA (Principal Component Analysis), ICA tries to retrieve those original signals that are maximally statistical independent in just one domain [38]. 7.
To epoch data into windows of duration equal to one second without overlapping. 8.
Visual artifact rejection of epochs. As a final step, we reviewed manually all signals and all their epochs looking for artefacts or undesired signal events.

Quantitative EEG
QEEG, quantitative EEG, is the frequency domain transformation of the original EEG signal [39]. To obtain this transformed signal, the Discrete Fourier Transform method was applied over our sampled (at 500 Hz) EEG signal. Given a x(n) discrete EEG signal, the definition of Fourier Transform (FT) is as follows: This transformation gave a transformed domain that increases the representation domains of the original signal and, hence, the information provided. We divided the total frequency range into non-overlapping frequency bands:

Wavelet Transformation
Wavelet Transform (WT) is the decomposition of the original signal into a set of basis functions consisting of contractions, expansions, and translations [40]. This is a similar approach to FT but using wavelet functions to achieve the transformed domain.
WT of the EEG signal was obtained by filtering repeatedly until reaching the desired level of decomposition. In each repetition, we applied a low-pass filter to obtain the approximation coefficient (CA) and a high-pass filter to obtain the detailed coefficient (CD). After every filtering stage, the signal was down-sampled by half the sampling frequency of the previous level.
A total of seven sequential subdivisions were applied until a sufficient number of transformations, all correctly subdivided in frequency, was achieved. This process provided eight signals, each one assigned to a different frequency range: From all the extracted signals, sub-bands 1 and 2 were removed from the pipeline because neither of them offered any information after the application of step 5 in the preprocessing pipeline section (low-pass filter at 40 Hz).

AutoEncoders
Autoencoders are a specific type of neural networks architecture where the input is the same as the output. They compress the input into a lower-dimensional code and then reconstruct the output from this representation. The code is a compact "summary" or "compression" of the input, also called the latent-space representation. This novel modeling technique is also exploited in dimensionality reduction problems. This architecture was created by using an Encoder-Decoder system ( Figure 2). The first part, the Encoder, used fully-connected layers in which the number of input neurons is higher than the number of output neurons, to achieve that reduction of dimensionality. The second part, the Decoder, used fully-connected layers in which the number of input neurons is lower than the number of output neurons to achieve the reconstruction of the original signal.
In addition to this vanilla configuration, there are other complex configurations with multiple hidden layers for the encoding and decoding part, some of them even add noise to the input or to the intermediate part to force the neural network as a regularization technique to a better generalization.
As stated, this type of neural networks are usually applied as a technique for dimensionality reduction, but we decided to use them as a mechanism to transform the time domain (similarly to the FT and WT). We applied the Encoder to the EEG values along time, creating a new domain of features and information representation.

Graph Theory Analysis
Graph Theory Analysis (GTA) is a mathematical formalism used to model pairwise relations between objects. Here, we used the EEG sources from the different electrodes to generate the network that merges such data [41].
For that purpose, we generated a graph matrix, also called adjacency matrix, by calculating all pairs of partial correlations between all available channels or electrodes [42]. The absolute value operator was also applied to this matrix to achieve our final result, which in this case was an undirected weighted adjacency matrix.
Once the network was created, it was analyzed using different metrics: • Node degree. This metric represents the number of links detected for every node. In addition to these metrics, we also created a brain representation that shows each electrode as a node in a graph structure, and the connections that we obtained between those electrodes. An example of this representation ca be found in the Figures 3 and 4 where the connections of the CG and PPA groups, filtered by alpha frequency range (from 8 to 14 Hz), are shown, respectively.

Data Analysis 2.8.1. Binary Classification Model between PPA and CG
A classification model was generated to differentiate between CG and PPA patients based only on the extracted EEG features. Seven classification algorithms were evaluated: Decision Tree, Elastic Net (EN), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbors (kNN), Gaussian Naive Bayes (Gaussian NB) and Multinomial Naive Bayes (Multinomial NB).
The following pre-processing pipeline was applied to prepare the data for modelling: • In the training process, the selection of the best hyperparameters for each model was accomplished by a Bayes-search optimization algorithm (this optimization algorithm created a full space with all possible hyperparameter values and applied Bayes Theorem in order to find those exact values that minimized the error function). All models performed a binary classification using a 10-fold cross-validation using F1-Score metric (2) as the main metric. This metric was selected to optimize the classification problem, which was imbalanced. Precision (3), Sensitivity (4), Specificity (5) and Youden Index (6) are also displayed.

Classification Model for All Groups
Following the same pipeline described in the binary classification model, a multiclass classification model was applied in order to distinguish between nfvPPA, svPPA, lvPPA and CG. The same pre-processing steps, hyperparameter tuning techniques and crossvalidation options were applied here. All models performed a multiclass classification with 4 different classes (one per each group of patients) and using F1-Score metric as their main aim.

Network Analysis
We transformed each EEG signal into a Network; this allowed to extract additional network metrics and enlarge the set of features per patient, but it also allowed to evaluate the differences between two brains in terms of activity according to these network metrics.
We also visualized the generated connections between EEG channels in each group of patients. This provided meaningful information about interactions of brain regions across the different groups of patients.

Principal Components Analysis
In order to explain the complexity of our working dataset, we applied Principal Component Analysis (PCA) to reduce the dimensionality. PCA aims to find the directions of maximum variance in high-dimensional data and projects them onto a new subspace with equal or fewer dimensions than the original one. Hence, it reduces the number of features by combining them linearly. We performed a dimensionality reduction to only two principal components. Accordingly, the visualization of all subjects as data points is allowed by looking for the linear combination of all the extracted features into these two principal components. In this way, it is possible to visualize the multi-dimensional data distribution and evaluate how mixed are the data instances in the representation space.

Classification Model between CG and PPA
Main metrics are summarized in Table 2.  Table 2). kNN model, a non-parametric supervised classification method, achieved the best performance, showing a Sensitivity of 0.88, an F1-Score value of 0.83 and a Specificity of 0.5. The confusion matrix from the best model (kNN) is shown in Figure 5, as well as its ROC curve in Figure 6.   Figure 7. A Decision Tree model is included in Figure 8. Regarding the most relevant variables, all variables except one were generated by the network transformations. Specifically, 40% from Node Degree, 50% from Clustering Coefficient, 5% Path Length, and 5% qEEG. Similarly, most features used in the Decision Tree algorithm are associated with network analysis.

Classification Model between All Groups
A multiclass model (4 classes) was developed to evaluate the possibility of automatic detection among all the PPA variants and CG. The same aforementioned models were evaluated (Table 3). Again, kNN model achieved the best performance. However, Sensitivity was 0.58 and F1-Score was 0.6. Confusion matrix for this model is shown in Figure 9.  As in the previous analysis, Figure 10 shows the graphical representation of the Decision Tree Model. In this case, decisions are mainly based on features obtained from network analysis and Autoencoder transformations.

Discussion
In this pilot study, we evaluated the diagnostic performance of a resting-state EEG obtained in clinical practice conditions for the diagnosis of PPA. We applied ML models, as they may be helpful to maximize the diagnostic capacity from many variables with no a priori hypotheses. In this regard, diagnostic performance was relatively high for the detection of patients with PPA in comparison with the control group. This suggests that there are certain EEG abnormalities that may be detected in patients with PPA. The most important features ranked by the algorithms for the classification and included in the decision trees algorithms involve mainly temporal and frontal channels in both hemispheres. Interestingly, features derived from network analysis obtained the best classification, emphasizing the role of graph theory in the analysis of EEG data [32]. These findings are consistent with recent investigations that are exploiting this new area of analysis [43,44].
Conversely, the application of EEG to the diagnosis of the specific variant of PPA did not achieve a satisfactory classification. Previous studies using quantitative data from EEG for the differential diagnosis of neurodegenerative disorders have obtained generally better results [31]. For instance, applying support vector machines, a 91% of accuracy was found to distinguish Alzheimer's disease and dementia with Lewy bodies, and 88% for Alzheimer's disease and Frontotemporal dementia [45]. Another study achieved a 93.3% of accuracy to classify between Alzheimer's disease and Frontotemporal dementia [46]. However, these studies were performed with small samples, and were not replicated in larger studies [47]. The application of EEG to PPA is probably more challenging, due to the regional overlap between PPA variants in contrast to other disorders such as Alzheimer's disease and Frontotemporal Dementia. In this regard, high-density EEG with a larger number of channels might obtain better results in the classification between PPA variants.
Our key insight is that machine learning itself can deal well with errors, qualitative and corrupted data and, more importantly for our purposes, integrate heterogeneous data from multiple domains. With this aim, our research work has enlarged the dataset to increase the information representation. The applied machine learning algorithms can jointly manage transformations from time to frequency domain, wavelet or network representation provided the setup parameters are carefully selected. In [48], a total of 49 experimental studies published from 2009 until 2020, which apply machine learning algorithms on resting-state EEG recordings from AD patients, were reviewed. These works did not evaluate the benefits of increased information representation in classification accuracy. Most of the studies focused on AD detection incorporating Support Vector Machines (SVM). Conversely, we found that classification algorithms based on distance (similar to kNN, where the function is only approximated locally and all computation is deferred until function evaluation) can improve performance.
The visualization of the multi-dimensional data used in our study, and the complexity of such dataset, has been performed with a PCA. Using this method, we found that patients are not clearly separated ( Figure 11). This aspect is important for optimal performance in SVM models in the classification [49]). In contrast, kNN model, which is based on local distances, could lead to better predictions. This explains why we observed better performance of kNN model with respect to SVM model. Additionally, kNN model works better with a small number of features [50], which is our case after the application of the feature selection method. To follow this line, we replicated the same pipeline in the CG vs. PPA classification, skipping the feature selection phase. Thus, we compared the results of all models using only the best 50 features against using all generated features. As shown in Table 4, SVM model obtained better performance than kNN. However, the absolute values of the quality metrics (F1-score, precision, sensitivity and accuracy) explain the need for the feature selection process followed in our study.  The trend of increased information representation may be seen in recent works like [51] (still SVM in AD), or [52] (where, apart from EEG, a wide range of diagnostic tests were included). Our approach has focused exclusively in the classification possibilities of the EEG signal for the PPA, but we expect that our results could be improved by the addition of other tests such as neuroimaging, cognitive assessment, or genetics.
Patients included in this study fulfilled the current diagnostic criteria, with both MRI and FDG-PET supporting the diagnosis. As they were generally in early stages and EEG in comparison with controls was discriminative, these findings raise the possibility to explore in future studies the role of EEG in the clinical follow-up of patients with PPA, especially in the setting of clinical trials, in which reliable, reproducible and non-invasive endpoints are necessary [53].
Our study has some limitations. First, we included 40 patients, generally in early stages but not in the first consultation. Future studies should enroll a larger sample size and specifically focusing on patients in the early stages to confirm a potential role of EEG in the detection of PPA. Second, in this study, we only applied ML to EEG data. One of the main strengths of ML is the combination of multiple sources of information. Thus, the application of ML to studies including multimodality assessments (cognitive testing, MRI, PET) may be of interest to disentangle the best tools (isolated or in combination) for diagnosis of PPA [54].

Conclusions and Future Work
Our study shows that the application of ML to resting-state EEG may have a role in the diagnosis of PPA, especially in the differentiation from controls. EEG may have some advantages compared with other biomarkers.
ML techniques were applied to evaluate the possibility to automatically classify EEG data from PPA patients with respect to a control group. Our work showed that a feature expansion process can increase the information representation and achieve good classification accuracy, using mainly features from the graph-network representation of the EEG signal. The capability to classify PPA variants was also evaluated. Although lower, the classification capacity is still promising and advises further development of these automatic techniques for phenotype classification from EEG signals.
We are currently increasing the sample size to improve the classification accuracy of the models. In addition, we aim to enlarge the frequency range in the input dataset (over 45 Hz) to evaluate whether higher-frequency components may help the biomarker discovery with machine learning and deep learning methods. Funding: Jordi A. Matias-Guiu is supported by Instituto de Salud Carlos III through the project INT20/00079 (co-funded by European Regional Development Fund "A way to make Europe").

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board (or Ethics Committee) of Hospital Clinico San Carlos (protocol code 17/247, date of approval 27th June 2017).

Informed Consent Statement:
Written informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.
Acknowledgments: Jordi A Matias-Guiu is supported by Instituto de Salud Carlos III through the project INT20/00079 (co-funded by European Regional Development Fund "A way to make Europe"). We also acknowledge support from Spanish Ministry of Science and Innovation under project PID2019-110866RB-I00.

Conflicts of Interest:
The authors declare that they have no conflict of interests.

Abbreviations
The following abbreviations are used in this manuscript: