Deep Convolutional Neural Network Model for Automated Diagnosis of Schizophrenia Using EEG Signals

Abstract: A computerized detection system for the diagnosis of schizophrenia (SZ) using a convolutional neural network is described in this study. Schizophrenia is an anomaly in the brain characterized by behavioral symptoms such as hallucinations and disorganized speech. Electroencephalograms (EEG) indicate brain disorders and are prominently used to study brain diseases. We collected EEG signals from 14 healthy subjects and 14 SZ patients and developed an eleven-layered convolutional neural network (CNN) model to analyze the signals. Conventional machine learning techniques are often laborious and subject to intra-observer variability. Deep learning algorithms, which can automatically extract significant features and classify them, are thus employed in this study. Features are extracted automatically at the convolution stage, with the most significant features retained at the max-pooling stage, and the fully connected layer is utilized to classify the signals. The proposed model generated classification accuracies of 98.07% and 81.26% for non-subject based testing and subject based testing, respectively. The developed model can likely aid clinicians as a diagnostic tool to detect early stages of SZ.


Introduction
In the medical field, diseases are often diagnosed by means of laboratory tests, biological markers, or imaging modalities. However, the diagnosis of psychiatric disorders is predominantly based on patient interviews, the symptoms presented, and the presence or absence of representative behavioral signs [1]. Schizophrenia (SZ) is a severe, prolonged disorder of the brain that disrupts the normal thinking, speech, and behavior of an individual [2]. The National Institute of Mental Health views SZ as a significant contributor to disease burden, with about 2.4 million people in the United States over the age of 18 affected by it [3]. Moreover, the World Health Organization reports that more than 21 million people are affected by SZ worldwide. Schizophrenia manifests as a constellation of symptoms that can include hallucinations, hearing voices that are non-existent, disorganized speech, and functional deterioration, among many others.
[...] selection was done using the wrapper method [20]. The 1-norm Support Vector Machine (SVM) classifier was utilized to classify correct and incorrect trials in data with the SVM Model 1, yielding a classification accuracy of 84%. The SVM Model 2 was implemented to classify normal versus SZ condition in correct trial data, achieving a classification accuracy of 87%. Santos-Mayo et al. [21] analyzed the EEG event-related potential (ERP) signals of participants who were involved in an auditory oddball task. The brain signals were recorded using Brain Vision equipment, in compliance with the international 10-20 standard. After acquisition, the signals were pre-processed using EEGLAB, after which 16 time-domain features and four frequency-domain features were extracted per electrode, for each participant. Features were selected via linear discriminant analysis using J5, mutual information feature selection (MIFS), and double input symmetrical relevance. The Multilayer Perceptron (MLP) and SVM classifiers were employed for classification. High classification rates of 93.42% and 92.23% were achieved with the J5 MLP and J5 SVM classifiers, respectively.
Ibanez-Molina et al. [22] acquired EEG recordings from participants while they were at rest and while they were engaged in a naming task. The Neuroscan SynAmps 32-channel amplifier was employed for data acquisition. EEG signals for the resting phase were acquired prior to the task, while those for the task were extracted after each trial. In the resting phase, the segments were analyzed using a moving window method, with Lempel-Ziv complexity (LZC) computed per window. After normalization, the final LZC value was computed as the average of the values obtained from the moving windows. During the task, a total of 80 EEG segments of 2 × 10^3 ms each were evaluated and then averaged to obtain the final multiscale LZC value. Higher complexity values were reported in the right frontal regions of patients who were at rest. Pang et al. [23] analyzed 2D time and frequency domain connectivity features and 1D complex network features derived from EEG signals. These features were then input to the Multi-domain connectome CNN model to obtain feature maps, which aided in the classification process. An accuracy of 93.06% was achieved.
It is notable from Table 1 that most prior studies employed machine learning techniques to diagnose SZ. However, these conventional techniques can be cumbersome, as features require manual extraction and selection prior to SZ classification. Additionally, these methods underperform when large datasets are used. Hence, we have employed a deep convolutional neural network (CNN) model to detect SZ in this study. The novelty of this method lies in the development of an eleven-layered system to distinguish between normal and SZ subjects using EEG signals. Moreover, this model circumvents the typical feature extraction and classification processes, allowing quicker yet more accurate diagnosis.

EEG Recording and Preprocessing
EEG signals from 14 patients with paranoid SZ, comprising seven males and seven females, with average ages of 27.9 ± 3.3 and 28.3 ± 4.1 years, respectively, were collected from the Institute of Psychiatry and Neurology in Warsaw, Poland [24]. The exclusion criteria covered severe neurological ailments such as Alzheimer's disease, early stage SZ, and epilepsy, as well as other considerations such as pregnancy and the presence of a general medical condition. Fourteen healthy subjects within the same age group and gender proportion were recruited for the study from the same institute. Each participant provided informed consent to participate in the study upon receiving the study protocol.
While participants remained in a relaxed state with eyes closed, fifteen minutes of EEG data were collected at a sampling rate of 250 Hz. Data were obtained via the standard International 10-20 System. The electrodes used were Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, O2. The acquired signals were then divided into segments within which the signals can be considered stationary. Each segment had a window length of 25 s (6250 samples) and was normalized with the Z-score before being fed to the one-dimensional deep convolutional network for training and testing. A total of 1142 EEG segments were used, each consisting of 6250 × 19 sampling points. Normalization was employed to scale the signals to a standard range of values, allowing faster convergence of the deep learning model during training. The network was trained for 50 epochs in subject-based testing and for 70 epochs in non-subject-based testing. An epoch is one complete forward and backward pass of the training data through the neural network. Each training epoch lasted between 2 and 3 s.
For subject-based testing, the validation of the system is executed in three phases: training, validation, and testing. During the training phase, k-fold validation is employed, wherein the full data pool is split into fourteen equal parts (subjects). Of these, twelve were used for training, one for validation, and one for testing. This process was repeated fourteen times so that each of the fourteen parts was used in the training, validation, and testing phases.
In non-subject based testing, the system is validated through the training and testing phases. During the training phase, ten-fold validation is employed, whereby the entire data is split into ten uniform parts. Of these, nine are used for training the model and the remaining part is used for testing. This process is repeated such that each of the ten portions is involved in both the training and testing phases. In addition, 20% of the cross-validation training data is set aside for validation of the model. Figure 1 illustrates an example of EEG recordings from normal subjects and SZ patients.
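The segmentation and Z-score steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes non-overlapping windows and per-channel normalization within each segment (the text does not state either choice), and it uses synthetic random data in place of real EEG. The names `segment_and_normalize`, `FS_HZ`, `WINDOW_S`, and `N_CHANNELS` are hypothetical.

```python
import numpy as np

FS_HZ = 250       # sampling rate stated in the text
WINDOW_S = 25     # segment length stated in the text
N_CHANNELS = 19   # number of electrodes listed in the text


def segment_and_normalize(recording, fs_hz=FS_HZ, window_s=WINDOW_S):
    """Split a (samples, channels) recording into non-overlapping windows
    and z-score each channel within each window (both are assumptions)."""
    window = fs_hz * window_s                    # 250 * 25 = 6250 samples
    n_segments = recording.shape[0] // window    # drop any incomplete tail
    trimmed = recording[: n_segments * window]
    segments = trimmed.reshape(n_segments, window, recording.shape[1])
    mean = segments.mean(axis=1, keepdims=True)
    std = segments.std(axis=1, keepdims=True)
    return (segments - mean) / (std + 1e-8)      # epsilon guards flat channels


# Synthetic stand-in for one 15-minute recording (not real EEG data).
rng = np.random.default_rng(0)
recording = rng.normal(loc=5.0, scale=3.0, size=(15 * 60 * FS_HZ, N_CHANNELS))
segments = segment_and_normalize(recording)
print(segments.shape)  # (n_segments, 6250, 19)
```

Each resulting segment has the 6250 × 19 shape mentioned in the text, with every channel centered at zero and scaled to unit variance.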

Deep Learning
Since EEG signals are nonlinear in nature, nonlinear feature extraction techniques are often employed to differentiate between the EEG signals of normal subjects and SZ patients [25]. Machine learning is widely used for pattern recognition. However, this technique has some limitations. It works well for simple recognition tasks [26], but in realistic settings where the features studied display substantial variability, larger training datasets are needed in order to recognize them [27]. Additionally, a model with a sizeable learning capacity can learn higher level features from large datasets, which traditional machine learning techniques cannot. Moreover, conventional techniques require features to be extracted manually, whereas in deep learning both the feature extraction and classification processes are conducted automatically [28,29]. The CNN is the most prevalent type of deep learning network used by researchers to identify abnormal EEG signals [30] and to study these signals to diagnose disorders such as depression [31], seizure [32], attention deficit hyperactivity disorder [33] and autism [34]. In this study, an eleven-layer deep CNN model has been implemented to discern between normal and SZ classes under both non-subject based testing and subject based testing. Figures 2 and 3 illustrate the models used for non-subject based testing and subject based testing, respectively.

Convolutional Neural Network
The CNN is a complex network comprising many hidden layers and parameters. The three main layer types in the network are the convolution, max pooling, and fully connected layers [35]. During training, the convolutional layer applies kernels of different sizes to the input signal. Convolution extracts features from the input signal and forms the feature maps that are passed to the next layer [36]. A batch normalization layer is then applied between the intermediate layers to normalize the data flowing through them, which helps to speed up and stabilize the learning process. Max pooling shrinks the feature map, as it keeps only the highest value within each pooling window. The output of the convolutional and pooling layers represents the most salient features of the input data. The fully-connected layer then categorizes the input data into the various classes based on the training data. The neurons of the last pooling layer are connected to those of the fully-connected layers, whose output predicts whether the input signal is normal or not [37,38].
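To make the convolution and max pooling operations concrete, the following is a minimal NumPy sketch on a toy one-dimensional signal. It is illustrative only: the kernel values, the LeakyReLU slope `alpha`, and the pooling size are arbitrary assumptions, and the function names are hypothetical rather than taken from the study.

```python
import numpy as np


def conv1d_valid(signal, kernel, stride=1):
    """Slide a kernel over a 1-D signal (cross-correlation, no padding)."""
    n_out = (len(signal) - len(kernel)) // stride + 1
    return np.array([
        np.dot(signal[i * stride: i * stride + len(kernel)], kernel)
        for i in range(n_out)
    ])


def leaky_relu(x, alpha=0.01):
    """Pass positive values through; scale negative values by alpha."""
    return np.where(x > 0, x, alpha * x)


def max_pool1d(x, size=2, stride=2):
    """Keep only the largest value in each pooling window."""
    n_out = (len(x) - size) // stride + 1
    return np.array([x[i * stride: i * stride + size].max() for i in range(n_out)])


signal = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 1.0, 0.0, 2.0])
kernel = np.array([1.0, 0.0, -1.0])      # arbitrary illustrative filter

feature_map = conv1d_valid(signal, kernel)      # [ 1.  3. -3. -2.  3. -1.]
activated = leaky_relu(feature_map, alpha=0.1)  # negatives scaled by 0.1
pooled = max_pool1d(activated)                  # [ 3.  -0.2  3. ]
print(feature_map, pooled)
```

The feature map is shorter than the input because no padding is used, and pooling halves its length again while keeping the strongest response in each window.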
The system generally learns better with increasing depth of the network; however, deeper networks may prolong computational time. In our study, careful consideration was therefore given to designing a network with a shorter training time. The best classification result is obtained from parameters that are tuned during training.

Architecture
Figures 2 and 3 highlight the architectures proposed in this study. Subject-based testing and non-subject based testing involve different approaches. The CNN architecture for subject-based testing uses an average pooling layer to obtain smoother features and a global average pooling layer at the end to provide more generalized predictions, while the architecture for non-subject based testing is a classical CNN that consists of convolution, max pooling and fully connected layers. These structural differences help the model to generalize better during the training phase, depending on how the training and testing data are partitioned. The non-subject based testing model tends to perform well because data from the same subject may be used for both training and testing. However, when data are separated by subject, the classification model needs to learn generalized features well in order to classify the data of a new subject correctly. Hence, different CNN architectures were used.
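The difference between average pooling and global average pooling can be illustrated with a short NumPy sketch. The array layout (time steps by filters), the pooling size, and the function names are assumptions made for illustration; they are not taken from the study.

```python
import numpy as np


def average_pool1d(feature_maps, size=2, stride=2):
    """Average-pool along the time axis of a (time, filters) array."""
    n_out = (feature_maps.shape[0] - size) // stride + 1
    return np.stack([
        feature_maps[i * stride: i * stride + size].mean(axis=0)
        for i in range(n_out)
    ])


def global_average_pool1d(feature_maps):
    """Collapse the whole time axis to one value per filter.
    The operation has no trainable weights."""
    return feature_maps.mean(axis=0)


# Toy feature maps: 4 time steps, 2 filters.
feature_maps = np.array([[1.0, 10.0],
                         [3.0, 30.0],
                         [5.0, 50.0],
                         [7.0, 70.0]])

smoothed = average_pool1d(feature_maps)      # [[ 2. 20.] [ 6. 60.]]
summary = global_average_pool1d(smoothed)    # [ 4. 40.]
print(smoothed, summary)
```

Average pooling smooths neighboring values while keeping a shortened time axis, whereas global average pooling reduces each filter to a single number that can feed the output stage directly.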

Proposed CNN
To improve generalization, dropout with a rate of 0.5 (meaning that each neuron has a 0.5 probability of being dropped during training) is applied to layers 4 and 6 in subject based testing and to layers 9 and 10 in non-subject based testing. Table 1 details the layers used. In subject based testing, the Adam optimizer [39] is employed with a learning rate of 0.001, and the Leaky Rectified Linear Unit (LeakyReLU) is used as the activation function for layers 1, 3, 5, 7, 9 and 11. Max pooling is employed after convolution to extract the most crucial features. The average pooling layer is applied after max pooling to further smooth the features. Subsequently, the global average pooling layer is used instead of the dense layer in order to obtain a more generalized model. Global average pooling has an advantage over the dense layer in that it does not contain any trainable parameters, thus reducing the likelihood of overfitting. All hyperparameters are fine-tuned on the training set to obtain the optimal training accuracy. The number of filters and kernel size were determined via a brute force search. Classification was then done with the help of the fully-connected layer.
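A dropout rate of 0.5 can be sketched in NumPy as shown here. The sketch uses the "inverted dropout" convention, in which surviving activations are rescaled during training so that nothing needs to change at test time; this convention is an assumption, since the text does not say which variant was used. The function name and the input values are hypothetical.

```python
import numpy as np


def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability `rate` during
    training and rescale the survivors; return the input unchanged otherwise."""
    if not training or rate == 0.0:
        return activations
    rng = np.random.default_rng() if rng is None else rng
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob   # True = unit is kept
    return activations * mask / keep_prob


rng = np.random.default_rng(42)
activations = np.ones(10_000)

dropped = dropout(activations, rate=0.5, training=True, rng=rng)
unchanged = dropout(activations, rate=0.5, training=False)

print(np.unique(dropped))   # only 0.0 (dropped) and 2.0 (kept, rescaled)
print(dropped.mean())       # close to 1.0 on average
```

Roughly half of the units are zeroed on each training pass, which discourages the network from relying on any single neuron.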
In non-subject based testing, the Adam optimizer is used with a learning rate of 0.0001, and the LeakyReLU and Softmax functions are used as the activation functions for layers 1, 3, 5, 7, 9, 10 and 11. Max pooling is applied after convolution at each stage to extract the most important features. Table 2 highlights the details of all layers used. The model with the best validation accuracy during training was selected for testing. Classification was then done with the help of the fully-connected layer.

Table 2 (surviving rows): layer 9, fully connected, 20 units; layer 10, fully connected, 10 units; layer 11, fully connected, 2 units.
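For readers unfamiliar with Adam, a single update step can be sketched in plain Python. Only the two learning rates (0.001 and 0.0001) come from the text; the values of `beta1`, `beta2`, and `eps` are the defaults suggested in the original Adam paper and are assumed here, and the toy loss and function names are hypothetical.

```python
import math


def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter and return the new value."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad          # running mean
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad   # running variance
    m_hat = state["m"] / (1 - beta1 ** state["t"])   # bias-corrected mean
    v_hat = state["v"] / (1 - beta2 ** state["t"])   # bias-corrected variance
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


def first_step(lr):
    """Take one Adam step on the toy loss (w - 3)^2, starting from w = 0."""
    state = {"t": 0, "m": 0.0, "v": 0.0}
    w = 0.0
    grad = 2 * (w - 3.0)   # derivative of the toy loss at w = 0
    return adam_step(w, grad, state, lr=lr)


w_subject = first_step(lr=0.001)       # rate reported for subject based testing
w_non_subject = first_step(lr=0.0001)  # rate reported for non-subject based testing
print(w_subject, w_non_subject)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by approximately the learning rate itself, which shows how the smaller rate yields more cautious updates.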

Results
The CNN employed in this study was developed using two Intel Xeon 2.40 GHz (E5620) processors with 24 GB RAM, and two Intel Xeon E5-2650 v4 2.20 GHz processors with 384 GB RAM and an NVIDIA Quadro K4200 GPU. Accuracy, sensitivity, positive predictive value, and specificity were utilized as the assessment parameters. Tables 3 and 4 show the classification results per fold for subject based testing and non-subject based testing, respectively. The best diagnostic performance is achieved with a learning rate of 0.001 for subject based testing and 0.0001 for non-subject based testing. Figure 4a,b indicate the performance of the network with dropout layers. It is notable that the accuracy of the training set does not deviate substantially from that of the validation set in Figure 4a, when dropout is added to layers 9 and 10 during training for non-subject based testing. However, in Figure 4b, the accuracy of the training set is far better than that of the validation set, when dropout is added to layers 4 and 6 during training for subject based testing. The proposed architecture achieved accuracy, sensitivity, specificity, and positive predictive value of 98.07%, 97.32%, 98.17%, and 98.45% for non-subject based testing, and 81.26%, 75.42%, 87.59%, and 87.59% for subject based testing. It is apparent that non-subject based testing using 10-fold validation yields higher accuracy than subject based testing using 14-fold validation. Figure 5 shows the confusion matrix results. Based on Figure 5a, 13.18% of healthy subjects are miscategorized as SZ patients and 23.32% of SZ patients are incorrectly classified as healthy subjects. In Figure 5b, 1.56% of healthy subjects are miscategorized as SZ patients and 2.24% of SZ patients are wrongly classified as healthy subjects.
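The four assessment parameters follow directly from the entries of a binary confusion matrix. The sketch below assumes that SZ is treated as the positive class; the counts are made-up illustrative numbers and do not reproduce the study's results, and the function name is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity, and positive predictive
    value from confusion-matrix counts (SZ assumed to be the positive class)."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # SZ segments correctly detected
        "specificity": tn / (tn + fp),   # healthy segments correctly cleared
        "ppv": tp / (tp + fp),           # detected SZ that is truly SZ
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = classification_metrics(tp=90, fn=10, tn=80, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, the miss rate for the positive class is 1 minus sensitivity and the false alarm rate for the negative class is 1 minus specificity, which is how the misclassification percentages in a confusion matrix relate to the reported metrics.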

Comparison with Related Work
Among related work, Kim et al. [16] applied feature extraction methods to the different brain waves and obtained an accuracy of 62.2% on the delta frequency band. Dvey-Aharon et al. [14] also explored feature extraction methods on beta brain waves and obtained an accuracy between 91.5% and 93.9%. Johannesen et al. [19] analyzed five brain waves using a software program and employed statistical analysis and feature selection. Two SVM models were then implemented for classification, with accuracies of 84% and 87% for models 1 and 2, respectively. Santos-Mayo et al. [21] extracted features by employing feature extraction methods and selected features via linear discriminant analyses. Classification accuracies of 93.42% and 92.23% were achieved with the J5 MLP and J5 SVM classifiers, respectively. Ibanez-Molina et al. [22] used the moving window method to compute multiscale LZC to analyze brain signals. The study revealed that higher complexity values were present in the right frontal regions of patients who were at rest. Pang et al. [23] employed the Multi-domain connectome CNN model to classify extracted features with an accuracy of 93.06%. It can be noted from Table 5 that the current state-of-the-art techniques can be employed to classify SZ accurately. Comparing the different techniques discussed, it is evident that the highest accuracy is obtained for the classification of SZ using the CNN deep learning algorithm. In non-subject based testing, the segments used for training and testing are split randomly, so the subjects are not truly separated; this results in higher accuracy than subject based testing, in which the segments are split by subject rather than randomly. Hence, using 10-fold validation [40,41] for non-subject based testing generated more accurate results than 14-fold validation for subject based testing.
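The effect of the two splitting strategies on subject separation can be checked with a small pure-Python sketch. It is a simplified illustration under stated assumptions: the subject-wise split holds out one subject per fold and omits the separate validation subject described earlier, the data layout (4 subjects with 10 segments each) is made up, and all function names are hypothetical.

```python
import random


def leave_one_subject_out(subject_ids):
    """Yield (train, test) index lists with all segments of one subject held out."""
    for held_out in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield train, test


def random_k_fold(n_items, k, seed=0):
    """Yield (train, test) index lists from a random segment-level k-fold split."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def shared_subjects(train, test, subject_ids):
    """Return the subjects that appear on both sides of a split."""
    return {subject_ids[i] for i in train} & {subject_ids[i] for i in test}


# Hypothetical layout: 4 subjects with 10 segments each.
subject_ids = [subject for subject in range(4) for _ in range(10)]

subject_overlap = [shared_subjects(tr, te, subject_ids)
                   for tr, te in leave_one_subject_out(subject_ids)]
random_overlap = [shared_subjects(tr, te, subject_ids)
                  for tr, te in random_k_fold(len(subject_ids), k=5)]

print(subject_overlap)   # empty sets: no subject is on both sides
print(random_overlap)    # non-empty sets: subjects leak across the split
```

In the random split every test fold shares at least one subject with its training set, which is one plausible reason why segment-level validation reports higher accuracy than subject-level validation.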
The model developed in our study and described herein could potentially also be used to diagnose other neurological disorders such as Alzheimer's disease, Parkinson's disease, and epilepsy. Apart from the CNN model, other deep learning methods such as long short-term memory (LSTM) and autoencoders could also be explored in the diagnosis of SZ.

Merits and Drawbacks of the New Paradigm
The main advantages of the proposed system include:
(1) An eleven-layered CNN model has been developed to accurately assess SZ patients versus controls.
(2) The CNN model does the extraction, selection, and classification processes automatically.
(3) The model is validated with the highly graded 10-fold cross validation technique.
(4) High accuracy with a small data size is an attestation to the robustness of the system.
Despite its high classification accuracy, the proposed system does exhibit some limitations. The main disadvantages of the proposed system are:
(1) The CNN model was developed using a small data pool of 14 healthy subjects and 14 SZ patients.
(2) Compared to the traditional machine learning techniques, CNN is costly to compute.

Table row (related work). Johannesen et al. [19], 2016: 60 features per participant; theta 1 and 2, alpha, beta and gamma frequency bands analysed during a working memory task; Brain Vision Analyser software to analyse signals; support vector machine (SVM) to build EEG classifiers; regression-based analyses used to validate SVM models.

Future Work
To improve the efficacy of our CAD system, we propose adding a web-based detection component to the existing model. Figure 6 highlights how the added component would work. This method uses the Internet for SZ patient diagnostics. The EEG signals gathered from patients would be saved on a server within the clinic or hospital and sent to the cloud, where the developed CNN model is hosted. The diagnostic result is then returned to the clinic or hospital via the cloud. Additionally, this technique has an edge over others, as the diagnostic result can also be sent directly to the patient via a push notification on mobile devices. With the implementation of this system, the task of healthcare professionals can be made easier.

Conclusions
An eleven-layered CNN model was proposed to detect SZ using EEG signals. High classification accuracies of 98.07% and 81.26% were obtained for non-subject based testing and subject based testing, respectively, despite the small data pool. With the proposed technique, exhaustive screening of SZ patients for behavioral markers of the disease is not required, as the model can automatically assist with the diagnosis. This system is expected to benefit clinicians as a diagnostic tool, aiding them in SZ assessment. In the near future, we intend to use a larger dataset to test our model, and also plan to incorporate the web-based cloud method to identify the early stages of SZ.