High-Precise Bipolar Disorder Detection by Using Radial Basis Functions Based Neural Network

Presently, several million people suffer from major depressive and bipolar disorders. Thus, the modelling, characterization, classification, diagnosis, and analysis of such mental disorders are of great significance in medical research. Electroencephalogram records provide important information to improve clinical diagnosis and are very useful in the scientific community. In this work, electroencephalogram records and patient data from the Hospital Virgen de la Luz in Cuenca (Spain) were processed for a correct classification of bipolar disorders. This work implemented an innovative radial basis function-based neural network employing a fuzzy means algorithm. The results show that the proposed method is an effective approach for discriminating between two classes, i.e., bipolar disorder patients and healthy persons. The proposed algorithm achieved the best performance compared with other machine learning techniques such as Bayesian linear discriminant analysis, Gaussian naive Bayes, decision trees, K-nearest neighbour, or support vector machine, showing a very high accuracy close to 97%. Therefore, the neural network technique presented could be used as a new tool for the diagnosis of bipolar disorder, with the possibility of integrating this method into medical software.


Introduction
Electroencephalogram (EEG) signals can reveal a wide variety of pathological, behavioural, and medication-related brain patterns, thus providing valuable support in clinical applications such as early diagnosis, treatment, rehabilitation, and classification [1][2][3][4]. The reliability of visual EEG inspection, for example for seizure detection, depends on the expertise of the reviewer. Thus, an automatic diagnosis of bipolar disorders (BD) can be crucial in clinical environments. BD is a severe mental illness characterized by episodes of depression, psychosis, changes in mood state, and mania. The main consequences of late diagnosis or treatment include high suicide rates, lower productivity, and poorer quality of life. The causes of bipolar disorder are unknown, but current research points to a combination of genetic components (about 70-90%) and environmental factors [5]. Early diagnosis of BD may also significantly reduce health care costs [6]. The prevalence of bipolar disorder is between 2.6% and 5% of the population [7]. According to several authors, misdiagnosed patients received inappropriate and costly treatment regimens involving suboptimal medication [8,9]. When untreated, the illness poses a high risk of morbidity and mortality [10]. There is also an increased risk of suicide compared with unipolar depression [11]. BD is a leading cause of global disability. Therefore, correctly diagnosing bipolar disorder should be a priority for health care systems for clinical, administrative, and research purposes.

The EEG records were obtained at the Psychiatric Service of the Virgen de la Luz Hospital. The equipment available at the hospital was used to perform the EEGs, specifically a BrainAmp DC 32-channel Brain Vision system with a sampling frequency of 500 Hz. The electrodes were placed according to the international 10-20 system, and the EEG recordings followed the standard format specified in the clinical practice manual.
Silver chloride electrodes and gel were used for the measurements [45]. The electrode-scalp impedance was kept below 10 kΩ throughout the recording. To ensure this, the BrainAmp DC Brain Vision system provides an impedance warning that flags values above this limit; in that case, the electrode was repositioned or refilled with gel. The entire measurement process was standardized and strictly followed a medical protocol, and electrode placement, impedance adjustment, and EEG recording were carried out by medically trained staff. The EEG records of different patients contained various types of noise, such as muscle activity, artefacts, and baseline drift. To improve the accuracy of the neural network, the raw brain signals measured by the electrodes were pre-processed prior to classification [46,47] to remove this noise and these artefacts. In this study, the signal was band-pass filtered between 0.5 Hz and 40 Hz, and a notch filter at 50 Hz was applied. Figure 1 shows a sample of the raw EEG recording and the corresponding scalp maps, computed for all electrodes at different times (from 1028 ms to 9976 ms) to check which part of the brain was activated. Scalp maps display the distribution of voltage in the time or frequency domain and are created from the electrode positions. The algorithm used to create the scalp maps was based on spherical spline interpolation [48], which depends on two parameters: the order of the splines and the maximum degree of the Legendre polynomial. Different parameter values produce interpolations with different amounts of ripple: low-order splines yield more ripple, whereas high-order splines provide flatter responses.
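The band-pass and notch filtering described above can be sketched with SciPy as follows. This is a minimal sketch, not the authors' pipeline: the text only specifies the 0.5-40 Hz pass band, the 50 Hz notch, and the 500 Hz sampling rate, so the zero-phase Butterworth design, the filter order of 4, the notch quality factor Q = 30, and the function name are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt

FS = 500.0  # sampling frequency of the recordings (Hz), as stated in the text


def preprocess_eeg(eeg, fs=FS, band=(0.5, 40.0), notch_hz=50.0, order=4, q=30.0):
    """Zero-phase band-pass + notch filtering of an EEG array (channels x samples).

    Assumed (not given in the text): Butterworth design, order=4, notch Q=30.
    """
    eeg = np.asarray(eeg, dtype=float)
    # 0.5-40 Hz band-pass; second-order sections stay stable at the low 0.5 Hz edge
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, eeg, axis=-1)
    # 50 Hz notch for residual mains interference
    b, a = iirnotch(notch_hz, q, fs=fs)
    return filtfilt(b, a, filtered, axis=-1)
```

For a recording stored with shape (32, n_samples), `preprocess_eeg` returns an array of the same shape. Because 50 Hz already lies above the 40 Hz band edge, the notch mainly removes mains interference that the band-pass has attenuated but not eliminated.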

Figure 1. Raw EEG and scalp maps recorded.

Method
An artificial neural network (ANN) is a computational model based on the structure and functions of biological neural networks, inspired by the known behaviour of the human brain. ANN systems behave non-linearly and can be adjusted to various objectives. The inputs are the stimuli that the artificial neuron receives, and the outputs are its responses to those stimuli. The neuron adapts and learns by modifying the values of its synaptic weights, which can be adjusted to perform a given task [37,43,44].
The training method and RBF network architecture developed for the classification of bipolar disorder are described below. In the proposed system, two classes were defined: class BD corresponds to bipolar disorder patients, and class CN corresponds to control subjects. The structure of the proposed neural network is shown in Figure 2. It has three layers: an input layer, a hidden layer with a non-linear RBF activation function, and a linear output layer. Equation (1) gives the activity $a_s(\mathbf{p})$ of the $s$th hidden node, computed as the Euclidean norm of the difference between the input vector $\mathbf{p}$ and the node centre:

$$a_s(\mathbf{p}) = \left\| \mathbf{p} - \mathbf{c}_s \right\| = \sqrt{\sum_{j=1}^{M} \left( p_j - c_{s,j} \right)^2} \tag{1}$$

where $\mathbf{c}_s$ represents the centre of the $s$th node. A radially symmetric function was used for the node output, here a Gaussian function:

$$g\left(a_s(\mathbf{p})\right) = \exp\left( -\frac{a_s^2(\mathbf{p})}{w_s^2} \right) \tag{2}$$

with $w_s$ being the width of the node.
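The forward pass of this architecture (the activity of Equation (1), the Gaussian node output, and the linear output layer) can be sketched in NumPy as below. This is an illustrative sketch: the function and argument names are hypothetical, and the centres, widths, and output weights are taken as already trained.

```python
import numpy as np


def rbf_forward(P, centres, widths, weights):
    """Forward pass of a Gaussian RBF network with a linear output layer.

    P:       (n_samples, M) input vectors p
    centres: (S, M) hidden-node centres c_s
    widths:  (S,) node widths w_s
    weights: (S,) or (S, n_outputs) hidden-to-output weights
    Returns the network outputs and the hidden-layer outputs.
    """
    # Equation (1): activity a_s(p) = ||p - c_s|| for every (sample, node) pair
    activity = np.linalg.norm(P[:, None, :] - centres[None, :, :], axis=-1)
    # Gaussian node output exp(-a_s^2 / w_s^2)
    hidden = np.exp(-(activity ** 2) / widths[None, :] ** 2)
    # Linear output layer: weighted sum of the hidden-node outputs
    return hidden @ weights, hidden
```

A hidden node responds with 1 when the input coincides with its centre and decays towards 0 as the input moves away, at a rate set by its width.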

Training of the Proposed Neural Network
To train the neural network, a set of $K$ known input-output training pairs $(\mathbf{p}_k, f_k)$, $k = 1, 2, \ldots, K$, was used. In the proposed system, the training of the neural network consisted of two steps:

1. First, the parameters of the hidden layer (the node centres $\mathbf{c}_s$ and widths $w_s$) were calculated;
2. Then, the connection weights between the hidden and the output layer were determined.
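The text does not state how the hidden-to-output weights of step 2 are computed. Because the output layer is linear, a common choice for RBF networks is linear least squares on the hidden-layer outputs; the sketch below assumes that choice, and the helper name is hypothetical.

```python
import numpy as np


def fit_output_weights(hidden, targets):
    """Linear least-squares fit of the hidden-to-output weights (assumed rule).

    hidden:  (K, S) Gaussian outputs of the S hidden nodes for the K training inputs p_k
    targets: (K,) or (K, n_outputs) desired outputs f_k, e.g. 1 for BD and 0 for CN
    """
    # Minimise ||hidden @ weights - targets||^2; lstsq also copes with rank-deficient input
    weights, *_ = np.linalg.lstsq(hidden, targets, rcond=None)
    return weights
```

Since the centres and widths are fixed after step 1, this second step is a linear problem with a closed-form solution, which is one reason RBF training is fast. The network output for new inputs is the product of their hidden-layer outputs and `weights`.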
In addition, the fuzzy means (FM) algorithm was applied to choose the network structure and the centres of the hidden nodes [37,43,44]. The algorithm relies on a fuzzy partition (FP) of the input space, in which a number of fuzzy sets are defined for each input variable. The proposed RBF method partitions the universe of discourse of each input $j$ uniformly, using triangular membership functions of the form

$$\mu_j^s(p_j) = \begin{cases} 1 - \dfrac{\left| p_j - v_j^s \right|}{l_j^s}, & \text{if } p_j \in \left[ v_j^s - l_j^s,\; v_j^s + l_j^s \right] \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

where $v_j^s$ is the centre element, at which the membership value of the set equals 1, and $l_j^s$ is half of the respective width. For each input variable, the sum of the membership degrees at any point of the universe of discourse is close to 1. Combining the fuzzy partitions of all inputs defines a fuzzy partition of the $M$-dimensional input space into fuzzy subspaces. On this basis, the following algorithm is used to find the fuzzy subspace closest to a given input vector [37,43,44].
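A uniform fuzzy partition of one input can be sketched as below. The triangular shape, the choice of a half-width equal to the spacing between adjacent centres, and the function names are assumptions for illustration; with this choice, the membership degrees at any point of [lo, hi] sum to 1.

```python
import numpy as np


def uniform_triangular_partition(lo, hi, n_sets):
    """Centres v and half-width l of n_sets equally spaced triangular fuzzy sets on [lo, hi]."""
    centres = np.linspace(lo, hi, n_sets)
    half_width = (hi - lo) / (n_sets - 1)  # each triangle reaches the neighbouring centres
    return centres, half_width


def membership(x, centres, half_width):
    """Triangular membership degree of the scalar x in every fuzzy set of the partition."""
    return np.clip(1.0 - np.abs(x - centres) / half_width, 0.0, None)
```

Between two adjacent centres, only those two sets are active, and their degrees add up to 1, which is what makes the partition uniform and complete.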
• Algorithm A: Generation of the fuzzy subspace closest to a given input vector.
Phase 1: Given an input data vector $\mathbf{p} = [p_1, p_2, \ldots, p_M]^T$, for each $j = 1, 2, \ldots, M$, the fuzzy set with the maximum degree of membership for $p_j$ is selected.
Phase 2: For $\mathbf{p}$, a fuzzy subspace $F$ is created, composed of the fuzzy sets selected in Phase 1.
• FM algorithm: The size and the centres of the hidden layer are determined through the FM algorithm, whose phases are explained below [37,43,44].
Phase 1: Given the input-output data $(\mathbf{p}_k, f_k)$, $k = 1, 2, \ldots, K$, the number of rules $S$ is set to 0.
Phase 2: The first input vector $\mathbf{p}(1)$ is selected, and Algorithm A is applied to create the first fuzzy subspace $F_1 = (\mathbf{v}^1, \mathbf{l}^1)$. $S$ is then set to 1.
Phase 3: Assume that the first $k-1$ input vectors have been examined and $S$ fuzzy subspaces have been generated, with $1 \le S \le k-1$. The $k$th input vector $\mathbf{p}(k)$ is presented, and the relative Euclidean distances $rd_s(\mathbf{p}(k))$, $s = 1, \ldots, S$, between $\mathbf{p}(k)$ and all $S$ fuzzy subspaces created so far are calculated using Equation (4). Let the minimum distance $rd_{s_0}(\mathbf{p}(k))$ correspond to the fuzzy subspace $F_{s_0} = (\mathbf{v}^{s_0}, \mathbf{l}^{s_0})$. This minimum distance is then compared against the acceptance condition of the FM algorithm: if the condition is met, $\mathbf{p}(k)$ is already covered and Phase 4 is skipped; otherwise, the algorithm proceeds to Phase 4.
Phase 4: Algorithm A is applied to create a new fuzzy subspace for $\mathbf{p}(k)$, and $S$ is updated to $S + 1$.
Phase 5: If $k = K$, the algorithm stops; otherwise, the next input vector is presented and the algorithm returns to Phase 3.

The final step of the first training stage defines the width $w$ of the Gaussian activation functions. For each hidden node $i$, the width $w_i$ is estimated using the $g$-nearest-neighbour heuristic:

$$w_i = \sqrt{\frac{1}{g} \sum_{n=1}^{g} \left\| \mathbf{c}_i - \mathbf{c}_n \right\|^2}$$

where $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_g$ are the $g$ node centres closest to hidden node $i$. The value of $g$ was chosen so that each input vector presented to the network activates a large number of nodes.
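The FM procedure above (Algorithm A, Phases 1-5, and the width heuristic) can be sketched in NumPy as follows. This is a simplified sketch under stated assumptions: every input shares the same uniform triangular partition of [lo, hi]; the relative distance and the acceptance test are hypothetical stand-ins, because Equation (4) and the comparison are not reproduced in the text; and the widths use the root-mean-square form of the nearest-neighbour heuristic. All function names are hypothetical.

```python
import numpy as np


def closest_subspace(p, lo, hi, n_sets):
    """Algorithm A on a uniform triangular partition of [lo, hi] for every input.

    With equally spaced triangles whose half-width equals the spacing, the set with
    maximum membership is the one whose centre is nearest to p_j (ties aside).
    """
    step = (hi - lo) / (n_sets - 1)
    idx = np.clip(np.rint((np.asarray(p) - lo) / step), 0, n_sets - 1)
    return lo + idx * step  # centre vector v of the selected fuzzy subspace


def fm_centres(P, lo, hi, n_sets):
    """One pass over the K training inputs P (K x M) to pick hidden-node centres.

    ASSUMPTION: the relative distance is the Euclidean distance scaled by the
    half-width (= grid spacing), and p(k) counts as covered when it is <= 1.
    These stand in for Equation (4) and the FM acceptance condition.
    """
    step = (hi - lo) / (n_sets - 1)
    centres = [closest_subspace(P[0], lo, hi, n_sets)]  # Phases 1-2: S = 1
    for p in P[1:]:  # Phase 3: present p(k)
        rd = [np.linalg.norm((p - v) / step) for v in centres]
        if min(rd) <= 1.0:  # condition met: skip Phase 4
            continue
        v_new = closest_subspace(p, lo, hi, n_sets)  # Phase 4: new subspace
        if not any(np.allclose(v_new, v) for v in centres):
            centres.append(v_new)  # S = S + 1
    return np.array(centres)  # Phase 5: all K inputs examined


def nn_widths(centres, g):
    """g-nearest-neighbour width: RMS distance from each centre to its g closest centres."""
    d = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=-1)
    nearest = np.sort(d, axis=1)[:, 1:g + 1]  # column 0 is the distance to itself
    return np.sqrt(np.mean(nearest ** 2, axis=1))
```

The number of hidden nodes is not fixed in advance: it emerges from how many fuzzy subspaces the training data occupy, and a single pass over the data suffices.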

Performance Metrics
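For this study, the following metrics were used to evaluate performance:

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Balanced accuracy} = \frac{1}{2}\left( \frac{TP}{TP + FN} + \frac{TN}{TN + FP} \right)$$

In these equations, TP represents the number of true positive cases, TN the true negatives, FN the false negatives, and FP the false positive cases.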
The F1 score is defined as

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

Another measure of overall classification performance is the Matthews correlation coefficient (MCC) [49], defined as

$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

Finally, the degenerated Youden's index (DYI) [49] and Cohen's kappa (CK) are also employed to analyse the performance of the proposed method.
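These metrics can be computed from the four confusion-matrix counts as in the sketch below, treating BD as the positive class. The function name and dictionary keys are hypothetical, all denominators are assumed to be non-zero, and DYI is omitted because its exact definition is deferred to [49].

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Classification metrics from confusion-matrix counts (BD = positive class).

    Assumes every denominator below is non-zero.
    """
    n = tp + tn + fp + fn
    recall = tp / (tp + fn)  # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Cohen's kappa: observed agreement corrected by the agreement expected by chance
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    return {
        "recall": recall,
        "precision": precision,
        "balanced_accuracy": (recall + specificity) / 2,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }
```

Unlike plain accuracy, balanced accuracy, MCC, and kappa all account for the class that is predicted less well, which matters when the BD and CN groups differ in size.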

Proposed Methodology
In cross-validation, the training sample was divided into several folds; each fold was held out of the training process in turn, while training was carried out with the remaining cases. The diagnostic performance of the classification methods was evaluated using repeated ten-fold cross-validation and a percentage split strategy, in which the dataset was divided into 70% for training and 30% for testing. The recorded brain signals were pre-processed as described in Section 2, and no pre-processed EEG data were shared between the training and testing subsets, so that the same data were never used for both training and classification. The methodology applied in this study is shown in Figure 3. To check the performance of the proposed method, it was compared with several machine learning algorithms: BLDA, GNB, KNN, DT, and SVM. All of these ML methods were implemented using the MATLAB statistics and machine/deep learning toolboxes [50].

Machine learning techniques usually have one or more hyperparameters that allow the algorithm to be adjusted during the training process. The values of these hyperparameters (number of splits, learners, neighbours, distance metric, distance weight, kernel, box constraint level, multiclass method, etc.) determine the prediction performance of each method, so they must be tuned to obtain the best possible accuracy. To optimize these hyperparameters for each ML technique used in this study, each model was trained with a Bayesian optimization approach. Bayesian optimization uses the results of previous attempts to estimate which hyperparameter configuration would maximize the performance of the algorithm, based on the assumption that there is a relationship between the hyperparameters and the performance achieved.
In this regard, the area under the curve (AUC) and the balanced accuracy were used as performance measures to be maximized.
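The two evaluation protocols (a stratified 70/30 percentage split and repeated stratified ten-fold cross-validation) can be sketched in NumPy as follows. This is an illustration under assumptions: the study itself used MATLAB toolboxes, the Bayesian hyperparameter search is not shown, the function names are hypothetical, and splitting is done per record index. If a subject contributes several records, indices should be grouped by subject so that no data are shared between subsets.

```python
import numpy as np


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Percentage split keeping the BD/CN proportions in both subsets."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(test_fraction * idx.size))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return np.sort(train), np.sort(test)


def repeated_stratified_kfold(labels, k=10, repeats=1, seed=0):
    """Yield (train, validation) index pairs for repeated stratified k-fold CV."""
    rng = np.random.default_rng(seed)
    for _ in range(repeats):
        fold_of = np.empty(labels.size, dtype=int)
        for cls in np.unique(labels):
            idx = rng.permutation(np.flatnonzero(labels == cls))
            fold_of[idx] = np.arange(idx.size) % k  # deal class members round-robin
        for f in range(k):
            yield np.flatnonzero(fold_of != f), np.flatnonzero(fold_of == f)
```

Each repetition reshuffles the records before dealing them into folds, so averaging over repetitions reduces the dependence of the estimate on one particular partition.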

Results
This section describes the results obtained during the training and classification of bipolar disorder patients. The performance of the proposed system was compared with that of several classification methods widely accepted in the scientific community. Table 2 shows the balanced accuracy, recall, precision, and F1 score of the classification methods for bipolar disorder patients and healthy subjects. Systems based on BLDA and GNB obtained lower classification values than the other methods, with accuracy values close to 86%; SVM and KNN improved on this, reaching values around 89%. The proposed NN-based system obtained the best performance, achieving an accuracy of 96.78% on real EEG records. In terms of precision and recall, the KNN and SVM methods came closest to the proposed NN-based system. For the F1 score, the BLDA and GNB methods obtained values close to 86%, and KNN and SVM also performed worse than the proposed method. Other performance parameters, such as the AUC, MCC, DYI, and kappa index, were also analysed to check the behaviour of the proposed system. These parameters helped verify that the methods correctly classified the two classes investigated in the study, i.e., bipolar patients and controls. The Matthews correlation coefficient is a particularly reliable statistical rate, since it produces a high score only if the prediction gives good results in all four confusion-matrix categories (true positives, false negatives, true negatives, and false positives). As can be seen in Table 3, the NN-based method achieved the highest MCC value, with KNN and SVM coming closest; the remaining methods obtained smaller values. For the kappa index, the NN-based system again obtained the highest value.
The other systems used in the comparison reached lower values. Table 4 displays the classification results for the different types of patients and healthy subjects. As can be observed, the results of the different methods were not affected by sex or age: the balanced accuracy values were very similar for females, males, adults, and children. It should be noted that the proposed method maintained the highest precision in the classification. To evaluate the classification capacity of the systems presented, receiver operating characteristic (ROC) curves were also considered. Each curve is obtained by plotting, for every threshold value, the sensitivity against 1 − specificity [51].

Electronics 2022, 11, 343 9 of 14

Figure 4 shows the results obtained for the different classification algorithms. According to Table 3, the RBF-based system had the best AUC (0.96), and the KNN method the second highest (0.89). Specifically, the NN-based system improved the AUC by 7, 9, and 10 percentage points with respect to the KNN, SVM, and BLDA methods, respectively. The proposed system can therefore classify bipolar disorder automatically with high accuracy, providing a tool that could help healthcare personnel in clinical diagnosis.

For clarity, all metrics are also presented as radar charts, grouped by training and test set. The shape of the plots is indicative of the quality of the models, since a perfect score would be represented by a circle. The NN-based system (Figure 5) yields the best-balanced model: its training and test results are represented by similar, virtually circular plots. As can be observed, the BLDA and GNB methods have the worst performance across several metrics.
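The ROC curve and AUC discussed above can be reproduced from classifier scores with the short NumPy sketch below: it sweeps the decision threshold over the observed scores and integrates with the trapezoidal rule. The function names are hypothetical, and labels are assumed to be 1 for BD and 0 for CN.

```python
import numpy as np


def roc_curve_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping the threshold from high to low scores."""
    thresholds = np.r_[np.inf, np.sort(np.unique(scores))[::-1]]
    pos, neg = labels == 1, labels == 0
    tpr = np.array([(scores[pos] >= t).mean() for t in thresholds])  # sensitivity
    fpr = np.array([(scores[neg] >= t).mean() for t in thresholds])  # 1 - specificity
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```

The resulting AUC equals the probability that a randomly chosen BD record receives a higher score than a randomly chosen CN record (ties counting one half), which is why it does not depend on any single decision threshold.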
Additionally, Big-O notation, used in computer science to describe the complexity of an algorithm, was applied to the proposed method and to the classic machine learning methods studied in this paper. Big-O notation gives an upper bound on growth, typically for the worst case, and can describe either the execution time required or the space used (e.g., in memory or on disk) [52,53]. Table 5 shows the complexity (in seconds) of the compared systems, where N is the number of samples in the input vector. As can be seen, the proposed RBF method presents the lowest complexity, as it is a very simple neural network with just one hidden layer: like the DT algorithm, it shows logarithmic growth, O(log N). Conversely, the SVM system takes the longest processing time, of the order of O(N^2), for a high number of samples. The rest of the classifiers have a linear processing time, O(N).

Discussion
The discrimination of bipolar disorders is a hard classification problem that requires a strong optimization algorithm and an effective feature selection procedure. Automated detection can guide treatment decisions, help prognostication, and support the study of the pathophysiology of bipolar disorders [47]. In this study, a radial basis function neural network was successfully used for this pattern recognition task. The results show that BD patients and healthy controls can be discriminated with high accuracy using the proposed NN classification framework. A maximum classification accuracy of 96.7% was obtained, demonstrating the potential clinical use of EEG-based classification of BD patients.
The proposed algorithm was compared with different classification methods described in the literature, and the comparison showed a considerable improvement achieved by the proposed NN. The result is a tool that facilitates the automatic analysis of EEG signals to aid in the diagnosis of bipolar disorder. One of the limitations of RBF neural networks is the initialization of the centres and the choice of the basis function [43,44]. In our study, the fuzzy initialization of the RBF neural network was used to improve performance. Other techniques have been developed that optimally combine the outputs of simple classifiers through alpha integration, exploiting their complementarity under an optimization criterion [54]. The proposed NN method brings several advantages: it simplifies the configuration of the network, its training is faster, and its approximation capabilities are improved. In addition, it introduces an innovative contribution, namely the fuzzy initialization of the network.
Finally, Table 6 compares the performance of the proposed method with that of relevant related works presented in the literature. These references were selected because they report novel algorithms for classifying BD. The columns of the table specify the reference and year of publication, the acquisition data employed, the classification system, and the accuracy obtained. It is important to remark that different data or training processes may lead to different performance results; hence, the compared methods may show different accuracy depending on the input data and training employed. Nevertheless, this kind of comparison helps to understand the advantages of the proposed RBF classification method, which include: (i) high accuracy, close to 97%; (ii) simple network configuration; (iii) fast training procedures; and (iv) potential integration into real-time commercial tools due to its low computational complexity.

Conclusions
In this paper, a novel neural network model based on radial basis functions and employing a fuzzy means algorithm is presented for the classification of bipolar disorder patients. The proposed method achieved the highest values of balanced accuracy, recall, precision, and F1 score, exceeding those of other classical methods, i.e., Bayesian linear discriminant analysis, support vector machines, Gaussian naive Bayes, K-nearest neighbours, and decision trees. These results support its reliability for the automatic classification of the pathology studied. Experimental results obtained from real EEG records illustrate the high accuracy of the proposed approach. Therefore, the proposed radial basis function-based neural network can be a complementary tool to help healthcare personnel diagnose brain disorders such as bipolar disorder.