An Improved Brain-Inspired Emotional Learning Algorithm for Fast Classification

Classification is an important task of machine intelligence in the field of information. The artificial neural network (ANN) is widely used for classification. However, the traditional ANN trains slowly, which makes it hard to meet real-time requirements in large-scale applications. In this paper, an improved brain-inspired emotional learning (BEL) algorithm is proposed for fast classification. The BEL algorithm was put forward to mimic the high speed of the emotional learning mechanism in the mammalian brain, and it has the advantages of fast learning and low computational complexity. To improve the accuracy of BEL in classification, the genetic algorithm (GA) is adopted to optimally tune the weights and biases of the amygdala and orbitofrontal cortex in the BEL neural network. The combined algorithm, named GA-BEL, has been tested on eight University of California at Irvine (UCI) datasets and two well-known facial expression databases (Japanese Female Facial Expression and Cohn–Kanade). Comparative experiments indicate that the proposed GA-BEL is more accurate than the original BEL algorithm and much faster than the traditional algorithm.


Introduction
Classification has been widely used in the areas of machine learning, pattern recognition and data mining. Various artificial intelligence methods, such as the artificial neural network (ANN) [1], support vector machine (SVM) [2], decision tree [3], extreme learning machine [4], the linear regression classifier [5] and other classifiers, have been proposed for classification problems. Among these methods, ANN is very popular due to its self-learning, self-adaptation and high generalization capability. However, the traditional ANN has been shown to have significant drawbacks, such as low training speed and high computational complexity, and its convergence rate makes it hard to meet the requirements of real-time classification [6].
Recently, owing to neurobiological and cognitive research on emotion, emotional intelligence has come to play an important role in artificial intelligence [7], and it has attracted increasing interest around the world. Many bio-inspired brain emotional learning (BEL) models have been proposed and successfully applied in intelligent engineering applications [8,9]. These models are based on the amygdala-orbitofrontal model proposed by Morén et al. [10], which was inspired by LeDoux's anatomical findings on the limbic system in the mammalian brain [11]. According to LeDoux's findings, the amygdala and orbitofrontal cortex are the two main parts of the limbic system, and they interact to process emotional stimuli correctly and rapidly. Therefore, BEL-based models that mimic the high speed of the emotional learning mechanism in the limbic system have the advantages of fast learning and quick reaction, and they are widely used in classification [12], prediction [13] and control applications [14].
In the amygdala-orbitofrontal model, the reward signal is important for adjusting the weights of the amygdala and orbitofrontal cortex in the emotional learning process, but it has not been clearly defined so far. Many researchers have proposed different versions of BEL models based on the amygdala-orbitofrontal model, each with a different determination of the reward signal. Lucas et al. [15] proposed a BEL-based controller and explicitly determined the reward signal by a PID formulation, which has been successfully applied in intelligent engineering applications. Abdi et al. [16] applied a BEL model to short-term traffic flow prediction and defined the reward signal as a weighted combination of reinforcement factors. Parsapoor [17] presented a BEL-based architecture for chaotic time series prediction. Although these BEL-based models achieve success in applications, they rely on reinforcement learning to adjust the weights of the amygdala and orbitofrontal cortex; they are model sensitive and cannot be generalized to other problems. Lotfi et al. [18] proposed a novel BEL-based pattern recognizer, which employed a pattern-target method instead of reward-based reinforcement learning to update the weights of the amygdala and orbitofrontal cortex. Although this is a model-free method, it reduces the precision of the learning process, and the classification accuracy needs to be further improved.
In this study, we aim to propose a more accurate BEL algorithm for fast classification. It has been demonstrated that the classification accuracy of a neural network can be substantially improved by optimizing the weights of the network [19]. There are many optimization methods, such as the genetic algorithm (GA) [20], particle swarm optimization (PSO) [21] and differential evolution (DE) [22]. Compared to GA, PSO and DE are relatively simple and converge easily, but they are easily trapped in local minima during the search and show a slight compromise in accuracy. GA, by contrast, can evolve the population toward better areas of the search space and has good global search ability [19]. The BEL algorithm has the advantage of fast learning but shows a compromise in accuracy, so we adopted GA to optimize the BEL model and make it more accurate. The integrated algorithm, named GA-BEL, takes advantage of the fast learning and low computational complexity of BEL, as well as the global optimization capability of GA. Thus, GA-BEL is expected to achieve better performance than the original BEL in classification applications. GA-BEL is tested on eight University of California at Irvine (UCI) classification datasets and on two facial expression recognition problems. The results indicate the superiority of the proposed GA-BEL in terms of classification accuracy and execution speed.
The rest of this paper is organized as follows. Section 2 briefly reviews related work on BEL. The improved BEL neural network and the implementation of the GA-BEL algorithm are described in Section 3. Section 4 describes the detailed experimental design, as well as the empirical results and discussion. Finally, conclusions and future work are summarized in Section 5.

Anatomical Foundation and Related Works
The description of the emotional brain is based on the limbic system theory [11]. Figure 1a [18] shows the limbic system in the emotional brain and its components, including the sensory cortex, thalamus, amygdala, orbitofrontal cortex, etc. Two of these components are the main parts. One is the amygdala, which plays a critical role in emotional learning and reacting. The other is the orbitofrontal cortex, which assists the amygdala in processing emotional stimuli. LeDoux [23] argues that emotional stimuli can reach the amygdala by two different pathways, as shown in Figure 1b. One is short and fast, coming directly from the thalamus, and the other is long and slow, coming from the sensory cortex. The amygdala is properly situated to reach the stimuli extremely quickly and produce the required reaction. Due to the existence of the short path in the emotional brain, emotional stimuli are processed much faster than normal stimuli. Motivated by LeDoux's anatomical findings in the emotional brain, Morén et al. [10] proposed the amygdala-orbitofrontal model in 2000; the framework of the model is shown in Figure 2.
The structure is inherited from some parts of the limbic system (e.g., the amygdala, thalamus and sensory cortex) and imitates the interaction between those parts. In the amygdala-orbitofrontal model, the amygdala and orbitofrontal cortex are essential for emotional learning and reacting. The amygdala receives emotional stimuli from the sensory cortex and thalamus, as well as the external reward signal; it interacts with the orbitofrontal cortex and reacts to the emotional stimuli based on the reward signal. The orbitofrontal cortex also receives sensory input from the sensory cortex and evaluates the amygdala's response to prevent inappropriate learning connections. The two parts interact frequently to mimic the functionality of the emotional brain responsible for processing emotional stimuli.
In the amygdala-orbitofrontal model, $S_i$ is the sensory input, $A_j$ is the output of the amygdala and $O_j$ is the output of the orbitofrontal cortex. The reward signal $Rew$ is used to update the weights of the amygdala and orbitofrontal cortex in emotional learning through the learning rules given in [10], where $\Delta v_i$ and $\Delta w_i$ represent the weight changes of the amygdala and orbitofrontal cortex, respectively, and $\alpha$ and $\beta$ are learning rates that adjust the learning speed.
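To make the role of the reward signal concrete, the following is a minimal sketch of one learning step in an amygdala-orbitofrontal model. It uses one commonly cited form of the rules and is an assumption, not a verbatim reproduction of the rules in [10]: the thalamic input is omitted, the orbitofrontal error term is taken as the summed output minus the reward, and the function name `moren_update` and the default learning rates are hypothetical.

```python
def moren_update(S, V, W, rew, alpha=0.1, beta=0.1):
    """One learning step of an amygdala-orbitofrontal model (assumed form).

    S: sensory inputs S_i; V: amygdala weights; W: orbitofrontal weights;
    rew: reward signal Rew; alpha, beta: learning rates.
    """
    A = [s * v for s, v in zip(S, V)]   # amygdala node outputs A_j
    O = [s * w for s, w in zip(S, W)]   # orbitofrontal node outputs O_j
    # Amygdala update: max(0, .) blocks negative changes, so these
    # weights never decrease.
    dV = [alpha * s * max(0.0, rew - sum(A)) for s in S]
    # Orbitofrontal update: may move in either direction
    # (assumed error term: sum(O) - rew).
    dW = [beta * s * (sum(O) - rew) for s in S]
    V_new = [v + d for v, d in zip(V, dV)]
    W_new = [w + d for w, d in zip(W, dW)]
    return V_new, W_new
```

In this form the amygdala weights are monotonic, which is why an inhibitory orbitofrontal part is needed to correct responses that are no longer appropriate.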
Various modified BEL models based on the amygdala-orbitofrontal model have been proposed, each with its own determination of the reward signal. In the modified BEL model in [16], the reward signal $Rew$ is defined as follows:
$$Rew = \sum_{j} w_j r_j$$
where $r$ stands for the factors of the reinforcement agent, and $w$ represents the related weights. Although these BEL-based models achieve success in their applications, most of them rely on reinforcement learning to adjust the weights of the amygdala and orbitofrontal cortex; they are model sensitive and cannot be generalized to other problems.
Lotfi et al. [18] employed activation functions and the target value ($T$) of the input pattern to update the weights of the amygdala and orbitofrontal cortex in the learning phase, i.e., $Rew = T$. The resulting supervised learning rules [18] update $v_j^k$ and $w_j^k$, the weights of the amygdala and orbitofrontal cortex, respectively, where $T^k$ is the target value associated with the $k$-th pattern $p^k$, $E_a^k$ is the output of the amygdala, $E^k$ is the final output, $k$ is the learning step and $\alpha$ and $\beta$ are learning rates. Additionally, $\gamma$ is the decay rate in the amygdala learning rule. The model can learn the pattern-target relationship of an application by emotional learning, but this method reduces the precision of the learning process, and the classification accuracy needs to be improved.

Algorithms 2017, 10, 70

Improved BEL Neural Network
In contrast to previous BEL-based models, we apply the fitness function in GA instead of reinforcement learning to update the weights of the amygdala and orbitofrontal cortex in emotional learning. Therefore, we remove the reward signal from the BEL neural network. In addition, according to the biological interaction between the amygdala and orbitofrontal cortex in emotional learning, we add a bias to each part. The improved BEL-based neural network is shown in Figure 3; it consists of four common subsystems: the thalamus, sensory cortex, orbitofrontal cortex and amygdala. The amygdala and orbitofrontal cortex are the two main subsystems and are mainly responsible for emotional learning.
The model is presented as a multiple-input, single-output architecture. The amygdala receives the input vector $SI = [S_1, S_2, \ldots, S_m]$ of $m$ features from the sensory cortex and $A_{th}$ from the thalamus; $A_{th}$ is calculated by Equation (4) [10]. As shown in Figure 3, $v_i$ is the amygdala weight, and $b_a$ is the bias of the amygdala neuron. $E_A$ is the output of the amygdala, calculated by Equation (5) [10]. Furthermore, the orbitofrontal cortex receives the input patterns from the sensory cortex. $E_O$ is the output of the orbitofrontal cortex, which is used to inhibit the amygdala's output; it is calculated by Equation (6) [10], in which $w_i$ are the orbitofrontal cortex weights and $b_o$ is the bias of the orbitofrontal cortex neuron. Finally, the final output $E$, which represents the correct amygdala response, is calculated by Equation (7) [10].
The improved single-output BEL neural network can be trained with pattern-target examples; it is model free and can be used in classification applications. The number of features in the input vector determines the number of neurons in the thalamus and sensory cortex units, and the number of classes determines the number of orbitofrontal cortex and amygdala units. As a result, the improved BEL model can be extended to multi-class classification applications, with the architecture shown in Figure 4.
In the proposed m-n BEL network shown in Figure 4, m is the number of inputs and n is the number of outputs. There are n amygdala-orbitofrontal cortex parts, and each output unit is associated with one such part that interacts separately. Although the distinctive feature of the BEL network is fast learning, the network is easily trapped in local minima. Therefore, its classification accuracy needs to be improved.
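The forward computation described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' exact Equations (4) to (7): the thalamic signal is taken as the maximum of the inputs (as in the original amygdala-orbitofrontal model), both parts are assumed linear with no activation function, the amygdala is assumed to hold m + 1 weights (the last one for the thalamic input, which matches the 2m + 3 gene count used later), and the class is assumed to be the output unit with the largest response. The names `bel_forward` and `bel_classify` are hypothetical.

```python
def bel_forward(S, v, b_a, w, b_o):
    """Single-output BEL unit (assumed linear form).

    S: list of m sensory inputs
    v: m + 1 amygdala weights (the last one weights the thalamic input)
    w: m orbitofrontal cortex weights
    b_a, b_o: amygdala and orbitofrontal cortex biases
    """
    a_th = max(S)                                    # thalamic signal A_th
    e_a = sum(vi * si for vi, si in zip(v, list(S) + [a_th])) + b_a
    e_o = sum(wi * si for wi, si in zip(w, S)) + b_o
    return e_a - e_o                                 # E: amygdala output inhibited by E_O


def bel_classify(S, units):
    """m-n network: one (v, b_a, w, b_o) unit per class.

    Returns the index of the unit with the largest output and all outputs.
    """
    outputs = [bel_forward(S, *unit) for unit in units]
    return outputs.index(max(outputs)), outputs
```

Because each output unit owns its own amygdala-orbitofrontal pair, the units can be evaluated independently, which is what keeps the per-pattern cost low.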

GA-BEL Algorithm
In this paper, GA is adopted to optimize the initial weights and biases of the amygdala and orbitofrontal cortex in the BEL neural network. There are three steps in GA-BEL for classification.

• Step 1: Chromosome encoding. Real encoding is adopted for its high precision. One string of real numbers represents one chromosome, which consists of the connection weights and biases of the orbitofrontal cortex and amygdala. According to the structure of the BEL neural network, each chromosome is initialized as in Equation (8), where $w_i$ and $b_o$ represent the orbitofrontal cortex weights and bias, respectively, and $v_i$ and $b_a$ represent the amygdala weights and bias, respectively. Their values are usually chosen in [−1, 1]. $m$ is the number of input features; thus, the number of genes in each chromosome is $2m + 3$.

• Step 2: Optimization. The fitness of each individual is calculated by the fitness function. The optimal fitness value, corresponding to the best individual, is found through selection, crossover and mutation in GA.
(1) Fitness function: The fitness function evaluates the adaptability of each individual in the population, and the individual fitness guides the selection operation. We select the sample variance as the evaluation criterion of the weights: the fitness function measures the deviation of $E_k$ from $T_k$ over all training samples, where $E_k$ is the response to the $k$-th input pattern with the given weights in $Ch_k$, calculated by Equation (7), $T_k$ is the target value and $n$ is the number of pattern-target pairs. By this definition, the minimum output of the fitness function corresponds to the minimum total error over all training samples. Thus, the overall goal of the genetic operators is to find the minimum value of the fitness function.
(2) Selection: The selection operation chooses better individuals for the subsequent iteration. There are several selection methods in GA, such as the roulette wheel method, the tournament method and the optimum maintaining strategy [24]. In this study, we adopt the roulette wheel method based on the fitness ratio, because it selects chromosomes with a higher probability of survival, is intuitive and is widely used in GA. According to the definition of the fitness function, the selection probability $p_i$ of each individual $i$ is computed from $F_i$, the fitness value of individual $i$, where $k$ is a coefficient and $n$ is the number of individuals in the group.
(3) Crossover: To enlarge the diversity and the search space, the crossover operation produces two new individuals by exchanging information between the parent individuals. Reported crossover mechanisms include single-point crossover, two-point crossover, arithmetic crossover and multipoint crossover [24]. Because this paper uses a real-coded GA, arithmetic crossover is adopted for its high precision. The rule, given in [25], combines $c_i^k$ and $c_j^k$, the $k$-th genes of the two chromosomes undergoing crossover, where $n$ denotes the number of iterations and $\alpha$ is a random number uniformly distributed in [0, 1].
(4) Mutation: To further enhance the local search capability of GA, mutation is another way to create new individuals. The mutation operation changes one or several gene values of a chromosome. Reported mutation mechanisms include simple mutation, uniform mutation, non-uniform mutation and boundary mutation [24]. Here, we adopt the uniform mutation strategy with a small mutation probability, because it searches the space uniformly in early generations and greatly reduces the risk of premature convergence. The mutation operation, defined in [26], acts on $c_i^j$, the $j$-th gene of the chromosome undergoing mutation, where $c_{max}$ is the upper limit of the allele, $c_{min}$ is the lower limit, $g$ is the current generation, $G_{max}$ is the maximum number of generations and $r_1$ and $r_2$ are random numbers within the range [0, 1].

• Step 3: Classification. After the operations of selection, crossover and mutation, the best chromosome, which represents the best weights, can be found. The original weights and biases are replaced by those of the best chromosome, and the trained network is used for classification. The iterative process terminates when the result reaches a defined condition. The flowchart of the GA-BEL algorithm is given in Figure 5.
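The three genetic operators in Step 2 can be sketched as follows. This is a minimal sketch under stated assumptions rather than the rules of [25,26]: selection uses the inverse of the error-type fitness ($k / F_i$) so that smaller errors get larger probabilities, crossover blends a single gene position with weight $\alpha$, and mutation moves a gene toward one of its bounds with a step that shrinks as the generation $g$ approaches $G_{max}$. All function names and default values are hypothetical.

```python
import random


def selection_probabilities(fitness, k=1.0):
    """Roulette-wheel probabilities; a lower fitness (error) gets a larger share."""
    inv = [k / f for f in fitness]          # assumes every fitness value is > 0
    total = sum(inv)
    return [x / total for x in inv]


def roulette_select(population, fitness, rng, k=1.0):
    """Draw a new population of the same size, with replacement."""
    probs = selection_probabilities(fitness, k)
    return rng.choices(population, weights=probs, k=len(population))


def arithmetic_crossover(ci, cj, pos, alpha):
    """Blend gene `pos` of two parents; returns two new chromosomes."""
    a, b = ci[pos], cj[pos]
    child_i, child_j = list(ci), list(cj)
    child_i[pos] = a * (1 - alpha) + b * alpha
    child_j[pos] = b * (1 - alpha) + a * alpha
    return child_i, child_j


def mutate_gene(c, pos, g, g_max, rng, c_min=-1.0, c_max=1.0):
    """Move gene `pos` toward a bound; the step shrinks as g approaches g_max."""
    r1, r2 = rng.random(), rng.random()
    step = r2 * (1 - g / g_max) ** 2
    child = list(c)
    if r1 > 0.5:
        child[pos] += (c_max - child[pos]) * step   # toward the upper limit
    else:
        child[pos] -= (child[pos] - c_min) * step   # toward the lower limit
    return child
```

Passing an explicit `random.Random` instance as `rng` keeps runs reproducible, which matters when results are averaged over repeated trials.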
In this paper, the GA-BEL algorithm is implemented in MATLAB. According to the fitness function, the optimization process evaluates the output of the BEL network. Therefore, the main task is to simulate the BEL network.
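As an illustration of how a chromosome drives the network simulation, the sketch below decodes a 2m + 3 gene string and evaluates a fitness value. It is written in Python rather than MATLAB and rests on assumptions: the gene order (orbitofrontal weights, orbitofrontal bias, amygdala weights, amygdala bias), the linear network form with a max-based thalamic signal, and the use of the mean squared error as the fitness are all illustrative choices, and the names `decode`, `simulate` and `fitness` are hypothetical.

```python
def decode(chromosome, m):
    """Split a 2m + 3 gene string into (w, b_o, v, b_a); the gene order is assumed."""
    assert len(chromosome) == 2 * m + 3
    w = chromosome[:m]                    # m orbitofrontal cortex weights
    b_o = chromosome[m]                   # orbitofrontal cortex bias
    v = chromosome[m + 1:2 * m + 2]       # m + 1 amygdala weights (incl. thalamic)
    b_a = chromosome[2 * m + 2]           # amygdala bias
    return w, b_o, v, b_a


def simulate(chromosome, pattern):
    """Output E of a single-output BEL network for one input pattern."""
    w, b_o, v, b_a = decode(chromosome, len(pattern))
    a_th = max(pattern)                   # assumed thalamic signal
    e_a = sum(vi * si for vi, si in zip(v, list(pattern) + [a_th])) + b_a
    e_o = sum(wi * si for wi, si in zip(w, pattern)) + b_o
    return e_a - e_o


def fitness(chromosome, patterns, targets):
    """Mean squared error over all pattern-target pairs (lower is better)."""
    errors = [(simulate(chromosome, p) - t) ** 2
              for p, t in zip(patterns, targets)]
    return sum(errors) / len(errors)
```

Each fitness evaluation is a single forward pass per training pattern, so the cost of one generation grows linearly with the number of individuals and the number of patterns.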

Simulation Results
In this section, two different experiments are conducted to evaluate the performance of the proposed GA-BEL algorithm. The first experiment shows the performance of GA-BEL on the benchmark UCI datasets. The second experiment tests GA-BEL on the well-known Japanese Female Facial Expression (JAFFE) and Cohn-Kanade facial expression databases. Comparative experiments are carried out in both cases. Both experiments are performed in MATLAB R2010b running on an Intel Core i7 3.4 GHz CPU with 8.00 GB RAM and the Windows 7 operating system.

Datasets' Description
Eight benchmark datasets are taken from the University of California at Irvine (UCI) machine learning repository [27]. Both binary and multiclass datasets are included, with relatively high or low dimensions and large or small sizes; the details are summarized in Table 1.

Measure for Performance Evaluation
Classification performance can be evaluated by the confusion matrix described in Figure 6, from which measures such as accuracy, precision and recall are commonly derived to assess the performance of classification systems. Here, TP is the number of true positives, FN is the number of false negatives, TN is the number of true negatives and FP is the number of false positives. The accuracy, recall and precision are calculated by the following formulas:
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \times 100\%$$
$$\text{Recall} = \frac{TP}{TP + FN} \times 100\%$$
$$\text{Precision} = \frac{TP}{TP + FP} \times 100\%$$
The execution speed can be evaluated by the computing time in the training and testing processes.
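The three measures can be computed directly from the four confusion-matrix counts, as in the short sketch below. The function name `classification_metrics` is hypothetical, and any counts passed to it in an example are made up for illustration and are not results from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, recall, precision) in percent from confusion-matrix counts."""
    accuracy = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    recall = 100.0 * tp / (tp + fn)          # share of actual positives found
    precision = 100.0 * tp / (tp + fp)       # share of predicted positives that are correct
    return accuracy, recall, precision
```

Recall and precision are undefined when their denominators are zero (no actual positives or no predicted positives), so a caller working with very small test sets may want to guard against that case.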

Classification on Breast Cancer Dataset
The parameter settings in GA-BEL are specified separately for BEL and GA. For the BEL network, the input patterns determine the number of input neurons, and the classes determine the number of output neurons. Therefore, for the Breast Cancer dataset, the numbers of input nodes and output nodes are set to nine and two, respectively. The number of hidden nodes is set to six after many tests. The initial weights and biases are chosen randomly within the range [−1, 1]. For GA, the population size is based on the chromosome encoding defined in Equation (8), in which the number of genes in each chromosome is 2m + 3, where m is the number of features. The Breast Cancer dataset has nine features; thus, the population size was set to 21 (2 × 9 + 3 = 21). The other parameter settings were based on methods for artificial neural networks optimized by GA [25,26], and the best configuration was obtained after many tests. Here, the population groups and maximal generation are set to 200 and 100, respectively. The crossover probability and mutation probability are set to 0.8 and 0.03, respectively. Seventy percent of the samples are used as training data, and the remaining 30% are used for validation and testing. The simulation results are shown in Figure 7.
Figure 7a shows the best and mean fitness corresponding to each generation during the evolution.It can be observed that the fitness curves gradually improved from Generation 1-100 and exhibits no significant improvements after Iteration 90, eventually stopping at Generation 100.This phenomenon demonstrates that GA-BEL comes to convergence and obtains the best chromosome by the evolution.
From the confusion matrix in Figure 7b, we can see that the classification accuracy (Row 3-Column 3) is 96.1% and 97.6% in the training and testing samples, respectively.The precision (Row 1-Column 3) and recall (Row 3-Column 2) of the classification are also given in the two confusion matrices.
phenomenon demonstrates that GA-BEL comes to convergence and obtains the best chromosome by the evolution.
From the confusion matrix in Figure 7b, we can see that the classification accuracy (Row 3-Column 3) is 96.1% and 97.6% in the training and testing samples, respectively. The precision (Row 1-Column 3) and recall (Row 3-Column 2) of the classification are also given in the two confusion matrices.
As described above, the proposed GA-BEL aims to enhance the BEL classification accuracy by optimizing the weights and biases in the BEL network. For comparison, we conducted a comparative study between GA-BEL and BEL on the Breast Cancer dataset. Fifty trials were conducted for each algorithm, and the results in terms of precision, recall, accuracy and computing time were recorded. To ensure that the improvements obtained by the proposed GA-BEL are significant, a statistical validation based on the Kruskal-Wallis (K-W) test [28] was performed. The average precision, recall, accuracy and computing time, as well as the p-value of the K-W test, are listed in Table 2. As illustrated in Table 2, GA-BEL outperforms BEL. The average accuracy is improved by 2.6%, and the resultant p-value of the K-W test is 0.0183, which is less than the chosen significance level of 0.05. This confirms that the difference between the results is statistically significant. The other comparisons, in terms of average precision, recall and computing time, are also significant at the 0.05 level.
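The K-W validation described above can be illustrated with a minimal sketch. It assumes only two groups of per-trial accuracies (so the H statistic has one degree of freedom) and uses the chi-square approximation for the p-value; the function name `kruskal_wallis_two_groups` and the sample values are hypothetical and are not the recorded trial results.

```python
import math

def kruskal_wallis_two_groups(a, b):
    """Return (H, p) of the Kruskal-Wallis test for two samples.

    Ties receive average ranks and H is tie-corrected. The p-value uses the
    chi-square approximation with 1 degree of freedom. Assumes the pooled
    values are not all identical.
    """
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sums = [0.0, 0.0]
    for (_, g), r in zip(pooled, ranks):
        rank_sums[g] += r
    sizes = (len(a), len(b))
    h = 12.0 / (n * (n + 1)) * sum(rs ** 2 / m for rs, m in zip(rank_sums, sizes))
    h -= 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)        # tie correction
    h = max(h, 0.0)                         # guard against rounding below zero
    p = math.erfc(math.sqrt(h / 2))         # chi-square survival function, 1 d.o.f.
    return h, p

# Hypothetical accuracies (%) from a few trials of each algorithm.
bel_acc = [93.1, 94.0, 93.6, 94.4, 93.9]
ga_bel_acc = [96.2, 95.8, 96.9, 96.4, 97.0]
h_stat, p_value = kruskal_wallis_two_groups(bel_acc, ga_bel_acc)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}, significant at 0.05: {p_value < 0.05}")
```

With 50 trials per algorithm, as in the experiments, the same function applies unchanged; only the two input lists grow.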

Classification on the Heart Dataset
To evaluate GA-BEL on a relatively small-sized, high-dimensional dataset, we chose the Heart dataset for the test. The configuration changed with every dataset because the attributes of the datasets differ. The Heart dataset has 270 samples with 13 features. According to these features, the numbers of input, hidden and output nodes were set to 13, 6 and 2, respectively, after many tests. The weights and biases were initialized randomly within the range of [−1, 1]. For GA, the population size was set to 29 (2 × 13 + 3 = 29), based on the chromosome encoding defined in Equation (8) and the 13 features of the Heart dataset. Other parameter settings were based on methods for artificial neural networks optimized by GA [25,26]. Finally, we obtained the best configuration after many tests. Here, the population groups and maximal generation were set to 800 and 700, respectively. The crossover probability and mutation probability were set to 0.7 and 0.02, respectively. We used 70% of the samples as training data, and the remaining 30% were used for validation and testing. The simulation results are shown in Figure 8.
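The arithmetic behind this configuration can be checked with a short sketch. The helper names `encoding_size` and `split_counts` are hypothetical; the rule 2 × features + 3 is taken from the text, and how the remaining 30% is divided between validation and testing is not specified, so it is left undivided here.

```python
def encoding_size(n_features):
    """Size derived from the chromosome encoding: 2 * n_features + 3."""
    return 2 * n_features + 3

def split_counts(n_samples, train_fraction=0.7):
    """Return (training count, remaining count) for a given training fraction."""
    n_train = round(n_samples * train_fraction)
    return n_train, n_samples - n_train

# Heart dataset settings quoted in the text.
heart = {
    "n_samples": 270,
    "n_features": 13,
    "nodes": (13, 6, 2),            # input, hidden, output
    "init_range": (-1.0, 1.0),
    "max_generation": 700,
    "p_crossover": 0.7,
    "p_mutation": 0.02,
}

print(encoding_size(heart["n_features"]))   # 2 * 13 + 3
print(split_counts(heart["n_samples"]))     # training samples, held-out samples
```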

Figure 8a shows the best and mean fitness of each generation during the evolution. The fitness curves improve gradually from Generation 1 to 700 and exhibit no significant improvement after Generation 600, with the evolution stopping at Generation 700. This demonstrates that GA-BEL converges within 700 generations and obtains the best chromosome by evolution.
From the confusion matrix in Figure 8b, we can see that the classification accuracy (Row 3-Column 3) is 86.2% and 88.9% in the training and testing samples, respectively. The precision (Row 1-Column 3) and recall (Row 3-Column 2) of the classification are also given in the two confusion matrices.
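For readers less familiar with these quantities, the following sketch computes precision, recall and accuracy from the four cells of a two-class confusion matrix. The counts are hypothetical and are not taken from Figure 8b; the function name `binary_metrics` is likewise an illustration.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) for a two-class confusion matrix.

    tp: positives predicted as positive   fp: negatives predicted as positive
    fn: positives predicted as negative   tn: negatives predicted as negative
    """
    precision = tp / (tp + fp)            # share of predicted positives that are correct
    recall = tp / (tp + fn)               # share of true positives that are found
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

# Hypothetical counts for 100 samples.
precision, recall, accuracy = binary_metrics(tp=45, fp=5, fn=10, tn=40)
print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```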
For comparison, we conducted a comparative study between GA-BEL and BEL on the Heart dataset. Fifty trials were conducted for each algorithm, and the results in terms of precision, recall, accuracy and computing time were recorded. To ensure that the improvements obtained by the proposed GA-BEL are significant, a statistical validation based on the Kruskal-Wallis (K-W) test was performed. The average precision, recall, accuracy and computing time, as well as the p-value of the K-W test, are listed in Table 3. As illustrated in Table 3, GA-BEL outperforms BEL. The average accuracy is improved by 2.3%, and the resultant p-value of the K-W test is 0.0258, which is less than the chosen significance level of 0.05. This confirms that the difference between the results is statistically significant. The other comparisons, in terms of average precision, recall and computing time, are also significant at the 0.05 level.

Total Comparison and Discussion
To verify the effectiveness of the proposed model, GA-BEL was compared with three other reference algorithms (SVM [29], LS-SVM [30] and BEL [18]) on the eight UCI datasets. In this study, the SVM is performed with a Gaussian kernel, and 50 trials have been conducted for each problem. Table 4 shows the average precision, recall, accuracy and the computing time obtained on the eight classification problems.
As observed from Table 4, compared with SVM and LS-SVM, the BEL-based methods have the advantage of fast training, because they mimic the high speed of emotional processing in the emotional brain and their computational complexity is low. In contrast, the training of SVM involves a quadratic programming problem, so its computational complexity is usually high and its training speed is lower than that of the BEL-based methods.
Compared with the original BEL algorithm, GA-BEL shows a significant improvement in accuracy. GA-BEL employs GA to optimize the initial weights and biases of the amygdala and orbitofrontal cortex in the BEL neural network; it can evolve the population toward a better area of the search space and avoid falling into local minima. Moreover, GA-BEL achieves a faster training speed on large-sized datasets, which may encourage a grouping effect. Therefore, GA-BEL is more efficient and effective when dealing with large-scale data classification problems.
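To make the role of GA concrete, the following is a minimal sketch of a real-valued genetic algorithm that evolves a vector in [−1, 1], using the crossover and mutation probabilities quoted for the experiments. It is not the GA-BEL implementation: the tournament selection, single-point crossover, elitism and the toy fitness function (closeness to a fixed target vector) are all assumptions made for illustration. In GA-BEL, the fitness would instead be derived from the classification performance of the BEL network.

```python
import random

def run_ga(fitness, n_genes, pop_size=20, generations=30,
           p_crossover=0.7, p_mutation=0.02, seed=0):
    """Evolve real-valued chromosomes in [-1, 1]; return (best, fitness history)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        next_pop = [elite[:]]                      # elitism: keep the best unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of size 2 (an assumption of this sketch).
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            child = p1[:]
            if n_genes > 1 and rng.random() < p_crossover:
                cut = rng.randrange(1, n_genes)    # single-point crossover
                child = p1[:cut] + p2[cut:]
            # Each gene is re-drawn from [-1, 1] with probability p_mutation.
            child = [rng.uniform(-1, 1) if rng.random() < p_mutation else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

# Toy fitness: negative squared distance to a hypothetical target vector.
TARGET = [0.5] * 29

def toy_fitness(chromosome):
    return -sum((g - t) ** 2 for g, t in zip(chromosome, TARGET))

best, history = run_ga(toy_fitness, n_genes=29)
print(f"first best fitness: {history[0]:.3f}, final best fitness: {history[-1]:.3f}")
```

Because the elite individual is copied unchanged into each new generation, the best fitness in this sketch never decreases from one generation to the next.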
For further comparison, we list the classification accuracies of previous methods that were investigated on the same datasets. Because the experimental environments differ, we evaluate GA-BEL with respect to the average classification accuracy. The datasets were specially chosen to cover each case, including high or low dimensions and large or small sizes, as shown in Table 5. The results indicate the superiority of the proposed GA-BEL.
Classification is necessary in facial expression recognition; we evaluate the proposed GA-BEL algorithm on two well-known databases, i.e., JAFFE [34] and Cohn-Kanade [35]. The JAFFE database contains 213 grayscale images, with a resolution of 256 × 256 pixels, of 10 Japanese females. There are seven facial expressions (angry, surprise, happy, sadness, fear, disgust and neutral), so facial expression recognition in JAFFE is posed as a seven-class classification problem. The Cohn-Kanade database consists of 2105 digitized image sequences of males and females in the age range of 18-30 years. There are six basic facial expressions (happy, angry, disgust, sadness, fear, surprise). The image sequences of each expression run from neutral to peak intensity, with a resolution of 640 × 480 or 640 × 490 pixels.
In this study, we used all of the images in JAFFE and 993 images that represent each expression in Cohn-Kanade. We adopted the salient patch-based method [36] to extract features from each image. Then, GA-BEL was employed to classify the statistical features. The selected salient facial patches, with a lower number of histogram bins, were used to reduce the computation, which contributed significantly to the classification. Ten-fold cross-validation was used to evaluate the proposed approach. It achieved high recognition rates at fast speed and can meet the requirements of real-time facial expression recognition. Interestingly, the surprise expression was usually difficult to recognize in previous studies. However, in our experiments, there were no difficulties in classifying the surprise expression; on the contrary, we obtained good accuracies on the two databases. Figure 9 shows the best record of the surprise expression recognition in JAFFE and Cohn-Kanade via 10-fold cross-validation.
As shown in Figure 9a, the accuracy is stable after 30 generations, and the classification accuracy is 97.19%. Figure 9b shows that the accuracy is stable after 18 generations, and the classification accuracy is 98.32%. However, some difficulties remain in classifying the sadness and fear expressions. The results of correctly-classified and misclassified expressions are shown in Figure 10.
To evaluate the performance of GA-BEL, 10-fold cross-validation was used. The average recognition accuracies of each facial expression on the two databases are given in Tables 6 and 7 (average accuracy: 96.17).
Table 6 shows the classification confusion matrix on JAFFE, from which we can see that all of the expressions are recognized with high accuracy. The accuracy of happy is the highest, at 98.47%. However, some difficulties remain in classifying the disgust, fear and sadness expressions. In particular, disgust is misclassified as fear at a rate of 2.71%. On the Cohn-Kanade database, Table 7 shows that the best result also occurs on the happy expression, and the largest difficulty occurs in classifying the sadness expression. Specifically, sadness is misclassified as fear at a rate of 2.35%.
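The 10-fold cross-validation protocol used above can be sketched as follows. This is a generic index splitter, assuming a random shuffle before partitioning; whether the original experiments shuffled or stratified the images is not stated, and the function name `k_fold_indices` is hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Return a list of k (train_indices, test_indices) pairs.

    Indices are shuffled once, then cut into k folds whose sizes differ
    by at most one; each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)   # the first `extra` folds get one more
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

# Example with the 213 images of the JAFFE database.
folds = k_fold_indices(213, k=10)
print([len(test) for _, test in folds])
```

The reported accuracy is then the average of the per-fold test accuracies obtained by training on each `train` set and evaluating on the matching `test` set.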

Comparison and Discussion
In order to verify the performance of the proposed GA-BEL in terms of recognition accuracy and execution speed, we compare GA-BEL with SVM [29], LS-SVM [30] and the original BEL [18] on JAFFE and Cohn-Kanade. In this study, the Gaussian kernel was used for SVM. The ten-fold cross-validation strategy was employed to perform the comparisons. The detailed results, including the average recognition accuracies and the computing time per image, are shown in Table 8. As shown in Table 8, GA-BEL achieved the best performance on JAFFE and Cohn-Kanade. In terms of effectiveness, its average recognition accuracy is 2.22%, 1.01% and 1.54% higher than that of SVM, LS-SVM and the original BEL on the JAFFE dataset, and 1.36%, 1.05% and 2.49% higher on the Cohn-Kanade dataset, respectively. In terms of efficiency, the computational cost of the BEL-based methods is significantly lower. In particular, GA-BEL takes only 0.2736 s and 0.2958 s to process one image in JAFFE and Cohn-Kanade, respectively, which is much faster than SVM and LS-SVM.
For further comparison, we list the results of the previous methods that investigated the facial expression recognition, as shown in Table 9. The results indicate that GA-BEL can obtain better classification accuracy than other traditional methods.

Conclusions
We have proposed an improved BEL model based on GA to cope with the problems of classification and pattern recognition. In contrast to the original BEL model, we apply the fitness function in GA, instead of reinforcement learning, to update the weights and biases of the amygdala and orbitofrontal cortex of the BEL neural network in emotional learning, which provides a more accurate and robust method by learning the pattern-target relationship of an application. GA-BEL is a biologically-inspired method that has the advantages of fast learning and low computational complexity. Two case studies were carried out on benchmark problems: classification on eight UCI datasets and facial expression recognition on the well-known JAFFE and Cohn-Kanade databases. Detailed experimental comparisons indicate that the proposed GA-BEL achieves better accuracy than the original BEL, and that it is more effective and more efficient than most traditional methods.
This study introduces emotional intelligence into artificial intelligence and presents a novel approach to updating the learning rules of the BEL model, which offers an important perspective for research related to machine learning. In future work, we will combine GA with particle swarm optimization to further improve the performance of the BEL model and apply it to real-time, real-world applications, such as pattern recognition in video images and big data analysis based on network data.

Figure 1. Limbic system and emotion circuits in the brain. (a) Limbic system; (b) emotion circuits.

Figure 5. The flowchart of the GA-BEL algorithm.

Figure 7. Simulation results of the GA-BEL on the Breast Cancer dataset. (a) Fitness curve during evolution; (b) classification confusion matrix for training and testing samples.

Figure 8. Simulation results of the GA-BEL on the Heart dataset. (a) Fitness curve during evolution; (b) classification confusion matrix for training and testing samples.

Algorithms 2017, 10, 70

Figure 9. The best record for the surprise expression recognition during the training stage. (a) Surprise expression recognition in JAFFE; (b) surprise expression recognition in Cohn-Kanade.

Figure 10. Facial expression recognition results. (a) Correctly-classified expressions on the JAFFE database; (b) correctly-classified expressions on the Cohn-Kanade database; (c) misclassified expressions; the true expressions are angry, angry, disgust and sadness in turn.

Table 2. Experimental results of BEL vs. GA-BEL on the Breast Cancer dataset.

Table 3. Experimental results of BEL vs. GA-BEL on the Heart dataset.

Table 4. Performance comparisons of different algorithms.

Table 5. Classification accuracies obtained by the proposed method and the previous methods.

Table 6. Confusion matrix for the JAFFE database.

Table 7. Confusion matrix for the Cohn-Kanade database.

Table 8. The performance comparisons of different algorithms.

Table 9. Classification accuracies obtained with the proposed method and other methods.