1. Introduction
Machine learning enables the artificial generation of knowledge by training on data. An artificially intelligent system learns from known (training) data and applies the gained knowledge to unknown (test) data. In the area of affective computing, machine learning is used to recognize emotion-related patterns in datasets based on specific features extracted from different modalities, such as facial, speech, text, or biosignal information. Feature extraction is a key task within the recognition process, as it delivers significant information related to a specific affective state. However, the large number of features that can be extracted from specific data might also lead to inefficient classifications in terms of recognition rates and computation time [1]. Therefore, feature selection methods are used in the next step to select the most relevant and non-redundant features from the large number of extracted features. This step is crucial in the recognition workflow, as it speeds up the algorithms and increases the recognition rate.
Many feature selection methods are available, based on different strategies. In our previously developed processing workflow for affective computing [2], three feature selection methods were presented: Forward Selection, Backward Elimination, and Brute Force. In terms of computation time, Brute Force is usually the last option to be used, as it tries all possible combinations of features in order to select the ones leading to the highest performance. However, Backward Elimination, which starts with the whole set of extracted features and successively removes features, and Forward Selection, which begins with an empty selection and successively adds features, are also associated with high computation times, depending on the classification method used.
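To give a sense of scale for these strategies, the following minimal sketch counts the feature subsets that would have to be evaluated. It rests on assumptions that are not part of the workflow in [2]: the function names are hypothetical, the greedy count is a worst case without any early stopping, and the value n = 162 is the feature count reported later in this paper.

```python
def brute_force_subsets(n: int) -> int:
    """Number of non-empty feature subsets an exhaustive search must evaluate."""
    return 2 ** n - 1


def forward_selection_max_evals(n: int) -> int:
    """Worst-case evaluations for greedy forward selection without early stopping:
    n candidates in round 1, n - 1 in round 2, ..., 1 in round n."""
    return n * (n + 1) // 2


n_features = 162  # feature count reported later in this paper
print(brute_force_subsets(n_features))          # about 5.8e48 subsets
print(forward_selection_max_evals(n_features))  # 13203 evaluations
```

The same worst-case count applies to a greedy backward elimination. Each evaluation additionally requires training and validating a classifier, which is why even the greedy strategies can become expensive.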
Therefore, in this paper, we present another feature selection approach based on evolutionary algorithms to further optimize the computational process within our recognition workflow. Evolutionary algorithms is a generic term for a number of different procedures that use Darwinian-like evolutionary processes to solve difficult computational problems. They use techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover [3]. In the 1960s, scientists started to study evolutionary systems to solve optimization problems [4]. Genetic algorithms belong to the larger class of evolutionary algorithms and serve as a search heuristic that mimics the process of natural evolution. They were introduced by Holland to generate solutions to optimization problems [5]. Since then, evolutionary approaches have been adopted in many studies, for instance, in the field of multimodal pain recognition [6] or for improved diagnostic ability of beat-to-beat variability analysis [7]. Studies comparing different feature selection methods were also conducted to investigate which strategy delivers the best classifications. It was shown that for four out of five datasets used, the best results were obtained using the optimized selection with genetic algorithms [8].
In the following, we present the implementation of a feature selection method based on evolutionary algorithms and describe its integration within our previously developed workflow for affective computing and stress recognition [2]. Then, we evaluate this approach using biosignal data from our uulmMAC dataset [9] and finally discuss some options for future optimizations.
2. Materials and Methods
Our feature selection method with evolutionary algorithms is based on a genetic algorithm that uses techniques inspired by natural evolution, such as mutation, crossover, and selection. In the context of feature selection, mutation denotes switching features on and off, while crossover denotes interchanging used features. Selection is achieved using a specified selection scheme parameter [10].
Given a clearly defined problem to be solved and a bit string representation for candidate solutions, a simple genetic algorithm works as follows [11]:
Start with a randomly generated population of n parent individuals, where each individual represents a solution to a problem.
Calculate the fitness (accuracy of the prediction, stating how well the individual solves the problem) of each parent individual in the population.
Repeat a set of steps including mutation, crossover, evaluation, and selection until n offspring (mutated and/or recombined versions of the parent individuals, i.e., all generated child individuals) have been created.
Each iteration of this process is called a generation. A genetic algorithm is typically iterated for 50 to 500 or more generations.
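The generic loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the RapidMiner implementation: the function name `simple_ga`, the single-point crossover, the rates `p_c = 0.7` and `p_m = 0.01`, and the replacement of the whole population by its offspring are choices made here for brevity, and the fitness function is a toy that simply counts the bits set to 1.

```python
import random


def simple_ga(fitness, n_bits, n=20, p_c=0.7, p_m=0.01, generations=50, seed=0):
    """Toy genetic algorithm on bit strings (all parameter values are illustrative)."""
    rng = random.Random(seed)
    # Start with a randomly generated population of n individuals.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n)]
    for _ in range(generations):
        # Calculate the fitness of each parent individual.
        scores = [fitness(ind) for ind in population]
        offspring = []
        while len(offspring) < n:
            # Fitness-proportionate choice of two parents
            # (the small offset avoids an all-zero weight vector).
            a, b = rng.choices(population, weights=[s + 1e-9 for s in scores], k=2)
            a, b = a[:], b[:]
            if rng.random() < p_c:  # single-point crossover (assumption)
                cut = rng.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for i in range(n_bits):  # bit-flip mutation
                    if rng.random() < p_m:
                        child[i] ^= 1
                offspring.append(child)
        # Assumption: the offspring replace the parent population entirely.
        population = offspring[:n]
    return max(population, key=fitness)


best = simple_ga(sum, n_bits=12)  # toy fitness: number of bits set to 1
print(best, sum(best))
```

In feature selection, each bit would stand for one feature being switched on or off, and the toy fitness would be replaced by the accuracy of a classifier trained on the selected features.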
The proposed method is realized using the “Optimize Selection (Evolutionary)” operator from RapidMiner, which uses a genetic algorithm to select the most relevant features of a given dataset. It consists of the steps Initialize, Mutate, Crossover, Evaluate, and Select and is implemented as follows:
Initialize: First, an initial population consisting of p individuals is generated, in which every individual is a vector of a randomized set of attributes (features). In our example, the population size parameter p is set to 20 and each individual has a minimum of 3 and a maximum of 10 attributes. Each attribute is switched on with a probability defined by the p-initialize parameter, set in our example to p_i = 0.5.
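A minimal sketch of such an initialization is given below. The function names and the repair step are assumptions: how the operator itself enforces the minimum and maximum number of attributes is not described here, so this sketch simply switches randomly chosen attributes off or on until the bounds hold.

```python
import random


def init_individual(n_attrs, p_init=0.5, min_on=3, max_on=10, rng=random):
    """Random attribute mask: True means the attribute (feature) is used."""
    mask = [rng.random() < p_init for _ in range(n_attrs)]
    on = [i for i, used in enumerate(mask) if used]
    off = [i for i, used in enumerate(mask) if not used]
    # Repair step (assumption, not taken from the operator documentation):
    # randomly switch attributes off or on until the size bounds are met.
    if len(on) > max_on:
        for i in rng.sample(on, len(on) - max_on):
            mask[i] = False
    elif len(on) < min_on:
        for i in rng.sample(off, min_on - len(on)):
            mask[i] = True
    return mask


def init_population(n_attrs, p=20, **kwargs):
    """Population of p independently initialized individuals."""
    return [init_individual(n_attrs, **kwargs) for _ in range(p)]


population = init_population(162, rng=random.Random(1))
print(len(population), [sum(ind) for ind in population[:5]])
```

With 162 attributes and p_i = 0.5, an unconstrained mask would switch on about 81 attributes, so a bounding step of some kind is needed to respect the maximum of 10.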
Mutate: For all of the individuals in the population, mutation is performed by setting used attributes to unused with probability p_m and vice versa. The probability p_m is defined by the p-mutation parameter and is typically a very small rate [11]. In our case, we set the p-mutation parameter to −1.0, which is equivalent to a probability of p_m = 1/n, where n is the total number of attributes. Mutation introduces new information into a child individual while only slightly changing the parent individual.
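The mutation step can be illustrated with the following minimal sketch. The function name `mutate` is hypothetical, and treating every negative parameter value as 1/n is an assumption made here; the text above only states this for the value −1.0.

```python
import random


def mutate(mask, p_mutation=-1.0, rng=random):
    """Flip each attribute flag independently with probability p_m.

    Assumption: any negative p_mutation is interpreted as p_m = 1/n,
    where n is the number of attributes in the (non-empty) mask.
    """
    n = len(mask)
    p_m = 1.0 / n if p_mutation < 0 else p_mutation
    return [(not used) if rng.random() < p_m else used for used in mask]


parent = [True] * 5 + [False] * 5
child = mutate(parent, rng=random.Random(0))
print(sum(a != b for a, b in zip(parent, child)), "attribute(s) flipped")
```

With p_m = 1/n, one attribute is flipped per individual on average, regardless of how many attributes the dataset contains.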
Crossover: Crossover, which interchanges the used features, is performed on two individuals chosen from the population with probability p_c. The probability p_c is defined by the p-crossover parameter and is set to p_c = 0.5. The type of crossover is defined by the crossover type parameter and is set to uniform. In uniform crossover, we select two individuals for crossover and assign heads to one parent and tails to the other. Then, we flip a coin for each position to decide which parent contributes to the first child, and make an inverse copy for the second child. The uniform operator has the property that the elements of an individual are position independent [12].
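Uniform crossover as described above can be sketched as follows. The function name `uniform_crossover` and the fair coin (probability 0.5 per position) are assumptions made for this illustration.

```python
import random


def uniform_crossover(parent_a, parent_b, p_crossover=0.5, rng=random):
    """Return two children; with probability 1 - p_crossover the parents are copied."""
    if rng.random() >= p_crossover:
        return parent_a[:], parent_b[:]
    child_1, child_2 = [], []
    for gene_a, gene_b in zip(parent_a, parent_b):
        if rng.random() < 0.5:  # heads: first child inherits from parent A
            child_1.append(gene_a)
            child_2.append(gene_b)
        else:                   # tails: first child inherits from parent B
            child_1.append(gene_b)
            child_2.append(gene_a)
    return child_1, child_2


a = [True] * 8
b = [False] * 8
c1, c2 = uniform_crossover(a, b, p_crossover=1.0, rng=random.Random(3))
print(c1)
print(c2)
```

Because each position is decided independently, the second child always receives, at every position, the gene that the first child did not take.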
Evaluate: In the next step, the fitness of all individuals generated with mutation and crossover is evaluated. To this end, the accuracy of the prediction is calculated using a given classification algorithm. In this paper, we use the Random Forests classifier to evaluate the fitness of an individual by computing the accuracy with which the emotional state is correctly predicted. The higher the fitness of an individual, the more likely it is to be selected for the next generation.
Select: Finally, a selection scheme is adopted to map all of the individuals according to their fitness and draw p individuals at random according to their probability for the next generation, where p is again the population size parameter. In this paper, we use the Roulette Wheel selection scheme, in which the number of times an individual is expected to be selected for the next generation is equal to its fitness divided by the average fitness in the population [11].
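The following minimal sketch illustrates Roulette Wheel selection and the expected number of copies per individual. The function names are hypothetical, the fitness values are made-up numbers rather than results, and drawing with replacement via `random.choices` is an assumption about how the wheel is spun.

```python
import random


def expected_copies(fitnesses):
    """Expected selections per individual: fitness divided by average fitness."""
    mean = sum(fitnesses) / len(fitnesses)
    return [f / mean for f in fitnesses]


def roulette_wheel_select(population, fitnesses, p=None, rng=random):
    """Draw p individuals with replacement, with probability proportional to fitness."""
    p = len(population) if p is None else p
    return rng.choices(population, weights=fitnesses, k=p)


fitnesses = [0.75, 0.5, 0.25]      # illustrative values only
print(expected_copies(fitnesses))  # [1.5, 1.0, 0.5]
print(roulette_wheel_select(["a", "b", "c"], fitnesses, rng=random.Random(0)))
```

The expected copies always sum to the population size, so an individual with above-average fitness is expected to appear more than once in the next generation.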
This process is repeated until the stopping criterion is reached. The process stops after a maximum of 50 generations or after two generations without improvement. The described parameters are illustrated in Figure 1. They can be adjusted independently of the classification algorithm used. A detailed description of the different parameters as well as other available options can be found in the documentation section of RapidMiner [10].
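The stopping criterion can be sketched as below. The function name `run_until_stopped` and the exact reading of "without improvement" (a generation whose best fitness does not exceed the best value seen so far) are assumptions, and the fitness values in the example are made-up numbers.

```python
def run_until_stopped(best_fitness_per_generation, max_generations=50, patience=2):
    """Consume per-generation best fitness values until a stopping rule fires.

    Stops after max_generations, or after `patience` consecutive generations
    that do not improve on the best fitness seen so far.
    """
    best = float("-inf")
    stale = 0
    generations_run = 0
    for fitness in best_fitness_per_generation:
        if generations_run >= max_generations:
            break
        generations_run += 1
        if fitness > best:
            best, stale = fitness, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return generations_run, best


# Illustrative values: improvement stalls in generations 3 and 4.
print(run_until_stopped([0.70, 0.75, 0.75, 0.74, 0.90]))  # (4, 0.75)
```

The example also shows the trade-off of a short patience: the run stops before the later, better value is ever reached.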
3. Results and Discussion
The feature selection method based on evolutionary algorithms was first designed in RapidMiner, as described in the previous section. Figure 2 illustrates the implementation of this method using the “Optimize Selection (Evolutionary)” operator. It is integrated within the feature selection subprocess of our previously developed processing workflow for affective computing and stress recognition [2].
Then, the proposed method was evaluated using biosignal data from our uulmMAC database for affective computing and machine learning applications [9]. For the evaluation, we applied our processing workflow using both the evolutionary algorithms and the Forward Selection method. The latter was chosen for comparison because it is the fastest of the three previously presented approaches (Forward Selection, Backward Elimination, and Brute Force). As the classifier, we applied the Random Forests algorithm to compute the accuracy of the prediction. For validation, we used 10-fold cross validation.
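For readers unfamiliar with the validation scheme, the following minimal sketch shows how 10-fold cross validation partitions the samples. It is an illustration only: the function name `k_fold_indices` is hypothetical, the split is contiguous and unshuffled, and no stratification is applied, which may differ from the sampling used in our workflow.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for train, test in k_fold_indices(25):
    print(len(train), len(test))
```

Each sample is used for testing exactly once, and the reported accuracy is the average over the ten test folds. Since every fitness evaluation repeats this procedure, the number of evaluated feature subsets directly drives the total computation time.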
A total of 162 different features were extracted from the biosignal data, including category-based features for the respiration, skin conductance level, temperature and electromyography channels, and signal-specific features for the electrocardiogram channel. Considering the six different sequences available in the uulmMAC dataset, we evaluated a two-class problem by computing the recognition rates for the states Overload and Underload, as well as a six-class problem, including the six classes Interest, Overload, Normal, Easy, Underload, and Frustration.
Our results show that the proposed feature selection approach based on evolutionary algorithms has a much shorter runtime than the Forward Selection method at similar recognition rates. It does not stop at a local optimum, making it a promising feature selection alternative in the field of affective computing. Preliminary classifications using the Random Forests classifier and 10-fold cross validation resulted in a 3-fold reduction in computation time for the two-class problem, with similar classification rates of about 86% (chance level is 100/2 = 50%) for both selection methods, while an almost 8-fold (7.8) reduction in computation time was obtained for the six-class problem at the same recognition rate of about 29% (chance level is 100/6 = 16.67%) for both selection methods.
Table 1 summarizes the results.
Currently, we are investigating other options to further optimize the results, for instance, relaxing the stopping criterion to allow more generations and evaluating the effect on the classification rates relative to the increase in computation time. We will conduct further computations using multi-class problems to classify the other affective states from the uulmMAC dataset and extend our preliminary classifications [13]. The results will be evaluated with different classifiers and validation methods, as previously done in our performance study [14]. Furthermore, in the present work, we use the Roulette Wheel selection scheme to select the fittest individuals for the next generation. In Roulette Wheel selection, the survival probability of each individual is proportional to its relative fitness. Further selection schemes could be investigated, such as Tournament Selection, in which a number of randomly chosen individuals first take part in a tournament, and the individuals with the highest fitness in this tournament are subsequently selected into the next generation until a predefined number of individuals is reached in the new generation [15].
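As an illustration of the Tournament Selection scheme mentioned above, a minimal sketch is given below. It has not been evaluated in this work; the function name, the tournament size of 3, and the choice of exactly one winner per tournament are assumptions, and the fitness values are made-up numbers.

```python
import random


def tournament_select(population, fitnesses, n_select, tournament_size=3, rng=random):
    """Fill the next generation with the winners of repeated random tournaments."""
    selected = []
    while len(selected) < n_select:
        # Draw distinct contenders at random (requires tournament_size <= len(population)).
        contenders = rng.sample(range(len(population)), tournament_size)
        # Assumption: exactly one winner, the fittest contender, per tournament.
        winner = max(contenders, key=lambda i: fitnesses[i])
        selected.append(population[winner])
    return selected


population = list("abcde")
fitnesses = [0.1, 0.2, 0.3, 0.4, 0.5]  # illustrative values only
print(tournament_select(population, fitnesses, n_select=5, rng=random.Random(0)))
```

In contrast to Roulette Wheel selection, only the fitness ranking within each tournament matters, so the selection pressure can be tuned through the tournament size rather than through the scale of the fitness values.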