Optimizing Android Facial Expressions Using Genetic Algorithms

Because the internal structure, degrees of freedom, skin control positions and control ranges differ between android faces, it is very difficult to generate facial expressions by applying existing facial expression generation methods. In addition, facial expressions differ among robots because they are designed subjectively. To address these problems, we developed a system that automatically generates robot facial expressions by combining an android, a recognizer capable of classifying facial expressions and a genetic algorithm. We have developed two types of android face robots (an older man and a young woman) that can simulate human skin movements. We selected 16 control positions to generate the facial expressions of these robots; the expressions were generated by combining the displacements of 16 motors. A chromosome comprising 16 genes (motor displacements) was generated by applying a real-coded genetic algorithm and subsequently used to generate robot facial expressions. To determine the fitness of the generated facial expressions, expression intensity was evaluated using a facial expression recognizer. The proposed system was used to generate six facial expressions (angry, disgust, fear, happy, sad, surprised); the results confirmed that they were more appropriate than manually generated facial expressions.


Introduction
A social robot can deliver information required for everyday life to humans and can interact with humans via two-way communication. Natural interaction with humans is enabled only by combining research on topics such as robot control, object recognition and dialog engines [1][2][3]. Recently, a hotel in Japan introduced an android to confirm the reservation information of clients and guide them to their rooms [4,5]. This system combines interactive technology that responds to clients; vision and recognition technology that can recognize an approaching person and memorize their appearance; voice recognition and dialog technology that can understand what people say and hold a conversation; and robot control technology that can inform people of their room location via hand gestures.
The appearance of social robots takes various forms, depending on their roles and purposes. In homes, domestic robots that perform the role of a personal secretary have been introduced [6,7]. These robots play music, read texts or schedules aloud and show them on screens upon receiving voice commands from their owners. Unlike home service robots, robots that guide people through spaces such as exhibitions are relatively larger and have additional mobility and manipulator technology. These robots stretch their arms to lead people through areas or indicate the location of an object and can directly guide people to a requested location by using mobile technology [8][9][10]. In particular, when androids are used in roles such as hotel receptionist or announcer, the same facial expression may be interpreted differently by different people. Therefore, an index is required to objectively evaluate the facial expressions generated by an android.
To address these problems, this study proposes a system that can automatically generate the facial expressions of an android based on a genetic algorithm. Because the face of the android resembles a human being, it is possible to utilize a human facial expression recognizer directly. This system comprises a facial expression recognizer, an android capable of generating facial expressions and a genetic algorithm. The system has several prerequisites. First, the facial expression recognizer should be able to identify the android's facial expressions as effectively as it can recognize those of a human. Second, the android must be able to generate facial expressions by using a combination of motor displacements inside the mechanism. Finally, to utilize the genetic algorithm, the motor displacements must be expressible as genes so that the fitness of the facial expressions can be evaluated using the recognizer. By doing so, the android's facial expressions can evolve through processes such as selection, crossover and mutation, thereby generating natural facial expressions.
The rest of this paper is structured as follows. In Section 2, we describe the EveR series of androids, to which the proposed system is applied. Section 3 describes how to encode facial expressions into genes and how to measure the fitness of the generated facial expressions, as well as our complete optimization system that comprises the android, the facial expression recognizer and the genetic algorithm for facial expression optimization. In Section 4, we describe the system for implementing the proposed algorithm and analyze the evolution and fitness of facial expressions using this system. Finally, we present conclusions in Section 5.

EveR Androids
Androids are humanoid robots whose appearance, skin and eyes resemble those of human beings. EveR, an android capable of expressing human movements, was developed by KITECH [12]. The name combines "Eve", the name of the first woman, with the first letter of "robot". EveR is divided into three sections for control: the head, upper body and lower body. Twenty-three motors are arranged in layers in the head and connected to major control positions through wires to generate skin movements. Human behavioral expressions, such as eye blinking, eye movements, lip shape formation and other facial expressions, are imitated through combinations of movements of the 23 motors. The waist and neck each have three degrees of freedom to allow EveR to rotate its waist and neck like a human. Finally, both arms can express a range of gestures by using 12 motors [13,14].
According to Mori, who proposed the uncanny valley theory, human beings have a more favorable impression of robots the more similar they are to humans but feel extreme rejection when the behavioral expressions of such robots are awkward [33]. Therefore, robots resembling humans must act as much like humans as possible to reduce this feeling of rejection and enhance favorable impressions, necessitating natural behavioral expressions.
EveR has been utilized in different exhibitions and performances ( Figure 1) [18]. As the majority of exhibitions are based on predictable scenarios, predefined dialogs and gestures must be prepared in advance. For this, various gesture expressions using the upper body of the android (arms, neck and waist), as well as behavioral expressions using the head (lip syncing and facial expressions), must also be generated naturally in advance. For gestures, the motion of an actor is captured using marker-based motion capture equipment. Robot joint angles are extracted from these data and utilized for the android gestures. Unlike upper body movements, the facial expressions and lip shapes are generated by controlling individual motors.

Encoding of Facial Expressions
Genetic algorithms are optimization algorithms created by Holland in 1975 that imitate the evolutionary process of natural ecosystems [34], where genes evolve adaptably for survival through the processes of selection, crossover and mutation. Genetic algorithms that model this process from an engineering perspective can be applied to various problems that are difficult to define mathematically, such as project scheduling problems, cost problems for constructing network environments and optimization problems of multipurpose buildings, to obtain optimal solutions.
In this study, a facial expression optimization system based on a genetic algorithm was developed for facial expression generation, a problem that is difficult to define mathematically. DEAP, a Python-based open-source library, was employed for the genetic algorithm [35]. The DEAP framework can rapidly and simply test various evolutionary computations, such as genetic algorithms (GA), genetic programming (GP) and evolution strategies (ES). In general, the genes of a genetic algorithm consist of a binary code of 0 or 1. Although a motor value can be expressed as a chromosome composed of 0s and 1s, the chromosome becomes cumbersomely long and the motor value cannot be interpreted intuitively. The chromosome was therefore constructed with real numbers between −1 and 1 by using a real-coded GA, keeping the chromosome short and making the motor values intuitively identifiable [36]. The gene positions in a chromosome are defined in Figure 2. Figure 3 shows the face control positions for EveR. There are 23 motors for the face; however, movements of the pupils and tongue, which do not significantly affect the facial expression recognition rate, were excluded, leaving 16 control positions (Figure 3). Double arrows indicate movements that can be controlled in both directions (e.g., number 1) and single arrows indicate motion that can be controlled in a single direction (e.g., number 3). The robot's eyes are bilaterally symmetrical and have six degrees of freedom. Emotions and eye blinking can be expressed by controlling the eyebrows and eyelids. The area around the mouth has 10 degrees of freedom to express emotions as well as perform lip syncing for conversation. Table 1 summarizes the identification numbers of the face motors, the gene positions in the chromosome and the face skin control positions used in this study. For example, motor number 1 corresponds to the right eyelid of EveR and is placed at the g1 position of the chromosome.
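The encoding described above can be sketched as follows: one real-valued gene per motor, with each gene in [−1, 1] mapped linearly onto that motor's displacement range. The motor ranges used here are illustrative placeholders, not the robot's actual calibration values.

```python
# Sketch of the real-coded chromosome: 16 genes, one per face motor.
NUM_GENES = 16

def decode(chromosome, motor_ranges):
    """Map each gene in [-1, 1] linearly to a motor displacement command.

    motor_ranges is a list of (lo, hi) pairs, one per motor; the values
    here are hypothetical, not EveR's actual calibration.
    """
    assert len(chromosome) == NUM_GENES
    commands = []
    for gene, (lo, hi) in zip(chromosome, motor_ranges):
        # Linear map: gene -1 -> lo, gene +1 -> hi
        commands.append(lo + (gene + 1.0) * (hi - lo) / 2.0)
    return commands
```

Because the genes are plain real numbers rather than bit strings, a chromosome can be read off directly as a set of motor displacements, which is the intuitive-interpretation advantage noted above.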
Figure 4 shows the facial expression made by EveR from randomly generated genes. For g1, g2, g9 and g16, which can be controlled in both directions, values between −1 and 1 were randomly generated, whereas values between 0 and 1 were randomly generated for the single-direction genes. For example, the right eyelid value, g1, was −0.64 and the left eyelid value, g2, was 0.35. The difference between the genes for the two eyelids demonstrates the different actions being controlled: g1 expresses the action of closing the eye (less than zero) and g2 expresses the action of opening the eye widely (greater than zero). For g16, which controls the jaw, a value less than zero resulted in a closed mouth, whereas a value greater than zero resulted in an open mouth.
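Random initialization of a chromosome under these per-gene bounds can be sketched as below; the set of bidirectional gene indices follows the g1, g2, g9 and g16 positions named above (0-indexed here).

```python
import random

# g1, g2, g9 and g16 (0-indexed positions 0, 1, 8, 15) drive motors that
# move in both directions and are sampled from [-1, 1]; the remaining
# single-direction genes are sampled from [0, 1].
BIDIRECTIONAL = {0, 1, 8, 15}

def random_chromosome(num_genes=16):
    """Generate one random facial-expression chromosome."""
    genes = []
    for i in range(num_genes):
        lo = -1.0 if i in BIDIRECTIONAL else 0.0
        genes.append(random.uniform(lo, 1.0))
    return genes
```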

Evaluation
The fitness of facial expressions generated from genes was evaluated using FaceReader, a facial expression recognition software package developed by Noldus in the Netherlands. In addition to the neutral emotion, FaceReader can recognize six basic emotions in real time: happy, sad, angry, surprised, fear and disgust. According to the results of a verification study, the emotion classification tool in FaceReader 6 has an average classification accuracy of 88%; Noldus claimed that the latest version, FaceReader 7.1, achieves an emotion classification accuracy of 93%. Figure 5 shows the results of human facial expression recognition by FaceReader as a bar graph. The Neutral bar was highest when the person was not performing a facial expression (Figure 5a) and the Happy bar was highest when the person was laughing (Figure 5b). The facial features of the EveR android, which are similar to those of a person, can also be recognized by FaceReader as emotional changes expressed in real time. Figure 6 shows the results of facial expression recognition for EveR in the neutral state; FaceReader detected the android's face accurately (shown by the rectangular box) and confirmed its Neutral facial expression (gray bar).
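Conceptually, the fitness of a chromosome is just the recognizer's intensity for the target emotion. A minimal sketch follows; the dictionary format of the recognizer output is an assumption for illustration, not FaceReader's actual interface.

```python
# Emotions reported by the recognizer (neutral plus the six basic emotions).
EMOTIONS = ("neutral", "happy", "sad", "angry", "surprised", "fear", "disgust")

def expression_fitness(recognizer_output, target):
    """Return the target emotion's intensity as the fitness value.

    recognizer_output is assumed to be a mapping from emotion name to
    intensity in [0, 1]; the real FaceReader output format differs.
    The result is a 1-tuple, the convention used by GA libraries such
    as DEAP for (possibly multi-objective) fitness values.
    """
    return (recognizer_output.get(target, 0.0),)
```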

Experimental Setup
In this study, a system for generating the facial expressions of an android based on a GA was proposed. To verify this system, we attempted to generate six basic facial expressions (angry, happy, sad, surprised, disgust, fear) in EveR. For this purpose, the system was constructed as shown in Figure 8. Each software program could exchange data through TCP/IP communication.
The main control software of the system was developed using the DEAP genetic algorithm library. The GA parameters used for the experiment were as follows. Each chromosome was composed of 16 genes (16 motors) and each generation was composed of 20 chromosomes (20 facial expressions). The tournament method was used for selection, with a tournament size of 3. The two-point exchange method was used for crossover and the Gaussian method was used for mutation; the coefficients were set to 0.1 for µ and 0.2 for σ, to ensure that mutation did not cause serious displacements in a gene. The probabilities of crossover and mutation were set to 50% and 30%, respectively. The facial expression intensity received from the facial expression recognizer was used as the fitness of each generation and the accumulated fitness score was used for parent selection.
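The three operators above (which DEAP provides as tools.selTournament, tools.cxTwoPoint and tools.mutGaussian) can be sketched in plain Python with the stated parameters. The per-gene mutation probability indpb is not given in the text, so the value here is an assumption.

```python
import random

# Stated parameters: tournament size 3, Gaussian mutation mu 0.1, sigma 0.2,
# crossover probability 50%, mutation probability 30%.
TOURN_SIZE, MU, SIGMA = 3, 0.1, 0.2
CXPB, MUTPB = 0.5, 0.3

def tournament(pop, fits):
    """Return a copy of the fittest of TOURN_SIZE randomly chosen individuals."""
    contenders = random.sample(range(len(pop)), TOURN_SIZE)
    return pop[max(contenders, key=lambda i: fits[i])][:]

def two_point_cx(a, b):
    """Two-point crossover: swap the segment between two random cut points."""
    i, j = sorted(random.sample(range(1, len(a)), 2))
    a[i:j], b[i:j] = b[i:j], a[i:j]
    return a, b

def gaussian_mutate(ind, indpb=0.1):
    """Perturb each gene with probability indpb (assumed value) by Gaussian
    noise, clamping the result to the valid gene range [-1, 1]."""
    for k in range(len(ind)):
        if random.random() < indpb:
            ind[k] = max(-1.0, min(1.0, ind[k] + random.gauss(MU, SIGMA)))
    return ind
```

The small µ and σ keep mutated motor displacements close to their parents, matching the goal of avoiding serious gene displacements noted above.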
Facial expressions for the robot head were generated after receiving the chromosomes (motor values) from the main control software and controlling the motors. The control period of the proposed system was set to 10 s. For 7 s, chromosome transmission from DEAP to the robot control system, facial expression generation and maintenance on the robot head, facial expression recognition and fitness transmission from FaceReader to DEAP were performed. The facial expression was recognized by recording a snapshot 3 s after expression generation; because the recognizer output varies greatly while the expression is changing from the neutral state, the expression was maintained until the recognition result stabilized before the result was recorded. The remaining 3 s were used as rest time, with all motor values set to zero, to minimize wear on the robot. Finally, the expression intensity of EveR's facial expression was transmitted to the main control software to be used as the fitness value of the system.
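One 10-second evaluation cycle can be sketched as below. The send_motors and read_intensity callables are hypothetical stand-ins for the TCP/IP links to the robot controller and to the recognizer; the durations default to the values stated above.

```python
import time

def evaluate_cycle(chromosome, send_motors, read_intensity,
                   settle=3.0, hold=4.0, rest=3.0):
    """Run one ~10 s control cycle and return the recognized intensity.

    send_motors and read_intensity are hypothetical stand-ins for the
    TCP/IP connections to the robot head and to the recognizer.
    """
    send_motors(chromosome)       # generate the expression on the head
    time.sleep(settle)            # snapshot ~3 s after generation
    intensity = read_intensity()  # read once the recognizer is stable
    time.sleep(hold)              # remainder of the 7 s active window
    send_motors([0.0] * len(chromosome))  # rest: all motors to zero
    time.sleep(rest)              # 3 s rest to reduce hardware wear
    return intensity
```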

Facial Expression Analysis
In this study, six basic facial expressions (angry, disgust, fear, happy, sad, surprised) were generated to evaluate the performance of the proposed system. To minimize failure of the robot skin and motors, evolution was limited to a maximum of 20 generations. Evolution was stopped when the average expression intensity did not increase by more than 5% for four consecutive generations or when the set threshold value was reached. Figure 9 shows the evolution of facial expressions by emotion, displaying the chromosome with the highest fitness in each generation. The surprised facial expression completed evolution in the 6th generation, the fastest observed. In contrast, the expression intensities for the angry, disgust and fear facial expressions did not increase significantly over 20 generations. Finally, evolution of the sad and happy facial expressions was completed by the 18th and 12th generations, respectively. Figure 10 summarizes the fitness value for each generation of an evolving facial expression. Figure 10a-f show the fitness values for the angry, happy, sad, surprised, disgust and fear expressions, respectively. The yellow line represents the fitness of the expression created manually by controlling the motors one axis at a time, the blue line indicates the maximum fitness of the expression created by the proposed system in each generation and the red line indicates the average fitness in each generation.
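The stopping rule described above can be sketched as a simple predicate over the history of per-generation average intensities. The threshold value is supplied by the caller, since the text does not give it numerically.

```python
# Stopping rule: halt after at most 20 generations, when the target
# threshold is reached, or when the average intensity fails to improve
# by more than 5% for four consecutive generations.
MAX_GENS, PATIENCE, MIN_GAIN = 20, 4, 0.05

def should_stop(avg_history, threshold):
    """avg_history: average expression intensity per generation, oldest first."""
    if len(avg_history) >= MAX_GENS:
        return True
    if avg_history and avg_history[-1] >= threshold:
        return True
    # Count the run of trailing generations with < 5% relative improvement.
    stalls = 0
    for prev, cur in zip(avg_history, avg_history[1:]):
        stalls = stalls + 1 if cur <= prev * (1 + MIN_GAIN) else 0
    return stalls >= PATIENCE
```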
The facial expressions generated using the proposed system had higher fitness values than the manually generated facial expressions. For the sad expression, the system produced more suitable facial expressions than the manual expression by the 5th generation and, from the 8th generation, the average fitness of the evolving generation exceeded that of the manual expression. Moreover, the fitness of every facial expression increased with evolution. However, the recognizer did not effectively identify the angry and happy facial expressions. Although this system generated facial expressions that better fit each emotion, future research should increase the expressiveness of the android by changing the skin control positions of the robot and adding degrees of freedom to generate more facial expressions.
We also generated facial expressions by applying the proposed system to the older male face robot. The internal structure and number of motors of this robot are similar to those of EveR but the skin control positions, external shape and skin thickness are different. We applied the system to the male robot by mapping its motor values to genes and generated facial expressions for the six emotions through the same process used for EveR. Figure 11 shows the expressions generated using the chromosomes with the highest fitness in each generation. For the male robot's facial expressions, evolution ran for the full 20 generations. As with EveR, the surprised and sad expressions were expressed well but the other facial expressions could not be generated well.
Finally, we analyzed the GA parameters affecting facial expression generation. The GA parameters applied previously were as follows: each generation comprised 20 chromosomes, tournament selection with a tournament size of 3 was used and Gaussian mutation with a sigma value of 0.2 was applied. We then generated a happy facial expression while varying the number of chromosomes, the tournament size and the sigma value of the Gaussian mutation. Figure 13 shows the evolutionary results for the different parameters: in Figure 13a, the number of chromosomes is 10, the sigma value is 0.02 and the tournament size is 3; in Figure 13b, the number of chromosomes is 10, the sigma value is 0.2 and the tournament size is 2; in Figure 13c, the number of chromosomes is 10, the sigma value is 0.2 and the tournament size is 3; and in Figure 13d, the number of chromosomes is 20, the sigma value is 0.2 and the tournament size is 2. Comparing Figure 13a,b confirms that a larger mutation sigma produces larger changes in motor values and faster evolution; comparing Figure 13b,c shows that evolution is faster with a larger tournament size; and Figure 13d confirms that a larger number of chromosomes also increases the evolutionary rate.

Difference of Facial Expression According to Recognition Performance
The facial expression recognizer has the greatest influence on the facial expression generation of the android. A recognizer trained on Western faces may not recognize the facial expressions of Asian people well. Based on recent research results, Wataru showed that, contrary to Ekman's claim that basic emotional expressions are similar regardless of culture, expressions other than pleasure and surprise do vary with culture [37]. This implies that facial expressions are not generated well when Western facial datasets are used to create facial expressions for Eastern face robots.
To generate facial expressions naturally, a sufficiently capable recognizer is required. By using a recognizer that accounts for the characteristics of the robot, more suitable facial expressions can be generated. For example, to make an angry facial expression for an Asian male robot, an Asian facial expression recognizer trained on an angry dataset would be ideal. This approach can generate more suitable facial expressions than those created using generic commercial software.

Limitations
We aimed to generate facial expressions for androids that provide guidance and personal services. Depending on the purpose of the service, androids have different genders, facial features and functions.
The proposed system has the advantage that facial expressions for various appearances can be easily generated by using recognizers trained on human facial data. An artist with aesthetic sensibility, working to a director's intention, could create a more suitable facial expression than this system generates. However, it is very difficult for an artist to generate facial expressions directly because the movement of the skin cannot be accurately predicted when a motor is actuated, whereas our proposed system generates facial expressions easily. It would therefore be most efficient for an artist to refine a draft facial expression generated by this system.
The expressiveness of facial expressions that can be made varies according to the characteristics of each face robot. There are approximately 80 muscles in the human face. However, face robots only have between 10 and 25 degrees of freedom. This makes it very difficult to generate facial expressions of the robot that are identical to those of human beings. The number of facial expressions of the android that can be generated will increase as the number of motors increases but this is accompanied by increased interference among the wires.

Conclusions
In this study, we proposed a system that can automatically generate facial expressions for an android by combining three components: an android head, a facial expression recognizer capable of interpreting both human and android faces and a genetic algorithm. The facial expression recognizer was employed to identify when the android exhibited angry, happy, sad, surprised, disgust and fear facial expressions. These expressions were controlled using one chromosome (a set of genes), through a 1:1 mapping of motors to genes. Each generation of facial expressions comprised 20 chromosomes; the crossover probability was set to 50% and the mutation probability to 30%. For evolution into the next generation, the tournament method was used for selection, the two-point method for crossover and the Gaussian method for mutation. The results from the facial expression recognizer showed that the fitness of all facial expressions generated using the proposed system was higher than that of manually generated facial expressions, with the surprised facial expression evolving quickest. Although the face of the EveR android had insufficient muscle expression ability to effectively generate angry and happy facial expressions, its expression ability for surprised and sad expressions was excellent.
We suggest that the facial expressions of a range of androids can be conveniently generated using the proposed system. In particular, the main advantage of this system is that factors such as degrees of freedom, internal structure and skin control positions and ranges need not be taken into consideration to generate a robot's facial expression. To utilize this system, it is only necessary to map the motor values of the face robot to genes; the system will then determine the optimal facial expression, reflecting the characteristics of the face robot. The higher the degrees of freedom of the face robot, the more deformable the skin and the more natural the facial expression that can be generated. In addition, a face robot with an Asian appearance can generate natural facial expressions by using a recognizer trained on Asian facial expression data.