User Affect Elicitation with a Socially Emotional Robot

To effectively communicate with people, social robots must be capable of detecting, interpreting, and responding to human affect during human–robot interactions (HRIs). In order to accurately detect user affect during HRIs, affect elicitation techniques need to be developed to create and train appropriate affect detection models. In this paper, we present such a novel affect elicitation and detection method for social robots in HRIs. Non-verbal emotional behaviors of the social robot were designed to elicit user affect, which was directly measured through electroencephalography (EEG) signals. HRI experiments with both younger and older adults were conducted to evaluate our affect elicitation technique and to compare the two types of affect detection models we developed and trained utilizing multilayer perceptron neural networks (NNs) and support vector machines (SVMs). The results showed that, on average, the self-reported valence and arousal were consistent with the intended elicited affect. Furthermore, the EEG data obtained could be used to train affect detection models, with the NN models achieving higher classification rates.


Introduction
A growing number of social robots are being integrated into our daily lives, as they can assist and extend human capabilities in human-centered environments such as homes, hospitals, and workplaces [1]. To effectively assist and interact with humans during human-robot interactions (HRIs), these robots are expected to have social intelligence and engage in bi-directional communication [2]. For robots to communicate with people, they must be able to recognize, interpret, and respond to human affect, which can be used to communicate people's feelings, emotions, thoughts, and intent [3]. Robots that can interpret human affect can promote more effective and engaging HRIs, which can lead to better acceptance from their users [4].
Affective computing enables robots to detect and interpret different modes of human affect using a variety of rule-based or learning techniques [5]. In order to accurately recognize and classify user affect, affect elicitation techniques have been developed for creating and training affect detection models [6].
The majority of existing elicitation techniques utilize stimuli that users view (e.g., looking at images, watching videos containing certain emotions) [7,8] or engage users in specific human-human scenarios (e.g., social interactions between people) [9] in order to elicit a certain human affect. However, these techniques cannot always elicit the affect that users would specifically experience during HRIs, as it has been found that such affect is directly related to the interaction with the robot itself [9-11]. For example, users may experience higher affective arousal when interacting with a robot compared with a human for the same activity due to the unfamiliarity and unpredictability of the robot [10].

Coded Affect
In [11], the cat-like robot iCat was used to play chess games on an electronic chessboard with children. The robot displayed a happy facial expression when a player made a bad move and a sad facial expression when the player made a good move. During the interactions, videos of the frontal and lateral views of the children were recorded. The videos were annotated manually by three coders to determine affective postural expressions. These expressions were used to train a recognition model for the level of engagement of a player, comparing different learning-based classifiers.
In [12], two child-robot interaction scenarios were designed utilizing a teleoperated NAO robot to elicit and collect spontaneous emotional expressions from the speech, facial expressions, and body language of children. The first scenario involved a child and the robot playing Snakes and Ladders on a computer, with the robot displaying positive or negative body gestures based on the child's performance; 2D cameras and Kinect sensors placed in front of the children were used to record body, face, and audio information. In the second scenario, each child watched movie clips with the robot that elicited different emotions (anger, disgust, fear, happiness, sadness, or surprise). After each movie clip, the robot expressed its own emotion using affective gibberish speech; its emotion was either the same as or contradictory to that of the movie clip. The children were required to rate the robot's emotion based on valence and arousal using the Self-Assessment Manikin (SAM) scale [33]. All captured data from both scenarios were annotated manually by four raters using the 2D valence-arousal scale.
In [13], a toy-like robot Mof-mof was used to elicit four different emotional facial expressions (happy, surprised, angry, and sad) by displaying varying robot actions (e.g., hopping, bending) and speech patterns (e.g., "I feel good today") based on a user's situation (e.g., how busy they were, their current postures). The user's facial expressions were captured by a camera and detected using the OKAO Vision facial expression recognition software [34]. Each user entered their current situation on a computer and then the robot would select an action and speech that was expected to elicit a specific facial expression based on a multilayer perceptron neural network robot behavior model. This model was previously trained in [35].

Self-Reported Affect
In [14], a robot manipulator (CRS A460) was used to elicit affect by performing motion trajectories with different velocities and accelerations. Physiological signals including heart rate, perspiration rate, and electromyogram (EMG) were obtained in addition to users' subjective responses on their perceived levels of anxiety and calmness using a 5-point Likert scale. The levels of valence and arousal were extracted from each user's reported anxiety and calmness and then used as labels for the corresponding physiological data. Three hidden Markov models (HMMs) were trained to detect valence and arousal.
In [15], two Adept Viper 6 degrees-of-freedom (DOF) robot arms with two-finger grippers were used to play the Tower of Hanoi game with users. A Kinect sensor was used to monitor the game by tracking the number of moves made by the users and the robots, and a surveillance camera was used to monitor the interaction in case of an emergency. The players played the game multiple times by themselves, and with a human or a robot collaborator. After each game, they rated their own elicited emotional experience using the Geneva Emotion Wheel (GEW) [36].

Use of Both Self-Reported and Coded Affect
In [16], a NAO robot was used to present lectures to a crowd of people in order to investigate how robot moods influence the affect of an audience. The robot displayed different arm gestures to convey positive or negative valence during the lectures. Two 30-min lecture scenarios were conducted, using one of these conditions for the entire lecture. Participants rated their affect using the SAM scale before, in the middle of, and after the lecture. Each scenario was video recorded, and the videos were annotated by two coders to assess participant valence and arousal based on verbal and non-verbal reactions (e.g., laughter, applause) on a 9-point Likert scale. The self-reported results and the coded affect were analyzed separately; they showed agreement on the elicited arousal, however, the coded affect indicated higher positive valence in the positive session than the self-reported affect.
The aforementioned robots have been used for affect elicitation during HRI scenarios with different embodiments, such as manipulator arms [14,15], animal/creature-like robots [11,13], and humanoid robots [12,16]. It has been found that manipulator and animal/creature type robots can have limited social embodiment, which may affect their ability to partake in certain social roles and perform a variety of social behaviors [37]. A human-like social embodiment has been shown to make it easier for a robot to follow human social norms, which has resulted in more engaging and effective social HRIs [37]. The majority of the robots engaged in the abovementioned "social" HRI scenarios used only physical affective expressions, such as body movements and facial expressions, to determine user affect. However, these expressions are not always available in HRI scenarios. For example, exercise facilitation requires users to perform physical movements that may restrain them from displaying body gestures, and a user's facial expressions can be perturbed by the increase in effort and muscle fatigue from physical activities [7]. Furthermore, some approaches have only focused on affect elicitation (e.g., [15,16]) without considering the recognition of the user affect that is being elicited.
In this paper, we propose the development of an affect elicitation approach that can elicit user affect with a human-like social robot in order to directly capture the affect that individuals feel during HRI scenarios. In order to consider different populations and HRI activities, we uniquely obtain EEG signals for the detection of user valence and arousal during social HRIs. As EEG measures the electrical activity of the brain, our approach can be used by a robot even when users are participating in physical activities.

A User Affect Elicitation Methodology Using a Social Robot
Our affect elicitation methodology utilizes a social robot to directly provide stimuli for eliciting the valence and arousal of a user. We utilize the Pepper robot to display a combination of affective body movements to music in order to induce different user affect. In general, body movements can be used to accurately express distinctive affect [38]. Furthermore, observing affective body movements can activate the mirror neuron network of a person which in turn can produce a similar affect to that observed [39]. Music has also been shown to effectively invoke affect by triggering hormonal and autonomic responses of a person through the direct use of its structural features (e.g., intensity, tempo, and mode) [40]. For example, music that is fast in tempo and composed in the major mode can induce positive valence and high arousal, while music that is written in the minor mode with a slow tempo can induce negative valence and low arousal [40].
As music has a direct impact on brain activation, it is effective in eliciting affect that can be recognized through physiological signals such as EEG [41]. Music has been used as a common elicitation technique for the training of affect detection models [6]. It has been used on its own [42] or combined with other modes such as video [43] or body movements [44]. Both music and gestures/body movements have been validated independently for use in affect detection [45,46], which motivated their use in our work. Furthermore, they have also been combined together to successfully determine affect [44,47-49]. As people often relate particular movement features with music that matches the movements [50], this combination can generate stronger affective responses, including physiological responses, than when using each mode alone [44]. Therefore, combining music with body movements with congruent affective information, as we propose herein for a social robot, has the potential to induce distinct affect in users.
Our proposed affect elicitation and detection methodology is presented in Figure 1. EEG signals are used to measure the affect of users during HRIs, and self-assessments are then used to label the corresponding EEG signals to develop and train an affect detection model for determining the user's level of valence and arousal. Each of the two sub-systems within our methodology are discussed in detail below.

Affect Elicitation
We have designed robot body movements which utilize different combinations of upper body, shoulder, head, arm, and hand movements for the robot to express two types of affect: (1) positive valence and high arousal, and (2) negative valence and low arousal. These affect types were chosen as their respective movement dynamics (e.g., speed) and music (e.g., tempo) share a structure that is emotionally relevant to people [44]. In turn, they can also produce stronger physiological responses in perceivers [44].
Herein, the design of the robot's body movements is adapted from [45], which identifies distinct associations between certain human body movements and affect. Our designed positive valence and high arousal movements are composed of a series of expansive movements with high movement activities and/or high movement dynamics, Figure 2a. The negative valence and low arousal movements are unexpansive with low movement activity and/or low movement dynamics, Figure 2b. These two affect elicitation stimuli are used for eliciting reciprocal affect in users.

The affective body movements are coordinated with music chosen from a publicly validated dataset that contains 1000 licensed music excerpts from the Free Music Archive (FMA) specifically designed for affect elicitation [46]. Each excerpt is 45 s long, and a minimum of 10 annotators from 10 different countries rated the level of valence and arousal for each excerpt [46]. For the robot stimuli, we selected five music excerpts for each affect type to match the intended affect of the robot movements, Table 1. The selected excerpts contain only instrumental arrangements; those with vocals were excluded to prevent the potential influence of a language barrier on the user's experience. A video of our designed robot affect elicitation stimuli can be found here (https://youtu.be/UaoPb6_uOeE) on our YouTube Channel.
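For concreteness, the following is a minimal sketch of how such a movement-plus-music stimulus could be commanded on Pepper through the NAOqi Python SDK. The joint targets, timings, robot address, and audio path are illustrative placeholders, not our exact designed stimuli.

```python
# Minimal sketch: commanding an expansive, high-dynamics gesture on Pepper
# while playing a music excerpt (NAOqi Python SDK). Angles, timings, and
# the audio file are illustrative placeholders, not the designed stimuli.
from naoqi import ALProxy

ROBOT_IP, PORT = "pepper.local", 9559  # hypothetical robot address

motion = ALProxy("ALMotion", ROBOT_IP, PORT)
audio = ALProxy("ALAudioPlayer", ROBOT_IP, PORT)

# Start a (hypothetical) 45 s excerpt asynchronously so that the movement
# and the music overlap in time.
audio.post.playFile("/home/nao/music/ph_excerpt_01.wav")

# Expansive arm raise: both shoulders open outward and up with fast timing
# (high movement dynamics) to convey positive valence / high arousal.
names = ["LShoulderPitch", "RShoulderPitch", "LShoulderRoll", "RShoulderRoll"]
angles = [[-1.0], [-1.0], [0.8], [-0.8]]  # target angles in radians
times = [[0.6], [0.6], [0.6], [0.6]]      # short durations -> high dynamics
motion.angleInterpolation(names, angles, times, True)
```

An unexpansive, low-dynamics stimulus would follow the same pattern with contracted joint targets and longer interpolation times.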

Affect Detection
Elicited user affect is measured by EEG signals, which are labeled based on the self-reported perceived affect in the (a) positive valence and high arousal (PH) and (b) negative valence and low arousal (NL) sessions. The labeled data are used to train the affect detection models to detect valence and arousal during HRIs.

Physiological Responses
Physiological signals can be more reliable than physical signals (e.g., facial expressions), as they are not easily controlled by people in order to hide or manipulate their affect [31]. Common physiological modes used in affect detection include: (1) cardiac activity, (2) skin conductivity, (3) blood volume pulse, (4) surface electromyography (EMG), (5) EEG, and (6) respiration [51]. Only EEG and EMG can be used for detecting both valence and arousal, while the other modes are only used to measure arousal [51]. However, EMG can only be used to measure user affect when there are no muscle contractions or movements [52], which can be impractical for users engaging in HRIs. On the other hand, EEG measures the brain's electrical activity, which is less affected by such movements. Therefore, EEG signals are used in our work to measure a person's valence and arousal.
The EEG headband we use is the InteraXon Muse 2016, a low-cost four-channel dry electrode EEG sensor with a sampling rate of 256 Hz [53]. The sensor measures the electrical signals from the four electrode locations at TP9 (above the left ear), AF7 (left side of the forehead), AF8 (right side of the forehead), and TP10 (above the right ear), described using the International 10-20 system [53], as shown in Figure 3.
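For reference, raw samples from this headband can be pulled over the lab streaming layer; below is a minimal sketch using pylsl, assuming a Muse stream has already been started (e.g., with the Muse LSL package used in this work).

```python
# Minimal sketch: receiving 256 Hz, 4-channel Muse EEG over LSL (pylsl),
# assuming a Muse stream was started beforehand (e.g., `muselsl stream`).
from pylsl import StreamInlet, resolve_byprop

streams = resolve_byprop("type", "EEG", timeout=10)
inlet = StreamInlet(streams[0])

# Pull one second of data (256 samples). Each sample carries the channels
# TP9, AF7, AF8, TP10 (plus a right-AUX channel on the Muse 2016).
window = []
while len(window) < 256:
    sample, timestamp = inlet.pull_sample()
    window.append(sample[:4])  # keep TP9, AF7, AF8, TP10
```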


EEG Feature Extraction
Two types of EEG frequency domain features, namely the power spectral density (PSD) feature and the frontal asymmetry feature, are extracted from the EEG data. Both types of features can be used in real-time valence and arousal detection [54,55]. The EEG data are processed using the Muse LSL package [56].
The EEG signal is decomposed using the fast Fourier transform (FFT) over a 1 s sliding window with 80% overlap to extract the PSD; the sliding window reduces spectral leakage and minimizes data loss [54].
Examples of the raw EEG signal in the time domain for both the PH and NL sessions are presented in Figure 4a,b. Examples of five consecutive sliding windows of the PSD in the frequency domain for both sessions are presented in Figure 4c,d. The PSD features are acquired from each electrode location (TP9, AF7, AF8, and TP10) in four distinct frequency bands: θ (4-8 Hz), α (8-13 Hz), β (13-30 Hz), and γ (30-40 Hz) [55]. These features have been commonly used as the input for classifying affective valence and arousal using EEG signals [31,32,55,57].

The θ band power is often correlated with relaxation [54]. An increase in the frontal θ power (e.g., in AF7 and AF8 in Figures 3 and 5a) can be observed with a lower arousal stimulus [54]. In addition, a greater θ band power in the right hemisphere (e.g., in AF8 and TP10) can be observed when there are negative stimuli [58]. The α band power is related to the relaxed state of the mind [54]. An increase in the α band power in the right hemisphere (e.g., in AF8 and TP10) also occurs when viewing negative stimuli [54]. A decrease in the frontal α band power (e.g., in AF7 and AF8) can be observed when someone is exposed to high-arousal stimuli [59]. The β band is associated with the sensory-motor system, and an increase in the β band power has been found when someone is exposed to positive stimuli [57]. Furthermore, an increase in the frontal β band power (e.g., in AF7 and AF8) has been observed when viewing high-arousal stimuli [60]. The γ band power has been associated with the integration of information, and an increase in the γ band power has been found when viewing positive valence stimuli as well as high-arousal stimuli [57]. An example of these PSD features for the PH session and NL session, based on the signals shown in Figure 4, is presented in Figure 5a.
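A minimal sketch of this band-power extraction follows, using SciPy's Welch estimator as a stand-in for the sliding-window FFT described above; the window length and overlap follow the description, while the remaining implementation details are our assumptions.

```python
# Minimal sketch: 16 PSD band-power features (theta, alpha, beta, gamma
# at TP9, AF7, AF8, TP10) from one window of 256 Hz Muse EEG. Welch's
# method (1 s segments, 80% overlap) stands in for the sliding-window FFT.
import numpy as np
from scipy.signal import welch

FS = 256  # Muse sampling rate (Hz)
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 40)}
CHANNELS = ["TP9", "AF7", "AF8", "TP10"]

def band_powers(eeg):
    """eeg: array of shape (n_samples, 4), channels ordered as CHANNELS."""
    freqs, psd = welch(eeg, fs=FS, nperseg=FS, noverlap=int(FS * 0.8), axis=0)
    features = {}
    for band, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        for ch, name in enumerate(CHANNELS):
            # Mean power in the band at this electrode -> one PSD feature.
            features[f"{band}_{name}"] = psd[mask, ch].mean()
    return features  # 16 PSD features in total
```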
Previous studies have shown that the valence and arousal of a person are correlated with the frontal EEG asymmetry, which refers to the power difference between the left and right frontal hemispheres of the brain within the α and β frequency bands [54,57,61]. More specifically, a greater frontal left hemisphere activity is associated with positive valence, while a greater frontal right hemisphere activity is associated with negative valence [61]. In addition, higher arousal is characterized by a higher β activity and lower α activity on the frontal hemispheres of the brain [61]. Frontal EEG asymmetry features are measured by the ratio of the α and β bands in order to determine valence and arousal [55]. Four valence and four arousal frontal asymmetry features are adapted from [55]. These features are computed as valence features, v1 to v4, Equations (1)-(4), and arousal features, a1 to a4, Equations (5)-(8), where αAF7, αAF8, βAF7, and βAF8 are the α and β band powers measured at the AF7 and AF8 locations. An example of these features, for the PH and NL sessions, based on the signals shown in Figure 4, is presented in Figure 5b.
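As an illustration of the ratio-based form these features take, the pair below shows generic frontal asymmetry quantities from the literature; these are our illustrative stand-ins, not the specific definitions of Equations (1)-(8).

```latex
% Generic frontal asymmetry features of the ratio-based kind described
% above; illustrative forms only, not the paper's exact Equations (1)-(8).
\begin{align}
  v_{\mathrm{asym}} &= \frac{\alpha_{\mathrm{AF8}}}{\alpha_{\mathrm{AF7}}}
    && \text{(relatively greater left activity} \Rightarrow \text{positive valence)} \\
  a_{\mathrm{ratio}} &= \frac{\beta_{\mathrm{AF7}} + \beta_{\mathrm{AF8}}}{\alpha_{\mathrm{AF7}} + \alpha_{\mathrm{AF8}}}
    && \text{(higher } \beta \text{, lower } \alpha \Rightarrow \text{higher arousal)}
\end{align}
```

Since α power varies inversely with cortical activity, a larger right-side α relative to the left implies relatively greater left hemisphere activity, consistent with the valence association described above.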
Based on the results presented in Figure 5a, the NL session had a higher θ band power, which indicates that the NL session induced more negative valence and lower arousal compared with the PH session [54,58]. The higher α band power in the NL session indicates that the NL session elicited more negative valence than the PH session [54], while the lower α band power in the PH session indicates that it induced higher arousal than the NL session [59]. The PH session had higher β and γ band powers, which correspond to more positive valence and higher arousal compared with the NL session [57].

Regarding the average frontal EEG asymmetry based on the α and β band powers presented in Figure 5b, the PH session had higher v2 and v3 and lower v1 and v4 compared with the NL session. This indicates that the user in the PH session experienced more positive valence compared with the NL session [55,61]. On the other hand, the PH session had higher a2-a4 and lower a1 compared with the NL session. This indicates that the user experienced higher arousal in the PH session [55,61].
In total, 20 features are utilized for each of valence and arousal detection: 16 PSD features for the four frequency bands θ, α, β, and γ measured at the locations TP9, AF7, AF8, and TP10, and four frontal EEG asymmetry features (v1-v4 for valence, Equations (1)-(4), or a1-a4 for arousal, Equations (5)-(8)).

Self-Assessment
For training, each user completed a self-assessment questionnaire to report their perceived affect after viewing each of the stimuli. We utilize the Self-Assessment Manikin (SAM) [33] pictorial assessment technique to measure self-reported valence and arousal to the robot stimuli. For both valence and arousal, a 5-point Likert scale ranging from −2 (highly negative valence or very low arousal) to +2 (highly positive valence or very high arousal) with the corresponding SAM pictorial representation is presented to each user. The self-assessed affect is then used to label the corresponding EEG signals to develop our affect detection models.
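A minimal sketch of this labeling step is shown below; treating neutral (0) ratings as excluded samples is our assumption, consistent with only affect-matched data being used for training, as described in the Experiments section.

```python
# Minimal sketch: mapping 5-point SAM ratings (-2..+2) to binary labels
# for the corresponding EEG feature samples. Negative ratings map to the
# negative-valence / low-arousal class, positive ratings to the
# positive-valence / high-arousal class; neutral (0) is excluded here
# (an assumption on our part).
def sam_to_labels(valence_rating, arousal_rating):
    valence = None if valence_rating == 0 else int(valence_rating > 0)
    arousal = None if arousal_rating == 0 else int(arousal_rating > 0)
    return valence, arousal  # 1 = positive/high, 0 = negative/low, None = excluded
```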

Affect Detection Model
We investigate two learning-based models for our affect detection module: a three-hidden-layer multilayer perceptron neural network (NN) model and a support vector machine (SVM) model with a radial basis function (RBF) kernel, implemented and compared using the Scikit-Learn toolbox [62]. These two models were chosen as they are the most commonly used learning-based models for affect classification [32]. Each model consists of a valence sub-model and an arousal sub-model. As previously mentioned, they are trained using the labeled EEG signals.
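A minimal sketch of these two classifiers in Scikit-Learn follows; the hidden-layer widths, feature scaling, and other hyperparameters are our assumptions, as only the three-hidden-layer topology and the RBF kernel are specified above.

```python
# Minimal sketch: the two affect classifiers compared in this work, built
# with Scikit-Learn. Hidden-layer sizes and hyperparameters are assumed;
# the text specifies only a three-hidden-layer MLP and an RBF-kernel SVM.
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def build_models():
    nn = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64, 32, 16), max_iter=1000),  # 3 hidden layers
    )
    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return nn, svm

# One valence and one arousal sub-model are trained per classifier type,
# each on the 20-dimensional feature vectors with the self-reported labels.
```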

Experiments
A user study was conducted to evaluate the proposed affect elicitation and detection methodology. We recruited nineteen participants (age: µ = 45.58, σ = 30.95) for two one-on-one interactions with Pepper. This sample size is comparable to other affect elicitation and detection studies, which have had 4-22 participants, e.g., [11-13]. Participants consisted of individuals from two different age groups: (1) 13 younger adults (YA) between the ages of 22 and 38, mainly university students (12 male and 1 female), and (2) 6 older adults (OA) between 81 and 96 years old from a local long-term care facility (1 male and 5 female). All subjects gave their informed consent for inclusion before they participated in the study. The study received approval from the University of Toronto Ethics Committee.
Our experiment follows the standard within-subject design of comparing different emotions in the same context, where each participant is exposed to two different stimuli under the same experimental conditions [63]. This design has been commonly used in affect elicitation studies in which EEG data are collected during the applied stimuli and used to develop affect detection models [64-66]. The experiment took place in an isolated quiet room. Participants were seated in front of the robot and wore the EEG headband, Figure 6. Both stimuli, consisting of the robot movements set to music, were presented to each participant to elicit either positive valence and high arousal (PH session) or negative valence and low arousal (NL session). The stimuli were presented to the users in a random order. Prior to each session, participants were given 2 min to relax in silence, during which no stimulus was presented. Each session was approximately 4 min in duration, followed by a 5 min break between sessions. Participants were asked to report their perceived valence and arousal levels using the 5-point (+2 to −2) SAM scale during the break, where negative ratings were considered as negative valence or low arousal, 0 was considered neutral, and positive ratings were considered as positive valence or high arousal.

Affect Elicitation Results
The self-reported results for all the participants as well as for each age group per each session are presented in both Figure 7 and Table 2. On average, the participants reported positive valence and high arousal for the PH session and negative valence and low arousal for the NL session, Figure 7a,b, which was consistent with our intended elicited affect for each session.
With respect to the two age groups, for valence, 84.62% of YA self-reported positive valence for the PH session as well as negative valence for the NL session. For OA, all of the participants self-rated a +2 in the PH session. However, there was less consensus for this group in the NL session, where 50% of them reported negative valence, 33.33% reported positive, and 16.67% reported neutral valence. The data from these 84.62% of the YA and 50% of the OA (i.e., 14 participants in total) were used to train our valence detection models. One YA and two OA perceived positive valence for both sessions as they stated that they were intrigued/amazed by the robot's performance. Compared with the YA, the OA, in general, had higher perceived valence in both sessions. This may be due to the fact that they had less exposure to and experience with robots, and these individuals were more excited to interact with a robot for the first time.
With respect to arousal, on average, both the YA (61.54%) and OA (66.67%) self-reported high arousal for the PH session, Figure 7c. The YA also self-reported low arousal (76.92%) during the NL session, while all of the OA reported neutral arousal, Figure 7d. The data from these 61.54% of YA and 66.67% of OA (i.e., 12 participants in total), who reported higher arousal in the PH session than the NL session, were used to develop our arousal detection model. The same YA who perceived positive valence for both sessions also reported high arousal for both sessions. In addition, three YA rated low arousal for both sessions; however, they all commented on the robot's ability to display complex human-like movements. Two OA perceived neutral arousal for both the PH and NL sessions. The first OA had positive valence for both sessions and found the robot's movements in both sessions to be pleasant. The second OA had positive valence for the PH session and neutral valence for the NL session; this participant stated that they recognized the robot was sad, resulting in the lower reported valence, but the difference in body movements and music did not affect their arousal.
Regarding the OA who reported neutral valence during the NL session, we postulate that this could be a result of older adults having a negative-to-neutral shift in perceiving negative stimuli; namely, they tend to experience negative stimuli as neutral more frequently than younger adults [67]. This can be a result of age-related reduced amygdala activity in response to negative stimuli, especially negative valence and low-arousal stimuli, where the amygdala is the region of the brain associated with processing and experiencing emotions [67]. In addition, the reason why the OA did not self-report lower than neutral arousal may also be due to age-related changes in the intensity of the affective stimuli to which the amygdala is more receptive [67]. In general, when high-arousing stimuli are observed, OA and YA show similar levels of amygdala activity; however, OA have decreased amygdala activity compared with YA when low-arousing stimuli are observed, such that they often do not experience the low arousal that YA may experience [67].

Affect Detection Models
As we only used the data of users whose perceived affect matched the intended elicited affect in both sessions, the EEG data obtained from 14 participants were used for developing our valence detection sub-model and from 12 participants for the arousal detection sub-model. Each participant had two sessions (i.e., one positive and one negative session), and the EEG data were recorded for an average of 223 s per session. Average band powers were sampled every second to compute the features, which resulted in 6254 feature samples from the 14 participants for valence detection and 5424 samples from the 12 participants for arousal detection. For both the NN and SVM models, we labeled our samples into two classes for valence (positive and negative valence) and two for arousal (high and low arousal), such that the same number of samples was used for each class (i.e., 3127 samples each for the positive and negative valence classes, and 2712 samples each for the high and low arousal classes).
We first separated our data into a training set and a testing set (approximately a 75%/25% split between users), where the testing set contains the data of users who are not in the training set. Both the training and testing sets had a combination of YA and OA data. We conducted both subject-dependent ten-fold and subject-independent leave-one-out (LOO) cross-validations on the training set. Another subject-independent evaluation was then performed on the testing set, consisting of multiple new users, to further assess how well the classification models perform on subjects unknown to the system.
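For illustration, the two cross-validation schemes can be set up with Scikit-Learn's utilities as follows; variable and function names are ours.

```python
# Minimal sketch: subject-dependent ten-fold vs. subject-independent
# leave-one-subject-out cross-validation on the training set.
# X: (n_samples, 20) feature matrix; y: binary labels; subjects: per-sample
# participant IDs used to keep each subject's data in a single fold.
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score

def evaluate(model, X, y, subjects):
    ten_fold = cross_val_score(model, X, y, cv=KFold(n_splits=10, shuffle=True))
    loo = cross_val_score(model, X, y, groups=subjects, cv=LeaveOneGroupOut())
    return ten_fold.mean(), loo.mean()
```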
The overall classification results are presented in Table 3. For the ten-fold cross-validation, the classification rates for detecting valence were 71.9% and 70.1%, and for arousal were 70.6% and 69.5%, using the NN and SVM detection models, respectively. The LOO cross-validation classification rates for valence were 63.7% and 61.8% for the NN and SVM models, and for arousal were 63.3% and 61.6%. Since the ten-fold cross-validation is subject-dependent while the LOO cross-validation is subject-independent, the classification rates from the ten-fold are expected to be higher [31,68]. With respect to the testing set, the classification rates were 63.3% and 62.4% for valence and 62.6% and 61.2% for arousal for the NN and SVM models. The classification results from the LOO cross-validation and the subject-independent testing are comparable to each other. Furthermore, they are also comparable to other non-HRI affect detection techniques used with subject-independent EEG data, namely 57.6-62.5% for valence and 55.7-62.5% for arousal [43,66,68-72]. Based on these results, the NN achieved higher classification rates for detecting both valence and arousal with both the training and testing sets.

To further evaluate the ability of each learning-based model to distinguish between classes, receiver operating characteristic (ROC) curves [73] were plotted for each model on the testing set, Figure 8. The area under the curve (AUC) of an ROC curve represents how well the model is able to distinguish between different classes [73]. For the NN model, the AUCs were 0.73 and 0.74 for detecting valence and arousal, respectively. For the SVM model, the AUCs were 0.72 and 0.71 for valence and arousal detection. Therefore, the NN model achieved a slightly higher AUC than the SVM model for both valence and arousal detection, and was able to effectively distinguish between positive and negative valence as well as high and low arousal.
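The ROC analysis can be reproduced along these lines (a sketch; plotting is omitted):

```python
# Minimal sketch: ROC curve and AUC for one trained model on the held-out
# test set, as used to compare the NN and SVM classifiers.
from sklearn.metrics import auc, roc_curve

def roc_auc(model, X_test, y_test):
    if hasattr(model, "predict_proba"):
        scores = model.predict_proba(X_test)[:, 1]  # MLP: class-1 probability
    else:
        scores = model.decision_function(X_test)    # SVM: signed margin
    fpr, tpr, _ = roc_curve(y_test, scores)
    return fpr, tpr, auc(fpr, tpr)
```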

Conclusions
In this paper, we present the development of a novel affect elicitation and detection methodology for socially assistive robots. The affect elicitation stimuli were uniquely designed using non-verbal emotional behaviors of the robot set to affective music to elicit positive valence and high arousal, and negative valence and low arousal. User affect was measured through EEG signals and used to train two learning-based affect detection models. HRI experiments consisting of two different age groups showed that the majority of participants were able to successfully perceive and reciprocate the affect as intended. Furthermore, a three-hidden layer multilayer perceptron neural network model achieved better classification results for both valence and arousal detection than a support vector machine model. Our future work will focus on using our affect detection model during various assistive HRI tasks to detect and respond to user affect in order to promote more engaging HRIs.
