1. Introduction
The study of emotional responses from consumers has gained interest in the field of sensory science [
1]. The self-reported and physiological responses from consumers towards different types of stimuli (e.g., images) are important perceptual dimensions to be considered. According to the 7-38-55 rule from Sarma and Bhattacharyya [
2], 7% of a message is conveyed by verbal communication, 38% by voice intonation, and 55% by body language and facial expressions. As shown in a review on autonomic nervous system (ANS) activity and emotions conducted by Kreibig [9], ANS activity has been incorporated as a major component of emotional responses in many recent theories of emotion. Therefore, the present study examines both the ANS responses associated with emotions and the self-reported responses of panelists towards visual stimuli.
The Geneva Affective PicturE Database (GAPED) is a repository of 730 images developed by the Department of Psychology of the University of Geneva, Switzerland, to increase the availability of visual emotion stimuli for assessing mental states and emotional responses towards images. It contains negative, positive, and neutral images [
3]. The individual pictures have been rated according to arousal, valence, and the compatibility of the represented scene with respect to external (legal) and internal (moral) standards. The EsSense profile® developed by King and Meiselman [
1] is a method used to assess the self-reported emotional responses of consumers towards products by providing a list of emotional attributes. A disadvantage of many verbal self-reported approaches is that panelists become bored and fatigued when they make a large number of evaluations per sample [
4]. Several studies have been conducted using the self-reported responses of consumers by using questionnaires or through online surveys [
5,
6]. However, the outputs of these studies can have considerable variability and bias, as self-reported responses may differ from one individual to another, mainly due to differences in personal judgement and verbal expression [
7]. Self-reported responses of consumers are considered as indirect measurements of sensory experiences [
8]. Thus, self-reported responses do not always represent consumer attitudes and preferences. This raises the need for more research using physiological responses to understand consumers' implicit reactions. ANS activity may help explain a different dimension of the emotional experiences of consumers [
9].
Biometric techniques are commonly used to identify people or verify an individual's identity based on one or more of their unique physical and biological characteristics [
10]. Heart rate (HR), skin or body temperature (ST or BT), skin conductance (SC), eye pupil dilation (PD), and fingerprints (FP) are some of the most familiar biometrics. On the other hand, FaceReader™ (Noldus Information Technology, Wageningen, Netherlands) is a software package that analyzes the facial expressions of participants by detecting changes in their facial movements and relating them to emotions using machine learning models. FaceReader™ has been trained to classify facial changes into intensities of eight emotions: (i) happy, (ii) sad, (iii) angry, (iv) surprised, (v) scared, (vi) disgusted, (vii) contempt, and (viii) neutral [
11]. Compared with most biometric systems, which rely on sensors in direct contact with the participants' bodies, facial expression analysis based on face recognition has the advantage of being a non-invasive process performed on video recordings of participants [
12]. Recent studies have used FaceReader™ to evaluate emotional responses towards images. For example, Ko and Yu [
13] and Yu [
14] studied facial expressions to understand the emotional responses towards two sets of images, with and without shading and texture. Their findings served as a guide for graphic designers to establish emotional connections with viewers by using design elements that reflect consumer interests.
Regarding other biometric techniques, studies on the relationship between skin temperature (ST) and human emotions show contradictory results. According to Barlow [
15], decreases in ST were associated with anxiety, fear, tension, and unpleasantness. However, studies conducted by Cruz Albarran et al. [
16] found that ST was lower during joy and higher during anger and disgust. Furthermore, Brugnera et al. [
17] concluded that heart rate (HR) and ST decreased during experiences of happiness and anger. Also, according to studies conducted by Kreibig [
9], fear, sadness, and anger were associated with lower ST. HR is another biometric that can be correlated with emotions, as stress can increase blood pressure (BP), both systolic pressure (SP) and diastolic pressure (DP) [
18]. There are several views on how HR and BP relate to emotional experiences. It has been found that the pleasantness of stimuli can increase the maximum HR response, and that HR decreases with fear, sadness, and happiness [
18]. Dimberg [
19] concluded that the HR decreased during happiness and anger. Studies conducted by McCaul et al. [
20] have shown that fear increased the HR of subjects. Ekman et al. [
21] measured physiological responses associated with several emotions. They concluded that happiness, disgust, and surprise showed lower HR, while fear, anger, and sadness showed higher HR. de Wijk et al. [
22] concluded that liking scores were positively correlated with increases in HR and ST. The measurement of BP is mostly used in medical research and has rarely been used as a physiological response to assess emotions [
23]. However, a study conducted by Barksdale et al. [
24] on racial discrimination among Black Americans showed that BP was negatively correlated with sadness and frustration.
Due to the high discrepancy in the results of previous research, the present study focused on both physiological and self-reported responses, measured using FaceReader™, HR, SP, DP, and ST combined with a simplified face scale and the EsSense profile®, respectively. The main objective was to understand differences in the self-reported and physiological responses of consumers towards the perception of positive, neutral, and negative images. The specific objective was to predict the self-reported responses using significant biometric parameters. Results showed that the physiological responses, along with the self-reported responses, were able to separate the images into positive, neutral, and negative clusters consistent with the GAPED classification. Emotional terms with high or low valence were predicted by a general linear regression model using biometrics, whereas calm, which lies at the center of the dimensional model of emotions, was not predicted.
2. Materials and Methods
2.1. Participants and Stimuli Description
Panelists were recruited from the staff and students at The University of Melbourne, Australia, via e-mail. A total of N = 63 participants of different nationalities attended the session; however, due to issues with video and thermal image quality (incorrect positioning of the participants), only 50 participants aged 25–55 years were included in the analysis. Panelists received chocolate and confectionery products as incentives for their participation in the study. According to the power analysis (1 − β > 0.999) conducted using SAS® Power and Sample Size 14.1 software (SAS Institute Inc., Cary, NC, USA), the sample size of 50 participants was sufficient to find significant differences between samples. The experimental procedure was approved by the Ethics Committee of the Faculty of Veterinary and Agricultural Sciences at The University of Melbourne, Australia (Ethics ID: 1545786).
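For reference, a power check of this kind can also be reproduced in Matlab. The following is a minimal sketch only, assuming a simple paired-comparison design and an illustrative effect size that is not taken from the study (the original analysis was run in SAS® Power and Sample Size 14.1).

```matlab
% Minimal sketch of a power check (illustrative values only, not from the study).
% Requires the Statistics and Machine Learning Toolbox.
mu0   = 0;     % assumed mean difference between two stimuli under H0
sigma = 1;     % assumed standard deviation of the paired differences
mu1   = 0.8;   % illustrative (hypothetical) effect size
n     = 50;    % number of participants retained for analysis
pwr   = sampsizepwr('t', [mu0 sigma], mu1, [], n);   % power of a paired t-test
fprintf('Estimated power with n = %d participants: %.4f\n', n, pwr);
```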
A total of 12 images were selected as stimuli (the number was limited to 12 to avoid panelist fatigue from exposure to too many stimuli): ten from GAPED and two chosen based on common fears, with four images for each category (positive, neutral, and negative). Selection was based on the valence and arousal scores proposed by Dan-Glauser and Scherer [
3], who used continuous scales ranging from 0 to 100 points for both valence and arousal. According to the latter reference, images defined as positive were rated above 71 points and negative images below 64 points on the valence scale; the values in between (40–69) were considered neutral. Neutral pictures were rated slightly above the scale midpoint, which may be due to the relative comparison with many negative pictures. Regarding arousal ratings, neutral (below 25) and positive (below 22) images obtained relatively low values, whereas negative images had mildly arousing levels ranging from 53 to 61. For ethical reasons, the panelists could not be exposed to extremely negative images; therefore, the other two negative images were selected from a previous study [
25]. The images were displayed on a computer screen (HP Elite display, E231 monitor, Palo Alto, CA, USA) with a 1080 × 1920-pixel resolution for 10 s each (
Figure 1). The order in which the images were presented to panelists was positive, neutral, and negative, respectively, to avoid the contrast effect of changing from one extreme condition to another. During recruitment, panelists were not given details about the study. However, prior to the experiment, panelists attended a briefing session in which they were asked to sign a written consent form acknowledging the video recording and were instructed on the experimental steps of the session, as per the ethical approval requirements.
2.2. Sensory Session and Self-reported Response Acquisition
The study was conducted using individual portable booths in a conference-type room, which isolated the panelists so that there was no interaction between them. Each booth contained a Hewlett Packard (HP) computer screen (Hewlett Packard, Palo Alto, CA, USA) to present the images, a Point 2 View USB document camera (IPEVO, Sunnyvale, CA, USA) to record videos of participants during the session, a FLIR ONE™ infrared camera (FLIR Systems, Wilsonville, OR, USA; thermal resolution 80 × 60, ±2 °C / ±2% accuracy, sensitivity 0.1 °C) to obtain thermal images, and a Samsung tablet PC (Samsung, Seoul, South Korea) displaying a bio-sensory application (App), developed by the sensory group at The University of Melbourne, capable of showing the sensory questionnaire and collating the data from each participant. Participants were seated 30–40 cm from the cameras, and the room temperature was kept at 24–25 °C, which is within the normal operating range of the infrared thermal camera (0–35 °C).
Panelists were asked to observe the images for 10 s and, immediately after, respond to a questionnaire in the bio-sensory App [
26]. The sensory form consisted of two types of questions: (i) a simplified version of the face scale used for tests with children [
27], consisting of a 15-cm continuous non-structured scale with no anchors, showing faces that change from very sad to very happy through a neutral state (
Figure 2); and (ii) EsSense profile
® questions using a 5-point scale categorized as: 1 = “Not at all”, 2 = “Slightly”, 3 = “Moderately”, 4 = “Very”, and 5 = “Extremely”. A face scale modified from the version used with children was chosen because it is easier for consumers to express their emotions by matching a face to what they see than by using words, and it avoids the need for more than one scale to assess positive, neutral, and negative responses separately [
25]. Further, the EsSense profile® scale was used for five emotion-based words (sad, scared, calm, peaceful, and happy). The emotion-based words happy (HappyEs), peaceful (PeacefulEs), sad (SadEs), and scared (ScaredEs) were selected from the EsSense profile® as they best represent the emotions obtained by the FaceReader™ 7 (Noldus Information Technology, Wageningen, Netherlands); furthermore, each selected term represents an area of the arousal-valence dimension—happy has a high valence and high arousal, sad has a low valence and low arousal, peaceful is high valence and low arousal, and scared has low valence and high arousal. On the other hand, calm (CalmEs) was selected as the closest to represent the neutral emotion from the FaceReader™ as its valence and arousal scores place this term in the center of the two-dimensional (arousal versus valence) emotions model according to Jun et al. [
28] and Markov and Matsui [
29]. A five-second blank screen was shown before the panelist was exposed to the next image.
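As an illustration of how these two response types can be stored for analysis, the sketch below encodes one panelist's answers to a single image as a single table row; the participant, image, and variable names are hypothetical, and only the scale definitions come from the text above (the face-scale mark is assumed to be measured in cm from the "very sad" end).

```matlab
% Minimal sketch (hypothetical names): one panelist's self-reported responses to one image.
% FaceScale is the mark on the 15-cm continuous face scale (0 = very sad, 15 = very happy);
% the EsSense terms use the 5-point category scale (1 = "Not at all" ... 5 = "Extremely").
Panelist   = "P01";
Image      = "Positive_1";
FaceScale  = 11.3;   % cm measured from the "very sad" end of the scale (assumed convention)
HappyEs    = 4;
PeacefulEs = 3;
CalmEs     = 3;
SadEs      = 1;
ScaredEs   = 1;
response = table(Panelist, Image, FaceScale, HappyEs, PeacefulEs, CalmEs, SadEs, ScaredEs);
disp(response)
```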
2.3. Video Acquisition and Facial Expressions Analysis
Videos from each participant were recorded during the whole session and post-processed by cutting 12 shorter videos from the parts in which the participant was looking at the stimulus. These videos were further analyzed in FaceReader™ using the default settings. Two different models were used for the facial expression analysis: the East Asian model for Asian participants and the General model for non-Asian participants, as recommended by the software manufacturer (Noldus Information Technology, 2016). The outputs from FaceReader™ consist of eight emotions: (i) neutral, (ii) happy, (iii) sad, (iv) angry, (v) surprised, (vi) scared, (vii) disgusted, and (viii) contempt on a scale from 0 to 1; two dimensions, (ix) valence and (x) arousal, on a scale from −1 to 1; head orientation in the three axes, (xi) X (Xhead), (xii) Y (Yhead), and (xiii) Z (Zhead); (xiv) gaze direction (GazeDir; −1 = left; 0 = forward; 1 = right); and five facial states: (xv) mouth (0 = closed; 1 = opened), (xvi) left eye and (xvii) right eye (0 = closed; 1 = opened), and (xviii) left eyebrow and (xix) right eyebrow (−1 = lowered; 0 = neutral; 1 = raised). Each emotion was averaged over the video; the sum of these averages was taken as 100%, and the percentage contribution of each emotion was then calculated. For the two emotional dimensions and the head orientation movements, the maximum value was used, while for the face states the mean values were obtained due to the nature of the data, which, as explained above, were coded as 0 and 1.
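A minimal Matlab sketch of this aggregation step is shown below; the matrix names and layout are assumptions for illustration (frame-level FaceReader™ outputs arranged with one row per video frame), not the actual code used in the study.

```matlab
% Minimal sketch (assumed variable layout) of the per-video aggregation described above.
% E: nFrames x 8  emotion intensities (0-1), one column per emotion
% D: nFrames x 2  valence and arousal (-1 to 1)
% H: nFrames x 3  head orientation along the X, Y, and Z axes
% S: nFrames x 5  face states coded as 0/1 (mouth, eyes) or -1/0/1 (eyebrows)
emotionMeans = mean(E, 1, 'omitnan');                    % average intensity of each emotion
emotionPct   = 100 * emotionMeans / sum(emotionMeans);   % sum of averages taken as 100%
dimMax       = max(D, [], 1);                            % maximum valence and arousal
headMax      = max(H, [], 1);                            % maximum head orientation per axis
stateMeans   = mean(S, 1, 'omitnan');                    % mean value of each face state
```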
The images obtained using the FLIR ONE™ infrared thermal camera and the radiometric data files in comma-separated values (csv) format obtained using FLIR Tools™ (FLIR Systems, Wilsonville, OR, USA) were processed using Matlab
® R2018b (MathWorks, Inc., Natick, MA, USA) with a customized algorithm that is able to automatically detect the area of interest (rectangular area including both eyes) using the cascade object detector algorithm [
30] to extract the maximum temperature value (°C) of each image [
25,
31]; the average of the maximum values extracted from all images of a participant that corresponded to the same sample was used for further analyses. The captured videos were analyzed using the raw video analysis (RVA) method with customized codes written in Matlab
® R2018
b. The videos were manually cropped to analyze the forehead, right cheek, and left cheek of panelists to obtain higher accuracy. The cropped rectangles ranged from 120–150 × 50–70 pixels for the forehead and 50–80 × 60–90 pixels for each cheek. These areas of interest (AOIs) were selected because they are the areas with higher blood flow, and therefore the heart rate measurement is more accurate [
23,
32]. The outputs obtained from this method were the average and standard deviation values of the HR, amplitude, and frequency for the forehead, right cheek, and left cheek. These results were further processed using machine learning models developed with the Levenberg–Marquardt backpropagation algorithm with high accuracy (R = 0.85) [
23] to obtain HR, SP, and DP. The biometrics presented in this paper are not of medical grade but are accurate enough to compare differences and detect changes between participants [
23,
25,
31,
33,
34,
35].
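The eye-region detection and maximum-temperature extraction can be sketched in Matlab as follows; the file naming, folder layout, and use of dlmread are assumptions for illustration, while the cascade object detector and the per-sample averaging follow the description above (Computer Vision and Image Processing toolboxes required).

```matlab
% Minimal sketch (hypothetical file layout): extract the maximum eye-region temperature
% from each radiometric csv file of one participant-image pair and average the values.
files    = dir(fullfile('thermal_csv', 'P01_image01_*.csv'));  % hypothetical naming scheme
detector = vision.CascadeObjectDetector('EyePairBig');         % detects a region with both eyes
maxTemps = nan(numel(files), 1);
for k = 1:numel(files)
    T    = dlmread(fullfile(files(k).folder, files(k).name));  % temperature matrix in deg C
    gray = mat2gray(T);                                        % rescale to [0,1] for detection
    bbox = step(detector, gray);                                % [x y w h] bounding box(es)
    if ~isempty(bbox)
        roi = imcrop(T, bbox(1, :));                            % rectangular area with both eyes
        maxTemps(k) = max(roi(:));                              % maximum temperature in the ROI
    end
end
ST = mean(maxTemps, 'omitnan');   % per-sample value used for further analyses
```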
2.4. Statistical Analysis
Multivariate data analysis based on principal components analysis (PCA), cluster analysis, and a correlation matrix (CM) was performed on the data from the self-reported responses along with the FaceReader™ outputs, HR, ST, SP, and DP, using a customized code written in Matlab
® (Mathworks Inc., Natick, MA, USA) to assess relationships (PCA) and significant correlations (CM; p-value < 0.05) among the different parameters [
25]. Furthermore, data from the self-reported responses and biometrics (facial expressions, HR, SP, DP, and ST) were analyzed for significant differences using analysis of variance (ANOVA), with the effect of images nested within the classification group, and the least squares means post-hoc test (α = 0.05) in SAS® software 9.4 (SAS Institute Inc., Cary, NC, USA). Multiple regression analysis in Minitab® 18.1 software was used to obtain predictions of the self-reported responses towards stimuli using the physiological (biometric) responses as predictors. A general model was developed using all positive, neutral, and negative images selected for the study, while three further models were developed separately for the positive, neutral, and negative image categories. A forward selection stepwise procedure (α = 0.05) was used to obtain a model in each case. Physiological responses that were not significant for a given condition were not considered as potential predictors. The aim of this step was to determine which self-reported response may be best predicted using biometrics.
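A minimal Matlab sketch of the prediction step is given below for a single response; it is only analogous to the Minitab procedure described above (note that stepwiselm, unlike a pure forward procedure, also allows term removal at its default threshold), and the table and variable names are hypothetical.

```matlab
% Minimal sketch (hypothetical names): predict one self-reported response (HappyEs)
% from the biometric predictors stored as columns of the table tbl.
mdl = stepwiselm(tbl, 'HappyEs ~ 1', ...   % start from an intercept-only model
                 'Upper', 'linear', ...    % main effects only, no interactions
                 'PEnter', 0.05);          % terms enter at alpha = 0.05
disp(mdl.Coefficients)

% PCA of the standardized self-reported and biometric variables
% (X: one row per image, one column per variable).
[coeff, score, ~, ~, explained] = pca(zscore(X));
```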