A Survey on Psycho-Physiological Analysis & Measurement Methods in Multimodal Systems

Abstract: Psycho-physiological analysis has gained increasing attention over the last few decades in various fields, including multimodal systems. Researchers use psychophysiological feedback devices such as skin conductance (SC), electroencephalography (EEG) and electrocardiography (ECG) sensors to detect users' affective states during task performance. Psycho-physiological feedback has been successful in detecting the cognitive states of users in human-computer interaction (HCI). Recently, game studies have used psycho-physiological feedback to capture user experience and the effect of interaction on human psychology. This paper reviews several psycho-physiological, cognitive, and affective assessment studies and focuses on the use of psychophysiological signals to estimate users' cognitive and emotional states in multimodal systems. We review the measurement techniques and methods that have been used to record psycho-physiological signals, as well as cognitive and emotional states, under a variety of conditions. The aim of this review is to identify, describe and analyze the key psycho-physiological parameters that relate to different mental and emotional states in order to provide insight into key approaches. The advantages and limitations of these approaches are also highlighted. The findings show that classification accuracies above 90% have been achieved in classifying emotions from EEG signals. A strong correlation between self-reported data, HCI experience, and psychophysiological data has been observed in a wide range of domains, including games, human-robot interaction, mobile interaction, and simulations. Increased β- and γ-band activity has been observed in high-intensity games and simulations.


Introduction
Research in psycho-physiological analysis has grown rapidly in recent decades. Psycho-physiological measures such as electroencephalography (EEG), electrocardiography (ECG) and skin conductance (SC) have been used to estimate users' cognitive and affective states in various environments. Researchers from varied backgrounds use psycho-physiological analysis according to their own theoretical foundations, interests, and motivations. Various studies have recorded user experience or psychological effects during human-computer interaction (HCI), using real-time features from psychophysiological data to assess the participants' psychological state. As a result, there are many research outcomes for different research questions, yet very little accumulated knowledge that could be used to formulate more precise research questions or to build theories. This review aims to provide a critical analysis of these studies.
Psycho-physiological research involves the study of human psychology through physiological signals [1]. However, there is no simple one-to-one relationship between psychological phenomena and physiological processes, which makes the interpretation of the signals more difficult. Multi-modal HCI stimuli are particularly challenging in this respect: they use physically intricate input devices and usually provide outputs in multiple modalities; these interactions also involve detailed cognitive processing at various levels and can last from a few seconds to several hours [2]. The motivation for using a multi-modal interaction system also varies from person to person [3]. Because most psycho-physiological findings originate from much simpler experimental setups, it is not obvious that the same relations will hold when interacting with a multi-modal HCI system. For example, a direct relation between physiology and psychology found when a person looks at pictures selected to induce emotions might not appear in the same way when the person is using a multi-modal system.
A cognitive state is the mental action of acquiring knowledge through thought, experience and the senses. Many processes and functions contribute to a cognitive state, such as attention, memory, reasoning, and problem solving. Psychological effort refers to the cognitive actions executed to complete a task. An emotional state is a relatively brief conscious experience characterized by intense mental activity. User experience is the actual experience of a user, based on cognitive states and emotions, when interacting with an HCI system. In cognitive psychology, cognitive load refers to the amount of working memory resources used.
Another complication in analyzing multi-modal systems through psychophysiology is the lack of a common theory of how the interaction experience arises. One major task is to identify research questions that can be answered even when the multi-modal interaction is complex and the psychological processes are numerous, and then to design an experimental setup in which no confounding variables affect the results [4]. If the multi-modal interaction is complicated and hard to control, the sample size should be large enough to detect the reaction despite the noise. The sample size recommended by statisticians to reliably detect a large effect is at least 28 [5], but it largely depends on the experimental tasks, psychophysiological measures, and environment.
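The dependence of sample size on effect size can be illustrated with the standard normal-approximation formula for comparing two independent group means, n per group = 2(z_{1-α/2} + z_{1-β})^2 / d^2, where d is Cohen's effect size. The sketch below assumes a two-sided test with α = 0.05 and 80% power; these defaults and the function name are illustrative assumptions, and this formula is not necessarily how the figure in [5] was derived.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided,
    two-sample comparison of means (normal approximation)."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value of the two-sided test
    z_beta = z.inv_cdf(power)             # quantile giving the desired power
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2
    return ceil(n)                        # round up to whole participants


for d in (0.5, 0.8, 1.0):
    print(f"d = {d}: {n_per_group(d)} participants per group")
```

Under these assumptions a large effect (d = 0.8) needs about 25 participants per group, while a medium effect (d = 0.5) needs about 63; noise in a loosely controlled multi-modal setup shrinks the observable effect size and therefore raises the required sample.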
The HCI experience is strongly tied to physical reactions during interaction, which makes it tempting to claim that psycho-physiological methods provide a way to measure the multi-modal HCI experience itself. In fact, we can only see into those parts of the interaction that have distinct, measurable physical concomitants. Nevertheless, physiological measures might provide more representative and detailed information about the user's cognitive and emotional states than subjective techniques [6]. Most studies involve voluntary participation, and self-reported data are therefore contaminated by participants' answering style, social involvement, memory limitations, questionnaire quality and observer biases. Moreover, for HCI, the main advantage of physiological measures is that the signals can be recorded automatically and in real time without disturbing the participant's natural behavior. Another advantage of psychophysiological measurement is that it is highly sensitive and can detect even small responses. Combined with other methods such as observational data and questionnaires, psycho-physiological analysis adds considerable precision to the study of multi-modal HCI. There are some practical limitations, such as expensive equipment and the attention, time and maintenance the devices require, but with advances in technology these factors are likely to become less significant.
As there is a large literature on the theory, method and practice of psycho-physiological analysis, this paper covers only the part that relates to cognitive and emotional state estimation. Many papers review psycho-physiological analysis in the HCI context, but no single accepted theory is available that can explain the HCI interaction experience. Most of the theoretical framework has been borrowed from other fields such as physiology, psychology and media research. In this paper, we review some recent methods used to analyze multi-modal HCI systems and the parameters involved in estimating cognitive and emotional states.

Psycho-Physiological Signals and Analysis
Psychophysiology is a branch of physiology that deals with the relationship between psychological and physical phenomena. To record psychophysiological signals, three kinds of measures are used: reports, readings, and behavior. Reports evaluate participants' introspection and self-rating of their psychological and physiological states [7]. Questionnaires are most commonly used to record self-ratings. The merit of reports is that they represent the user's subjective experience; the demerit is human error, such as biased responses and misunderstanding of the question or scale [8].
Readings correspond to physiological responses measured with an instrument that reads bodily events such as heart rate, body temperature, muscle tension, brain signals and skin conductance [7]. The benefit of these measures is that they provide an accurate and subject-independent response; however, they are very sensitive to physical activity and to the situation [9]. Behavioral measures involve recording observations and actions such as facial expressions and eye movements [7]. These responses are easy to measure and are mostly used in attention- and emotion-related experiments [10,11].
In psychophysiology, a complex and interactive analysis of bio-signals is usually required. Applications of psycho-physiological analysis range from stress detection to lie detection. Researchers often use it to monitor the effect of an experiment on the user by measuring short-term affective responses (feeling, mood, disposition, etc.) [12]. Affective responses are considered an instinctive state of mind based on circumstances and mood. These responses are spontaneous and last only a few minutes, which makes them hard to recognize. The classical affective states used in psycho-physiological analysis are anger, contempt, disgust, fear, happiness, neutral, sadness and surprise.
Researchers have used psychophysiological signals to estimate the cognitive state of participants. These signals have been used to analyze low-order (e.g., simple visual inspection) and high-order cognitive processes (e.g., attention, memory, language, problem solving) [13]. Different signal sources are used in the literature for psycho-physiological analysis, such as the electrocardiogram (ECG), skin conductance (GSR), electroencephalography (EEG), electromyography (EMG), respiration rate (RR), the electrooculogram (EOG), skin temperature (ST) and facial expression. Some of these measures are described below.
To record the above-mentioned psychophysiological signals, various novel technologies have been used to design electrodes. These technologies have progressed from wet to dry electrodes, with silver/silver chloride as the most commonly used plating material for biofeedback sensors. Apart from silver/silver chloride, gold, aluminum, stainless steel and mixtures of other metals such as nickel and titanium are also used in sensors [14]. Wet electrodes require an electrolytic gel to improve conduction, which causes discomfort to participants; thus, for applications that involve real-time recording, dry electrodes are preferred [15]. A list of some commercially available devices for recording biofeedback signals is given in Table 1.

Table 1. Commercially available measurement devices for recording biofeedback signals.
EMG: Neuronode [27], Sx230 [28], Trigno mini sensor [29]
EOG: Google Glass [30], SMI eye tracking glasses [31], ASL eye tracking glasses [11]
GSR: Empatica [32], Shimmer 3 [33], Grove-GSR [34]
ST: YSI 400 series temperature probe [35], TIDA-00824 by Texas Instruments
RR: SA9311M [36], TMSI respiration sensor [37]

Electrocardiogram
The electrocardiogram (ECG) signal is a measure of electric potential recorded from the skin. The rise and fall of the signal reflect the changing polarization of the heart muscle during each heartbeat. Heart rate is calculated from the interval between successive R peaks (the R-R interval), as shown in Figure 1; the interval increases as heart rate decreases. One drawback of using ECG to obtain heart rate is that it can be uncomfortable, because the electrodes are in direct contact with the skin [38].
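The R-R relation described above can be sketched as follows. This minimal example assumes R-peak times (in seconds) have already been detected; the peak detection step itself is outside the sketch, and the function names and sample times are hypothetical.

```python
def rr_intervals(r_peak_times_s):
    """Intervals (s) between successive R peaks."""
    return [b - a for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]


def instantaneous_hr(r_peak_times_s):
    """Beat-to-beat heart rate in beats per minute: HR = 60 / RR."""
    return [60.0 / rr for rr in rr_intervals(r_peak_times_s)]


peaks = [0.0, 0.8, 1.6, 2.5, 3.5]  # hypothetical R-peak times in seconds
print(instantaneous_hr(peaks))
```

Longer R-R intervals at the end of the hypothetical series produce lower beat-to-beat rates, matching the inverse relation stated above.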

Photoplethysmography
Photoplethysmography (PPG) is a low-cost, non-invasive optical technique used to detect changes in blood volume in the microvascular bed of tissue. The PPG signal comprises two components: a pulsatile (AC) component, which reflects changes in blood volume and is synchronous with cardiac activity, and a slowly varying (DC) component containing various low frequencies related to respiration and thermoregulation. PPG is now one of the most common ways of measuring heart rate, oxygen saturation, and blood pressure [39]. It has been used in HCI and human-robot interaction (HRI) to measure user experience in terms of emotion and stress [40].
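The AC/DC decomposition described above can be approximated very crudely with a moving average. This is a minimal sketch, assuming a 1 s window (roughly one cardiac cycle) and a synthetic trace; the window length, sampling rate and function name are hypothetical, and practical PPG pipelines usually rely on properly designed band-pass and low-pass filters instead.

```python
import numpy as np


def split_ppg(ppg, fs, window_s=1.0):
    """Split a PPG trace into a slowly varying (DC) baseline and a
    pulsatile (AC) component using a centred moving average."""
    x = np.asarray(ppg, dtype=float)
    w = max(1, int(round(window_s * fs)))      # window length in samples
    kernel = np.ones(w) / w
    dc = np.convolve(x, kernel, mode="same")   # crude low-pass estimate of the baseline
    ac = x - dc                                # remainder is the pulsatile part
    return ac, dc


fs = 100                                        # hypothetical sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
ppg = 2.0 + 0.1 * np.sin(2 * np.pi * 1.0 * t)   # synthetic 60 bpm pulse on a baseline
ac, dc = split_ppg(ppg, fs)
```

Samples within half a window of either end are distorted by the zero padding of `mode="same"` and should be discarded or handled separately.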

Heart Rate and Heart Rate Variability
Heart rate and heart rate variability (HRV) are among the most widely used features for detecting emotional states [41]. Autonomic nervous system (ANS) activity can be effectively inferred from heart rate because heart rate is governed by the sympathetic and parasympathetic branches of the ANS. Stress or activation can be related to the ANS because, in a state of stress, the sympathetic nervous system (SNS) accelerates the heart rate. During relaxation or rest, the heart rate returns to normal because of the parasympathetic nervous system (PNS) [41]. Heart rate is the number of heartbeats per minute (bpm), whereas HRV is the variation in the time intervals between heartbeats. SNS activity is directly related to heart rate: an increase in SNS activity increases heart rate. The opposite holds for the PNS: an increase in PNS activity decreases heart rate, which corresponds to rest or relaxation states. Other features can be derived from the acceleration and deceleration periods, including the magnitude and slope of each period, its duration, and the mean difference from the baseline [41].
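The acceleration-period features listed above can be computed in several ways. The sketch below is one possible reading, assuming a heart-rate series sampled at regular intervals and defining an acceleration period as a run of consecutive increases; both assumptions, the function name and the sample data are hypothetical rather than definitions taken from [41]. Deceleration periods can be obtained by applying the same function to the negated series and baseline.

```python
def acceleration_periods(hr, baseline):
    """Describe each run of strictly increasing heart-rate samples.

    hr: heart rate (bpm) at regular sample intervals
    baseline: resting heart rate (bpm) for the same participant
    """
    periods, i, n = [], 0, len(hr)
    while i < n - 1:
        if hr[i + 1] <= hr[i]:
            i += 1
            continue
        start = i
        while i < n - 1 and hr[i + 1] > hr[i]:  # extend the rising run
            i += 1
        seg = hr[start:i + 1]
        duration = i - start                    # length in samples
        magnitude = seg[-1] - seg[0]            # total rise in bpm
        periods.append({
            "start": start,
            "duration": duration,
            "magnitude": magnitude,
            "slope": magnitude / duration,      # bpm per sample
            "mean_over_baseline": sum(seg) / len(seg) - baseline,
        })
    return periods


print(acceleration_periods([70, 72, 75, 74, 74, 76], baseline=70))
```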
HRV is also useful for estimating affect. It can be described by both time- and frequency-domain metrics, ranging from simple ones, such as the standard deviation of successive heartbeat intervals, to more complex ones, such as short-term power spectral density [42].
A simple, robust metric such as the standard deviation is sometimes preferred for short time windows because they contain limited information [43]. Other metrics include the difference between the maximum and minimum normal R-R intervals in a defined window, the percentage of successive normal R-R interval differences greater than 50 ms (pNN50), and the root mean square of successive differences between R-R intervals (RMSSD) [44].
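The time-domain metrics above can be computed directly from a window of R-R intervals. This minimal sketch assumes the intervals are given in milliseconds and have already been cleaned of artifacts and ectopic beats (i.e., they are "normal" intervals); it uses the sample standard deviation for SDNN, and conventions differ slightly between tools. The function name and data are hypothetical.

```python
import math
import statistics


def time_domain_hrv(rr_ms):
    """Time-domain HRV metrics from a window of normal R-R intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]  # successive differences
    return {
        "sdnn": statistics.stdev(rr_ms),                # standard deviation of intervals
        "range": max(rr_ms) - min(rr_ms),               # max minus min interval
        "pnn50": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
        "rmssd": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
    }


print(time_domain_hrv([800, 810, 790, 850, 780]))  # hypothetical R-R intervals
```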
With advances in signal recording and processing algorithms, complex features such as the short-time Fourier transform (STFT) or power spectral density (PSD) of heart rate are becoming increasingly effective tools for analyzing HRV. PNS activity can modulate HRV at frequencies from 0.04 to 0.5 Hz, whereas SNS activity has a functional gain only below 0.1 Hz [42,45]. The spectral domain is therefore well suited to discriminating the influence of SNS and PNS activity on HRV, which is often referred to as the sympathovagal balance.
One simple way to estimate the sympathovagal balance of heart rate activity is to compute the ratio of the energy in the low-frequency range (0.04-0.1 Hz) to the total energy in the 0.04-0.5 Hz band. Some research suggests that it can also be estimated by comparing the energy of the low-frequency band with various combinations of low-, medium- and high-frequency band energies [44].
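The band-energy ratio described above can be sketched as follows, using the 0.04-0.1 Hz and 0.04-0.5 Hz bands given in the text. Because R-R intervals are unevenly spaced in time, the sketch first resamples them at 4 Hz with linear interpolation and then uses an unwindowed FFT periodogram; the resampling rate, the periodogram choice, the function names and the synthetic data are assumptions (Welch's method or a Lomb-Scargle periodogram are common alternatives).

```python
import numpy as np


def lf_to_total_ratio(rr_s, fs=4.0, lf_band=(0.04, 0.1), total_band=(0.04, 0.5)):
    """Ratio of low-frequency to total HRV energy from R-R intervals (s)."""
    rr = np.asarray(rr_s, dtype=float)
    beat_times = np.cumsum(rr)                         # time of each beat
    grid = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
    tachogram = np.interp(grid, beat_times, rr)        # evenly resampled R-R series
    tachogram -= tachogram.mean()                      # remove the DC component
    power = np.abs(np.fft.rfft(tachogram)) ** 2        # periodogram (unscaled)
    freqs = np.fft.rfftfreq(tachogram.size, d=1.0 / fs)

    def band_energy(lo, hi):
        return power[(freqs >= lo) & (freqs < hi)].sum()

    return band_energy(*lf_band) / band_energy(*total_band)


def synthetic_rr(mod_hz, n_beats=300):
    """Hypothetical R-R series: 1 s mean interval modulated at mod_hz."""
    t, rr = 0.0, []
    for _ in range(n_beats):
        interval = 1.0 + 0.05 * np.sin(2 * np.pi * mod_hz * t)
        rr.append(interval)
        t += interval
    return rr


print(lf_to_total_ratio(synthetic_rr(0.08)))  # slow (low-frequency) modulation
print(lf_to_total_ratio(synthetic_rr(0.30)))  # faster modulation outside the LF band
```

The synthetic series spans about 300 beats, i.e. roughly a 5 min window at 60 bpm; a modulation inside the low band yields a ratio close to 1, and one outside it yields a ratio close to 0.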
HRV-based measures differ in their robustness to artifacts such as noise, outliers and abnormal beats, and in their ability to distinguish SNS from PNS activity. In addition to selecting a suitable metric, researchers should also select an appropriate time window for the heart rate series over which the metric is calculated. The quality of the heart rate series and the variable of interest determine which metric is suitable. Generally, a 5 min time window is recommended for an average heart rate of 60 bpm (beats per minute) [46].

Skin Conductance
Skin conductance is a measure of sweat gland activity. Dry skin is a relatively poor conductor, but its conductance increases when the sweat glands fill with sweat. Skin conductance is sometimes referred to as galvanic skin response (GSR). It is a non-invasive way to detect sympathetic activation, because sweat-gland activity is controlled by the sympathetic nervous system [47]. Carl Jung first used GSR to measure "negative complexes" in a word-association experiment [48], and it later became a key component of "lie detector" tests [49]. Skin conductance has been found to vary linearly with emotional arousal. It has been used to classify different states such as anger and fear, and to detect stress levels in experiments on anticipatory anxiety and stress during task performance [50].
Skin conductance can be measured anywhere on the skin; however, the sweat glands most responsive to emotion are located in the palms of the hands and the soles of the feet [50]. In experimental studies, the lower portion of the index and middle fingers is a common site for skin conductance electrodes. Usually, a conductive gel is applied to the skin to ensure good electrical conductivity. To measure skin conductance, a small current is injected into the skin and the resulting voltage is measured [51]. By constantly monitoring the change in potential difference across the electrodes, skin conductance can be measured continuously.
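The current-injection measurement described above reduces to Ohm's law, G = I / V. This minimal sketch assumes a constant-current circuit with a hypothetical 5 µA current and ignores electrode and amplifier details; some devices instead apply a constant voltage and measure the current, in which case G = I / V is computed from the measured current.

```python
def skin_conductance_us(current_a, voltage_v):
    """Skin conductance in microsiemens from Ohm's law, G = I / V."""
    if voltage_v <= 0:
        raise ValueError("voltage must be positive")
    return current_a / voltage_v * 1e6  # siemens -> microsiemens


# Hypothetical constant-current readings: 5 µA injected, voltage sampled over time
current = 5e-6
voltages = [0.50, 0.48, 0.45, 0.47]
print([round(skin_conductance_us(current, v), 2) for v in voltages])
```

With a constant injected current, a falling voltage across the electrodes corresponds to rising skin conductance, which is how sweat-gland activity shows up in the continuous recording.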
For studies that involve movement, alternative electrode locations are used, because hand placement is sometimes inconvenient and movement distorts the signal. Some researchers have measured conductivity even through clothes and jewelry [52,53]. Figure 2 shows an example of a skin conductance signal varying over time. In this recording, an audio stimulus was played to elicit "orienting responses" during the test. Seven skin conductance responses are labeled in the graph. Response "1", at the start of the graph, was not stimulated but was probably caused by the clicking sound made by the computer when starting the audio track. Responses "2", "3", "5" and "7" were generated by the consecutive audio bursts. Another "unstimulated" response was observed at "6" [41].
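Responses like those labeled in Figure 2 are often located by trough-to-peak detection: each rise from a local minimum to the next local maximum is kept if its amplitude exceeds a threshold. This is a minimal sketch; the 0.05 µS threshold, the function name and the sample trace are assumptions, and published tools typically add smoothing and latency windows relative to stimulus onsets, which is what would separate stimulated from unstimulated responses.

```python
def detect_scrs(sc, min_amp=0.05):
    """Return (onset_index, peak_index, amplitude) for each rise of at least
    min_amp (same units as sc, e.g. microsiemens)."""
    responses = []
    i, n = 0, len(sc)
    while i < n - 1:
        while i < n - 1 and sc[i + 1] <= sc[i]:  # walk down to a local trough
            i += 1
        onset = i
        while i < n - 1 and sc[i + 1] > sc[i]:   # climb to the next local peak
            i += 1
        peak = i
        if peak > onset and sc[peak] - sc[onset] >= min_amp:
            responses.append((onset, peak, sc[peak] - sc[onset]))
    return responses


trace = [2.0, 2.0, 2.3, 2.5, 2.4, 2.3, 2.32, 2.31, 2.6, 2.4]  # hypothetical µS samples
print(detect_scrs(trace))
```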

Electroencephalography
An electroencephalogram (EEG) reflects the electrical activity of the brain and is obtained by measuring the voltages generated by neurons. Electrodes are placed on the scalp to record an EEG signal. EEG analysis is a vast field, with extensive research in neuroscience and psychology. The pre-frontal cortex (PFC) region of the brain appears to be involved in representing emotions such as anger [54]. James and Cannon [54] were the first to model the combined working of mind and body in processing emotion.
EEG is the most widely used method for measuring brain activity because it is non-invasive and portable. High-density EEG headsets can have 128 or more channels; however, some experiments, for example in neuro-feedback practice, use far fewer electrodes [55]. Experimental studies have shown that EEG has the potential to differentiate positive from negative emotional valence, and EEG signals can also identify different arousal levels. In experiments that involve walking, EEG provides only a rough estimate of arousal level, although new advances may change this [56]. EEG signal analysis is complicated by the high dimensionality of the data, and a common remedy is to use a feature selection algorithm to select an optimal feature set for the analysis [57]. Spapé et al. presented a critical review of the use of EEG in HCI that discusses the shortcomings and contributions of EEG signals in HCI [58].
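The text does not specify the feature selection algorithm used in [57]. As one simple illustration, a filter method can rank each feature by its Fisher score (squared difference of the class means divided by the sum of the class variances) and keep the top k. Two classes labeled 0/1, NumPy arrays, the function names and the toy data are assumptions of this sketch.

```python
import numpy as np


def fisher_scores(X, y):
    """Fisher score of each feature column for a two-class problem (labels 0/1)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    a, b = X[y == 0], X[y == 1]
    between = (a.mean(axis=0) - b.mean(axis=0)) ** 2  # class separation
    within = a.var(axis=0) + b.var(axis=0) + 1e-12    # class spread (avoids /0)
    return between / within


def select_top_k(X, y, k):
    """Indices of the k highest-scoring features, best first."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]


# Hypothetical EEG features: column 0 separates the classes, column 1 does not
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [1.1, 2.0]]
y = [0, 0, 1, 1]
print(select_top_k(X, y, k=1))
```

Filter methods like this score each feature independently and ignore interactions between features; wrapper or embedded methods can capture such interactions at a higher computational cost.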

Latest Research in Psycho-Physiological Analysis
Psycho-physiological signal analysis has yielded promising techniques for measuring valence and arousal levels to capture emotional and mental states. Collecting self-reported data interrupts the flow of interaction, and such data do not necessarily reflect the actual state of the user. Psychophysiological measures help uncover the ground truth. The main problems with psychophysiological measures are the complex equipment setup, the signal analysis and the controlled environment, which restrict the participant's experience of the interaction in many ways. Nevertheless, the advantages of psychophysiological analysis far outweigh its disadvantages [59]. Psycho-physiological analysis has been used in the literature to recognize emotions or affective states as well as cognitive activity, but most of the research focuses on affect recognition. In the remaining sections, we give an overview of recent trends in the use of psychophysiology in HCI.

Emotion/Affect Recognition in HCI
Research on affective phenomena focuses on detecting emotions, feelings, mood, attitude, temperament, and so forth. A range of algorithms and techniques are available in the literature to detect emotions using different modalities. The first stage in these techniques is to elicit affective responses. This can be done in a number of ways, such as watching videos, looking at images, listening to songs, or performing tasks.
Emotions are linked with our thoughts, feelings, and behavior and therefore directly affect decision making and thinking [60]. There are many definitions of primary and secondary affective states, but there is no agreed-upon set. Six basic emotions used by many researchers are anger, joy, sadness, disgust, fear, and surprise, as proposed by Ekman [10]. Another widely used model of emotions is the wheel of emotion proposed by Plutchik [61], which contains eight emotions. Six of these are the same as those defined in [10]; the other two are anticipation and acceptance.
Psychological researchers have used the arousal and valence dimensions to model emotions in 2D, as shown in Figure 3. In the arousal-valence model, arousal can be "active" or "passive" and valence can be "positive" or "negative" [62]. Lang [63] labeled individual pictures in an arousal-valence space, which was later turned into a non-verbal pictorial assessment called the Self-Assessment Manikin (SAM) [8]. SAM is widely used by advertising agencies and product designers to record affective experiences. The 2D arousal-valence model is arguably the most common model for defining emotion, and the International Affective Picture System (IAPS) database was built on it [64]. Emotion/affect recognition is considered a basic tool for the evaluation of HCIs, and the research mainly focuses on recognizing, interpreting, processing, and simulating human behavior and feelings [65]. Various studies show that variation in physiology is highly correlated with variation in affect [66]. For instance, a smile maps to positive valence, whereas displeasure relates to negative valence.
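Placing a rating in the 2D arousal-valence space can be sketched as a simple quadrant lookup. This minimal example assumes SAM-style ratings on a 1-9 scale split at the scale midpoint, with ratings exactly at the midpoint reported as "neutral"; the labels follow the "active/passive" and "positive/negative" terms used above, and the function names are hypothetical. Real studies often split at each participant's own mean or use data-driven thresholds instead.

```python
def _pole(score, midpoint, high, low):
    """Label one dimension relative to the scale midpoint."""
    if score > midpoint:
        return high
    if score < midpoint:
        return low
    return "neutral"


def av_quadrant(valence, arousal, midpoint=5.0):
    """Map SAM-style ratings (assumed 1-9 scale) to an arousal/valence label."""
    a = _pole(arousal, midpoint, "active", "passive")
    v = _pole(valence, midpoint, "positive", "negative")
    return f"{a}/{v}"


print(av_quadrant(valence=8, arousal=6))  # e.g. a smiling, engaged participant
```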
Scheirer et al. [67] recognized frustration by classifying galvanic skin response and blood pressure signals. Klein et al. [68] also experimented with frustration by deliberately frustrating subjects with a game that provided text-based assistance to the user. The results of this experiment show that the interaction time increased significantly when textual assistance was provided, compared with when no assistance was given. Research studies also support the hypothesis that different stimuli can be used to generate different emotions [69], but because these emotions are evoked by viewing pictures or videos or listening to audio stimuli, it is hard to apply these procedures in real-world applications.
Extensive research has been done on recognizing emotions from face and voice, with very high accuracy when the experimental environment is controlled; accuracy is lower when experiments are conducted under normal circumstances. Some researchers believe that emotions are generated by physiological arousal, while others consider arousal to be a part of the emotional process [70]. In gaming research, Mandryk et al. [71] used a fuzzy approach to recognize emotions from facial expressions and skin conductance while participants played NHL2003 on a Sony PS2. Four electrodes were used to record facial expressions, and smiling and frowning were the two expressions recognized. The assumption is that smiling relates to positive valence and frowning to negative valence, but these assumptions do not support strong claims because they do not map emotions onto the valence scale effectively [10].
In experiments on a first-person shooter game, Juma [72] studied secondary emotions by developing a game in which a primary emotion is combined with a secondary emotion to generate an affective component. The key finding of this experiment is that secondary emotions can be of vital importance in selecting an action in an HCI environment. Costa et al. [73] used emotional films to evoke five primary emotions in participants and calculated a synchronization index to estimate the valence of the emotions. Li et al. [74] used pictures to induce happiness and sadness in subjects and reported 90% classification accuracy. However, Horlings et al. [60] commented that the recognition rate will be low if the arousal and valence values are not extreme. Nie et al. [75] developed a user-independent emotion-recognition system in which four emotions, all of them extreme, were elicited by movies. Frequency-domain features were extracted from the EEG signals and classified using a support vector machine (SVM).
Emotion recognition through EEG signals in brain-computer interfaces (BCI) and neuroimaging is usually carried out in a constrained environment. Only a small tolerance is allowed for motor movement, which matters in object manipulation activities. Many researchers now apply psycho-physiological signal analysis in real-life situations, such as evaluating the performance of sportsmen and game environments. Thompson et al. [76] reviewed current research on evaluating the peak performance of sportsmen. The study reports that EEG signals are disturbed by motor movement and discusses techniques for obtaining reliable EEG recordings when subjects are moving.
In 2005, Nakasone et al. presented a model to detect emotions in real time using EMG and GSR [77] in a gaming scenario between the user and a 3D humanoid agent. Khair et al. published a review of human emotions in 2012 [78] in which they proposed protocols for generating and analyzing human emotions, as well as an optimal induction method. According to Khair et al., music is the most popular way of inducing emotions.
The emotional responses associated with different states of physiological signals are shown in Table 2. In another study, different genders were found to associate different expressions with emotions [79]: unlike girls, boys associated happiness and anger with faster music and upward movements. Combining two approaches, such as music with video or a game with strongly emotional music, can be very useful for generating strong emotions. One study [80] addressed the question: can auditory stimuli elicit emotions as effectively as visual stimuli? The authors found that both kinds of stimuli were equally effective in inducing emotions. They also conducted a cross-cultural analysis between India and China, and the accuracy was roughly the same for both groups; this may be due to the strictly controlled experimental environment. Based on their results, we expect visual stimuli that are strongly supported by, and synchronized with, auditory stimuli to be considerably more effective emotion elicitors in practical HCI applications.
Sufficient psychophysiological data are needed to provide evidence for the various factors affecting performance and to test the developed techniques thoroughly. Table 3 summarizes the publicly available datasets for psycho-physiological analysis. A substantial amount of research has been carried out on recognizing emotion from facial input. Mehta et al. [81] wrote a comprehensive review of facial emotion recognition for real-world user experience and mixed reality. Classification accuracies of almost 90% have been reported using facial input, which indicates that there is still room for improvement.

Cognitive States Assessment in HCI
Another major research area in psycho-physiological analysis is cognitive assessment. The literature on cognitive assessment for multi-modal human-computer interface systems is quite limited, and most of it focuses on assessing user cognition during game experiences. The assessment of human-robot interaction is also popular among researchers, as is the evaluation of some other HCI systems through psychophysiological signals.

Game Systems
The video gaming industry is one of the biggest industries in the world [88]. Nevertheless, user-game interaction and experience are still primarily assessed with self-report techniques [89]. With the development of measurement techniques and methods for psychophysiological systems, more and more research has been carried out on measuring user experience using psychophysiological signals. A survey book on game user experience by Bernhaupt describes various user experience concepts and evaluation methods [90]. A review paper investigated the use of psychophysiological measures in video games and listed the pros and cons of psychophysiological techniques [91]. The authors highlighted that the field lacks a useful and widely accepted game-specific theoretical background, research and integrated knowledge.
Drachen et al. presented a study to find a correlation between self-reported data (the In-Game Experience Questionnaire, iGEQ) and psychophysiological measures, and found a direct correlation of iGEQ scores with heart rate [92]. Other researchers studied psychophysiological responses to violent games and found increased cardiovascular activity compared with non-violent games [93,94]. Researchers have reported that psychophysiological measures, especially heart rate, correlate strongly with self-reported data in both positive and negative experiences [93,95,96].
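Correlations of this kind are commonly quantified with the Pearson coefficient. This minimal sketch assumes one questionnaire score and one averaged heart-rate value per session; the data are hypothetical, and the cited studies may have used different analyses (for example rank-based correlation or per-participant normalization).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-session data: questionnaire score and mean heart rate (bpm)
igeq_scores = [2.1, 3.4, 2.8, 4.0, 3.1]
mean_hr = [72, 80, 76, 85, 78]
print(round(pearson_r(igeq_scores, mean_hr), 3))
```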
Pedersen et al. explored the relationship between level design parameters, user experience, and player characteristics, and found a relationship between gameplay features and three emotions (fun, challenge and frustration) with an average accuracy above 70% [97]. McMahan et al. [98] assessed various stimulus modalities and gaming events using an Emotiv EEG device. They found significant differences between stimulus modalities with increasingly difficult cognitive demands, and the power of the β and γ bands of the EEG signals increased during high-intensity events. They also suggest that the Emotiv EEG headset can be used to differentiate between various cognitive processes.
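Increases such as those in the β and γ bands are usually quantified as spectral power within each frequency band. The sketch below computes per-band power for one EEG channel epoch with a plain FFT periodogram. The band edges (θ 4-8, α 8-13, β 13-30, γ 30-45 Hz), the 128 Hz sampling rate and the synthetic signal are assumptions, since band boundaries vary between studies, and this is not the processing pipeline used in [98].

```python
import numpy as np

BANDS_HZ = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}


def band_powers(epoch, fs, bands=BANDS_HZ):
    """Spectral power per frequency band for one EEG channel epoch."""
    x = np.asarray(epoch, dtype=float)
    x = x - x.mean()                                  # remove DC offset
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size      # periodogram (arbitrary units)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return {name: float(power[(freqs >= lo) & (freqs < hi)].sum())
            for name, (lo, hi) in bands.items()}


fs = 128                                   # hypothetical sampling rate (Hz)
t = np.arange(0, 2, 1 / fs)                # a 2 s epoch
epoch = np.sin(2 * np.pi * 20 * t) + 0.2 * np.sin(2 * np.pi * 10 * t)
p = band_powers(epoch, fs)
print(max(p, key=p.get))                   # prints "beta"
```

Comparing these values between low- and high-intensity game segments, often as relative power (each band divided by the total), is one way to express the kind of increase described above.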
Nacke et al. [99] studied the user experience in a fast-paced first-person shooter game with and without sound effects. EDA and facial EMG were recorded in addition to a questionnaire to evaluate the game experience. Sound had a significant effect on the questionnaire results related to tension and flow, and these results correlated with EMG/EDA activity. In another study, EDA, EMG, and ECG data were used to classify two different gaming events with 80% accuracy, showing that psychophysiological signals can differentiate between different user experiences [100].
Stein et al. presented a method to adjust game difficulty using EEG signals [101]. They estimated the participant's long-term excitement to trigger dynamic difficulty adjustment and found a correlation between excitement patterns and game events. In the literature, machine learning and evolutionary algorithms are used for clustering gaming events [102], designing new levels [103], adjusting difficulty [101,104], modeling user experience [97,105], and providing feedback for personalized game elements [106]. Despite these advances, research on modeling and estimating user experience to improve HCI systems is still in its preliminary stages.
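The idea of a long-term excitement estimate triggering difficulty changes can be illustrated with an exponential moving average and two thresholds. This is a hypothetical sketch, not Stein et al.'s method: it assumes an excitement score already normalized to [0, 1], and the smoothing factor, thresholds, level range and the direction of each adjustment are all design assumptions.

```python
def adjust_difficulty(excitement, level=5, alpha=0.05, low=0.3, high=0.7,
                      min_level=1, max_level=10):
    """Return the difficulty level after each excitement sample (assumed 0-1)."""
    neutral = (low + high) / 2
    smoothed = neutral                                  # long-term excitement estimate
    levels = []
    for e in excitement:
        smoothed = (1 - alpha) * smoothed + alpha * e   # exponential moving average
        if smoothed < low and level < max_level:
            level += 1              # sustained low excitement: make the game harder
            smoothed = neutral      # require fresh evidence before the next change
        elif smoothed > high and level > min_level:
            level -= 1              # sustained high excitement: ease off
            smoothed = neutral
        levels.append(level)
    return levels


print(adjust_difficulty([0.1] * 30)[-1])  # difficulty after a long low-excitement stretch
```

Resetting the estimate after each change acts as a cooldown, so brief spikes in the signal do not cause rapid oscillation of the difficulty level.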

Human-Robot Interaction (HRI) Studies
Psycho-physiological analysis has been applied in HRI studies in which participants interact with physical robots, to evaluate the user experience [107]. The main challenge in psychophysiological analysis is verifying the accuracy and significance of the results. Itoh et al. used ECG, skin response, EDA, blood pressure, and upper-body movement to estimate the participants' stress level, and the robot modified its actions based on that level [108]. They found that the user's stress level decreased when the robot shook their hand. Other researchers made the same observation when modifying the robot's behavior based on the participants' psychophysiological state [109,110]. Kulic and Croft evaluated the feasibility of psychophysiological measures for user experience evaluation [111]. The results showed a relationship between anxiety, calmness, and the speed of the robotic arm, with a stronger response seen in the EDA, EMG, and ECG signals. Dehais et al. reported the same result when they evaluated the human response to different types of robot motion [112].
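Combining several signals into one stress estimate is often done by standardizing each signal against a resting baseline. The sketch below averages per-signal z-scores and maps the result to a robot action. The signal names, baseline values, threshold, and action labels are all hypothetical, and the sketch does not reproduce Itoh et al.'s model.

```python
import statistics

def stress_index(sample, baseline):
    """Mean z-score of current readings relative to a per-user resting baseline.

    sample:   dict mapping signal name -> current value
    baseline: dict mapping signal name -> list of resting values (at least 2)
    """
    zs = []
    for name, value in sample.items():
        ref = baseline[name]
        mu, sd = statistics.mean(ref), statistics.stdev(ref)
        if sd == 0:
            raise ValueError(f"baseline for {name!r} has zero variance")
        zs.append((value - mu) / sd)
    return sum(zs) / len(zs)

def choose_robot_action(index, threshold=1.0):
    """Hypothetical policy: switch to a calming behaviour when stress is high."""
    return "calming_gesture" if index > threshold else "continue_task"

baseline = {"heart_rate": [70, 72, 71, 69, 68], "eda_uS": [2.0, 2.2, 2.1, 1.9, 2.0]}
current = {"heart_rate": 90, "eda_uS": 3.0}
idx = stress_index(current, baseline)
print(f"stress index {idx:.2f} -> {choose_robot_action(idx)}")
```

Per-user baselines matter because resting heart rate and skin conductance vary widely between people.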
Researchers have used human gaze analysis to measure situation awareness in real time in HRI, and the resulting model was able to predict a standard measure of situation awareness [113]. Podevijn et al. [114] studied the psychophysiological state of participants interacting with a swarm of robots. They found a direct relationship between the user's state and the number of robots the user was exposed to, with an increase in arousal observed when the user was exposed to 24 robots. The visual features of a robot, such as its appearance, and its vocal properties can affect the cognitive state of a patient undergoing treatment [115]. Kulic and Croft also studied human responses to robot motion during direct human-robot interaction using physiological signals [116,117]. The human response was estimated with a fuzzy inference engine, and the results showed that fast robot motion induces a strong arousal response, with a direct relationship between estimated emotional arousal and robot velocity. Cortellessa et al. [118] designed ROBIN, a telepresence robot, to measure the physical and psychological health of elderly people. The evaluation showed that the interaction between ROBIN and elderly people was pleasant and usable.
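To make fuzzy inference concrete, the sketch below maps two normalized inputs (heart-rate change and skin-conductance change, each scaled to [0, 1]) to an arousal estimate using four rules and a zero-order Sugeno-style weighted average. The membership functions, rules, and output values are hypothetical and much simpler than the rule base used by Kulic and Croft.

```python
def low(x):
    """Membership in 'low': 1 at x = 0, falling linearly to 0 at x = 1."""
    return 1.0 - min(max(x, 0.0), 1.0)

def high(x):
    """Membership in 'high': the complement of 'low' on [0, 1]."""
    return min(max(x, 0.0), 1.0)

# (HR membership, SC membership, crisp arousal output) for each rule.
RULES = [
    (low, low, 0.1),    # both signals calm -> low arousal
    (low, high, 0.5),   # mixed evidence -> medium arousal
    (high, low, 0.5),
    (high, high, 0.9),  # both signals elevated -> high arousal
]

def arousal(hr_change, sc_change):
    """Weighted average of rule outputs, with AND taken as the min of memberships.

    Because 'low' and 'high' are complementary, at least one rule always has
    weight >= 0.5, so the denominator is never zero.
    """
    weights = [min(mf_hr(hr_change), mf_sc(sc_change)) for mf_hr, mf_sc, _ in RULES]
    outputs = [out for _, _, out in RULES]
    return sum(w * z for w, z in zip(weights, outputs)) / sum(weights)

for hr, sc in [(0.1, 0.2), (0.5, 0.5), (0.9, 0.8)]:
    print(f"HR change {hr}, SC change {sc} -> arousal {arousal(hr, sc):.2f}")
```

Fuzzy rules let partial evidence from each signal contribute smoothly, instead of switching abruptly at a single threshold.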
A study recording the responses of elderly people with mild cognitive impairment showed that interacting with a telepresence robot had no adverse effect on cardiovascular activity [119]. Psychophysiological measures have been used to evaluate haptic robot interaction for stroke patients in a multi-modal virtual environment, where a weaker psychophysiological response was observed in the patients than in healthy subjects [120]. Ting et al. [121] proposed a framework for an adaptive automation system based on the operator's mental state, estimated from heart-rate variability and task load index. Munih and Mihelj presented an article summarizing psychophysiological responses in robot-assisted rehabilitation, including multi-modal challenges and physical activity [122].
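Heart-rate variability is usually summarized from the series of RR (inter-beat) intervals. The sketch below computes two standard time-domain indices, SDNN and RMSSD. The RR values are hypothetical, and the source does not say which HRV index Ting et al. used. Reduced HRV is often associated with higher mental workload, but the size of the effect depends on the task and the index.

```python
import math
import statistics

def sdnn(rr_ms):
    """Standard deviation of RR intervals (ms): overall variability."""
    return statistics.stdev(rr_ms)

def rmssd(rr_ms):
    """Root mean square of successive RR differences (ms): beat-to-beat variability."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical RR intervals (ms) from a resting period and from a demanding task.
rest = [820, 850, 810, 860, 830, 845, 815]
task = [700, 705, 698, 702, 699, 704, 701]
for name, rr in (("rest", rest), ("task", task)):
    print(f"{name}: mean HR {60000 / statistics.mean(rr):.1f} bpm, "
          f"SDNN {sdnn(rr):.1f} ms, RMSSD {rmssd(rr):.1f} ms")
```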

Other HCI Systems
Psychophysiological measures are used as a tool to objectively investigate user experience in many other systems. Zhang et al. [123] studied cognitive load measurement in a virtual reality driving system using multi-modal information fusion techniques. They found that hybrid fusion of modalities is best suited for such difficult tasks, possibly because of Dual Coding Theory. Yao et al. [124] used psychophysiological signals to evaluate the user experience of mobile applications. They collected participants' physiological responses, task performance, and self-reported data, and found a correlation between self-reported data and skin response, as well as a higher skin response in failed tasks than in successful ones.
Various frequency bands of the EEG signal have been used to study users' cognitive load. Kumar and Kumar used EEG to measure cognitive load in an HCI environment and found a significant difference in spectral power between low-level and high-level cognitive tasks [125]. A similar increase in cognitive load was reported by Baig et al. when analyzing novices and experts using a 3D modeling multi-modal HCI system compared to a traditional 3D modeling system [126]. Puma et al. used the theta and alpha band power of EEG to estimate cognitive workload in a multitasking environment [127]. The results showed an increase in alpha and theta band power as the involvement of cognitive resources in completing the sub-tasks increased.
Significant differences in skin response, HR, and blood volume pulse (BVP) were found in response to a video conferencing tool [128]. GSR and HR increased, and BVP decreased, for video at 5 frames per second compared to 25 frames per second. Most of the subjects did not notice the difference in video quality, which indicates that psychophysiological measures can reveal underlying effects that traditional methods of measuring user experience cannot detect [124]. In a comparison between well- and ill-designed web pages, Ward et al. [129] found lower GSR and HR for well-designed web pages than for ill-designed ones, indicating that ill-designed pages increase stress levels.
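Because each participant experienced both frame rates, this kind of comparison is naturally within-subject. A minimal sketch of the paired-samples t statistic follows, using hypothetical mean GSR values per participant. It shows one reasonable test for this design, not necessarily the test used in [128]. The p-value can be read from a t table with n - 1 degrees of freedom, or obtained with `scipy.stats.ttest_rel` if SciPy is available.

```python
import math
import statistics

def paired_t(a, b):
    """Paired-samples t statistic and degrees of freedom for the differences a - b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired samples of equal length (at least 2)")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Hypothetical mean GSR (microsiemens) per participant at each frame rate.
gsr_5fps = [3.1, 2.8, 3.6, 2.9, 3.3, 3.0]
gsr_25fps = [2.7, 2.6, 3.1, 2.8, 2.9, 2.7]
t_stat, df = paired_t(gsr_5fps, gsr_25fps)
print(f"t({df}) = {t_stat:.2f}")
```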
Anders Bruun presented a study in which non-specialists analyzed GSR data to detect events related to user experience, with an accuracy of 60-80% [130]. Lin et al. investigated the relationship between physiological measures and traditional usability indices and found evidence that physiological data correlate with task performance and with subjective reports of stress levels [131]. To study experience in virtual reality, Meehan et al. compared participants' physiological responses in a non-threatening virtual room with those in a virtual height simulation and found changes in heart rate and skin conductance [132].
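Much of what a GSR analyst looks for are skin conductance responses (SCRs): short rises from a local trough to a peak. The sketch below flags rises whose amplitude exceeds a minimum threshold. The 0.05 microsiemens default is a commonly used criterion, but thresholds vary across studies, and real recordings would first be low-pass filtered. It is an illustration, not the procedure used in [130].

```python
def detect_scrs(sc, fs, min_amp=0.05):
    """Return (onset_s, peak_s, amplitude) for each trough-to-peak rise >= min_amp.

    sc is a skin conductance trace in microsiemens sampled at fs Hz.
    """
    events = []
    trough_i = 0
    for i in range(1, len(sc) - 1):
        if sc[i] < sc[trough_i]:
            trough_i = i  # new lowest point since the last peak: candidate onset
        if sc[i] >= sc[i - 1] and sc[i] > sc[i + 1]:  # local peak
            amp = sc[i] - sc[trough_i]
            if amp >= min_amp:
                events.append((trough_i / fs, i / fs, amp))
            trough_i = i  # look for the next onset after this peak
    return events

fs = 4  # Hz
trace = [2.0, 2.0, 2.0, 2.1, 2.3, 2.4, 2.35, 2.3, 2.25, 2.2, 2.2, 2.21, 2.2]
for onset, peak, amp in detect_scrs(trace, fs):
    print(f"SCR: onset {onset:.2f} s, peak {peak:.2f} s, amplitude {amp:.2f} uS")
```

The small bump near the end of the trace is ignored because its rise is below the amplitude threshold.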
The human brain responds differently to text and multimedia stimuli. To investigate this, Gerě et al. [133] studied the cognitive processes involved in learning information presented in visual or text format. They used EEG signals to measure cognitive activity and found higher α-band power, which corresponds to less mental activity, for text presentation. They also concluded that video and picture input stimulates visualization strategies, whereas text-induced activity is related to verbal processing. No gender-related differences were observed in this experiment. Madi and Khan [134] carried out similar work, analyzing cognitive activity and learning performance in text and multimedia comprehension while monitoring cognitive load and emotions. They found differences in α- and β-band power. Their study revealed that multimedia presentations, such as video and images, elicit positive emotions more than text presentations, which induce a higher cognitive load.
Novak et al. studied the differences between single-task and dual-task multi-modal human-computer interaction and found significant changes in psychophysiological signals from baseline to the single- and dual-task conditions, but no differences between the single and dual mental arithmetic tasks [135]. Their results suggest that different tasks produce different psychophysiological responses, and that these responses do not necessarily correlate with the participants' subjective feelings. Researchers have found significant differences in respiratory responses when participants were given high-level cognitive tasks, and Grassmann et al. presented a systematic review of respiratory changes with respect to cognitive load [136].
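Respiratory changes under cognitive load are often summarized as a change in breathing rate. The sketch below estimates breaths per minute by counting upward crossings of the signal's mean. It assumes a clean, already filtered respiration trace, and the synthetic 0.25 Hz signal (15 breaths per minute) is purely illustrative.

```python
import math

def breaths_per_minute(resp, fs):
    """Estimate respiration rate from upward crossings of the signal mean."""
    mean = sum(resp) / len(resp)
    x = [v - mean for v in resp]
    crossings = [i for i in range(1, len(x)) if x[i - 1] < 0 <= x[i]]
    if len(crossings) < 2:
        return 0.0  # not enough complete breaths to estimate a rate
    # Average number of samples between successive breath onsets, in seconds.
    period_s = (crossings[-1] - crossings[0]) / (len(crossings) - 1) / fs
    return 60.0 / period_s

fs = 10  # Hz
# 60 s of a synthetic 0.25 Hz breathing signal with a small phase offset.
resp = [math.sin(2 * math.pi * 0.25 * (i / fs) + 0.3) for i in range(60 * fs)]
print(f"{breaths_per_minute(resp, fs):.1f} breaths/min")
```

On noisy recordings, crossing counts are sensitive to small wiggles around the mean, which is why filtering comes first.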
Psycho-physiological analysis has also been used to study cognitive skills and information processing in programmers. Lee et al. [137] examined the differences between novices and experts in program comprehension. They used EEG to record neural activity and found clear differences between the two groups. Compared to novices, experts showed superior program comprehension, excelled at digit encoding and at solving simple programs in a short time, and were better able to recall program functions after an extended period of time. Psycho-physiological analysis has also been used to assess real-time cognitive load in younger and older adults under divided attention and task interruption, with average assessment accuracies of 73% for younger and 70% for older adults [138].
Liu et al. [139] analyzed psycho-physiological signals to detect the affective states of engineers during CAD activities and found that the EEG results correlated with the emotions the engineers described during those activities. In another paper, Nguyen and Zeng used heart rate and EEG signals to study the relationship between designers' mental effort and stress levels [140]. They found that mental effort was lowest at high stress levels and did not vary between medium- and low-stress tasks. In further work, Nguyen and Zeng found a strong association between self-rated effort and beta-band power and demonstrated that self-rating itself contributes to mental activity [141].
Baig and Kavakli used connectivity analysis of functional brain networks to estimate the cognitive activity of designers in a 3D modeling task [142]. They used normalized transfer entropy to construct a connectivity matrix from EEG signals and found significant changes in cognitive activity in the drawing and manipulation states relative to the resting state. In another paper, they sought a correlation between cognitive activity and task completion rate, using a coding scheme to segment the EEG signal according to design activities [143]. The results showed that users who performed more physical actions than conceptual and perceptual actions were relaxed, as indicated by high alpha-band power in their EEG signals.
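Transfer entropy TE(X → Y) measures how much the past of channel X reduces uncertainty about the next value of channel Y beyond what Y's own past already explains. Baig and Kavakli's exact settings are not given here, so the sketch below makes several assumptions: history length 1, equal-width binning into 4 symbols, plug-in probability estimates, and normalization by H(Y_next | Y_past), which is one of several normalizations in the literature. No shuffle-based bias correction is applied, and the three-channel input is synthetic.

```python
import math
import random
from collections import Counter

def discretize(signal, n_bins=4):
    """Map a signal to integer symbols 0..n_bins-1 using equal-width bins."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant signal
    return [min(int((v - lo) / width), n_bins - 1) for v in signal]

def transfer_entropy(x, y):
    """TE(X -> Y) in bits, history length 1, plug-in probability estimates."""
    n = len(y) - 1
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))  # (y_next, y_past, x_past)
    yy = Counter(zip(y[1:], y[:-1]))              # (y_next, y_past)
    yx = Counter(zip(y[:-1], x[:-1]))             # (y_past, x_past)
    y_past = Counter(y[:-1])
    te = 0.0
    for (yn, yp, xp), c in triples.items():
        p_given_both = c / yx[(yp, xp)]           # p(y_next | y_past, x_past)
        p_given_own = yy[(yn, yp)] / y_past[yp]   # p(y_next | y_past)
        te += (c / n) * math.log2(p_given_both / p_given_own)
    return te

def cond_entropy_next(y):
    """H(Y_next | Y_past) in bits, used as the normalizer."""
    n = len(y) - 1
    yy = Counter(zip(y[1:], y[:-1]))
    y_past = Counter(y[:-1])
    return -sum((c / n) * math.log2(c / y_past[yp]) for (_, yp), c in yy.items())

def normalized_te(x, y):
    """TE(X -> Y) / H(Y_next | Y_past); lies in [0, 1] for plug-in estimates."""
    h = cond_entropy_next(y)
    return transfer_entropy(x, y) / h if h > 0 else 0.0

def connectivity_matrix(channels, n_bins=4):
    """Entry [i][j] is the normalized TE from channel i to channel j."""
    symbols = [discretize(ch, n_bins) for ch in channels]
    k = len(symbols)
    return [[0.0 if i == j else normalized_te(symbols[i], symbols[j])
             for j in range(k)] for i in range(k)]

# Hypothetical three-channel example: channel 1 copies channel 0 one sample later.
rng = random.Random(0)
source = [rng.random() for _ in range(2000)]
follower = [0.0] + source[:-1]
noise = [rng.random() for _ in range(2000)]
for row in connectivity_matrix([source, follower, noise]):
    print(" ".join(f"{v:.2f}" for v in row))
```

Unlike correlation, the resulting matrix is asymmetric, which is what lets a connectivity analysis distinguish the direction of information flow between channels.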
Fairclough [144] reviewed the literature on the development of psychophysiological computing to innovate HCI and concluded that psychophysiological measures have great potential to improve HCI, but that complex issues still need to be fully tackled. Based on our literature review, we found that GSR/EDA is best suited to recording arousal and mental effort; HR is suited to measuring arousal in emotion, likeability, and attention; HRV, EMG, and respiration are mostly used for emotional state estimation; BVP is used for evaluating relaxation; facial input is applied to recognize emotions from facial expressions; and EEG is widely used to detect emotions, frustration, and mental effort.

Conclusions
This paper presents a review of the methods and measurements currently used in psychophysiological analysis studies to measure cognitive or mental states. Various methods can estimate a user's brain state, but this paper focuses on cognitive state estimation based on psycho-physiological signal analysis. By reviewing the literature, we highlight the key parameters used to estimate mental states and identify their advantages and limitations from an HCI perspective. In particular, various studies have shown the effectiveness of psychophysiological measures in indexing users' emotional and cognitive responses when interacting with HCI systems. However, this fast-growing field lacks accepted theories and background research. The major findings are given below:

• The interaction time increases significantly when assistance is provided.
• Auditory and visual stimuli are the best ways to elicit emotions in controlled experimental settings.
• Violent games increase cardiovascular activity compared to non-violent games.
• Psycho-physiological measures show a strong correlation with self-reported data.
• An increase in β- and γ-band activity of EEG signals was observed during high-intensity events.
• A decrease in stress level was found while interacting with a social robot.
• Psycho-physiological measures can reveal underlying effects that cannot be found using traditional methods.
• Ill-designed web pages increase the stress level of the user.
• Virtual reality simulations can be used to study the relationship between brain responses and stress levels.
• Multimedia presentations such as video and images elicit positive emotions more than text presentations, which induce a higher cognitive load.

Our review reveals that previous studies did not follow a standardized experimental setting; as a result, they differed significantly in sample size, age, gender, experience, session length, and types of participants. This lack of uniformity makes it difficult to compare results across settings. A basic step towards developing an accepted theory is to test previous findings on more generic datasets and to determine how existing theories perform across different modalities, social environments, and experiences. This would provide insight into the statistical requirements for experimental setup and design, and a better understanding of the various modes of interaction.