Systematic Review

Eye-Based Recognition of User Traits and States—A Systematic State-of-the-Art Review

Institute for Information Systems (WIN), Department of Economics and Management, Karlsruhe Institute of Technology (KIT), Kaiserstraße 89-93, 76133 Karlsruhe, Germany
* Author to whom correspondence should be addressed.
J. Eye Mov. Res. 2025, 18(2), 8; https://doi.org/10.3390/jemr18020008
Submission received: 14 January 2025 / Revised: 15 February 2025 / Accepted: 21 March 2025 / Published: 1 April 2025

Abstract

Eye-tracking technology provides high-resolution information about a user’s visual behavior and interests. Combined with advances in machine learning, it has become possible to recognize user traits and states from eye-tracking data. Despite increasing research interest, a comprehensive systematic review of eye-based recognition approaches has been lacking. This study aimed to fill this gap by systematically reviewing and synthesizing the existing literature on the machine-learning-based recognition of user traits and states using eye-tracking data, following the PRISMA 2020 guidelines. The inclusion criteria focused on studies that applied eye-tracking data to recognize user traits and states with machine learning or deep learning approaches. Searches were performed in the ACM Digital Library and IEEE Xplore, and the retrieved studies were assessed for risk of bias using standard methodological criteria. The data synthesis was based on a conceptual framework covering the task, context, technology and data processing, and recognition targets. A total of 90 studies were included, encompassing a variety of tasks (e.g., visual, driving, learning) and contexts (e.g., computer screen, simulator, in the wild). The recognition targets included cognitive and affective states (e.g., emotions, cognitive workload) and user traits (e.g., personality, working memory). A variety of machine learning techniques, such as Support Vector Machines (SVMs), Random Forests, and deep learning models, were applied to recognize user states and traits. This review identifies state-of-the-art approaches and gaps, highlighting the need to establish best practices, build larger-scale datasets, and diversify tasks and contexts. Future research should focus on improving ecological validity, pursuing multi-modal approaches for robust user modeling, and developing gaze-adaptive systems.

1. Introduction

In today’s world, we are surrounded by a vast array of digital technologies that help us perform everyday tasks, such as working, learning, shopping, and communicating. Interactions with these digital technologies follow well-established paradigms. Mice, keyboards, and touchscreens are the most common input modalities of today’s digital technologies and allow for efficient and effective interaction. In many cases, users know what they want to accomplish with the computer, but the input modalities provide limited insight into these internal user processes, creating an information asymmetry in which systems are unaware of users’ intentions, frustrations, or engagement [1]. Today, the wide availability of wearables and biosignal sensors makes it possible to counteract this challenge by providing digital technologies with deeper insights into the traits and states of users [2,3,4]. While user traits are characteristics of a person that persist over time (e.g., personality traits), user states (e.g., emotional or cognitive) depend on the characteristics of the user, the situation, and the interaction between the user and the situation [5]. Thus, user states vary over time as the situation changes and are context dependent [5]. By providing further information about user states or traits, interactive systems can make the interaction between users and computers symmetrical and increase the effectiveness and efficiency of human–computer interaction [1].
Biosignals offer a powerful means of sensing user states and traits, enabling a deeper understanding of user needs and fostering more balanced interactions with computers. These signals, produced autonomously by living organisms, can be measured continuously during computer usage using sensors such as electroencephalogram (EEG), electrocardiogram (ECG), and electrodermal activity (EDA) sensors, or eye tracking [3]. Eye tracking is a technology that provides access to a variety of user information from the eyes, such as eye movements, pupil size, blinks, and head distance [6]. Utilizing this information to find correlations between eye-tracking data and user states, such as cognitive workload, emotions, and mind wandering, or traits, such as personality, has gained growing research interest in recent years [7,8,9,10]. These insights can be used to develop eye-based models for recognizing and interpreting user states and traits. Advancements in machine learning have further expanded the potential for connecting eye-tracking data with user traits and states. This is underscored by recent reviews on the eye-based recognition of cognitive and affective states [11] and of personality traits and cognitive abilities [12]. These studies provide initial insights into the feasibility of machine learning models for eye-based user recognition and highlight the eye-related features used. However, they do not adopt a systematic approach, leaving a gap in understanding which specific machine learning and deep learning algorithms have been applied to model particular user traits or states. A comprehensive overview of these methods would be a valuable contribution to the advancement of eye-based user modeling. Furthermore, Steichen [12] emphasizes the need for further research focusing on deep learning approaches and on collecting data outside controlled laboratory settings. Skaramagkas et al. [11] also highlighted the potential of combining eye-tracking data with other biosignals, such as EEG, ECG, and EDA, to enable a multi-modal recognition approach. Finally, research on eye-based user modeling is scattered across several disciplines, e.g., HCI, ML and deep learning, autonomous driving, and psychology. Thus, we argue that there is a need for a comprehensive overview of the current literature on eye-based user trait and state recognition across disciplines, with a focus on ML and deep learning, multi-modality, and diverse contexts, to further advance the development of future eye-based user models. This led to the following research question:
RQ: What is the state of the art of eye-based user trait and state recognition using ML approaches, and what are the future research directions?
We conducted a systematic literature review (SLR) and developed a conceptual framework for eye-based user state and trait recognition. We identified and reviewed 90 articles along the dimensions of task, context, technology and data processing, and recognition target. Based on this analysis, we derived future research directions to guide research on eye-based user trait and state recognition. We contribute a comprehensive overview of which user traits and states have been investigated and recognized using eye-tracking data, and which ML-based modeling approaches and eye-based features have been used so far. This will support researchers and practitioners in the development of future eye-based user models. Moreover, we contribute to the development of gaze-adaptive systems. Following the paradigm of biosignal-adaptive systems by Schultz and Maedche [3], gaze-adaptive systems continuously record and interpret biosignal information from the user’s eyes to model user traits and states, and ultimately to adapt the system to the user’s current needs. Adaptive systems have been researched in various disciplines, such as HCI, NeuroIS, and neuroscience [1,13,14,15], but their development still remains a challenge [3]. The first gaze-adaptive systems were developed by Hutt et al. [9] and Qi et al. [16] and demonstrate the potential of adaptive systems to mitigate mind wandering during learning and to help readers improve their reading strategies. Therefore, this review lays the foundation for developing the user models required for eye-based user trait and state recognition.

2. Foundations

Eye tracking is an established technology for recording the eye movements and pupil sizes of people [6]. Modern eye trackers rely on video-based infrared pupil–cornea reflection technology, which creates a reflection on the cornea of the eye that moves in relation to the pupil center depending on where the user is looking [17,18]. Through a calibration process, the eye tracker can calculate the point of gaze on a screen and the gaze direction. Typically, eye-tracking devices are divided into two classes: remote eye trackers and head-mounted eye trackers [6,17,19]. Remote eye trackers (often referred to as table-mounted, screen-based, or desktop-based eye trackers) are positioned at a distance of 50–90 cm and are attached to or underneath a screen. Thus, they are not in physical contact with the user. Remote eye trackers typically provide a stream of information comprising the points of gaze (x- and y-coordinates on a screen) for the left and right eyes, the gaze origins (x-, y-, and z-coordinates) for the left and right eyes, and the pupil sizes of the left and right eyes [6]. Head-mounted eye trackers (often also called mobile eye trackers) are glasses with an integrated infrared source and video camera that the user wears on the head [6,17,20]. Head-mounted eye trackers also provide data streams of the point of gaze in a 3D environment (x-, y-, and z-coordinates) and the pupil size for both the left and right eyes. In recent years, eye-tracking technology has advanced significantly, especially in terms of robustness, allowing it to move from research laboratories to real-world applications [6]. Eye-tracking applications can be distinguished along various dimensions. While Duchowski [21] separated interactive (eye-based interactions) from diagnostic (visual attention analysis) applications of eye tracking, Majaranta and Bulling [17] classified eye-tracking applications into four areas along the dimension of real-time versus offline analysis: eye-based interactions, attentive user interfaces, eye-based user modeling, and diagnostic applications. According to Majaranta and Bulling [17], eye-based interactions focus on using eye movements as explicit input for real-time interactions with a user interface, while attentive user interfaces use eye-tracking data as implicit input to support the attention management of the user. The focus of this review is on eye-based user modeling, which relies on eye-tracking data to understand, e.g., the user’s cognitive and affective processes, traits, behavior, or intentions, and to develop models for their recognition. Diagnostic applications refer to the offline analysis of eye movement data to gain insights into visual and attentional processes.
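To make the device data streams described above concrete, the following minimal sketch shows how samples from the two device classes might be represented in code. The class and field names are illustrative assumptions and do not correspond to any vendor’s API.

```python
from dataclasses import dataclass

@dataclass
class RemoteGazeSample:
    """One sample from a remote (screen-based) eye tracker; field names are illustrative."""
    timestamp_ms: float
    gaze_left: tuple[float, float]            # point of gaze on the screen (x, y), left eye
    gaze_right: tuple[float, float]           # point of gaze on the screen (x, y), right eye
    origin_left: tuple[float, float, float]   # gaze origin (x, y, z), left eye
    origin_right: tuple[float, float, float]  # gaze origin (x, y, z), right eye
    pupil_left_mm: float                      # pupil size, left eye
    pupil_right_mm: float                     # pupil size, right eye

@dataclass
class HeadMountedGazeSample:
    """One sample from a head-mounted (mobile) eye tracker; field names are illustrative."""
    timestamp_ms: float
    gaze_point_3d: tuple[float, float, float]  # point of gaze in the 3D environment
    pupil_left_mm: float
    pupil_right_mm: float
```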
Eye movement data analysis extracts so-called fixations and saccades from raw gaze point data [6,22]. Fixations are short pauses of the gaze on a specific area of interest and typically have a duration of 50–300 ms or more [22]. Saccades are the rapid jumps between two consecutive fixations and have a duration of 10 to 100 ms [6]. Several algorithms exist to detect fixations and saccades from raw eye movement data. From fixations and saccades, several metrics can be derived. Low-level metrics are based on the fixation duration, rate, and count or the saccade duration, length, acceleration, rate, and count, as well as their statistical features, such as the mean, median, minimum, maximum, skew, or kurtosis [6,23]. High-level metrics make use of so-called areas of interest (AOIs), i.e., particular areas of a UI or the environment that are of high importance, and focus on transitions between AOIs or on fixation- or saccade-based metrics computed separately for each AOI [6,23].
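As an illustration of how fixations can be extracted from raw gaze points, the sketch below implements a dispersion-threshold identification (I-DT) procedure in the spirit of the scheme described by Salvucci and Goldberg [22]. The function name and threshold values are our own illustrative assumptions and would need to be tuned to the recording setup.

```python
import numpy as np

def detect_fixations_idt(t, x, y, dispersion_threshold=1.0, min_duration=0.1):
    """Dispersion-threshold identification (I-DT) of fixations (illustrative sketch).

    t, x, y: 1D numpy arrays of timestamps (s) and gaze coordinates (e.g., degrees).
    Returns a list of (start_time, end_time, centroid_x, centroid_y) tuples.
    """
    fixations = []
    i, n = 0, len(t)
    while i < n:
        # Initialize a window spanning at least the minimum fixation duration.
        j = i
        while j < n and t[j] - t[i] < min_duration:
            j += 1
        if j >= n:
            break
        dispersion = (x[i:j+1].max() - x[i:j+1].min()) + (y[i:j+1].max() - y[i:j+1].min())
        if dispersion <= dispersion_threshold:
            # Extend the window until the dispersion threshold is exceeded.
            while j + 1 < n:
                d = (x[i:j+2].max() - x[i:j+2].min()) + (y[i:j+2].max() - y[i:j+2].min())
                if d > dispersion_threshold:
                    break
                j += 1
            fixations.append((t[i], t[j], x[i:j+1].mean(), y[i:j+1].mean()))
            i = j + 1  # remove the window points and continue after the fixation
        else:
            i += 1     # slide the window forward by one sample
    return fixations
```

Gaps between consecutive fixations can then be treated as saccades, from which the low- and high-level metrics above are computed.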
A fundamental assumption in eye tracking is the eye–mind hypothesis by Just and Carpenter [24]. The eye–mind hypothesis states that what a person fixates on is also actively cognitively processed [24]. Research has also shown a relationship between pupil size and cognitive workload, as well as the arousal level [25,26]. This is also the reason why eye trackers are often used to study cognitive and affective processes or personality traits [11,12]. These insights into user states and traits can be leveraged to design gaze-adaptive systems. Adaptive systems research has a long history dating back to Pope et al. [13], who suggested biocybernetic systems that rely on EEG sensors to assess operator engagement and adapt the level of automation accordingly. The terms neuroadaptive systems [1,15], physiological computing systems [4,14], physio-adaptive systems [27], and biosignal-adaptive systems [3] subsequently evolved from this work. All these systems follow the same loop structure with three stages: (1) the collection of biosignals from users by sensors, (2) the recognition of user traits and states, and (3) system adaptation. In the first stage, the signals are recorded by sensors [1]. This stage comprises three components, namely, signal acquisition, signal processing, and signal storage [27]. Typically, these signals are acquired from sensors such as EEG, ECG, EDA/GSR, or eye tracking. In the second stage, the collected sensor data are analyzed with the goal of recognizing user traits and states [4]. Often, the collected sensor data are combined with self-reported data from users, e.g., collected through surveys. This process typically involves feature engineering and an analytics engine that applies supervised machine learning techniques [27]. In the final stage, the system adaptively responds based on the recognized user traits and states [14,15]. The goal of the system adaptation is either to trigger a desirable response or to maintain a desirable state [27]. Our SLR specifically focused on the first two stages. In doing so, we lay the foundations for the development of gaze-adaptive systems that use eye-tracking data as the biosignal.
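The three-stage loop can be sketched as follows. This is a minimal, hedged illustration: the `eye_tracker`, `recognizer`, and `system` objects and their methods are hypothetical placeholders that any concrete gaze-adaptive system would have to supply itself.

```python
import time

def gaze_adaptive_loop(eye_tracker, recognizer, system, interval_s=1.0):
    """Minimal sketch of the three-stage biosignal-adaptive loop described above."""
    while system.is_running():
        # Stage 1: signal acquisition and processing (sensor -> features).
        samples = eye_tracker.read_window(interval_s)    # raw gaze/pupil samples
        features = recognizer.extract_features(samples)  # e.g., fixation/pupil metrics
        # Stage 2: recognition of the current user trait or state (e.g., workload).
        state = recognizer.predict(features)
        # Stage 3: system adaptation based on the recognized state.
        system.adapt(state)
        time.sleep(interval_s)
```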

3. Materials and Methods

To answer the research question that guides this article, we first conducted a systematic literature review. This provided a set of relevant papers on the eye-based recognition of user traits and states. In a second step, we developed a conceptualization of eye-based user recognition in the form of a framework. Based on the identified literature and the framework, we highlight research gaps and outline future research directions.

3.1. Systematic Literature Review

This literature review followed the guidelines established by Page et al. [28] and Kitchenham and Charters [29]. We structured the literature review process along the three phases of planning, conducting, and reporting. The review was not registered and a protocol was not prepared. As a first step in the planning phase, we developed the search strategy for this literature review. Starting with an exploratory search on Google Scholar, we identified the first relevant literature. After reviewing the retrieved literature and its keywords, we iteratively refined the search string over several rounds. The final search string consisted of four parts to capture a broad range of relevant studies investigating user states and traits. The first part ensured that only studies that used eye, gaze, or pupil data were included. The second part captured studies that investigated the prediction of cognitive, mental, affective, emotional, physiological, psychological, or personality user constructs. The third part ensured that only studies that investigated user states, traits, or characteristics were included. The fourth part ensured that only studies that examined the recognition, detection, classification, modeling, or prediction of user states and traits were included. Therefore, after applying Boolean operators and wildcards, the following search string was developed:
  • (eye* OR gaze* OR pupil*)
  • AND (cognit* OR mental OR affect* OR emotion* OR physiolog* OR psycholog* OR personality)
  • AND (state OR trait OR characteristic*)
  • AND (recogni* OR detect* OR classif* OR model* OR predict*).
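For readability, the sketch below assembles the four parts into the full Boolean query string. The exact field-restriction syntax (e.g., limiting the search to title and abstract) differs between the ACM Digital Library and IEEE Xplore and is therefore omitted here.

```python
# Assemble the four parts of the search string with Boolean operators and wildcards.
parts = [
    "(eye* OR gaze* OR pupil*)",
    "(cognit* OR mental OR affect* OR emotion* OR physiolog* OR psycholog* OR personality)",
    "(state OR trait OR characteristic*)",
    "(recogni* OR detect* OR classif* OR model* OR predict*)",
]
search_string = " AND ".join(parts)
print(search_string)
```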
As a next step, we selected the ACM Digital Library and IEEE Xplore databases for this review, as these databases are well established and considered by scholars to be reliable sources for conducting literature reviews. The databases were searched on 12 November 2024. Moreover, we developed selection criteria for including studies in this review, which are presented in Table 1. We decided to exclude studies that focused on the recognition of diseases and disorders, as this requires specific medical knowledge and can have a serious impact on people’s lives if misclassified. In order to obtain a holistic overview, we decided not to limit the results to a specific type of eye-tracking data, time period, publication type, or publication outlet.
This was followed by the literature search phase. The previously defined search string, with a filter for the abstract and title, was executed on the selected databases, and 1568 articles were identified, as shown in Figure 1. To identify relevant publications, we followed a single-screening approach in which all records were assessed by one reviewer. First, the titles, abstracts, and keywords were scanned before reviewing the full text of the publications. From this set of relevant publications, a forward and backward search was performed following the same selection criteria. The resulting final set of publications was considered in this literature review. The outcome of the literature search phase was a comprehensive set of 90 papers that employed machine learning for the eye-based recognition of user traits and states.

3.2. Framework Creation

After identifying the relevant literature, the publications were analyzed by reading them and extracting all the information relevant to answering our research question. To structure the analysis process, we developed a framework skeleton that represented all the relevant dimensions of eye-based recognition. We followed a conceptual-to-empirical and empirical-to-conceptual development approach, as suggested by Nickerson et al. [31]. To derive a first version of the conceptual framework, we followed a top-down conceptual-to-empirical approach, leveraging the PACT framework, as presented in Benyon [32], and the previously introduced biosignal-adaptive systems concepts from Hettinger et al. [1], Schultz and Maedche [3], Pope et al. [13], Allanson and Fairclough [14], Loewe and Nadj [27], and Riedl and Léger [15]. From these sources, we derived four dimensions: task, context, technology and data processing, and recognition target. In the second step, an empirical-to-conceptual approach was conducted following Nickerson et al. [31]. To derive the subcategories of each dimension and the corresponding codes, we used an inductive coding procedure based on Wolfswinkel et al. [33]. To minimize bias, some papers were coded jointly by two reviewers, and the subcategories and codes were iteratively refined by them. Once a sufficient level of abstraction of the conceptual framework and codes was reached, one reviewer coded the papers in the final set of this review accordingly and recorded the results in the framework presented in Section 4 and the concept matrix presented in Appendix A.2.

4. Results

In this section, we describe the results of the literature review and analyze the identified papers in detail. Moreover, we present the framework for the eye-based recognition of user traits and states derived from the identified literature. Finally, we outline research gaps and directions for future research.

4.1. Descriptive Results

As described in the Materials and Methods Section, the developed search string was applied to the relevant databases and resulted in 1568 hits. We applied the previously described inclusion criteria to this set and excluded 1403 papers based on the title and abstract. The remaining 165 publications were reviewed in detail, and 55 papers remained after the full-text analysis. Most of the excluded studies either did not apply machine learning algorithms to the collected eye movement data to recognize user traits and states or focused on the eye-based recognition of activities or diseases. Finally, we performed a forward and backward search and identified an additional 35 publications. At the end of this process, the final set consisted of 90 relevant publications. A complete list of all identified papers is provided in Appendix A.1. As shown in Figure 2, the descriptive data of the publications highlight that the application of machine learning algorithms to eye-tracking data for recognizing user traits and states emerged about ten years ago. Since then, there has been a steady increase in publications on eye-based user trait and state recognition. The sample sizes of the studies varied considerably, with an average of M = 45.6 (SD = 46.6).

4.2. Framework

Following the conceptualization process described in the previous section, a framework for the eye-based recognition of user traits and states was iteratively developed to reflect the state-of-the-art research landscape. The framework covers four dimensions: task, context, technology and data processing, and recognition target (traits and states). Figure 3 shows the (sub)dimensions, including their codes. For each code, the figure reports the number of papers in the final set to which it applies (indicated by the number in brackets). In the following sections, we describe each (sub)dimension in detail.

4.2.1. Dimensions: Task and Context

The task dimension describes the experimental task that participants had to accomplish during the studies. A variety of tasks were used in the experimental studies during which eye-tracking data were recorded to recognize user states and traits. In total, nine different task categories were identified. The most common were visual tasks (n = 42), in which the study participants, for example, explored a visual stimulus (e.g., [10,34]), performed the n-back task (e.g., [35,36]), or completed the Stroop test (e.g., [37,38]). The to-be-explored visual stimulus was provided either in the form of an image (e.g., [39]), a video (e.g., [40]), or an information visualization (e.g., [41]). The second-most common tasks were driving tasks (n = 15) in a simulator (e.g., [42]) or in virtual reality (e.g., [43]). Other commonly studied tasks were learning and reading tasks (n = 13) (e.g., [9,44]), everyday tasks (n = 6) (e.g., [45]), gaming or simulation tasks (n = 5) (e.g., [7,8]), aviation-related tasks (n = 3) (e.g., [46]), medical tasks (n = 3) (e.g., [47,48]), and coding-related tasks (n = 2) ([49,50]). Furthermore, a few studies (n = 11) investigated tasks that did not fit into the abovementioned task categories.
In terms of the context, most studies were conducted on computer screens (n = 65) in a controlled environment. Twelve studies were conducted in simulators, such as a driving simulator, and seven studies were conducted in the wild, such as during everyday tasks. Furthermore, seven studies investigated the recognition of user traits or states in virtual or augmented reality.

4.2.2. Dimension: Technology and Data Processing

The technology and data-processing dimension includes all technical and data-related sub-dimensions, including types of signals, apparatus, collected eye-tracking data, and applied recognition algorithms.
Signals: For the biosignals collected from participants within the studies, we distinguished between mono-modal eye-tracking data collection and multi-modal data collection that combined eye tracking with other sensor data. About half of the articles (n = 47) recorded only eye-tracking data to recognize user traits and states. A multi-modal approach was followed by 43 of the analyzed publications. Here, eye tracking was coupled with a variety of other biosignal sensors, such as GSR/EDA (n = 17) (e.g., [51,52]), EEG (n = 16) (e.g., [53,54]), heart-related sensors like ECG and PPG (n = 15) (e.g., [55,56]), video input (n = 6) (e.g., [57,58]), or thermal sensors (n = 5) (e.g., [59,60]). In 11 of the studies, speech/audio, respiration, or environment information was also used as an input. Adding another modality to eye tracking has the advantage of collecting complementary information and improving the recognition accuracy [61,62].
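One straightforward way to combine eye tracking with another biosignal is feature-level (early) fusion, i.e., concatenating the per-window feature vectors of each modality before classification. The sketch below illustrates this under the assumption that time-aligned eye-tracking and EDA features have already been computed; the array shapes and feature counts are placeholders.

```python
import numpy as np

def early_fusion(eye_features: np.ndarray, other_features: np.ndarray) -> np.ndarray:
    """Feature-level (early) fusion: concatenate per-window feature vectors of two
    time-aligned modalities (e.g., eye tracking and EDA). Shapes: (n_windows, n_features)."""
    assert eye_features.shape[0] == other_features.shape[0], "windows must be aligned"
    return np.hstack([eye_features, other_features])

# Illustrative usage with random placeholder features for 100 analysis windows:
rng = np.random.default_rng(0)
X_eye, X_eda = rng.normal(size=(100, 24)), rng.normal(size=(100, 6))
X_fused = early_fusion(X_eye, X_eda)   # shape (100, 30), fed to any classifier
```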
Apparatus: In terms of the study apparatus, 54 of the studies used a remote eye tracker, while 37 of the studies relied on a head-mounted eye-tracking device. It is interesting to note that some studies with a computer-screen-based context even relied on a head-mounted eye tracker (e.g., [8,10]). Most of the remote eye trackers used were from Tobii, while SMI eye-tracking glasses were the most frequently used head-mounted eye trackers.
Eye-tracking data: Eye-tracking devices typically provide at least two data streams, namely, a raw gaze stream and a pupil data stream. These data streams are then aggregated into fixations, saccades, pupil size, and blinks before low-level and high-level eye-tracking metrics are computed from them. These eye-tracking metrics serve as the input features for the algorithms to recognize user states and traits. Most publications (n = 65) relied on fixation-based features, like the number of fixations and average fixation duration. Furthermore, saccade-based features (e.g., number of saccades, average saccade length) were leveraged in 56 studies and pupil-based metrics (e.g., pupil diameter) in 64 of the studies. Blink-related metrics served as input features in 39 studies. Typically, further statistical features, such as the minimum, maximum, mean, median, standard deviation, skew, and kurtosis, were computed for all the features (e.g., [63]).
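As an illustration of how such features might be assembled into a model input, the sketch below computes fixation-, saccade-, pupil-, and blink-based features for one analysis window together with common statistical aggregates. The feature names and the exact feature set are illustrative assumptions rather than a prescribed standard.

```python
import numpy as np
from scipy import stats

def summarize(values, prefix):
    """Statistical aggregates commonly reported for eye-tracking features."""
    v = np.asarray(values, dtype=float)
    if v.size == 0:
        return {f"{prefix}_{s}": np.nan
                for s in ("min", "max", "mean", "median", "std", "skew", "kurtosis")}
    return {
        f"{prefix}_min": v.min(), f"{prefix}_max": v.max(),
        f"{prefix}_mean": v.mean(), f"{prefix}_median": np.median(v),
        f"{prefix}_std": v.std(), f"{prefix}_skew": stats.skew(v),
        f"{prefix}_kurtosis": stats.kurtosis(v),
    }

def extract_window_features(fix_durations, sacc_lengths, pupil_sizes, n_blinks, window_s):
    """Feature vector for one analysis window (illustrative feature set)."""
    features = {
        "fixation_count": len(fix_durations),
        "saccade_count": len(sacc_lengths),
        "blink_rate_per_s": n_blinks / window_s,
    }
    features.update(summarize(fix_durations, "fixation_duration"))
    features.update(summarize(sacc_lengths, "saccade_length"))
    features.update(summarize(pupil_sizes, "pupil_size"))
    return features
```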
Algorithm: This literature review focused only on articles that leveraged more advanced analytical algorithms based on machine learning and deep learning approaches to recognize or predict user traits or states. The most commonly used machine learning algorithms were Support Vector Machines (SVMs) (n = 52), followed by Random Forests (n = 37). Moreover, k-Nearest Neighbors (n = 21), Logistic Regression (n = 20), Naive Bayes (n = 15), Decision Trees (n = 12), and Multilayer Perceptrons (n = 11) were frequently applied algorithms in the final set of articles. The “other” category contained a wide range of algorithms that were tested regarding their prediction quality in recognizing user states and traits; an extensive list of algorithms can be found in Appendix A.2. Some studies also applied multiple algorithms to the collected eye-tracking data and compared them (e.g., [40]).
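To illustrate how several of the frequently used algorithms could be compared on the same eye-tracking feature set, the sketch below trains SVM, Random Forest, and k-NN classifiers with scikit-learn. The random data are placeholders; in practice, the feature matrix would come from a feature extraction step such as the one sketched above, and the evaluation scheme would follow the considerations discussed in Section 5.3.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: 200 analysis windows with 30 eye-tracking features and binary state labels.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 30)), rng.integers(0, 2, size=200)

classifiers = {
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5, scoring="f1_macro")
    print(f"{name}: F1 = {scores.mean():.2f} +/- {scores.std():.2f}")
```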

4.2.3. Dimension: Recognition Targets

The recognition target dimension describes the user’s perspective. Specifically, it summarizes the target constructs, in terms of traits and states, that were recognized using the advanced analytical algorithms described in the previous section. This dimension is divided into the two sub-dimensions of user traits and user states. Several papers investigated multiple constructs from these two sub-dimensions in parallel (e.g., [64]).
User traits: Eleven of the identified publications predicted user traits from eye-tracking data. Typical user traits predicted through eye-tracking data were personality traits (n = 4). Research showed that affective personality traits (e.g., the tactics, views, and morality traits of the Dark Triad) were predicted through eye-tracking data more accurately than cognitive (e.g., openness of the HEXACO traits) or behavioral (e.g., conscientiousness of the HEXACO traits) personality traits [10,40]. Visual working memory capacity (WMC) and perceptual speed were investigated in three studies. Spatial memory and verbal WMC were studied in two studies each. Verbal and visual WMC and perceptual speed have important impacts on information processing, and detecting these cognitive abilities can enable user-adaptive information visualization [65,66]. Furthermore, two of the studies used eye-tracking data to infer cognitive styles, such as field dependence–independence, during task performance ([67,68]). Raptis et al. [67] uncovered that participants with different cognitive styles could be distinguished particularly well based on gaze entropy, fixation duration, and fixation count. Only one study predicted expertise based on eye data [47].
Affective and cognitive states: Eye-tracking data were also successfully applied to recognize specific user states in 80 of the studies. The most commonly studied state, investigated in more than a third of the final set, was emotion (n = 25). Emotions were often inferred in a multi-modal setup, together with EEG (e.g., [42,55,62]). Moreover, work focused not only on inferring the dimensions of affect, such as arousal and valence [69,70], but also on specific types of emotions [62,71] with the help of eye-tracking data. Another user state often investigated in our publication set was cognitive workload (n = 21). Specifically, research established a relationship between the pupil size and the experienced cognitive workload of the study participants [35]. Furthermore, eye-tracking metrics, such as fixation-, saccade-, and blink-based metrics, are also leveraged for cognitive workload estimation [60,72]. The third-most frequently investigated cognitive state was mind wandering (n = 9), which was very often predicted during reading and learning tasks (e.g., [9,16,51]). Five of the studies used eye-tracking technology to examine confusion, which was studied in visual task contexts (e.g., [73,74,75]), medical tasks (e.g., [48]), or physical tasks (e.g., [76]). Stress, which can include both cognitive and affective aspects, was examined in five studies (e.g., [38,50]). Distraction was another investigated cognitive state; it was examined in four studies, all of which used driving tasks [77,78,79,80]. Other cognitive states recognized through eye-tracking data using advanced algorithms were situation awareness (n = 3) ([37,81,82]) and fatigue (n = 2) [83,84]. Moreover, two studies each investigated the prediction of comprehension [16,44] and perceptual curiosity [64,85], while one study investigated the recognition of indecisiveness [86] and another the attention type [59] based on eye-tracking data.

5. Discussion

In this section, we propose several possible future research directions for the eye-based recognition of user traits and states considering the analysis within the framework presented in the previous section. Each dimension of the framework is discussed separately below. A condensed summary is presented in Table 2.

5.1. Task

A deep dive into the tasks studied in the collected papers highlights their diversity. We hypothesize that this diversity can be observed because users’ eye movements are highly task-specific and central to task performance. In addition, most tasks were studied in controlled environments, such as laboratories or highly controlled desktop environments. However, many tasks take place in semi- or highly uncontrolled environments, such as at work, school, or university. For example, industrial settings such as working in a workshop or on an assembly line, teachers and professors imparting knowledge to or supervising students at school, and white-collar workers videoconferencing or working with GenAI have not been the focus of research on the eye-based recognition of user states and traits. Therefore, we propose to further increase the diversity of the tasks investigated to include tasks in semi- or uncontrolled environments. Furthermore, precisely because eye movements are so unique and individual across tasks, it is important to investigate eye movement features and algorithms that work across tasks. Ensuring that models trained on one task also work on another task would support the generalizability of eye-based user models across tasks and help eye tracking support more everyday tasks.

5.2. Context

An examination of the identified contexts shows that previous research was mainly limited to contexts in which users work with computer screens. However, users may perform their tasks on smaller screens, such as smartphones, or on even larger screens, and thus experience different affective and cognitive states. Focusing only on highly controlled contexts and environments also limits the ecological validity of the user state and trait models developed, as they may behave differently on data collected outside of controlled conditions. For example, bright sunlight, different poses, or larger screens or distances to screens can pose unique challenges to eye-based user models. Furthermore, gaze-adaptive systems can be used in specific spatial contexts, such as offices, production lines, and public displays with varying environmental conditions. Therefore, varying conditions and environments should be the focus of future research. In addition, new virtual environments, such as mixed reality, augmented reality, or virtual reality, should be further explored as eye tracking becomes one of the main interaction methods of these devices, as seen in the Apple Vision Pro and Meta Quest Pro. Overall, there is a need to further diversify the contexts in which eye-based recognition is tested.

5.3. Technology and Data Processing

Regarding the signal sub-dimension, the results show that there has been limited research on using eye tracking in combination with other sensor technologies to perform recognition. However, the use of multi-modal data, such as EEG for brain activity or ECG for heart rate, is known to increase the accuracy of models [80,87]. Adding another perspective on the bodily response to changing user states can help generalize recognition independent of the specific task and context. Furthermore, interaction information, such as mouse, keyboard, or application information, has not been well utilized [73]. Adding this information to multi-modal models would allow more information about the task to be integrated into the models, potentially improving model performance. Therefore, future research should focus on the multi-modal aspect. Furthermore, it is necessary to compare different possible combinations of biosignals and interaction data to find the one that yields the best prediction results.
In terms of model development and evaluation, we have seen that almost every paper follows a different approach. These approaches range from train–test splits, train–test–validation splits, and cross-validation to leave-one-participant/activity-out cross-validation, combined with different metrics for evaluating models, such as accuracy, precision, recall, F1-score, and AUC, or visualizations, such as the ROC curve. This makes it difficult to compare models across studies and is the reason why we do not report model performance information in this paper. However, there is a need to define standards and best practices for reporting model evaluation results to further advance this field of research. Therefore, we encourage researchers to publish the underlying datasets and corresponding model development pipelines as open source to establish best practices and ensure correct model evaluation. Furthermore, by publishing this information, other researchers can build on this knowledge and further improve eye-based user models or test new approaches on existing datasets. This would further support the development of generalizable user state and trait models.
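One evaluation scheme that directly tests person-independent generalization is leave-one-participant-out cross-validation, which can be expressed with scikit-learn’s LeaveOneGroupOut as sketched below. The data arrays and group assignment are placeholder assumptions.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholders: X is the feature matrix, y the labels, groups the participant ID per window.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(300, 30)), rng.integers(0, 2, size=300)
groups = np.repeat(np.arange(10), 30)  # 10 participants, 30 windows each

logo = LeaveOneGroupOut()
scores = []
for train_idx, test_idx in logo.split(X, y, groups):
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X[train_idx], y[train_idx])
    scores.append(f1_score(y[test_idx], clf.predict(X[test_idx]), average="macro"))
print(f"Leave-one-participant-out F1: {np.mean(scores):.2f} +/- {np.std(scores):.2f}")
```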

5.4. Recognition Target

In general, the results showed that eye tracking can be used to detect a variety of user traits and states. Humans are complex, and psychology has examined a variety of constructs related to user traits and states that could potentially be predicted from eye-tracking data. Therefore, there is great potential for the further exploration of user traits and states. For affective–cognitive states, multi-modal approaches should be pursued. Also, previous studies focused only on the individual level, while the team level is not well explored. In collaborative scenarios, the recognition of individual and team-level affective and cognitive user states and characteristics, such as shared attention, collaborative workload, group stress, emotional competence, and social cohesion, is of particular interest. Furthermore, machine learning approaches have only been used for retrospective and not for ad hoc real-time recognition. Real-time recognition can support the simultaneous evaluation of multiple recognition targets to identify dependencies and correlations between them. For example, a system could identify user traits that influence or are influenced by certain user states. Therefore, we argue that the real-time recognition of user traits and states is a research gap and a promising avenue.

5.5. General Suggestions

The ability to recognize user traits and states using eye-tracking technology and machine learning approaches is the foundation for designing gaze-adaptive systems. There are many open research challenges in this exciting field. A major challenge will be realizing the real-time recognition of user traits and states in order to enable so-called gaze-adaptive systems. Gaze-adaptive systems are a new class of adaptive systems that leverage eye-tracking data, more specifically gaze data, as the biosignal. Besides the goal of designing technology for productivity and maximizing usage time, there is a call for designing for well-being in HCI [88,89,90]. Since many user traits and states are related to well-being, there is potential for researching gaze-adaptive systems to increase well-being. Furthermore, many different types of adaptation are possible when building gaze-adaptive systems. Research should be conducted on how to design gaze-adaptive systems using various adaptation strategies, such as changing the layout of the UI, adjusting the task difficulty, or providing supportive feedback to better meet the user’s needs. Looking at the study designs of eye-based user model development studies, many studies recorded only a few user state or trait labels per participant, and the majority of studies had fewer than 50 participants. To advance eye-based user modeling and develop generalizable models, it is important to collect data from more participants and more labels per participant. Thus, we argue that there is a need for longitudinal and large-scale studies on eye-based user modeling. Another important aspect to consider is the data privacy of eye-tracking data, as they are considered particularly worthy of protection according to laws in European countries. Therefore, research on the anonymization of eye-tracking data and on user trait and state modeling pipelines that work with anonymized data is necessary to meet the requirements of the GDPR.

5.6. Limitations

Although rigor and relevance were emphasized in this study, we recognize that this literature review had some limitations. First, due to the selected databases and the search string developed, there was a risk of missing relevant articles. A methodological limitation of this review was that we followed a single-screening approach, which might have introduced biases into the results. Furthermore, we are well aware that the defined selection criteria may introduce a bias into the extracted results and influence the identified future research directions. To counteract potential biases, we followed the well-established guidelines from Page et al. [28] and Kitchenham and Charters [29]. Moreover, we explicitly described the procedure of this literature review. Second, a high number of the articles that we identified in this review were found through the forward and backward search. This indicates that our search string might have some shortcomings. Third, we developed a conceptual framework by examining the collected papers, which to some extent reflects the opinion of the authors. Therefore, we followed the approach of Nickerson et al. [31], applied an iterative procedure, and developed the (sub)dimensions and codes of the conceptual framework jointly between two reviewers. Fourth, we excluded papers that applied eye tracking to disease detection, as this is a medical topic that requires specific knowledge and tools. This could have had an impact on the holistic view that we aimed for in this literature review, as the approaches used for the recognition of medical-related user traits and states were not covered.

6. Conclusions

Eye tracking is already widely used in research to model and recognize user traits and states. However, previous studies were dispersed across multiple disciplines, faced limitations in ecological validity, and lacked the comprehensive overview needed to develop robust and generalizable eye-based models for user trait and state recognition. In this review, we systematically structured and analyzed the existing literature. We proposed a conceptual framework and conducted an in-depth analysis of 90 relevant papers along its dimensions. This structured overview provides researchers and practitioners with a clear understanding of the research landscape in eye-based user state and trait modeling. Additionally, it allows them to identify best practices and reference state-of-the-art approaches in their respective fields. Furthermore, we outlined future research directions based on the framework’s (sub-)dimensions, helping researchers and practitioners formulate new research questions and expand the body of knowledge in this domain. Ultimately, we believe that this literature review serves as a foundation for developing gaze-adaptive systems that leverage eye-tracking technology to enhance user interaction and adaptation.

Author Contributions

Conceptualization, M.L., P.T. and A.M.; methodology, M.L. and P.T.; validation, M.L. and A.M.; formal analysis, M.L.; investigation, M.L. and P.T.; resources, M.L.; data curation, M.L. and P.T.; writing—original draft preparation, M.L., P.T. and A.M.; writing—review and editing, M.L. and A.M.; visualization, M.L. and P.T.; supervision, P.T. and A.M.; project administration, M.L., P.T. and A.M.; funding acquisition, A.M. All authors read and agreed to the published version of this manuscript.

Funding

This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)–GRK2739/1–Project Nr. 447089431–Research Training Group: KD2School–Designing Adaptive Systems for Economic Decisions.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within this article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of this study; in the collection, analyses, or interpretation of the data; in the writing of this manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
AOI: Area of interest
AUC: Area under curve
EEG: Electroencephalogram
ECG: Electrocardiogram
EDA: Electrodermal activity
GSR: Galvanic skin response
HCI: Human–computer interaction
ML: Machine learning
ROC: Receiver operating characteristic
SLR: Systematic literature review
SVM: Support Vector Machine
UI: User interface
WMC: Working memory capacity

Appendix A

Appendix A.1. List of All Identified Publications

Abbad-Andaloussi et al. [49]
Abdelrahman et al. [59]
Abdurrahman et al. [56]
Alcañiz et al. [91]
Alhargan et al. [8]
Alhargan et al. [69]
Appel et al. [35]
Appel et al. [92]
Appel et al. [7]
Aracena et al. [39]
Babiker et al. [93]
Bao et al. [57]
Bao et al. [94]
Barral et al. [65]
Behroozi and Parnin [50]
Berkovsky et al. [10]
Bixler and D’Mello [95]
Bozkir et al. [43]
Bühler et al. [51]
Castner et al. [47]
Chakraborty et al. [36]
Chen et al. [46]
Chen et al. [96]
Chen and Epps [97]
Conati et al. [41]
Conati et al. [98]
Dumitriu et al. [99]
Fenoglio et al. [100]
Gong et al. [101]
Gong et al. [34]
Hoppe et al. [64]
Hoppe et al. [85]
Horng and Lin [83]
Hosp et al. [48]
Hutt et al. [102]
Hutt et al. [103]
Hutt et al. [9]
Jiménez-Guarneros and Fuentes-Pineda [54]
Jyotsna et al. [104]
Katsini et al. [68]
Kim et al. [81]
Ktistakis et al. [72]
Kwok et al. [105]
Lallé et al. [73]
Zhao et al. [106]
Li et al. [84]
Li et al. [37]
Liang et al. [77]
Liu et al. [78]
Liu et al. [107]
Liu et al. [108]
Lobo et al. [109]
Lu et al. [61]
Lufimpu-Luviya et al. [86]
Luong and Holz [110]
Luong et al. [111]
Ma et al. [112]
Mills et al. [113]
Mills et al. [63]
Misra et al. [79]
Miyaji et al. [80]
Mou et al. [42]
Oppelt et al. [60]
Raptis et al. [67]
Reich et al. [44]
Ren et al. [114]
Ren et al. [52]
Salima et al. [45]
Salminen et al. [74]
Sims and Conati [75]
Soleymani et al. [53]
Steichen et al. [66]
Stiber et al. [76]
Tabbaa et al. [55]
Taib et al. [40]
Tao and Lu [115]
Tarnowski et al. [70]
Wang et al. [116]
Wu et al. [117]
Qi et al. [16]
Xing et al. [58]
Yang et al. [118]
Zhai et al. [38]
Zhai and Barreto [119]
Zhang et al. [120]
Li et al. [121]
Zheng et al. [62]
Zheng et al. [71]
Zhong and Hou [122]
Zhou et al. [82]

Appendix A.2. Concept Matrix

[The concept matrix is provided as figures (Jemr 18 00008 i001–i003) in the original publication.]

References

  1. Hettinger, L.J.; Branco, P.; Miguel Encarnacao, L.; Bonato, P. Neuroadaptive technologies: Applying neuroergonomics to the design of advanced interfaces. Theor. Issues Ergon. Sci. 2003, 4, 220–237. [Google Scholar] [CrossRef]
  2. Picard, R.W. Affective computing: Challenges. Int. J. Hum. Comput. Stud. 2003, 59, 55–64. [Google Scholar] [CrossRef]
  3. Schultz, T.; Maedche, A. Biosignals meet Adaptive Systems. SN Appl. Sci. 2023, 5, 234. [Google Scholar] [CrossRef]
  4. Fairclough, S.H. Fundamentals of physiological computing. Interact. Comput. 2009, 21, 133–145. [Google Scholar] [CrossRef]
  5. Steyer, R.; Schmitt, M.; Eid, M. Latent state-trait theory and research in personality and individual differences. Eur. J. Personal. 1999, 13, 389–408. [Google Scholar] [CrossRef]
  6. Duchowski, A.T. Eye Tracking Methodology; Springer International Publishing: Cham, Switzerland, 2017. [Google Scholar] [CrossRef]
  7. Appel, T.; Sevcenko, N.; Wortha, F.; Tsarava, K.; Moeller, K.; Ninaus, M.; Kasneci, E.; Gerjets, P. Predicting Cognitive Load in an Emergency Simulation Based on Behavioral and Physiological Measures. In Proceedings of the 2019 International Conference on Multimodal Interaction, Suzhou, China, 14–18 October 2019; pp. 154–163. [Google Scholar] [CrossRef]
  8. Alhargan, A.; Cooke, N.; Binjammaz, T. Multimodal Affect Recognition in an Interactive Gaming Environment Using Eye Tracking and Speech Signals. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, UK, 13–17 November 2017; ICMI ’17. pp. 479–486. [Google Scholar] [CrossRef]
  9. Hutt, S.; Krasich, K.; Brockmole, J.R.; D’Mello, S.K. Breaking out of the Lab: Mitigating Mind Wandering with Gaze-Based Attention-Aware Technology in Classrooms. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 8–13 May 2021. CHI ’21. [Google Scholar] [CrossRef]
  10. Berkovsky, S.; Taib, R.; Koprinska, I.; Wang, E.; Zeng, Y.; Li, J.; Kleitman, S. Detecting Personality Traits Using Eye-Tracking Data. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, UK, 4–9 May 2019; CHI ’19. pp. 1–12. [Google Scholar] [CrossRef]
  11. Skaramagkas, V.; Giannakakis, G.; Ktistakis, E.; Manousos, D.; Karatzanis, I.; Tachos, N.S.; Tripoliti, E.; Marias, K.; Fotiadis, D.I.; Tsiknakis, M. Review of Eye Tracking Metrics Involved in Emotional and Cognitive Processes. IEEE Rev. Biomed. Eng. 2023, 16, 260–277. [Google Scholar] [CrossRef]
  12. Steichen, B. Computational Methods to Infer Human Factors for Adaptation and Personalization Using Eye Tracking. In A Human-Centered Perspective of Intelligent Personalized Environments and Systems; Springer: Berlin/Heidelberg, Germany, 2024; pp. 183–204. [Google Scholar] [CrossRef]
  13. Pope, A.T.; Bogart, E.H.; Bartolome, D.S. Biocybernetic system evaluates indices of operator engagement in automated task. Biol. Psychol. 1995, 40, 187–195. [Google Scholar] [CrossRef]
  14. Allanson, J.; Fairclough, S.H. A research agenda for physiological computing. Interact. Comput. 2004, 16, 857–878. [Google Scholar] [CrossRef]
  15. Riedl, R.; Léger, P.M. Fundamentals of NeuroIS Information Systems and the Brain; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar] [CrossRef]
  16. Qi, X.; Lu, Q.; Pan, W.; Zhao, Y.; Zhu, R.; Dong, M.; Chang, Y.; Lv, Q.; Dick, R.P.; Yang, F.; et al. CASES: A Cognition-Aware Smart Eyewear System for Understanding How People Read. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2023, 7, 115. [Google Scholar] [CrossRef]
  17. Majaranta, P.; Bulling, A. Eye Tracking and Eye-Based Human–Computer Interaction. In Advances in Physiological Computing; Fairclough, S., Gilleade, K., Eds.; Springer: London, UK, 2014; pp. 39–65. [Google Scholar] [CrossRef]
  18. Morimoto, C.H.; Mimica, M.R. Eye gaze tracking techniques for interactive applications. Comput. Vis. Image Underst. 2005, 98, 4–24. [Google Scholar] [CrossRef]
  19. Holmqvist, K. Eye Tracking: A Comprehensive Guide to Methods and Measures; Oxford University Press: New York, NY, USA, 2011. [Google Scholar]
  20. Bulling, A.; Ward, J.A.; Gellersen, H.; Tröster, G. Eye Movement Analysis for Activity Recognition Using Electrooculography. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 741–753. [Google Scholar] [CrossRef] [PubMed]
  21. Duchowski, A.T. A breadth-first survey of eye-tracking applications. Behav. Res. Methods Instruments Comput. 2002, 34, 455–470. [Google Scholar] [CrossRef]
  22. Salvucci, D.D.; Goldberg, J.H. Identifying fixations and saccades in eye-tracking protocols. In Proceedings of the Symposium on Eye Tracking Research & Applications—ETRA ’00, Palm Beach Gardens, FL, USA, 6–8 November 2000; pp. 71–78. [Google Scholar] [CrossRef]
  23. Srivastava, N.; Newn, J.; Velloso, E. Combining Low and Mid-Level Gaze Features for Desktop Activity Recognition. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2018, 2, 189. [Google Scholar] [CrossRef]
  24. Just, M.A.; Carpenter, P.A. A theory of reading: From eye fixations to comprehension. Psychol. Rev. 1980, 87, 329–354. [Google Scholar] [CrossRef]
  25. Beatty, J. Task-evoked pupillary responses, processing load, and the structure of processing resources. Psychol. Bull. 1982, 91, 276–292. [Google Scholar] [CrossRef] [PubMed]
  26. Partala, T.; Jokiniemi, M.; Surakka, V. Pupillary responses to emotionally provocative stimuli. In Proceedings of the Symposium on Eye Tracking Research & Applications—ETRA ’00, Palm Beach Gardens, FL, USA, 6–8 November 2000; pp. 123–129. [Google Scholar] [CrossRef]
  27. Loewe, N.; Nadj, M. Physio-Adaptive Systems - A State-of-the-Art Review and Future Research Directions. In Proceedings of the ECIS 2020 Proceedings—Twenty-Eighth European Conference on Information Systems, Marrakesh, Morocco, 15–17 June 2020; p. 19. [Google Scholar]
  28. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef]
  29. Kitchenham, B.; Charters, S. Guidelines for performing Systematic Literature Reviews in Software Engineering. In Proceedings of the 28th international Conference on Software Engineering, Shanghai, China, 20–28 May 2007. [Google Scholar] [CrossRef]
  30. Page, M.J.; Moher, D.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. PRISMA 2020 explanation and elaboration: Updated guidance and exemplars for reporting systematic reviews. BMJ 2021, 372, n160. [Google Scholar] [CrossRef]
  31. Nickerson, R.C.; Varshney, U.; Muntermann, J. A method for taxonomy development and its application in information systems. Eur. J. Inf. Syst. 2013, 22, 336–359. [Google Scholar] [CrossRef]
  32. Benyon, D. Designing Interactive Systems: A Comprehensive Guide to HCI, UX and Interaction Design; Pearson: London, UK, 2014; Number 3. [Google Scholar]
  33. Wolfswinkel, J.F.; Furtmueller, E.; Wilderom, C.P. Using grounded theory as a method for rigorously reviewing literature. Eur. J. Inf. Syst. 2013, 22, 45–55. [Google Scholar] [CrossRef]
  34. Gong, X.; Chen, C.L.; Hu, B.; Zhang, T. CiABL: Completeness-induced Adaptative Broad Learning for Cross-Subject Emotion Recognition with EEG and Eye Movement Signals. IEEE Trans. Affect. Comput. 2024, 15, 1970–1984. [Google Scholar] [CrossRef]
  35. Appel, T.; Scharinger, C.; Gerjets, P.; Kasneci, E. Cross-subject workload classification using pupil-related measures. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications, Warsaw, Poland, 14–17 June 2018; pp. 1–8. [Google Scholar] [CrossRef]
  36. Chakraborty, S.; Kiefer, P.; Raubal, M. Estimating Perceived Mental Workload From Eye-Tracking Data Based on Benign Anisocoria. IEEE Trans. Hum.-Mach. Syst. 2024, 54, 499–507. [Google Scholar] [CrossRef]
  37. Li, R.; Cui, J.; Gao, R.; Suganthan, P.N.; Sourina, O.; Wang, L.; Chen, C.H. Situation Awareness Recognition Using EEG and Eye-Tracking data: A pilot study. In Proceedings of the 2022 International Conference on Cyberworlds (CW), Kanazawa, Japan, 27–29 September 2022; pp. 209–212. [Google Scholar] [CrossRef]
  38. Zhai, J.; Barreto, A.B.; Chin, C.; Li, C. Realization of stress detection using psychophysiological signals for improvement of human-computer interactions. In Proceedings of IEEE SoutheastCon 2005, Fort Lauderdale, FL, USA, 8–10 April 2005; pp. 415–420. [Google Scholar] [CrossRef]
  39. Aracena, C.; Basterrech, S.; Snael, V.; Velasquez, J. Neural Networks for Emotion Recognition Based on Eye Tracking Data. In Proceedings of the 2015 IEEE International Conference on Systems, Man, and Cybernetics, Hong Kong, China, 9–12 October 2015; pp. 2632–2637. [Google Scholar] [CrossRef]
  40. Taib, R.; Berkovsky, S.; Koprinska, I.; Wang, E.; Zeng, Y.; Li, J. Personality Sensing: Detection of Personality Traits Using Physiological Responses to Image and Video Stimuli. ACM Trans. Interact. Intell. Syst. 2020, 10, 18. [Google Scholar] [CrossRef]
  41. Conati, C.; Lallé, S.; Rahman, M.A.; Toker, D. Comparing and Combining Interaction Data and Eye-tracking Data for the Real-time Prediction of User Cognitive Abilities in Visualization Tasks. ACM Trans. Interact. Intell. Syst. 2020, 10, 12. [Google Scholar] [CrossRef]
  42. Mou, L.; Zhao, Y.; Zhou, C.; Nakisa, B.; Rastgoo, M.N.; Ma, L.; Huang, T.; Yin, B.; Jain, R.; Gao, W. Driver Emotion Recognition With a Hybrid Attentional Multimodal Fusion Framework. IEEE Trans. Affect. Comput. 2023, 14, 2970–2981. [Google Scholar] [CrossRef]
  43. Bozkir, E.; Geisler, D.; Kasneci, E. Person Independent, Privacy Preserving, and Real Time Assessment of Cognitive Load using Eye Tracking in a Virtual Reality Setup. In Proceedings of the 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), Osaka, Japan, 23–27 March 2019; pp. 1834–1837. [Google Scholar] [CrossRef]
  44. Reich, D.R.; Prasse, P.; Tschirner, C.; Haller, P.; Goldhammer, F.; Jäger, L.A. Inferring Native and Non-Native Human Reading Comprehension and Subjective Text Difficulty from Scanpaths in Reading. In Proceedings of the 2022 Symposium on Eye Tracking Research and Applications, Seattle, WA, USA, 8–11 June 2022; pp. 1–8. [Google Scholar] [CrossRef]
  45. Salima, M.; M’hammed, S.; Messaadia, M.; Benslimane, S.M. Machine Learning for Predicting Personality Traits from Eye Tracking. In Proceedings of the 2023 International Conference on Decision Aid Sciences and Applications (DASA), Annaba, Algeria, 6–17 September 2023; pp. 126–130. [Google Scholar] [CrossRef]
  46. Chen, J.; Zhang, Q.; Cheng, L.; Gao, X.; Ding, L. A Cognitive Load Assessment Method Considering Individual Differences in Eye Movement Data. In Proceedings of the 2019 IEEE 15th International Conference on Control and Automation (ICCA), Edinburgh, UK, 16–19 July 2019; pp. 295–300. [Google Scholar] [CrossRef]
  47. Castner, N.; Kuebler, T.C.; Scheiter, K.; Richter, J.; Eder, T.; Huettig, F.; Keutel, C.; Kasneci, E. Deep semantic gaze embedding and scanpath comparison for expertise classification during OPT viewing. In Proceedings of the ACM Symposium on Eye Tracking Research and Applications, Stuttgart, Germany, 2–5 June 2020; pp. 1–10. [Google Scholar] [CrossRef]
  48. Hosp, B.; Yin, M.S.; Haddawy, P.; Watcharopas, R.; Sa-Ngasoongsong, P.; Kasneci, E. States of Confusion: Eye and Head Tracking Reveal Surgeons’ Confusion during Arthroscopic Surgery. In Proceedings of the 2021 International Conference on Multimodal Interaction, Montréal, QC, Canada, 18–22 October 2021; ICMI ’21. pp. 753–757. [Google Scholar] [CrossRef]
  49. Abbad-Andaloussi, A.; Sorg, T.; Weber, B. Estimating developers’ cognitive load at a fine-grained level using eye-tracking measures. In Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, Pittsburgh, PA, USA, 16–17 May 2022; pp. 111–121. [Google Scholar] [CrossRef]
  50. Behroozi, M.; Parnin, C. Can We Predict Stressful Technical Interview Settings through Eye-Tracking? In Proceedings of the Workshop on Eye Movements in Programming, Warsaw, Poland, 15 June 2018. EMIP ’18. [Google Scholar] [CrossRef]
  51. Bühler, B.; Bozkir, E.; Deininger, H.; Goldberg, P.; Gerjets, P.; Trautwein, U.; Kasneci, E. Detecting Aware and Unaware Mind Wandering During Lecture Viewing: A Multimodal Machine Learning Approach Using Eye Tracking, Facial Videos and Physiological Data. In Proceedings of the 26th International Conference on Multimodal Interaction, San Jose, Costa Rica, 4–8 November 2024; ICMI ’24. pp. 244–253. [Google Scholar] [CrossRef]
  52. Ren, P.; Barreto, A.; Gao, Y.; Adjouadi, M. Affective Assessment by Digital Processing of the Pupil Diameter. IEEE Trans. Affect. Comput. 2013, 4, 2–14. [Google Scholar] [CrossRef]
  53. Soleymani, M.; Lichtenauer, J.; Pun, T.; Pantic, M. A multimodal database for affect recognition and implicit tagging. IEEE Trans. Affect. Comput. 2012, 3, 42–55. [Google Scholar] [CrossRef]
  54. Jiménez-Guarneros, M.; Fuentes-Pineda, G. CFDA-CSF: A Multi-Modal Domain Adaptation Method for Cross-Subject Emotion Recognition. IEEE Trans. Affect. Comput. 2024, 15, 1502–1513. [Google Scholar] [CrossRef]
  55. Tabbaa, L.; Searle, R.; Bafti, S.M.; Hossain, M.M.; Intarasisrisawat, J.; Glancy, M.; Ang, C.S. VREED: Virtual Reality Emotion Recognition Dataset Using Eye Tracking & Physiological Measures. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2021, 5, 178. [Google Scholar] [CrossRef]
  56. Abdurrahman, U.A.; Zheng, L.; Sharifai, A.G.; Muraina, I.D. Heart Rate and Pupil Dilation As Reliable Measures of Neuro-Cognitive Load Classification. In Proceedings of the 2022 International Conference on Advancements in Smart, Secure and Intelligent Computing (ASSIC), Bhubaneswar, India, 19–20 November 2022; pp. 1–7. [Google Scholar] [CrossRef]
  57. Bao, J.; Tao, X.; Zhou, Y. An Emotion Recognition Method Based on Eye Movement and Audiovisual Features in MOOC Learning Environment. IEEE Trans. Comput. Soc. Syst. 2024, 11, 171–183. [Google Scholar] [CrossRef]
  58. Xing, B.; Wang, K.; Song, X.; Pan, Y.; Shi, Y.; Pang, S. User Emotion Status Recognition in MOOCs Study Environment Based on Eye Tracking and Video Feature Fusion. In Proceedings of the 2023 15th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), Hangzhou, China, 26–27 August 2023; pp. 88–91. [Google Scholar] [CrossRef]
  59. Abdelrahman, Y.; Khan, A.A.; Newn, J.; Velloso, E.; Safwat, S.A.; Bailey, J.; Bulling, A.; Vetere, F.; Schmidt, A. Classifying Attention Types with Thermal Imaging and Eye Tracking. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2019, 3, 69. [Google Scholar] [CrossRef]
  60. Oppelt, M.P.; Foltyn, A.; Deuschel, J.; Lang, N.R.; Holzer, N.; Eskofier, B.M.; Yang, S.H. ADABase: A Multimodal Dataset for Cognitive Load Estimation. Sensors 2022, 23, 340. [Google Scholar] [CrossRef] [PubMed]
  61. Lu, Y.; Zheng, W.L.; Li, B.; Lu, B.L. Combining eye movements and EEG to enhance emotion recognition. In Proceedings of the IJCAI International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015; pp. 1170–1176. [Google Scholar]
  62. Zheng, W.L.; Liu, W.; Lu, Y.; Lu, B.L.; Cichocki, A. EmotionMeter: A Multimodal Framework for Recognizing Human Emotions. IEEE Trans. Cybern. 2019, 49, 1110–1122. [Google Scholar] [CrossRef] [PubMed]
  63. Mills, C.; Gregg, J.; Bixler, R.; D’Mello, S.K. Eye-Mind reader: An intelligent reading interface that promotes long-term comprehension by detecting and responding to mind wandering. Hum.-Comput. Interact. 2021, 36, 306–332. [Google Scholar] [CrossRef]
  64. Hoppe, S.; Loetscher, T.; Morey, S.A.; Bulling, A. Eye movements during everyday behavior predict personality traits. Front. Hum. Neurosci. 2018, 12, 328195. [Google Scholar] [CrossRef]
  65. Barral, O.; Lallé, S.; Guz, G.; Iranpour, A.; Conati, C. Eye-Tracking to Predict User Cognitive Abilities and Performance for User-Adaptive Narrative Visualizations. In Proceedings of the 2020 International Conference on Multimodal Interaction, Virtual, 25–29 October 2020; ICMI ’20. pp. 163–173. [Google Scholar] [CrossRef]
  66. Steichen, B.; Carenini, G.; Conati, C. User-Adaptive Information Visualization: Using Eye Gaze Data to Infer Visualization Tasks and User Cognitive Abilities. In Proceedings of the 2013 International Conference on Intelligent User Interfaces, Los Angeles, CA, USA, 19–22 March 2013; IUI ’13. pp. 317–328. [Google Scholar] [CrossRef]
  67. Raptis, G.E.; Katsini, C.; Belk, M.; Fidas, C.; Samaras, G.; Avouris, N. Using Eye Gaze Data and Visual Activities to Infer Human Cognitive Styles: Method and Feasibility Studies. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, Bratislava, Slovakia, 9–12 July 2017; UMAP ’17. pp. 164–173. [Google Scholar] [CrossRef]
  68. Katsini, C.; Fidas, C.; Raptis, G.E.; Belk, M.; Samaras, G.; Avouris, N. Eye Gaze-driven Prediction of Cognitive Differences during Graphical Password Composition. In Proceedings of the 23rd International Conference on Intelligent User Interfaces, Tokyo, Japan, 7–11 March 2018; IUI ’18. pp. 147–152. [Google Scholar] [CrossRef]
  69. Alhargan, A.; Cooke, N.; Binjammaz, T. Affect recognition in an interactive gaming environment using eye tracking. In Proceedings of the 2017 7th International Conference on Affective Computing and Intelligent Interaction, ACII 2017, San Antonio, TX, USA, 23–26 October 2017; pp. 285–291. [Google Scholar] [CrossRef]
  70. Tarnowski, P.; Kołodziej, M.; Majkowski, A.; Rak, R.J. Eye-Tracking Analysis for Emotion Recognition. Comput. Intell. Neurosci. 2020, 2020, 2909267. [Google Scholar] [CrossRef]
  71. Zheng, L.J.; Mountstephens, J.; Teo, J. Four-class emotion classification in virtual reality using pupillometry. J. Big Data 2020, 7, 43. [Google Scholar] [CrossRef]
  72. Ktistakis, E.; Skaramagkas, V.; Manousos, D.; Tachos, N.S.; Tripoliti, E.; Fotiadis, D.I.; Tsiknakis, M. COLET: A dataset for COgnitive workLoad estimation based on eye-tracking. Comput. Methods Programs Biomed. 2022, 224, 106989. [Google Scholar] [CrossRef]
  73. Lallé, S.; Conati, C.; Carenini, G. Predicting confusion in information visualization from eye tracking and interaction data. In Proceedings of the IJCAI International Joint Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2016; pp. 2529–2535. [Google Scholar]
  74. Salminen, J.; Nagpal, M.; Kwak, H.; An, J.; Jung, S.G.; Jansen, B.J. Confusion Prediction from Eye-Tracking Data: Experiments with Machine Learning. In Proceedings of the 9th International Conference on Information Systems and Technologies, Cairo, Egypt, 24–26 March 2019; ICIST 2019. [Google Scholar] [CrossRef]
  75. Sims, S.D.; Conati, C. A Neural Architecture for Detecting User Confusion in Eye-Tracking Data. In Proceedings of the 2020 International Conference on Multimodal Interaction, Virtual, 25–29 October 2020; ICMI ’20. pp. 15–23. [Google Scholar] [CrossRef]
  76. Stiber, M.; Bohus, D.; Andrist, S. “Uh, This One?”: Leveraging Behavioral Signals for Detecting Confusion during Physical Tasks. In Proceedings of the 26th International Conference on Multimodal Interaction, San Jose, Costa Rica, 4–8 November 2024; ICMI ’24. pp. 194–203. [Google Scholar] [CrossRef]
  77. Liang, Y.; Reyes, M.L.; Lee, J.D. Real-Time Detection of Driver Cognitive Distraction Using Support Vector Machines. IEEE Trans. Intell. Transp. Syst. 2007, 8, 340–350. [Google Scholar] [CrossRef]
  78. Liu, T.; Yang, Y.; Huang, G.B.; Yeo, Y.K.; Lin, Z. Driver Distraction Detection Using Semi-Supervised Machine Learning. IEEE Trans. Intell. Transp. Syst. 2016, 17, 1108–1120. [Google Scholar] [CrossRef]
  79. Misra, A.; Samuel, S.; Cao, S.; Shariatmadari, K. Detection of Driver Cognitive Distraction Using Machine Learning Methods. IEEE Access 2023, 11, 18000–18012. [Google Scholar] [CrossRef]
  80. Miyaji, M.; Kawanaka, H.; Oguri, K. Study on effect of adding pupil diameter as recognition features for driver’s cognitive distraction detection. In Proceedings of the 2010 7th International Symposium on Communication Systems, Networks & Digital Signal Processing (CSNDSP 2010), Newcastle, UK, 21–23 July 2010; pp. 406–411. [Google Scholar] [CrossRef]
  81. Kim, G.; Lee, J.; Yeo, D.; An, E.; Kim, S. Physiological Indices to Predict Driver Situation Awareness in VR. In Proceedings of the Adjunct 2023 ACM International Joint Conference on Pervasive and Ubiquitous Computing & the 2023 ACM International Symposium on Wearable Computing, Cancun, Mexico, 8–12 October 2023; UbiComp/ISWC ’23 Adjunct. pp. 40–45. [Google Scholar] [CrossRef]
  82. Zhou, F.; Yang, X.J.; De Winter, J.C. Using Eye-Tracking Data to Predict Situation Awareness in Real Time during Takeover Transitions in Conditionally Automated Driving. IEEE Trans. Intell. Transp. Syst. 2022, 23, 2284–2295. [Google Scholar] [CrossRef]
  83. Horng, G.J.; Lin, J.Y. Using Multimodal Bio-Signals for Prediction of Physiological Cognitive State Under Free-Living Conditions. IEEE Sens. J. 2020, 20, 4469–4484. [Google Scholar] [CrossRef]
  84. Li, F.; Chen, C.H.; Xu, G.; Khoo, L.P. Hierarchical Eye-Tracking Data Analytics for Human Fatigue Detection at a Traffic Control Center. IEEE Trans. Hum.-Mach. Syst. 2020, 50, 465–474. [Google Scholar] [CrossRef]
  85. Hoppe, S.; Loetscher, T.; Morey, S.; Bulling, A. Recognition of Curiosity Using Eye Movement Analysis. In Proceedings of the Adjunct 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing and the 2015 ACM International Symposium on Wearable Computers, Osaka, Japan, 7–11 September 2015; UbiComp/ISWC ’15 Adjunct. pp. 185–188. [Google Scholar] [CrossRef]
  86. Lufimpu-Luviya, Y.; Merad, D.; Paris, S.; Drai-Zerbib, V.; Baccino, T.; Fertil, B. A Regression-Based Method for the Prediction of the Indecisiveness Degree through Eye Movement Patterns. In Proceedings of the 2013 Conference on Eye Tracking South Africa, Cape Town, South Africa, 29–31 August 2013; ETSA ’13. pp. 32–38. [Google Scholar] [CrossRef]
  87. Zheng, W.L.; Dong, B.N.; Lu, B.L. Multimodal emotion recognition using EEG and eye tracking data. In Proceedings of the 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2014, Chicago, IL, USA, 26–30 August 2014; pp. 5040–5043. [Google Scholar] [CrossRef]
  88. Calvo, R.; Peters, D. Positive Computing: Technology for Wellbeing and Human Potential; The MIT Press: Cambridge, MA, USA, 2014. [Google Scholar]
  89. Hassenzahl, M. Experience Design: Technology for All the Right Reasons; Morgan & Claypool Publishers: San Rafael, CA, USA, 2010; Volume 3, pp. 1–95. [Google Scholar]
  90. Riva, G.; Baños, R.M.; Botella, C.; Wiederhold, B.K.; Gaggioli, A. Positive technology: Using interactive technologies to promote positive functioning. Cyberpsychol. Behav. Soc. Netw. 2012, 15, 69–77. [Google Scholar] [CrossRef]
  91. Alcañiz, M.; Angrisani, L.; Arpaia, P.; De Benedetto, E.; Duraccio, L.; Gómez-Zaragozá, L.; Marín-Morales, J.; Minissi, M.E. Exploring the Potential of Eye-Tracking Technology for Emotion Recognition: A Preliminary Investigation. In Proceedings of the 2023 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE), Milano, Italy, 25–27 October 2023; pp. 763–768. [Google Scholar] [CrossRef]
  92. Appel, T.; Gerjets, P.; Hoffman, S.; Moeller, K.; Ninaus, M.; Scharinger, C.; Sevcenko, N.; Wortha, F.; Kasneci, E. Cross-task and Cross-participant Classification of Cognitive Load in an Emergency Simulation Game. IEEE Trans. Affect. Comput. 2021, 14, 1558–1571. [Google Scholar] [CrossRef]
  93. Babiker, A.; Faye, I.; Prehn, K.; Malik, A. Machine learning to differentiate between positive and negative emotions using pupil diameter. Front. Psychol. 2015, 6, 1921. [Google Scholar] [CrossRef]
  94. Bao, L.Q.; Qiu, J.L.; Tang, H.; Zheng, W.L.; Lu, B.L. Investigating Sex Differences in Classification of Five Emotions from EEG and Eye Movement Signals. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019; pp. 6746–6749. [Google Scholar] [CrossRef]
  95. Bixler, R.; D’Mello, S. Automatic gaze-based user-independent detection of mind wandering during computerized reading. User Model. User-Adapt. Interact. 2016, 26, 33–68. [Google Scholar]
  96. Chen, L.; Cai, W.; Yan, D.; Berkovsky, S. Eye-tracking-based personality prediction with recommendation interfaces. User Model. User-Adapt. Interact. 2023, 33, 121–157. [Google Scholar] [CrossRef]
  97. Chen, S.; Epps, J. Multimodal Event-based Task Load Estimation from Wearables. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–9. [Google Scholar] [CrossRef]
  98. Conati, C.; Lallé, S.; Rahman, M.A.; Toker, D. Further Results on Predicting Cognitive Abilities for Adaptive Visualizations. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 1568–1574. [Google Scholar] [CrossRef]
  99. Dumitriu, T.; Cîmpanu, C.; Ungureanu, F.; Manta, V.I. Experimental Analysis of Emotion Classification Techniques. In Proceedings of the 2018 IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, 6–8 September 2018; pp. 63–70. [Google Scholar] [CrossRef]
  100. Fenoglio, D.; Josifovski, D.; Gobbetti, A.; Formo, M.; Gjoreski, H.; Gjoreski, M.; Langheinrich, M. Federated Learning for Privacy-aware Cognitive Workload Estimation. In Proceedings of the 22nd International Conference on Mobile and Ubiquitous Multimedia, Vienna, Austria, 3–6 December 2023; MUM ’23. pp. 25–36. [Google Scholar] [CrossRef]
  101. Gong, X.; Chen, C.L.P.; Zhang, T. Cross-Cultural Emotion Recognition With EEG and Eye Movement Signals Based on Multiple Stacked Broad Learning System. IEEE Trans. Comput. Soc. Syst. 2024, 11, 2014–2025. [Google Scholar] [CrossRef]
  102. Hutt, S.; Mills, C.; Bosch, N.; Krasich, K.; Brockmole, J.; D’Mello, S. “Out of the Fr-Eye-ing Pan”. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, Bratislava, Slovakia, 9–12 July 2017; pp. 94–103. [Google Scholar] [CrossRef]
  103. Hutt, S.; Krasich, K.; Mills, C.; Bosch, N.; White, S.; Brockmole, J.R.; D’Mello, S.K. Automated gaze-based mind wandering detection during computerized learning in classrooms. User Model. User-Adapt. Interact. 2019, 29, 821–867. [Google Scholar] [CrossRef]
  104. Jyotsna, C.; Amudha, J.; Ram, A.; Fruet, D.; Nollo, G. PredictEYE: Personalized Time Series Model for Mental State Prediction Using Eye Tracking. IEEE Access 2023, 11, 128383–128409. [Google Scholar] [CrossRef]
  105. Kwok, T.C.K.; Kiefer, P.; Schinazi, V.R.; Hoelscher, C.; Raubal, M. Gaze-based detection of mind wandering during audio-guided panorama viewing. Sci. Rep. 2024, 14, 27955. [Google Scholar] [CrossRef] [PubMed]
  106. Zhao, L.M.; Li, R.; Zheng, W.L.; Lu, B.L. Classification of Five Emotions from EEG and Eye Movement Signals: Complementary Representation Properties. In Proceedings of the 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER), San Francisco, CA, USA, 20–23 March 2019; pp. 611–614. [Google Scholar] [CrossRef]
  107. Liu, X.; Chen, T.; Xie, G.; Liu, G. Contact-free cognitive load recognition based on eye movement. J. Electr. Comput. Eng. 2016, 2016, 1601879. [Google Scholar] [CrossRef]
  108. Liu, Y.; Yu, Y.; Tao, H.; Ye, Z.; Wang, S.; Li, H.; Hu, D.; Zhou, Z.; Zeng, L.L. Cognitive Load Prediction from Multimodal Physiological Signals using Multiview Learning. IEEE J. Biomed. Health Informat. 2023, 1–11. [Google Scholar] [CrossRef]
  109. Lobo, J.L.; Ser, J.D.; De Simone, F.; Presta, R.; Collina, S.; Moravek, Z. Cognitive Workload Classification Using Eye-Tracking and EEG Data. In Proceedings of the International Conference on Human-Computer Interaction in Aerospace, Paris, France, 14–16 September 2016. HCI-Aero ’16. [Google Scholar] [CrossRef]
  110. Luong, T.; Holz, C. Characterizing Physiological Responses to Fear, Frustration, and Insight in Virtual Reality. IEEE Trans. Vis. Comput. Graph. 2022, 28, 3917–3927. [Google Scholar] [CrossRef] [PubMed]
  111. Luong, T.; Martin, N.; Raison, A.; Argelaguet, F.; Diverrez, J.M.; Lecuyer, A. Towards Real-Time Recognition of Users Mental Workload Using Integrated Physiological Sensors into a VR HMD. In Proceedings of the 2020 IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2020, Virtual, 9–13 November 2020; pp. 425–437. [Google Scholar] [CrossRef]
  112. Ma, R.X.; Yan, X.; Liu, Y.Z.; Li, H.L.; Lu, B.L. Sex Difference in Emotion Recognition under Sleep Deprivation: Evidence from EEG and Eye-tracking. In Proceedings of the 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Virtual, 1–5 November 2021; pp. 6449–6452. [Google Scholar] [CrossRef]
  113. Mills, C.; Bixler, R.; Wang, X.; D’Mello, S.K. Automatic gaze-based detection of mind wandering during narrative film comprehension. In Proceedings of the 9th International Conference on Educational Data Mining, EDM 2016, Raleigh, NC, USA, 29 June–2 July 2016; pp. 30–37. [Google Scholar]
  114. Ren, P.; Ma, X.; Lai, W.; Zhang, M.; Liu, S.; Wang, Y.; Li, M.; Ma, D.; Dong, Y.; He, Y.; et al. Comparison of the Use of Blink Rate and Blink Rate Variability for Mental State Recognition. IEEE Trans. Neural Syst. Rehabil. Eng. 2019, 27, 867–875. [Google Scholar] [CrossRef]
  115. Tao, L.Y.; Lu, B.L. Emotion Recognition under Sleep Deprivation Using a Multimodal Residual LSTM Network. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar] [CrossRef]
  116. Wang, R.; Amadori, P.V.; Demiris, Y. Real-Time Workload Classification during Driving using HyperNetworks. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 3060–3065. [Google Scholar] [CrossRef]
  117. Wu, C.; Cha, J.; Sulek, J.; Zhou, T.; Sundaram, C.P.; Wachs, J.; Yu, D. Eye-Tracking Metrics Predict Perceived Workload in Robotic Surgical Skills Training. Hum. Factors J. Hum. Factors Ergon. Soc. 2020, 62, 1365–1386. [Google Scholar] [CrossRef]
  118. Yang, H.; Wu, J.; Hu, Z.; Lv, C. Real-Time Driver Cognitive Workload Recognition: Attention-Enabled Learning With Multimodal Information Fusion. IEEE Trans. Ind. Electron. 2024, 71, 4999–5009. [Google Scholar] [CrossRef]
  119. Zhai, J.; Barreto, A. Stress Detection in Computer Users Based on Digital Signal Processing of Noninvasive Physiological Variables. In Proceedings of the 2006 International Conference of the IEEE Engineering in Medicine and Biology Society, New York, NY, USA, 30 August–3 September 2006; pp. 1355–1358. [Google Scholar] [CrossRef]
  120. Zhang, T.; El Ali, A.; Wang, C.; Zhu, X.; Cesar, P. CorrFeat: Correlation-based Feature Extraction Algorithm using Skin Conductance and Pupil Diameter for Emotion Recognition. In Proceedings of the 2019 International Conference on Multimodal Interaction, Suzhou, China, 14–18 October 2019; ICMI ’19. pp. 404–408. [Google Scholar] [CrossRef]
  121. Li, T.H.; Liu, W.; Zheng, W.L.; Lu, B.L. Classification of Five Emotions from EEG and Eye Movement Signals: Discrimination Ability and Stability over Time. In Proceedings of the International IEEE/EMBS Conference on Neural Engineering, NER, San Francisco, CA, USA, 20–23 March 2019; pp. 607–610. [Google Scholar] [CrossRef]
  122. Zhong, X.; Hou, W. A Method for Classifying Cognitive Load of Visual Tasks Based on Eye Tracking Features. In Proceedings of the 2023 9th International Conference on Virtual Reality (ICVR), Xianyang, China, 12–14 May 2023; pp. 131–138. [Google Scholar] [CrossRef]
Figure 1. Filtering process following Page et al. [30] to select papers included for this review.
Figure 2. Number of articles about eye-based recognition of user traits and states accumulated per year.
Figure 3. Framework of eye-based recognition of user traits and states.
Table 1. Inclusion criteria.
No. | Inclusion Criterion Description
1 | Applied a high-quality video-based eye-tracking device to collect eye data.
2 | Investigated a user trait or state with eye data.
3 | Leveraged eye-tracking data collected within an experimental study.
4 | Applied an advanced algorithm that leveraged a machine learning or deep learning approach to recognize a user state or trait.
5 | Publication should be available in English.
Table 2. Future research directions of eye-based user state and trait recognition.
Dimension | Suggestions
Task
  • Diversify studied tasks to generalize eye-based user trait and state models more independently from the specific task and to support more everyday life tasks.
Context
  • Diversify investigated contexts, e.g., smaller and bigger screen sizes, wild/natural environment of users (work, home, university, school), or MR/AR/VR contexts.
Technology and data processing
  • Follow up on multi-modal approaches for trait and state recognition (e.g., eye tracking combined with ECG, EEG, webcam, speech, or interaction data).
  • Build up best practices by sharing datasets and machine learning pipelines for eye-based user state and trait model development (an illustrative pipeline sketch follows this table).
Recognition target
  • Eye-based recognition of team/group-level constructs.
  • Simultaneous and real-time recognition of user states and traits.
General suggestions
  • Build gaze-adaptive systems that leverage real-time eye-based recognition.
  • Design and evaluate different adaptation types for gaze-adaptive systems.
  • Conduct longitudinal and large-scale studies to strengthen the generalizability of eye-based user models.
  • Perform research on privacy-aware eye-based user state and trait recognition and gaze-adaptive systems.
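To make the suggestion about shared machine learning pipelines more concrete, the following minimal sketch illustrates what a person-independent evaluation of an eye-feature-based classifier could look like in Python with scikit-learn. It is not taken from any of the reviewed studies; the feature set, labels, and data are synthetic placeholders, and the SVM with participant-grouped cross-validation is just one plausible choice among the techniques reported in the literature.

```python
# Illustrative sketch only: a minimal eye-feature classification pipeline with a
# person-independent (participant-grouped) evaluation. All data are synthetic.
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Hypothetical per-trial eye-tracking features: mean fixation duration (ms),
# fixation count, mean saccade amplitude (deg), mean pupil diameter (mm).
n_trials, n_participants = 200, 20
X = np.column_stack([
    rng.normal(250, 50, n_trials),   # mean fixation duration
    rng.poisson(30, n_trials),       # fixation count
    rng.normal(4.0, 1.0, n_trials),  # mean saccade amplitude
    rng.normal(3.5, 0.4, n_trials),  # mean pupil diameter
])
y = rng.integers(0, 2, n_trials)                    # e.g., low vs. high workload label
groups = rng.integers(0, n_participants, n_trials)  # participant ID per trial

# Standardize features and classify with an RBF-kernel SVM; folds never mix
# trials from the same participant, i.e., a person-independent evaluation.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, groups=groups, cv=GroupKFold(n_splits=5))
print(f"Person-independent accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

Sharing such a pipeline together with the underlying dataset (feature definitions, preprocessing steps, and evaluation protocol) would make results across tasks, contexts, and recognition targets easier to compare.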