Using Synchronized Eye Movements to Predict Attention in Online Video Learning
Educ. Sci. 2024, 14, 548

Concerns persist about attentional engagement in online learning. The inter-subject correlation of eye movements (ISC) has shown promise as an accessible and effective method for attention assessment in online learning. This study extends previous studies investigating ISC of eye movements in online learning by addressing two research questions. First, can ISC predict students’ attentional states at a finer level than a simple dichotomy (e.g., attending versus distracted)? Second, do learners’ learning styles affect ISC’s prediction rate of attention assessment in video learning? Previous studies have shown that learners with different learning styles have different eye movement patterns when viewing static materials. However, limited research has explored the impact of learning styles on viewing patterns in video learning. An eye tracking experiment with participants watching lecture videos demonstrated a connection between ISC and self-reported attention states at a finer level. We also demonstrated that learning styles did not significantly affect ISC’s prediction rate of attention assessment in video learning, suggesting that ISC of eye movements can be effectively used without considering learners’ learning styles. These findings contribute to the ongoing discourse on optimizing attention assessment in the evolving landscape of online education.


Introduction

Online Learning and Attention Assessment
Online learning, or e-learning, has revolutionized education by increasing flexibility, accessibility, and inclusivity. The pervasive influence of technology has significantly contributed to the widespread adoption of online learning, especially during the COVID-19 pandemic, when educational institutions quickly transitioned to digital platforms [1]. Despite its numerous advantages, concerns have been raised regarding attentional engagement in online learning.
Attentional engagement refers to directing and maintaining attention on a task, stimulus, or activity, involving cognitive resource allocation and sustaining focus over time [2]. Attentional engagement, or the lack of it, has been shown to be highly correlated with learning performance [3][4][5]. In traditional classroom settings, teachers can monitor students' attention in real time and make timely adjustments to their pedagogical strategies to attract students' attention. However, in the context of online learning, especially asynchronous online learning (e.g., MOOCs and recorded lectures), educators lack the immediate feedback needed to gauge and adapt to students' attention levels. Consequently, there is a pressing need for a real-time and effective attention assessment method to address this pedagogical gap. Researchers have used various methods to tackle this challenge. For example, some researchers used wearable devices such as headsets to detect attention-related brain signals [6][7][8] or wrist devices to assess students' attention levels via photoplethysmogram signals [9]. Other researchers developed attention-monitoring algorithms based on multimodal data including facial expressions, eye movements, and behaviors [10,11]. Notably, recent work by Madsen and colleagues has demonstrated that the inter-subject correlation of eye movements (ISC), specifically synchronized eye movements across subjects, tracked through webcams, can effectively predict attentional levels in the context of video learning [2]. This method is particularly promising because it requires no additional specialized devices and is easy to compute, making ISC of eye movements a viable solution to the attention assessment problem in online learning.

Eye Tracking and Synchronized Eye Movements
Eye tracking technology, which captures information by tracking eye movement trajectories and pupil sizes, serves as a valuable tool for revealing cognitive and perceptual abilities during information processing. The application of eye tracking technology in online learning has garnered increasing attention from researchers in recent decades, with a notable focus on attention assessment (for reviews, see refs. [12,13]). Attention, a critical component in the learning process, is intricately linked to eye movements, gaze direction, and visual fixation [14][15][16]. Previous studies typically examined the directions in which learners gaze at learning materials or teachers' instructions, as well as the time they dwell on learning materials. However, this method requires meticulous content analysis and is not easily applicable to routine evaluation of individual students. A more efficient and convenient index is needed.
Building on the observed high correlation of eye movements across subjects during video presentations [17,18], Madsen et al. [2] hypothesized that online instructional videos synchronize eye movements across students, but that the level of synchrony depends on whether students are paying attention. Using a research-grade eye tracker and a standard webcam, they demonstrated that participants in an attentive condition exhibited significantly higher ISC of eye movements than those in a distracted condition. They also found that ISC of eye movements effectively predicted students' performance in exams. Liu et al. [19] replicated the results.
While the studies by Madsen et al. [2] and Liu et al. [19] focused on binary states of attention (attending and distracted), it is acknowledged that attention operates on a continuum, fluctuating between optimal and suboptimal states from moment to moment [20]. The question of whether ISC of eye movements can reliably predict learners' attention at a more nuanced level remains unexplored.

Learning Styles and Eye Movements
Although ISC of eye movements serves as a valid indicator of attentional engagement, many factors may influence the synchronization of eye movements across subjects. For example, individual differences in learning styles present a pertinent consideration, as elucidated by the well-known Felder-Silverman learning style model [21]. This model considers learning in a structured educational setting as a two-step process involving the reception and processing of information. The model classifies students according to where they fit on several scales describing the ways they receive and process information. Dimensions of learning styles include Sensing/Intuitive, Visual/Verbal, Active/Reflective, and Sequential/Global.

• The Sensing/Intuitive continuum refers to the way individuals prefer to take in information. Sensing learners prefer concrete and practical information, relying on facts and details. Intuitive learners prefer conceptual and innovative information, concerned with theories and meanings.
• The Visual/Verbal continuum describes how learners prefer to receive information. Visual learners learn best through visual aids such as diagrams, charts, and graphs. Verbal learners prefer written and spoken explanations.
• The Active/Reflective continuum refers to the way individuals prefer to process information. Active learners prefer to learn through doing or discussing, while reflective learners prefer to think about and consider information before acting on it.
• The Sequential/Global continuum determines how learners prefer to organize and progress toward understanding information. Sequential learners prefer organized and linear presentations, while global learners prefer a broader context and want to see the overall structure before focusing on details.
Research has consistently demonstrated that learners with distinct learning styles (or cognitive styles) exhibit divergent viewing patterns when engaging with learning materials, as summarized in Nugrahaningsih et al. [22]. For example, Al-Wabil et al. [23] observed that visual learners, as classified by Felder and Silverman's learning style model, focused more on the multimedia areas, while verbal learners focused more on the text areas. Mehigan et al. [24] also showed that visual learners made more fixations on the graphic slide area than verbal learners (no statistical significance test was conducted). Luo [25] tracked learners' eye movements while they studied different learning objects and showed that (1) compared with active learners, reflective learners spend more time on example areas; (2) compared with intuitive learners, sensing learners spend more time on reading; (3) visual learners mainly fixated on images, while verbal learners fixated on words more often; and (4) sequential learners skip fewer learning objects and spend less time on the navigation pane.
However, most prior studies focused on static learning materials, such as slides or web pages, leaving a notable gap in our understanding of how learning styles affect viewing patterns in video learning. Only a limited number of studies have used videos as stimuli for eye-tracking experiments [26,27]. For example, Cao and Nishihara [26] conducted an eye-tracking experiment using slide videos. Results showed that the mean viewing time on picture parts was longer for the visual group than for the intermediate group, but the difference in mean viewing time on each slide was not significant. Importantly, the mean viewing time for text parts on each slide was longer for the strong visual group than for the intermediate visual group, contrary to the hypothesis. They also found that global learners tended to have shorter fixation durations and moved their eyes faster and over larger angles than sequential learners, but the difference was again not significant. Therefore, we do not have enough information on how learning styles affect viewing patterns in video learning.
On the one hand, learners with different learning styles may also have different viewing patterns in video learning. If so, categorizing learners based on their learning styles and calculating ISC within homogeneous groups may improve ISC's predictive accuracy for attention assessment. In other words, it would reduce interference from the dissimilar eye movement patterns of students with distinct learning styles, ultimately increasing prediction efficiency. On the other hand, learning styles may not significantly affect learners' viewing patterns in video learning, in which case ISC of eye movements can be effectively used without considering learners' learning styles. Unlike textual learning materials, lecture videos usually do not present much content on a single frame or slide. Instead, the content is broken down into several parts and presented sequentially [28], which limits complicated viewing paths. Also, lecture videos are typically designed with cues such as animation or teacher gestures to guide students' attention [29,30]. Therefore, students' viewing paths are relatively clear and consistent, unlike those for textual learning materials, which produce different viewing paths. Empirical studies are indispensable to test these contrasting hypotheses and advance our understanding of the interplay between learning styles and eye movement patterns in the dynamic context of video-based learning.

The Current Study
To further explore the ISC of eye movements in assessing attention states in online learning, we conducted an eye tracking experiment examining learners' attention in a video-viewing context. Specifically, we aimed to answer two research questions.

RQ 1: Can ISC of eye movements reliably predict learners' academic performance as well as their attention at a more nuanced level?
We followed Madsen et al. [2] to evaluate the validity and stability of ISC as an indicator for assessing learners' academic performance as well as their attentional states in a video learning context. More importantly, in terms of attentional states, we extended our inquiry beyond the binary attending and distracted conditions to incorporate intermediate levels of attention. To achieve this, subjects were asked to self-report their attention levels on a scale from 1 to 9 after watching each video. Subsequently, we conducted correlation analyses to test the significance of the relationship between ISC of eye movements and reported attention levels, with the objective of determining whether ISC can effectively predict attention across a spectrum of attentional states.

RQ 2: Do learners' learning styles affect ISC's prediction rate of attention assessment in video learning?
Expanding our exploration into the practical application of ISC in attention assessment, we considered the potential impact of learners' individual learning styles. Specifically, we asked subjects to fill out the Felder-Silverman learning style questionnaire to identify their learning style. We then correlated learners' attention levels with ISC of eye movements calculated within each learning style subgroup. Importantly, we compared the two correlation coefficients (the correlations between attention levels and ISC_styles or ISC_all), seeking insights into whether differentiating ISC based on learning styles affects ISC's prediction rate of attention assessment. This comparison served as a crucial step in determining whether ISC of eye movements can be effectively employed without considering learners' individual learning styles or whether calculating ISC within distinct learner groups would yield improved prediction rates.

Method

Participants
Thirty participants took part in the experiment. One participant was excluded due to poor data quality, resulting in 29 participants (14 females; age range 18 to 27 years, M = 22.31, SD = 2.33). The sample size was chosen based on previous eye-tracking studies of learning [19,25]. None of the participants reported a history of neurological or psychiatric disorders. All included participants had normal or corrected-to-normal vision. Written informed consent was obtained before the experiment, following the Declaration of Helsinki. Participants were remunerated for their time. The ethics committee of the Education Department at XX University approved the study.

Stimuli and Procedure
The video stimuli for the eye-tracking experiment were four videos from the MOOC course "Digital Photography Fundamentals". The duration of the videos ranged from 8.5 min to 13.5 min, with a mean of 10 min. The video style was the classic "Presenter and animation" format, which shows a presenter alongside text and picture information. The video materials contained content and elements suited to different learning styles. For example, the videos included not only theories about the relationship between aperture, shutter, and ISO, but also illustrative examples and concrete data to demonstrate the relationship. Diagrams and charts as well as text were integrated to cater to varied learning styles. Stimuli were presented on a 27-inch LCD monitor with a screen refresh rate of 60 Hz and a resolution of 1920 × 1080 pixels. A chin rest was used to stabilize the participant's head throughout the experiment. The experiment was programmed using PsychoPy 2022.2.1.
The participants first filled out the Felder-Silverman learning style questionnaire to identify their learning styles (Figure 1). They were then asked to answer 20 questions (16 four-alternative forced-choice and 4 multiple-choice questions) to test their prior knowledge of photography. After the pre-test questions were answered, the eye-tracking experiment started. Eye positions were monitored using the monocular eye-tracking system EyeLink 1000 (SR Research, Mississauga, ON, Canada) at a sampling rate of 1000 Hz. Eye movement data were recorded from the left eye. A nine-point calibration and validation procedure was performed to map the eye positions to the screen coordinates. Drift correction was performed before each video. Participants watched the four videos in the order of the original MOOC course while their eye movements were tracked. They rated their attention level on a scale from 1 to 9 after watching each video. After watching all the videos, they answered the 20 questions again to assess their learning outcomes. They were then asked to watch the first video again, but in a distracted condition. In this condition, participants counted backward silently in their minds, from a randomly chosen number between 800 and 1000, in decrements of 7 [2]. This task distracted the subjects from the stimulus without requiring overt responses. They had to report the final number when the video finished.

Data Analyses

Preprocessing of Eye Movement Data
EDF (EyeLink Data Format) files were converted to ASCII files for later analyses. Horizontal and vertical coordinates, as well as pupil sizes, were extracted from the first frame to the last frame of the video. Blinks were detected using the SR Research blink detection algorithm. The blinks, together with the 100 ms before and after each blink, were filled with linearly interpolated values for horizontal and vertical coordinates as well as pupil sizes.

Intersubject Correlation of Eye Movement Data
Intersubject correlation of eye movements was calculated using the method from Madsen et al. [2]. Specifically, we first computed Pearson's correlation coefficient between a single participant's vertical coordinates and those of every other participant. Second, we calculated a single ISC value for that participant by averaging the correlation values between that participant and all other participants. Third, we repeated steps 1 and 2 for all participants, resulting in a single ISC value for each participant. We repeated these three steps for the horizontal coordinates and pupil size. Finally, for each participant, we averaged ISC_vertical and ISC_horizontal to obtain a single value representing the ISC of eye movements, or averaged all three values (i.e., ISC_vertical, ISC_horizontal, and ISC_pupil) to represent the ISC of eye movements and pupil sizes. The ISC values for the attending and distracted conditions were computed on the data for the two conditions separately.
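The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names (`pearson`, `isc_per_participant`, `isc_eye_movements`) and the toy gaze traces are hypothetical, and the sketch assumes that every participant's time series has the same length and has already been blink-interpolated.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def isc_per_participant(signals):
    """signals: one time series per participant (all the same length).

    For each participant, average the correlations with every other
    participant (steps 1 to 3 in the text)."""
    n = len(signals)
    return [
        sum(pearson(signals[i], signals[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def isc_eye_movements(vertical, horizontal):
    """Average the vertical and horizontal ISC for each participant."""
    isc_v = isc_per_participant(vertical)
    isc_h = isc_per_participant(horizontal)
    return [(v + h) / 2 for v, h in zip(isc_v, isc_h)]


# Toy data (hypothetical): 3 participants, 5 gaze samples each.
vertical = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 4, 3, 2, 1]]
horizontal = [[1, 3, 2, 5, 4], [1, 3, 2, 5, 4], [2, 6, 4, 10, 8]]
isc = isc_eye_movements(vertical, horizontal)
```

In the toy data, the third participant's vertical trace runs opposite to the other two, so that participant ends up with the lowest ISC. Adding pupil size would amount to a third call to `isc_per_participant` and averaging three values instead of two.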

Learning Style Questionnaire
The Felder-Silverman learning style questionnaire (FSLSQ) assesses participants' learning styles across four main categories: (1) Active/Reflective, (2) Sensing/Intuitive, (3) Visual/Verbal, and (4) Sequential/Global. There are 11 questions in each category, each with two possible answers, a or b. Answer a corresponds to a preference for the first type in each category (Active, Sensing, Visual, or Sequential), and answer b to the second type (Reflective, Intuitive, Verbal, or Global). When a question is answered with, for instance, an active preference, 1 point is added to the value of the type Active. We then calculated the learning style results by subtracting the points of the second type from those of the first type (e.g., 7 Active − 4 Reflective = 3 Active). According to Felder and Soloman, the interpretation of the score is as follows:

• If the score for a dimension is 1 or 3, learners are fairly well balanced on the two categories of that dimension, with only a mild preference for one or the other.
• If the score for a dimension is 5 or 7, learners have a moderate preference for one category of that dimension. Learners may learn less easily in an environment that fails to address that preference at least some of the time than they would in a more balanced environment.
• If the score for a dimension is 9 or 11, learners have a strong preference for one category of that dimension. Learners may have difficulty learning in an environment that fails to address that preference at least some of the time.
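The scoring rule can be illustrated with a short Python sketch. The function name `score_dimension` and the labels "balanced", "moderate", and "strong" are hypothetical shorthand for the three interpretation bands above; the sketch assumes one list of 11 answers ("a" or "b") per dimension.

```python
def score_dimension(answers):
    """Score one FSLSQ dimension from its 11 answers ('a' or 'b').

    Returns (difference, strength). A positive difference means a preference
    for the first type (e.g., Active); a negative one, for the second type
    (e.g., Reflective)."""
    if len(answers) != 11 or any(ans not in ("a", "b") for ans in answers):
        raise ValueError("expected 11 answers, each 'a' or 'b'")
    a_points = answers.count("a")
    b_points = len(answers) - a_points
    diff = a_points - b_points  # always odd, between -11 and 11
    magnitude = abs(diff)
    if magnitude <= 3:
        strength = "balanced"   # score of 1 or 3
    elif magnitude <= 7:
        strength = "moderate"   # score of 5 or 7
    else:
        strength = "strong"     # score of 9 or 11
    return diff, strength


# The example from the text: 7 Active answers and 4 Reflective answers.
example = score_dimension(["a"] * 7 + ["b"] * 4)
```

Because each dimension has an odd number of items, the difference can never be zero, so every participant shows at least a mild preference on every dimension.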

Scores of the Pretest and Posttest
We first performed a one-sample t-test to examine the prior knowledge level of participants. Results showed that the pretest scores (M: 4.76 ± 1.62) were not significantly different from 5 (the score expected from guessing on 20 items with a 25% chance of being right), t(28) = −0.80, p = 0.429, 95% CI = [−0.53, 0.23], suggesting that participants were naïve to the learning material. A paired t-test of the pretest and posttest showed that the posttest scores (M: 15.59 ± 2.34) were significantly higher than the pretest scores, t(28) = 19.59, p < 0.001, 95% CI = [9.70, 11.96], Cohen's d = 3.64, suggesting that learning took place during the experiment.
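As a quick arithmetic check, the one-sample t statistic and the effect size can be recomputed from the reported summary values. This sketch rests on two assumptions that the text does not state explicitly: that the value after "±" is a standard deviation, and that Cohen's d for the paired test was computed as t divided by the square root of n (the mean difference over the standard deviation of the differences). The variable names are illustrative.

```python
from math import sqrt

n = 29                           # participants included in the analysis
pre_mean, pre_sd = 4.76, 1.62    # reported pretest summary (SD assumed)
chance_score = 20 * 0.25         # expected score from guessing on 20 items

# One-sample t statistic against the chance score.
t_one_sample = (pre_mean - chance_score) / (pre_sd / sqrt(n))

# Paired-samples effect size, assuming d = t / sqrt(n).
t_paired = 19.59
d_paired = t_paired / sqrt(n)

print(round(t_one_sample, 2))  # -0.8
print(round(d_paired, 2))      # 3.64
```

Both values agree with the reported t(28) = −0.80 and Cohen's d = 3.64 under these assumptions.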

ISC Prediction on Test Score and Attention Level
To test whether correlated eye movements predict the test score and the attention level, we performed correlation analyses. The Pearson correlation coefficient showed that ISC of eye movements was not significantly correlated with test scores (r = 0.09, p = 0.65, Figure 3A). However, when we correlated the self-reported attention level (from 1 to 9) after each video with ISC values, Spearman correlation analysis showed a significant correlation (r = 0.57, p < 0.001, Figure 3B). These results suggest that ISC of eye movements predicts the attention level of the learners but not necessarily the test scores. The same holds for the ISC of eye movements and pupil sizes (r_scores = 0.12, p = 0.54; r_attention = 0.59, p < 0.001).

ISC of Different Learning Styles
We set the score difference threshold to 3, which means that participants with a score greater than or equal to 3 were assigned to the corresponding learning style. The distribution of the participants' learning styles is listed in Table 1. To investigate whether categorizing learners by learning style changes ISC's prediction rate of learners' attention, we recalculated ISC within each learning style. Specifically, we first computed Pearson's correlation coefficient between a single participant's vertical or horizontal coordinates, or pupil sizes, and those of all other participants who belonged to the same learning style. We then calculated a single ISC value for that participant by averaging the correlation values between that participant and all other participants of the same learning style. The rest of the steps remained unchanged. In other words, we used information only from the participants who belonged to the same learning style to compute ISC_styles, whereas the conventional ISC_all uses information from all participants regardless of their learning styles. We performed Spearman correlation analyses between ISC_styles and attention levels. The r coefficients for the different learning styles are listed in Table 2.
To test whether ISC_styles is significantly different from ISC_all, we performed a bootstrap analysis [31]. Specifically, we randomly sampled the ISC_styles and ISC_all datasets with replacement, each time drawing a new sample of the same size as the original sample. We then calculated the difference between the two new ISC_styles and ISC_all samples, repeating this 5000 times. This allowed us to construct the 95% confidence interval of the difference. We then determined the significance of the difference based on the position of 0 …
… significantly correlated with test scores. Another study that used a webcam-based eye tracking system also failed to replicate the relationship between ISC of eye movements and learning performance [32]. Several factors may contribute to these discrepancies, including variations in learning materials and exam difficulty. For instance, Madsen et al. [2] used short videos from YouTube channels, while Liu et al. [19] and our study employed lecture videos from MOOCs, which typically follow a different structure and have longer durations. Sauter et al. [32] used recorded conference videos. The differences in learning materials and exam characteristics highlight the need for additional research to thoroughly investigate and confirm the relationship between ISC of eye movements and test scores. These contextual factors must be considered when judging the generalizability and robustness of the observed correlations, which calls for further exploration in diverse educational settings.
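Returning to the bootstrap comparison of ISC_styles and ISC_all described above, the following Python sketch shows one way such a percentile bootstrap could be set up. It is an illustration under stated assumptions rather than the authors' procedure: it assumes that the quantity compared is the Spearman correlation of each ISC measure with the attention ratings, that participants are resampled as paired rows, and that resamples in which a correlation is undefined are skipped. All function names and the toy data are hypothetical.

```python
import random


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation, or None if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def bootstrap_diff_ci(isc_styles, isc_all, attention, n_boot=5000, seed=0):
    """95% percentile interval of r(ISC_styles) - r(ISC_all)."""
    rng = random.Random(seed)
    n = len(attention)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        att = [attention[i] for i in idx]
        r_styles = spearman([isc_styles[i] for i in idx], att)
        r_all = spearman([isc_all[i] for i in idx], att)
        if r_styles is None or r_all is None:
            continue  # correlation undefined for this resample; skip it
        diffs.append(r_styles - r_all)
    diffs.sort()
    lower = diffs[int(0.025 * len(diffs))]
    upper = diffs[max(int(0.975 * len(diffs)) - 1, 0)]
    return lower, upper


# Toy data (hypothetical values for 8 observations).
attention = [3, 5, 6, 7, 8, 4, 9, 6]
isc_all = [0.10, 0.22, 0.25, 0.31, 0.35, 0.18, 0.40, 0.27]
isc_styles = [0.12, 0.20, 0.27, 0.30, 0.36, 0.15, 0.41, 0.24]
lower, upper = bootstrap_diff_ci(isc_styles, isc_all, attention, n_boot=1000)
contains_zero = lower <= 0 <= upper
```

If the interval contains 0, the two correlations are not distinguishable at the 5% level under this scheme, which is the decision rule the text describes in terms of the position of 0.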

Learners' Learning Styles Do Not Affect ISC's Prediction Rate of Attention Assessment in Video Learning
Importantly, we tested whether learners' learning styles affect ISC's prediction rate of attention assessment in video learning. The outcomes revealed that distinguishing among learners' learning styles did not yield notable changes in the effectiveness of ISC in predicting attention. As highlighted in our introduction, lecture videos commonly employ strategies such as animation and teachers' gestures to guide viewer attention, potentially resulting in more consistent viewing patterns. This finding aligns with the work of Mu et al. [33], who explored learners' attention preferences in the context of online learning. Their study, conducted during video learning, found no significant differences in attention preferences among students with varying visual-verbal preferences.
To further explore this finding, we conducted a focused analysis of three segments of the lecture video stimuli, lasting around 200 s in total (mean duration 65.85 s), in which the content remained relatively static. The information presented in these segments contained images and text at the same time. We investigated the dwell time of visual and verbal learners on image and text content. We found that visual learners spent significantly more time on images (mean ± SD: 30.71 s ± 10.12) than on textual information (23.71 s ± 11.62), t(44) = 2.27, p = 0.03, while verbal learners spent more time on text (33.95 s ± 12.01) than on image information (23.65 s ± 10.22), t(14) = 2.25, p = 0.04. This implies that learners with distinct learning styles do exhibit divergent viewing patterns and preferences when confronted with static materials. However, these distinctions seem less pronounced in the context of lecture videos, where the dynamic presentation and additional guiding elements might contribute to more uniform viewing behaviors.
The current study has a few limitations. First, the distribution of learning styles among the participants is not balanced. Active, verbal, sensing, and sequential learners are notably less well represented than their counterparts, introducing potential bias into the results. Future studies should aim for a more balanced representation to ensure a comprehensive understanding of the relationship between learning styles and eye movements. A second limitation is that we examined only one style of lecture video, namely "Presenter and animation". Generalizing the findings to other lecture video formats, such as presenter and glass board or recorded lectures, may not be warranted. Future investigations should cover a broader spectrum of video formats and more diverse subjects to enhance the external validity of the findings. A third limitation relates to ISC itself. If we use ISC of eye movements to assess attention, we need to guarantee that the content presented to learners is synchronized. While this may not be a problem in synchronous online learning, in asynchronous learning it would mean that learners could not pause or replay a section they did not understand before continuing, even though this ability is an important advantage of asynchronous learning.
In summary, the current study verifies and extends previous studies by demonstrating that ISC of eye movements can reliably predict learners' attention at a more nuanced level.

Figure 2. Eye movement data from two subjects in horizontal and vertical direction and pupil size in attending and distracted conditions.

Table 1. Distribution of the participants' learning style.

Table 2. Correlation coefficients between ISC_styles and attention levels.