Eye Movement Analysis of Digital Learning Content for Educational Innovation

Abstract: Eye movement technology is highly valued for evaluating and improving digital learning content. In this paper, an educational innovation study of eye movement behaviors on digital learning content is presented. We propose three new eye movement metrics to explain eye movement behaviors. In the proposed method, the digital content, which consisted of slide-deck-like works, was classified into page categories according to the characteristics of each page. We interpreted the subjects' eye movement behaviors on the digital slide decks. After data regularization and filtering, the results were analyzed to give directions for how to design attractive digital learning content from the viewpoint of eye movement behaviors. The relationships between the subjects' evaluation scores, page categories, and eye movement metrics are discussed. The results demonstrated that the proposed fixation time percentage (FTP) was a representative, strong, and stable eye movement metric to measure the subjects' interest. Moreover, a reasonable portion of semantic content had a positive influence on the subjects' interest.


Introduction
Eye movement offers an intuitive way to inspect what attracts human attention through gaze fixation, and it has been a popular topic in human-computer interaction research for decades. McCormick and Sanders [1] indicated that 80% of human cognitive information processing occurs through vision. Recently, with the popularity of low-cost eye trackers and open source eye movement analysis software, eye movement data can be obtained more precisely and easily, which is one of the significant areas of progress in eye movement and concentration research [2]. A large number of studies have focused on using eye movement to evaluate the usability of systems, study interface design, and implement remote manipulation applications for people with disabilities [3][4][5].
Currently, with the rapid development and widespread reach of digital content, reading habits have gradually shifted from paper books to electronic books. More and more schools have begun to adopt digital learning courses [6]. Thus, studies on reading habits through eye movement have drawn attention from educators and human behavior researchers. For educators in the area of educational innovation, eye movement data offer an excellent opportunity to evaluate and improve digital learning content so that it attracts more interest from readers [7][8][9]. Therefore, eye tracking technology is highly valued in e-learning and educational innovation [10,11]. Research on web viewing has identified an attention pattern [12], with results indicating that the attention models for semantic and graphic content are quite different. Several studies [13,14] also indicated that font size and style have a major influence on human attention.
Eye tracking analysis plays a crucial role in human behavior research, and there are still many questions about the relationship between the degree of interest and the eye movement behaviors.
An ideal and precise model can draw viewers' interest or attention towards a simple sentence, or even a formula [15]. For educators or designers, it is very convenient to evaluate how attractive an e-book or a digital learning slide deck is using metrics. Furthermore, researchers can utilize such models to improve their design and layout, to obtain more attention. Unfortunately, there is currently no simple and stable coefficient to measure or predict the degree of interest in a digital learning area. Though there are many studies on eye movement and degree of interest [16], the behaviors change with the viewing environment, which means that we cannot simply consider every viewing behavior as being directly comparable.
While studying digital learning content, we found that the pages or slide decks have their own unique features, correlated with the eye movement behaviors of the digital learning process. Therefore, in this paper, a study on eye movement behaviors while reading digital learning content is presented to reveal the regular pattern of eye viewing on different types of slides. Three new eye movement metrics are proposed to explain eye movement behaviors: Fixation time percentage (FTP) represents the percentage of the total fixation time in the whole reading process; Mean time to fixation (MTTF) represents the average time to the next fixation to be generated; Spatial diversity of fixations (SDOF) represents the average distance from each fixation point to the mean point in a certain page of a slide deck. The digital content, which was a slide-like work edited online with the software Microsoft Sway, was classified according to the characteristics of each page. The experiment was carried out to interpret the eye movement behaviors on the digital content, and the result was analyzed to give directions of how to design attractive digital learning content from the viewpoint of eye movement behaviors. Furthermore, this paper also attempts to find the relationship between the subjects' evaluation scores, page categories, and the proposed eye movement metrics, including: (1) The relationship between the eye movement indicators and scores. (2) The relationship between the page categories and the subjects' evaluation scores. (3) The relationship between the eye movement indicators and page categories.
The rest of this paper is organized as follows: Section 2 presents the related work on eye tracking. The proposed methods for analyzing eye movement data and the hypotheses about these statistics are described in Section 3. Section 4 presents the experimental results, the hypothesis evaluation, and the discussion of the relationship between subjects' evaluation scores, page categories, and the proposed eye movement metrics. Finally, the conclusions and the future work are presented in Section 5.

Related Work
Human eye movement and concentration have been studied for decades. Researchers record eye movement data with eye trackers; the data have two parts: records of the gaze positions and records of the gaze times. Fixation identification is the key technique that translates eye movement data into relevant human eye behaviors [17], such as fixations and saccades. Salvucci and Goldberg [18] proposed several algorithms to achieve fixation identification from different angles. The methods for fixation identification can roughly be divided into spatial methods and temporal methods. Spatial methods include velocity-based, dispersion-based, and area-based methods. All of them emphasize the distance moved during eye scanning, taking advantage of the fact that the eye moves a shorter distance during a fixation than during a saccade. On the other hand, in temporal methods, the duration-sensitive and locally adaptive methods are based on the fact that fixations are rarely shorter than 100 ms and often fall in the range of 200-400 ms, which also makes fixation identification successful.
These fixation identification methods can lead to similar results, providing a foundation for further eye movement analysis.
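As an illustration of the dispersion-based family mentioned above, the following is a minimal sketch of dispersion-threshold identification in the spirit of Salvucci and Goldberg [18]. It is not the fixation detector used by the recording system in this study. The sample layout `(time_ms, x_px, y_px)`, the function names, and the 35-pixel dispersion threshold are assumptions made for the example; only the 100 ms minimum duration is motivated by the text.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]           # (time_ms, x_px, y_px), assumed layout
Fixation = Tuple[float, float, float, float]  # (start_ms, duration_ms, x_px, y_px)


def dispersion(window: List[Sample]) -> float:
    """Dispersion of a window: (max x - min x) + (max y - min y)."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt_fixations(samples: List[Sample],
                  max_dispersion_px: float = 35.0,   # assumed threshold
                  min_duration_ms: float = 100.0) -> List[Fixation]:
    """Group time-ordered gaze samples into fixations by dispersion."""
    fixations: List[Fixation] = []
    i, n = 0, len(samples)
    while i < n:
        # Grow the initial window until it spans the minimum duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break  # remaining samples are too short to form a fixation
        if dispersion(samples[i:j + 1]) <= max_dispersion_px:
            # Extend the window while the points stay close together.
            while j + 1 < n and dispersion(samples[i:j + 2]) <= max_dispersion_px:
                j += 1
            window = samples[i:j + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append((window[0][0], window[-1][0] - window[0][0], cx, cy))
            i = j + 1
        else:
            i += 1  # saccade sample: slide the window start forward
    return fixations
```

Velocity-based identification would instead threshold the point-to-point speed; the dispersion variant is shown here only because it needs no sampling-rate assumption beyond the timestamps.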
Subsequently, many eye tracking studies paid attention to the area of interest (AOI) [2], which usually represents an area of a graphic or an object that attracts human concentration. AOI-based algorithms provide an effective method for translating raw data on subjects' concentration into analyzable numbers. While the AOI method has its own limitations [7], it is still a popular analytic tool for eye research. To interpret eye movement data more intuitively and deeply, several visual analytic methodologies are widely used [19,20]. The scan path allows us to determine how people redirect their attention between elements; duration plots and heat maps provide an intuitive way to understand which part of an image received the most attention; the cover plot highlights the image area covered by fixations. Furthermore, when working with large or highly variable data, we can classify the fixation data with a summary map of the time clusters to select the most suitable analytic metric [21]. These visualization modalities and other visual metrics provide observers with an opportunity to examine other aspects of information.
With the rapid development and wide spread of neural networks and machine learning, many researchers have attempted to analyze eye movements with learning-based methods [22,23] to improve object detection results by training the model with eye movement data. The features that can drive an observer's attention are classified into two levels. The low-level features include the luminance, color, edge, and density, whereas the high-level features, which can only be recognized in higher visual areas of the brain, include the content and semantic text in pictures. These two types of statistics are used in learning-based methods to construct the saliency model, which represents human attention more precisely than a heat map.
The fundamental premise of eye tracking analysis is that researchers must choose adequate metrics to measure the eye movement behavior. There are many visual-effort metrics that have been proposed, and Table 1 shows a list of metrics used in previous studies based on fixation [24]. Fixation duration (FD), fixation number (FN), and fixation average duration (FAD) are very common eye tracking metrics to measure the required time for analyzing and processing depth on the stimulus. The ratio of on-target: all-target fixation time (ROAFT) and the average duration of relevant fixations (ADRF) focus on relevant AOIs, which are used to measure efficiency while searching the stimulus. In order to further explain the degree of interest for the stimulus, we propose several metrics and hypotheses in this paper, attempting to understand the interest pattern and uncover the relationship between the degree of interest and eye movement behaviors.
Table 1 (surviving fragment). ADRF: the total duration of the fixations for relevant AOIs, divided by the total number of relevant AOIs.

Participants
The experiment was carried out in a senior high school located in northern Taiwan. Sixty-five first-grade students from three classes voluntarily participated in the study. All of the students were selected by a questionnaire that ensured they were used to reading digital learning content and had enough knowledge to evaluate the content of the experimental materials. To ensure accuracy in capturing the eye movement data, participants were limited to those with normal or fully corrected visual acuity. The effective sample comprised the students who completed the task and whose eye movement data were recorded successfully for analysis; 23 students' eye movement records were counted as invalid. Specifically, seven students' records failed due to human error: the recording system can only record data from one Chrome browser, but during the experiment the researcher activated more than one Chrome browser at a time, which led to an error in the URL event record. Five students' records were discarded due to data loss caused by the recording system. The remaining 11 records failed for unknown reasons. Thus, the total effective sample size of this study was 42. The average age of the sample was 15.8 years (SD = 0.42).

The Tasks and Materials
In our experiments, the students were asked to read and evaluate the selected materials independently and to submit their assessment. At the same time, their eye movement behaviors were recorded through eye trackers.
The experiment materials were from a previous work [25], where 20 students participated in re-editing a section of the textbook in an online web environment. They were provided with a textbook (a pdf file) and were asked to search for relevant multimedia content to complete their editing using Microsoft Sway. Each student was required to complete a slide deck of 8 to 12 pages. We successfully collected the work of 20 students, that is, 20 online slide decks. These works were page-turned, which allowed us to study the eye movement behavior page by page. Among them, we further selected six slide decks as experimental materials. To identify the quality of the work, two experts rated each work based on an assessment scale that is the same as the student assessment scale defined in Section 3.4.
Finally, six different Microsoft Sway slide decks were selected as learning content for the students to read in this experiment. Two of them had high evaluation scores, two had medium evaluation scores, and two had low evaluation scores. We scored each of them with the measurement and divided them into three groups by their score. The scores were regarded as a metric to measure the degree of interest. The two slide decks with high evaluation scores were named #1 and #2. The two slide decks with medium evaluation scores were named #3 and #4. The two slide decks with low evaluation scores were named #5 and #6.
The six slide decks have their respective characteristics. The average page number of the six selected slide decks is 10. To understand the features of each work according to the eye movement behavior, we organized the composition of each slide deck by page. Each page was presented as a visual stimulus. To identify the page composition, we divided the pages into eight categories according to their visually similar features and similar characteristics regarding the statistics, including "Upper left title", "Centered title", "Image with upper left title", "Image with centered title", "Image with instructions", "Scattered text", "Centralized text", and "Little centralized text", as shown in Table 2. Figure 1 shows examples of each category.

Table 2. Page Category: Description
Upper left title: A page with simply a title located at the upper left corner.
Centered title: A page with simply a title located at the center of the screen.
Image with upper left title: A page with a title located at the upper left of the screen and an image as background.
Image with centered title: A page with a title located at the center of the screen and an image as background.
Image with instructions: A page with an image and a small amount of text for instruction, usually in small font size.
Scattered text: A page with a large amount of scattered text, usually in big font size. The text accounts for the vast majority area of the page.
Centralized text: A page with a large amount of centralized text, usually in small font size. The text accounts for the vast majority area of the page.
Little centralized text: A page with a small amount of centralized text, usually in small font size.

Figure 1 (caption fragment): (d) "Image with centered title" page; (e) "Image with instructions" page; (f) "Scattered text" page; (g) "Centralized text" page; (h) "Little centralized text" page.

Procedure
Before the experiment, we prepared the eye tracker and the eye movement recording system. The Tobii eye tracker 4C [26] was selected as the device for the experiment, as it is quite simple to set up and can connect to the recording system. An open-source eye analytic software called Chrome Plus Record (CPR) [27], used with Tobii's analytical use license, was adopted in this work for eye movement data recording, as it was more convenient and more suitable for our research purposes than other eye tracking analytic applications.
The eye tracker was placed in front of the user under the screen with an attached magnet, as shown in Figure 2. After the installation, one more step was needed to finish the setup: calibration. The calibration is based on the nine-point calibration function implemented in the CPR system with the Tobii development SDK, as shown in Figure 3, where all calibration points are visualized with a circle. In the calibration process, the subject was instructed to stare at specific calibration points on the screen while the eye tracker read the subject's gaze data. By checking whether the subject's gaze matched the location of an alternately moving and stopping display point on the monitor, the CPR system automatically determined whether the calibration passed or not. We utilized a 23 inch LCD with 1920 × 1080 resolution as the experimental device. The distance between the subject's eyes and the screen was not strictly prescribed, as the 4C eye tracker can capture the human face and track the eyeball model.
Before the experiment, the subjects were clearly informed that the experiment would store their eye tracking data for further analysis in this work. To start the experiment, we first asked the subject to sit in front of the prepared screen with the eye tracker placed in front of them. The calibration process was conducted for each subject to adjust the eye tracker. Only once the subject had passed the calibration was he or she allowed to go on with the next step. Secondly, after calibration, we asked the subject to read the Microsoft Sway slide decks in order, while we recorded the eye movement behaviors with the CPR system. To prevent the reading order from influencing the evaluation scores, the reading order of the slide decks was randomly arranged for each participant.
Thirdly, after reading each slide deck, the participant was asked to evaluate the work by filling in the assessment questionnaire that is defined in Section 3.4. The steps mentioned above, including calibration, reading, and filling in the questionnaire, were repeated until all six slide decks were successfully evaluated. In general, it took 10 minutes for each participant to evaluate one slide deck. Finally, the experimental instructor checked the completion of the entire process. If the data were correctly recorded into the database, we exported them as a csv file for further analysis.

Questionnaire
The participants were asked to fill in an assessment questionnaire after reading each of the six selected works. There were a total of six items in the questionnaire, including "Does the text properly match the media?", "Is the layout logical?", "Is the layout creative?", "Is the content complete?", "Is the content correct?", and "Is the key point described clearly?". The question responses for each item were captured using a 5-point Likert scale, ranging from 1 (strongly disagree) to 5 (strongly agree). The assessment criteria were divided into two topic areas, namely content and design [28]. Each area was further subdivided into individual items.

Measured Values
We mainly focused on the information provided by GazeFixation.csv, URLEvent.csv, and Rawdata.csv, which were generated by the CPR system. The GazeFixation file recorded the information regarding the fixations, including the fixation ID, subject name, start time of the fixation in milliseconds, duration in milliseconds, the position information in pixels, and the scrollTop property information in pixels. The threshold for a meaningful fixation in this work was defined as 250 ms. The URLEvent file recorded the information of the webpages visited by the subjects, including the URL ID, subject name, webpage URL, keyword, and visiting time. Combining GazeFixation with the webpage viewing information provided by the URLEvent file, we were able to analyze the eye behaviors for each slide deck. Furthermore, after separating the slide decks into single pages, we were able to analyze the eye movement in depth on each individual page. The Rawdata file recorded the raw mouse and eye movement data, including the event ID, the subject name, the event time, the mouse position information, the gaze point position information, and the scrollTop information. It provided the information of single movements of the subject's eye and mouse, allowing us to study any single event during the reading process.
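To make the combination of the fixation records with the URL events concrete, here is a minimal sketch that assigns each fixation to the slide deck whose URL was visited most recently before the fixation started. The field names (`start_ms`) and the tuple layout of the URL events are hypothetical; the actual CPR column names are not given in the text.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple


def group_fixations_by_url(url_events: List[Tuple[float, str]],
                           fixations: List[dict]) -> Dict[str, List[dict]]:
    """Assign each fixation to the most recently visited URL.

    url_events: (visit_time_ms, url) pairs (hypothetical layout).
    fixations:  dicts with at least a "start_ms" key (hypothetical name).
    """
    events = sorted(url_events)               # order by visiting time
    visit_times = [t for t, _ in events]
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for fix in fixations:
        # Index of the last URL event at or before the fixation start.
        idx = bisect_right(visit_times, fix["start_ms"]) - 1
        if idx < 0:
            continue                          # fixation precedes the first page visit
        grouped[events[idx][1]].append(fix)
    return dict(grouped)
```

The same lookup could be reused for page-level analysis by passing page-switch timestamps in place of URL visiting times.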
We used the following common eye tracking metrics to represent a slide deck or a page: duration, fixation duration (FD), fixation number (FN), and fixation average duration (FAD). The duration referred to the total duration of viewing on this slide deck. FD referred to the total duration of fixations on this slide deck. A longer duration or a longer fixation duration indicated that the participant considered that information to be important and was more focused on it [29]. FN referred to the total fixation count in the slide deck. A higher fixation count indicated more devoted attention was given to this stimulus [30]. FAD was the mean value of the durations of fixations on this slide deck. A longer FAD indicated more overall effort spent by a participant during the experiment [31].
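The common metrics above reduce to simple aggregates over the fixation durations of one page or slide deck. The following sketch assumes the durations are available as a plain list in milliseconds; the function name and the dictionary keys are chosen for the example only.

```python
from typing import Dict, List


def basic_fixation_metrics(fixation_durations_ms: List[float],
                           viewing_duration_ms: float) -> Dict[str, float]:
    """Return duration, FD, FN, and FAD for one stimulus."""
    fn = len(fixation_durations_ms)        # FN: total fixation count
    fd = float(sum(fixation_durations_ms)) # FD: total duration of fixations
    fad = fd / fn if fn else 0.0           # FAD: mean fixation duration
    return {
        "duration": viewing_duration_ms,   # total viewing duration
        "FD": fd,
        "FN": float(fn),
        "FAD": fad,
    }
```

Returning 0.0 for FAD when there are no fixations is a convenience of this sketch, not something the text prescribes.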
In addition, we proposed several new fixation-based metrics to measure the depth of the students' processing of the content, including the fixation time percentage (FTP), mean time to fixation (MTTF), and spatial diversity of fixations (SDOF). In the experiment, we examined whether the proposed metrics were stable in measuring the subjects' interest while viewing the slide decks. FTP was defined as the share of the total viewing duration on the page or slide deck that is taken up by fixations, and can be obtained by using the following equation:

\[ \text{Fixation time percentage} = \frac{\text{all fixation duration}}{\text{total viewing duration}} \tag{1} \]

FTP is a metric that focuses on the proportion of all fixation duration in the total viewing duration of a stimulus. It is different from metrics such as the Ratio of Fixation Time and ROAFT [32], which focus on the ratio of the fixation duration of an AOI to all fixation duration of the stimulus. FTP is intended to explain the subject's attention while viewing the slide decks. In general, a larger fixation time percentage means that the participant pays more attention and is more concerned with the content of the slide deck. MTTF is the average time until the next fixation is generated, which is calculated with the following equation:

\[ \text{Mean time to fixation} = \frac{\text{total viewing duration}}{\text{fixation number}} \tag{2} \]

While studying the data page by page, we found that the characteristics of each page, such as the density and layout of the visual elements on the screen, have a strong connection with the MTTF. For example, a page with a small font size and a high density of text leads to a small MTTF. On the other hand, a text-only page with a large font size and low density leads to a large MTTF. Therefore, in the experiment we also attempted to determine whether MTTF was a representative metric to measure the subjects' interest while viewing slide decks. The SDOF is the average distance (in pixels) from each fixation point to the mean point of all fixation points. It was utilized to evaluate the balance of visual elements on the screen around the mean point. The SDOF is described with the following equation:

\[ \text{Spatial diversity of fixations} = \frac{\text{total distance from each fixation to the mean point}}{\text{fixation number}} \tag{3} \]
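The three proposed metrics can be computed directly from the definitions above. The sketch below is illustrative only: the function names are hypothetical, the unweighted mean point is used, and the distance is assumed to be Euclidean, which the text does not state explicitly.

```python
import math
from typing import List, Tuple


def fixation_time_percentage(fixation_durations_ms: List[float],
                             viewing_duration_ms: float) -> float:
    """FTP: all fixation duration divided by total viewing duration."""
    return sum(fixation_durations_ms) / viewing_duration_ms


def mean_time_to_fixation(viewing_duration_ms: float, fixation_number: int) -> float:
    """MTTF: total viewing duration divided by the fixation number."""
    return viewing_duration_ms / fixation_number


def spatial_diversity_of_fixations(points: List[Tuple[float, float]]) -> float:
    """SDOF: mean distance (pixels) from each fixation to the mean point.

    Euclidean distance is an assumption of this sketch.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    total = sum(math.hypot(x - mean_x, y - mean_y) for x, y in points)
    return total / n
```

For example, three fixations lasting 300, 500, and 200 ms on a page viewed for 4000 ms give an FTP of 0.25, and four fixations at the corners of a 6 × 8 pixel rectangle give an SDOF of 5 pixels.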
We consider two definitions of the mean point here. One is the average position of all fixation points; the other is the average position weighted by the duration of each fixation. In general, these two mean fixation points were located at almost the same location. However, in some special cases, the two mean fixation points did not fall in the same position, such as on a small-type page to which participants gave an extremely short viewing duration. We speculated that during the viewing process, the subjects made fixations while searching for the "next page" button in the bottom right corner of the screen. Due to the lack of content, the page was not attractive enough for the subjects. The difference between the two versions of the mean point demonstrated the inconsistency of the fixation durations. Being aware of that, we applied a filter to remove the unnecessary fixation points in the research. We will discuss the filtering method in Section 3.6. As the two definitions usually yield the same results, we simply use "average fixation point" here. In the experiment, we also attempted to determine whether the SDOF could represent the degree of dispersion of the fixations, or whether a lower SDOF indicated a higher attraction of the page.
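The two definitions of the mean point, and the gap between them that signals inconsistent fixation durations, can be sketched as follows. The function names are hypothetical, and measuring the gap as a Euclidean distance is an assumption of the example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def mean_point(points: List[Point]) -> Point:
    """Unweighted average position of all fixation points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def weighted_mean_point(points: List[Point], durations_ms: List[float]) -> Point:
    """Average position weighted by the duration of each fixation."""
    total = sum(durations_ms)
    wx = sum(x * d for (x, _), d in zip(points, durations_ms)) / total
    wy = sum(y * d for (_, y), d in zip(points, durations_ms)) / total
    return (wx, wy)


def mean_point_gap(points: List[Point], durations_ms: List[float]) -> float:
    """Distance between the two mean points (Euclidean, assumed)."""
    ux, uy = mean_point(points)
    wx, wy = weighted_mean_point(points, durations_ms)
    return math.hypot(wx - ux, wy - uy)
```

A gap near zero corresponds to the usual case described above, while a large gap would flag pages where a few long or short fixations, such as those near a navigation button, pull the weighted mean away.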

Regularization and Filters
Before the analysis, we needed to arrange the raw data and transform it into readable data. The preprocessing had two parts, data regularization and fixation filtering, and can be separated into five steps:

(1) Reading order recovery
To prevent the results from being affected by the reading order, we made a random reading order list for each participant during the experiment. Thus, the first task of the data regularization was to rearrange each record from the actual reading order into a uniform order. In this step, we used the timestamps obtained from the URLEvent.csv generated by the CPR system and the URLs of the slide decks to cut the viewing process into several sections, and we then arranged everyone's section records in a uniform order.

(2) Video offset recovery
In this step, we recovered the "frame loss" of the generated video. The CPR system provided us with eye tracking data and screen recordings. However, the quality of the generated videos was not consistent. We speculated that the operating computer was not powerful enough to fully support the CPR system. As a result, these screen videos suffered from "frame loss", which means that some frames were not recorded during the experiment, resulting in a timestamp offset in the video. The images displayed by the video did not match the events recorded in the database. To solve this problem, we assumed that the frame loss rate was consistent within one video, computed the real FPS, and scaled the video to the correct length. After recovery, we found that the events and the images matched, which supports the effectiveness of the recovery method.

(3) Timestamp labeling
To analyze the eye movement behavior of the slide deck by page, we labeled the timing of page switching manually to get the information regarding page switching during the reading process. We manually sampled each timestamp three times to reduce human error. Finally, each timestamp was labeled as the mean value of the three samples. The three samples of each timestamp differed by less than 0.5 seconds. The process of data regularization was finished when the labeling was complete.

(4) Button mask
As described in Section 3.5, the fixations on the "next page" button of the screen might shift the duration-weighted mean fixation point away from the unweighted mean point. A feature of Sway's slide decks is that there is a button in the bottom right corner for users to skip to the next page. However, the button caused a visual offset, reflected in the inconsistency of the calculated mean point. The gazes generated in this area did not represent the influence of the page itself. Thus, we masked this area to filter the fixations. The mask size was 167 × 190, located in the bottom right corner of the page.

(5) Outlier removal
During the experiment, some fixations had singular durations with either extremely high or extremely low values. We filtered these extreme values in order to make the data distribution correspond to a normal distribution. The standard deviation method, which is common in statistics, was used to remove the outliers.
After these steps, the data preprocessing was complete. Figure 4 shows an example of the comparison of the original fixation data and the filtered data, where the yellow circles represent the fixations. It can be found that the filters improved the performance on the normality test.
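The video offset recovery and the two filtering steps can be sketched as below. Several details are assumptions made for the example rather than facts from the text: the mask is read as width × height in pixels on the 1920 × 1080 screen with the origin at the top left, the outlier cut-off is set to three standard deviations, and the field names `x`, `y`, and `duration_ms` are hypothetical.

```python
import statistics
from typing import List

SCREEN_W, SCREEN_H = 1920, 1080   # screen resolution of the experimental device
MASK_W, MASK_H = 167, 190         # mask size; width x height order is assumed


def video_time_to_event_time(video_time_s: float,
                             video_length_s: float,
                             session_length_s: float) -> float:
    """Rescale a video timestamp, assuming a constant frame loss rate."""
    return video_time_s * session_length_s / video_length_s


def in_button_mask(x: float, y: float) -> bool:
    """True if a point lies in the bottom right mask (origin at top left, assumed)."""
    return x >= SCREEN_W - MASK_W and y >= SCREEN_H - MASK_H


def apply_button_mask(fixations: List[dict]) -> List[dict]:
    """Drop fixations that fall on the "next page" button area."""
    return [f for f in fixations if not in_button_mask(f["x"], f["y"])]


def remove_duration_outliers(fixations: List[dict], k: float = 3.0) -> List[dict]:
    """Standard deviation method: keep durations within k SDs of the mean.

    k = 3 is an assumed cut-off; the text does not give the value used.
    """
    durations = [f["duration_ms"] for f in fixations]
    if len(durations) < 2:
        return list(fixations)
    mean = statistics.mean(durations)
    sd = statistics.pstdev(durations)
    if sd == 0:
        return list(fixations)
    return [f for f in fixations if abs(f["duration_ms"] - mean) <= k * sd]
```

Applying the mask before the outlier filter keeps button-directed fixations from distorting the mean and standard deviation used for the cut-off, although the text does not say which order was used.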

The Relationship between Eye Movement Indicators and Scores
Pearson's correlation coefficient (PCC) [33] was utilized to measure the statistical relationship among the eye movement variables and the evaluation scores of each slide deck. This test statistic measures the statistical relationship, or association, between two continuous variables. To interpret the PCC, an absolute value above 0.7 represents a strong linear relationship, an absolute value between 0.5 and 0.7 represents a moderate linear relationship, and an absolute value between 0.3 and 0.5 represents a weak linear relationship. The PCC results of the eye movement indicators and scores are shown in Table 3. Apart from the common eye tracking metrics defined in Section 3.5, the average total number of words (WC) and font size (FS) on each slide deck were also considered as indicators.
Table 3 shows that most PCC values between the indicators and the scores given by subjects were weaker than those between the indicators and the scores given by experts. This might be caused by the different judging criteria, because the experts considered the degree of interest when they scored the lecture, while the subjects needed to understand the content first. Although the PCC results for the expert and student scores differed, the PCC between Score(S) and Score(E) was 0.615, which still represents a moderate linear relationship. In addition, Table 3 shows a negative linear relationship between the scores and the FAD, FTP, and WC, and a positive linear relationship between the scores and the SDOF and FS.
In terms of the proposed metric FTP, the results in Table 3 show that the FTP achieved a very strong negative linear relationship with the scores, which indicates that FTP was a significant metric to measure the participants' interest and predict the evaluation score of the content. The results demonstrated that a high FTP value would lead to a low score, because a longer fixation duration corresponded to more difficult-to-process context in the digital learning content, which reflected the degree of cognitive difficulty in the region [32,34]. Therefore, a slide deck with a small font and massive content would make the subject feel tired and would be more difficult to understand, and hence would influence the scores.
In terms of the other proposed metric MTTF, the results in Table 3 showed that MTTF did not correlate well with the scores. This indicated that the "time to fixation (TTF)" was not a significant metric for measuring the participants' interest or predicting the evaluation score of the content. With respect to the SDOF, the results in Table 3 did not show a strong relationship with the scores; the SDOF had only a positive moderate linear relationship with the scores. We assume that not only diversity but also brightness, contrast, or other features can affect human interest, so there are still more factors to be discovered. In addition, the SDOF was the only spatial metric in the research. The result indicated that the spatial features of eye movement had a relationship with the degree of interest, just as the temporal features did. Table 3 also demonstrates that the fixation average duration (FAD), word count (WC), and font size (FS) had an influence on the eye movement behaviors and the subjects' degree of interest. This supported our hypothesis and agreed with the previous related work [13,14].
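For reference, the PCC and the interpretation bands described above can be sketched in a few lines. The function names are hypothetical, the bands are applied to the absolute value so that negative correlations are graded the same way, and the label for values below 0.3 is an assumption because the text does not name that range.

```python
import math
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson's correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def interpret_pcc(r: float) -> str:
    """Map a PCC value to the strength bands used in the text."""
    a = abs(r)
    if a > 0.7:
        return "strong"
    if a >= 0.5:
        return "moderate"
    if a >= 0.3:
        return "weak"
    return "negligible"  # assumed label; the text does not name this band
```

With these bands, the reported value of 0.615 between Score(S) and Score(E) falls in the moderate range, matching the reading given in the text.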
After the PCC test, we also used regression analysis to further examine the relationship of scores and the combination of multiple eye movement metrics, to attempt to determine more significant features. However, the final results showed that only FTP could pass the F test where the P-value of Significance F equaled 0.007761. The regression models were not significant in terms of other eye movement metrics and their combination. In addition, the P-value of FTP in the regression model was less than the common significant level of 95% confidence. The proposed metric FTP was further demonstrated to be significant in the regression model with the evaluating scores. Table 4 shows the compositions of the six slide decks with respect to the eight categories defined in Section 3.2. The average page number of the six selected slide decks is 10. The final scores of each work for experts and students are also shown in Table 4, where score(E) represents the score results from experts, and score(S) represents the score results from subjects. The scores were calculated based on the questionnaire defined in Section 3.4. Table 4 shows that the Slide decks #1, #2, and #3 resulted in a higher score(E) than Slide decks #4, #5, and #6, which indicates that based on the experts' opinions, Slide decks #1, #2, and #3 are better works. On the other side, the score results from the subjects score(S) are similar to the experts' scores, and the Slide decks #1, #2, and #4 resulted in a higher score(S) than Slide decks #3, #5, and #6. Therefore, in accordance with the score results from both experts and subjects, we can conclude that the Slide decks #1 and #2 are relatively better works, whereas Slide decks #5 and #6 are relatively worse works. To further discuss the relationship between the page category and scores, Table 5 shows the page composition table of the high score works (Slide decks #1 and #2) and low score works (Slide decks #5 and #6). 
While analyzing the page composition, we noticed that scattered text pages made up the largest portion of the high score works (8 of 22 pages). In contrast, the categories with the largest portions in the low score group were the centered title (6/19) and centralized text (6/19). We assumed that these page categories had some characteristics affecting the ranking or scores. From Table 5, for the page categories Upper left title, Centered title, Image with upper left title, Image with instructions, and Little centralized text, the portions in the high and low score works are very similar and balanced. Additionally, these categories usually had similar features in the class table.
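The portions quoted above are simple page-count ratios, which can be sketched as follows. The function name `category_shares` is an assumption for illustration. The "All other categories" entries are remainders obtained by subtraction from the reported totals (22 and 19 pages), not values read from Table 5.

```python
def category_shares(page_counts):
    """Return each category's fraction of all pages in a group of works."""
    total = sum(page_counts.values())
    if total == 0:
        raise ValueError("group has no pages")
    return {category: n / total for category, n in page_counts.items()}

# Only the counts reported in the text are used; the remainder is lumped together.
high_score_works = {"Scattered text": 8, "All other categories": 22 - 8}
low_score_works = {
    "Centered title": 6,
    "Centralized text": 6,
    "All other categories": 19 - 6 - 6,
}

high_shares = category_shares(high_score_works)
low_shares = category_shares(low_score_works)
for name, share in {**high_shares, **low_shares}.items():
    print(f"{name}: {share:.1%}")
```

With these figures, scattered text pages account for roughly 36% of the high score works, while centered title and centralized text pages each account for roughly 32% of the low score works.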

The Relationship between Page Categories and Scores
The portions of the page categories Image with centered title, Scattered text, and Centralized text, however, differ significantly between the high and low score works. The results indicated that the subjects were more interested in the works with more "Image with centered title" and "Scattered text" pages. Conversely, the works with more "Centralized text" pages had a negative relationship with scores, as a slide deck with small font and massive content would likely make the subject feel labored and lose interest [35]. Therefore, the scattered text page, image with centered title page, and centralized text page had significant features and exerted a major influence on the scores.

The Relationship between the Eye Movement Indicators and Page Categories
To evaluate the subjects' attention paid to different page categories, Table 6 shows the relationship between the page categories and the eye movement indicators. The results demonstrated that the subjects fixated more on scattered text and centralized text pages, and less on title style pages. In addition, we found from Table 6 that the page categories were very similar in some features but quite distinct in others. The results demonstrated that even if a scattered text page and a centralized text page had similar textual content and word count, they had a different effect on the viewer. For example, scattered text pages had a low fixation time percentage (FTP) and fixation average duration (FAD), whereas centralized text pages had extremely high values for these two features. However, both had a high fixation duration (FD) and fixation number (FN).
After analyzing the relationship between the page categories and scores in Section 4.2, we found that the "scattered text page", "image with centered title page", and "centralized text page" had significant features and would have a major influence on the scores. To further evaluate the difference in the subjects' attention paid to these three page categories, Table 7 shows the relationship between the three significant page categories and the eye movement indicators. Based on the results listed in Table 7, we found that, compared with the negatively correlated centralized text page, the positively correlated pages (scattered text and image with centered title pages) had the following features: (1) a lower fixation average duration (FAD); (2) a lower fixation time percentage (FTP); and (3) a larger font size (FS). The results lead to a conclusion similar to that discussed in Section 4.1: the FTP and FAD demonstrated a negative linear relationship with the scores, whereas the FS showed a positive one. Among the three metrics, FTP was the most significant.
In addition, after closely studying the tables and features, we found that the word count (WC), duration, fixation duration (FD), and fixation number (FN) had a strong linear correlation with each other. The results indicate that a major portion of the subjects' gazes or fixations happened during the reading process.
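The link between these indicators is easier to see when they are written out for a single page. The following is a minimal sketch under assumed definitions that this section does not confirm: FD is taken as the sum of fixation durations, FN as the number of fixations, FAD as FD divided by FN, and FTP as FD expressed as a percentage of the page viewing duration. The names `PageMetrics` and `page_metrics` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PageMetrics:
    fn: int      # fixation number
    fd: float    # fixation duration: total time spent fixating (seconds)
    fad: float   # fixation average duration (seconds per fixation)
    ftp: float   # fixation time percentage (0 to 100)

def page_metrics(fixation_durations, viewing_time):
    """Compute per-page indicators from a list of fixation durations (seconds).

    The definitions used here are assumptions made for illustration.
    """
    if viewing_time <= 0:
        raise ValueError("viewing_time must be positive")
    fn = len(fixation_durations)
    fd = sum(fixation_durations)
    fad = fd / fn if fn else 0.0        # average length of one fixation
    ftp = 100.0 * fd / viewing_time     # share of viewing time spent fixating
    return PageMetrics(fn=fn, fd=fd, fad=fad, ftp=ftp)

# Hypothetical page: three fixations during two seconds of viewing
m = page_metrics([0.25, 0.5, 0.25], viewing_time=2.0)
print(m)
```

Under these assumptions, a page with more words tends to be viewed longer and to collect more fixations, so FD and FN grow together with WC and duration. FAD and FTP are ratios and need not follow the same trend.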
The most representative slide deck, ID #6, had the most centralized text pages and a word count of 1360, almost twice the average. Almost every subject gave this slide deck a bad score. At the same time, slide deck ID #5 had two centralized text pages and the lowest word count, 312, but its score was still not good. On the other hand, slide deck ID #2 had four scattered text pages and a word count of 458, and every subject gave this work a good score. This indicates that although semantic content by itself was not particularly interesting to the subjects, a reasonable portion of semantic content had a positive influence on the total score.

Conclusions
This paper presents a study on eye movement behaviors while reading digital learning content. The CPR system and the Tobii Eye Tracker 4C were used to conduct the experiments, which yielded 42 valid records. After data preprocessing, Pearson's correlation coefficient (PCC) test was applied to measure the linear relationship between the scores and the eye movement indicators. Of the three proposed eye movement metrics, FTP, MTTF, and SDOF, FTP was demonstrated to be a representative, strong, and stable metric for measuring the subjects' interest, and the SDOF also showed a moderate to strong correlation with the scores.
According to the collected eye movement data and page categories, we found that the scattered text page, image with centered title page, and centralized text page had significant features and a great influence on the scores. The highly scored lectures had the following common features: (1) a lower fixation average duration (FAD); (2) a lower fixation time percentage (FTP); and (3) a larger font size (FS). We also found that the word count (WC), duration, fixation duration (FD), and fixation number (FN) had a strong linear correlation with each other. The result indicated that a major portion of the subjects' gazes or fixations happened during the reading process. From the results, we proposed that a reasonable portion of semantic content had a positive influence on the subjects' interest as well.
For future work, research could focus on visual stimulation, as visual elements including brightness, contrast, or animation may also influence fixation behaviors. In our research, we showed that the spatial features also correlated with the subjects' evaluation scores. However, the study used only a modest number of samples, which may not adequately represent e-learning content. Moreover, as the score in this work was calculated for each slide deck, the PCC results can only reflect the relationship between the eye movement variables and the scores of each slide deck. In the future, we will attempt to design a study in which the score is evaluated separately for each page category, to further examine the relationship between the eye movement variables and the score of each page category.