Analysis of the Learning Process through Eye Tracking Technology and Feature Selection Techniques

Abstract: In recent decades, technological resources such as the eye tracking methodology have provided cognitive researchers with important tools to better understand the learning process. However, the interpretation of the metrics requires the use of supervised and unsupervised learning techniques. The main goal of this study was to analyse the results obtained with the eye tracking methodology by applying statistical tests and supervised and unsupervised machine learning techniques, and to contrast the effectiveness of each one. The parameters of fixations, saccades, blinks and scan path, and the results in a puzzle task were obtained. The statistical study concluded that no significant differences were found between participants in solving the crossword puzzle task; significant differences were only detected in the parameters saccade amplitude minimum and saccade velocity minimum. On the other hand, this study, with supervised machine learning techniques, provided possible features for analysis, some of them different from those used in the statistical study. Regarding the clustering techniques, a good fit was found between the algorithms used (k-means++, fuzzy k-means and DBSCAN). These algorithms provided the learning profile of the participants in three types (students over 50 years old; and students and teachers under 50 years of age). Therefore, the use of both types of data analysis is considered complementary.


Introduction
The eye tracking technique has represented an important advance in research in different fields, for example, cognitive psychology, as it records evidence on the cognitive processes related to attention during the resolution of different types of tasks. In particular, this technology provides the researcher with knowledge of the eye movements that the learner performs to solve different tasks [1]. This implies an important advance in the study of information processing, as this technique provides empirical indicators in different metrics, which offers the psychology professional a guarantee of precision when interpreting each user's information processing. However, the measurements are complex and, above all, time-consuming, which often means that sample sizes are not very large. In summary, technological advances are improving the study of information processing in different learning tasks. The use of these resources is an opportunity for cognitive and instructional psychology to delve into the analysis of the variables that facilitate deep learning in different tasks. In addition, these tools allow the visualisation of learners' patterns during the resolution of different activities. Initial research in this field [2] indicated that readers with prior knowledge showed little interest in the images embedded in the learning material. Furthermore, recent research [3] has found significant differences in eye tracking behaviour between experts and novices. It seems that experts allocate their attention more efficiently and learn more easily if automated monitoring processes are applied in learning proposals. Similarly, other studies [2] have indicated that the use of multimedia resources that incorporate zoom effects makes it easier for information to remain longer in short-term memory (STM).
Likewise, if this information is accompanied by a narrating voice, attention levels and semantic comprehension increase [4,5]. A number of methods are used to analyse the effectiveness of the learning process including eye tracking-based methods. This technique offers an evaluation of eye movement in different metrics [1][2][3][4][5]. The eye tracking technique can use different algorithms [6][7][8][9]. They can be used to extract different metrics (more detailed explanations are given below). Specifically, eye tracking technology allows the analysis of the relationship between the level of visual attention and the eye-hand coordination processes during the resolution of different tasks within the executive attention processes [7,8]. Clearly, rapid eye movement has also been associated with the learner's fixation on the most relevant elements of the material being learned [2].
In this context, attention is considered to be the beginning of information processing and the starting point for the use of higher-order executive functions. In the same way, observational skills relate to eye tracking, which is directly related to the level of arousal and the transmission of information first to the STM and then its processing in working memory [6]. This development is influenced by learner-specific variables such as age, level of prior knowledge, cognitive ability and learning style [7]. However, some studies show that prior knowledge can compensate for the effects of age [8]. On the other hand, eye tracking technology is one of the resources that is supporting this new way of analysing the learning process. This technology is centred on evidence-based software engineering (EBSE) [9]. This technological resource makes it possible to study attentional levels and relate them to the cognitive processes that the learner uses in the course of solving a task [10,11]. Thus, eye tracking technology provides different metrics based on the recording of the frequency of gaze on certain parts of a stimulus. These metrics can be previously defined by the researcher and are called areas of interest (AOI), which can be relevant or irrelevant. This information will allow the practitioner to determine which learners are field-dependent or field-independent, based on their access to irrelevant vs. relevant information [12]. Likewise, the use of multimedia resources, such as videos, which include Self-Regulation Learning (SRL) aids through the teacher's voiceover or the figure of an avatar seems to be an effective resource for maintaining attention and comprehension of the task and even compensating for the lack of prior knowledge of the learners. One possible explanation is that they enhance self-regulation in the learning process [13][14][15]. However, the design of learning materials seems to be a key factor in maintaining attention during task performance. 
Therefore, it is necessary to know which elements are relevant vs. irrelevant, not only for the teacher but also for the learners' perception [16]. This is why the knowledge of measurement metrics in eye tracking technology, together with their interpretation, is a relevant component for the design of learning activities for different types of users.

Measurement Parameters in Eye Tracking Technology
As mentioned above, eye tracking technology facilitates the collection of different metrics. First, it enables the recording of the learner's eye movement while performing an activity. In addition, the use of eye tracking technology allows the definition of relevant vs. non-relevant areas of interest (AOI) in the information being learned [17]. Within these metrics, different parameters can be studied, such as the fixation time of the eye on a part of the stimulus (interval between 200 and 300 ms). In this line, recent studies [18] indicate that the acquisition of information is related to the number of eye fixations of the learner. Similarly, another important metric is the saccade, which is defined as the sudden and rapid movement from one fixation to another (the interval is 40-50 ms). Sharafi et al. [18,19] found differences in the type of saccade depending on the phase of information encoding the learner was in. Another relevant parameter is the scan path or tracking path. This metric collects, in chronological order, the steps that the learner performs in the resolution of the learning task within the AOI marked by the teacher [18,19]. Likewise, eye tracking technology allows the use of supervised machine learning techniques to predict the level of learners' understanding, as this seems to be related to the number of fixations [20]. Recent studies indicate that variability in gaze behaviour is determined by image properties (position, intensity, colour and orientation), task instructions, semantic information and the type of information processing of the learner. These differences are detected using AOIs that are set by the experimenter [21].
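As an illustration of how fixations, AOIs and the scan path relate to one another, the following minimal Python sketch assigns fixations to rectangular AOIs and derives a scan path in chronological order. The `Fixation` record, the AOI names and the pixel rectangles are hypothetical and are not taken from the study or from the SMI software.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    start_ms: float  # onset time of the fixation
    end_ms: float    # offset time of the fixation
    x: float         # horizontal gaze position in screen pixels
    y: float         # vertical gaze position in screen pixels


# Hypothetical AOIs as (left, top, right, bottom) rectangles in pixels.
AOIS = {
    "avatar": (0, 0, 400, 300),
    "text": (400, 0, 1680, 300),
}


def aoi_of(fix, aois):
    """Return the name of the first AOI containing the fixation, else None."""
    for name, (left, top, right, bottom) in aois.items():
        if left <= fix.x < right and top <= fix.y < bottom:
            return name
    return None


def scan_path(fixations, aois):
    """Chronological sequence of AOIs visited, merging consecutive repeats."""
    path = []
    for fix in sorted(fixations, key=lambda f: f.start_ms):
        name = aoi_of(fix, aois)
        if name is not None and (not path or path[-1] != name):
            path.append(name)
    return path


def fixation_counts(fixations, aois):
    """Number of fixations that fall inside each AOI."""
    counts = {name: 0 for name in aois}
    for fix in fixations:
        name = aoi_of(fix, aois)
        if name is not None:
            counts[name] += 1
    return counts


# Illustrative data only: three fixations inside AOIs and one outside.
fixations = [
    Fixation(600, 850, 900, 120),
    Fixation(0, 250, 100, 100),
    Fixation(300, 560, 150, 90),
    Fixation(900, 1120, 200, 700),
]
print(scan_path(fixations, AOIS))
print(fixation_counts(fixations, AOIS))
```

Merging consecutive fixations in the same AOI is one possible design choice; keeping every fixation as a separate step would be equally valid if dwell repetitions are of interest.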
In summary, eye tracking technology records diverse types of parameters that provide different interpretations of the underlying cognitive processes during the execution of a task. These parameters fall into three categories: fixations, saccades and scan path. The first, fixations, refers to the stabilisation of the eye on a part of the stimulus during a time interval between 200 and 300 ms. In addition, eye tracking technology provides information about the start and end times of each fixation and its x and y coordinates. The cognitive interpretation of fixations is related to the perception, encoding and processing of the stimulus. The second, saccades, refers to the movement from one fixation to another, which is very fast and in the range of 40-50 ms. The third, scan path, refers to a series of fixations in the AOIs in chronological order of execution. This cognitive metric is useful for understanding the behavioural patterns of different participants in the same activity. Furthermore, each of these metrics has its own measurement specifications. Table 1 below shows the most significant ones and, where appropriate, their relationship with information processing.
In summary, the use of eye tracking technology for the analysis of information processing during the resolution of tasks in virtual learning environments has been shown to be a very effective tool for understanding how each student learns [23]. Moreover, recent studies conclude that this technology needs to be integrated in the usual learning spaces such as classrooms, although its use still requires considerable technical and interpretation knowledge on the part of the teacher [24]. Therefore, more research studies are needed to find out which of the presentation conditions of a learning task are more or less effective in learning depending on the characteristics of each learner (age, previous knowledge, learning style, etc.) [25].
Appl. Sci. 2021, 11, 6157
There are many studies on the application of eye tracking technology that address how to interpret the results obtained in the different metrics. To do so, they analyse the differences in results between experts and novices. These studies also analyse behavioural patterns by comparing the type of participant, the type of pattern and the efficiency in solving the task. Cluster analysis metrics on frequency, time and effort are used to perform these analyses. Experts, in contrast to novices, use additional information, e.g., colour and layout, in order to navigate the platform in the most efficient way [11]. Additionally, experts seem to solve tasks faster and more accurately. However, novice students seem to have a greater ability to understand the tasks [13]. Nevertheless, a comparative analysis of the performance of either the same learner in their learning process or between different types of learners (e.g., novices vs. experts) [26,27] requires the use of different data mining techniques [21,28]. These can be supervised learning (related to prediction or classification) [21] or unsupervised learning (related to the use of clustering techniques) [29]. Such techniques applied to the analysis of user learning have been called educational data mining (EDM) techniques [30]. Likewise, especially in the field of analysing student behaviour during task solving, the importance of using pattern analysis techniques within what has been called educational process mining (EPM) [31] stands out. EPM is a process that focuses on detecting, among the possible variables of a study, those that have a greater predictive capacity. These variables may be unknown or partially known. In short, EPM works with a different type of data called events. Each event belongs to a single instance of the process, and these events are related to the activities.
EPM is interested in end-to-end processes and not in local patterns [31]. The general objective of instance selection techniques (e.g., prototype selection) is to "try to eliminate from the training set those instances that are misclassified and, at the same time, to reduce possible overlaps between regions of different classes, i.e., their main goal is to achieve compact and homogeneous groupings" [32] (p. 2). These analyses would belong to the supervised machine learning techniques of classification and also to the statistical techniques related to knowing which possible independent variable or variables are the ones that have a significant weight on the dependent variable or variables. The common aim of these techniques would be the elimination of noise [33], which in experimental psychology would be related to the development of pre-experimental descriptive studies [34].
In summary, feature selection techniques are a very important part of machine learning and very useful in the field of education, as they will make it possible to eliminate those attributes that contribute little or nothing to the understanding of the results in an educational learning process. Knowledge of these aspects will be essential for the proposal of new research and in the design of educational programmes [8,35]. In brief, the use of sequence mining techniques [36] and the selection of instances used in studies on the analysis of the metacognitive strategies used during task resolution processes will be very useful for the development of personalised educational intervention proposals.

Application of the Use of Eye Tracking Technology
The cognitive procedure in the process of visual tracking of images, texts or situations in natural contexts is based on the stimulus-processing-response structure. Information enters via the visual pathway (retina-fovea) and is processed at the level of the subcortical and cortical regions within the central nervous system. This processing results in a sensory stimulation response. Specifically, saccades are a form of sensory-to-motor transformation from a stimulus that has been found to be significant. Saccadic eye movements are used to redirect the fovea from one point of interest to another. Fixation is also used to keep the fovea aligned on the target during subsequent analysis of the stimulus. This alternating saccade-fixation behaviour is repeated several hundred thousand times a day and is essential in complex behaviours such as reading and driving. Saccades can be triggered by the appearance of a visual stimulus that is motivating to the subject or initiated voluntarily by the person's interest in an object. Saccades can be suppressed during periods of visual fixation. In these situations, the brain must inhibit the automatic saccade response [37]. Eye tracking technology collects, among others, metrics related to fixations, saccades and blinks. This technology is also used in studies on information processing in certain learning processes (reading, driving machines or vehicles, marketing, etc.) in people without impairments [38][39][40][41][42][43][44][45] or in groups with different impairments such as attention deficit hyperactivity disorder or autism spectrum disorder [45]. In these cases, the objective is to analyse the users' difficulties in order to make proposals for therapeutic intervention. This technology is also being used as an accident avoidance strategy [43]. Similarly, this technology can be used to study the behavioural patterns of subjects and to analyse the differences or similarities between different groups [44][45][46].
Eye tracking is also currently being used to test the human-machine interface based on monitoring the control of smart homes through the Internet of Things [47]. In addition, this technology is being incorporated into mobile devices, which will soon facilitate its use in natural contexts [48]. Similarly, eye tracking technology is being incorporated into virtual and augmented reality scenarios, as the recording software is included within the glasses [49][50][51], and into the control of industrial robots [52,53]. Finally, systems are being implemented to improve calibration and gaze tracking for users who were previously unable to use this technology due to various neurological conditions (stroke paralysis or amputations, spinal cord injuries, Parkinson's disease, multiple sclerosis, muscular dystrophy, etc.) [53]. However, these applications are still very novel and require very specific knowledge of their application and of the processing and interpretation of the metrics. Nevertheless, progress is being made in this respect with the implementation of interpretation algorithms in software, such as supervised machine learning techniques for classification, including algorithms such as k-nn and random forests [54].
Based on the above theoretical foundation, a study was carried out on the analysis of the behaviour of novice vs. expert learners during the performance of a self-regulated learning task. This task was carried out in a virtual environment with multimedia resources (self-regulated video) and was monitored using eye tracking technology.
In this study, two types of analysis were used. On the one hand, statistical techniques based on analysis of covariance (ANCOVA) were used on two fixed effects factors which have been shown to be relevant in the research literature, the type of participant (novice vs. expert) and age (in this study, over 50 years old vs. under 50 years old). In addition, whether the participant is a student vs. a teacher was considered as a covariate on the dependent variables learning outcomes in solving a crossword puzzle task and eye tracking metrics (fixations, saccades, blinks and scan path length).
The hypotheses were as follows:
RQ1. Will there be significant differences in the results of solving a crossword puzzle depending on whether the participants are novices vs. experts, taking into account the covariate student vs. teacher?
RQ2. Will there be significant differences in fixations, saccades, blinks and scan path length metrics depending on the age of the participant (over 50 vs. under 50), taking into account the covariate student vs. teacher?
RQ3. Will there be significant differences in the metrics of fixations, saccades, blinks and scan path length depending on whether the participants are novices vs. experts, taking into account the covariate student vs. teacher?
On the other hand, this study applied a data analysis procedure using different supervised learning algorithms for feature selection. The objective was to find out the most significant attributes with respect to all the variables (characteristics of the participants and metrics obtained with eye tracking technology).

Participants
A disaggregated description of the sample with respect to the variables age, gender and type of participant (prior knowledge vs. no prior knowledge; teacher vs. student) can be found in Table 2.

Instruments
The following resources were used:
1. Eye tracking equipment iView X™, SMI Experimenter Center 3.0 and SMI BeGaze™. These tools record eye movements, their coordinates and the pupillary diameters of each eye. In this study, a sampling rate of 60 Hz and static scan path metrics (fixations, saccades, blinks and scan path) were used. In addition, participants viewed the performance of the learning task on a monitor with a resolution of 1680 × 1050.
2. Ad hoc questionnaire on the characteristics of each participant (age, gender, level of studies, branch of knowledge, current employment situation and level of previous knowledge).
3. Ad hoc crossword puzzle with 5 questions on the content of the video seen, referring to the origin of monasteries in Europe. The questions were related to the following: (a) Monks belonging to the order of St. Benedict; (b) Powerful Benedictine monastic centre founded in the 10th century, whose influence spread throughout Europe; (c) Space around which the organisation of the monastery revolves; (d) Set of rules that govern monastic life; (e) Each of the bays or sides of a cloister.
4. Learning task that consisted of a self-regulated video in which the figure and voice of an avatar narrated the task about the origins of monasteries in Europe. The duration of the activity was 120 s.

Procedure
An authorisation was obtained from the Bioethics Committee of the University of Burgos before starting the research. In addition, convenience sampling was used to select the sample. The participants did not receive any financial compensation. They were previously informed of the objectives of the research, and a written informed consent was obtained from all of them. The first phase of the study consisted of collecting personal data and testing the level of prior knowledge. Subsequently, the calibration test was prepared for each participant, using the standard deviation of 0.1-0.9, for both eyes, with a percentage adjustment of between 86.5% and 100%. Subsequently, a test was applied, which consisted of watching a 120-s video about the characteristics of a medieval monastery. The video was designed by a specialist teacher in art history, and the voiceover was provided by a specialist in SRL. After watching the video, each participant completed a crossword puzzle with five questions about the concepts explained in the video. The evaluation sessions were always conducted by the same people: a psychologist with expertise in SRL and a computer engineer, both with experience in the operation of eye tracking technology. Figure 1 shows an image of the calibration procedure and Figure 2 shows the viewing of the video and the completion of the crossword puzzle.

Statistical Study
A study was conducted using three-factor fixed effects analysis of variance (ANOVA) statistical techniques (type of participant, i.e., student vs. teacher; age, i.e., over 50 years old vs. under 50 years old; and knowledge, i.e., expert vs. novice) and eta squared effect value analysis (η²). Analyses were performed with the SPSS v.24 statistical package [55].
A 2 × 2 × 2 factorial design (experts vs. non-experts, students vs. teachers, age (over 50 years old vs. under 50 years old)) was used [34]. The independent variables were knowledge (experts vs. novices), age (over 50 years old vs. under 50 years old) and participant type (students vs. teachers). The dependent variables were as follows:
• Solving crossword puzzle results;
• Fixations (fixation count, fixation frequency count, fixation duration total, fixation duration average, fixation duration maximum, fixation duration minimum, fixation dispersion total, fixation dispersion average, fixation dispersion maximum, fixation dispersion minimum);
• Saccades (saccade count, saccade frequency count, saccade duration total, saccade duration average, saccade duration maximum, saccade duration minimum, saccade amplitude total, saccade amplitude average, saccade amplitude maximum, saccade amplitude minimum, saccade velocity total, saccade velocity average, saccade velocity maximum, saccade velocity minimum, saccade latency average);
• Blinks (blink count, blink frequency count, blink duration total, blink duration average, blink duration maximum, blink duration minimum);
• Scan path length.
These metrics are related to the analysis of the cognitive procedure during visual tracking. This procedure is based on the stimulus-processing-response structure. Information enters via the visual pathway (retina-fovea) and is processed at the level of subcortical and cortical regions within the central nervous system. This processing results in a sensory stimulation response. Specifically, saccades constitute a form of sensory-to-motor transformation in response to a stimulus that has been found to be significant and a sensorimotor control of the processing. Saccadic eye movements are used to redirect the fovea from one point of interest to another. Likewise, fixation is used to keep the fovea aligned on the target during subsequent image analysis. This alternating saccade-fixation behaviour is repeated several hundred thousand times a day in humans and is central to complex behaviours such as reading. Saccades can be triggered by the appearance of a visual stimulus that is motivating to the subject or initiated voluntarily by the person's interest in a particular object. Saccades can be suppressed during periods of visual fixation, in which case the brain must inhibit the automatic saccade response [37]. The whole process is summarised in Figure 3. In addition, a video (https://youtu.be/DlRK21afGgo, accessed on 28 June 2021) showing how the task applied in this study is performed can be consulted. In this video, the fixation and saccade points can be seen.
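The aggregate metrics used as dependent variables can be derived from the individual events recorded by the eye tracker. The following Python sketch shows one possible way of computing the fixation summaries and the scan path length. It assumes that "fixation frequency count" means fixations per second and that scan path length is the summed Euclidean distance between consecutive fixation points; both are assumptions, since the study does not define them, and the function names are hypothetical.

```python
from math import dist
from statistics import mean


def fixation_summary(durations_ms, trial_ms):
    """Aggregate fixation metrics from individual fixation durations (ms)."""
    if not durations_ms:
        raise ValueError("at least one fixation is required")
    return {
        "fixation_count": len(durations_ms),
        # Assumption: frequency is expressed as fixations per second.
        "fixation_frequency_per_s": len(durations_ms) / (trial_ms / 1000),
        "fixation_duration_total": sum(durations_ms),
        "fixation_duration_average": mean(durations_ms),
        "fixation_duration_maximum": max(durations_ms),
        "fixation_duration_minimum": min(durations_ms),
    }


def scan_path_length(points):
    """Summed Euclidean distance between consecutive fixation points (pixels)."""
    return sum(dist(a, b) for a, b in zip(points, points[1:]))


# Illustrative values only, not data from the study.
summary = fixation_summary([250, 300, 200, 250], trial_ms=2000)
length = scan_path_length([(0, 0), (3, 4), (3, 10)])
print(summary)
print(length)
```

The same pattern (count, total, average, maximum, minimum) would apply to saccade durations, amplitudes and velocities, and to blink durations.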

Study Using Machine Learning Techniques
As stated in the introduction, machine learning techniques can be divided into supervised learning techniques, which in turn can be subdivided into classification and prediction techniques [21], and unsupervised learning techniques, which refer to the use of clustering techniques [29]. Specifically, supervised learning techniques of pattern analysis are used for human behavioural analysis. These would fall within the supervised learning techniques of clustering [31,32,36]. In this study, we used supervised machine learning techniques for classification (the gain ratio, symmetrical uncertainty and chi-square algorithms were applied) and unsupervised clustering techniques (the k-means++, fuzzy k-means and DBSCAN algorithms were applied). The analyses were performed with the R programming language [56].
In the study with machine learning techniques, a descriptive correlational design was applied [34]. Supervised classification and unsupervised clustering analyses were applied to all features.
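To make the clustering step more concrete, the following Python sketch shows k-means with k-means++ seeding, in which each new initial centre is drawn with probability proportional to its squared distance from the nearest centre already chosen. This is a minimal illustration and not the authors' R implementation; the function names and the two-dimensional data are hypothetical, and fuzzy k-means and DBSCAN are not shown. The seeding assumes that the data contain at least k distinct points.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """Choose k initial centres with k-means++ seeding."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        # Weight each point by its squared distance to the nearest centre.
        weights = [min(squared_distance(p, c) for c in centres) for p in points]
        centres.append(rng.choices(points, weights=weights, k=1)[0])
    return centres


def nearest(point, centres):
    """Index of the centre closest to the point."""
    return min(range(len(centres)), key=lambda j: squared_distance(point, centres[j]))


def kmeans(points, k, rng, iterations=20):
    """Lloyd's algorithm started from k-means++ centres."""
    centres = kmeans_pp_init(points, k, rng)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centres)].append(p)
        # Move each centre to the mean of its cluster; keep it if the cluster is empty.
        centres = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centres[j]
            for j, c in enumerate(clusters)
        ]
    labels = [nearest(p, centres) for p in points]
    return centres, labels


# Illustrative two-feature data only, not data from the study.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.1), (5.2, 4.9), (4.8, 5.0),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.9),
]
centres, labels = kmeans(points, k=3, rng=random.Random(0))
print(centres)
print(labels)
```

Passing an explicit `random.Random` instance keeps the run reproducible, which matters when cluster assignments from different algorithms are compared.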

Preliminary Analyses
Before testing the hypotheses, it was checked whether the sample followed a normal distribution. To this end, the values of skewness (absolute values below 2.00 are considered acceptable, and a value of skewness = −0.22 was found) and kurtosis (absolute values below 8.00 are considered acceptable, and a value of kurtosis = −2.06 was found) were examined. The results therefore indicate that the distribution meets the assumptions of normality, which is why parametric statistics were used to test the hypotheses.
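The normality screening described above can be sketched in Python as follows. The sketch uses moment-based skewness and excess kurtosis together with the thresholds of 2.00 and 8.00 mentioned in the text. Statistical packages such as SPSS apply bias-corrected estimators, so the values may differ slightly for small samples; the function names and the sample values are hypothetical.

```python
def central_moment(values, order):
    """Central moment of the given order (population form)."""
    m = sum(values) / len(values)
    return sum((v - m) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: third central moment over variance to the power 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


def within_normality_bounds(values, max_skew=2.0, max_kurt=8.0):
    """True if both statistics fall below the thresholds in absolute value."""
    return abs(skewness(values)) < max_skew and abs(excess_kurtosis(values)) < max_kurt


# Illustrative scores only, not data from the study.
scores = [1, 2, 3, 4, 5]
print(skewness(scores), excess_kurtosis(scores), within_normality_bounds(scores))
```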

Hypothesis Testing Analysis
To test RQ1, a one-factor fixed effects ANCOVA was applied for the participant type "expert vs. novice" considering the covariate (participant type "student vs. teacher") with respect to the dependent variable crossword result. No significant differences were found, but a medium effect value was found (F = 1.91, p = 0.40, η² = 0.66). Additionally, no effect of the covariate was found (F = 0.03, p = 0.90, η² = 0.03), and in this case, the effect value was low.
To test RQ2, a one-factor fixed effects ANCOVA (age "over 50 vs. under 50") was applied considering the covariate (participant type "student vs. teacher"). No significant differences were found in the metrics of fixations, saccades, blinks and scan path length. A covariate effect was only found in the metrics of saccade amplitude minimum (F = 5.19, p = 0.03, η² = 0.13) and saccade velocity minimum (F = 5.18, p = 0.03, η² = 0.13), in both cases with a low effect value. All results can be found in Table A1 in Appendix A.
To test RQ3, a one-factor fixed effects ANCOVA (participant type "novice vs. expert") was applied considering the covariate (participant type "student vs. teacher"). No significant differences were found in the metrics of fixations, saccades, blinks and scan path length. The effect of the covariate was only found in the metrics of saccade amplitude minimum (F = 6.90, p = 0.01, η² = 0.16) and saccade velocity minimum (F = 7.67, p = 0.01, η² = 0.18), and in both cases, the effect value was medium. All results can be found in Table A2 in Appendix A.
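For readers who wish to see how an F value and an effect value of this kind can be obtained, the following Python sketch fits a one-factor ANCOVA with a two-level factor and one covariate by comparing two nested least-squares models. It assumes that the reported effect value corresponds to partial eta squared, which is what SPSS reports for this type of model; this is an assumption, as are the function names and the illustrative data.

```python
def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def sse(columns, y):
    """Residual sum of squares of a least-squares fit of y on the given columns."""
    k = len(columns)
    xtx = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(a * b for a, b in zip(columns[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[i] * columns[i][t] for i in range(k)) for t in range(len(y))]
    return sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))


def ancova_one_factor(y, group, covariate):
    """F value and partial eta squared for a two-level factor, adjusting for a covariate."""
    n = len(y)
    ones = [1.0] * n
    sse_reduced = sse([ones, covariate], y)      # intercept + covariate
    sse_full = sse([ones, covariate, group], y)  # intercept + covariate + factor
    df_effect, df_error = 1, n - 3
    ss_effect = sse_reduced - sse_full
    f_value = (ss_effect / df_effect) / (sse_full / df_error)
    eta_p2 = ss_effect / (ss_effect + sse_full)
    return f_value, eta_p2, df_error


# Illustrative data only: crossword score, expert (1) vs. novice (0), teacher (1) vs. student (0).
y = [3.0, 4.0, 2.0, 5.0, 4.0, 5.0, 3.0, 5.0]
group = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
covariate = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
f_value, eta_p2, df_error = ancova_one_factor(y, group, covariate)
print(f_value, eta_p2, df_error)
```

Under these assumptions, partial eta squared and F are linked by η² = F·df_effect / (F·df_effect + df_error), which offers a quick consistency check on reported values.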

Study with Supervised Machine Learning Techniques: Feature Selection
A feature selection analysis was performed with the R programming package mclust, selecting from all possible variables those that received a positive ranking. The gain ratio, symmetrical uncertainty and chi-square algorithms were used for feature selection. Table 3 shows the best values found with each of them.
(a) The gain ratio is a filter-type feature selection method. It relies on entropy to assign weights to discrete attributes based on the correlation between each attribute and a target variable (in this study, the results in solving the crossword puzzle). The gain ratio builds on the information gain metric [57], traditionally used to choose the attribute at a node of a decision tree with the ID3 method: the attribute chosen is the one that generates a partition in which the examples are distributed less randomly among the classes. This method was improved by Quinlan in 1993 [58], as he detected that the information gain was unfairly biased towards attributes with many values. To correct this, he added a correction based on standardisation by the entropy of that attribute. If Y is the variable to be predicted and X is the attribute, the gain ratio standardises the gain by dividing it by the entropy of X. The C4.5 decision tree construction method uses this measure. From a data mining point of view, this attribute selection can be understood as the selection of the best candidate attributes for the root of a decision tree, which in this study would predict the solving crossword puzzle variable. With H being the entropy, the gain ratio equation is as follows:
$$\mathrm{GainRatio}(Y, X) = \frac{H(Y) - H(Y \mid X)}{H(X)}$$
(b) Symmetrical uncertainty is likewise an entropy-based filter method. In its standard form, it normalises the information gain by the sum of the entropies of the attribute and the target variable:
$$\mathrm{SU}(Y, X) = 2\,\frac{H(Y) - H(Y \mid X)}{H(X) + H(Y)}$$
(c) Chi-square is a filter-type feature selection algorithm that obtains the weight of each feature by using the chi-square test (if the features are not nominal, it discretises them). The selection result is the same as Cramer's V coefficient. The chi-square equation is as follows:
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed or empirical absolute frequency and $E_i$ is the expected frequency. Figure 6 shows the correlation matrix found with chi-square (χ²) [59].
Figure 6. Relationship matrix on the selected characteristics performed with the chi-square algorithm.
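The three filter measures above can be illustrated with a short Python sketch for nominal attributes. It is not the R implementation used in the study: the function names are hypothetical, entropies are computed in bits, and the information gain is obtained through the identity H(Y) − H(Y|X) = H(X) + H(Y) − H(X, Y).

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of nominal values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(x, y):
    """H(Y) - H(Y|X), computed as H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def gain_ratio(x, y):
    """Information gain standardised by the entropy of the attribute X."""
    hx = entropy(x)
    return information_gain(x, y) / hx if hx > 0 else 0.0


def symmetrical_uncertainty(x, y):
    """Information gain normalised by the sum of both entropies."""
    denom = entropy(x) + entropy(y)
    return 2 * information_gain(x, y) / denom if denom > 0 else 0.0


def chi_square(x, y):
    """Pearson chi-square statistic of the contingency table of X and Y."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    stat = 0.0
    for a in cx:
        for b in cy:
            expected = cx[a] * cy[b] / n
            stat += (cxy[(a, b)] - expected) ** 2 / expected
    return stat


def cramers_v(x, y):
    """Cramer's V derived from the chi-square statistic."""
    k = min(len(set(x)), len(set(y))) - 1
    return math.sqrt(chi_square(x, y) / (len(x) * k)) if k > 0 else 0.0
```

With a perfectly associated attribute and target, all three weights reach their maximum, whereas for independent variables they are zero, which is what makes them usable as rankings.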

Study with Unsupervised Machine Learning Techniques: Clustering
Finally, cluster detection was performed on the data with unsupervised learning techniques, ignoring the solving crossword puzzles parameter in order to detect patterns in the instances. Nominal variables were transformed into dummy variables, so that a variable with n possible values was split into n−1 new binary variables, each indicating membership of one of the original values. The data were standardised so that each attribute had a mean of 0 and a standard deviation of 1. The following clustering algorithms were used:
(a) k-means++ is an algorithm for choosing the initial values of the centroids for the k-means clustering algorithm. It was proposed in 2007 by Arthur and Vassilvitskii [60] as an approximation algorithm for the NP-hard k-means problem, that is, as a way to avoid the sometimes poor clustering produced by the standard k-means algorithm.
$$D^2(\mu_0) \leq 2D^2(\mu_i) + 2\lVert \mu_i - \mu_0 \rVert^2$$
where $\mu_0$ is the initial point selected and D is the distance between point $\mu_i$ and the nearest centre of the cluster. Once the centroids are chosen, the process is the same as in classical k-means.
(b) The fuzzy k-means algorithm combines methods based on the optimisation of an objective function with those of fuzzy logic [61,62]. This algorithm forms clusters through a soft partitioning of the data. That is, a data item does not belong exclusively to a single group but can have different degrees of membership of several groups. The procedure calculates initial means ($m_1, m_2, \ldots, m_k$) to find the degree of membership of the data in each cluster. Then, until these means stop changing, the degree of membership of each data item $x_j$ in cluster i is recalculated, where $m_i$ is the fuzzy mean of all the examples in cluster i.
(c) DBSCAN (density-based spatial clustering of applications with noise) [63] identifies clusters as regions with a high density of observations separated by regions of low density. It follows the idea that, for an observation to be part of a cluster, there must be a minimum number of neighbouring observations (minPts) within a proximity radius (epsilon), and that clusters are separated by empty regions or regions with few observations.
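The seeding step of k-means++ described in (a) can be sketched in Python as follows. This is an illustrative version under stated assumptions, not the implementation used in the study: points are tuples of numbers, the function names are hypothetical, and each new centre is drawn with probability proportional to its squared distance to the nearest centre already chosen.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seeds(points, k, seed=0):
    """Choose k initial centres with D^2-weighted sampling."""
    rng = random.Random(seed)
    centres = [rng.choice(points)]  # first centre: uniform at random
    while len(centres) < k:
        # Squared distance of every point to its nearest chosen centre.
        d2 = [min(squared_distance(p, c) for c in centres) for p in points]
        if sum(d2) == 0:
            # Degenerate case: every point coincides with a centre.
            centres.append(rng.choice(points))
        else:
            centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres
```

Because points far from the existing centres receive larger weights, the initial centroids tend to be spread across the data, after which the usual k-means iterations proceed unchanged.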
As all remaining variables were nominal after feature selection, only binary variables were available for clustering after pre-processing. This complicated the processing of the k-means++ algorithm, which placed the centroids at different locations in the space when the number of centroids was greater than three. For this reason, the parameter k in the k-means++ and fuzzy k-means algorithms was set to 3.
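The pre-processing that produces these binary variables (dummy coding into n−1 columns followed by standardisation) can be sketched as follows. The choice of the alphabetically first level as the reference category and the use of the population standard deviation are assumptions made for illustration; the function names are hypothetical.

```python
import math


def dummy_code(values):
    """Encode a nominal variable with n levels as n-1 binary columns."""
    levels = sorted(set(values))
    kept = levels[1:]  # the first level is the reference (all zeros)
    return [[1.0 if v == level else 0.0 for level in kept] for v in values]


def standardise(column):
    """Rescale a numeric column to mean 0 and standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    if sd == 0:
        return [0.0] * n  # constant column: nothing to rescale
    return [(x - mean) / sd for x in column]
```

After standardisation, a binary column takes only two distinct values, so all instances lie on a small set of points in the feature space.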
For the DBSCAN algorithm, minPts was set to 5, as it is the default value in the library [64]. To choose the epsilon value, the elbow method was applied. Figure 7 shows the average distance of each point to its minPts nearest neighbours, and the value 2.97 was chosen for this parameter.
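The curve inspected with the elbow method can be sketched as follows. This follows the description in the text (the mean distance of each point to its k nearest neighbours, sorted in ascending order); a common variant instead plots the distance to the k-th nearest neighbour. The function name is hypothetical, and the elbow itself is assumed to be read off the plotted curve by eye.

```python
import math


def mean_knn_distances(points, k):
    """Sorted mean distance from each point to its k nearest neighbours."""
    curve = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        curve.append(sum(dists[:k]) / k)
    return sorted(curve)
```

When the sorted values are plotted, the point at which the curve bends sharply upwards suggests a candidate epsilon: below it most points have enough close neighbours, above it the points behave like noise.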
The visualisation of the clustering results can be seen in Figure 8, which shows the data after dimensionality reduction with principal component analysis (PCA). The clusters selected by the k-means++ and fuzzy k-means algorithms are identical, while DBSCAN found only two clusters and left some instances outside them. These instances, labelled as noise, are assigned to an additional cluster in this study.
Finally, it has to be considered that when applying an unsupervised learning method, such as clustering, there is no target variable against which to evaluate the goodness of the distribution of instances in clusters. However, the adjusted Rand index (ARI) can be used to compare how similar the partitions produced by the clustering algorithms are to each other. Thus, if many algorithms produce similar partitions, the conclusion will be consistent [65]. That is, if a pair of instances is in the same cluster in both partitions, this represents similarity between the partitions. In the opposite case, where a pair of instances is in the same cluster in one partition and in different clusters in the other, it represents a difference. With n being the number of instances, a being the number of pairs of instances grouped in the same cluster in both partitions and b being the number of pairs of instances grouped in different clusters in both partitions, the Rand index (without adjustment and correction) is as follows:
$$a = |S|, \text{ where } S = \{(o_i, o_j) \mid o_i, o_j \in X_k,\; o_i, o_j \in Y_l\}$$
$$b = |D|, \text{ where } D = \{(o_i, o_j) \mid o_i \in X_{k_1},\; o_j \in X_{k_2},\; o_i \in Y_{l_1},\; o_j \in Y_{l_2}\}$$
$$R = \frac{a + b}{\binom{n}{2}}$$
A correction is made to the original Rand index because two partitions generated at random are still expected to share some pairs of instances, which would cause the Rand index never to be 0. To make the correction, the adjusted Rand index, ARI, was applied, in which negative values can be found if the similarity is less than expected, being equal to
$$\mathrm{ARI} = \frac{\mathrm{Index} - \mathrm{ExpectedIndex}}{\mathrm{MaxIndex} - \mathrm{ExpectedIndex}}$$
The applied ARI formula is therefore
$$\mathrm{ARI} = \frac{\sum_{ij}\binom{n_{ij}}{2} - \left[\sum_i \binom{a_i}{2}\sum_j \binom{b_j}{2}\right] \Big/ \binom{n}{2}}{\frac{1}{2}\left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right] - \left[\sum_i \binom{a_i}{2}\sum_j \binom{b_j}{2}\right] \Big/ \binom{n}{2}}$$
where if $X = \{X_1, X_2, \ldots, X_r\}$ and $Y = \{Y_1, Y_2, \ldots, Y_s\}$, then $n_{ij} = |X_i \cap Y_j|$, $a_i = \sum_j n_{ij}$ and $b_j = \sum_i n_{ij}$. Thus, the ARI can have a value between −1 and 1, where 1 indicates that the two data clusters match exactly in every pair of points, 0 is the expected value for randomly created clusters and −1 is the worst fit. The results indicate that the algorithms that provide the best fit are k-means++ and fuzzy k-means (ARI = 1), followed by k-means++ and DBSCAN (ARI = 0.96) and fuzzy k-means and DBSCAN (ARI = 0.9). It can therefore be concluded that the degree of fit between the algorithms applied in this study is good for all possible associations (see Figure 9, where a higher intensity indicates a stronger relationship).
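The two indices can be computed directly from two label vectors, as in the following sketch. It implements the pair-counting Rand index and the contingency-table form of the ARI given above; the function names are hypothetical, and returning 1.0 when the denominator vanishes (for example, when both partitions place all instances in one cluster) is a convention assumed here.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(labels_a, labels_b):
    """Fraction of instance pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for the agreement expected by chance."""
    n = len(labels_a)
    n_ij = Counter(zip(labels_a, labels_b))  # contingency table cells
    sum_ij = sum(comb(c, 2) for c in n_ij.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, treated as perfect agreement
    return (sum_ij - expected) / (max_index - expected)
```

Both functions ignore the actual label names, so two algorithms that produce the same grouping with different cluster numbers still obtain an index of 1.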

Discussion
Regarding RQ1, it was not confirmed that participants with prior knowledge performed better on the crossword puzzle solving test than non-experts. In line with studies by Eberhard et al. [2], Takacs and Bus [4] and Verhallen and Bus [5], this may be explained by the fact that the task was presented in a video that included self-regulated speech. This technique has been shown to be very effective in mitigating the differences between novice and experienced learners [12–14]. Although no significant differences were found with respect to the independent variable, a medium effect size was found. This suggests that the participant type variable "novice vs. expert" is an important variable in task resolution processes, but that in this study its effect may have been mitigated by the way the task was presented (self-regulated procedure). This result coincides with the findings of studies concluding that the lack of prior knowledge in novice learners can be compensated for by self-regulated multi-measure tasks [12–15,35,36]. The explanation is that self-regulated video may facilitate homogeneity in the encoding of information, in the attention paid to relevant vs. non-relevant information and in the route taken in the scan path [18,19].
Regarding RQ2, no effect of age was found on the metrics of fixations, saccades, blinks and scan path length. This may be explained by the way the task was presented (self-regulated video) or by the participants' prior knowledge. Along these lines, research [8] indicates that prior knowledge compensates for the effects of age on cognitive functioning, for example, on long-term memory processes or reaction times. In addition, the covariate participant type "student vs. teacher" was found to weigh on task performance. Specifically, differences were found in the saccade amplitude minimum and saccade velocity minimum parameters. These data can be related to the findings of studies indicating that age effects can be mitigated by learners' prior knowledge of the task [8] and also by self-regulated presentation of the task [18,19]. In fact, the significant differences found for the covariate were concentrated in saccade amplitude minimum and saccade velocity minimum, which is consistent with studies that found differences in saccade type depending on the phase of information encoding the learner was in [18,19]. This result is important for future research proposals, because the way students and teachers process information might influence the way they learn. For example, teachers might develop more systematic processing that would compensate for their lack of knowledge in a task. Alternatively, younger students might implement more effective learning, and thus processing, strategies even though they are novices [13]. These hypotheses will be explored in future studies.
Regarding RQ3, no significant differences were found in the metrics of fixations, saccades, blinks and scan path length depending on whether the participant was a novice or an expert. This may be explained by the way the task was presented (a self-regulated video with a set duration). Future studies could test the results on videos that do not include self-regulation and/or that can be viewed more than once. An effect of the covariate participant type "student vs. teacher" was also found on the saccade amplitude minimum and saccade velocity minimum parameters. As indicated for RQ2, this is an important fact to consider in future research, as the way students and teachers process information could be influencing the type of information processing. Similarly, future studies could test whether the form of task presentation (self-regulated vs. non-self-regulated, timed vs. untimed, etc.) influences the form of processing (fixations, saccades, blinks, scan path length). Processing patterns could also be identified for different participant types (novice vs. expert, different age intervals, etc.), and the types of patterns could be tested according to the type of participant.
The analysis performed with supervised feature selection methods showed that the different algorithms applied (gain ratio, symmetrical uncertainty, chi-square) provided valuable information regarding the most significant attributes in the study. In this case, the following attributes were considered important: previous knowledge, group type, employment status, gender, degree level and knowledge branch. This result is very interesting for future research, as it provides information on the possible effects of characteristics that were not considered as independent variables in the statistical study (employment status, gender, degree level and knowledge branch).
The study with unsupervised learning techniques (clustering) revealed the groupings, i.e., the similar interaction patterns of the participants in the selected characteristics. The three algorithms applied had a good ARI. This result is important for future studies, as a learning style profile can be extracted for each group, and its relationship with the outcome of the learning tasks and with the reaction times for the execution of the tasks can be checked.

Conclusions
The use of the eye tracking technique provides evidence on the processing of information in different types of participants during the resolution of different tasks [9–11]. This fact facilitates research in behavioural sciences [37]. Working with this technology opens up many fields of research applied to numerous environments (learning to read and write, logical-mathematical reasoning, physics, driving vehicles, operating dangerous machinery, marketing, etc.) [38–42]. It can also be used to find out how people with different learning disabilities [45] (ADHD, ASD, etc.) learn. This could help to improve their learning style and to make proposals for personalised intervention according to the needs observed in each of them. In addition, this technology can be used to improve driving practices and accident prevention with regard to the handling of dangerous machinery. This training is being carried out in virtual and/or augmented reality scenarios [49–51] that apply eye tracking technology.
All these possibilities open an important field to be addressed in future research.
Another relevant aspect to take into account is the way tasks are presented. This study has shown that the use of self-regulated tasks facilitates the processing of information and homogenises learning responses between novice and expert learners [12–15,35,36]. Therefore, in future studies, we will study participants' processing in different types of tasks (self-regulated designs with avatars, zooming in on the most relevant information, etc.). Likewise, the results will be tested in different educational stages (early childhood education, primary education, secondary education, university education and non-formal education) and in different subjects (experimental vs. non-experimental). In addition, this study has shown that the use of machine learning techniques such as feature selection helps to identify the attributes that may be more significant for the research. This functionality is very useful in research that works with a large volume of features or instances. Moreover, if this technique is combined with other machine learning techniques and traditional statistics, the results can provide more information, especially in relation to future lines of research. In fact, in this study, some of the variables considered as independent in the statistical study were also selected as relevant features in the study that applied supervised feature selection techniques (e.g., prior knowledge and type of participant (student vs. teacher)). However, the feature selection techniques have also provided clues to be taken into account in future studies on the influence of other variables (e.g., gender, employment status, level of education and field of knowledge). Along these lines, the use of different algorithms for both feature selection and clustering provides the researcher with a repertoire of results whose fit can be contrasted with the ARI.
This will make it possible to know the groupings among the learners and to isolate the patterns of the types of learners in order to offer educational responses based on personalised learning. On the other hand, the use of statistical analysis methods makes it possible to ascertain whether the variables indicated as independent have an effect on the dependent variables. In summary, perhaps the most useful procedure is, first, to apply supervised feature selection techniques and then, depending on the variables detected, to pose the research questions and apply the relevant statistical analyses to test them.
Finally, the results of this study must be taken with caution, as this study has a series of limitations. These are mainly related to the size of the sample, which is small, and the selection of the sample, which was conducted by convenience sampling. However, it must be considered that the use of the eye tracking methodology requires a very exhaustive control of the development of tasks in laboratory spaces, an aspect that makes it difficult for the samples to be large and randomised. Another limitation of this work is that a very specific task (acquisition of the concepts of the origins of monasteries in Europe and verification of this acquisition through the resolution of a crossword puzzle) was used in a specific learning environment (history of art). For this reason, possible future studies have been indicated in the Discussion and Conclusions sections.
Note (Table A1). G1 = participants younger than 50 years; G2 = participants older than 50 years; M = mean; SD = standard deviation; df = degrees of freedom; η² = eta squared effect value; * p < 0.05.
Note (Table A2). G1 = novice participants; G2 = expert participants; M = mean; SD = standard deviation; df = degrees of freedom; η² = eta squared effect value; * p < 0.05.