Introduction
Web pages are typically made up of visual elements such as menus, headers and footers. These elements allow users to complete their tasks; for example, users can navigate within web pages by using the menus. In order to investigate how users interact with these visual elements, many researchers (i.e., academic researchers and usability evaluators) prefer to conduct eye tracking studies. These studies reveal which visual elements are fixated and which paths are followed (Holsanova, Rahm, & Holmqvist, 2006; Yesilada, Jay, Stevens, & Harper, 2008; Albanesi, Gatti, Porta, & Ravarelli, 2011; Hejmady & Narayanan, 2012; Eraslan, Yesilada, & Harper, 2014). For example, Gossen, Höbel, and Nürnberger (2014) conducted an eye tracking study to investigate how children interact with search engines. Their findings illustrate that children typically experience difficulties in estimating the relevance of a search result. Therefore, they suggest that search engines should be improved to help children find the most relevant results.
Eye tracking studies supplement other usability methods, especially the Retrospective Think Aloud method, where users are asked to verbalise their performance after they complete their tasks (Guan, Lee, Cuddihy, & Ramey, 2006). A study conducted by Guan et al. (2006) illustrates that when users encounter difficulties in completing their tasks, they verbalise their performance at a very abstract level. Hence, when users are asked to complete more complicated tasks, scanpath analysis becomes more crucial to understand their real performance. Besides this, scanpath analysis is likely to be more valuable for exploratory tasks than for goal-directed tasks. For goal-directed tasks, various metrics can be used, such as the task completion time. However, there is no specific goal in exploratory tasks, so researchers can benefit from scanpath analysis to understand how users explore web pages. Groner, Siegenthaler, Raess, Wurtz, and Bergamin (2009) propose a multifunctional usability analysis approach that consists of eye gaze analysis, verbal reports, log file analysis, retrospective interviews and performance characteristics (such as failure and success). They applied their approach to an eLearning module of the Moodle learning management system (http://moodle.com) to investigate how users interact with the module. Their findings suggest that users experience difficulties in navigating the module because of the large amount of visual information on a page. To improve the navigation, they suggest including less information on the start page and providing a table of contents that gives direct access to other parts.
When users read web pages, their eyes become relatively stable at certain points which are referred to as fixations. A series of fixations represents their scanpaths on the web pages.
Figure 1 shows an example of a scanpath of a particular user on the HCW Travel web page which is segmented into its visual elements (Brown, Jay, & Harper, 2012; Akpinar & Yeşilada, 2013). As can be seen from this figure, fixations are illustrated with circles where larger circles are used for longer fixations. The user here fixated the visual elements B, D, C and E respectively. Therefore, the scanpath is represented as BDCE.
The scanpath theory of Noton and Stark (1971a, 1971b) suggests that a user establishes his or her own scanpath on the first visit to a visual stimulus and then follows the same scanpath, with some variations, on the following visits to the visual stimulus. It also suggests that the scanpaths are not similar between different users on a particular visual stimulus, and between different visual stimuli for a particular user. As web pages are repeatedly visited visual stimuli, both Josephson and Holmes (2002) and Burmester and Mast (2010) tested this theory with web pages. However, they recognised that the scanpath theory could not be fully supported on web pages. In particular, they recognised that the users typically followed various scanpaths on a particular web page instead of a single scanpath as suggested by the scanpath theory. Josephson and Holmes (2002) also recognised many cases where the most similar scanpaths on a particular web page were from different users instead of the same user. Scanpaths can also be affected by user tasks and different individual factors, such as gender and user expertise (Eraslan & Yesilada, 2015; Underwood, Humphrey, & Foulsham, 2008).
A number of techniques have been suggested in the literature to visualise user scanpaths for analysing them in an exploratory and qualitative way (Räihä, Aula, Majaranta, Rantala, & Koivunen, 2005; Blascheck et al., 2014). These techniques have already been comprehensively reviewed by Blascheck et al. (2014). Apart from the scanpath visualisation techniques, there are also different techniques that could be applicable to eye tracking data for analysing user scanpaths, which are correlated with visual elements of web pages, in a more detailed way. These techniques can typically be used for calculating a similarity/dissimilarity between a pair of scanpaths, computing transition probabilities between visual elements, detecting patterns within given scanpaths and identifying a common scanpath for multiple scanpaths. To make the best use of available data, researchers should select an appropriate technique for their studies. At this point, it is crucial for them to know the strengths and limitations of these techniques. This article, therefore, initially explains how these techniques work. It then provides an analysis and critical evaluation of their strengths and weaknesses supported by data from an eye tracking study.
Although there are several review articles in this field, they mainly focus on a specific set of techniques which can be used for a particular objective, for example, techniques to compare two scanpaths (Le Meur & Baccino, 2013; Anderson, Anderson, Kingstone, & Bischof, 2014). Additionally, some of these techniques are summarised in the related work sections of existing publications (Duchowski et al., 2010; Mast & Burmester, 2011). Furthermore, Holmqvist et al. (2011) published a book on eye tracking methodologies which also introduced some of these techniques.
To the best of the authors’ knowledge, this article is the most comprehensive review and analysis of the techniques which can be used to compare and correlate (i.e., computing transition probabilities between visual elements, finding patterns, identifying common scanpaths, etc.) not only two scanpaths but also more than two scanpaths. It makes a contribution to eye tracking research on the web by guiding researchers to choose an appropriate technique and revealing some directions to address the limitations.
In order to investigate both the strengths and limitations of these techniques, we evaluated them with an eye tracking dataset from a study conducted with twelve users by Brown et al. (2012). We then criticised the techniques based on the results; that is, we used a data-driven approach to investigate, compare and contrast the techniques.
Scanpath analysis is relevant to all studies with the aim of analysing sequential patterns on visual stimuli. Specifically, it can be used for investigating the differences between the sequential patterns of different user groups on web pages, such as male and female groups (Eraslan & Yesilada, 2015). In addition, it can be used to assess the search efficiency of users. For example, longer scanpaths can be interpreted as less efficient searching (Ehmke & Wilson, 2007). Scanpaths can also be analysed to identify common sequential patterns that can be used for different objectives. In particular, common patterns can be a guide to re-engineering web pages to make them more accessible on small screen devices by allowing users to directly access firstly visited visual elements without a lot of scrolling and zooming (Akpınar & Yeşilada, 2015).
The remainder of this article firstly explains our methodology to evaluate the scanpath analysis techniques, secondly revisits them along with their strengths and limitations based on our evaluation, and finally discusses and criticises the techniques to provide some directions to address their limitations.
Methodology
In order to investigate both the strengths and limitations of the scanpath analysis techniques on web pages, we decided to evaluate them with a third-party eye tracking dataset. In other words, we decided to use a dataset that was not previously used to evaluate any of these techniques, and that was not originally collected for this purpose. Therefore, in this article, we evaluated all the techniques with the same dataset, which made the comparison of the techniques more objective.
We unfortunately could not evaluate three of these techniques as highlighted in Table 2.
ScanMatch technique works with a grid-layout page segmentation by default (see Figure 2) (Cristino, Mathôt, Theeuwes, & Gilchrist, 2010). It also allows another type of segmentation to be applied by associating each pixel with a particular segment. However, there may be some spaces between segments (see Figure 1); in other words, some pixels may not be associated with any segment. Because of this limitation, ScanMatch technique could not be applied to the dataset. Besides this, the T-Pattern Detection technique is not publicly available, and therefore it could not be applied to the dataset either (Magnusson, 2000). As the Multiple Sequence Alignment technique is described at a very abstract level and lacks details, it could also not be applied to the dataset (Hembrooke, Feusner, & Gay, 2006). However, we still analyse these techniques based on their published descriptions.
Dataset
As also stated by Shen and Zhao (2014), there is no publicly available eye tracking dataset on real web pages. Although we also asked other researchers in related fields (via chi-web@listserv.acm.org) whether they had eye tracking datasets to share with us, we could not find an appropriate dataset. Fortunately, we have an eye tracking dataset from a study conducted by Brown et al. (2012) in March 2010; the authors are members of our Interaction Analysis and Modelling Lab at the University of Manchester. This study aimed to investigate how users interact with dynamic content on web pages. The participants sat in front of a 17” monitor with a built-in Tobii x50 eye tracker and a screen resolution of 1280 × 1024. The HCW Travel web page (see Figure 1) was shown to the participants and their eye movements were recorded.
The participants were asked to read the latest news from the HCW Travel Company and then click on the link for the special offers. This meant they were required to fixate certain visual elements on the web page in a particular order. Specifically, they needed to fixate the element E, which includes the latest news, and then the element D, which contains the link to the special offers. Since the latest news was shown next to the Latest News title and the link for the special offers was labelled Special Offers, the participants could find the related visual elements simply by scanning the web page. Twelve people participated in the eye tracking study.
These were students and staff at the University of Manchester aged between 18 and 45. We noticed some problems with the eye tracking recordings of two participants because they were distracted, and therefore we had to exclude their data from our evaluation. Although the sample size of this eye tracking study is small, it is sufficient to illustrate the strengths and weaknesses of the scanpath analysis techniques. A small dataset is in fact better suited to clearly explaining how these techniques work and comparing them.
Visual Elements
In our evaluation, we used the extended and improved version of the Vision Based Page Segmentation (VIPS) algorithm (http://www.eclipse.org/actf/downloads/tools/eMine/build.php) to segment the HCW Travel web page into its visual elements because it automatically discovers visual elements and correlates them with the underlying source code, which is important for further processing of web pages (Akpinar & Yeşilada, 2013). In particular, when scanpaths are correlated with these visual elements, they can then be used for the purpose of re-engineering web pages (Yesilada, Harper, & Eraslan, 2013).
The VIPS algorithm segments web pages based on the selected segmentation level where smaller visual elements are identified with higher levels. As the 5th level was determined as the most successful level with approximately 74% user satisfaction, we used the 5th level for our evaluation (Akpinar & Yeşilada, 2013).
User Scanpaths in Terms of Visual Elements
Once the visual elements were discovered, we exported the eye tracking data of the ten users and correlated their fixations with the visual elements to construct their individual scanpaths in terms of the visual elements. To achieve this, we used the width, height, x and y coordinates of the visual elements and the x and y coordinates of the fixations. We then simplified the individual scanpaths by abstracting consecutive repetitions as stated in the literature (Brandt & Stark, 1997; Jarodzka, Holmqvist, & Nyström, 2010). For example, AABBBCC becomes ABC after the abstraction.
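As a rough sketch of this step, the following Python fragment maps each fixation to the visual element whose bounding box contains it and then abstracts consecutive repetitions; the element boxes and fixation coordinates are hypothetical examples, not the actual study data.

```python
# A minimal sketch of correlating fixations with visual elements and
# abstracting consecutive repetitions; the element and fixation records
# below are hypothetical examples.

def element_at(x, y, elements):
    """Return the label of the visual element whose bounding box contains (x, y)."""
    for label, (ex, ey, width, height) in elements.items():
        if ex <= x < ex + width and ey <= y < ey + height:
            return label
    return None  # fixation falls outside every element

def build_scanpath(fixations, elements):
    """Convert a list of (x, y) fixation points into a scanpath string."""
    labels = [element_at(x, y, elements) for x, y in fixations]
    return "".join(label for label in labels if label is not None)

def abstract(scanpath):
    """Collapse consecutive repetitions, e.g. 'AABBBCC' -> 'ABC'."""
    result = []
    for label in scanpath:
        if not result or result[-1] != label:
            result.append(label)
    return "".join(result)

# Hypothetical example: two elements (x, y, width, height) and three fixations.
elements = {"A": (0, 0, 200, 100), "B": (0, 100, 200, 100)}
fixations = [(50, 40), (60, 45), (30, 150)]
print(abstract(build_scanpath(fixations, elements)))  # -> "AB"
```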
These ten individual scanpaths on the HCW Travel web page are listed in Table 1 (Yesilada et al., 2013; Eraslan, Yeşilada, & Harper, 2013). As can be seen from the table, the participants followed slightly different paths to complete their tasks. For instance, the third and fourth participants fixated more visual elements to complete their tasks than participants six and eight (Yesilada et al., 2013).
When the individual scanpaths were ready, we evaluated the scanpath analysis techniques with them. The following section revisits the techniques along with their strengths and limitations based on our evaluation.
Scanpath Analysis Techniques
In this article, we classify the scanpath analysis techniques into four main groups according to their goals. These groups are as follows: (1) Similarity/Dissimilarity Calculation, (2) Transition Probability Calculation, (3) Pattern Detection and (4) Common Scanpath Identification.
Table 2 shows an overview of this classification. Specifically, the table presents the groups along with their techniques; for example, it shows that eMINE scanpath algorithm belongs to the common scanpath identification group (Eraslan et al., 2014). The techniques within the same group mainly have the same goal but do not necessarily share the same analysis approach. In particular, in the common scanpath identification group, one approach suggests applying hierarchical clustering with the Dotplots algorithm (Goldberg & Helfman, 2010), whereas another approach (eMINE scanpath algorithm) suggests using the String-edit algorithm and the Longest Common Subsequence technique together for hierarchical clustering (Eraslan et al., 2014). In addition, Table 2 shows the main requirements for running each technique; for example, eMINE scanpath algorithm only requires a number of scanpaths that are represented in terms of visual elements.
Table 2 also shows whether or not the techniques can work with more than two scanpaths at the same time. As shown in the table, the scanpath analysis techniques are typically designed to produce results for more than two scanpaths, except for the techniques in the similarity/dissimilarity calculation group. The techniques in that group work in a pairwise manner, which means they can only work with two scanpaths at the same time. Moreover, Table 2 indicates whether the techniques consider fixation durations and the positions of visual elements on web pages. Most of the techniques ignore fixation durations while analysing scanpaths, even though it is widely accepted that fixation duration is associated with the depth of processing and the ease or difficulty of information processing (Velichkovsky, Rothert, Kopf, Dornhöfer, & Joos, 2002; Follet, Meur, & Baccino, 2011). Furthermore, they usually do not consider the positions of visual elements on web pages, even though eye movements between nearby visual elements are shorter than those between distant elements.
There are also a number of techniques with a reductionist approach. In this context, we refer to the reductionism as an oversimplification of multiple scanpaths with the loss of some important information. Thus, the reductionism is associated with detecting patterns and identifying common scanpaths. We articulate the reductionism as follows: (1) When an algorithm is likely to lose a shared visual element because of its position in individual scanpaths, it is classified as reductionist. (2) When an algorithm is intolerant of small deviations within individual scanpaths (especially, ignoring the visual element fixated by the majority), it is also classified as reductionist.
This section revisits and investigates all of these techniques in depth based on our evaluation.
Similarity/Dissimilarity Calculation
A number of techniques are available to compare two scanpaths and determine a similarity or dissimilarity between them. These techniques are the String-edit algorithm (Heminghous & Duchowski, 2006), the String-edit algorithm with a substitution matrix (Takeuchi & Habuchi, 2007) and ScanMatch technique (Cristino et al., 2010). As these techniques do not focus on generating common scanpaths, the reductionism is not applicable to this group.
String-edit Algorithm. The Levenshtein Distance algorithm, which is commonly known as the String-edit algorithm, has been widely used for comparing a pair of scanpaths represented in a string format (Privitera & Stark, 2000; Josephson & Holmes, 2002; Pan et al., 2004; Heminghous & Duchowski, 2006; Underwood et al., 2008; Duchowski et al., 2010; Eraslan et al., 2014; Eraslan & Yesilada, 2015). When user scanpaths are correlated with visual elements of web pages, they are represented in a string format. Therefore, this algorithm can be applied to calculate the distance (i.e., dissimilarity) between two scanpaths by transforming one of them into the other with a minimum number of editing operations, which are insertion, deletion and substitution. The minimum number of operations represents the distance between the scanpaths. Although the String-edit algorithm is designed to compare a pair of scanpaths, it can be applied to more than two scanpaths in a pairwise manner. Therefore, the scanpaths most similar to a particular scanpath can be identified.
Equation 1 mathematically formalises how to calculate the similarity between a pair of scanpaths as a percentage by using their String-edit distance (Underwood et al., 2008):

Similarity = (1 − d/n) × 100     (1)

First of all, the distance (d) is divided by the length of the longer scanpath (n) to calculate a normalised score, which prevents any inconsistencies that could be caused by different lengths. The normalised score is then subtracted from one and finally multiplied by 100.
Table 3 illustrates how the String-edit algorithm works, using the fifth and seventh scanpaths in Table 1 and their alignment as an illustration. As seen from the example, 8 operations are required in total (1 insertion/deletion + 7 substitutions) to transform one into the other. The distance between these scanpaths is therefore calculated as 8 by this algorithm.
When the String-edit algorithm is applied to the scanpaths in Table 1 in a pairwise manner, the matrix shown in Table 4 is created, which illustrates the distances between the scanpaths. According to this matrix, the most similar scanpaths are the seventh and ninth scanpaths because their distance (4) is the lowest.
As mentioned above, the similarity between two scanpaths can be calculated as a percentage based on their String-edit distance. For example, the distance between the two scanpaths in Table 3 is calculated as 8. As the length of the longer scanpath is 17, the distance is first divided by 17 to obtain the normalised score. When this score is subtracted from one and then multiplied by 100, the similarity between the scanpaths is calculated as 52.94%.
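The following Python sketch shows one way this calculation could be implemented: a standard Levenshtein distance with unit costs, followed by the similarity percentage of Equation 1. The example scanpaths are hypothetical and used only for illustration.

```python
# A minimal sketch of the String-edit (Levenshtein) distance between two
# scanpath strings and the similarity percentage of Equation 1.

def string_edit_distance(s1, s2):
    """Minimum number of insertions, deletions and substitutions (unit costs)."""
    m, n = len(s1), len(s2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def similarity(s1, s2):
    """Similarity percentage: (1 - distance / length of the longer scanpath) * 100."""
    distance = string_edit_distance(s1, s2)
    return (1 - distance / max(len(s1), len(s2))) * 100

# Hypothetical scanpaths used only for illustration.
print(string_edit_distance("BDCE", "BCDE"))   # -> 2
print(round(similarity("BDCE", "BCDE"), 2))   # -> 50.0
```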
Even though the String-edit algorithm has been widely used and can easily be applied to scanpaths, it has some important drawbacks. In particular, the algorithm does not consider fixation durations while calculating the distance between two scanpaths. Besides this, it does not consider the positions of visual elements on a web page. For example, on the HCW Travel web page the cost of substituting the element B with the element E is the same as the cost of substituting the element B with the element G. However, as can be seen from Figure 1, the elements B and E are very close to each other, whereas there are five other elements between the elements B and G. This means that the eye movement between the elements B and E is shorter than the eye movement between the elements B and G.
String-edit Algorithm with a Substitution Matrix. By default, the cost of all the operations used by the String-edit algorithm is equal to one. However, the substitution costs between visual elements may not be the same because the elements may differ in size and the geometrical distances between them can also vary. In other words, the substitution cost should be lower for closer visual elements because eye movements between those visual elements are shorter. To counteract this, a number of different approaches have been suggested in the literature (Josephson & Holmes, 2002; Takeuchi & Habuchi, 2007). In particular, Takeuchi and Habuchi (2007) propose using a Euclidean Distance or a City Block Distance to construct a substitution cost matrix. Equation 2 shows the Euclidean Distance formula and Equation 3 shows the City Block Distance formula for calculating the substitution cost between two visual elements i and j:

cost(i, j) = α √((x_i − x_j)² + (y_i − y_j)²)     (2)

cost(i, j) = α (|x_i − x_j| + |y_i − y_j|)     (3)

where (x_i, y_i) are the coordinates of the centre of the visual element i and α is a type of normalisation parameter (Takeuchi & Habuchi, 2007). Takeuchi and Habuchi (2007) set this normalisation parameter to 0.001. The substitution costs between visual elements are calculated in a pairwise manner and then stored in a matrix, which can then be used with the String-edit algorithm.
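As an illustration of this idea, the short Python sketch below builds a substitution cost matrix from element centre coordinates using the two distance formulas above; the centre coordinates are hypothetical placeholders, not the HCW Travel layout.

```python
# A minimal sketch of a distance-based substitution cost matrix; the element
# centres below are hypothetical, not taken from the HCW Travel web page.
from math import sqrt

ALPHA = 0.001  # normalisation parameter reported by Takeuchi and Habuchi (2007)

def euclidean_cost(c1, c2, alpha=ALPHA):
    return alpha * sqrt((c1[0] - c2[0]) ** 2 + (c1[1] - c2[1]) ** 2)

def city_block_cost(c1, c2, alpha=ALPHA):
    return alpha * (abs(c1[0] - c2[0]) + abs(c1[1] - c2[1]))

def substitution_matrix(centres, cost=euclidean_cost):
    """Pairwise substitution costs between visual elements, keyed by label pairs."""
    return {(a, b): cost(ca, cb)
            for a, ca in centres.items()
            for b, cb in centres.items()}

# Hypothetical element centres (x, y).
centres = {"B": (100, 200), "E": (140, 230), "G": (800, 600)}
matrix = substitution_matrix(centres)
# Substituting nearby elements is cheaper than substituting distant ones.
print(matrix[("B", "E")] < matrix[("B", "G")])  # -> True
```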
When the Euclidean Distance is used to construct a substitution matrix for the HCW Travel web page, the matrix shown in Table 6 is obtained. The matrix can then be used with the String-edit algorithm to calculate the distance between a pair of scanpaths on the HCW Travel web page by minimising the cost. As illustrated in Table 5, the distance (namely, the total operation cost) between the fifth and seventh scanpaths in Table 1 is then calculated as 1.96.
Although this version of the String-edit algorithm considers the positions of visual elements on a web page while determining the distance between a pair of scanpaths, it still does not consider fixation durations.
As stated above, the String-edit algorithm has been widely used in the literature. In particular, Heminghous and Duchowski (2006) developed an application based on the String-edit algorithm called iComp. This application segments an image into its areas of interest (AoIs) by using the fixation distribution over the image, as suggested by Santella and DeCarlo (2004). Once the scanpaths are represented in terms of the AoIs, the application applies the String-edit algorithm to compare them. Instead of automatic AoI detection, evaluators and users can also identify AoIs according to the evaluation goals or research questions (such as Holsanova et al. (2006)). Josephson and Holmes (2002) also used the String-edit algorithm to organise scanpaths into smaller groups based on their similarities to each other. Furthermore, Underwood et al. (2008) used the String-edit algorithm to investigate the differences between expert and novice users while they were viewing visual stimuli in the context of Engineering and Civil War.
ScanMatch. Instead of calculating the distance between two scanpaths, Cristino et al. (2010) use the Needleman and Wunsch algorithm to directly calculate the similarity between two scanpaths by using a substitution cost matrix and a gap penalty. They call their approach ScanMatch (http://seis.bris.ac.uk/~psidg/ScanMatch/). In this approach, the substitution costs are inversely related to the Euclidean distance, where the lowest cost is assigned to the pair of visual elements that are the farthest from each other. In addition, there is a threshold value that represents the cut-off point for determining whether a substitution cost is positive or negative. The threshold value can be adjusted to ensure that the alignment is only applied to visual elements within the variability of the saccade amplitudes. The gap penalty can also be changed. Instead of using the substitution matrix generated by ScanMatch technique, a different substitution matrix can also be introduced by a researcher.
The scanpath analysis techniques typically do not take fixation duration into consideration. Thus, Cristino et al. (2010) suggest repeating elements in individual scanpaths based on their fixation durations. To achieve this, an appropriate duration (namely, the temporal bin size) should be defined for repeating these elements proportionally to the fixation durations. For example, if the bin size is defined as 50 milliseconds and the visual element C is fixated for 200 milliseconds by a user, his or her scanpath will include four (200/50 = 4) consecutive occurrences of the visual element C (...CCCC...). Takeuchi and Matsuda (2012) tested this approach in an eye tracking study by using the String-edit algorithm with a substitution matrix. They suggest that better results can be achieved by taking this approach into account for scanpath comparison.
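A minimal sketch of this temporal binning idea is shown below; the bin size and the fixation records are illustrative assumptions rather than recommended values.

```python
# A minimal sketch of temporal binning: repeat each element in proportion to
# its fixation duration before comparing scanpaths. The scanpath below is a
# hypothetical example.

def temporally_bin(fixations, bin_size_ms=50):
    """Expand (element, duration) pairs into a duration-weighted scanpath string."""
    expanded = []
    for element, duration_ms in fixations:
        repeats = max(1, round(duration_ms / bin_size_ms))
        expanded.append(element * repeats)
    return "".join(expanded)

# A 200 ms fixation on C becomes four consecutive Cs with a 50 ms bin.
print(temporally_bin([("B", 100), ("C", 200)]))  # -> "BBCCCC"
```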
ScanMatch technique is mainly designed for analysing user scanpaths on visual stimuli segmented by a grid layout. Figure 2 shows an example of a 5 × 5 grid-layout segmentation with ScanMatch technique, where each segment is represented by one upper-case letter and one lower-case letter. The grid size can be adjusted and user scanpaths can then be represented with the segments. ScanMatch technique also allows a different segmentation to be used, but each pixel must be associated with a visual element. As there were some spaces between the visual elements generated by the extended and improved version of the VIPS algorithm (see Figure 1), ScanMatch technique could not be applied to the dataset described in the Methodology section.
Both fixation durations and the positions of visual elements on web pages are considered by ScanMatch technique. However, the subjectivity of the results can be an important issue because there are many parameters that need to be configured, and their configuration can easily affect the results.
Transition Probability Calculation
Markov Models (West, Haake, Rozanski, & Karn, 2006) and eSeeTrack technique (Tsang, Tory, & Swindells, 2010) are categorised under the transition probability calculation group as they determine transition probabilities between visual elements. Again, the reductionism is not applicable to this group.
Markov models. In order to calculate transition probabilities between visual elements, Markov models have been used with some variations (West et al., 2006; Chuk, Chan, & Hsiao, 2014; Kang & Landry, 2015). These models can be applied to user scanpaths correlated with visual elements of web pages to generate a transition matrix which holds transition probabilities between visual elements. This matrix can then be used to recognise which visual element is likely to be fixated after, and which before, a particular element, along with the associated probabilities.
Table 7 shows a transition matrix generated for the scanpaths in Table 1 by using the scanpath analysis tool of West et al. (2006) called eyePatterns (http://eyepatterns.sourceforge.net/) (Yesilada et al., 2013; Eraslan et al., 2013). Each cell of this matrix includes a positive integer and two percentages. The integer is the number of transitions from the visual element in the row to the visual element in the column. The percentages show the row and column probabilities respectively, where the row probabilities relate to the next visual elements and the column probabilities relate to the previous visual elements. For example, as highlighted in Table 7, there are 11 transitions from the visual element A to the visual element C in total, and the transition probability from element A to element C is calculated as 55.01%. Moreover, the probability of having fixated element A just before element C is calculated as 23.92%.
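The sketch below illustrates how such a first-order transition matrix could be computed from scanpath strings: transition counts plus row (next-element) and column (previous-element) probabilities. The scanpaths are hypothetical toy examples, not the Table 1 data.

```python
# A minimal sketch of a first-order transition matrix built from scanpaths.
from collections import defaultdict

def transition_counts(scanpaths):
    counts = defaultdict(int)
    for path in scanpaths:
        for current, nxt in zip(path, path[1:]):
            counts[(current, nxt)] += 1
    return counts

def row_probability(counts, source, target):
    """P(next element = target | current element = source)."""
    total = sum(c for (s, _), c in counts.items() if s == source)
    return counts[(source, target)] / total if total else 0.0

def column_probability(counts, source, target):
    """P(previous element = source | current element = target)."""
    total = sum(c for (_, t), c in counts.items() if t == target)
    return counts[(source, target)] / total if total else 0.0

scanpaths = ["ACDED", "CDEDC", "ACDC"]  # hypothetical scanpaths
counts = transition_counts(scanpaths)
print(counts[("C", "D")])                           # -> 3 transitions from C to D
print(round(row_probability(counts, "C", "D"), 2))  # -> 1.0 (every transition from C goes to D here)
```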
As also stated in the literature, Markov models are incapable of identifying whether or not there is a typical scanpath for multiple scanpaths (Abbott & Hrycak, 1990; Josephson, 2010). For example, it could be assumed that the starting point for the scanpaths in Table 1 is the visual element C, as it is fixated first by most of the users. According to the transition matrix in Table 7, users are then most likely to fixate the visual element D after the visual element C, then the visual element E, and then the visual element D again. The sequence continues as CDEDED..., and therefore a number of important questions arise, especially what the ending point should be and which probabilities should be used. Furthermore, the durations of fixations and the positions of visual elements on web pages are not used while creating the transition matrix.
eSeeTrack. There is another analysis tool called eSeeTrack which visualises eye tracking data based on the segments of visual stimuli by using a timeline and a tree visualisation (Tsang et al., 2010). The timeline illustrates a sequence of fixations based on visual elements for each user. Each fixation is represented as a coloured band, and the width of the band represents the duration. As a result, the long fixations can be recognised in the timeline. Moreover, the tree visualisation allows recognition of transitions between segments for multiple users where higher probabilities are highlighted with larger sizes.
An example of the tree visualisation is illustrated in Figure 3. Even though fixation durations are considered by eSeeTrack, the positions of visual elements on visual stimuli are not taken into consideration. Similar to Markov models, eSeeTrack is not able to identify whether or not there is a typical scanpath for multiple scanpaths.
Instead of calculating transition probabilities between visual elements of web pages, some other techniques have also been suggested in the literature to detect patterns within multiple scanpaths. These techniques are revisited and investigated in the following section.
Pattern Detection
The pattern detection techniques range from searching for a particular pattern to detecting all patterns with the number of matches. This group consists of eyePatterns analysis tool (West et al., 2006), the Sequential Pattern Mining algorithm (Hejmady & Narayanan, 2012) and the T-Pattern Detection technique (Magnusson, 2000).
eyePatterns-Search Patterns. When researchers want to check whether or not a particular pattern exists within given scanpaths, they can use eyePatterns analysis tool (West et al., 2006). For example, on the HCW Travel web page, the participants were asked to read the latest news from the company and click on the link for the special offers. They therefore needed to fixate the visual elements E and D respectively to complete their tasks successfully. When the pattern ED is searched for in their scanpaths, the analysis tool provides the results shown in Table 8.
According to these findings, the pattern ED is not seen in all of the scanpaths. However, as these participants completed their tasks successfully, the pattern would be expected to appear in their scanpaths. The participants might not have completed their tasks directly, so there could be other visual elements between the visual elements E and D. Hence, this analysis tool also has an option (namely, gap size) to make the search more flexible by allowing other visual elements between the desired visual elements (at most five elements), such as allowing the pattern ED to be found in the scanpath CACDECECD (S8 in Table 1).
While eyePatterns analysis tool is searching for sequential patterns in given scanpaths, it does not consider the durations of fixations or the positions of visual elements on web pages. Moreover, if there are more than five elements between the two desired elements, the two elements cannot be combined and detected as a pattern.
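To make the gap-size idea concrete, the following sketch searches for a pattern while allowing up to a fixed number of other elements between consecutive pattern elements. It is a simplified stand-in for the behaviour described above, not the eyePatterns implementation.

```python
# A minimal sketch of gap-tolerant pattern search in a scanpath string.
import re

def pattern_occurs(pattern, scanpath, max_gap=5):
    """True if the pattern elements appear in order with at most max_gap
    other elements between each consecutive pair."""
    gap = ".{0,%d}" % max_gap
    regex = gap.join(re.escape(element) for element in pattern)
    return re.search(regex, scanpath) is not None

print(pattern_occurs("ED", "CACDECECD"))              # -> True (E ... D with other elements between)
print(pattern_occurs("ED", "CACDECECD", max_gap=0))   # -> False (no adjacent ED)
```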
eyePatterns–Discover Patterns. eyePatterns analysis tool can also be used to discover patterns within multiple scanpaths based on a defined pattern length (West et al., 2006). When it is applied to a number of scanpaths with a particular length, it lists the patterns along with how many times they occur and how many scanpaths contain them. When this tool is applied to the scanpaths on the HCW Travel web page with the default length of 4, the discovered patterns are listed as shown in Table 9. For example, the pattern EDED is seen ten times but in only four out of the ten scanpaths.
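A simplified sketch of this discovery step is shown below: it counts every contiguous pattern of a given length, together with the number of scanpaths containing it. The scanpaths are hypothetical examples, not the Table 1 data.

```python
# A minimal sketch of fixed-length pattern discovery across scanpaths.
from collections import Counter, defaultdict

def discover_patterns(scanpaths, length=4):
    occurrences = Counter()
    containing = defaultdict(set)
    for index, path in enumerate(scanpaths):
        for start in range(len(path) - length + 1):
            pattern = path[start:start + length]
            occurrences[pattern] += 1
            containing[pattern].add(index)
    # Map each pattern to (total occurrences, number of scanpaths containing it).
    return {p: (occurrences[p], len(containing[p])) for p in occurrences}

scanpaths = ["EDEDED", "ACDED", "EDEDC"]  # hypothetical scanpaths
print(discover_patterns(scanpaths)["EDED"])  # -> (3, 2): three occurrences in two scanpaths
```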
This tool does not tolerate extra visual elements within patterns while discovering them, which means it cannot discover the pattern EDED in the scanpath BCECDCDCDEDECDC because of the visual element C. For this reason, this tool is reductionist while discovering patterns; in other words, it is likely to detect no pattern or very short patterns that are not helpful for understanding users’ behaviours on web pages. In addition, this tool does not consider the durations of fixations or the positions of visual elements on web pages during the discovery of patterns.
Sequential Pattern Mining. The Sequential Pattern Mining (SPAM) algorithm has also been used to identify patterns within multiple scanpaths (Hejmady & Narayanan, 2012). Although this algorithm was originally developed for detecting frequent patterns in a sequence database (Ayres, Flannick, Gehrke, & Yiu, 2002), it can also be applied to user scanpaths correlated with visual elements of web pages. In contrast to eyePatterns analysis tool, the SPAM algorithm tolerates extra visual elements within patterns while discovering them. To find the patterns that are included in all the scanpaths, the minsup parameter (the proportion of scanpaths that must include a pattern) should be set to one, i.e., 100% (Fournier-Viger et al., 2014).
When the SPAM algorithm is applied to the scanpaths in Table 1 to detect patterns that are seen in all the scanpaths, it finds CDED and DCED as the longest patterns (Fournier-Viger et al., 2014). In contrast, as seen in Table 9, eyePatterns analysis tool cannot detect any pattern of length four that exists in all the scanpaths. Similar to eyePatterns analysis tool, the SPAM algorithm does not pay attention to the durations of fixations or the positions of visual elements on web pages. This algorithm also has a reductionist approach. Specifically, when the individual scanpaths VWXYZ, VWYZ and VXWZY are available, the patterns VWY and VWZ are identified as the longest patterns seen in all the scanpaths, even though the elements V, W, Y and Z each exist in all the scanpaths.
T-Pattern Detection. T-Pattern Detection, which stands for Temporal Pattern Detection, is another approach that has been used to detect patterns within user scanpaths (Burmester & Mast, 2010; Mast & Burmester, 2011; Drusch & Bastien, 2012). It was originally developed by Magnusson (2000) in the area of behavioural science for analysing social interaction, but it can now be used in different areas (Mast & Burmester, 2011). For example, this approach was used by Borrie, Jonsson, and Magnusson (2002) to analyse the movements of the ball and the players in soccer matches. As the T-Pattern Detection technique is now a commercial product (http://www.noldus.com/theme/t-pattern-analysis), researchers need to pay to use it in their studies.
T-Pattern detection requires a behaviour sequence which is coded in terms of the occurrences of event types with their times (Magnusson, 2000). An event type represents the beginning or ending of some particular behaviour, such as starting to fixate the visual element A (Magnusson, 2000). As also stated by Burmester and Mast (2010) and Mast and Burmester (2011), two event types A and B form a T-Pattern if A tends to be followed by B, and the time distances between their occurrences fall within a critical interval more often than would be expected by chance.
According to Magnusson (2000), there are two possible types of distribution, which are called Critical Intervals: Fast and Free Critical Intervals. As also stated by Burmester and Mast (2010) and Mast and Burmester (2011), for the Fast Critical Interval type, the event type A should occur relatively quickly before the event type B. In contrast, for the Free Critical Interval type, the event type A can occur before the event type B within a defined time interval, but the time distance between type A and type B should be relatively similar across occurrences. Figure 4 shows the difference between the Fast and Free Critical Intervals with an example (Mast & Burmester, 2011).
As a result of an iterative process, each T-Pattern can be combined with another event type or T-Pattern to create a longer T-Pattern (see an example in Figure 5) (Magnusson, 2000; Mast & Burmester, 2011). A T-Pattern with n components can be represented as X₁ [d₁, d₂]₁ X₂ [d₁, d₂]₂ … Xᵢ [d₁, d₂]ᵢ Xᵢ₊₁ … Xₙ, where [d₁, d₂] represents the critical interval (Magnusson, 2000).
This technique uses a significance level parameter while generating T-Patterns (Magnusson, 2000). This parameter is related to the critical intervals and influences the number of event types in T-Patterns (Magnusson, 2000). When the significance level decreases, fewer and shorter patterns are detected (Magnusson, 2000). The T-Patterns can also be filtered by using various criteria, such as the minimum pattern length and the minimum number of occurrences of the pattern (Magnusson, 2000).
The T-Pattern Detection technique has many different parameters, and the detected patterns can be affected by how these parameters are adjusted. As a consequence, the subjectivity of the results can be a problem. With strict values, the technique can also become reductionist, especially with the Fast Critical Intervals. As illustrated in Figure 4, the pattern AB may not be detected as a T-Pattern because of the Fast Critical Interval. Like the majority of the scanpath analysis techniques (see Table 2), the T-Pattern Detection technique does not consider the positions of visual elements on visual stimuli. However, the durations of fixations are used for detecting T-Patterns.
Common Scanpath Identification
As presented above, different techniques have been used to detect patterns within user scanpaths, and these techniques can detect more than one pattern for given scanpaths. For example, the SPAM algorithm provides CDED and DCED as the longest patterns for the scanpaths in Table 1. In contrast to these techniques, other techniques are available to identify a single scanpath that represents the entire group, which is typically known as a common scanpath. This group includes the following techniques: the Shortest Common Supersequence technique (Räihä, 2010), the Multiple Sequence Alignment technique (Hembrooke et al., 2006), the Position-based Weighted Models of Sutcliffe and Namoun (2012) and Holsanova et al. (2006), Hierarchical Clustering with the Dotplots algorithm (Goldberg & Helfman, 2010) and eMINE scanpath algorithm (Eraslan et al., 2014).
Shortest Common Supersequence. One of these techniques is the Shortest Common Supersequence (SCS) technique (Räihä, 2010). According to Räihä (2010), a sequence P is a supersequence of the sequences S1 and S2 if the deletion of zero or more characters from P can produce both S1 and S2. When this technique is repeatedly applied to the scanpaths in Table 1, it provides the scanpath shown in Example 1.
Example 1. The common scanpath of the Shortest Common Supersequence technique for the scanpaths in Table 1: A E B C E B D C F A D C A D C F D E B D A E B D F

As can be clearly seen from the common scanpath, this technique has considerable weaknesses. In particular, it provides a much longer scanpath than the individual scanpaths. For example, the average length of the individual scanpaths in Table 1 is 19.9 (standard deviation: 10.61), whereas the common scanpath for those scanpaths consists of 63 visual elements including repetitions. In contrast to the reductionism, this technique provides an unnecessarily complicated result. Furthermore, the common scanpath is not supported by the majority. For instance, it includes the visual element F four times, but this visual element is only included three times in the third scanpath and once in the fourth scanpath. Neither the durations of fixations nor the positions of visual elements on web pages are used by the SCS technique.
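To make the construction concrete, the sketch below computes a shortest common supersequence of two scanpaths with dynamic programming; applying it repeatedly over a list of scanpaths, as described above, yields this kind of common scanpath. It is an illustrative sketch, not Räihä's implementation.

```python
# A minimal sketch of the shortest common supersequence of two scanpaths.

def shortest_common_supersequence(s1, s2):
    m, n = len(s1), len(s2)
    # length[i][j] = length of the SCS of s1[:i] and s2[:j]
    length = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0:
                length[i][j] = j
            elif j == 0:
                length[i][j] = i
            elif s1[i - 1] == s2[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                length[i][j] = min(length[i - 1][j], length[i][j - 1]) + 1
    # Backtrack to recover one shortest supersequence.
    result, i, j = [], m, n
    while i > 0 and j > 0:
        if s1[i - 1] == s2[j - 1]:
            result.append(s1[i - 1]); i -= 1; j -= 1
        elif length[i - 1][j] <= length[i][j - 1]:
            result.append(s1[i - 1]); i -= 1
        else:
            result.append(s2[j - 1]); j -= 1
    result.extend(reversed(s1[:i]))  # any remaining prefix of s1
    result.extend(reversed(s2[:j]))  # any remaining prefix of s2
    return "".join(reversed(result))

# Hypothetical scanpaths: deleting characters from the result recovers both inputs.
print(shortest_common_supersequence("BDCE", "BCDE"))  # -> "BCDCE"
```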
Multiple Sequence Alignment Technique. Hembrooke et al. (2006) propose using the multiple sequence alignment technique to identify an average scanpath for multiple users. In other words, they suggest repeatedly aligning one scanpath with another in the list of scanpaths until a single scanpath is left, which represents the average scanpath. However, the technique is not described in depth and has not been evaluated in any subsequent study.
When two scanpaths are aligned, their shared visual elements can be lost because of their positions in the scanpaths. For example, two scanpaths are aligned in Table 3. Although the first scanpath starts with the element D and the second scanpath has the element D in the second position, the element D is lost in the result of the alignment. Therefore, this technique becomes reductionist because of the alignment process. The durations of fixations and the positions of visual elements on web pages are not taken into consideration here.
Position-based Weighted Models. Sutcliffe and Namoun (2012) use a position-based weighted model to investigate where users focus in the very early phases of their searches on web pages. They first segment web pages by using a 3 × 3 grid-layout segmentation and then find the segments corresponding to the first three fixations of users on the web pages. They then give one point to the first segments, 0.5 points to the second segments and 0.2 points to the third segments. After this, they calculate the total points for each segment and sort the segments by their total points in descending order.
When the position-based weighted model of Sutcliffe and Namoun (2012) is applied to the scanpaths in Table 1 (see Table 10), the initially visited visual elements on the HCW Travel web page are identified as follows: C (7.2 points), A (4 points), B (2.7 points), D (2.7 points), E (0.4 points). This model only concentrates on the very early phases of searching on web pages. Moreover, there cannot be any repetition in the common path, even though users can fixate the same visual element more than once. As this model only focuses on the first three visual elements in individual scanpaths and none of the visual elements are excluded, the reductionism is not applicable here.
Holsanova et al. (2006) apply a similar approach to analyse reading paths and reading priorities on newspaper spreads. They first divide a newspaper spread into its AoIs and then rank the AoIs based on when they are first visited by users (Holmqvist et al., 2011).
The HCW Travel web page has seven visual elements. Thus, when the position-based weighted model of Holsanova et al. (2006) is applied to the scanpaths in Table 1 by giving 7 points to the first visited visual elements and no points to the non-visited visual elements, the sequence of the visual elements for all the scanpaths is identified as CDABEFG. Table 1 shows the points for each visual element in each scanpath. Although the same AoI can be visited several times by users, repetitions are not taken into consideration by this approach. Besides this, some AoIs may not attract users, but none of the AoIs is excluded by their model. Therefore, the reductionism is also not applicable to this model.
Neither the position-based model of Sutcliffe and Namoun (2012) nor that of Holsanova et al. (2006) considers the durations of fixations or the positions of visual elements on visual stimuli.
Hierarchical Clustering with the Dotplots algorithm. Goldberg and Helfman (2010) suggest using the Dotplots algorithm for clustering multiple scanpaths hierarchically to identify their common scanpath. The algorithm was originally developed for the purpose of comparing two biological sequences (Krusche & Tiskin, 2010).
Figure 6 illustrates how this algorithm works with the seventh and ninth scanpaths in Table 1 as an example (Eraslan et al., 2013). As can be seen from this example, the algorithm uses a two-dimensional matrix: one scanpath is written horizontally (S7) and the other vertically (S9). When the same visual elements are matched, their intersections are marked with dots. The dots are then used to find the longest straight line, which represents a shared scanpath. As shown in Figure 6, BCCDCDDED, which is represented by a solid line, is found as the shared scanpath of the seventh and ninth scanpaths in Table 1. To hierarchically cluster multiple scanpaths with the Dotplots algorithm, the two most similar scanpaths are selected from the list of scanpaths by using the Dotplots algorithm and then merged. Next, the merged scanpath is added to the list and the two selected scanpaths are removed. This process is repeated until only one scanpath is left in the list, which represents the common scanpath. In order to merge two scanpaths, Goldberg and Helfman (2010) suggest two different ways: (1) identifying a shared scanpath of the two similar scanpaths by using the Dotplots algorithm, or (2) assigning one of the two similar scanpaths to the merged scanpath. The second way amounts to selecting one of the individual scanpaths as a common scanpath, which is a debatable idea as users might follow different paths to complete their tasks (see Figure 9).
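The dotplot idea can be sketched as follows: mark every cell where the two scanpaths share an element and return the longest unbroken diagonal of dots, i.e. the longest contiguous run of matching elements. This is a simplified reconstruction based on the description above, and the scanpaths used are hypothetical.

```python
# A simplified sketch of finding the longest diagonal of dots in a dotplot.

def dotplot_shared_scanpath(s1, s2):
    best, best_end = 0, 0
    # run[i][j] = length of the diagonal of dots ending at cell (i, j)
    run = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i in range(1, len(s1) + 1):
        for j in range(1, len(s2) + 1):
            if s1[i - 1] == s2[j - 1]:            # a dot in the matrix
                run[i][j] = run[i - 1][j - 1] + 1
                if run[i][j] > best:
                    best, best_end = run[i][j], i
    return s1[best_end - best:best_end]

# Hypothetical scanpaths used only for illustration.
print(dotplot_shared_scanpath("BCCDCDDED", "ABCCDCED"))  # -> "BCCDC"
```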
Figure 7 shows how the scanpaths in Table 1 are hierarchically clustered with the standard Dotplots algorithm by using the first way of merging. This approach is also used by Albanesi et al. (2011), who call the result a dominant path.
As can be seen from Figure 7, only the visual element C is identified as the common scanpath for the scanpaths in Table 1 with this hierarchical clustering. This is mainly caused by the Dotplots algorithm, as can be recognised from Figure 6, which illustrates how the Dotplots algorithm finds the shared scanpath of two scanpaths. Although the dashed line would provide a longer shared scanpath than the solid line, it cannot be detected because of the disconnections. Hence, this algorithm ultimately makes the hierarchical clustering reductionist. Besides, neither the durations of fixations nor the positions of visual elements on web pages are used by this approach to identify a common scanpath.
eMINE Scanpath Algorithm. Eraslan et al. (2014) propose another algorithm called eMINE scanpath algorithm (http://emine.ncc.metu.edu.tr/software.html) to address the problem of being reductionist. This algorithm is composed of several other techniques. It first chooses the two most similar scanpaths from the list of scanpaths with the String-edit algorithm. The Longest Common Subsequence (LCS) technique is then applied to these two scanpaths to find their common scanpath (Chiang, 2009). After that, the chosen two scanpaths are removed from the list and their common scanpath is added to the list. This process is repeated until there is a single scanpath in the list. The single scanpath is then abstracted to provide the common scanpath. When this algorithm is applied to the scanpaths in Table 1, it provides CDED as the common scanpath (see Figure 8) (Yesilada et al., 2013).
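The sketch below is an illustrative reconstruction of this procedure based on the description above, not the authors' implementation: the two most similar scanpaths (by String-edit distance) are repeatedly replaced by their longest common subsequence, and the final result is abstracted. All function names are ours.

```python
# A self-contained, simplified sketch of the eMINE-style procedure.

def edit_distance(s1, s2):
    """Levenshtein distance with unit costs."""
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        current = [i]
        for j, c2 in enumerate(s2, 1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (c1 != c2)))
        previous = current
    return previous[-1]

def longest_common_subsequence(s1, s2):
    """One longest (possibly non-contiguous) common subsequence of s1 and s2."""
    table = [[""] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i, c1 in enumerate(s1, 1):
        for j, c2 in enumerate(s2, 1):
            if c1 == c2:
                table[i][j] = table[i - 1][j - 1] + c1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1], key=len)
    return table[-1][-1]

def abstract(scanpath):
    """Collapse consecutive repetitions, e.g. 'CDDED' -> 'CDED'."""
    return "".join(c for i, c in enumerate(scanpath) if i == 0 or scanpath[i - 1] != c)

def common_scanpath(scanpaths):
    paths = list(scanpaths)
    while len(paths) > 1:
        # Choose the pair with the smallest String-edit distance and merge it.
        pairs = [(a, b) for i, a in enumerate(paths) for b in paths[i + 1:]]
        a, b = min(pairs, key=lambda pair: edit_distance(*pair))
        paths.remove(a)
        paths.remove(b)
        paths.append(longest_common_subsequence(a, b))
    return abstract(paths[0])

# Prints one common scanpath for the three individual scanpaths discussed below.
print(common_scanpath(["DCDCABEDCD", "CACDECECD", "CACDADCBECDCB"]))
```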
eMINE scanpath algorithm tries to address the reductionist problem of the Dotplots algorithm by using the String-edit algorithm and the LCS technique together instead. However, it still uses hierarchical clustering, which means some visual elements can be lost at the intermediate levels. For this reason, eMINE scanpath algorithm is still likely to produce very short common scanpaths which are not useful for further processing of web pages. Assume that the individual scanpaths S6: DCDCABEDCD, S8: CACDECECD and S10: CACDADCBECDCB are available (see Table 1). First of all, the individual scanpaths S8: CACDECECD and S10: CACDADCBECDCB are merged as S(8,10): CACDCECD. When S6: DCDCABEDCD is merged with S(8,10): CACDCECD, CDCECD is identified as the common scanpath. As can be seen from this example, although the visual element A is shared by the three individual scanpaths, it is not included in the common scanpath. Similar to the other techniques for identifying a common scanpath for multiple scanpaths (see Table 2), eMINE scanpath algorithm does not consider the durations of fixations or the positions of visual elements on web pages.
Discussion
To support researchers in identifying salient web page features, eye tracking software products typically provide heat maps showing the parts of web pages which are most fixated by users (http://www.tobii.com/). However, these maps are not designed to illustrate user scanpaths. These products also allow the visualisation of scanpaths with gaze plots (see an example in Figure 1). Visualisations based on gaze plots are simply individual scanpaths displayed together, and they have limited benefit in evaluating a website in terms of generalisability. When there are multiple scanpaths, these plots become useless because it is difficult to distinguish them (see an example in Figure 9). While there are other visualisation techniques (Räihä et al., 2005), these also become complicated to analyse as the number of users increases.
This article analyses the techniques which can be used to compare and correlate multiple user scanpaths. For instance, the techniques of the similarity/dissimilarity calculation group can be used for comparing scanpaths of two different user groups to investigate whether they follow different paths to complete a particular task (Eraslan & Yesilada, 2015). Moreover, the techniques of the transition probability calculation group can be used for investigating the efficiency of the arrangements of elements (Ehmke & Wilson, 2007). Furthermore, the techniques of the pattern detection and the common scanpath identification groups can be applied to user scanpaths and then the results can be used for re-engineering web pages to allow a direct access to firstly visited visual elements (Yesilada et al., 2013; Akpınar & Yeşilada, 2015).
While all methodologies have pros and cons (see Table 2), it is worth discussing some of the more notable limitations, along with suggestions for their mitigation.
Pre-processing: Eye tracking data typically consist of a large number of fixations; however, some of the fixations may not be meaningful. For example, involuntary eye movements may occur due to the oculomotor system (Cornsweet, 1956). Since scanpaths are correlated with visual elements of web pages by using fixations, meaningless fixations should be eliminated from the eye tracking data to reduce the variance. For example, our analysis showed that eyePatterns analysis tool cannot discover the pattern EDED in the scanpath BCECDCDCDEDECDC because of the element C. However, that element might be present due to a meaningless fixation. Therefore, eye tracking data should be pre-processed to ensure that meaningless fixations are excluded, improving the quality of the data. The key is identifying ‘meaningless’ fixations in a well-founded manner.
In the literature, some researchers remove fixations whose durations are below a particular threshold. For example, Rämä and Baccino (2010) eliminated fixations with a duration of less than 100 milliseconds from their studies. However, different estimates exist for the duration needed to extract information from a display (Rayner, Smith, Malcolm, & Henderson, 2009; Glöckner & Herbold, 2011). In particular, Rayner et al. (2009) suggest that users require at least 150 milliseconds for each fixation to process a display normally. However, such generalisations can be a problem because web pages differ in their degrees of complexity, so the duration needed to extract information can be different from one page to another. The duration can also be affected by individual factors, such as gender (Pan et al., 2004). When a pre-defined threshold is used for eliminating meaningless fixations, eye tracking data can be biased in some way. Instead of using a pre-defined threshold, a new value can be determined for each page by analysing the data. In particular, researchers can benefit from analysing user fixations on target areas to identify the minimum duration needed to achieve the target.
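A minimal sketch of this kind of pre-processing is shown below; the threshold value and the fixation records are illustrative assumptions, not recommendations.

```python
# A minimal sketch of duration-based pre-processing of fixations.

def filter_short_fixations(fixations, threshold_ms=100):
    """Keep only fixations at or above the duration threshold."""
    return [f for f in fixations if f["duration_ms"] >= threshold_ms]

fixations = [{"element": "C", "duration_ms": 80},
             {"element": "D", "duration_ms": 250},
             {"element": "E", "duration_ms": 180}]
print([f["element"] for f in filter_short_fixations(fixations)])  # -> ['D', 'E']
```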
Cognitive Processing: It is widely accepted that fixation duration is related to the depth of processing and the ease or difficulty of information processing (Velichkovsky et al., 2002; Follet et al., 2011). To take cognitive processing into account, fixation durations should be carefully considered. However, the majority of the scanpath analysis techniques do not consider fixation durations (see Table 2). For example, our analysis showed that eMINE scanpath algorithm provides CDED as a common scanpath for the scanpaths in Table 1, but it does not illustrate which element was fixated for the longest time.
As also mentioned above, some researchers eliminate fixations based on a particular duration, even though those fixations might have some information content. Researchers should also pay attention to fixation durations while they are analysing scanpaths. In particular, they should determine how much time is typically needed to complete the task that they ask their users to complete. When a particular user completes the task in an unexpected duration, that user’s data should be analysed to investigate the reasons.
Reductionist Approach: Our analysis showed that scanpath analysis techniques tend to be reductionist while discovering patterns and identifying common scanpaths. In other words, the common scanpaths/patterns are likely to be unacceptably short, which is not helpful for understanding users’ behaviours on web pages. In particular, the common scanpath/pattern may not include the visual element shared by all individual scanpaths and/or the visual element included by the majority of the scanpaths. For example, the common scanpath identified by eMINE scanpath algorithm for the individual scanpaths DCDCABEDCD, CACDECECD and CACDADCBECDCB does not include the element A even though it is included in all of the individual scanpaths (see the details in the eMINE Scanpath Algorithm section). A technique with a reductionist approach may also identify no common scanpath/pattern or a common scanpath/pattern with a single element. Since a single element does not illustrate a sequence, it is not helpful for understanding sequential behaviours of users on web pages. This problem can be addressed by taking the following suggestions into consideration.
The commonly visited visual elements should be included in the common scanpaths/patterns. Hence, researchers should firstly identify these elements and ensure that these visual elements are included in the common scanpaths/patterns.
The firstly visited visual elements should be located at the initial positions of common scanpaths/patterns. For instance, if the visual element C is firstly visited by the majority of the users, it should be located at the beginning of the common scanpath/pattern.
Small deviations should be allowed from strict sequentiality in some cases. In particular, there can be some visual elements that are fixated by all users but in a slightly different order. Researchers should ensure that these visual elements are also included in the common scanpaths/patterns.
Even though this article focuses on the web, the scanpath analysis techniques have also been used in other domains. For example, Hejmady and Narayanan (2012) applied the SPAM algorithm to identify visual attention patterns of programmers when they debug programs with an Integrated Development Environment (IDE). As another example, Holsanova et al. (2006) used a position-based model to analyse entry points and reading paths of readers on newspaper spreads. As the techniques revisited here can be applied to any static visual stimulus, researchers from different domains can also benefit from this article.
In order to analyse and compare the scanpath analysis techniques, we used the eye tracking data of ten users. Even though the dataset is small, it is useful for illustrating the pros and cons of the techniques. The techniques can also be analysed and compared with a larger dataset in the future. However, when the sample size increases, the variations are also likely to increase. Therefore, the techniques may experience some problems in dealing with these variations, especially the techniques that try to detect patterns or identify common scanpaths. In particular, they may not be able to provide any result because of the variations.
Finally, in this article, we unfortunately could not apply some of the techniques to the dataset. For example, the implementation of the T-Pattern Detection technique is not publicly available (Magnusson, 2000). We believe that the implementations of the scanpath analysis techniques should be available for research/testing purposes to support eye tracking research.
Conclusions
Scanpaths correlated with visual elements of web pages can be analysed by using different techniques. Each of these techniques has its strengths and weaknesses, and researchers should pick those which are the most appropriate for the task at hand. While this article combines and revisits these techniques and investigates their strengths and weaknesses by evaluating them with a third-party eye tracking dataset, all possible situations cannot be tested (see the Methodology section). This article also classifies the scanpath analysis techniques according to their goals, as shown in Table 2, and by so doing allows researchers to focus directly on the techniques that are suitable for their scanpath analysis on web pages. The main concluding remarks are listed below.
The String-edit algorithm is useful and straightforward to determine the similarity between a pair of scanpaths as a percentage (Underwood et al., 2008). However, when researchers pay attention to the distances between visual elements on web pages, they should create a substitution cost matrix based on the distances and then integrate the matrix into the String-edit algorithm (Takeuchi & Habuchi, 2007).
When researchers want to investigate transition probabilities between visual elements of web pages, they should consider a transition matrix as it clearly illustrates the transition probabilities (West et al., 2006).
eyePatterns analysis tool is publicly available and it helps researchers to search for a particular pattern within given scanpaths by allowing some gaps between the visual elements within the pattern (West et al., 2006).
When researchers want to detect repetitive patterns within multiple scanpaths, they should use the T-Pattern Detection technique that provides a number of different parameters for them to configure according to their goals (Magnusson, 2000). However, the implementation of the T-Pattern Detection technique is a commercial product.
If an oversimplification can be a problem for researchers while they are identifying patterns and common scanpaths for multiple scanpaths, they should avoid using the techniques with a very reductionist approach.
It is widely accepted that fixation durations have a relationship with the depth of processing and the ease or difficulty of information processing (Velichkovsky et al., 2002; Follet et al., 2011). Therefore, when researchers want to consider cognitive processing, they should pick the techniques that use fixation durations based on their goals. For example, they should use ScanMatch technique to compare a pair of scanpaths (Cristino et al., 2010).
To make the best use of available eye tracking data, researchers should select an appropriate technique for their studies. In order to do so, they should be aware of the strengths and weaknesses of the alternatives, and this article aims to support that.