Introduction
Eye movements can be a useful source of information to infer cognitive processes (Buswell, 1935; Yarbus, 1967; Rayner, 1998; Viviani, 1990; Henderson, 2003). Among the various top-down factors that guide our gaze, expertise plays a prominent role, and can effectively drive the ocular exploratory behavior. The scanpath, i.e., the sequence of saccades and fixations (Noton & Stark, 1971), of expert and novice observers differs when they look at pictures or art pieces (Nodine, Locher, & Krupinski, 1993; Zangemeister, Sherman, & Stark, 1995; Vogt & Magnussen, 2007; Humphrey & Underwood, 2009; Pihko et al., 2011), interpret medical images (Nodine, Kundel, Lauver, & Toto, 1996; Donovan & Manning, 2007), drive (Underwood, 1998), read music (Waters, Underwood, & Findlay, 1997), play chess (Chase & Simon, 1973; Reingold & Sheridan, 2011), or practice or watch sports (Vickers, 2007). Thus, from the characteristics of eye movements it is possible to extrapolate important information about expertise in several knowledge and activity domains.
We have recently provided evidence that the eye movements of novice and expert billiard players differ when they have to predict the outcome of partially-occluded single shots (Crespi, Robino, Silva, & de’Sperati, 2012). Specifically, in order to solve the visual prediction task, novices tended to adopt a strategy based on mental extrapolation of the ball trajectory, whereas experts monitored certain diagnostic points along the trajectory. By exploiting the eye-movement differences between novices and experts, we could also identify the temporal boundaries of the single billiard shots contained in a videoclip, thus in effect realizing a sort of physiologically-based video parser (Robino, Crespi, Silva, & de’Sperati, 2012).
In the present study we extend our previous work and ask whether the differences in eye movements of novices and experts are robust enough to detect expertise i) at the individual level, and ii) not only under ad hoc, controlled conditions but also under naturalistic, unconstrained conditions, i.e., during free viewing of a billiard match without a specific task. Also, iii) we aim to detect the “expert’s eye” by analyzing the data regardless of the visual stimulus, that is, relying only on the oculomotor behaviour. Meeting these three conditions would be an important step towards automatic expertise detection.
Quantifying reliably and uniquely a complex behavior such as a sequence of exploratory eye movements (the so-called scanpath) is a non-trivial challenge. The existing methods can be classified into two broad classes, both pioneered by Larry Stark (see Hacisalihzade, Stark, & Allen, 1992, for a combined use of both). The first approach aims at characterizing the spatial distribution of fixations on the scene (spatio-temporal, in case of dynamic scenes) and at providing some similarity metrics (Brandt & Stark, 1997). Methods following this approach can be further distinguished as content-driven or data-driven (Grindinger et al., 2011).
The content-driven approach largely relies upon Regions Of Interest (ROIs), identified a priori in the stimulus and analyzed in terms of fixations falling inside them. The data-driven approach, in contrast, directly exploits scanpaths, or features extracted from them, independent of whatever was presented as the stimulus. An important advantage of the latter approach is that it obviates the need for an arbitrary ROI definition.
The similarity of two scanpaths can be measured in principle by using ROI-based methods followed by coding of the sequence in which ROIs are visually inspected. A common method is the string edit, in which each ROI is assigned a discrete symbol (e.g., a character), so that each scanpath is transformed into a string of symbols. The cost of editing one string into the other is then computed, for example as the Levenshtein distance (Brandt & Stark, 1997; Choi, Mosley, & Stark, 1995;
Hacisalihzade et al., 1992;
Foulsham & Underwood, 2008). Other methods are also used, such as the Needleman-Wunsch algorithm borrowed from bioinformatics (Cristino, Mathôt, Theeuwes, & Gilchrist, 2010). However, ROI-based methods suffer from well-known limitations, mostly related to how to cluster and regionalize fixations (
Hacisalihzade et al., 1992;
Privitera & Stark, 2000). For instance, many methods rely upon dividing the image into a regular grid, but this way of operating loses any reference to the content of the image and introduces quantization errors; in this limit case string edit techniques turn into a
data-driven approach, while exploiting an oversimplified representation of data. Semantic ROIs could be used instead (
Privitera & Stark, 2000;
Josephson & Holmes, 2006), but these have by definition different sizes, and therefore the approximation of fixation position can be very coarse and subtle differences in oculo-motor behavior cancelled. In the last few years, heatmaps have become a very popular,
data-driven tool: heatmaps are plots in which a given oculomotor quantity (typically, the fixation dwell-time) is coded as colored, semi-transparent “bubbles” superimposed on the two-dimensional image. This graphical representation is very appealing, but it is mostly used to convey an immediate, qualitative impression of the attended regions within a figure (see, however, Caldara & Miellet, 2011;
Crespi et al., 2012). Other methods have also been proposed, based on the construction of an average scanpath (Hembrooke, Feusner, & Gay, 2006), or that minimize an energy function (Dempere-Marco, Hu, Ellis, Hansell, & Yang, 2006), or that end up with a multidimensional vector rather than a single scalar quantity (Jarodzka, Holmqvist, & Nyström, 2010). A main concern of these approaches is to quantify the similarity between scanpaths, which is a crucial issue in certain applications where an average observer is needed (
Boccignone et al., 2008).
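The string-edit comparison described above can be illustrated with a minimal Python sketch. It assumes that each scanpath has already been coded as a string with one character per visited ROI; the function names and the normalization by the longer string are illustrative choices, not the procedure of any of the cited studies.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def scanpath_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus the edit distance over the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Two hypothetical ROI-coded scanpaths differing in one visited region.
print(scanpath_similarity("ABCA", "ABDA"))
```

Note that the result depends entirely on how ROIs are defined, which is the limitation discussed in the paragraph above.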
The second approach, again pioneered by Stark, takes straightforwardly into account the very stochastic nature of scanpaths. Indeed, gaze-shift processes, and especially saccadic eye movements, exhibit noisy, idiosyncratic variation of visual exploration by different observers viewing the same scene, or even by the same subject along different trials; this is a well-known issue debated since the early eye tracking studies by
Ellis and Stark (
1986), who modeled sequences using Markov transition probability matrices identified from experimental sequences (see Hayes, Petrov, & Sederberg, 2011 for a detailed discussion on methods aiming at capturing statistical regularities in temporally extended eye movement sequences). Here we follow this second approach or, more precisely, the very rationale behind such approach: namely, we consider the gaze shift behavior as a realization of a stochastic process (
Feng, 2006;
Brockmann & Geisel, 2000;
Boccignone & Ferraro, 2014,
2013b,
2013a). In other terms, the distribution functions and the temporal dynamics of eye movements are specified by the stochastic process. In this perspective the visual exploratory features we can measure (saccade amplitude and direction, fixation duration) can be thought of as random variables generated by such a process, however complex it may be (
Tatler & Vincent, 2008,
2009).
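As an illustration of the Markov description mentioned above, the following Python sketch estimates a first-order transition probability matrix from a sequence of visited states. It assumes that the gaze sequence has already been coded as discrete states (e.g., ROI labels); the function name and the data layout are hypothetical, and the sketch is not the analysis used in the present study.

```python
from collections import Counter


def transition_matrix(sequence, states):
    """Estimate first-order transition probabilities P(next = dst | current = src)."""
    counts = Counter(zip(sequence, sequence[1:]))   # counts of consecutive pairs
    matrix = {}
    for src in states:
        total = sum(counts[(src, dst)] for dst in states)
        # Rows with no observed outgoing transition are left at zero.
        matrix[src] = {dst: (counts[(src, dst)] / total if total else 0.0)
                       for dst in states}
    return matrix


# Hypothetical sequence of fixated regions A and B.
probs = transition_matrix("AABAB", "AB")
print(probs["A"]["B"], probs["B"]["A"])
```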
In order to discriminate between the different oculomotor behaviors exhibited by novices and experts, there are two options: to provide a model for the generating process, or to exploit the generated oculomotor pattern. As for the first option, investigating expertise differences in dynamic tasks, such as a billiard match, is a complex modeling issue, and involves aspects far beyond the limits of current computational models (
Borji & Itti, 2013). The second option, i.e., analyzing the generated oculomotor pattern, relies upon the rationale that the key requirements of expertise are discriminability and consistency across different stimuli (Shanteau, Weiss, Thomas, & Pounds, 2002), properties that should be reflected in the generated pattern.
In the present study we applied machine learning techniques to discriminate eye movements of experts and novices at the individual level. Among machine learning techniques, the Support Vector Machine (SVM, Cristianini & Shawe-Taylor, 2000) is widely used to classify noisy signals (see Murphy, 2012 for a general discussion), including eye movement data (Lagun, Manzanares, Zola, Buffalo, & Agichtein, 2011;
Eivazi & Bednarik, 2011; Bednarik, Kinnunen, Mihaila, & Fränti, 2005; Vig, Dorr, & Barth, 2009). Methods simpler than SVM have also been used to classify eye movements (e.g., Henderson, 2003).
Specifically, in this study we have tried to deal with two problems. First, machine learning approaches as usually applied to the analysis of eye-movements tend to overlook the feature representation problem. In order to spot behavioral characteristics - expertise or cognitive impairments - in a
data-driven way, a scanpath can be analyzed by using several features (e.g.,
Lagun et al., 2011). Each feature, in turn, might be differently related to a number of factors, from low-level biomechanics, to learnt knowledge of the structure of the world and the distribution of objects of interest (
Tatler & Vincent, 2009). Thus, within a machine learning perspective, we are dealing with features from different sources and where there may be limited or no a priori knowledge of their significance and contribution to the classification task. Clearly, concatenating all the features into a single feature space does not guarantee an optimum performance, while facing the “curse of dimensionality” problem.
Second, though SVM methodology has proven to be a powerful one, it has a number of well-known limitations (
Tipping, 2001;
Murphy, 2012). Although relatively sparse, SVMs make unnecessarily liberal use of basis functions since the number of support vectors required typically grows linearly with the size of the training set; predictions are not probabilistic, which is particularly crucial in classification where posterior probabilities of class membership are necessary to adapt to varying class priors and asymmetric misclassification cost; the kernel function must satisfy Mercer’s condition, namely, it must be the continuous symmetric kernel of a positive integral operator.
In order to cope with these problems, we have exploited a ground framework for feature space fusion followed by a Bayesian sparse classification technique (
Tipping, 2001) with the ability of achieving sparse solutions that utilize only a subset of the basis functions. In particular, we have considered the basic oculomotor parameters of saccade amplitude, direction, and fixation duration as different information sources that are combined at the composite kernel space level and classified through a Relevance Vector Machine (RVM), namely a multiple-kernel RVM (mRVM; Psorakis, Damoulas, & Girolami, 2010; Damoulas & Girolami, 2009a). See Appendix A for a detailed discussion of the RVM approach and its main differences with respect to SVMs.
To the best of our knowledge this approach has never been used with eye movement data.
Results
Expert and novice observers exhibited rather similar exploratory eye movements when watching a given stimulus - at least this is the qualitative impression when observing the cumulative gaze position over time condensed in single snapshots (
Figure 2). Examples of individual scanpaths are illustrated in
Figure 3. Here too, as in the pooled data of
Figure 2, a certain degree of similarity between experts and novices can be appreciated at visual inspection. For example, in the single shots the ball trajectories can often be glimpsed from the raw scanpaths. We quantified the scanpaths by means of three oculomotor features, namely, fixation duration, gaze shift amplitude and gaze shift direction, which were used as input to the classifier either as single features or concatenated in pairs or in a triplet.
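For concreteness, the three features named above can be derived from a list of fixations as in the following Python sketch. The input layout (one tuple of horizontal position, vertical position and duration per fixation), the units, and the convention that angles are measured counterclockwise from the positive horizontal axis are assumptions made for illustration; the actual preprocessing is not specified in this section.

```python
import math


def scanpath_features(fixations):
    """Return (durations, amplitudes, directions) for a temporally ordered fixation list.

    fixations: list of (x_deg, y_deg, duration_ms) tuples (assumed layout).
    Amplitudes are in degrees, directions in radians within (-pi, pi].
    """
    durations = [d for _, _, d in fixations]
    amplitudes, directions = [], []
    for (x0, y0, _), (x1, y1, _) in zip(fixations, fixations[1:]):
        dx, dy = x1 - x0, y1 - y0
        amplitudes.append(math.hypot(dx, dy))   # length of the gaze shift
        directions.append(math.atan2(dy, dx))   # angle of the gaze shift
    return durations, amplitudes, directions


# Hypothetical scanpath with three fixations, hence two gaze shifts.
print(scanpath_features([(0.0, 0.0, 200), (3.0, 4.0, 250), (3.0, 0.0, 180)]))
```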
The distributions of these basic oculomotor features looked very similar between experts and novices (
Figure 4), with very close median values (fixation duration - novices vs. experts: 247 vs. 231 ms, 231 vs. 215 ms, 247 vs. 230 ms, respectively for SS, LS and Match; gaze shift amplitude - novices vs. experts: 2.219 vs. 2.458 deg, 2.383 vs. 2.525 deg, 2.076 vs. 2.150 deg, respectively for SS, LS and Match). Also the shapes of the gaze shift direction distributions looked rather similar (polar plots in
Figure 4). Despite this apparent similarity, however, in all cases there were statistically significant differences between experts’ and novices’ distributions (2-samples Kolmogorov-Smirnov test for fixation duration and gaze shift amplitude, always
p < 0.01; 2-samples Kuiper test for gaze shift direction, always
p < 0.01). Indeed, across the three stimulus types experts had on average slightly shorter fixations (−16 ms), and somewhat larger and more counterclockwise-rotated gaze shifts (+0.15 deg and +0.336 rad).
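The two-sample Kolmogorov-Smirnov comparison used above rests on the largest vertical distance between the two empirical cumulative distribution functions. The Python sketch below computes only that statistic; the function name is illustrative, and the p-value (which requires the Kolmogorov distribution and is provided by library routines such as `scipy.stats.ks_2samp`) is deliberately left out. The Kuiper test used for the circular direction data is not covered.

```python
from bisect import bisect_right


def ks_2samp_statistic(x, y):
    """Largest absolute difference between the empirical CDFs of samples x and y."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # The supremum of the difference is attained at one of the observed values.
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)   # fraction of x values <= v
        fy = bisect_right(ys, v) / len(ys)   # fraction of y values <= v
        d = max(d, abs(fx - fy))
    return d


# Hypothetical fixation durations (ms) for two small groups.
print(ks_2samp_statistic([247, 230, 260, 251], [231, 215, 228, 240]))
```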
Such small differences, however, can be exploited to discriminate between novices and experts when raw features are processed by a suitable classifier. For this purpose an RVM was chosen as the classifier. We first used equal kernel functions (linear and Gaussian) for all feature channels (cf. Figure 1), while taking into consideration different numbers of sources/feature spaces s. Analysis of the results showed that classifier performance for the features xθ derived from saccadic directions was worse with the Gaussian kernel; this led us to use mixed kernel functions, namely a Gaussian kernel for the length of shifts and fixation times, and a linear one for directions.
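The mixed-kernel arrangement just described can be sketched in Python as follows. The sketch assumes that each observer is summarized by one feature vector per channel and that the channels are fused as a weighted sum of base kernels with fixed weights; in a multiple-kernel RVM the combination weights would instead be inferred from the data, and the names, the dictionary layout and the `gamma` parameter are hypothetical.

```python
import math


def gaussian_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def linear_kernel(u, v):
    """Linear kernel, i.e. the dot product of u and v."""
    return sum(a * b for a, b in zip(u, v))


def composite_kernel(obs_i, obs_j, weights):
    """Weighted sum of per-channel base kernels (fixed weights are an assumption)."""
    base = {
        "amplitude": gaussian_kernel(obs_i["amplitude"], obs_j["amplitude"]),
        "duration": gaussian_kernel(obs_i["duration"], obs_j["duration"]),
        "direction": linear_kernel(obs_i["direction"], obs_j["direction"]),
    }
    return sum(weights[name] * value for name, value in base.items())


# Two hypothetical observers described by short per-channel feature vectors.
a = {"amplitude": [2.2, 2.4], "duration": [0.25, 0.23], "direction": [1.0, 0.0]}
b = {"amplitude": [2.5, 2.5], "duration": [0.23, 0.22], "direction": [0.0, 1.0]}
w = {"amplitude": 0.4, "duration": 0.4, "direction": 0.2}
print(composite_kernel(a, b, w))
```

A non-negative weighted sum of valid kernels is itself a valid kernel, which is what allows a different kernel type to be chosen for each feature channel.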
The outcomes obtained from the different kernels were quite similar, as can be seen in Supplementary
Table 1. Therefore, the following analysis is performed solely on the results obtained with the multiple kernel approach, because it is more flexible and novel than single-kernel methods. Moreover, except for the case of short shots, it was the only approach where the best performance was attained with more than one feature or combination of features - actually three for the long shots and two on the match - thus indicating a higher efficiency than the other approaches.
Table 1 and Table 2 report results in terms of the accuracy (percent correct) and discriminability (d′), respectively. Accuracy was defined as Nc/Ntot, where Nc is the number of trials in which correct classification was attained, regardless of the stimulus (novice or expert). Discriminability was computed as ZH − ZF, where ZH is the z-transformed hit rate (a hit being a “novice” classification given a “novice” stimulus) and ZF is the z-transformed false alarms rate (a false alarm being a “novice” classification given an “expert” stimulus). Discriminability represents the capability of the classifier to separate novices and experts, regardless of the decision criterion.
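The two performance measures defined above can be written as a short Python sketch. The z-transform is taken here to be the inverse of the standard normal cumulative distribution function, which is the usual signal detection convention; the function names are illustrative, and the rates passed to `d_prime` in the example are hypothetical.

```python
from statistics import NormalDist


def accuracy(n_correct, n_total):
    """Proportion of correct classifications, Nc / Ntot."""
    return n_correct / n_total


def d_prime(hit_rate, false_alarm_rate):
    """Discriminability d' = Z(H) - Z(F), with Z the inverse standard normal CDF.

    Rates of exactly 0 or 1 have no finite z-transform and raise an error here.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


print(accuracy(37, 42))      # 37 correct out of 42 observers
print(d_prime(0.90, 0.15))   # hypothetical hit and false alarm rates
```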
For both accuracy and discriminability the reported tables represent the mean values across the 5 classifier repetitions, separately for each feature or feature combination and for each stimulus typology. We define the best performance as the highest classification score reported within each stimulus typology (short shots, long shots, match), regardless of which feature, or combination thereof, contributed to it. In case of ties, the best performance was stipulated to be the one in which both accuracy and discriminability were highest. From
Table 1 it can be seen that the classification rate was rather good (range: 63.80% − 88.09%) and always above chance (
p < 0.01 even for the lowest classification rate, one-tail binomial test), with a rather high best performance within each stimulus typology (88.09%, 86.19% and 81.90%, marked in green; red denotes the worst performances within each stimulus typology).
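A one-tailed binomial test against chance, as used above, amounts to summing the upper tail of a binomial distribution. The Python sketch below shows the computation under the assumption of a chance level of 0.5; the function name is illustrative, and the counts in the example are placeholders, since the number of classifications entering the test is not stated in this section.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): one-tailed p-value for k successes in n trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Placeholder counts: 30 correct classifications out of 42 at chance level 0.5.
print(binomial_upper_tail(30, 42))
```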
In the best case (88.09%) this amounts to saying that the RVM correctly classified 37 out of 42 observers as novice or expert, with a moderate bias to classify novices more accurately than experts (predictive value for novices: 0.917; predictive value for experts: 0.851). Considering the best performances, which show the achievement of the classifier, accuracy was higher with the short shots (88.09%) than the match (81.90%), with the performance with the long shots being intermediate (86.19%). A one-way ANOVA among the 3 best performances showed a marginally significant effect of classification conditions (either stimulus type or feature; F(2, 12) = 3.547, p = 0.062). Post-hoc LSD pairwise tests indicated that, whereas the two former figures (88.09% and 86.19%) did not differ significantly from each other (p > 0.4), the difference with the accuracy measure obtained with the match stimulus (81.90%) was statistically significant or marginally significant (p = 0.023 and p = 0.097, respectively).
No clear tendency could be appreciated as to which feature, or combination of features, best contributed to the classification. From
Table 1 it can be seen that in no case the same feature, or combination thereof, determined the best accuracy across the three stimulus typologies. In terms of mean performance, using single features provided a somewhat better result (80.31%) than combining them in pairs (75.13%) or triplet (76.82%). The best classification performance within each stimulus category was never obtained with the triplet of features, though only in one case the triplet determined the worst performance (67.61%). An almost identical pattern of results was obtained by computing
d′ as index of performance (
Table 2). Again, the best performance within each stimulus category was higher with the shots than with the match. Interestingly, also the three worst performances (marked in red in the Tables) were coincident for accuracy and discriminability, and were higher for the long shots than the short shots.
Discussion
In this study we have applied machine learning techniques (multiple kernel learning (MKL)-based feature combination and RVM) to analyze the oculomotor behavior of individual observers engaged in a visual task, with the aim of classifying them as experts or novices. To this end, we have administered to 42 subjects, half novices and half expert billiard players, various visual stimuli and tasks. As stimuli we used a portion of a real match, videorecorded from the top, containing several shots of variable length and complexity, as well as a number of ad-hoc individual shots, also videorecorded from the top in a real setting. The match stimulus was associated with a free-viewing observation condition, while for the individual shots, which were occluded in the final part of the trajectory, observers were asked to predict the outcome of the shot, which implicitly placed a significant constraint on the deployment of visuospatial attention, and, consequently, on the overt scanpath. Thus, we demonstrated that, in both constrained and unconstrained naturalistic viewing conditions, eye movements contain enough information to detect an internal state such as expertise.
To our knowledge this is the first time that MKL-based feature combination and RVM techniques are applied to eye movement data. A very recent study by Henderson, Shinkareva, Wang, Luke, and Olejarczyk (2013) successfully inferred the observers’ cognitive task (search, memorizing, reading) through classification. However, for the purpose of that study, a dedicated classifier was trained for each observer, and a simple baseline technique such as the Naïve Bayes classifier was sufficient. Clearly, when addressing a scenario in which individual observers are classified as belonging to one or another population, more sophisticated machine learning tools are needed. Many studies used an approach based on SVM classification (e.g.,
Lagun et al., 2011;
Eivazi & Bednarik, 2011;
Bednarik et al., 2005;
Vig et al., 2009;
Tseng et al., 2013; Bulling, Ward, Gellersen, & Tröster, 2011; Bednarik, Vrzakova, & Hradis, 2012). Beyond some limitations inherent to SVM (
Tipping, 2001;
Murphy, 2012), it is worth pointing out that the final classification step is just one side of the problem when spotting expertise from scanpaths in a
data-driven way, the other side being how features are best combined and exploited. As anticipated in the Introduction, to address these issues we have adopted a feature fusion strategy relying on multiple kernel combination.
A comment is due on the choice of the features. The features we have used are typical basic parameters that characterize saccadic exploration of static scenes. However, our stimuli also contained moving elements (e.g., the ball motion) capable of eliciting smooth pursuit eye movements, which are characterized by different parameters. Thus, it may be argued that using saccade parameters is not entirely appropriate. Let us firstly note that in our experiment smooth pursuit eye movements were in fact not frequent. Although this may sound surprising, consider that our observers were not instructed to follow the moving target; also, the ball motion occupied only a minor part of the overall stimulus duration, and furthermore its motion was not continuous but interrupted by bounces, which implied rather frequent catch-up/anticipatory saccades. To take specific figures, consider the shot trials (Crespi et al., 2012): the ball was in motion for about 2.1 seconds in each trial, on average. During this short time window, the eyes spent on average only 63% of the time in slow motion (tangential velocity between 0.5 and 40 deg/s with a minimum duration of 100 ms), which amounts to about 1.3 seconds per trial. Considering that the mean recording window within a trial was 12.4 seconds, this indicates that smooth pursuit eye movements contributed to the overall eye-movement pattern for only about 10% of the time. We did not measure all these parameters in the match task, but we can assume comparable figures. Secondly, much of the difference between experts and novices was found when the ball was not moving (ROI analysis, figs. 5 and 6 in Crespi et al., 2012; VDA peaks, fig. 2 in Robino et al., 2012). Thirdly, and more importantly, from the perspective of machine learning, segmenting a gradually changing signal into discrete elements and using them as features for the classifier is perfectly legitimate. Using virtual fixations or whatever other signal preprocessing of the oculomotor traces before the classification step is just a matter of convenience, as it is well known that machine learning techniques are blind as to the nature of the underlying processes. To the extent that features bring information, they work (features do not introduce new information).
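The slow-motion criterion and the arithmetic quoted above can be made explicit with a short Python sketch. The velocity bounds and the minimum duration are taken from the text; the uniform sampling interval, the inclusive bounds and the function names are assumptions made for illustration.

```python
def slow_phase_time(velocities, dt_ms, v_min=0.5, v_max=40.0, min_dur_ms=100.0):
    """Total time (ms) in runs of samples with v_min <= v <= v_max lasting at least min_dur_ms."""
    total, run = 0.0, 0
    for v in list(velocities) + [float("inf")]:   # sentinel closes a trailing run
        if v_min <= v <= v_max:
            run += 1
        else:
            if run * dt_ms >= min_dur_ms:
                total += run * dt_ms
            run = 0
    return total


def pursuit_share(ball_motion_s, slow_fraction, window_s):
    """Fraction of the recording window spent in slow eye motion."""
    return ball_motion_s * slow_fraction / window_s


# Figures quoted in the text: 63% of 2.1 s is about 1.3 s, i.e. roughly 10% of 12.4 s.
print(2.1 * 0.63, pursuit_share(2.1, 0.63, 12.4))
```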
Indeed, by combining only three basic parameters of visual exploration, the overall classification accuracy, expressed as percent correct and averaged across stimulus types and oculomotor features, scored a respectable 78%. More interesting is to consider the best performance for each stimulus type, which testifies to the achievement of the classifier, and which depends on the features used. The best performance ranged between 81.90% and 88.09% (1.852 to 2.399 in terms of d′), which is a quite remarkable result, especially considering that a naturalistic, unconstrained viewing condition was included (M). Besides confirming that eye movements contain a signature of billiard expertise (
Crespi et al., 2012), this finding demonstrates that, even ignoring “where” the gaze is directed, i.e., to which objects or events overt visuospatial attention is allocated (
content-driven approach), the “expert’s eye” can be identified at the individual level from “how” the gaze is shifted, i.e., from basic oculomotor features such as saccade amplitude and direction and fixation duration (
data-driven approach). Clearly, this does not amount to saying that the physiology of eye movements is modified by expertise, nor that expertise in a given field could be detected by using whatever visual stimulus, but simply that there is not always the need to match the oculomotor features with the visual features, a common approach that we also used in our past work (
Crespi et al., 2012;
Robino et al., 2012). Notably, expertise detection was successful at the level of individual observers (see below).
The classification accuracy was higher with the shots than the match. This difference, despite being small, is in keeping with the idea that the individual scanpath provides an indication about the degree of “expertise allocation”, that is, how much an observer is actually using knowledge: The more expertise is used, the larger the systematic differences in visual exploration between a novice and an expert, hence the higher the classification performance. For example, the prediction task in which participants had to make a rapid guess as to the outcome of the shots (“will the ball hit the central skittle?”) would seem to leave little room for free ocular exploration, especially for the short shots, thus reducing the idiosyncratic component of ocular exploration. As a consequence, the systematic differences between novices and experts emerge more clearly. Conversely, the fact that during match observation observers had no specific task, and that the pace of the shots was relatively relaxed, allowed more free eye movements, especially after the shots. In other words, the difference between the classification accuracy when the shots rather than the match stimulus is used might depend on the different degree of “expertise allocation” in the two conditions, being higher in the shot prediction task than in the relatively unconstrained match observation task. Indeed, we had previously proposed that, during billiard match observation, it is precisely the alternation between the focusing of attention on the upcoming shot and the post-shot relaxation that allowed us to successfully parse the shot alternation exclusively on the basis of the scanpath differences between novices and experts (
Robino et al., 2012).
The above considerations underscore the importance of selecting a proper test setting in order to detect expertise from the scanpath. On the one hand, it is clearly better to find the conditions (i.e., stimuli and tasks) that best elicit the use of expertise. These should be as stringent and controlled as possible, such as for example the ad-hoc shots coupled with the prediction task that we have used, where the highest classification performance was attained. On the other hand, it is intriguing that the RVM yielded a high accuracy, though not the highest, also with the match stimulus (81.90%). Considering the uncontrolled variability of a real billiard match, coupled with the lack of a specific task for the observers, we think this is a remarkable achievement in terms of capability to extract information from eye movements in naturalistic conditions. Pervasive behavioural monitoring of real-life visual exploration through wearable eye trackers may take advantage of high-performance classification methods such as RVM (
Schumann et al., 2008; Hart, Onceanu, Sohn, Wightman, & Vertegaal, 2009; Noris, Nadel, Barker, Hadjikhani, & Billard, 2012; Vidal, Turner, Bulling, & Gellersen, 2012). Furthermore, especially for real-life conditions, it is crucial that the scanpath analysis can be
data-driven, at least as much as possible, as a
content-driven approach would inevitably require manual labeling of each video frame in terms of semantically-identified regions or visual elements. Indeed, this would preclude an automatic analysis of real-life scanpaths, and even more so for a real-time analysis.
Firstly, our findings suggest that a number of low-level physiological parameters of visual exploration behavior could be suitably used to automatically decode inner cognitive processes to the benefit of brain-computer interface (BCI) systems. In the field of neuro-rehabilitation, for example, many efforts are directed at decoding motor imagery and covert motor commands from brain signals with the goal of driving prosthetic devices and boosting motor improvement through neurofeedback training (
Silvoni et al., 2011). Central to this endeavor is the capability to extract in the simplest possible way useful neural information from subjects engaged in some sort of mental imagery tasks. For this, brain activity is recorded via amplifiers and decoded using on-line classification algorithms. Brain signals are not the only physiological correlate of mental imagery, however. Eye movements have been shown to tag in a precise way an elusive covert process such as mental imagery (
Brandt & Stark, 1997; Johansson, Holsanova, Dewhurst, & Holmqvist, 2012), and, more specifically, dynamic motion imagery (
de’Sperati, 1999,
2003;
de’Sperati & Santandrea, 2005; Jonikaitis, Deubel, & de’Sperati, 2009;
Crespi et al., 2012). Thus, the methodological approach that we have described in the present study might be profitably applied to extract eye-movement information to drive BCI external devices. For example, automatically classifying good and bad imagery performance could help to refine the mental training procedures until expertise is achieved, or to prevent incorrect signals from being erroneously sent to the BCI device. Also, a classifier could detect spurious eye movements - or their absence - that might mean that visuospatial attention has been drawn away from the current imagery task. In sum, an oculomotor-based channel with efficient classification capabilities could be suitably paired with EEG-based or fMRI-based channels to improve mind reading performance in hybrid BCI systems with multiple input signal sources (Amiri, Fazel-Rezai, & Asadpour, 2013).
Another potential application of our approach is the development of an expertise test based on the “expert’s eye”. Clearly, a general expertise test cannot exist. Expertise is specific to particular domains, and it can be of various types and qualities (e.g., declarative-conceptual, procedural, strategic; De Jong & Ferguson-Hessler, 1996). Although expertise is ultimately established by directly measuring performance (e.g., through questionnaire scores, as in school grades, or with official rankings, as in sports), an indirect assessment of the visual exploratory behaviour may uncover subtle aspects underlying expertise in all those cases where visual information is crucial (e.g., understanding the working of a mechanical apparatus, or providing legal authentication of a painting, or playing chess, or detecting faults in sports). For example, in our previous study on billiard expertise we have documented, through eye movement recording, the passage from intuitive, procedural knowledge based on mental imagery, a strategy typical of novices, to rule-based, conceptual knowledge, which was expressed only in experts (
Crespi et al., 2012). Incidentally, this may explain the small bias that we have found with the best performance towards a higher misclassification of experts than novices: because experts can adopt a novice’s strategy but a novice cannot adopt an expert’s strategy, a classifier can be fooled by an expert but not by a novice.
The capability to detect expertise automatically, that is, without the need of semantically analyzing which particular objects and events of a visual scene the gaze of an observer is directed to, will enhance “mind reading” methods. However, it should be borne in mind that a psychophysiological test for the expert’s eye would not substitute direct measures of expertise, but rather complement them. Thus, finding a mismatch between the output of an automatic “ocular expertmeter” and the outcome of direct evaluation of expertise obtained with classical methods (e.g., testing, questionnaires) could raise issues as to what strategy or what evidence has actually been used. For example, assuming that the scanpath is indicative of expertise, the finding of an anomalous scanpath in inspecting the figures of a difficult geometry exam would perhaps question what mental procedure was used by a student who nonetheless provided all correct answers; an alternative interpretation could be that the student answered correctly by chance.
The automatic recognition of individual traits through behavioral analyses is an intensely pursued goal. Biometrics is a field of study aimed at identifying individuals through their unique biological characteristics or behavioral patterns. Biological methods in biometrics include for example fingerprint, face, or iris verification, whereas behavioral methods include voice, signature, typing or gait analysis. Recently, behavioral biometrics has been applied to eye movements, with the goal of identifying individuals through their oculomotor patterns (
Holland & Komogortsev, 2011), even in a task-independent way (
Bednarik et al., 2005; Kinnunen, Sedlak, & Bednarik, 2010). In these studies various methods to analyze eye movements have been used, with an ensuing performance, however, still short of the accepted standards for biometric systems. Our work was aimed at distinguishing a novice from an expert, that is, two classes of individuals rather than a given individual as in biometrics. Nevertheless, the high classification rates that we obtained, even in a poorly constrained scenario such as match observation, suggest that our approach based on feature space fusion and a Bayesian sparse classifier could be profitably applied to personal identification as well. It is interesting that a similar set of eye-movement features (e.g., duration and amplitude of saccades) can be used successfully for both individual and categorical classification (personal identity or expertise). This seems to confirm that these basic features are more than just oculomotor traits.