Influence of Number, Location and Size of Faces on Gaze in Video
Abstract
Introduction
Eye movement experiment
- Stimuli: Fifty-three videos (25 fps, 720 × 576 pixels per frame) were selected from different video sources, covering indoor and outdoor scenes, by day and by night (Figure 1). The videos were converted to grayscale before being presented to the participants.
- Participants: Fifteen young adults (3 women and 12 men, aged 23-40 years) participated in the experiment. All participants had normal or corrected-to-normal vision. Each participant sat with his or her head stabilized on a chin rest in front of a monitor at a 57 cm viewing distance (40° × 30° field of view) and was instructed to watch the videos freely, without any task.
- Apparatus: An eye tracker (SR Research EyeLink II) was used to record eye movements. It is composed of three miniature cameras mounted on a helmet: one in front of each eye to provide binocular tracking, and a third on a head-band for head tracking. The recordings from the two eye cameras, compensated for head movements, give the gaze direction of the participant.
Method
Database
Influencing factors
Number: the count of faces present in a frame, which determines the complexity of the scene. For clarity, we only consider frames containing one face or two faces.
Eccentricity: the distance, in degrees, from a participant’s fixation on screen to the edge of the nearest face ellipse. In Figure 3, the eccentricity of a face ellipse with origin $(O_x, O_y)$ from the fixation position $(C_x, C_y)$ is $E = d - r(\alpha)$, where $d$ is the distance between the fixation and the ellipse center and $r(\alpha)$ is the ellipse radius toward the fixation (see the code sketch after this list).
Area: the two-dimensional surface of the face ellipse in squared degrees, calculated as $A = \pi r_a r_b$, where $r_a$ and $r_b$ are the face ellipse’s major and minor radii, respectively.
Closeness: the Euclidean distance between the two face regions in the case of two faces $f_1$ and $f_2$. In Figure 4, the closeness between the face ellipses with origins $O_{f_1}$ and $O_{f_2}$ is $C = d - (r(\alpha) + r(\beta))$.
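To make the geometry of these three factors concrete, here is a minimal Python sketch. It assumes axis-aligned face ellipses and coordinates already expressed in degrees of visual angle; the function names and the (center, ra, rb) ellipse representation are ours, not from the paper.

```python
import numpy as np

def radius_at_angle(ra, rb, theta):
    """Radius r(theta) of an axis-aligned ellipse with semi-axes ra (major)
    and rb (minor), measured from its center at polar angle theta."""
    return (ra * rb) / np.sqrt((rb * np.cos(theta)) ** 2 + (ra * np.sin(theta)) ** 2)

def eccentricity(fix, center, ra, rb):
    """E = d - r(alpha): distance (deg) from a fixation to the edge of a face ellipse."""
    dx, dy = fix[0] - center[0], fix[1] - center[1]
    d = np.hypot(dx, dy)                       # fixation-to-center distance
    alpha = np.arctan2(dy, dx)                 # direction from the ellipse center
    return d - radius_at_angle(ra, rb, alpha)

def area(ra, rb):
    """Face ellipse surface in squared degrees: A = pi * ra * rb."""
    return np.pi * ra * rb

def closeness(c1, ra1, rb1, c2, ra2, rb2):
    """C = d - (r(alpha) + r(beta)): edge-to-edge gap between two face
    ellipses, measured along the line joining their centers."""
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    d = np.hypot(dx, dy)
    alpha = np.arctan2(dy, dx)                 # f1 -> f2 direction
    beta = np.arctan2(-dy, -dx)                # f2 -> f1 direction
    return d - (radius_at_angle(ra1, rb1, alpha) + radius_at_angle(ra2, rb2, beta))
```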
Metrics
Minimum fixation distance: the shortest Euclidean distance (Wang & Pomplun, 2012) from the faces in a scene to a fixation. The distance is computed from the fixation position to the face region of interest, i.e., the edge of the face ellipse. Essentially, it equals the eccentricity E of the face closest to the fixation.
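Reusing the `eccentricity` helper from the sketch above, this metric is just the smallest eccentricity over the faces in the frame; `faces` here is a hypothetical list of (center, ra, rb) tuples.

```python
def min_fixation_distance(fix, faces):
    """Shortest Euclidean distance from a fixation to any face edge:
    the eccentricity E of the face closest to the fixation."""
    return min(eccentricity(fix, center, ra, rb) for center, ra, rb in faces)
```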
Fixation proportion. We categorized the fixations on scenes with faces into two types: fixations landing inside a face, called ‘on-face’ fixations (oF), and fixations landing outside a face, called ‘not-on-face’ fixations (nF). This was done by comparing the fixation coordinates to a face represented by an elliptical mask equal to the face dimensions plus 1° of margin. In the study, we used the proportions of the two fixation types, normalized by the total surface area of the faces. Fixations on scenes without faces were not considered.
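A sketch of the on-face / not-on-face split, under the same ellipse conventions as above. The 1° margin follows the text; the area normalization is indicated only in a comment, since the paper does not detail its exact form.

```python
def is_on_face(fix, center, ra, rb, margin=1.0):
    """True if the fixation lands inside the elliptical face mask
    grown by `margin` degrees, defining an 'on-face' (oF) fixation."""
    dx, dy = fix[0] - center[0], fix[1] - center[1]
    return (dx / (ra + margin)) ** 2 + (dy / (rb + margin)) ** 2 <= 1.0

def fixation_proportions(fixations, faces):
    """Raw proportions of on-face (oF) and not-on-face (nF) fixations.
    The paper further normalizes these proportions by the total face
    surface, e.g. sum(area(ra, rb) for _, ra, rb in faces)."""
    n_on = sum(any(is_on_face(f, c, ra, rb) for c, ra, rb in faces)
               for f in fixations)
    p_on = n_on / len(fixations)
    return p_on, 1.0 - p_on
```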
Fixation duration. Cognitive systems interact with the scene to determine where to fixate and for how long. The position of a fixation points to a region of interest, while its duration reflects the amount of attentional processing directed to that location (Just & Carpenter, 1976; Rayner, 1998; Henderson, 2007).
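As an illustration, durations can be aggregated separately for the two fixation types using the `is_on_face` helper above; the (x, y, duration_ms) record layout is our convention, not the paper's.

```python
def mean_durations(fixations, faces):
    """Mean duration (ms) of on-face vs. not-on-face fixations in one scene."""
    on = [d for x, y, d in fixations
          if any(is_on_face((x, y), c, ra, rb) for c, ra, rb in faces)]
    off = [d for x, y, d in fixations
           if not any(is_on_face((x, y), c, ra, rb) for c, ra, rb in faces)]
    mean = lambda v: sum(v) / len(v) if v else float("nan")
    return mean(on), mean(off)
```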
Comparison with face maps. Different criteria are used to predict the likelihood of different regions attracting attention in a scene. This is often done by comparing such regions of interest to participants’ eye movements (Itti, Koch, & Niebur, 1998; Parkhurst, Law, & Niebur, 2002; Tatler, Baddeley, & Gilchrist, 2005; Peters, Iyer, Itti, & Koch, 2005; Torralba, Oliva, Castelhano, & Henderson, 2006; Le Meur, Le Callet, Barba, & Thoreau, 2006).
- Face maps: We computed a face map Mf (Figure 7b) for each frame by hand-labeling the position of each face with a bounding box and then applying a 2D Gaussian to it. The dimensions of the bounding box determine the variances of the 2D Gaussian along the horizontal and vertical axes, whereas the amplitude of the function was kept constant for all faces. All values outside the resulting elliptical face region were set to zero (see the code sketch after this list).
- Eye fixation maps: An eye fixation map was defined for each fixation made by a participant. It is simply the fixation position blurred with a 2D Gaussian whose standard deviation corresponds to 0.5° of the visual field, the size of the fovea with the highest resolution. These maps, denoted Mh, were used to evaluate faces with the comparison criterion. A sample Mh map is illustrated in Figure 5c.
- Comparison criterion: To compute the comparison criterion, for instance for the first fixation, we compare the Mh map of each participant to all the Mf maps over the entire duration of that fixation. The values are averaged to obtain a score for that participant; this process is repeated for all participants, and the individual scores are averaged to obtain the score for the first fixation. We do the same for the first five fixations and compute the mean of the five scores. We also looked at each fixation separately, but found no difference compared with the mean over five fixations. Note that in the case of the face maps Mf, the dimensions of the face define the standard deviations of the applied Gaussian, and all values lying outside the resulting face ellipse are set to zero.
In this study, we perform ROC (Receiver Operating Characteristic) analysis between two maps: a face map Mf and an eye fixation map Mh. The maps are processed as a binary classifier applied to every pixel, which is classified as fixated (salient) or not fixated (not salient). A simple threshold is systematically moved between the minimum and maximum values of the map. For each pair of thresholds, we obtain four numbers: the true positives (TP), the false positives (FP), the false negatives (FN) and the true negatives (TN). A ROC curve plots the true positive rate as a function of the false positive rate. The ROC area, or AUC (Area Under the Curve), obtained by a trapezoid approximation, measures the classification performance. The trapezoidal rule used is
$\mathrm{AUC} = \sum_{i=1}^{n-1} \tfrac{1}{2}\,(x_{i+1} - x_i)\,(y_i + y_{i+1})$,
where $x_i$ and $y_i$ denote the false positive rate and the true positive rate at the $i$-th of $n$ thresholds.
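Below is a minimal sketch of the map construction and ROC analysis described above, assuming grayscale maps on a pixel grid. We binarize the fixation map Mh as ground truth (at half its peak, our assumption) and sweep the threshold over the face map Mf, which is one concrete reading of the "pair of thresholds" procedure; all names, and the choice of sigma as half the box dimension, are ours.

```python
import numpy as np

def gaussian_map(shape, center, sigma_x, sigma_y):
    """2D Gaussian of constant unit amplitude centered on (cx, cy), in pixels."""
    y, x = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((x - center[0]) ** 2 / (2 * sigma_x ** 2)
                    + (y - center[1]) ** 2 / (2 * sigma_y ** 2)))

def face_map(shape, center, rx, ry):
    """Face map Mf: the bounding-box dimensions set the Gaussian variances
    (sigma = half-dimension, an assumption), zeroed outside the face ellipse."""
    m = gaussian_map(shape, center, rx / 2.0, ry / 2.0)
    y, x = np.mgrid[0:shape[0], 0:shape[1]]
    inside = ((x - center[0]) / rx) ** 2 + ((y - center[1]) / ry) ** 2 <= 1.0
    return m * inside

def fixation_map(shape, fix, sigma):
    """Eye fixation map Mh: one fixation blurred by a Gaussian of ~0.5 deg."""
    return gaussian_map(shape, fix, sigma, sigma)

def roc_auc(mf, mh, n_thresholds=101):
    """AUC between Mf (classifier score) and a binarized Mh (ground truth),
    integrating TPR over FPR with the trapezoidal rule."""
    truth = mh > 0.5 * mh.max()                      # binarize Mh (our assumption)
    ts = np.linspace(mf.max(), mf.min(), n_thresholds)  # strict -> lax, FPR increases
    tpr, fpr = [], []
    for t in ts:
        pred = mf >= t                               # pixels classified as salient
        tp = np.sum(pred & truth);  fp = np.sum(pred & ~truth)
        fn = np.sum(~pred & truth); tn = np.sum(~pred & ~truth)
        tpr.append(tp / max(tp + fn, 1))
        fpr.append(fp / max(fp + tn, 1))
    return float(np.trapz(tpr, fpr))                 # trapezoid approximation of AUC
```

In use, `roc_auc` would be evaluated per frame between each participant's Mh and the frame's Mf maps, then averaged over frames and participants as described above.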
Statistical analysis
Results
Minimum fixation distance
Fixation proportion
Fixation duration
Comparison with face maps
Case of one face
Case of two faces
Discussion
Funding
Note 1: 572 samples in total, i.e., 2 distance values for each of the 286 video snippets (166 with one face + 120 with two faces).
References
- Banks, M. S., A. B. Sekuler, and S. J. Anderson. 1991. Peripheral spatial vision: limits imposed by optics, photoreceptors, and receptor pooling. Journal of the Optical Society of America A 8, 11: 1775–1787.
- Bindemann, M., C. Scheepers, and A. M. Burton. 2009. Viewpoint and center of gravity affect eye movements to human faces. Journal of Vision 9, 2: 7.1–16.
- Birmingham, E., W. F. Bischof, and A. Kingstone. 2008. Gaze selection in complex social scenes. Visual Cognition 16, 2: 341–355.
- Birmingham, E., W. F. Bischof, and A. Kingstone. 2009. Saliency does not account for fixations to eyes within social scenes. Vision Research 49, 24: 2992–3000.
- Buchan, J. N., M. Paré, and K. G. Munhall. 2007. Spatial statistics of gaze fixations during dynamic face processing. Social Neuroscience 2, 1: 1–13.
- Cerf, M., J. Harel, W. Einhäuser, and C. Koch. 2007. Predicting human gaze using low-level saliency combined with face detection. In NIPS'07.
- Cohen, J. 1988. Statistical power analysis for the behavioral sciences. L. Erlbaum Associates.
- Dorr, M., T. Martinetz, K. R. Gegenfurtner, and E. Barth. 2010. Variability of eye movements when viewing dynamic natural scenes. Journal of Vision 10, 10: 1–17.
- Dow, B. M., A. Z. Snyder, R. G. Vautin, and R. Bauer. 1981. Magnification factor and receptive field size in foveal striate cortex of the monkey. Experimental Brain Research 44: 213–228.
- Foulsham, T., J. T. Cheng, J. L. Tracy, J. Henrich, and A. Kingstone. 2010. Gaze allocation in a dynamic situation: Effects of social status and speaking. Cognition 117: 319–331.
- Guo, K., C. H. Liu, and H. Roebuck. 2011. I know you are beautiful even without looking at you: discrimination of facial beauty in peripheral vision. Perception 40, 2: 191–195.
- Guo, K., S. Mahmoodi, R. G. Robertson, and M. P. Young. 2006. Longer fixation duration while viewing face images. Experimental Brain Research 171, 1: 91–98.
- Hasson, U., I. Levy, M. Behrmann, T. Hendler, and R. Malach. 2002. Eccentricity bias as an organizing principle for human high-order object areas. Neuron 34, 3: 479–490.
- Heisz, J. J., and D. I. Shore. 2008. More efficient scanning for familiar faces. Journal of Vision 8, 1: 1–10.
- Henderson, J. M. 2007. Regarding scenes. Current Directions in Psychological Science 16, 4: 219–222.
- Hershler, O., T. Golan, S. Bentin, and S. Hochstein. 2010. The wide window of face detection. Journal of Vision 10, 10: 21.
- Itti, L., C. Koch, and E. Niebur. 1998. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20: 1254–1259.
- Jacques, C., and B. Rossion. 2004. Concurrent processing reveals competition between visual representations of faces. Neuroreport 15, 15: 2417–2421.
- Jacques, C., and B. Rossion. 2006. The time course of visual competition to the presentation of centrally fixated faces. Journal of Vision 6, 2: 154–162.
- Jebara, N., D. Pins, P. Despretz, and M. Boucart. 2009. Face or building superiority in peripheral vision reversed by task requirements. Advances in Cognitive Psychology 5: 42–53.
- Johnson, A., and R. Gurnsey. 2010. Size scaling compensates for sensitivity loss produced by a simulated central scotoma in a shape-from-texture task. Journal of Vision 10, 12: 1–16.
- Just, M. A., and P. A. Carpenter. 1976. Eye fixations and cognitive processes. Cognitive Psychology 8, 4: 441–480.
- Kastner, S., and L. G. Ungerleider. 2001. The neural basis of biased competition in human visual cortex. Neuropsychologia 39, 12: 1263–1276.
- Le Meur, O., P. Le Callet, D. Barba, and D. Thoreau. 2006. A coherent computational approach to model bottom-up visual attention. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 5: 802–817.
- Levi, D. M., S. A. Klein, and A. P. Aitsebaomo. 1985. Vernier acuity, crowding and cortical magnification. Vision Research 25, 7: 963–977.
- Marat, S., T. H. Phuoc, L. Granjon, N. Guyader, D. Pellerin, and A. Guérin-Dugué. 2009. Modelling spatio-temporal saliency to predict gaze direction for short videos. International Journal of Computer Vision 82: 231–243.
- Marat, S., A. Rahman, D. Pellerin, N. Guyader, and D. Houzet. 2013. Improving visual saliency by adding ‘face feature map’ and ‘center bias’. Cognitive Computation 5, 1: 63–75.
- Miller, E. K., P. M. Gochin, and C. G. Gross. 1993. Suppression of visual responses of neurons in inferior temporal cortex of the awake macaque by addition of a second stimulus. Brain Research 616, 1-2: 25–29.
- Nagy, K., M. W. Greenlee, and G. Kovács. 2011. Sensory competition in the face processing areas of the human brain. PLoS ONE 6, 9: e24450.
- Pannasch, S., J. R. Helmert, A.-K. Herbold, K. Roth, and H. Walter. 2008. Visual fixation durations and saccade amplitudes: Shifting relationship in a variety of conditions. Journal of Eye Movement Research 2, 2: 1–19.
- Paras, C. L., J. A. Yamashita, M. L. Simas, and M. A. Webster. 2003. Face perception and configural uncertainty in peripheral vision. Journal of Vision 3, 9: 822.
- Parkhurst, D., K. Law, and E. Niebur. 2002. Modeling the role of salience in the allocation of overt visual attention. Vision Research 42: 107–123.
- Peters, R. J., A. Iyer, L. Itti, and C. Koch. 2005. Components of bottom-up gaze allocation in natural images. Vision Research 45: 2397–2416.
- Rayner, K. 1998. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin 124, 3: 372–422.
- Reddy, L., L. Reddy, and C. Koch. 2006. Face identification in the near-absence of focal attention. Vision Research 46, 15: 2336–2343.
- Riby, D., and P. J. Hancock. 2009. Looking at movies and cartoons: eye-tracking evidence from Williams syndrome and autism. Journal of Intellectual Disability Research 53, 2: 169–181.
- Rice, K., J. M. Moriuchi, W. Jones, and A. Klin. 2012. Parsing heterogeneity in autism spectrum disorders: Visual scanning of dynamic social scenes in school-aged children. Journal of the American Academy of Child and Adolescent Psychiatry 51, 3: 238–248.
- Rigoulot, S., F. D’Hondt, S. Defoort-Dhellemmes, P. Despretz, J. Honoré, and H. Sequeira. 2011. Fearful faces impact in peripheral vision: behavioral and neural evidence. Neuropsychologia 49, 7: 2013–2021.
- Ro, T., C. Russell, and N. Lavie. 2001. Changing faces: A detection advantage in the flicker paradigm. Psychological Science 12, 1: 94–99.
- Rolls, E. T., and M. J. Tovee. 1995. The responses of single neurons in the temporal visual cortical areas of the macaque when more than one stimulus is present in the receptive field. Experimental Brain Research 103, 3: 409–420.
- Rossion, B., I. Gauthier, M. J. Tarr, P. Despland, R. Bruyer, S. Linotte, and M. Crommelinck. 2000. The N170 occipito-temporal component is delayed and enhanced to inverted faces but not to inverted objects: an electrophysiological account of face-specific processes in the human brain. Neuroreport 11, 1: 69–74.
- Rousselet, G. A., M. J. M. Macé, and M. Fabre-Thorpe. 2003. Is it an animal? Is it a human face? Fast processing in upright and inverted natural scenes. Journal of Vision 3, 6: 440–455.
- Sato, T. 1995. Interactions between two different visual stimuli in the receptive fields of inferior temporal neurons in macaques during matching behaviors. Experimental Brain Research 105, 2: 209–219.
- Smith, T. J., and P. K. Mital. 2013. Attentional synchrony and the influence of viewing task on gaze behavior in static and dynamic scenes. Journal of Vision 13, 8: 1–24.
- Song, G., D. Pellerin, and L. Granjon. 2013. Different types of sounds influence gaze differently in videos. Journal of Eye Movement Research 6, 4: 1–13.
- Still, D. L., L. N. Thibos, and A. Bradley. 1989. Peripheral image quality is almost as good as central image quality. Investigative Ophthalmology and Visual Science 30: 52.
- Tatler, B. W., R. J. Baddeley, and I. D. Gilchrist. 2005. Visual correlates of fixation selection: effects of scale and time. Vision Research 45: 643–659.
- Thorpe, S. J., K. R. Gegenfurtner, M. Fabre-Thorpe, and H. H. Bülthoff. 2001. Detection of animals in natural images using far peripheral vision. European Journal of Neuroscience 14, 5: 869–876.
- Tomalski, P., M. H. Johnson, and G. Csibra. 2009. Temporal-nasal asymmetry of rapid orienting to face-like stimuli. NeuroReport 20, 15: 1309–1312.
- Torralba, A., A. Oliva, M. S. Castelhano, and J. M. Henderson. 2006. Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychological Review 113, 4: 766–786.
- Virsu, V., and J. Rovamo. 1979. Visual resolution, contrast sensitivity, and the cortical magnification factor. Experimental Brain Research 37, 3: 475–494.
- Võ, M. L.-H., T. J. Smith, P. K. Mital, and J. M. Henderson. 2012. Do the eyes really have it? Dynamic allocation of attention when viewing moving faces. Journal of Vision 12, 13: 1–14.
- Vuilleumier, P. 2000. Faces call for attention: evidence from patients with visual extinction. Neuropsychologia 38, 5: 693–700.
- Wang, H.-C., and M. Pomplun. 2012. The attraction of visual attention to texts in real-world scenes. Journal of Vision 12, 6: 1–17.
Copyright © 2014. This article is licensed under a Creative Commons Attribution 4.0 International License.