Apparatus
A calibration procedure links pupil position to the precise point seen by the test person at a given distance. Since a machine can detect the optical axis running between the pupil and the center of the eyeball, and since the fovea centralis is located approximately at the back of this axis, it is theoretically possible to build an eye tracker that does not require any calibration. Such a calibration-free “gaze tracker” was designed and mainly used in vehicles, where the head is more or less stable; two cameras are used to locate and track the eyes (Klefenz et al., 2010). Our prototype was created by the Institute for Bio-Inspired Computing of the Fraunhofer Institute for Digital Media Technology in Ilmenau (Germany) and adapted for the museum setting to allow for more distance between the equipment and the participant and for a larger headbox. In our setup, the head was able to move more freely, and additional cameras were used to detect first the face and then the eyes. The system consisted of a structure made of aluminum rails, a PC, two WiFi antennas, two five-megapixel IDS gaze tracker cameras, two infrared VGA Point Grey face tracker cameras, a power distributor, a board with infrared LEDs, and cabling. The structure’s dimensions were 70 cm × 50 cm × 20 cm (x, y, z axes). The cameras and infrared lighting were mounted on two cross rails (Figure 1 and Figure 2). We tested this prototype for the first time in this altered setup.
This construction was contained in a simple box made of veneered plywood with two shelves. It did not stand directly on a shelf but was fixed with screws to the back wall of the box and could therefore be easily adjusted. The front and top of the box were covered with black, infrared-permeable Plexiglas sheets. The system could be accessed directly through the on-board PC, or wirelessly through a specifically designed gaze tracker app on a mobile phone running Android 4.0.1 or higher. The app's function is discussed further in the Procedure section below.
The calculation of the gaze vectors is based on the detection and calculation of the pupil ellipse. The pupil ellipse makes it possible to calculate the optical axis with the help of a Hough transform. In addition, models are built into the software to compensate for individual deviations between the optical axis and the visual axis (Nagamatsu et al., 2010; Klefenz et al., 2010).
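The geometric idea behind deriving an axis direction from a pupil ellipse can be illustrated with a minimal sketch. It is not the algorithm of the prototype, which is not specified here in detail. It assumes a weak-perspective camera, under which a circular pupil tilted by an angle θ away from the camera axis projects to an ellipse whose minor-to-major axis ratio equals cos θ. The function name and parameters are hypothetical, the ellipse is taken as already fitted (the detection step, e.g. via a Hough transform, is omitted), and the result is ambiguous in sign along the minor axis.

```python
import math


def pupil_tilt_from_ellipse(major_axis, minor_axis, major_angle_deg):
    """Estimate the pupil normal from a fitted pupil ellipse (illustrative only).

    Assumption: weak perspective, so minor/major = cos(tilt).
    major_angle_deg is the orientation of the major axis in the image plane.
    Returns (tilt in degrees, unit normal (nx, ny, nz)) with z toward the camera.
    """
    if minor_axis <= 0 or minor_axis > major_axis:
        raise ValueError("expected 0 < minor_axis <= major_axis")

    # Foreshortening of a tilted circle gives the tilt angle.
    tilt = math.acos(minor_axis / major_axis)

    # The circle is compressed along the image projection of its normal,
    # so the normal projects onto the minor-axis direction (up to sign).
    minor_dir = math.radians(major_angle_deg + 90.0)

    nx = math.sin(tilt) * math.cos(minor_dir)
    ny = math.sin(tilt) * math.sin(minor_dir)
    nz = math.cos(tilt)
    return math.degrees(tilt), (nx, ny, nz)
```

In this sketch, an ellipse whose minor axis is half its major axis corresponds to a pupil turned 60° away from the camera. A real system would still have to resolve the sign ambiguity and correct for the offset between optical and visual axis, as noted above.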
The device recorded the position of the pupil at a sampling rate of 100 Hz inside a predefined headbox. This means that viewers had to be at the right distance from the eye tracker and in the right position, centered between the cameras. To ensure this, walls were built in front of the paintings to be recorded. To keep their appearance inconspicuous, the walls were covered with a fabric that matched the surrounding exhibition rooms of the Kunsthistorisches Museum. A 70 cm high "window" was cut into each wall to lead visitors to position their heads in the right place for recording (Figure 3). While this did alter the existing museum space, most participants reported assuming that the walls had been set up for the conservation of a particularly fragile work. The headbox of the gaze tracker was 25 cm in height and 35 cm in width at the set distance. Depending on the size of the paintings, one device was set at a distance of 130 cm (for the smaller paintings), the other at a distance of 150 cm (for the larger paintings).
Results
As mentioned above, only viewing duration results can be reported. A more detailed analysis was planned (see Introduction), but since the expected accuracy of the calibration-free gaze tracker was not achieved, this further analysis was not possible. While data quality was not good enough to determine which elements of a painting participants looked at, it was possible to distinguish glances at the painting as a whole from glances away from it.
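The on/off distinction translates into durations in a simple way, sketched below under stated assumptions. The sketch assumes that each recording is available as one boolean flag per sample (True when the gaze fell on the painting), taken at the 100 Hz rate given in the Apparatus section. The function name and the data layout are hypothetical and do not describe the actual export format of the device.

```python
SAMPLE_RATE_HZ = 100  # sampling rate stated in the Apparatus section


def viewing_times(on_painting_flags, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (time in headbox, time on painting) in seconds.

    on_painting_flags: one boolean per recorded sample while the head was
    inside the headbox; True means the gaze was on the painting.
    (Hypothetical data layout, for illustration only.)
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")

    n_total = len(on_painting_flags)
    n_on = sum(1 for flag in on_painting_flags if flag)

    # Each sample stands for 1 / sample_rate_hz seconds.
    return n_total / sample_rate_hz, n_on / sample_rate_hz


# Hypothetical recording: 2 s on the painting, then 1 s looking elsewhere.
flags = [True] * 200 + [False] * 100
total_s, on_painting_s = viewing_times(flags)
```

For this made-up recording the time in the headbox is 3.0 seconds and the time on the painting is 2.0 seconds.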
The results of a one-way ANOVA and follow-up t-test comparisons revealed that the viewing times for the individual paintings differed significantly: Paradise by Lucas Cranach received the longest views (mean 10.5 seconds, median 6.4 seconds), followed by Vanitas Still Life by Pieter Aertsen (mean 6.1 seconds, median 3.9 seconds). St. Sebastian by Andrea Mantegna received the least interest (mean 0.8 seconds, median 0.5 seconds; Figure 6). The difference between the two longest-viewed works and the other paintings is highly significant (p < 0.001, d = 0.9). The average time spent in front of the headbox for all the paintings was 11.5 seconds, with the longest time registering at 138 seconds. However, when only time spent looking directly at the painting is considered, the average viewing time dropped to 4.3 seconds, with the longest time being 126 seconds. The remaining time was spent looking at the walls and other architectural features of the space behind the window and is not counted in the following analysis.
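For readers unfamiliar with the reported statistics, the following minimal sketch shows how a one-way ANOVA F statistic and an effect size d are computed. It uses made-up durations, not the study data, and it assumes that d denotes Cohen's d with a pooled standard deviation, which is not stated explicitly here. The p-value requires the F distribution and is therefore left to a statistics package.

```python
import math
from statistics import mean, variance


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def cohens_d(a, b):
    """Cohen's d with pooled standard deviation (assumed definition of d)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Hypothetical viewing durations in seconds for three paintings.
painting_a = [1.0, 2.0, 3.0]
painting_b = [2.0, 3.0, 4.0]
painting_c = [5.0, 6.0, 7.0]

f_stat, df_b, df_w = one_way_anova_f([painting_a, painting_b, painting_c])
d = cohens_d(painting_c, painting_a)
```

With these toy values the sketch yields F(2, 6) = 13 and d = 4. Viewing durations are typically right-skewed, so a real analysis would also check the assumptions of the test.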
Out of 924 instances of viewing (some visitors viewed the paintings in both gaze tracker systems during their visit), only four lasted over a minute. These few instances, however, influence the average, and it must be noted that the overall median viewing time was only 1.74 seconds.
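The effect of a few very long viewings on the average can be seen in a small example with made-up durations (not the study data); the helper name is hypothetical.

```python
from statistics import mean, median


def summarize(durations):
    """Return (mean, median) of a list of viewing durations in seconds."""
    return mean(durations), median(durations)


# Hypothetical short viewings, then the same list plus one two-minute viewing.
short_views = [0.5, 1.0, 1.5, 2.0, 2.5]
with_long_view = short_views + [120.0]

mean_short, median_short = summarize(short_views)
mean_long, median_long = summarize(with_long_view)
```

A single long viewing raises the mean from 1.5 to 21.25 seconds, while the median only moves from 1.5 to 1.75 seconds, which is why the median is the more informative summary for data of this kind.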
One of the goals of the study was to include visitors’ different backgrounds, such as culture and gender, in the analysis and to look for differences in viewing behavior between groups. The study included participants from 60 different countries. The top seven (with 30 participants or more) were, in descending order: Germany (DE), Japan (JP), USA (US), Austria (AT), Great Britain (GB), Russia (RU) and France (FR). These seven countries represent 65% of all participants. An ANOVA and follow-up t-tests revealed that only one group differed significantly from the others: the French participants viewed the paintings significantly longer than the others (p < 0.0008, d = 0.6; Figure 7).
While men viewed the paintings on average for a slightly longer time, the longest recorded time was that of a female participant. Neither the difference between genders nor the differences according to sexual orientation (divided into five groups: heterosexual, homosexual, bisexual, asexual, and other) were statistically significant.
Discussion
The experiment’s design was successful in eliminating many of the usual interruptions that occur in a viewer’s museum experience. There was no calibration, no visible equipment, and no interaction with researchers until after the recording took place. This meant that participants did not know they were being tested, providing us with the opportunity to record natural viewing experiences. The changes we had to make in the museum display did not interrupt the visitors’ experience of the works and did allow data recording, but the system did not deliver accurate results. However, once improved, the device could be used to analyze not only viewing duration but also within-painting gaze paths and events within specific areas of interest. These can, in turn, shed light on elements of the paintings that might capture viewers’ attention and can be compared to data taken in a lab setting in order to better understand whether there is a difference between perceiving works in a museum and in a laboratory. Such data will open new horizons both for the study of the reception of single paintings by different audiences and for our understanding of museum visitors’ viewing experience more generally.
This approach allowed us to test over 800 participants in less than a month, which is vastly more than any previous study has been able to test in a museum context with original artworks. For comparison: in 2018 and 2019, our lab conducted another large-scale museum study, this time with mobile eye tracking headsets (Reitstätter et al., 2020). Using a slightly higher number of team members and four eye tracking devices at a time, we were only able to test up to 150 participants in seven days.
A remarkable result of this study is that paintings that were exhibited at the same spot in the museum and viewed by a similar number of visitors received significantly different viewing times, with medians ranging from 0.5 to 6.4 seconds. We later obtained similar results in the already mentioned study conducted with a mobile eye tracker at the Belvedere Museum in Vienna: certain artworks received significantly different viewing times, with medians ranging from 0.52 to 47 seconds, and it is noteworthy that the same artworks attracted similar viewing times in different display situations (Reitstätter et al., 2020). Smith and Smith (2001, 232) also found differences in the time spent in front of the paintings they tested. They attributed the variations in viewing time to size of canvas, fame, and available seating in front of the work. These factors do not apply to the current study, since none of our paintings are popular highlights of the museum, and they had similar sizes and were displayed in the same place.
The difference in viewing time suggests that there is something about certain artworks themselves that consistently draws more attention from a wide variety of viewers. Systematic studies with a larger number of paintings, as well as within-painting viewing analysis, may shed light on which elements hold visitors’ attention longer.
The viewing durations obtained in our study are lower than previously reported viewing averages. For instance, Smith and Smith (2001) observed an average of 27.2 seconds at the Metropolitan Museum of Art in New York, with a median of 17 seconds. However, their results are not directly comparable, since the artworks used for their study were highlights of the museum, which would likely have been known to at least some visitors in advance of their visit. This was not the case for the paintings used here. Mantegna’s St. Sebastian, arguably the best known among our paintings, was the one with the lowest average viewing time. It must also be noted that we were able to separate the time spent viewing a painting from the time spent just standing in front of it. Earlier visitor observation studies that did not use an eye tracker could not know exactly where the participant was looking and would therefore record the total time spent in front of a painting, whether the participant was looking directly at it or not.
The comparison of viewing times revealed no significant difference between sexes or between persons with different sexual orientations. Notably, this applies regardless of the content of the paintings. One of them showed a female nude (Titian, Mars and Venus), another a nearly naked man (Mantegna, St. Sebastian), but neither of those caused an increase in viewing times for any group. With regard to cultural differences, the only relevant result that we can report (and cannot explain at the moment) is that participants born in France had a significantly longer viewing time.
Beyond the technical problems, our innovative method is, of course, not without limitations. The museum display still had to be altered to record data and did not allow visitors to approach the paintings as closely as they could other works in the museum: the wall with the window altered the normal viewing situation within the museum, though this also occurs in museums where similar measures are used to create distance between visitors and artworks. Another limitation of the window setup was that it did not work if visitors looked through the window from afar (in which case they would not have been able to see the whole painting) or took a photograph (which would block their faces and arguably should not count as beholding time). While the setup did allow for the testing of a large number of participants, it also required one device for every painting, which makes it costly to test a large number of paintings. As eye tracking hardware and algorithms develop, we assume that it will be possible to use similar devices for a more in-depth analysis of museum viewing than has yet been possible.