1. Introduction
Landmarks are indispensable when addressing natural human navigation behaviour as they are central to all forms of spatial reasoning (e.g. orientation, wayfinding) and spatial communication (
Richter & Winter, 2014). For example, landmarks are considered to be essential elements in good route instructions. Moreover, they form the basis of our mental representation of space, which is central in our ability to navigate (Lovelace, Hegarty, & Montello, 1999;
Siegel & White, 1975). Consequently, humans use and describe landmarks on a day-to-day basis when navigating in a city or building and when formulating route instructions. However, landmark identification remains difficult from a research or commercial point of view, for example, to incorporate these navigational aids in path algorithms (
Richter, 2013). This is important as even landmark knowledge acquired through external representations has an impact on human spatial activities (Kettunen, Irvankoski, Krause, & Sarjakoski, 2013).
Having the development of more performant mobile eye trackers in mind, this study focusses on the use of eye tracking measures (i.e. total fixation time) to identify landmarks. On the one hand, there are several reasons to opt for eye tracking. First, perceiving a landmark is often done through vision. The user-centred experience of seeing a landmark is an essential part of navigation as that landmark specifies the location where a navigational action – linked to that object – should take place (
Lovelace et al., 1999;
Spiers & Maguire, 2008). Second, visual saliency, which is often characteristic of a landmark, is closely related to the fixation loci of observers (
Foulsham & Underwood, 2008). Third, the eye-mind hypothesis states that certain aspects of the gaze during a task may be analysed to examine cognitive processes as eye fixations are closely related to the human ability to encode spatially distributed visual stimuli. These aspects include the locus of the eye fixation and its duration. The locus indicates the element that is being processed internally even if subjects are not consciously aware of this and the duration is related, but not necessarily identical, to the time needed to encode and to operate on that element (
Just & Carpenter, 1976). As an important aspect of learning an environment is the processing and encoding of landmarks (
Stankiewicz & Kalia, 2007), it is probable that fixation loci will reveal more about which objects are considered to be landmarks.
On the other hand, arguments can be put forward against the use of eye tracking measures as a landmark identification tool. First, a landmark is not solely defined based on its visual saliency. Semantic and structural saliency are important features as well (
Sorrows & Hirtle, 1999). Second, formulating conclusions about cognitive processes based on eye movements is not without danger. The relation between the locus of the eye fixation and selective attention is not straightforward. For example, a fixation may point to recognition or use of a landmark, but could also indicate puzzlement because an object is experienced as being complex and unsuited to be used as a landmark. Additionally, people can extract information through peripheral vision or may focus on a point without picking up information (
van Gog et al., 2009;
Williams & Davids, 1997). Third, people also look around to detect possible obstacles, for example to avoid falling or walking into a wall. As such, these locomotion-based eye fixations will mix with landmark-related fixations and hinder landmark identification.
In this paper, the use of eye tracking as a way to identify landmarks is explored. A method is presented to define which real-world objects are useful landmarks based on data on what people look at while walking through a building. From a practical point of view, we specifically focus on indoor landmarks as lighting conditions are more constant within a building, while changing lighting conditions may interfere with the proper working of the eye tracking device. This paper is organised as follows. In the next section, background information on landmark identification methods and related use of eye tracking is described. Based on this information a landmark identification criterion is proposed in
section 3. Section 4 presents the study design. The results and discussion are then presented in section 5 and section 6, respectively. Finally, section 7 presents the main conclusions and future work on this topic.
2. Previous Work
Automatic landmark identification methods rely on the availability of datasets. For example,
Raubal and Winter (
2002) determined the overall saliency of an object (i.e. a building along a street network) based on the weighted sum of the visual, semantic and structural attraction of that object. In turn, these measures of attraction were calculated based on a variety of attributes. For example, the visual attraction of an object was based on its area, shape, colour and visibility. A study by
Nothegger et al. (
2004) stated that the landmarks identified based on this model highly correlate with the objects that were selected by humans when asked to select the most prominent façade. However, datasets containing these object characteristics are often not available or not exhaustive. Another dataset was used by
Elias (
2003). Starting from a comprehensive topographic dataset,
Elias (
2003) proposed a method to determine whether or not a building may function as a landmark from its relative uniqueness in the environment, using the geometric and thematic information that could be extracted from that dataset. In an indoor environment, although floor plans are generally available, a comparable dataset of potential landmarks within a building is most often non-existent and would be extremely labour-intensive to maintain, as the interior of a building can easily be subject to change.
Other methods require the participation of test persons. For example, participants can be asked to point out salient objects in pictures (e.g.
Nothegger et al. (
2004);
Sefelin et al. (
2005)). Participants can also be asked to voice their thoughts during or after a wayfinding task. These concurrent or retrospective think aloud protocols can then be analysed to identify potential landmarks (e.g.
Hölscher et al. (
2004);
Kettunen et al. (
2013); Viaene, Vanclooster, et al. (2014)). As landmarks are considered to be part of a person’s cognitive model of the environment, these methods offer the advantage that they start from the navigator’s point of view and, in this way, allow the study of the cognitive processes linked to that model (
Richter & Winter, 2014;
van Elzakker, 2004). However, participants might have difficulty expressing their thoughts (
van Elzakker, 2004) and subjects can only provide data on processes that they are aware of (
Spiers & Maguire, 2008). In this respect, eye tracking might offer a solution as eye tracking measures can be used to learn more about cognitive processes without participants having to consciously express these processes. Accordingly, eye tracking has been used to investigate which elements influence spatial decision making in a building (e.g. Wiener, Hölscher, Büchner and Konieczny (2012)). However, the number of eye tracking studies that specifically address the identification of landmarks is limited.
In 2012, Andersen et al. explored gender differences in navigational behaviour via eye tracking. An important aspect of this study was to determine whether a landmark was used by an observer. To do this, they predefined a limited number of landmarks in a 4-on-8 virtual maze and divided the time spent looking at each landmark by the time spent looking at all landmarks to determine to what extent a specific landmark was used as a reference point during navigation. Furthermore, they assumed that the probability of being used is equal for all landmarks. Consequently, the probable landmark use of a specific landmark is 100 % divided by the number of landmarks present. A landmark was considered to be actually used if the calculated landmark use was equal to or higher than the probable landmark use. In addition, verbal reports were taken at the end of the task to clarify the strategy used and the landmarks selected. The study focussed, however, on the general use of landmarks and did not investigate whether specific landmarks or types of landmarks were used during the navigational task. Moreover, only a limited number of objects, eight at most, were visualised at a decision point, while this number can be much higher in reality. Additionally, the study gives no definition of a landmark and does not explain why the predefined objects were considered to be landmarks. Finally, no research has been conducted on the validity of the method used to determine whether or not a landmark was used.
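The threshold rule described above can be illustrated with a short sketch. The function name, the dictionary layout and the fixation times are hypothetical and serve only to show the arithmetic: the calculated use is a landmark's share of the total looking time, and the probable use is 100 % divided by the number of landmarks.

```python
def landmark_use(fixation_time):
    """Return (calculated use %, probable use %, used?) for each landmark.

    fixation_time maps a landmark name to the time spent looking at it.
    The names and values are illustrative assumptions, not study data.
    """
    total = sum(fixation_time.values())
    if total <= 0:
        raise ValueError("total looking time must be positive")
    # Every landmark is assumed to be equally likely to be used.
    probable = 100.0 / len(fixation_time)
    result = {}
    for name, time_on_landmark in fixation_time.items():
        calculated = 100.0 * time_on_landmark / total
        # A landmark counts as used if its share reaches the probable share.
        result[name] = (calculated, probable, calculated >= probable)
    return result


# Hypothetical looking times (seconds) for four predefined landmarks.
example = {"A": 6.0, "B": 3.0, "C": 2.0, "D": 1.0}
for name, (calc, prob, used) in landmark_use(example).items():
    print(f"{name}: calculated {calc:.1f} %, probable {prob:.1f} %, used={used}")
```

With four landmarks the probable use is 25 %, so in this invented example only the landmarks that attract at least a quarter of the looking time are counted as used.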
A different approach was used by Viaene, Ooms, Vansteenkiste, Lenoir, and De Maeyer (2014). They compared concurrent think aloud protocols with the eye fixations of participants who completed a route in a building twice. Based on the average fixation count, average fixation time and maximum fixation time, the authors checked which mentioned objects were clearly fixated. However, no specific criterion or threshold was presented to determine if an object was considered to be a landmark. Furthermore, participants were asked to verbalise everything related to the navigational task and the building. Consequently, it was not clear which objects would actually be used as landmarks in, for example, route instructions or a mental map of the building.
Another study was conducted by Ohm, Manuel, Ludwig, and Bienk (2014). Again, participants completed a route twice. In contrast to the previous study, the methods of data acquisition were split. During the first completion of the route only eye tracking was applied and participants were asked to remember objects that could be used to explain the route to a stranger. The second time, only verbal protocols were collected, whereby participants indicated landmarks as they would while explaining the route to a stranger. In a next step, the authors examined to what extent the mentioned objects, which were grouped into four categories, were fixated in the first run. Contrary to their initial assumption, namely that the verbalised landmarks would have been fixated during the first run, half of the objects mentioned during the second run were not fixated during the first completion of the route. One possible explanation is that the participants selected new objects that were easier to verbalise during the second run, as the study did not investigate which objects were actually remembered after the first run. Additionally,
Ohm et al. (
2014) employed a qualitative measure of being fixated or not. It is possible, however, that potential landmarks can be differentiated via quantitative measures, such as the number of fixations and fixation duration. This paper builds on these studies and aims to present a clear landmark identification criterion based on a quantitative eye tracking measure.
3. A Landmark Identification Criterion
In this study, the landmark identification measure of
Andersen et al. (
2012) will be adapted. The authors chose to build further on this measure because it has proven its usability to examine differences in landmark use and it is based on a clear threshold to determine whether or not an object is used as a landmark. As mentioned earlier, this measure is based on the duration of all fixations on a specific object. This study hopes to underpin the assumptions made by
Andersen et al. (
2012) and its validity as an (indoor) landmark identification tool. In contrast to
Andersen et al. (
2012), this measure will not be used to examine the different use of predefined landmarks in a virtual environment, but to differentiate wayfinding landmarks from other objects along a route in a real indoor environment. As a result, the number of objects (or potential landmarks) that can be fixated during the experiment is much higher than the number of landmarks visualised by
Andersen et al. (
2012). For practical reasons, all potential landmarks along the route will be grouped into categories.
Examining landmark categories instead of specific landmarks is analogous to
Duckham et al. (
2010) and
Ohm et al. (
2014) and is recommended by
Richter and Winter (
2014) to reduce data requirements. The categories used by
Duckham et al. (
2010) were similar to those that can be found in a directory service like Yellow Pages (e.g. “Hotels”, “Restaurants”). They then selected landmarks based on class-level information, whereby characteristics of individual instances were assumed based on knowledge about the landmark categories to which those instances belong.
Duckham et al. (
2010) selected the most adequate landmark at a certain location based on the suitability of a specific landmark category and the likelihood that a particular instance of that category is typical of that category. In this study, the most adequate landmark category will be defined with the help of eye tracking data.
Ohm et al. (
2014) assigned all potential landmark candidates to landmark categories that were much more abstract compared to the categories used by
Duckham et al. (
2010). As such, only four categories were formulated (i.e. “Architecture”, “Function”, “Information” and “Furniture”). In this study, we chose to follow the approach of
Duckham et al. (
2010) and formulate more concrete categories (e.g. “Poster”, “Radiator”). The categories were chosen so that all potential landmarks along the route could be assigned to a single category.
As landmark categories will be used instead of individual objects selected from a predefined collection of landmarks, the number of objects belonging to a category must be taken into account to compensate for the uneven distribution of objects over all categories. For example, there were many more posters along the route than fire extinguishers or computers. Consequently, it is more likely that a poster was fixated during the completion of the route than a computer. Therefore, the calculated landmark use and probable landmark use, proposed by
Andersen et al. (
2012), are adapted to the calculated landmark category use (CLCU) and probable landmark category use (PLCU) (equations 1 and 2). Going beyond
Andersen et al. (
2012), this paper considers the extent to which the CLCU is higher than the PLCU as a continuous indication of the suitability of that category as landmark type. To this end, the ratio of the two measures is used.
Table 1 provides an example. Instances of the categories “Fire” and “Ornament” are considered to be used as landmarks, although the category “Poster” was fixated much more. Additionally, the category “Ornament” is considered to be most suitable to refer to in route instructions as the ratio between CLCU and PLCU is the highest. Note that this method was not applied to structural landmark categories (e.g. floor, ceiling, walls) as these are difficult to express in quantitative measures.
where CLCU_i = calculated landmark category use for category i; TC_i = total fixation time attributed to landmark category i; PLCU_i = probable landmark category use for category i; n_i = number of objects in landmark category i; m = number of landmark categories; N = total number of objects.
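The symbols listed above fix the inputs of the criterion. As a rough sketch only, and assuming one plausible reading of the two measures (an assumption made for illustration, not a quotation of equations 1 and 2), CLCU_i is taken here as the share of TC_i in the fixation time summed over the m categories, and PLCU_i as the share n_i / N of objects belonging to category i. The function name, the fixation times and the object counts are invented and are not the values of Table 1.

```python
def category_use(total_fix_time, n_objects):
    """Compute CLCU, PLCU and their ratio per landmark category.

    ASSUMED forms (illustrative, not taken from the paper's equations):
        CLCU_i = TC_i / sum of TC over all m categories
        PLCU_i = n_i / N, with N the total number of objects
    """
    if set(total_fix_time) != set(n_objects):
        raise ValueError("both inputs must cover the same categories")
    sum_tc = sum(total_fix_time.values())
    total_objects = sum(n_objects.values())
    result = {}
    for category, tc in total_fix_time.items():
        clcu = tc / sum_tc
        plcu = n_objects[category] / total_objects
        result[category] = {"CLCU": clcu, "PLCU": plcu, "ratio": clcu / plcu}
    return result


# Hypothetical area: many posters, few fire-related objects and ornaments.
tc = {"Poster": 10.0, "Fire": 4.0, "Ornament": 6.0}   # seconds, invented
n = {"Poster": 10, "Fire": 2, "Ornament": 2}          # object counts, invented
for category, values in category_use(tc, n).items():
    flag = "landmark" if values["ratio"] >= 1 else "not a landmark"
    print(f"{category}: ratio {values['ratio']:.2f} ({flag})")
```

In this invented example the posters attract the most fixation time in total, yet their ratio stays below 1 because they are numerous, which mirrors the qualitative pattern described for Table 1.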
4. Methods
The proposed landmark identification criterion indicates if the total fixation time on a landmark category in a certain area is higher than is to be expected. According to
Andersen et al. (
2012) this category is considered to be used as a landmark. In order to provide evidence for this assumption, the results of this criterion, namely the identified landmark categories, were compared with the objects used as reference points in the corresponding areas in written route instructions. These instructions normally include important elements that specify a location where a wayfinding action should take place and are often analysed to identify landmarks that are used along a route (
Denis, 1997;
Lovelace et al., 1999). In addition, concurrent verbal protocols were used to support and clarify the eye tracking fixations. This is similar to the three studies singled out in
section 2. Verbal protocols are more commonly used to study cognitive processes related to (indoor) wayfinding (e.g.
Hölscher et al. (
2006)). As such, many authors encourage or see benefits in the combination and interaction of verbal protocols and eye tracking data (
Elling et al., 2012,
2011;
Gerjets et al., 2011;
van Gog et al., 2009;
Williams & Davids, 1997).
4.1. Participants
In total 28 subjects participated in the experiment. All but one of the participants were in their twenties or early thirties; the remaining person was between fifty and sixty years old. All participants were highly familiar with the test environment. Therefore, it is more likely that fixations would point to recognition and not confusion, as mentioned in the introduction. Additionally, unfamiliar participants might have more difficulty formulating route instructions after completion of the route, as their cognitive model of the route is in the first stages of its development and, therefore, incomplete. Furthermore, all participants worked at the Geography Department of Ghent University. However, none of them were familiar with the research context of indoor wayfinding. Five participants were excluded from the results because their tracking ratio was below the required 80 %. This threshold is quite low, but takes into account the difficulty of tracking the eye while participants go up or down the stairs. The fixations during these actions are not part of further analysis. This resulted in a test population of twelve male and eleven female participants.
4.2. Materials
The verbalised route instructions were recorded with a headset that was mounted on top of a head-mounted eye tracker (SMI iViewX HED). The fixations were calculated with the help of SMI Event Detection and were transferred to a reference image displaying 25 landmark categories, which were attributed with areas of interest, by using the semantic gaze mapping tool of BeGaze 3.4. For each fixation, its corresponding location on the reference image was indicated by a click of the mouse. Fixations on doors and staircases while passing through or going up and down them were not transferred to a reference image, because these are related to locomotion (
Ohm et al., 2014).
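Once fixations have been mapped to areas of interest, the total fixation time per landmark category in each area follows from a simple aggregation. The sketch below assumes a hypothetical record layout (area, category, duration in milliseconds, locomotion flag); it is not the export format of BeGaze, and the values are invented.

```python
from collections import defaultdict


def total_fixation_time(fixations):
    """Sum fixation durations (ms) per (area, category).

    Each record is (area, category, duration_ms, is_locomotion); this layout
    is an assumption for illustration. Locomotion-related fixations, such as
    those on a door while passing through it, are skipped.
    """
    totals = defaultdict(float)
    for area, category, duration_ms, is_locomotion in fixations:
        if is_locomotion:
            continue
        totals[(area, category)] += duration_ms
    return dict(totals)


# Invented fixation records for two areas.
fixations = [
    (2, "Poster", 180.0, False),
    (2, "Poster", 220.0, False),
    (2, "Door route", 400.0, False),
    (2, "Door route", 300.0, True),   # passing through the door: excluded
    (3, "Ornament", 250.0, False),
]
totals = total_fixation_time(fixations)
for (area, category), duration in sorted(totals.items()):
    print(f"area {area}, {category}: {duration:.0f} ms")
```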
The building, which is considered to be complex by most visitors, dates from 1976 and has a traditional design. Within this building a route (see
Figure 1) was selected that had a total length of 440 meters, covered four floor levels and took about eight minutes to complete. All participants completed the same route. The route had the same start and end point and no additional objects were placed along the route for the experiment.
4.3. Procedure
At the beginning of the experiment, the participants, who were told that the study dealt with indoor navigation, were instructed as follows. “After the calibration of the eye tracking device, we will complete a route whereby you will follow me. During this completion, the device will register your eye movements. In addition, you are asked to verbalise route instructions, which can be used to guide someone who is unfamiliar with the building along the same route, out loud. These instructions may be expressed in your own words and you are allowed to correct yourself. After the completion of the route, you will be asked to answer some questions about yourself and your spatial knowledge of the completed route. These questions will address general aspects of the route. No details will be asked.”
The experiment proceeded as described to the participant. During the experiment, the guide walked next to the participants as much as possible so that he would not block their view of potential landmarks along the route. The calibration consisted of a five-point calibration and was validated immediately after the calibration and once more at the end of the experiment to assess the reliability of the fixation loci. The calibration targets were placed at a distance similar to the distance at which most objects along the route can be seen. The guidelines as expressed by
Holmqvist et al. (
2011) were taken into account during the calibration, instruction giving and route completion. Finally, the concluding questions investigated how complex the route was perceived to be, how the experiment was experienced and which objects were seen along the route. At the end, the participants were asked to write down route instructions for a person who is not familiar with the test environment, so that he or she could complete the same route based on these instructions. These written route instructions provided a retrospective selection of the most salient or most suited objects.
5. Results
The eye tracking data were transferred to a reference image. This was done for each area (i.e. room, corridor) separately. The reason behind this is twofold. First, nearly all objects in an area were visible along the entire path within that area. In this way, it is logical to compare the objects within one area with respect to their potential use as landmarks. Second, the verbal and written route instructions revealed that most participants experienced the route as a sequence of areas connected through doors and staircases. The ratios between CLCU and PLCU for each landmark category in each area can be found in
Table 2.
These observations will be compared with the number of times that an instance of a landmark category was mentioned in the written route instructions describing the corresponding area. Out of 23 participants, six were not able to write a complete and correct route description that would allow a person to complete the same route. The correct and complete descriptions consisted of 33 instructions on average and all showed the same design. First, all participants always mentioned the doors and staircases used to go to the next area. Second, instructions were formed each time there was a change of direction, even if there was no option other than to follow the corridor. Third, participants always specified if a corridor was to be fully completed. Fourth, about half of the time (46 %), these corridors were described in more detail. This allowed a differentiation between them based on colours, ornaments, auditorium names and/or closets. Moreover, hallways that were not entirely passed through before participants turned off into another hallway were often not mentioned. Fifth, all but one of the actions were combined with an object to specify the location where the action should take place or the direction in which a person should continue the route.
When combining the data of all participants for all areas along the route, at least one instance of a landmark category was visible 243 times. Based on these 243 observations the correlation was calculated between the results of the proposed identification criterion and the number of times an instance of that landmark category in the corresponding area was mentioned. When considering the ratio between CLCU and PLCU as a continuous measure a one-tailed Pearson correlation of 0.727 was found. Analogously, a Spearman’s rho of 0.483 was found. Both correlations are significant at the 0.01 level. In contrast,
Andersen et al. (
2012) compared the CLCU and PLCU to determine whether an object was used as a landmark or not. When determining for each observation whether a landmark category is considered to be used as a landmark (CLCU/PLCU >= 1) or not (CLCU/PLCU < 1) and comparing this binary measure with the number of times objects were mentioned in the written route instructions, a one-tailed Pearson correlation of 0.560 and a Spearman’s rho of 0.506 were found. Both correlations are significant at the 0.01 level. Finally, when considering both measures as binary variables (i.e. considered to be a landmark or not and mentioned in the written route instructions or not), a Phi coefficient of 0.468 was found.
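To make the binary comparison concrete, the following sketch binarises both variables and computes the Phi coefficient from the resulting 2 x 2 table. It is a minimal illustration with invented observations; the function names are hypothetical and the numbers are unrelated to the 243 observations of the study.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient of two equally long sequences of booleans."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    pairs = list(zip(x, y))
    n11 = sum(1 for a, b in pairs if a and b)
    n10 = sum(1 for a, b in pairs if a and not b)
    n01 = sum(1 for a, b in pairs if not a and b)
    n00 = sum(1 for a, b in pairs if not a and not b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("phi is undefined when a variable is constant")
    return (n11 * n00 - n10 * n01) / denom


# Invented observations: CLCU/PLCU ratio and number of mentions per category.
ratios = [2.1, 1.4, 0.7, 0.3, 1.0, 0.2]
mentions = [3, 0, 0, 0, 1, 2]

identified = [r >= 1 for r in ratios]    # criterion: CLCU/PLCU >= 1
mentioned = [m > 0 for m in mentions]    # mentioned at least once
print(f"phi = {phi_coefficient(identified, mentioned):.3f}")
```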
6. Discussion
A study was conducted to examine to what extent a landmark identification criterion, which is based on eye tracking measures, can be used to determine which instances of a selection of categories can be considered to be landmarks. To do this, eye tracking data and written route descriptions were collected from participants who were highly familiar with the environment and were focussed on collecting spatial information of a route in an actual indoor environment.
When comparing the results of the proposed identification criterion with the data collected via written route instructions, a positive correlation was found.
Andersen et al. (
2012) compared the CLCU with PLCU to determine whether an object was used as a landmark or not. As such, they considered the characteristic of being a landmark as a binary variable. In contrast to this approach, a stronger Pearson correlation was found when considering the relation between CLCU and PLCU as a continuous variable. This could indicate that the relation between CLCU and PLCU could be used not only to identify landmarks, but also to differentiate them based on their quality and/or suitability to be used in route instructions. However, the Pearson correlation measures linear relationships and is calculated from the actual values rather than their ranks. Therefore, it is more sensitive to outliers. As such, the high values linked to the category “Door route” influence this measure, although they are not considered to be erroneous. The Spearman’s rho, which is computed on ranks, is not influenced by these outliers and is considered to be more reliable in this situation. The Spearman’s rho values are similar for both situations (i.e. 0.483 and 0.506). As a result, it is not possible to determine whether the relation between CLCU and PLCU should be considered as a continuous or as a binary variable. Either way, a significant positive correlation is found. This supports the findings of
Andersen et al. (
2012), which were partially based on this identification criterion.
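The different sensitivity of the two coefficients to extreme values can be illustrated with a small sketch. The data are invented: the last observation plays the role of a category with very high values, and the "mild" variant replaces it with moderate values that keep the same rank order. The helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation computed on the actual values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented ratios and mention counts; the last pair is an extreme value.
ratio = [0.5, 0.8, 1.0, 1.2, 1.5, 12.0]
mentions = [2, 0, 1, 0, 3, 20]
# Same rank order, but with a moderate last observation.
ratio_mild = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0]
mentions_mild = [2, 0, 1, 0, 3, 4]

print("Pearson :", pearson(ratio, mentions), pearson(ratio_mild, mentions_mild))
print("Spearman:", spearman(ratio, mentions), spearman(ratio_mild, mentions_mild))
```

Because both data sets have identical ranks, Spearman's rho is the same for both, whereas the Pearson correlation is pulled upwards by the extreme observation.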
This correlation is strongly reflected in the categories “Door route” and “Staircase”. The eye tracking measures are remarkably high for these elements that give access to the following area along the route. This was confirmed by the written route descriptions. In addition, this is in line with the conclusions of
Ohm et al. (
2014) that mainly functional objects (i.e. doors, stairs and elevators) were fixated. It is difficult, however, to explain these high values. Given their inconspicuous design, the importance of these ‘connectors’ cannot be attributed solely to their visual salience. Firstly, the doors that gave access to the next area were often situated at the end of a corridor leading to that element. As such, these doors were visible during an extended period of time, which may have contributed to the high values. This is also in line with the notion of advance visibility as proposed by
Winter (
2003), who argued that from a cognitive point of view advance visibility is complementary to the object’s salience. Secondly, the staircases indicate an increased complexity of the (layout of the) environment (
Hölscher et al., 2006). Similarly, this may have contributed to the higher values of the eye tracking measures. Furthermore, the physical perceptibility, resulting from the vertical relocation or the action of opening them, may be an additional factor in explaining their importance as landmarks.
In addition, the category “Written evacuation sign” may also highlight the importance of “Door route”. Written evacuation signs were often considered to be used as a landmark (i.e. areas 2, 6, 9, 11 and 20) based on the proposed identification criterion, while these signs were rarely referred to in the route descriptions. Knowing that these signs were always placed above the doors leading to the next area, this might indicate that they added relevant information to the door (and the route). In contrast, doors other than those leading to the next area were rarely mentioned in the route instructions. Moreover, although they were fixated a lot, they were never considered to be used as a landmark based on the CLCU as the CLCU takes the number of doors in an area into account. Additionally, this strong correlation can also be observed with respect to the categories “Server” and “Area sign”.
For several landmark categories, however, it is more difficult to establish that the proposed landmark identification criterion is supported by the written route instructions. For example, the CLCU values indicated that the category “Various” was used as a landmark category in areas 3, 4 and 18. The specific instance of this category was the balustrade in the central hall of the building. However, this balustrade is never mentioned in the route descriptions. It is very likely, however, that the balustrade and the view to other floor levels can be linked to references to the larger spatial entity (e.g. “central hall”, “upstairs”, “floor level”). Similarly, the category “Window” in area 17, which was identified as a landmark and represents a large glass wall that offers a view to and gives access to the street, could be linked to the “main hallway” that is mentioned by most participants.
Furthermore, some landmark categories that were identified by the identification criterion are never mentioned in the written route descriptions. For example, the instances of “Ornament” in areas 3, 12 and 23, respectively an art display, a large plant and a game computer, and instances of the categories “Radiator” and “Trash” were never mentioned in the route instructions. With respect to the category “Closet”, the route instructions only clearly confirm the use as a landmark in area 11. For all other areas, no or very few instances of this category are mentioned. It is possible that these elements were forgotten after route completion, or that participants found them irrelevant once they were aware of the course of the entire route and the elements along this route.
Finally,
Ohm et al. (
2014) found that more than half of the objects mentioned during the second run were not fixated during the first completion of the route. The authors attributed this to the possible familiarity of the participants with the environment and the possible use of peripheral vision. Taking the design of the CLCU into account, only one feature has not been identified as a landmark category by the identification criterion while it clearly came forward in the route descriptions. This feature is a collection of animal skulls (“Ornament”) in area 20 and was mentioned by eight test persons in their route descriptions. Although the collection of skulls is clearly fixated, the individual skulls are not considered to be a landmark because the CLCU takes the number of objects in a specific category into account. It is possible that the collection as a whole can be seen as a single landmark. For example, when mentioned in the route instructions, the skulls are mentioned in plural. The difficulty of quantifying a feature was the reason why structural landmark categories (e.g. floor, ceiling, walls) were excluded from the test results. As such, the identification criterion was not able to identify these structural features as possible landmarks although they were often mentioned in the route instructions. Especially corridors and their characteristics (e.g. colour, shape) were referenced repeatedly.