An Exploratory Study Investigating Gender Effects on Using 3D Maps for Spatial Orientation in Wayfinding

3D representations in applications that provide self-localization and orientation in wayfinding have become increasingly popular in recent years because of technical advances in the field. However, human factors have been largely ignored while designing 3D representations in support of pedestrian navigation. This exploratory study aims to explore gender effects on using 3D maps for spatial orientation. We designed a 3D map that combines salient 3D landmarks and 2D layouts, and evaluated gender differences in user performance during direction-pointing tasks by administering an eye-tracking experiment. The results indicate that there was no significant overall gender difference in performance or visual attention. However, we observed that males using the 3D map paid more attention to landmarks in the environment and performed better than when using the conventional 2D map, whereas female performance did not show any significant difference between the two types of map usage. We also observed contrary gender differences in visual attention on landmarks between the 3D and 2D maps. While males fixated longer on landmarks than females when using the 3D map, females paid more visual attention to landmarks than males when using the 2D map. In addition, verbal protocols showed that males had more confidence while making decisions. These empirical results can be helpful in the design of map-based wayfinding enhancement tools.


Introduction
Cartographers and GI scientists have a long-established interest in modeling the Earth in three-dimensional (3D) form. Compared to two-dimensional (2D) representations, 3D representations are believed to provide a familiar view of the natural world, making interpretation easier, especially for novice users [1,2]. However, it seems that the popularity of 3D representations is largely motivated by technical advances such as 3D data acquisition, modeling and the increasing processing power and speed of hardware [3,4]. The influences of human factors such as gender, age, expert knowledge and cultural background have been largely ignored during the creation and evaluation of 3D representations [5]. Whether the 3D representations are welcomed by different groups of users and suited for various purposes is still not well documented [4].
This exploratory study focuses on the gender effects of using 3D representations for self-localization and orientation in wayfinding, one of the most important tasks faced in environments, familiar or unfamiliar. Previous studies have suggested that gender differences related to landmarks and map interpretation affect wayfinding performance. However, the previous studies have mainly focused on gender differences seen with traditional 2D maps and empirical evidence on differences with 3D maps is rare. This article particularly concentrates on whether using a 3D map that combines salient 3D landmarks and 2D layouts can lead to gender differences in self-localization and orientation during wayfinding. The focus was the influence of a 3D map on user performance and visual attention to landmarks. Specifically, we address the following two research questions: (1) Are there any gender differences in user performance and visual attention during self-localization and spatial orientation when applying spatial knowledge acquired from 3D maps? (2) If the above gender differences exist, are these differences consistent with those from using 2D maps?
To this end, we designed a 3D map that combines 3D landmarks and 2D layouts. We then evaluated the 3D map using computational visual attention models and designed a two-factorial (3D/2D and male/female) eye tracking experiment. We conducted the experiment in a simulated wayfinding environment (as described in Section 3). We recorded the eye movements of the participants and verbal reports as they visually learned the maps and searched for spatial orientation cues. Participant performance and visual attention were then analyzed qualitatively and quantitatively (as described in Section 4). We finally discuss the observed gender differences and the limitations of the study along with providing suggestions for future studies (as described in Sections 5 and 6).
It is important to note that the term 3D used in this article refers to geo-visualizations that present information in a 3D view. They allow users to perceive 3D information by perspective or other depth cues [6]. The difference between such geo-visualizations and "true-3D" visualizations is that they are displayed on 2D planar surfaces such as computer screens instead of real physical stereoscopic displays [6]. Some studies term this concept pseudo-3D, 2.5D, or 3D-view [6][7][8]. We instead use the term 3D map/representation in this article to avoid ambiguity.

Gender Differences in Wayfinding
Wayfinding, as defined by Golledge [9], is "the process of determining and following a path or route between an origin and a destination" (p. 6). It is a complex behavior that consists of a series of cognitive processes and spatial behavior [9][10][11]. Among them, two crucial and closely related cognitive processes are of special concern in this study: spatial orientation (determining the direction that one is facing) and self-localization (determining one's position in an environment). These two processes are frequently executed along the route, especially at decision points during wayfinding.
Literature addressing gender differences with respect to wayfinding has provided mixed results, but it is generally agreed that while gender differences exist, they vary across the cognitive tasks and abilities that have been tested, and their magnitude and distribution vary and can favor males or females [12]. A well-known issue with respect to utilizing spatial knowledge for wayfinding is the distinction of "route-survey" strategies between genders. For example, studies showed that males tend to use Euclidean distances and absolute directions (i.e., east, west, south and north) during wayfinding (the "orientation strategy"), while females are more likely to focus on the sequence from place to place and left-right turns (the "route strategy") [13,14]. Females were found to use more landmarks in giving and following directions and had higher scores on object location memory tasks [12,14]. Females report higher spatial anxiety during navigation, which was found to be negatively associated with route strategy [14]. The fact that males tend to use global configurations and geometric information does not contradict the fact that they can also employ landmark information for wayfinding [15]. When maps are used in wayfinding, inconsistent gender differences have been reported. While some studies found that males are more accurate in map-assisted wayfinding (e.g., [16,17]) and map reading skills (e.g., [18][19][20][21]), other studies reported no differences between males and females (e.g., [19,20,22]). For example, Montello, Lovelace, Golledge and Self [22] found that males had a significantly higher accuracy than females in distance estimation but only when using knowledge acquired from traveling the real environment rather than when using map-derived knowledge. Coluccia and Louse [23] hypothesized that it may be more difficult for females to acquire the survey knowledge of an environment, but that if the survey knowledge is already provided by the maps, the gender differences disappear.
It should be noted that previous studies investigating map-related gender differences only used 2D maps rather than 3D maps. This might be partly because 3D representations did not become popular until technological advances in 3D data acquisition, modeling and rendering were made in recent years [3,4]. By providing a familiar view of the world through three dimensions, it is believed that 3D representations can be easier to interpret than conventional 2D representations [24]. The additional axis can provide more space to display information [24]. Studies have found that 3D representations can facilitate landmark recognition with higher confidence [1,2,25]. It is suggested that while 3D representations can facilitate the use of Euclidean identification in the case of males, object location memory can facilitate the recognition of landmarks by females during navigation [13]. Thus, it is plausible that integrating landmarks into maps in a 3D form can benefit both genders. However, empirical evidence regarding such gender effects in wayfinding is rare.
We aim to explore if males and females exhibit different levels of performance during spatial self-localization and orientation using 3D representations, in comparison to the commonly used 2D maps. We look primarily at gender effects on user performance and visual attention to landmarks during spatial decision-making.

Evaluation of 3D Representations for Wayfinding
Although a number of studies have reported that participants considered 3D representations more useful and attractive than 2D representations, few of the studies actually observed better performance with 3D representations [8,26,27]. Popelka and Doležalová [8] found no significant difference in several eye movement metrics between using 2D and 3D maps for visual search tasks. Evidence has shown that representations with information overload and high visual complexity demand a high cognitive workload and require additional mental effort and attention [8,27]. An excess of 3D landmark models on the screen can confuse and slow down the map user [28]. Eye-tracking studies have also suggested that participants using 3D representations need to perform an extensive visual search to find the details of routes during map learning [29]. When searching for objects, 3D maps produce clustered fixations with longer duration and smaller saccadic amplitude [30]. This means that 3D users are less efficient at gathering detailed information. A possible solution to reduce information overload and cognitive workload is to provide only the most salient and relevant landmarks rather than include all of them [31].
Some recent studies have suggested integrating 3D and 2D representations to combine their advantages [29,30,32]. For example, Lei, Wu, Chao and Lee [30] have suggested that 2D maps could be used to provide a general structure of the routes and 3D information could be enabled when users search locations of interest. Liao, Dong, Peng and Liu [29] have found that 2D maps are more efficient for spatial knowledge acquisition, whereas 3D representations provide users a more efficient visual search, resulting in better performance at decision points. An aerial, vertical view of 2D maps enables users to acquire configurational knowledge of the environment (also called survey knowledge). Evidence shows that this global understanding can be more accurate in spatial relations than learning from direct experience [33]. In contrast, 3D representations can support landmark recognition and provide landmark knowledge and route knowledge, which is also crucial for wayfinding [34]. A benefit of combining the two types of representations for wayfinding is that their combination offers every type of spatial knowledge simultaneously [34]. We aim to explore this potentially promising approach by designing a 3D map that combines salient 3D landmarks and 2D layouts. Our hypothesis is that if task-relevant 3D landmarks were designed to be perceptually attractive on the map, users would attend to these landmarks on the map and the corresponding ones in the environment to facilitate the process of spatial orientation. Therefore, the users would make decisions effectively and efficiently.

Designing the 3D Map
To begin with, an experimental area and route were selected (described below in Section 3.7) and then the 3D map was designed manually. One key objective in designing the 3D map is to ensure effective and efficient transmission of cartographic information to users. Swienty et al. [35] proposed a general framework for geovisualization called "attention-guiding geovisualization". Its basic principles are to: (1) design task-relevant objects that are perceptually salient; (2) make less relevant objects less attractive; and (3) omit irrelevant objects [35]. Based on a controlled eye-tracking experiment, Fabrikant et al. [36] further demonstrated that perceptually salient designs can contribute to efficient spatial inference making for novice users when interpreting weather maps. We applied these principles to our 3D map design.
The first step in designing the 3D map was to determine the task-relevant objects. Ideally, relevance would be measured quantitatively. However, quantifying relevance in a geographic context is complicated [37,38] and considered outside the scope of this paper. We employed a practical approach: to use landmark salience to approximate task relevance. This assumes that landmarks with higher salience are more relevant to the wayfinding task. We thus adopted the Raubal and Winter [39] model to calculate the salience of each building along the selected route. In this model, landmark salience is calculated as a weighted linear sum of three components: the visual, semantic, and structural attraction of the landmarks. For details of the method, see Raubal and Winter [39] and Nothegger et al. [40]. For details of our landmark salience calculation and landmark selection process, see the Supplementary materials of this article online.
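As a minimal sketch of the weighted linear sum described above, the following Python snippet combines three attraction scores into a single salience value and picks the most salient candidate at a decision point. The function names, the equal default weights, the assumption that each score lies in [0, 1], and the example scores are illustrative assumptions only; the measures and weights actually used follow Raubal and Winter [39] and the Supplementary materials.

```python
def landmark_salience(visual, semantic, structural, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted linear sum of three attraction scores (each assumed to lie in [0, 1])."""
    w_visual, w_semantic, w_structural = weights
    return w_visual * visual + w_semantic * semantic + w_structural * structural


def most_salient(candidates):
    """Return the name of the candidate building with the highest salience."""
    return max(candidates, key=lambda name: landmark_salience(*candidates[name]))


# Hypothetical scores for two buildings at one decision point (not study data).
candidates = {
    "building_a": (0.9, 0.6, 0.3),
    "building_b": (0.4, 0.5, 0.5),
}
print(most_salient(candidates))
```

With unequal weights, the same function can express a model in which, for example, visual attraction counts more than structural attraction.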
The second step was to construct 3D models and design a 2D base map. We selected the most salient building for each decision point and built photorealistic 3D models of the buildings. The 2D base map design process generally followed conventional map design processes and principles [41]. The key principle in designing the map was to render task-relevant features perceptually more salient and reduce redundant information while retaining adequate information for wayfinding on the selected route [35]. For example, to achieve conciseness and reduce visual complexity, labels (street names and building names) and features that were far from the route were considered irrelevant and were thus excluded from the map. In fact, they were invisible to wayfinders along the selected route.
Finally, visual layers were hierarchically organized into regions, roads, the route, labels, and 3D landmarks [35]. An oblique view was adopted. The designed map for the experimental area is shown in Figure 1. It was created using ESRI ArcMap and ArcScene (www.esri.com). We expected that users' visual attention would be attracted to the salient landmarks while they acquired spatial knowledge from the map, and that their visual attention would be directed to the corresponding landmarks when they attempted to apply this information to wayfinding.

Pre-Test Evaluation
To test the visual saliency of the designed 3D map, we conducted a pre-test evaluation of the stimuli using computational visual attention models in comparison to a 2D map (the Google map of the same area). The computational visual attention models are based on bottom-up features (e.g., color, intensity, and orientation) to compute the "saliency map" that represents the conspicuousness of scene locations [42]. The computational models can use the saliency map to predict where visual attention will be focused. It is important to note that these models depend solely on bottom-up features and do not consider top-down factors such as user tasks and previous knowledge [43]. Such computational visual attention models are used to evaluate the visual saliency of map designs and have been commonly used in cartography studies [35,36,44,45]. We adopted the widely used Itti model developed by Itti et al. [46] to compute saliency maps. The saliency maps for the designed 3D map and the 2D map are shown in Figure 2. We can see that for the designed 3D map, the 3D landmarks have higher visual saliency than the background area. These landmarks were expected to be more attractive and draw the visual attention of the participants during map learning. In contrast, for the 2D map, many task-irrelevant elements have high saliency, which may distract the visual attention of the participants during map learning. These results provide initial evidence that perceptually salient task-relevant objects in 3D maps are expected to accelerate visual information processing and spatial knowledge acquisition. In the next step, a controlled eye-tracking experiment was designed and conducted to determine if there were gender differences in user performance and visual attention on landmarks when using the 3D design for spatial orientation, in comparison to a commonly used 2D map.

Eye-Tracking Experimental Setup
Eye tracking is a widely used technique to investigate visual attention and internal cognitive processes [43]. It has been adopted in cartographic studies since the 1970s [47]. Recent eye-tracking studies have covered a wide range of topics [48], such as map-usability evaluation [45,49,50], user visual behavior analysis [51,52], gaze-based map interaction [53], and spatial cognition [54,55].
Eye-tracking experiments in situ differ from those in the lab. Previous studies indicate that field-based experiments using a mobile eye-tracking system may encounter calibration failure [28,56]. Furthermore, a field experiment provides less control of the environment and imposes heavy data-processing workloads [54]. In contrast, lab-based settings can provide a more controlled environment to collect eye-movement data and avoid the risk of calibration failure [57]. Therefore, we decided to adopt a lab-based eye-tracking experiment, leaving field navigation experiments for follow-up research.
We used Tencent Street View (http://map.qq.com/) to simulate the real environment [58] (Figure 3c). As with the well-known Google Street View, the Tencent Street View provides users with free control in 360-degree, high-resolution panoramas at the ground level, producing a realistic and immersive experience [59]. We employed the street view instead of static images because it is closer to a real environment, and wayfinders can look around to search for visual cues during spatial orientation. In the experiment, we followed the lead of other researchers [33,60,61] by asking participants to learn from either a 3D or 2D map and then perform a set of direction-pointing related tasks in the street view. In addition, we also recorded participants' verbal reports to gain additional insights into their thoughts during the experimental tasks.

Participants
The experiment was a two-factorial design. Twenty participants (10 males and 10 females) aged 18-25 years voluntarily participated in the experiment. The participants were divided into two groups (2D versus 3D) with 5 males and 5 females in each group. They had no knowledge about the nature of the experiment. They were undergraduate students (n = 14) or graduate students (n = 6). Five of them majored in non-geographic fields (three in the 2D group and two in the 3D group) and the others had geographic field-related backgrounds. None of the participants had walked the pre-defined route or visited the four decision points before. They were regular Internet users, and had experience of using web maps and mobile maps. All of the participants had normal or corrected-to-normal vision. Each participant received ¥20 (Yuan) as compensation for their participation. The study was approved by the institutional review board of the university to which the authors belong. All participants provided their written informed consent to participate in the experiment.

Apparatus
We used a Tobii T120 eye tracker (www.tobii.com) with a sample rate of 120 Hz and a 17-inch monitor (1280 × 1024 pixels). The recording accuracy of the eye tracker was 0.5° and the spatial resolution was 0.2°. It allowed head movement within a range of 0.2°. The sample rates of all participants were above 80%. The software Tobii Studio version 3.0 was used to process the eye-movement data. The Tobii I-VT algorithm was used as the fixation filter to identify fixations from the raw data [62]. According to the Tobii Studio user manual, the Tobii I-VT algorithm was proposed by Olsson [63] and was used as the default fixation algorithm in Tobii Studio [62]. The algorithm was based on the classic I-VT fixation filter as described in Salvucci and Goldberg [64] and Komogortsev et al. [65]. The velocity threshold was set to 30°/s. Any short fixations that were below 60 ms duration were discarded. Participants used a computer keyboard and mouse as input devices to complete their tasks. Microphones were connected to the computer to record verbal reports. All of the equipment was set up in a dedicated room with proper lighting and no disruptions.
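To illustrate the classic velocity-threshold idea behind the fixation filter, a minimal Python sketch is given below. It assumes that gaze samples are already expressed in degrees of visual angle with strictly increasing timestamps, and it applies only the two parameters stated above (30°/s and 60 ms). It is not the Tobii Studio implementation, whose additional pre- and post-processing steps are not reproduced here; the function name and the synthetic samples are hypothetical.

```python
import math


def ivt_fixations(samples, velocity_threshold=30.0, min_duration_ms=60.0):
    """Classic I-VT on a list of (t_ms, x_deg, y_deg) gaze samples.

    Consecutive samples whose point-to-point velocity is below the threshold
    (deg/s) are grouped into one fixation; groups shorter than
    min_duration_ms are discarded. Returns (start_ms, end_ms) intervals.
    """
    fixations = []
    start = None  # timestamp of the first sample in the current group
    last = None   # timestamp of the latest sample in the current group
    for prev, cur in zip(samples, samples[1:]):
        dt_s = (cur[0] - prev[0]) / 1000.0  # timestamps must strictly increase
        velocity = math.hypot(cur[1] - prev[1], cur[2] - prev[2]) / dt_s
        if velocity < velocity_threshold:
            if start is None:
                start = prev[0]
            last = cur[0]
        else:
            if start is not None and last - start >= min_duration_ms:
                fixations.append((start, last))
            start = None
    if start is not None and last - start >= min_duration_ms:
        fixations.append((start, last))
    return fixations


# Synthetic 120 Hz samples (not study data): two long dwells and one short one.
dt = 1000.0 / 120.0
positions = [(0.0, 0.0)] * 12 + [(5.0, 0.0)] * 12 + [(10.0, 0.0)] * 3
samples = [(i * dt, x, y) for i, (x, y) in enumerate(positions)]
fixations = ivt_fixations(samples)
print(len(fixations))
```

In this synthetic example the third dwell spans only three samples, so it falls under the 60 ms minimum and is dropped.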

Stimuli
As shown in Figure 3, we used the 2D Google map as the 2D map. The 3D view was displayed in ESRI ArcScene and both the 2D and 3D maps were interactive. A route from the start to the destination about 1.7 km long was highlighted on the map. It began from starting point A, Bainian Luzhu (a snack bar, LM1 in Figure 1) and ended at destination B, Tian'an men (a famous tourist attraction in Beijing, LM6 in Figure 1) (Figure 3a). We selected four crossroads (labeled C1 to C4) along the route corresponding to the four orientation tasks in the experiment. In each task, the participants were presented with a street view scene, which was located at one of the four decision points. The participants were required to determine the direction to the next stop by searching cues from the street view and recalling previously acquired information from the 3D or 2D map.

Procedure
The participants were first welcomed and briefly introduced to the procedure. They were then assigned to either the 3D-map group or the 2D-map group. The number of males and females was balanced between the two groups. Participants were then instructed on how to control the map and the street view (e.g., zooming in and out, panning, changing direction, stepping forward). They were given a period of five minutes to get familiar with the controls and to explore the map and the street view using a sample map and scene.
After the pre-test training and familiarization, the participants started the calibration process. We used the default 5-dot mode to calibrate the eyes of the participants. The participants were required to focus all their visual attention on the center of the dots. Then the participants were presented with the experimental map (in either 3D or 2D). The participants were instructed to read the map and to remember the route and any related information, such as streets, landmarks, and turns, as much as possible for the subsequent tasks. They were told that they would not be allowed to go back and look at the map again in the next task. There was no upper time limit for the learning task. Once the participants felt ready, they could close the map by themselves.
Next, the participants were presented with four scenes (C1 to C4) in street view, in order (Figure 3). In each scene, the participants were required to point out the direction to the next stop. They could rotate and magnify the street view to locate and orient themselves on the previously learned route. When they made the final decision, they were required to provide the exact direction to the experimenter. The initial direction in the street view for C1-C4 was west, southwest, east and west, respectively. They were allowed to communicate any thoughts about the task, and the experimenter would respond to their questions. After the final decision was made, the next scene was presented. There was no upper time limit to finish the tasks.
Finally, participants were required to rate: (1) their ability to find a way to a destination in an unfamiliar environment (on a five-point scale, from 1: very poor to 5: very good); (2) their familiarity with electronic maps (on a five-point scale, from 1: unfamiliar to 5: very familiar); and (3) their experiences using mobile maps for wayfinding (on a five-point scale, from 1: never used to 5: always used). The questionnaire also required the participants to indicate whether they had walked the route and visited the landmarks before.

ISPRS Int. J. Geo-Inf. 2017, 6, 60

Map Learning
Since our sample size was small, we employed a nonparametric test, the Mann-Whitney U test, to examine the gender (males versus females) effects on all statistics both within and between groups. The significance of difference was tested at the 0.05 level, and the statistic U and the standardized statistic Z are reported here. The duration of the map-learning phase (Figure 4a) for males varied between 38.20 and 173.25 s (M = 101.20, SD = 36.92), while the mean duration for females was 141.00 s, ranging between 78.28 and 286.56 s (SD = 75.97). Although the females took a little longer to read the maps, the difference was not significant; U = 39.00, Z = −0.832, p = 0.436 > 0.05. No significant differences were detected within either the 2D or 3D groups. This suggests that participants, both males and females, spent a statistically comparable amount of time acquiring the information they assumed was necessary and sufficient. It is worth noting that the variance of the female group was greater than that of the male group; this suggests that the female participants were less homogeneous than the male participants.
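The relation between the reported U and Z values can be checked with the large-sample normal approximation of the Mann-Whitney U test. The sketch below is a minimal pure-Python illustration, assuming no tie or continuity correction and group sizes of 10 males and 10 females (see the Participants section). The function names are ours, and the small samples in the last lines are made up for illustration. Under these assumptions, U = 39.00 yields Z ≈ −0.832, which agrees with the value reported above.

```python
import math


def mann_whitney_u(a, b):
    """Smaller of the two U statistics; tied values receive average ranks."""
    values = sorted(list(a) + list(b))
    rank_of = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j
    rank_sum_a = sum(rank_of[v] for v in a)
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2.0
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


def z_from_u(u, n1, n2):
    """Standardized U under the normal approximation (no tie/continuity correction)."""
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mean_u) / sd_u


print(round(z_from_u(39.0, 10, 10), 3))  # -0.832

# Made-up toy samples (not study data): completely separated groups give U = 0.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```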
For the self-report questionnaire on spatial ability and wayfinding experience, the mean score was 3.48 (SD = 0.68), with a minimum of 1.67 and a maximum of 4.67. A Mann-Whitney U test showed that the difference in scores between males and females was not significant (U = 28.00, Z = −1.708, p = 0.105 > 0.05). A Spearman's rho test indicated that the scores correlated positively with the accuracy of the orientation tasks (r = 0.509, p = 0.022 < 0.05) but not significantly with response time (r = −0.430, p = 0.058 > 0.05).
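As a rough illustration of the rank correlation used here, the sketch below computes Spearman's rho as the Pearson correlation of midranks. The function names and the example scores are hypothetical, and the significance test (the p-value) is not covered.

```python
from math import sqrt


def midranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical self-report scores and task accuracies (not the study's data).
scores = [1.67, 3.00, 3.33, 3.67, 4.67]
accuracy = [0.25, 0.50, 0.50, 0.75, 1.00]
print(round(spearman_rho(scores, accuracy), 3))
```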

Visual Attention on Landmarks
Further quantitative analysis is required in order to examine if and for how long the participants perceived the landmarks for their decision-making. Compared to existing eye-tracking studies that used static map stimuli or simple interactive designs (e.g., [49,51,66,67]), this study allowed the user to freely control the street view, which led to a higher level of difficulty in analyzing the eye-movement data. We followed a traditional method to deal with such eye-movement data by generating dynamic areas of interest (AOIs, Figure 5); however, this method is a labor-intensive and time-consuming process [68]. For each participant and task, we set key frames to indicate the exact point when the scene changed and drew AOIs of the corresponding landmarks for each key frame. We generated dynamic AOIs for all LMs, except LM6, because LM6 was invisible at Crossroad 4. Based on the dynamic AOIs, two fixation-level metrics were derived and calculated to analyze participant visual attention on landmarks: fixation duration as percentage (fixation duration/response time), and fixation count as percentage (fixation count/total fixation count). We used percentages instead of absolute values because fixation count and duration are associated with response time.
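The two metrics can be sketched as follows. This is a simplified illustration, not the authors' tooling: it assumes each key frame carries one rectangular AOI that stays valid until the next key frame, and all class names, field names and coordinates are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class KeyFrame:
    t: float  # time (s) at which the scene changed
    box: Optional[Tuple[float, float, float, float]]  # (x_min, y_min, x_max, y_max); None if landmark not visible


@dataclass
class Fixation:
    t: float         # onset time (s)
    duration: float  # fixation duration (s)
    x: float         # gaze position in pixels
    y: float


def hits_aoi(fix, keyframes):
    """True if the fixation falls inside the AOI of the key frame active at its onset."""
    times = [k.t for k in keyframes]  # key frames must be sorted by time
    i = bisect_right(times, fix.t) - 1
    if i < 0 or keyframes[i].box is None:
        return False
    x0, y0, x1, y1 = keyframes[i].box
    return x0 <= fix.x <= x1 and y0 <= fix.y <= y1


def landmark_attention(fixations, keyframes, response_time):
    """Return (fixation duration / response time, fixation count / total fixation count)."""
    if not fixations or response_time <= 0:
        raise ValueError("need at least one fixation and a positive response time")
    on_landmark = [f for f in fixations if hits_aoi(f, keyframes)]
    duration_pct = sum(f.duration for f in on_landmark) / response_time
    count_pct = len(on_landmark) / len(fixations)
    return duration_pct, count_pct


# Hypothetical task: the AOI moves when the scene changes at t = 4 s.
keyframes = [KeyFrame(0.0, (100, 100, 300, 300)), KeyFrame(4.0, (400, 100, 600, 300))]
fixations = [
    Fixation(1.0, 0.25, 150, 150),  # inside the first AOI
    Fixation(2.0, 0.50, 500, 200),  # outside: the second AOI is not active yet
    Fixation(5.0, 0.25, 500, 200),  # inside the second AOI
    Fixation(6.0, 0.50, 50, 50),    # outside
]
print(landmark_attention(fixations, keyframes, response_time=10.0))
```

Normalizing by response time and by total fixation count is what makes the values comparable across participants who took different amounts of time on a task.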
The results of the analysis of visual attention on landmarks are shown in Figure 6. Overall, there was no significant difference in fixation duration and fixation count between males and females. However, males fixated on landmarks for significantly shorter durations (M = 0.011, SD = 0.011) than females did (M = 0.056, SD = 0.011) when using the 2D map (U = 1.00, Z = −2.402, p = 0.016 < 0.05). This difference was reversed between males (M = 0.111, SD = 0.011) and females (M = 0.051, SD = 0.011) when using the 3D map (U = 0.00, Z = −2.611, p = 0.008 < 0.01). Similarly, in terms of fixation count, males fixated on landmarks significantly less often than females did when using the 2D map (U = 0.00, Z = −2.611, p = 0.008 < 0.01). We speculate that it is this reversed difference that resulted in the insignificant overall difference between males and females.
In terms of comparisons across map types, males fixated on landmarks for significantly longer when using the 3D map than when using the 2D map (U = 0.00, Z = −2.611, p = 0.008 < 0.01), whereas females had no significant difference in fixation duration between the 2D and 3D maps (U = 11.00, Z = −0.313, p = 0.841 > 0.05). The significance of the difference in fixation count is similar to that of the difference in fixation duration. In other words, the 3D map increased fixation duration and fixation count on landmarks for males but not for females. This is consistent with the observation that males performed better when using the 3D map than when using the 2D map, whereas females did not.

Verbal Protocols
We aimed to use the data from the verbal protocol to gain additional qualitative insights into the thoughts of the participants. All verbal reports generated were transcribed and translated into English. The transcripts were then segmented based on the sentence-coding protocol shown in Table 1. We classified the sentences spoken by the participants into questions (Q) and statements (S). Questions were further divided into Q1 (general queries about controlling the street view and the task at hand) and Q2 (queries about nearby buildings and streets). Statements were also split into positive statements (PS) (PS1: landmark/street judgment, PS2: action judgment) and negative statements (NS). Participants' questions and statements (especially Q2 and NS) were considered indicators of their confidence level in making judgments.

R6: Showing the right answer (example: "You should go this way.")
R7: Other
The experimenter's responses to participants were classified into six levels (R1-R6) based on their utility or usefulness (i.e., the degree to which the response could be used to solve the problem and complete the task): instructions (R1), simple answers (R2), guidance (R3), building or street names (R4), telling the right direction (R5), and pointing out the right direction (R6). Higher levels of the response indicate higher participant dependence. If R6 was given, the participant's performance on the particular task was regarded as a failure.
Generally, females had slightly more questions (Q = 15) and statements (S = 74) than males (Q = 14, S = 65) and, correspondingly, the experimenter responded more to females (R = 61) than to males (R = 38). Males had fewer questions and negative statements (Q2 + NS = 9) than females (Q2 + NS = 15). Females also received higher-level responses more frequently (R4 + R5 + R6 = 17) than males (R4 + R5 + R6 = 10), regardless of the map type. These results indicate that males appeared to be more confident about their situation and judgments than females. Both males and females had fewer questions, statements, and responses when using the 3D map than when using the 2D map.
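The aggregates reported above (Q, S, Q2 + NS, and R4 + R5 + R6) can be tallied from a coded transcript along the following lines. This is a minimal sketch: the code labels follow the scheme described in the text, while the function name, the dictionary keys and the example sequence are hypothetical.

```python
from collections import Counter

# Codes treated in the text as indicators of lower confidence and higher dependence.
LOW_CONFIDENCE_CODES = ("Q2", "NS")
HIGH_LEVEL_RESPONSES = ("R4", "R5", "R6")


def summarize(codes):
    """Aggregate the coded sentences of one task into the totals used in the text."""
    counts = Counter(codes)  # missing codes count as 0
    return {
        "questions": counts["Q1"] + counts["Q2"],
        "statements": counts["PS1"] + counts["PS2"] + counts["NS"],
        "low_confidence": sum(counts[c] for c in LOW_CONFIDENCE_CODES),
        "high_level_responses": sum(counts[c] for c in HIGH_LEVEL_RESPONSES),
        # An R6 response means the task is regarded as a failure.
        "failed": counts["R6"] > 0,
    }


# Hypothetical coded dialogue for one participant and one task.
coded = ["Q1", "R1", "PS1", "Q2", "R4", "NS", "R5", "PS2"]
print(summarize(coded))
```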
It should be noted that landmarks, street names, and other external cues were mentioned by the participants in both the 3D and the 2D groups, which means that the participants used a variety of strategies for self-localization and orientation. This is also indicated by the relatively low fixation count percentage (mean value of 6% and 17% in the 2D and 3D groups, respectively) and fixation duration percentage (mean value of 3% and 8% in the 2D and 3D groups, respectively) on landmarks in the environment (the street view).

Gender Differences and Their Implications for Map Design
The results should be interpreted with caution, since the small sample size (n = 20) might cause statistical fluctuations. However, this sample size generally aligns with what has been used in many other eye-tracking studies investigating map reading [45,51,52,66] as well as wayfinding and pedestrian navigation [29,69,70,71]. For example, Kiefer, Giannopoulos and Raubal [54] had 14 participants in their experiment match map symbols and landmarks in the real environment. In the study conducted by Franke and Schweikart [70], the 21 participants were separated into three groups, each of which was presented with text, icon, or vignette landmarks. Therefore, this sample size is typical of related research.
It is not surprising that we did not observe significant gender differences in accuracy and response time within the 3D group or the 2D group. Neither did we find significant overall gender differences in visual attention (fixation duration and fixation count) on landmarks during spatial orientation. This is consistent with Montello, Lovelace, Golledge and Self [22], who found that males outperformed females when applying recently acquired spatial knowledge from actually travelling through the environment rather than when using knowledge acquired from maps. However, we observed two unexpected and interesting findings that merit further discussion.
The first interesting finding was a better performance (higher accuracy and shorter response time) with the 3D map than with the 2D map for males but not for females (Figure 4b,c). We also observed that the 3D map increased fixation duration and fixation count on landmarks for males but not for females (Figure 6). We originally anticipated that both male and female attention could be biased towards the salient 3D landmarks. As found by Liao, Dong, Peng and Liu [29], participant attention was directed to 3D landmarks at the decision points. Davies and Peebles [31] demonstrated that the influence of 3D visual salience was so strong that it led participants to ignore the optimal strategy of using the 2D layout. However, in this study, this trend could not be confirmed for females. It is plausible that the male participants detected the improvements afforded by the salient 3D landmarks over the corresponding 2D map and encoded this information in their short-term memory, and that during spatial orientation tasks they were able to attend to these landmarks in the environment (street view) and relate them to the corresponding ones in their mental representations. This could be the reason why male participants made decisions more accurately and efficiently when using the 3D map than when using the 2D map.
The second interesting finding was the reversed gender difference in visual attention on landmarks between the 3D and 2D groups. On the one hand, within the 3D group, males fixated longer on landmarks than females. However, although males were successful at noticing and attempting to use these new cues from the 3D map, they did not necessarily outperform females in completing the experimental tasks. On the other hand, within the 2D group, females fixated longer and more frequently on landmarks than males did. Previous studies suggested that females emphasize landmarks when giving and following directions, while males use both landmarks and Euclidean information for wayfinding [14,72]. In this study, when the 2D map was presented, females focused more visual attention on landmarks than males did and used them as environmental cues. However, this did not result in significantly higher effectiveness or efficiency in accomplishing the tasks. It is possible that the contrary results between the 3D and 2D groups when using landmark cues led to the lack of a significant difference in general performance between males and females. Our results also agree with previous studies in indicating that females emphasize landmarks during wayfinding, whether they acquire knowledge from the 2D map or the 3D map.
Why did such gender differences occur? Multiple theories have been proposed to explain gender differences in spatial ability [72]. An evolutionary perspective holds that males in prehistoric times were required to hunt over large unfamiliar environments, whereas females were required to remember locations and gather food around their living place [73]. Therefore, males developed better spatial skills such as mental rotation and spatial orientation, while females developed better abilities in object location memory. The salient 3D landmark map, which differed significantly from the commonly used 2D maps but provided useful cues for the later tasks, was apparently new to all participants, not only with respect to the map representation style but also in its interactivity. Accordingly, males appeared to be more sensitive to this new tool and tried to apply the newly acquired knowledge. Thus, they had longer fixation durations on landmarks when trying to link them to the objects in their mental representation. We believe that this result is consistent with Montello, Lovelace, Golledge and Self [22], who demonstrated male capacity with regard to active exploration of and application to new environments.
Social factors also partly account for gender differences. For example, males are more likely to play 3D video games than females [74]. The scenes in such 3D video games are similar to those encountered in non-immersive virtual environments as utilized in this work. There is evidence to show that the results of spatial knowledge acquisition from navigation in a desktop-based virtual environment are comparable to those obtained in a real environment [75,76]. Repeated exposure to environments encountered in 3D video games could possibly have made males more familiar with 3D representations and more skilled in interacting with 3D interfaces [23]. However, because we did not collect data about participants' video-game playing experiences, we cannot verify whether this explanation actually contributes to the gender differences that were observed.
Wayfinding strategy might also be a contributor to some of the observed gender differences. In our experiment, symbolic 2D layouts and salient photorealistic 3D models were combined to provide multiple types of spatial knowledge at the same time. This means that both the 2D and 3D information played a role in the decisions made by the participants. The 3D map used in this study did not prevent the use of other orientation strategies, such as 2D cues, from the map. As mentioned earlier, participants referred to street names for self-localization and orientation (Figure 7). It is suggested that while females mainly use landmarks as their visual cues for wayfinding, males use both landmarks and geometric information (direction and distance) [72]. It is this flexible strategy that might have contributed to the better performance of the male participants when using the 3D map to locate and orient themselves than when relying only on the 2D map. In other words, it is possible that males predominantly made use of street information when using the 2D map while absorbing 3D-landmark-related information from the 3D map; in contrast, females persisted in focusing on landmark information regardless of which type of map they used. However, we cannot generate conclusive explanations from the current experiment alone. The underlying reasons for the observed gender differences need further investigation.
In summary, although no significant overall gender differences were detected, we did identify subtle differences in the manner of usage of 2D and 3D maps in spatial orientation tasks. These results demonstrate the importance of considering the influence of human factors in any evaluation of map design and usability. Furthermore, some implications can be derived from the results to inform the design and use of 3D representations for pedestrian navigation. For example, navigation aid service providers could take care to choose a specific map type to accommodate the needs of female users. Whether the benefits of using 3D maps outweigh those of traditional 2D maps needs to be analyzed and taken into consideration. A possible solution is to provide traditional 2D maps for female users, since there was not enough evidence to show that female users perform better using 3D maps. In contrast, male users could be recommended to use 3D maps. Given that males and females differ in their wayfinding strategies and visual attention behavior, designers should consider carefully how best to present 3D landmarks during map design. Adaptive strategies and personalized map design methods could be adopted to provide a better user experience. The 3D navigation system should be adaptive to different groups of users and tasks. For example, when a user is planning a route, as indicated in our methodology, only salient 3D landmarks along the route should be included in the map. When the user is checking the overview of the route, the system should present a top-down 2D view, because the 2D view was found to be more efficient during route learning [29]. For male users, when they proceed to turning points, the navigation system can change smoothly to a 3D view that shows the salient 3D landmarks to better aid in determining the next direction.
It is worth noting that, as revealed by the verbal results, males appeared to be more confident in making decisions than females. This observation can add to the ongoing effort of incorporating differentiated emotional responses observed in a diverse population into cartographic designs and GI services. For example, Huang et al. [77] collected people's affective responses to environments through crowdsourcing and developed a method to enhance route-planning services. An evaluation showed that users felt more comfortable and satisfied with the new method than with the traditional shortest path. Similarly, Quercia et al. [78] suggested an algorithm to recommend "beautiful, quiet, and happy routes" to users. Confidence, as reported in this article, is an important component of affective responses. How to design representations that not only help people make effective and efficient decisions, but also make them feel confident and pleasant while doing so, is an interesting aspect for further study.
It is important to note that aside from representation type and gender, many other factors, such as age, cultural background, spatial ability, and domain knowledge, can affect spatial behavior [4,14,22,79,80]. In this study, the participants' spatial ability, map reading ability, and experiences of using maps for wayfinding were self-reported during the questionnaire phase (as described in Section 3.7). The results of these three questions are shown in Figure 8. The results indicated that the participants had basic knowledge and skills to interpret maps. Only one participant had no experience of using electronic maps to find their way to destinations in unfamiliar environments. It can also be seen that male participants were more experienced in wayfinding using mobile maps (Figure 8). This might, to some extent, explain the subtle gender difference in the experiment. The significant correlation between the self-reported scores and the accuracy (as described in Section 4.2) further suggests that these top-down factors affected their performance. The experiment could be improved by employing more sophisticated tests such as the Santa Barbara Sense of Direction Scale [81] and the Navigational Map Reading Ability Test [82] to control for spatial ability and map reading ability. To explore the full potential of 3D representations, human factors and usage contexts cannot be ignored, and more attention is needed to address such aspects.

Limitations
In the experiment, participants were allowed to express their thoughts and ask questions. The experimenter was also permitted to respond to them. This is different from the commonly used think-aloud method, which requires that the participants not be disturbed by the experimenter [83,84]. While the approach of allowing free talk can reveal doubts, judgments, and frustrations that cannot be obtained from eye-movement data directly [85,86], it is open to the criticism that verbalizing thoughts can distract participants' attention [49]. In addition, although the experimenter kept his response levels as low as possible during this experiment, these communications could also interfere with the participants' thought processes and affect the comparability between participants and the generality of the results. This raises the question of whether participant performance would be the same without the experimenter's help. Considering their dependency on the experimenter's responses, we would anticipate that if the participants completed the tasks silently, they would need longer response times and/or make more errors (given the same amount of time). Under this condition, they might be less confident and more frustrated about their decisions.
Although this experiment was a two-factorial design (2D/3D and male/female), the designed 3D map cannot be said to be comparable to the 2D map, for the two representations were not informationally or computationally equivalent [49]. The 3D map allowed the users to change viewing angles, for instance. Therefore, we abandoned direct comparisons between 2D and 3D. In future studies, it would be desirable to design maps that are comparable in order to test the effect of dimensionality.

We administered our eye-tracking experiment in the lab instead of in the real world because using street view in a lab-based setting could provide good control over the experiment [29]. However, fundamental differences between a simulated environment and a real environment could affect the nature of the results [87]. For example, evidence has shown that individuals navigating through 3D virtual environments were more likely to become disoriented [75,76]. Further experiments in situ or in an immersive virtual environment could fine-tune the results. In addition, our conclusions are for the most part limited by the small sample of participants and by the materials and tasks used in the experiment. A broader range of participants, materials, and user tasks is needed in future research to examine the robustness of the gender effects we observed in this study.

Summary and Future Work
Our reliance on navigation support in daily life drives the urgency to explore the potential for new cartographic designs that can enhance our spatial ability and improve performance in spatial decision making. This study explored an approach that combines salient 3D landmarks and 2D layouts into a 3D map to leverage the strengths of both. It was expected that when the 3D landmarks were emphasized on maps, participants would pay more visual attention to the corresponding landmarks in the environment during wayfinding in an effort to match these landmarks to those stored in memory. In addition, it was expected that the designed 3D map could improve performance for both males and females. We administered a lab-based eye-tracking experiment to test these hypotheses. Participants were required to learn a route from either a traditional 2D map or the designed 3D map and then to perform direction-pointing tasks. We observed no significant gender difference in general performance, but we observed a better performance (higher accuracy and shorter response time) with the 3D map than with the 2D map for males but not for females. Another finding was the opposing gender differences in visual attention on landmarks between the 3D and 2D groups. In other words, while males fixated longer on landmarks than females when using the 3D map, females paid more visual attention to landmarks than males when using the 2D map. These results are not conclusive because of the weaknesses of the experiment design discussed above. However, the exploratory study reported here contributes to the ongoing debate on the influences of 3D representations by providing new experimental evidence on 3D maps for wayfinding. The results can also be helpful in designing 3D representations to aid navigation.
Several constraints of the experiment design indicate potential directions for future research. For example, the dimension variable can be incorporated into the experiment to compare 2D and 3D directly. In addition to user performance (response time and accuracy) and general eye-movement parameters (e.g., fixation duration and fixation count), the visual behavior of different user groups can be explored in greater depth by visualizing and analyzing eye-movement data (e.g., [88,89]).

Figure 1. The designed 3D map of the experimental area (an oblique view). There are six 3D landmarks (LMs) for the pre-defined route including the start and the end points. The blue line indicates the pre-defined route.

Figure 2. The results of pre-evaluation: (a) the designed 3D map; (b) saliency map of the 3D map; (c) the 2D Google map; and (d) saliency map of the 2D map. Darker red means higher saliency, and light green indicates lower saliency. Refer to Figure 1 for the English version of the Chinese names of landmarks and streets in the map. The blue line indicates the pre-defined route.

Figure 3. Experiment stimuli: (a) designed 3D map; (b) 2D Google map; and (c) a screenshot of the street view at Crossroad 2 (C2). The blue line from A to B indicates the pre-defined route. C1, C2 and C4 are decision points that require re-orientation. C3 is a T-junction and requires confirmation of the route. Refer to Figure 1 for the English version of the Chinese names of landmarks and streets in the map. The picture of the street view was from Baidu Map (http://ditu.baidu.com) under free license for non-commercial use.

Figure 5. Illustration of generating dynamic AOIs of LM2 at Crossroad 1 for quantitative fixation analysis. The pictures were from Baidu Map (http://ditu.baidu.com) under free license for non-commercial use.

Figure 7. Summary of the verbal protocol. The number on each cell represents the number of occurrences of the corresponding coding category shown in Table 1.

Figure 8. Self-rating results of: (a) the ability to find a way to a destination in an unfamiliar environment; (b) the familiarity with electronic maps; and (c) the experiences using mobile maps for wayfinding.

Table 1. Code scheme of the dialogue between participants and experimenter.