Assessing Similarities and Differences between Males and Females in Visual Behaviors in Spatial Orientation Tasks

Spatial orientation is an important task in human wayfinding. Existing research indicates sex-related similarities and differences in performance and strategies when executing spatial orientation behaviors, but few studies have investigated the similarities and differences in visual behaviors between males and females. To address this research gap, we explored visual behavior similarities and differences between males and females using an eye-tracking method. We recruited 40 participants to perform spatial orientation tasks in a desktop environment and recorded their eye-tracking data during these tasks. The results indicate no significant differences between the sexes in the efficiency and accuracy of spatial orientation. In terms of visual behaviors, we found that males fixated significantly more on roads than females, while fixation counts on buildings, signposts, the map, and other objects were similar across the sexes, as were fixation durations for all five classes. Moreover, fixation duration was well fitted by an exponential function for both males and females: the base of the function fitted to males' fixation durations was significantly lower than that fitted to females', whereas no significant difference was found in the coefficient. Females switched more effectively from the map to signposts, but no differences were found in switches from the map to the other classes. The newfound similarities and differences between males and females in visual behavior may aid in the design of better human-centered outdoor navigation applications.


Introduction
Wayfinding is an almost daily operation defined as the process of determining a route, learning the route, and retracing or reversing the route from memory [1]. Spatial orientation (i.e., determining the direction one is facing in a given spatial reference) is a key task in wayfinding and is closely related to spatial self-localization (i.e., determining one's location). Both spatial orientation and self-localization are often performed at decision points. To facilitate these two tasks, external cues, such as signs, landmarks, and maps are frequently used during wayfinding.
The wayfinding process can be affected by the format of information presentation, such as landmark utility [2,3] and map representations [4]. In addition, human attributes play an important role in wayfinding behavior. Sex [5,6], age [7], and cultural background [8] are confirmed factors that can influence people's wayfinding process. The relation between wayfinding and human attributes could provide a reference for the improved human-centered design of navigation systems and wayfinding applications. Sex differences are a controversial factor influencing wayfinding and have been discussed extensively in the literature, but most studies focus on comparing the capacities of males and females during wayfinding. These studies are insufficient to determine the exact behavioral similarities and differences between the sexes in the wayfinding process.
Vision is a primary human sense and involves up to 40% of the brain's functional area [9]. In wayfinding tasks, people mainly find cues (e.g., signposts), match them with physical maps or the cognitive maps in their mind, and then make a decision. The ability to observe the environment is thus crucial for performing wayfinding tasks. However, visual behavior differences between males and females require further elaboration. Patterns of visual behavior may help clarify behavioral differences between the sexes and aid in the design of advanced navigation systems.
In this study, we asked participants to perform a series of spatial orientation tasks (i.e., determining which direction to go in a given scene using a map). After grouping participants into males and females, we analyzed the visual behavior differences in the following aspects: the fixation count for different categories of cues, the fixation duration for different categories of cues, the fixation duration distribution, and the switch times between the map and four categories of cues in the scene.
The remainder of this paper is structured as follows: In Section 2, we review related works on the perception of sex differences in wayfinding and visual behavior in the context of wayfinding; in Section 3, we describe the experimental design, including the participants, materials, and procedure; data processing and results are presented in Section 4; in Section 5, we discuss the research results; finally, in Section 6, we provide a conclusion about the results and discuss future potential works.

Sex-Related Research in Wayfinding
Wayfinding has always been an essential task for human beings and remains a key aspect of modern life. People behave differently in wayfinding tasks. Sex is considered an important factor that influences the performance of daily wayfinding tasks [10]. Many researchers have made considerable efforts to illustrate the relationship between sex and wayfinding.
A considerable amount of attention has been focused on the performance difference between males and females. Researchers have designed various experiments to study whether there are sex-related differences in the accuracy and efficiency of wayfinding. Existing work includes both real-world experiments [11] and virtual environments [12]. However, many controversial results have been reported. For example, Martens [12] found that females spent more time finding targets but did not take more detours than males. Lin [13] designed wayfinding tasks in a virtual environment and did not find conclusive evidence supporting the hypothesis that males perform better than females. Wang [14] also found no significant differences between males and females. Hegarty [15] found that males and females do not differ in accuracy or reaction time. Saucier [16] found that females perform best with instructions based on landmark information.
Scientists have also found that spatial anxiety and wayfinding strategies can be used to explain certain differences between males and females during wayfinding tasks. Lawton [17] reported that women feel more anxiety than men during wayfinding and that this increased anxiety leads to more mistakes [18]. Males and females adopt different navigation strategies. For example, Lawton found that males use an orientation strategy in finding their way, whereas females tend to adopt a routing strategy [5]. Females also rely more on landmarks than males, while males prefer a combination of landmarks and route description [19].
In-depth theories have been proposed to explain sex differences. A biological theory was proposed by many researchers to explain the differences [20]. These researchers suggested that due to genetic differences, males and females develop different abilities related to spatial cognition and wayfinding. Silverman hypothesized that the social division of labor between males and females during early human history caused these sex differences [21]. This theory proposes that females spent most of their time in domestic environments looking after children while males foraged for food far from home, which caused males to develop greater spatial cognition than females.
The disputed results, together with the complex factors proposed to explain them, nevertheless provide evidence that sex differences exist. Another aspect of sex differences in wayfinding discussed by researchers is how males and females interact with the environment, which is important for designing human-centered navigation aids that improve the wayfinding ability of both males and females.

Sex-Related Research in Visual Behavior
Sex differences in visual behavior have been studied in many non-spatial contexts, and a relationship between sex and visual behavior has been shown [22][23][24]. Sex also plays an important role in spatial cognition ability [6,25]. Specifically, it has been widely recognized that sex is related to differences in many aspects, such as wayfinding strategies [5] and arithmetical reasoning, and these differences have been verified using many approaches, including functional magnetic resonance imaging (fMRI) studies of the active areas of the brain during spatial tasks [26,27].
The academic literature on sex differences in visual behavior has revealed several patterns. For example, females are more likely than males to focus their visual attention on landmarks [16]. A greater number of landmarks helps females perform better in wayfinding tasks [28]; however, this effect may reverse depending on the map format. For example, Liao found that males focus more than females on 3D map landmarks, while for 2D maps, females focus their visual attention on landmarks to a greater extent than do males [29]. According to Lin [13], males perform better when using global landmark information and females when using local landmark information.
Several studies have also discussed sex differences in eye-tracking indices during wayfinding tasks. Visual attention to landmarks has been discussed substantially, and other eye-tracking indices also show differences between males and females. For example, Sven [30] found sex differences in fixation duration and pupil diameter during a spatial navigation task, and Valentina [31] found that the number of fixations and the fixation ratio differed significantly between males and females when planning a route. However, these studies are limited to pure eye-tracking data and do not consider semantic information about the environment.

Visual Behavior Studies on Wayfinding
Eye-tracking sensors have been widely used in the study of wayfinding to gain real insights into human behavior. During the wayfinding process, people obtain information about the scene, and wayfinding tools, such as maps, navigation aids, and other sources of information, are related to eye movement behavior. Compared to traditional methods for wayfinding studies, such as interviews [32], questionnaires [33], and drawing sketch maps [34], the eye-tracking method provides more objective and intuitive data for analysis. It is also a popular approach in the fields of spatial cognition and cartography.
A group of studies used eye-tracking methods to study the process of wayfinding. Visual attention, an index of fixation location, is the most commonly used index in eye-tracking research. For example, Kiefer [35] observed the different strategies employed by participants based on visual attention to maps and real scenes. Other studies attempted to find the start time of visual attention by using the first fixation on areas of interest (AOIs) [36] or whether fixations fall on AOIs throughout the tasks, which represents the efficiency of wayfinding. However, one limitation of visual attention measures is that they do not capture how people look at a scene while making decisions. Another type of eye-tracking index is fixation time; Viaene [37] used the average and maximum fixation times to rank the most fixated-upon landmarks in an indoor environment. The eye gaze shift is also a useful index in wayfinding studies. Wiener [38] calculated the gaze shifts between the center of the AOI and the image edge before a decision was made in a virtual environment. Kiefer [39] established a self-localization test in an urban environment and found that more switches between map labels and corresponding landmarks in the environment helped participants perform better in the tasks. Pupil size, as an indicator of cognitive load, can also be used to measure the wayfinding process [40]. For example, Condappa [41] found that more cognitively demanding configuration strategies were associated with increases in pupil dilation.
Given the context of wayfinding, visual behavior is defined by visual attention, fixation time, and gaze shifts between specific AOIs to simplify the environment. In reality, understanding how eye movements interact between a complex environment (which has roads, signposts, and buildings) and a map requires the consideration of both landmarks and non-landmarks, such as roads and cars, which may disrupt people's concentration. A complex assessment of the environment's effect on wayfinding is required based on available visual behavior studies on wayfinding.

Participants
Forty undergraduate and graduate students (20 females and 20 males) from Beijing Normal University participated in this experiment voluntarily. Their ages ranged from 20 to 30 (mean (M) male = 24.54, standard deviation (SD) male = 2.32; M female = 22.56, SD female = 2.32). All participants had normal or corrected-to-normal vision, a basic knowledge of map reading, and experience using electronic maps for pedestrian navigation. None had prior knowledge of the experimental materials. The eye-tracking sampling rate (proportion of valid gaze samples) exceeded 70% for every participant. Each participant was paid 15 yuan as compensation for participating in the experiment. The experiment was reviewed and approved by the local institutional board of the authors' university, and all participants provided written consent.

Apparatus
The eye tracker used in our experiment was a Tobii T120 (Tobii AB, Sweden; www.tobii.com). This eye tracker has a tracking accuracy of 0.5° and a sampling rate of 120 Hz. Tobii Studio v3.2 software was used to process the eye movement data. The experimental materials were presented on a 17-inch monitor with a resolution of 1280 × 1024. The experiment was conducted in an eye-tracking laboratory with sufficient light and a noise-free environment.

Materials
We designed 20 spatial orientation tasks for the experiment. In each task, an image was shown on the screen (Figure 1). The top half of the image was a scene of an urban environment that was captured with Baidu Street View (http://ditu.baidu.com). The scene showed a junction of two or more roads and included landmarks, such as buildings and signposts. The text on the signposts and buildings was clear enough to be recognized. There were several candidate arrows (e.g., A, B, C, and D) in the scene indicating different turning directions at the junction. The bottom half of the image comprised an electronic map from the internet corresponding to the scene. On the map, a route with a turn at the junction was labelled. The start and endpoints were also shown on the map. All selected scenes were located in Nanjing and Shanghai, China. None of the participants were familiar with these places. Both the street views and maps were provided by Baidu Map (https://map.baidu.com).

Procedure
The participants were first welcomed and briefly introduced to the experiment. Their eyes were then calibrated using five-point calibration methods to ensure tracking accuracy. The participants then read task instructions and started with an example task. The instructions were as follows: Imagine that you are following the route on the map (lower) to go to your destination. Currently, you are standing at the turning point. A picture of the turning point is presented on the upper half of the screen. Four directions (arrows), labelled A, B, C, and D, are shown in the picture. Please read the map and the picture carefully and decide which direction you should turn towards. When you have made a decision, click one of the labels. You have 90 s for the task, and if you do not make a decision when time is up, then the task will be skipped automatically.
After the example task, the participants were required to accomplish another 20 tasks similar to the task shown in Figure 1. Half of the participants (10 males and 10 females) completed the tasks in sequence; for the other half, the sequence of the tasks was reversed. The participants then completed a questionnaire regarding their demographic information, such as age, university major, and experience using maps in wayfinding. The experiment took approximately 15 min per participant.

General Performance
General performance in wayfinding can be assessed in terms of efficiency (response time) and accuracy. We used the number of correct answers and the completion time over all tasks to measure accuracy and response time, respectively. t-tests were applied to test the significance of sex differences in these two metrics. We also calculated Cohen's d [42] as a metric of effect size, where 0.3 to 0.5 was considered small, 0.5 to 0.8 medium, and above 0.8 large. The results are shown in Table 1.
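The group comparison described above can be sketched in a few lines of Python: a t-test on the two groups' scores plus a pooled-variance Cohen's d. The scores below are illustrative placeholders, not the study's actual data.

```python
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of the two groups."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                        / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled_sd

# Hypothetical numbers of correct answers (out of 20 tasks) per participant
male_scores   = [18, 17, 19, 16, 18, 20, 17, 18, 19, 16]
female_scores = [17, 18, 18, 17, 19, 18, 16, 19, 18, 17]

t, p = stats.ttest_ind(male_scores, female_scores, equal_var=False)
d = cohens_d(male_scores, female_scores)
print(f"t = {t:.3f}, p = {p:.3f}, Cohen's d = {d:.3f}")
```

The same pattern applies to the completion-time comparison; `equal_var=False` selects Welch's t-test, which does not assume equal group variances.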

Image Segmentation
It is important to determine which objects the participants looked at. Traditionally, manually drawn AOIs are used in many eye-tracking studies. In contrast, we annotated fixations using semantic image segmentation, which divides an image into different areas more efficiently and with high accuracy. We used Deeplabv3+ [43], an open-source image segmentation model based on a deep neural network, to segment the images automatically. An example is shown in Figure 3. Trained on the Cityscapes dataset [44], Deeplabv3+ classifies each pixel into one of 19 object classes, such as road, building, sky, and vegetation. It achieved an accuracy of 82.0% and was ranked sixth in the Cityscapes pixel-level semantic labelling task (https://www.cityscapes-dataset.com/benchmarks/#scene-labelingtask) on 30 August 2018. After semantic image segmentation, we overlaid fixations on the segmented images and annotated them based on the corresponding object classes. We then reclassified the original 19 object classes into four areas of interest (AOIs; Table 2) [35]. The first AOI category was road, which included sidewalks and roads. Roads can provide geometrical and topological structure information for wayfinding, yet few studies have examined visual behavior regarding roads, so we classified road as its own AOI. The second AOI category was building. Buildings are a commonly used AOI in wayfinding studies, as people often utilize them as landmarks to assist with self-localization and orientation in wayfinding.
Prior studies have suggested significant differences between males and females in the use of buildings during wayfinding; for example, Liao found that males fixated on buildings for shorter durations than females [45]. The third AOI category was signpost, which indicates road names and directions and is an essential landmark for spatial orientation. The fourth AOI category was other objects, which included vegetation, vehicles, and people; these objects are less useful for spatial orientation, but it is interesting to explore whether they affect males and females differently. Finally, map was regarded as a single AOI category used to elucidate the interaction between the map and the environment.
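The fixation-annotation step can be sketched as follows: each fixation's pixel coordinate is looked up in the segmentation mask, and the raw class id is mapped onto the five AOI categories. The class ids, the AOI grouping, and the split between the scene and map halves of the stimulus are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np

# Hypothetical mapping from segmentation class id to AOI category
CLASS_TO_AOI = {
    0: "road", 1: "road",          # road, sidewalk
    2: "building",
    5: "signpost", 6: "signpost",  # pole, traffic sign
    8: "other", 13: "other",       # vegetation, car
}

def annotate_fixations(fixations, seg_mask, map_region_top):
    """Label each (x, y) fixation with an AOI class.

    Fixations with y >= map_region_top fall on the map half of the
    stimulus and are labelled 'map'; the rest are looked up in the mask.
    """
    labels = []
    for x, y in fixations:
        if y >= map_region_top:
            labels.append("map")
        else:
            labels.append(CLASS_TO_AOI.get(int(seg_mask[y, x]), "other"))
    return labels

# Toy 4x4 mask: top rows are the scene (road=0, building=2), bottom is map
mask = np.array([[0, 0, 2, 2],
                 [0, 0, 2, 2],
                 [9, 9, 9, 9],
                 [9, 9, 9, 9]])
print(annotate_fixations([(0, 0), (3, 1), (1, 3)], mask, map_region_top=2))
# → ['road', 'building', 'map']
```

In practice the mask would come from running Deeplabv3+ on each street-view image, with fixation coordinates exported from Tobii Studio.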

Fixation Count and Fixation Duration
We first tested the sex similarities and differences in the mean fixation count and mean fixation duration on the five types of AOI. The results are shown in Table 3.
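The per-AOI aggregation behind these statistics can be sketched minimally: given one participant's fixation records as (AOI label, duration) pairs, compute the fixation count and mean fixation duration for each AOI class. The record format is an assumption for illustration.

```python
from collections import defaultdict

def aoi_statistics(fixations):
    """Return {aoi: (fixation_count, mean_duration_ms)} for one participant."""
    durations = defaultdict(list)
    for aoi, duration_ms in fixations:
        durations[aoi].append(duration_ms)
    return {aoi: (len(ds), sum(ds) / len(ds)) for aoi, ds in durations.items()}

# Hypothetical fixation records for a single participant
sample = [("road", 220), ("map", 340), ("road", 180),
          ("signpost", 260), ("map", 300)]
per_aoi = aoi_statistics(sample)
print(per_aoi["road"])  # → (2, 200.0)
print(per_aoi["map"])   # → (2, 320.0)
```

Computing this per participant yields the group means compared by the t-tests above.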

Fixation Duration Distribution
The mean fixation duration reflects the average but provides no distribution information. The fixation duration distribution reflects how people distribute their attention: a long duration on a certain area indicates a high level of interest or higher processing difficulty, while a short duration indicates that certain information is valueless or easy to process. It has been empirically corroborated that eye gaze duration obeys an exponential distribution [46,47]. Here, we fit the fixation durations of every participant with the formula y = ab^x, where y is the fixation count and x is the fixation duration. Parameter b is the base of the function, which controls the rate at which y changes with x; parameter a is a coefficient defining the magnitude of y. The comparison of the fitted parameters between males and females is shown in Table 5 and Figure 5: a for males (M = 216.732, SD = 56.789) is higher than that for females (M = 203.743, SD = 43.287), but the difference is not significant (t = 0.134, p = 0.860, Cohen's d = 0.257), whereas b for males (M = 0.951, SD = 0.543) is significantly lower (t = 1.92, p = 0.038, Cohen's d = 0.037) than that for females (M = 0.968, SD = 0.357). Furthermore, we explored the fixation duration distribution for different AOIs (Figure 6A-E). For building and other objects, males and females showed similar fitted curves, while for road, signpost, and map, females had a greater fixation count and a lower fixation duration than males. The total fixation duration distribution (Figure 6F) also shows that females had more low-value fixation durations than males. Males also had a broader fixation duration range in the building category: males' fixation durations reached up to 1400 ms, whereas females' reached approximately 1040 ms.
For road, signpost, map, and other objects, females showed a wider fixation duration range than males, with maximum fixation durations of 2100 ms, 2300 ms, 1750 ms, and 2500 ms, respectively.
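The exponential fit y = ab^x can be reproduced with a nonlinear least-squares routine such as SciPy's `curve_fit`. The histogram below is synthetic, generated from known parameters so the fit can be checked; the study's real input would be each participant's fixation-duration histogram.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_model(x, a, b):
    """y = a * b**x, the fixation-duration distribution model."""
    return a * np.power(b, x)

# Synthetic histogram: fixation-duration bin centres (ms) vs. counts,
# generated from a = 200, b = 0.996 with a little noise
rng = np.random.default_rng(0)
x = np.arange(100, 1500, 100, dtype=float)
y = 200.0 * 0.996 ** x + rng.normal(0, 2, x.size)

# Initial guess p0 keeps the optimizer in the b < 1 decay regime
(a_hat, b_hat), _ = curve_fit(exp_model, x, y, p0=(150.0, 0.99))
print(f"a = {a_hat:.1f}, b = {b_hat:.4f}")
```

A smaller recovered base b means the fixation count decays faster as duration grows, i.e., a heavier concentration of short fixations, which is how the sex difference in b reported above can be read.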

Attention Switches between AOIs
Attention switches between the map and the scene involve the processes of localization and orientation. Table 6 and Figure 7 show the attention switches from the map to road (map-road), building (map-building), signposts (map-signpost), and other objects (map-others).

Discussion
We focused on sex similarities and differences in visual behaviors in spatial orientation tasks using an eye-tracking method, which differs from previous studies that emphasized wayfinding skills or performance. Our results show no significant sex differences in the accuracy or efficiency of spatial orientation tasks. Below, we discuss the results with respect to sex similarities and differences in visual behaviors.
We concentrated on identifying the visual behavior similarities and differences between males and females based on AOIs. In our experiment, males looked at roads significantly more than females, reflecting a male tendency to focus on the geometrical cues that roads provide [48]. In contrast, females tend to seek landmarks that confirm their direction and position, which is consistent with previous studies showing that females rely more on landmarks to understand how to move from place to place [16,28]. Moreover, males and females show similar fixation counts for buildings, the map, signposts, and other objects, and similar fixation durations across all five classes.
The fixation duration distribution in our study also revealed sex-associated similarities and differences. Although previous studies discussed the fixation duration distribution in other tasks [49], little research has examined this pattern in wayfinding processes or the differences between males and females. Nevertheless, this feature is of interest for determining how males and females distribute their limited visual attention among the abundant information sources available in a street scene within a given time. In our spatial orientation tasks, the goodness-of-fit results indicated that the fixation duration obeys an exponential distribution. This distribution indicates that most information can be processed within a short time, while key information receives longer, more focused visual attention. Females show more short-duration fixations for road, signpost, and map, indicating that they tend to use more short-duration processing for these three types of objects than males. Females also have a broader fixation duration range for the road, signpost, and map categories, indicating that they sometimes spend a longer fixation processing these objects. In contrast, males are prone to focus their attention for a longer duration on buildings.
In addition to the fixation-related eye-tracking indices, we also found sex similarities and differences in attention switches between the map and the other types of AOIs. Fixation shifts between the map and the street image reflect the matching process, which is itself an essential eye-tracking behavior. The results show that females make more switches from the map to signposts, again reflecting that females rely more than males on landmarks to match the map to the environment. Meanwhile, males and females show similar switch counts between the map and the other categories.
Our results fill a gap in wayfinding research from the human perspective and provide new evidence of visual behavior similarities and differences between males and females. These findings can aid in the improved design of human-centered navigation applications, especially as technologies such as eye-tracking sensors and augmented reality are introduced into navigation systems in the near future. The present study may provide a useful reference to guide the development of such systems. For example, given that females make more attention switches between the map and landmarks than males, it can be inferred that females have an advantage in using landmark information; a navigation system could therefore emphasize landmarks such as signposts in the environment for female users.
There are certain limitations to our study. We used a desktop environment instead of a real environment. Because our focus was on spatial orientation tasks, it was unnecessary for the participants to complete an entire wayfinding process that required walking around; the tasks were set at decision points, where people often lose their way. In addition, a static desktop environment allows the scene to be easily controlled and enables a fair comparison between males and females. However, the desktop environment differs from a real, dynamic environment that includes noise and movement, which may have caused us to underestimate participants' fixations on cars or pedestrians compared with real-world experiments. A related point of the experimental design is that the image environment allowed participants only a fixed perspective, which provided limited information. In addition, we used the DeeplabV3+ model to segment the images; although this is a fast way to classify each pixel, several points need further improvement. First, AOIs produced by image segmentation differ from manually drawn AOIs, which are typically enlarged, so some fixation points might be misclassified; a more accurate classification method would be desirable in the future. Another shortcoming is that our method could not resolve the spatial relationships between overlapping objects, which leads to uncertainty about which object a participant was truly looking at when fixations are overlaid on segmented images. For example, when a fixation falls on a car, the participant may actually have intended to look at the road behind it; the uncertainty arises from the overlapping relationship between the car and the road. It is therefore important to determine which objects fixation points truly belong to from the participant's perspective.

Conclusions
In this article, we discuss sex similarities and differences in visual behaviors using an eye-tracking method. We divided the street scene into four categories according to their guiding features. In contrast to many existing discussions of sex similarities and differences in wayfinding, our study focused on visual behaviors, measured by selected eye-tracking indices: fixation count, fixation duration, fixation duration distribution, and switches between AOIs. We found that males and females do not differ significantly in the number of correct answers or in task completion time. According to the visual behavior analysis, males seem to be more interested in roads, while females switch more often between the map and signposts. In addition to these differences, we also identified several similarities in fixation duration, fixation count, and fixation switches between males and females. Our results reveal eye-tracking feature patterns of males and females, which may help us understand their different behaviors in spatial orientation.
It should also be noted that we describe a series of visual behaviors in this study, but no evidence indicates how these visual behaviors are related to task efficiency and accuracy. Further research must be conducted to determine how these visual behaviors affect the decision-making process, and these future works might help us to enhance spatial abilities. The combination of computer vision methods and eye-tracking methods is a potential trend in the study of spatial cognition in the future. Further research on the eye-tracking method could be conducted to identify which objects the participants really want to see instead of relying on the direction of fixation. This advance could be realized by comparing the commonalities and differences between computer vision and human vision.