Evaluation of the Cartographical Quality of Urban Plans by Eye-Tracking

This paper describes a study evaluating the cartographic quality of urban plans in the Czech Republic using eye-tracking. Although map visualization is a crucial part of the urban planning process, only a few studies have focused on the evaluation of these maps. The plans of four Czech cities with different styles of visualization and legends were used in this eye-tracking experiment. Respondents were required to solve spatial tasks consisting of finding and marking a certain symbol on a map. Statistical analyses of various eye-tracking metrics were performed, and the differences between experts and students and between the map and legend sections of the stimuli were explored. The results showed that the quality of map symbols and the map legend significantly influences the legibility and understandability of urban plans. For correct decision-making, it is essential to produce maps according to certain standards, to make them as clear as possible, and to perform usability testing on them.


Introduction
A key part of any urban planning process is its final output, visualized as a map. Urban and regional planning would be completely chaotic without maps [1]. Churchill [2] stated that, although the term "urban cartography" only came into use after the Second World War, it is plausible to consider town plans as the oldest forms of urban maps. Because a plan is also a map, cartographic and geoinformation principles should also be applied to it [3].
Spatial planning policy documents in Europe involve a symbolic representation of the territory in the form of icons, diagrams, and maps. Cartographic visualization, or the conceptualization of territory, is an integral part of spatial planning [4]. Dühr [4] stated that little research has been undertaken on the use of cartography in planning. Jarvis [5] even commented that planning theory hardly touches "drawing" at all. Neuman [6,7] for Spain, Lussault [8] for France, and Gabellini [9] for Italy, for example, investigated the communicative potential of visualizations in urban planning. Söderström's work [10] has concentrated on understanding how the structure of visualizations influences the activities of planners in Swiss cities (Bern and Zurich).

Urban Plan Standardization
The cartographical aspects of urban plans are largely shaped by standardization processes. Map symbol standardization received early attention from academic and practicing cartographers, beginning more than 150 years ago [11]. According to Robinson [11], the first printed discussion of map symbol standardization was by Funkhouser [12]. Symbol standardization was applied to economic maps, topographic maps, and transportation maps, mostly during the 1970s, but urban and regional planning maps were not included. In the case of map symbology for regional planning, many opinions and completely different graphical outcomes can be found [13].
The standardization of urban (master) and regional plans has been investigated by many authors, for example, in Poland [14], Hungary [15], and Norway [16]. The relevant documents adopt approaches ranging from complete indifference to very strict rules for the map symbology set. For example, the Polish standard assigns colors to various types of land use, and these colors differ greatly from those commonly used in Czech urban planning. This discrepancy can lead to misunderstandings along the border, where plans from the two countries adjoin [17].
Urban planning has a long tradition in the Czech Republic. Urban plans have been created since the 1930s, and the importance of maps as part of the urban plan has increased. Most of the maps included in urban plans have become more detailed and more complicated, which has reduced their understandability. These changes were visualized and described by Burian et al. [18] using the example of five urban plans for the city of Olomouc (1930, 1955, 1985, 1999, and 2010). Plans with more complicated structures and content led to a process of standardization. The Construction Act [19] can be seen as the first methodological approach. During the last 20 years, and in connection with the implementation of GIS (Geographical Information Systems) techniques, many methodologies have appeared (e.g., [20]). Although the public sector has authored methodologies, many regional methodologies have also been created by private companies, such as Hydrosoft Veleslavín [21,22] or T-MAPY [23,24]. However, none of these methodologies address the creation of map symbology in detail. Several Czech companies have attempted to create a standardized set of map symbols for urban plans in connection with regional methodologies, but none of these symbol sets have been published.
Currently, no uniform approach to the cartographic visualization of urban plans exists in the Czech Republic [17]. Without sufficient standardization, the author-designer makes many decisions about the visual appearance of the symbology. This can be an issue because the designer might not have sufficient experience to make these decisions. Flaws might therefore arise in map composition, map symbology, and other important cartographic aspects, reducing legibility and understandability and even leading to misunderstandings. Many of these failures connected with missing standardization have been discussed by the authors of this manuscript in previous publications (e.g., [3,13]). The authors applied subjective research methods to more than 50 Czech urban plans and concluded that many technical and cartographical failures can be observed in most current plans. Wrong decisions can arise especially when plans are compared or used in cross-border tasks. The authors also concluded that the cartographic quality of urban plans and the lack of unified standardization are significant issues that should be solved. These conclusions were accepted by only some of the urban planners involved; the rest requested that the subjective results be confirmed using a more objective approach.
For this reason, the authors suggested using a new symbology created in cooperation between Palacký University in Olomouc and the Regional Authority of the Olomouc Region (see [3,17] for details). The authors considered four aspects of symbology creation: users, cartographic correctness, established conventions, and the use of a data model. The advantage of this standardized symbology is that urban plans in different districts look uniform and can be compared. This symbology is currently used in seven of the eleven districts in the Olomouc Region.

Urban Plans Evaluation
According to Štěrba et al. [25], the quality of cartographic visualization influences users' judgement and, subsequently, their decisions. The overall usability of the map (as a strictly objective criterion) should be the determining aspect when evaluating the quality of cartographic visualization.
Various methods can be used to analyze maps and cartographic visualization in spatial planning. Rohrer [26] presented "A Landscape of User Research Methods", in which various user-experience research methods are shown on a three-dimensional framework with the following axes:

• Attitudinal vs. Behavioral (Subjective vs. Objective)
• Qualitative vs. Quantitative
• Context of Use

This scheme has been modified and is shown in Figure 1. According to this distribution [26], eye-tracking is considered a behavioral (objective) method because it shows "what people do" instead of "what people say". From the qualitative/quantitative point of view, eye-tracking lies in the middle, which means that the recorded data can be analyzed both qualitatively and quantitatively. From the perspective of "context of use", eye-tracking experiments can be designed as natural (in the real world, especially using eye-tracking glasses) or lab-based. The present case study of urban plans used a lab-based design.
ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 3 of 25
Dühr [4,27] described several approaches to analyzing cartographic visualization in urban planning ([10,28–30]). Pickles [29] suggested an approach to map analysis not unlike discourse analysis, which treats maps as an expanded concept of text. Two interrelated internal structures of the map are considered: graphical and linguistic. According to Dühr [4], Söderström [10] proposed what he called a "visual circuit" for the analysis of spatial representations in planning. This visual circuit consists of four interrelated fields: the context of elaboration, the process of production, the context of usage of the visualization, and the "materialization" or implementation.
Based on these methods, Dühr [4,27] introduced a method of map analysis for strategic spatial plans in Germany, England, and Denmark. Because neither Pickles nor Harley provided a detailed list of criteria for analysis, Dühr [31] suggested three categories for a comparative analysis of cartographic representation: the level of abstraction, the level of complexity, and the use of associative colors and symbols "on the map". Each category consists of several criteria for analyzing cartographic representation (spatial positioning, visual hierarchy, complexity, map symbols, map style, etc.). Most of the selected criteria rely on qualitative assessment, which can be highly subjective, and applying them to maps from different regions can be difficult. However, differences in the approach to mapping can be seen even between planning systems within a single "planning tradition". Dühr's findings show that the function of a plan ultimately determines the form and style of its visualization. A clear difference is evident between the comprehensive, regulatory approach to planning in the German and Dutch plans and England's less formal approach [17].
Another comparative analysis is that of Tang and Hurni [32], who compared the plan contents, symbolic modalities, and visual styles of plans in China and Switzerland at different planning tiers. They concluded that the symbol system in the Chinese cases has better logic and hierarchy but lacks the harmonious and refined visual quality of the Swiss cases. In general, the vertical comparison of the visual styles of planning maps within the two countries' planning systems shows, to a certain extent, a substantial difference between the federal and other levels in Switzerland, and a resemblance between the visualizations of thematic layers at all tiers in China despite the existence of diverse symbol faces. Next, the transverse cross-comparison between the two countries indicates that the cartographic style of Swiss federal spatial concepts is analogous to that of urban system plans above the prefectural level in China, whereas Swiss cantonal plans differ widely from Chinese provincial plans [32].
The qualitative research described by Dühr [27,31,33], Tang and Hurni [32], Söderström [10], and Pickles [29] has not determined how people behave when solving tasks using urban plans. These authors compared several spatial plans (plans at the regional, country, national, or transnational level) using qualitative methods and described their similarities and differences, but did not evaluate cartographical quality.

Eye-Tracking
For the reasons mentioned in Section 1.2, a behavioral research method (eye-tracking) was adopted to analyze urban plans as objectively as possible. Eye-tracking is one of the most precise and objective methods of usability studies because eye-movement recording does not rely on self-reporting [34]. Eye-tracking makes it possible to gather information that is inaccessible with other techniques, particularly when people's behavior while solving tasks is difficult to observe by any other method (e.g., how much time participants spend on different sections of urban plans, such as the map or legend). Other aspects of people's behavior, such as Trial Duration or the correctness of answers, can be investigated using other methods, for example, direct observation or screen recording. With these methods, however, it would be much more difficult to determine what participants were really doing.
One of the first comprehensive publications to address the application of eye-tracking in cartography was that of Steinke [35], who summarized the results of previous research and stressed the importance of distinguishing between user groups according to their age and education. More recent studies using eye-tracking in cartography have focused on evaluating cartographic principles [36], interactive maps [37], small multiple map displays [38], graphical outputs from GIS [39], the differences between experts and novices [40], map uncertainty [41,42], 3D visualization in maps [43,44], and color schemes [45].
The examination of users' perception of static maps is closely related to the examination of their reactions to changes in visual variables. The first cognitive studies in cartography focused on the effectiveness of symbols used in thematic maps. An example of such a study is Taylor's work [46], in which graphical dimensions of symbols such as length, area, or color were investigated. Eye-tracking was used to investigate visual variables, for example, in the classic study of Garlandini et al. [47], who investigated the influence of changes in four visual variables: size, color value, color hue, and orientation. Petchenik [48] stated that, for a successful transfer of information between a cartographer and a map reader, the reader must understand the map in the same way as the author intended. The task of cognitive cartography is to reveal how users read the individual map elements and how the meaning assigned to these elements varies between users.
Although cartographic user research on visual variables has a long tradition, the investigation of complex maps is relatively unexplored. A clear example of a complex map containing many different symbols is a geological map. Two legend designs (alphabetically ordered and color-ordered) for soil-landscape (geological) maps were compared in the study of Coltekin et al. [49]. Similar to geological maps, urban plans contain many layers represented by various symbols. To our knowledge, neither eye-tracking nor any other user-experience method has previously been used to evaluate urban plans. Only a few studies have addressed landscape perception or specific urban planning issues, such as flood maps [50]. One example is the study of Dupont et al. [51], who analyzed the observation patterns of 23 participants viewing different types of photographs of landscapes in Flanders (Belgium). The authors tested whether the degree of openness and heterogeneity of a landscape affects the observation pattern, and the analysis clearly revealed that both landscape characteristics have an influence. Kim et al. [52] analyzed nightscape (night-time landscape) images using traditional survey methods (preference survey) and eye-movement analysis, and found a significant relationship between the results of the preference survey and the recorded eye-movement data. Noland et al. [53] surveyed the visual preferences of 20 participants with eye-tracking, using a set of 40 images. The aim of the study was to qualitatively evaluate how individuals process and rank images of public settings for urban planning. For the data analyses, Areas of Interest (AOIs) were marked around important parts of the images (cars, buildings, sidewalks, etc.). Time to First Fixation, Time Spent (Dwell Time), and Fixation Counts were investigated together with the qualitative ranking. The results showed that cars, parking, and advertisements are associated with negative rankings but attract participants' attention. All these studies used photographs as stimuli, and no study focusing on the eye-tracking evaluation of urban plans can be found.
In the study described in this paper, we focused on the analysis and evaluation of the cartographical quality of selected urban plans in the Czech Republic. For this purpose, the following hypotheses were formulated and investigated using the eye-tracking method:

1. Map symbology (number of colors, map symbols, and features/layers on the map) significantly influences the legibility and understandability of plans, which will impact the duration and correctness of the tasks.
2. Legend structure significantly influences the legibility and understandability of the plans, which will impact the number of fixations and length of dwell time in the Legend AOI.
3. Differences between students and experts in how these groups read plans will impact the duration and correctness of the tasks.

Materials and Methods
This section contains three main parts. First, the study's design is described, along with how the data were prepared for analysis (fixation identification). Finally, the methods used for the analyses are explained.

Selection of Plans
In the Czech Republic, "Act No. 183/2006 Coll., the Construction Act" [54] provides two main urban and regional planning tools: "Analytical Material for Planning" and "Planning Documentation". Planning Documentation (regional plans and master plans) must be created, updated, and published online for each region (14 regions in the Czech Republic) and each municipality (6258 municipalities). The master plan is a set of specific text and graphic documents that regulate and propose construction in a designated area. The graphical part consists of several maps visualizing different aspects of city planning, for example, the zoning plan, land-use limits map, water and waste management map, nature protection map, utility networks map, and transportation map. Each map consists of many thematic layers that are not easy to visualize together, even at a large scale of 1:5000.
This study considers urban plans published using web applications. To analyze comparable urban plans, it was necessary to choose applications with similar layouts (a large map, a legend on the right-hand side, and no additional map features). Four urban plans created by different authors, using different styles and published in different years, were selected. The oldest plan is that of Jihlava (1999, revised in 2013). The Hradec Králové plan was created in 2012, and the Bohumín and Olomouc plans were created in 2014. All plans were created by private companies, each of which adopted its own approach to visualizing the urban plan (none of the methodologies described in the Introduction was used).
Each plan represents a different map style. Map styles have been investigated by many authors (e.g., [55–57]). According to Beconyte [55], style in modern cartography can be defined as a set of parameters, some of which are determined by the map scale, theme, and general purpose, whereas others are subject to the designer's free choice. Beconyte [55] defined the main parameters that allow a map's style to be defined: decorativeness, expressiveness, and originality. To select plans with different map styles, we used expressiveness (composition; proportion and colors of map symbols and text) as the main parameter.
The tasks were designed as static views, so it was not possible to move the maps. The advantage of this approach is that all maps had the same scale, and the eye-tracking data and answers of all respondents could be compared directly. However, it was not possible to use plain images as stimuli in the eye-tracking experiment, because the legends of all the maps were longer than the height of the image. We therefore created an HTML page for each city/task, consisting of an image of the static map on the left and the legend in a scrollable window on the right. The resulting pages looked exactly the same as the original webpages with the plan; the legend could be scrolled, but the map was static. The same zoom level (scale 1:5000) was used for all plans because the printed version of each plan is also available at this scale. At this scale, only a small part of each city was displayed on the screen, which helped to eliminate the influence of participants' prior knowledge of the cities.
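As a rough illustration of why a 1:5000 view shows only a small part of a city, the ground distance covered by a given number of screen pixels can be estimated. This sketch assumes the OGC standardized rendering pixel size of 0.28 mm (the convention used by WMS/WMTS web maps); the physical pixel size of the monitor used in the experiment is not reported, and the 1500 px map-section width in the example is a hypothetical value.

```python
# Assumption: OGC "standardized rendering pixel" of 0.28 mm, not the
# measured pixel pitch of the monitor used in the experiment.
STANDARD_PIXEL_SIZE_M = 0.28e-3


def ground_extent_m(pixels: int, scale_denominator: float = 5000.0,
                    pixel_size_m: float = STANDARD_PIXEL_SIZE_M) -> float:
    """Return the ground distance (in metres) spanned by `pixels` screen
    pixels at a map scale of 1:scale_denominator."""
    return pixels * pixel_size_m * scale_denominator


if __name__ == "__main__":
    # Hypothetical map-section size; the legend occupies the rest of the 1920 px width.
    width_m = ground_extent_m(1500)
    height_m = ground_extent_m(1200)
    print(f"{width_m:.0f} m across, {height_m:.0f} m high")
```

Under these assumptions, a 1500 × 1200 px map section at 1:5000 covers roughly 2.1 km × 1.7 km on the ground.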
Figure 2 shows an overview of the stimuli used. Tasks Q1 and Q3, which were used in our analysis, were omitted from this overview because these tasks were given on the same plan as Q2 (Zoning map). Plans in higher resolution can be accessed via www.eyetracking.upol.cz/urban, where all the stimuli used in the eye-tracking experiment are displayed. All the stimuli were prepared at a resolution of 1920 × 1200 px.

Selection of Tasks
To cover the most typical tasks during standard work with an urban plan (e.g., finding a new place for housing or identifying areas with proposed public services), the analysis of the urban plans was based on six tasks. There are many different taxonomies of tasks (see [58] for a detailed description). According to the commonly used objective-based taxonomy of Wehrend and Lewis [59], we used the simplest task category (Identify). To prevent any misunderstandings, the tasks were designed to be as simple as possible. The tasks focused on point, line, and polygon map features to cover as many cartographical symbols as possible. One task (Q3) involved proposals rather than existing conditions. The following questions were asked:

In total, six standalone webpages (one per task) were obtained for each city. An SMI RED 250 eye-tracker (developed by SensoMotoric Instruments, Berlin, Germany) with a sampling frequency of 250 Hz was used in the study, and the experiment was created in SMI Experiment Center. Stimuli were presented on a monitor with a resolution of 1920 × 1200 px. At the beginning of the experiment, the respondents completed a short questionnaire about their age and experience, and calibration was then performed. We set a calibration threshold of 1° of visual angle. Respondents with a higher deviation were excluded from the results, along with those whose Tracking Ratio (the proportion of time that the eye tracker recorded point-of-gaze coordinates over the entire experiment) was lower than 90%. After calibration, respondents were informed about the purpose of the experiment and provided with some basic information about urban planning. The tasks were then presented. No time limits were set for respondents to read and remember the task. After the respondent pressed the F2 key, an Internet browser opened automatically and displayed a webpage with one of the stimuli. The tasks were presented sequentially from Q1 to Q6, and the order of the city plans was randomized for each task. At the end of the experiment, a short questionnaire about the respondents' subjective opinions on the presented plans was displayed. This questionnaire was created in Google Forms and contained an image of a representative plan for each city (to remind respondents how it looked) and a scale from 1 (best) to 5 (worst). The experiment lasted ten to fifteen minutes. A diagram showing the experiment's design is in Figure 3.
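The presentation order described above (tasks in a fixed sequence from Q1 to Q6, city plans randomized within each task) can be sketched as follows. This is an illustration under stated assumptions, not the SMI Experiment Center configuration: it assumes each task was shown once on each of the four plans and that the plan order was drawn independently for each task and respondent; the function name and the per-respondent seed are hypothetical.

```python
import random
from typing import List, Optional, Tuple

TASKS = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"]
CITIES = ["Bohumín", "Hradec Králové", "Jihlava", "Olomouc"]


def presentation_order(seed: Optional[int] = None) -> List[Tuple[str, str]]:
    """Return (task, city) pairs: tasks in fixed order, cities shuffled per task.

    Assumption: every task is shown on all four plans; the per-task shuffle
    illustrates "randomized for each task" and is not the authors' script.
    """
    rng = random.Random(seed)  # hypothetical: one seed per respondent for reproducibility
    order: List[Tuple[str, str]] = []
    for task in TASKS:  # Q1 ... Q6 always in sequence
        cities = CITIES.copy()
        rng.shuffle(cities)  # independent plan order within this task
        order.extend((task, city) for city in cities)
    return order


if __name__ == "__main__":
    for task, city in presentation_order(seed=7)[:4]:
        print(task, city)
```

Fixing the task sequence while shuffling plans within each task keeps task difficulty comparable across respondents while spreading any order effects evenly over the four plans.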

Participants
Thirty-four respondents participated in the experiment. Eight were excluded from the analysis because of inaccurate calibration or an insufficient Tracking Ratio. The SMI RED 250 used in the study is a remote eye-tracker, and the errors were caused by drooping eyelids and by participants moving towards the screen. Quality was preferred over quantity, so participants whose calibration deviation exceeded 1° of visual angle or whose Tracking Ratio was below 90% were excluded. This left 26 respondents. Twenty were students (11 male and 9 female) in the third year of the Geoinformatics and Cartography bachelor's program or in the Geoinformatics master's program. Students formed the majority of the sample because they had completed courses in urban planning and cartography and generally have a basic knowledge of urban plans and cartography. This ensured that their knowledge in these areas was at similar levels and that the results would be comparable.
Six respondents (1 male and 5 female) were experts working in the urban planning departments of the Municipality of the City of Olomouc or the Regional Authority of the Olomouc Region. These experts work with urban plans every day and have similar backgrounds. Thus, it was possible to consider them a consistent group with relatively homogeneous experience and skills.
According to their own statements, all the participants had normal or corrected-to-normal vision and were not color blind.
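The data-quality rule applied to the respondents (exclusion when the calibration deviation exceeds 1° of visual angle or the Tracking Ratio falls below 90%) can be expressed as a simple filter. In this sketch the Tracking Ratio is approximated as the share of samples with valid gaze coordinates, which equals the share of recording time at a constant sampling rate. The `Recording` structure and the single combined deviation value are assumptions; SMI software may report horizontal and vertical deviation separately.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

MAX_CALIBRATION_DEVIATION_DEG = 1.0  # calibration threshold: 1 degree of visual angle
MIN_TRACKING_RATIO = 0.90            # minimum share of time with recorded gaze

GazePoint = Optional[Tuple[float, float]]  # None = no gaze recorded for that sample


@dataclass
class Recording:
    """Hypothetical per-respondent record; not an SMI export format."""
    respondent: str
    calibration_deviation_deg: float  # assumed single combined deviation value
    gaze: Sequence[GazePoint]


def tracking_ratio(gaze: Sequence[GazePoint]) -> float:
    """Share of samples with gaze coordinates (equals share of time at a fixed rate)."""
    if not gaze:
        return 0.0
    return sum(1 for point in gaze if point is not None) / len(gaze)


def is_valid(rec: Recording) -> bool:
    """True if the recording passes both the calibration and the tracking criterion."""
    return (rec.calibration_deviation_deg <= MAX_CALIBRATION_DEVIATION_DEG
            and tracking_ratio(rec.gaze) >= MIN_TRACKING_RATIO)


def filter_recordings(recordings: Sequence[Recording]) -> List[Recording]:
    """Keep only respondents who pass both quality criteria."""
    return [rec for rec in recordings if is_valid(rec)]
```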

Fixations and Their Detection
The eyes move in many ways, simultaneously responding to commands from several different brain areas. One of the most important types of eye movement, known as fixation, is not really a movement at all, but rather the ability to keep the eye trained on a fixed spot in the world. It is generally assumed that when we measure fixation, we also measure attention to the fixated position [60].
Our visual experience consists of a series of fixations on different objects. To get from one fixation to the next, the eyes make rapid, ballistic movements known as saccades [61].
It is important to specify the exact algorithm used to detect fixations and saccades, because different parameterizations of an algorithm can lead to different results. Although many algorithms exist, the most commonly used one for low-speed data (up to 250 Hz) is I-DT (Identification by Dispersion Threshold). It relies on the close spatial proximity of consecutive eye-position points within an eye-movement trace [62]. The algorithm defines a temporal window that moves one point at a time, and the spatial dispersion of the points within this window is compared against a threshold. In this case study, fixations were detected with the I-DT algorithm in the SMI BeGaze software. The thresholds in BeGaze were set to 80 ms for the "Duration threshold" and 50 px for the "Dispersion threshold". More information about this setting is given in [63].
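The following Python sketch illustrates the dispersion-threshold idea described above, using the thresholds reported for BeGaze (80 ms duration, 50 px dispersion). It follows the widely used formulation of I-DT by Salvucci and Goldberg, in which dispersion is measured as (max x - min x) + (max y - min y). BeGaze's internal implementation may differ in details such as the dispersion measure or the handling of data gaps. Gaze samples are assumed to be (timestamp in ms, x in px, y in px) tuples, and all names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    start_ms: float
    end_ms: float
    x: float  # centroid of the window points
    y: float


def dispersion(points):
    """Dispersion as (max x - min x) + (max y - min y) over (t, x, y) samples."""
    xs = [p[1] for p in points]
    ys = [p[2] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt(samples, duration_ms=80.0, dispersion_px=50.0):
    """Detect fixations with I-DT in a time-ordered list of (t_ms, x_px, y_px) samples."""
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Smallest window starting at i that spans at least duration_ms.
        j = i
        while j < n and samples[j][0] - samples[i][0] < duration_ms:
            j += 1
        if j >= n:
            break  # not enough data left for a minimum-duration window
        if dispersion(samples[i:j + 1]) <= dispersion_px:
            # Grow the window while its dispersion stays within the threshold.
            while j + 1 < n and dispersion(samples[i:j + 2]) <= dispersion_px:
                j += 1
            window = samples[i:j + 1]
            fixations.append(Fixation(
                start_ms=window[0][0],
                end_ms=window[-1][0],
                x=sum(p[1] for p in window) / len(window),
                y=sum(p[2] for p in window) / len(window),
            ))
            i = j + 1  # the window points are consumed by this fixation
        else:
            i += 1  # drop the first point and slide the window on
    return fixations
```

At 250 Hz, samples arrive every 4 ms, so a minimum-duration window spans about 21 samples.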

Methods of Data Analyses
Because the data were recorded with screen-recording stimuli, the tracking results consisted of a separate video record for each respondent. To analyze them together, the videos had to be combined according to each city-task combination. This was done with the Custom Trial Selector function of the BeGaze software. For each city-task combination, a custom trial was created from a screenshot of the video, and the corresponding part of each recording was then assigned to it. With the Custom Trial Selector, it was possible to analyze whether respondents were looking at the map or at the legend. A detailed analysis of eye movements in the map section of the stimulus was also possible.

The only disadvantage of this approach was that detailed eye movements in the legend could not be analyzed. Respondents scrolled the legend in different ways, because the legend was long and had to be moved. If a detailed analysis of scanpaths is required, the original screen recordings of each participant can be used instead of the custom trials, which display the data from all participants together.
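The pooling performed with the Custom Trial Selector can be illustrated with a small sketch. It groups each participant's fixations by city-task segment and re-times them relative to the segment start. The data layout (segment and fixation tuples) and the function name are hypothetical and do not reflect BeGaze's internal format. Pooled coordinates are only comparable where every participant saw the same screen content, which is why the scrolled legend could not be analyzed in detail.

```python
from collections import defaultdict


def pool_trials(recordings):
    """Group each participant's fixations into shared (city, task) trials.

    recordings: {participant_id: {"segments": [(city, task, start_ms, end_ms), ...],
                                  "fixations": [(t_ms, x, y), ...]}}
    Returns {(city, task): [(participant_id, t_rel_ms, x, y), ...]}, where t_rel_ms
    is measured from the start of that participant's segment.
    """
    trials = defaultdict(list)
    for pid, rec in recordings.items():
        for city, task, start, end in rec["segments"]:
            for t, x, y in rec["fixations"]:
                if start <= t < end:  # fixation falls inside this city-task segment
                    trials[(city, task)].append((pid, t - start, x, y))
    return dict(trials)
```

Fixations outside every segment, for example between tasks, are simply ignored.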
The results were analyzed statistically using the Wilcoxon rank-sum test [64] and the Kruskal-Wallis test [65]. These tests were chosen as non-parametric alternatives to the t-test and one-way ANOVA because, as in most eye-movement studies, the recorded data were not normally distributed. Both tests check the null hypothesis that there is no difference between the compared groups. All tests were evaluated at the 95% confidence level (α = 0.05). Statistically significant differences are marked directly in the boxplots.
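For readers who want to reproduce this kind of comparison, a dependency-free Python sketch of both tests is given below. It uses the usual large-sample approximations: the rank-sum statistic is treated as normal without a tie correction, and the Kruskal-Wallis H is compared against a chi-square distribution with the standard tie correction. Statistical packages such as SciPy (`scipy.stats.ranksums`, `scipy.stats.kruskal`) implement the same approximations and are preferable in practice. The paper does not state which software was used for the tests, so this is not the authors' code.

```python
import math
from itertools import chain


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test, normal approximation without tie correction.

    Returns (z, two-sided p)."""
    n1, n2 = len(x), len(y)
    r1 = sum(average_ranks(list(x) + list(y))[:n1])
    z = (r1 - n1 * (n1 + n2 + 1) / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return z, math.erfc(abs(z) / math.sqrt(2))


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df (closed-form series)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    total, term = math.erfc(math.sqrt(x / 2)), math.sqrt(x)
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += math.sqrt(2 / math.pi) * math.exp(-x / 2) * term
    return total


def kruskal_wallis(*groups):
    """Kruskal-Wallis H test with the standard tie correction. Returns (H, p)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        h += r * r / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = {}
    for v in pooled:
        ties[v] = ties.get(v, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)
```

Applied per task, `kruskal_wallis` would take the four cities' Trial Duration samples as its groups. `rank_sum_test` could then compare a single pair of cities, or experts against students.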
Two metrics were investigated: Trial Duration, which shows how long it took to solve a task, and Fixation Count, which describes how many fixations were made during a task. Next, the map and legend were analyzed to determine on which part of the stimulus the participants' attention was focused. A Sequence Chart visualization was selected to display respondents' eye movements between the map and legend sections of the stimuli. The Sequence Chart shows the temporal sequence of the visited Areas of Interest, so it is clear where respondents looked first and where they looked later [66].

In addition to the Sequence Chart, which is included in the eye-tracker manufacturer's software, we visually analyzed the recorded data using the FlowMap method in the V-Analytics software. V-Analytics [67] is intended for the visual analysis of spatio-temporal data and can therefore also be used to analyze eye movements [68]. The FlowMap output shows the aggregated eye-movement trajectory of all participants. In the first step, Voronoi polygons covering the whole stimulus are created based on the distribution of the fixations recorded over this stimulus. Then, arrows between these polygons are displayed, and their width represents the number of gaze movements between them. In our case, the arrows were constructed with the settings 0; 0; 0; 0; 75, and arrows representing fewer than three moves were filtered out. With this setting, the output is illustrative without being overfilled.
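The aggregation behind a flow map can be sketched in a few lines of Python. Assigning each fixation to its nearest seed point is equivalent to locating it in that seed's Voronoi polygon, so no explicit polygon geometry is needed to count moves. The seed points are taken as given here. How V-Analytics derives them from the fixation distribution, and what the settings 0; 0; 0; 0; 75 control, is not reproduced. Only the filter that drops flows of fewer than three moves is taken from the text, and the function names are illustrative.

```python
import math
from collections import Counter


def nearest_cell(point, seeds):
    """Index of the seed closest to point, i.e., the Voronoi cell containing it."""
    return min(range(len(seeds)), key=lambda i: math.dist(point, seeds[i]))


def flow_counts(scanpaths, seeds, min_moves=3):
    """Aggregate gaze moves between Voronoi cells over all participants.

    scanpaths: one fixation sequence per participant, each a list of (x, y).
    Returns {(from_cell, to_cell): count}, keeping only flows with >= min_moves.
    """
    flows = Counter()
    for path in scanpaths:
        cells = [nearest_cell(p, seeds) for p in path]
        for a, b in zip(cells, cells[1:]):
            if a != b:  # a move that stays inside one cell draws no arrow
                flows[(a, b)] += 1
    return {edge: n for edge, n in flows.items() if n >= min_moves}
```

The returned counts correspond to arrow widths. A drawing step would place each arrow between the two seed points of its cells.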
Based on the above-mentioned indicators, the plans were classified as either good or bad. For example, if high values of Trial Duration or Fixation Count were observed, the plan was considered bad. A plan was also considered bad if the participants' answers contained many inaccuracies. Finally, these objective measurements were compared with the results of the subjective questionnaire.

Trial Duration
The first part of the analysis of the recorded eye-movement data focused on the Trial Duration metric (Figure 4), which shows how long it took to solve a task. Higher values of Trial Duration are expected for more complex tasks or for plans with lower legibility. In these cases, participants may have problems finding the proper symbol in the legend or identifying it on the map. Statistically significant differences between cities for each task, according to the Kruskal-Wallis test, are shown in the upper part of the figure. The boxplot shows that some tasks were much more difficult than others. Participants were very fast (below 20 s) in Tasks Q1, Q2, and Q4. The most problematic tasks were Q3 and Q6. For Task Q5, the only problematic plan was that of Jihlava.
ISPRS Int.J. Geo-Inf.2018, 7, x FOR PEER REVIEW 9 of 25
For the first task, statistically significant differences were found between four pairs of cities. The most efficient plan was the Hradec Králové plan. By contrast, the highest values of Trial Duration were observed for the Bohumín plan, whose legend is not structured, so finding housing areas took longer. In Q2, the results were almost balanced. The best results were recorded for the Jihlava plan and the worst for the Bohumín plan, with a statistically significant difference between these two plans. In Q3, Trial Duration for the Bohumín plan was again the highest, and the best results were obtained for the Olomouc plan. Balanced results were obtained in Q4, where overall Trial Durations were relatively low. The worst results were obtained for the Jihlava plan and the best for Bohumín. Again, a statistically significant difference was observed between the Trial Duration values of these plans.

The highest Trial Duration of the whole experiment was recorded for the Jihlava plan in Q5, and statistically significant differences were found between this plan and all the others. The respondents had to find the wastewater treatment plant, which was not easy because of the use of incorrect map symbols (the symbol in the legend was not equivalent to the symbol on the map). Trial Duration values differed greatly between certain plans in Q6, a task that focused on finding a protected area of water resources. The legend of the nature protection map was very complex in the Olomouc case. It contained more than 200 symbols, and the description of each symbol was also relatively long. Thus, it took respondents a long time to complete this task with the Olomouc plan.

Fixation Count
Fixation Count was chosen for the next analysis. A higher number of fixations indicates either low search efficiency or an inconvenient user interface [34]. We presumed that respondents would make fewer fixations on urban plans with structured legends and better legibility. However, the values of Trial Duration and Fixation Count are highly correlated. A boxplot of Fixation Count for each task and city would therefore look very similar to the Trial Duration boxplot in Figure 4.
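The paper does not say which correlation coefficient was used. One plausible choice for skewed eye-tracking metrics is Spearman's rank correlation between per-trial Trial Duration and Fixation Count. The sketch below computes it as Pearson's correlation applied to average ranks. The function names are illustrative, and the inputs would be paired values, one pair per participant-task-city trial.

```python
import math


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Because only ranks enter the formula, any monotonic relationship between the two metrics yields a coefficient of 1, even if it is not linear.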
Figure 5 shows the Fixation Count values for the four analyzed cities (the tasks are aggregated). The smallest values of Fixation Count were observed for Hradec Králové, a clearly designed plan with a well-organized legend and a low number of map symbols and colors. Statistically significant differences were found between Hradec Králové and Bohumín and between Hradec Králové and Jihlava. The median value for Olomouc was similar to that of Hradec Králové, and no significant difference between these two cities was found. The boxplot may seem to contain many outliers. However, each city involved six tasks, each observed by 26 participants, which gives 156 observations per city.
In contrast to Figure 5, Figure 6 shows the Fixation Count values for all tasks (the cities are aggregated). Tasks Q1, Q2, and Q4 were easy to solve. These simple questions targeted typical tasks that the respondents probably expected. The higher values for Q5 are due to the problematic Jihlava stimulus. The situation is similar for Task Q6 and the Olomouc stimulus. There, the symbol used for protected areas of water resources was featureless and not easy to find on either the map or the legend. The high number of fixations for Q3 was due to the task type: respondents were looking for a proposed element (public areas).

Differences between Map and Legend
The previous analyses addressed entire stimuli, whereas the subsequent processing focused on how users perceived the map and legend sections of the stimuli. It would be possible to divide the stimuli into more AOIs. For the purpose of this study, however, the crucial question was whether participants were looking for an unknown symbol in the legend or searching for an already known symbol on the map. Another option would be to mark, for example, map-targeted or legend-targeted symbols. Unfortunately, the Custom Trial Selector (Section 2.4) did not allow an AOI to be created around specific symbols in the legend.
Areas of Interest (AOIs) were marked around the map and the legend of each stimulus. Areas of Interest are regions of the stimulus in which the researcher is interested [60]. The number of fixations in each AOI was calculated and is shown in Figure 7. It was difficult to locate the correct symbol in the legend for Tasks Q3 and Q6. In Q3, the map symbol for proposed public services was in the second column of the legend, which led to a high number of fixations before the correct symbol was identified. The most fixations in the legend were observed in Q6 (Olomouc), because the correct symbol was located in the lower part of the legend.
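Counting fixations per AOI amounts to a point-in-rectangle test for each fixation. In this sketch, the two AOI rectangles and the screen dimensions are hypothetical placeholders. In the study, the AOIs were drawn in BeGaze around the map and legend of each stimulus.

```python
from collections import Counter

# Hypothetical AOI rectangles in screen pixels: (left, top, right, bottom).
AOIS = {
    "Map": (0, 0, 1400, 1050),
    "Legend": (1400, 0, 1920, 1050),
}


def aoi_of(x, y, aois=AOIS):
    """Name of the first AOI rectangle containing (x, y), or None if outside all AOIs."""
    for name, (left, top, right, bottom) in aois.items():
        if left <= x < right and top <= y < bottom:
            return name
    return None


def fixation_count_per_aoi(fixations, aois=AOIS):
    """Count (x, y) fixations per AOI; fixations outside every AOI are ignored."""
    counts = Counter(aoi_of(x, y, aois) for x, y in fixations)
    counts.pop(None, None)
    return dict(counts)
```

Half-open bounds ensure that a point on the shared edge between the map and the legend is counted only once.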

The most important metric for describing user interaction with AOIs is Dwell Time. It is calculated as the sum of all fixation durations within a prescribed area, in this case the map and the legend [69]. Figure 8 shows the relative values of Dwell Time for each city. For every city, respondents spent statistically significantly more time in the legend than on the map. The largest difference was observed for Bohumín, where the legend was not structured at all. This can also be interpreted as a result of low map legibility caused by the low associativity of the map symbols. Next, Dwell Times were analyzed in detail for each task.
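Given fixations already labeled with their AOI and their durations, absolute and relative Dwell Time follow directly from the definition above. The sketch normalizes each AOI by the total dwell time in all AOIs. The text does not state whether Figure 8 normalizes this way or by the whole trial duration, so this normalization is an assumption.

```python
from collections import defaultdict


def dwell_times(fixations):
    """Sum fixation durations per AOI.

    fixations: iterable of (aoi_name, duration_ms); aoi_name is None outside all AOIs.
    Returns (absolute_ms, relative), where relative[aoi] is that AOI's share of the
    total dwell time over all AOIs.
    """
    absolute = defaultdict(float)
    for aoi, duration in fixations:
        if aoi is not None:
            absolute[aoi] += duration
    total = sum(absolute.values())
    relative = {aoi: ms / total for aoi, ms in absolute.items()} if total else {}
    return dict(absolute), relative
```

Fixations outside both AOIs are excluded from the total, so the relative values for "Map" and "Legend" always add up to 1.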
Figure 9 shows that the legend was observed for longer than the map in Tasks Q2, Q3, and Q6. The largest difference was found in Q3, which focused on proposed public areas. To solve the task, participants had to take this into account and search the legend for the symbol for proposals. A similar situation can be seen in Q6, where the correct map symbol was in the lower part of the legend, which increased the time spent searching for it. Task Q2 focused on recreation areas, which are usually marked in yellow or orange. It took some time to find the symbol in the legend, but the bright color of recreation areas made them easy to find on the map. Recreation areas are also often relatively large, so they were easy to identify.
In Tasks Q1 and Q5, respondents spent more time on the map. Task Q1 was the clearest: respondents were required to find housing areas, which were usually shown at the top of the legend. Task Q5 was affected by the incorrect symbol in the Jihlava plan, and participants spent a lot of time on the map. Task Q4, which required respondents to find a railroad, was the most balanced. It was the only task in which no statistically significant difference was found.
Figure 9 presents the ratio between the time spent on the map and in the legend, whereas Figure 10 displays the Dwell Time values for the AOI "Legend" only. The results show that the search in the legend was most problematic in Tasks Q3 and Q6. The high Dwell Time values for these tasks have the same explanation as above: the proposed element in Q3 and the position of the symbol at the bottom of the legend in Q6.
The Trial Duration values for Tasks Q5 (Jihlava) and Q6 (Olomouc) were the highest (see Figure 4). Nevertheless, the reason these tasks took so long to solve differs between the two cases, as Figure 11 shows. In this figure, the recorded data are visualized using the Sequence Chart, in which each row shows the data of one participant for stimulus Q5 (Jihlava) or Q6 (Olomouc). The length of the colored line corresponds to the time spent on the stimulus. Darker parts of the line represent time spent in the legend, and brighter parts are associated with the map.

In Q5 (Jihlava), the correct symbol was easy to find in the legend, which took approximately ten seconds. Respondents then looked at the map and attempted to find the symbol. In some cases (e.g., respondents P09, P12, P22, P33, P36, and others), after some time on the map, respondents looked back at the legend to verify the symbol for the wastewater treatment plant. In Task Q6 (Olomouc), the opposite pattern can be seen: respondents spent most of their time in the legend of the stimulus. After finding the symbol there, they quickly marked it with a mouse click on the map.
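A Sequence Chart row can be derived by merging consecutive fixations that fall in the same AOI into a single visit. The sketch assumes fixations given as (start ms, end ms, AOI name) tuples sorted in time, and the function name is illustrative. It is not the BeGaze implementation. Plotting each returned visit as a bar, darker for "Legend" and brighter for "Map", would reproduce the layout described above.

```python
def aoi_segments(fixations):
    """Collapse time-ordered fixations into contiguous AOI visits for a sequence chart.

    fixations: list of (start_ms, end_ms, aoi_name), sorted by start_ms.
    Returns a list of (aoi_name, visit_start_ms, visit_end_ms). Consecutive fixations
    in the same AOI are merged into one visit, absorbing the saccades between them.
    """
    segments = []
    for start, end, aoi in fixations:
        if segments and segments[-1][0] == aoi:
            name, seg_start, _ = segments[-1]
            segments[-1] = (name, seg_start, end)  # extend the current visit
        else:
            segments.append((aoi, start, end))  # gaze switched to another AOI
    return segments
```

Each participant yields one such list, which becomes one row of the chart. The number of "Legend" visits after the first "Map" visit would capture the look-backs described for Q5.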
In addition to the Sequence Chart, the data from the whole experiment were visualized using the FlowMap method in the V-Analytics software [67]. The most illustrative FlowMap results were obtained from the data recorded for Task Q6 (Figure 12). As mentioned in the methods, FlowMap shows the aggregated eye movements of all participants, and the width of the arrows represents the number of gaze movements between Voronoi polygons in the stimulus. In accordance with the previously described results, the most straightforward strategy of stimulus inspection was observed for Hradec Králové. Its symbol for a protected area of water resources was clear and distinctive, so the aggregated gaze trajectory, displayed as a sequence of arrows, leads directly to the correct place on the map. The legend was clearly structured, and participants did not need many fixations to find the proper symbol.

On the other hand, the most complicated gaze trajectories were observed for the Bohumín and Olomouc plans. Participants spent a lot of time in the legend trying to find the correct symbol, because the legend was unstructured (Bohumín) or very comprehensive (Olomouc). This can also be observed in Figures 7 and 10. In addition, the trajectories in the maps led to different places, because the protected area of water resources was represented by a symbol that was difficult to distinguish: a thin line similar to many other symbols.

Differences between Experts and Students
In the following evaluation, the differences between the experts and the students were investigated. The group of experts was small (six participants), and this may have influenced the statistically significant differences, which is why these results are presented as exploratory. The boxplot in Figure 13 shows Fixation Count values for the cities (tasks are aggregated). Differences between experts and students were found for Jihlava and Olomouc, where the experts needed fewer fixations to solve the tasks.
Figure 14 shows the Fixation Count values for the tasks (the cities are aggregated). The differences between students and experts in Tasks Q1, Q2, and Q3 can be explained by the commonness of these tasks. These tasks focused on very common elements, which experts work with much more frequently than with the elements in Tasks Q4, Q5, and Q6.

Accuracy of Answers
In the next part of the data analysis, the accuracy of the answers was investigated, and the results are summarized in Figure 15. As mentioned above, the first two questions were easy and represented standard tasks when working with a plan. Only one incorrect answer, detected from the mouse-click coordinates, was recorded.
By contrast, solving Q3 was problematic in all four cities.Interestingly, of the six experts participating in the study, four of them were incorrect in three of the four cities.In this task, respondents were required to find the area for proposed public services (marked as a purple crosshatched area).Four of the six experts marked the grey cross-hatched area instead (Figure 16: red diamonds represent mouse clicks by students, and black diamonds with a dot represent clicks by experts).This grey symbol corresponds to the proposed traffic infrastructure element.The experts were familiar with these symbols, explaining that, in their work, they deal primarily with technical or traffic infrastructure, which is usually marked with a grey symbol.

Accuracy of Answers
In the next part of the data analysis, the accuracy of answers was investigated.The results are summarized in Figure 15.As mentioned above, the first two questions were easy and consisted of standard task solving while working with the plan.Only one incorrect answer (detected by mouse click coordinates) was recorded.
By contrast, solving Q3 was problematic in all four cities.Interestingly, of the six experts participating in the study, four of them were incorrect in three of the four cities.In this task, respondents were required to find the area for proposed public services (marked as a purple crosshatched area).Four of the six experts marked the grey cross-hatched area instead (Figure 16: red diamonds represent mouse clicks by students, and black diamonds with a dot represent clicks by experts).This grey symbol corresponds to the proposed traffic infrastructure element.The experts were familiar with these symbols, explaining that, in their work, they deal primarily with technical or traffic infrastructure, which is usually marked with a grey symbol.

Accuracy of Answers
In the next part of the data analysis, the accuracy of answers was investigated. The results are summarized in Figure 15. As mentioned above, the first two questions were easy and represented standard tasks in working with a plan; only one incorrect answer (detected from the mouse click coordinates) was recorded for them.
By contrast, Q3 was problematic in all four cities. Interestingly, four of the six experts participating in the study answered incorrectly in three of the four cities. In this task, respondents were required to find the area for proposed public services (marked as a purple cross-hatched area). Four of the six experts marked the grey cross-hatched area instead (Figure 16: red diamonds represent mouse clicks by students, and black diamonds with a dot represent clicks by experts). This grey symbol corresponds to a proposed traffic infrastructure element. The experts explained that they were familiar with this kind of symbol because their work deals primarily with technical or traffic infrastructure, which is usually marked with a grey symbol.
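The paper states that incorrect answers were detected from mouse click coordinates but does not describe the procedure. One plausible way to automate such a check, assuming the correct area for each task is stored as a polygon in stimulus pixel coordinates, is the even-odd ray-casting test sketched below. The polygon, the participant IDs, and the function names are hypothetical.

```python
from typing import Sequence

Point = tuple[float, float]


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: toggle `inside` at each edge crossed by a ray to the right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges straddling the horizontal line through the point can be crossed
        # (this also guarantees y1 != y2, so the division below is safe).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def score_clicks(clicks: dict[str, Point], correct_area: Sequence[Point]) -> dict[str, bool]:
    """Map each (hypothetical) participant ID to True if the click lies in the correct area."""
    return {pid: point_in_polygon(pt, correct_area) for pid, pt in clicks.items()}


# Hypothetical correct area in stimulus pixels and two hypothetical clicks.
area = [(100, 100), (300, 100), (300, 250), (100, 250)]
print(score_clicks({"S01": (150, 200), "E01": (400, 120)}, area))
# {'S01': True, 'E01': False}
```

Ray casting also handles concave areas, which are common for zoning polygons. Clicks lying exactly on a boundary are classified arbitrarily, so a small tolerance buffer around the correct area may be a reasonable design choice in practice.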
Another interesting result was found for Task Q4 (finding the railroad). For each city except Olomouc, all answers were correct. For Olomouc, three incorrect answers, all by experts, were recorded. This is notable because all the experts work with the Olomouc plan every day. The problem was the black line symbol (at the bottom of Figure 17). Although this symbol can be mistaken for a railroad, it actually represents a boundary between city districts. The experts were confident that their answers were correct and did not check the symbol in the legend.
In Task Q5 (Jihlava), in which respondents were required to find the wastewater treatment plant, the number of incorrect answers was the highest. Ten students and three experts marked the wrong area, and five further participants were unable to mark anything and skipped the task. The symbol in the legend (the small, black ČOV caption) did not correspond to the symbol on the map (the large, red ÈOV caption) (Figure 18). The difference between the captions was caused by a diacritics encoding error in the Jihlava plan.
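The paper does not name the encodings involved in the ČOV/ÈOV mismatch. One well-known mechanism that produces exactly this substitution, offered here only as a plausible illustration, is Czech text saved in a Central European encoding (Windows-1250 or ISO 8859-2, where byte 0xC8 is Č) and read back as Western European (Windows-1252 or ISO 8859-1, where the same byte is È). The helper name `misdecode` is hypothetical.

```python
def misdecode(text: str, written_as: str, read_as: str) -> str:
    """Encode `text` with one codec, then decode the same bytes with another."""
    return text.encode(written_as).decode(read_as)


# Czech text written in a Central European code page but read as Western European.
for written, read in [("cp1250", "cp1252"), ("iso8859_2", "latin_1")]:
    print(written, "->", read, "ČOV".encode(written), misdecode("ČOV", written, read))
# cp1250 -> cp1252 b'\xc8OV' ÈOV
# iso8859_2 -> latin_1 b'\xc8OV' ÈOV

# Reversing the mistaken step recovers the original caption.
print(misdecode("ÈOV", "cp1252", "cp1250"))  # ČOV
```

Because only the first letter carries a diacritic, the rest of the caption survives unchanged, which is why the error is easy to miss when the legend and the map are produced separately.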
ISPRS Int.J. Geo-Inf.2018, 7, x FOR PEER REVIEW 17 of 25

Results of the Questionnaire
The final part of the evaluation addressed the questionnaire, in which respondents rated the legibility of the plans of the four cities according to their subjective opinion. The main aim of this part of the data analysis was to compare the objective results of the eye-movement and Trial Duration analyses with the subjective attitudes of the participants. The questionnaire was filled out immediately after the eye-tracking experiment, and respondents ranked all the plans on a five-point scale from 1 (best) to 5 (worst). The results are consistent with the objective eye-tracking measurements. Olomouc and Hradec Králové (average ranks of 2 and 2.04, respectively), which used correct, clear, and structured legends with a low number of map symbols and colors, were ranked the highest, followed by Bohumín (average rank 2.54). Respondents ranked Jihlava as the worst plan, with an average rank of 4.07 (Figure 19).

Discussion
The Introduction mentions that most previous papers (e.g., [10,29,31]) have focused on the analysis or evaluation of urban plans, but not from a cartographical point of view. All of these authors used qualitative methods, which are sometimes very subjective and can be difficult to apply to different maps from different regions. They evaluated plans of various types and scales (regional, country, and national); however, their evaluations were more broadly focused than the research presented in this paper. In previous studies, the cartographical aspects of urban plans were analyzed only very superficially (e.g., [13]), mostly in relation to standardization or spatial policy. For this reason, the main added value of this paper lies in applying objective research methods focused, for the first time, on the cartographic quality of urban plans.
A weakness of this study is the small and unequal representation of experts. Although user groups of the same size would have been better, the number of urban planning experts who might be willing to participate in such a study is limited. When analyzing the work of experts in specific fields, it is common to use fewer participants. For example, Bianchetti's doctoral dissertation [70] investigates the cognitive tasks and fundamental visual stimuli used in the interpretation of aerial imagery; she analyzed the work of seven analysts in the domain of forest management. Kiefer et al. [71] included only five participants in their study using a mobile eye-tracking device. For this study, all employees of the urban planning departments in Olomouc were asked to participate, and all agreed to be recorded. Many of them, however, were older, and recording their eye movements was problematic because they had drooping eyelids or wore glasses. Because of the different group sizes and the low number of experts, the part of the study comparing the two groups of participants should be considered exploratory, serving only as a brief insight into the differences between the behavior of experts and students.
The total number of participants (26) in the experiment is not ideal, but it is in line with other eye-tracking studies. Alacam and Dalci [72] also used 26 participants in their study of finding map symbols, and Fuchs et al. [50] used 21 respondents in their study of flood maps. Many studies have used much smaller samples, for example, Ooms et al. [73] (14 participants) and Opach and Nossum [74] (10 participants). Coltekin et al. [49] discuss this issue and conclude that it is quite difficult to recruit a large number of people with expertise in geovisualization and/or a specific thematic domain (soil maps in their case, urban planning in ours). Like Coltekin et al., we argue that, despite the limitations of the sample size, the study provides many interesting insights into users' interaction with different urban plans.
Statistical analysis of Trial Duration and Fixation Count was used in this study, as well as visualization using Sequence Charts and FlowMaps. Other methods could also be used to analyze the data. A comprehensive description of visualization methods was introduced by Blascheck [75]. Ambient and focal visual attention can be analyzed using the methods proposed by Krejtz et al. [76], and similarities in stimulus reading strategies can be investigated with the tool proposed by Dolezalova and Popelka [77]. For the purpose of testing the hypotheses, the employed methods were sufficient. The urban plans were analyzed in general: the aim was not to evaluate each symbol, but to identify maps on which symbols were readable and understandable, which could be determined from effectiveness (accuracy of answers) and efficiency (time to answer). A detailed analysis of individual symbols or symbology would require other analysis and visualization methods.
Some of the results were influenced by a mistake in the Jihlava plan: the symbols in the legend did not match those on the map, and there was also a problem with diacritics. The study used four plans with different map designs and followed a procedure of design, plan selection, task preparation, and stimuli preparation. During stimuli preparation, an incorrect map symbol was detected in the Jihlava plan. This was not an isolated case, as the Jihlava plan contains many mistakes of this type. The task was not eliminated because it allowed observation of a situation that causes problems during task solving and provided evidence of the importance of creating correct sets of map symbols.
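The statistical comparisons of plans are reported as Kruskal-Wallis tests (see the caption of Figure 5). For readers unfamiliar with the test, the sketch below computes the tie-corrected H statistic in plain Python; it is an illustration of the standard formula, not the authors' analysis code, and the function names are hypothetical. The p-value would then come from the chi-squared distribution with k - 1 degrees of freedom for k groups; `scipy.stats.kruskal` computes both the statistic and the p-value and can serve as a cross-check.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks of `values`; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1..j+1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups: Sequence[float]) -> float:
    """Tie-corrected Kruskal-Wallis H for two or more independent groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    # Sum of R_i^2 / n_i, where R_i is the rank sum of group i in the pooled ranking.
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        total += r * r / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction C = 1 - sum(t^3 - t) / (n^3 - n).
    counts: dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    c = 1 - sum(t**3 - t for t in counts.values()) / (n**3 - n)
    if c == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / c
```

With real data, each group would be, for example, the Fixation Counts recorded on one city's plan; a significant H only says that at least one plan differs, so pairwise post hoc comparisons (as reported for Hradec Králové versus Bohumín and Jihlava) are a separate step.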
At the beginning of the study, three hypotheses dealing with map symbology, legend structure, and the differences between students and experts were proposed.
According to the first hypothesis, map symbology significantly influences the legibility and understandability of the plans, which will impact the duration and correctness of the tasks. According to the Trial Duration and Fixation Count, the worst plan was the Jihlava plan, which required the most time to find an answer in Tasks Q4 and Q5. The problems with this plan were the high number of map symbols in the legend and, in particular, errors in those map symbols: in some cases, the symbol in the legend did not match the symbol used on the map, and there was also a problem with diacritics. The associativity of the map symbols was also low, and line symbols were too thick. The Hradec Králové plan can be considered the best plan, followed by the Olomouc plan. Each of them required the least time in two of the six tasks (Q1 and Q6 for Hradec Králové; Q3 and Q5 for Olomouc), and they also recorded the fewest fixations. The Hradec Králové plan contained the fewest map symbols as well as clearly distinguishable colors, as illustrated, for example, by the FlowMap output in Figure 12.
In addition to the eye-tracking experiment, the subjective questionnaire gathered respondents' opinions about the plans. Respondents rated the Jihlava plan, with its incorrect symbols and low legibility, as the worst, and identified Olomouc and Hradec Králové as the best. These results are consistent with the objective results of the eye-tracking data analysis.
According to the second hypothesis, legend structure significantly influences the legibility and understandability of the plans, which will impact the number of fixations and the length of Dwell Time in the Legend AOI. Respondents spent statistically significantly more time in the legend than on the map. This can be interpreted as a result of low map legibility due to the low associativity of the map symbols and the low quality of the map legends. Clearly designed maps with well-organized and correct legends and an adequate number of map symbols and colors would likely increase the legibility and understandability of the plans. The highest Dwell Time for the Legend AOI was observed for the Bohumín plan, which has an unstructured legend with a high number of symbols. In most cases, respondents spent more time in the legend than on the map. The exception is Task Q1, concerning housing areas. Symbols for this type of element are usually found in the upper part of the legend, so respondents observed the legend for a statistically significantly smaller proportion of time. Another explanation could be that all the plans used red for the housing category; this color is commonly used for housing in Czech urban planning, and respondents expected it. Another exception is Task Q5, which was influenced by the long observation time on the Jihlava map caused by the incorrect symbol. The most time spent in the Legend AOI was observed in Q3 and Q6. In Q3, respondents had to find areas for proposed public services. Proposed elements were in the second column of the legend, which made the correct answer more difficult to find. In Q6, the map symbol for the protected area of water resources was in the lower part of the legend on each plan, which prolonged the time needed to solve the task (see Figure 12).
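Dwell Time in the Legend AOI and the gaze transitions between map and legend (aggregated in the FlowMaps of Figure 12) are both derived from fixations mapped onto AOIs. The following is a minimal sketch, assuming each fixation is available as (x, y, duration in ms) and that the Map and Legend AOIs are axis-aligned rectangles in stimulus pixels. Eye-tracking software may define dwell time differently (for example, also counting saccades inside an AOI), so the record layout, the rectangle coordinates, and all names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned AOI in stimulus pixel coordinates (hypothetical layout)."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


@dataclass(frozen=True)
class Fixation:
    x: float
    y: float
    duration_ms: float


def aoi_of(fix: Fixation, aois: list[Rect]) -> Optional[str]:
    """Name of the first AOI containing the fixation, or None if it hits none."""
    for aoi in aois:
        if aoi.contains(fix.x, fix.y):
            return aoi.name
    return None


def dwell_times(fixations: list[Fixation], aois: list[Rect]) -> dict[str, float]:
    """Sum of fixation durations per AOI; fixations outside every AOI are ignored."""
    totals = {aoi.name: 0.0 for aoi in aois}
    for fix in fixations:
        name = aoi_of(fix, aois)
        if name is not None:
            totals[name] += fix.duration_ms
    return totals


def count_transitions(fixations: list[Fixation], aois: list[Rect]) -> dict[tuple[str, str], int]:
    """Count moves between different AOIs in consecutive AOI hits.

    Fixations outside all AOIs are dropped first, so Map -> (outside) -> Legend
    counts as one Map -> Legend transition.
    """
    hits = [name for name in (aoi_of(f, aois) for f in fixations) if name is not None]
    counts: dict[tuple[str, str], int] = {}
    for a, b in zip(hits, hits[1:]):
        if a != b:
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


# Hypothetical stimulus layout (map on the left, legend on the right) and fixations.
aois = [Rect("Map", 0, 0, 1200, 1000), Rect("Legend", 1200, 0, 1600, 1000)]
fixes = [Fixation(500, 400, 250), Fixation(1300, 200, 300),
         Fixation(1350, 600, 200), Fixation(600, 500, 150)]
dwell = dwell_times(fixes, aois)
print(dwell, round(dwell["Legend"] / sum(dwell.values()), 2))
# {'Map': 400.0, 'Legend': 500.0} 0.56
print(count_transitions(fixes, aois))
# {('Map', 'Legend'): 1, ('Legend', 'Map'): 1}
```

The legend share printed on the first line corresponds to the "proportion of time" in the legend discussed above; aggregating `count_transitions` over all participants of one task gives the kind of move counts that a FlowMap thresholds and draws as arrows.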
The third hypothesis proposes that differences between students and experts will be seen in how these groups read plans, which will impact the duration and correctness of the tasks. The experts needed fewer fixations to solve the first three tasks in the study, which focused on common elements that experts work with daily. Greater differences between students and experts were found for the Olomouc and Jihlava plans. The experts were from the Olomouc region, so they may have been more familiar with the Olomouc plan. The difference for the Jihlava plan may have been due to its style: the plan dates from 1999, and older visualization styles might be better known to experts than to students. Some map symbols were very similar to symbols used in the methodologies described in the Introduction [23,24]. A surprising finding is that the proportion of incorrect answers was much higher for the group of experts. In many cases, they were accustomed to another map style (different colors, different map symbols) and were confident of their (incorrect) answers without checking the symbols in the legend. This demonstrates that even experienced users can be misled by the incorrect use of map symbols. If the map symbols used in urban plans had been standardized, the experts would probably have answered correctly. This argument supports the importance of standardization in urban planning. Similar conclusions appear in Dühr's [4,27] examination of Dutch plans, which are not standardized. However, Dühr also noted that the high level of standardization and uniformity in the German planning system makes the established rules for cartographic representation almost impossible to change.

Conclusions
To our knowledge, no existing studies investigate either the cartographical quality of urban plans or the cognitive aspects of working with them; only one subjective study focuses on cartographic failures in urban plans [13]. For this reason, an objective eye-tracking experiment analyzing four urban plans of cities in the Czech Republic was performed. Eye-tracking is considered an objective method, and it allows analyses that are not possible with any other evaluation method. One example is the analysis of time spent on the map and legend sections, along with eye-movement transitions between these two parts.
Four urban plans created by different authors, with different styles, and published in different years were selected. To cover the most typical tasks in standard work with an urban plan, the analysis was based on six tasks. Twenty-six respondents (20 students and 6 experts working in urban planning departments) participated in the study.
We conclude that two crucial factors influence the legibility of plans and significantly impact their understandability: the quality of map symbology (number of colors, design of symbols, and features/layers on the map) and the logical hierarchy/structure of the legend (number of symbols, legend size, legend structure, and legend order). Using an incorrect map symbol (similar but not identical) in the legend can dramatically change the duration and correctness of task solving.
Based on our study, we also conclude that the quality of Czech urban plan symbology is increasing. Older plans (Jihlava and Bohumín) were designed using ten-year-old symbology (many map symbols, similar colors, low associativity of symbols, and line symbols that were too thick).
By contrast, the Olomouc and Hradec Králové plans used newer symbology with a clearly structured legend, a low number of map symbols, and clearly distinct colors. According to the results of the eye-tracking data analysis, the Jihlava and Bohumín plans have lower cartographic quality (as shown by longer Trial Duration, higher Fixation Count, longer Dwell Time in the Legend AOI, and more incorrect answers).
A similar conclusion follows from the respondents' average ranking of the plans (Olomouc, 2; Hradec Králové, 2.04; Bohumín, 2.54; Jihlava, 4.07). Plans that used correct, clear, and structured legends with a small number of map symbols and colors were ranked much higher.
During the exploratory observation of differences in behavior between experts and students, the accuracy of answers was found to depend on many factors, such as the position of the symbol in the legend, previous user experience, and confidence in one's answer. Prior knowledge of the symbology of other urban plans can lead to faster task solving but also to incorrect answers (when knowledge of one symbology is applied to another). In the case of a complicated task or a complicated legend (symbology), experts are not faster than other users (students in this study).
To avoid misunderstandings, urban planners should be aware of quality issues in urban plans. For correct decision-making, it is essential to produce maps according to certain standards, make maps as clear as possible, and perform usability testing on maps. Standardization of urban plans, as the most complex thematic maps, should receive more attention from cartographers and urban planners than it does today.
Supplementary Materials: Plans in higher resolution can be accessed via www.eyetracking.upol.cz/urban.
Author Contributions: J.B. was responsible for the literature review on urban planning issues and collaborated on the experiment's design. S.P. was responsible for the execution of the eye-tracking experiment and the analysis of the recorded data. M.B. collaborated on data analysis. All authors cooperated in the interpretation and discussion of the results.

• Q1: Mark an area for housing (Zoning map)
• Q2: Mark an area for sports or recreation (Zoning map)
• Q3: Mark an area for proposed public services (Zoning map)
• Q4: Mark a railroad (Transportation map)
• Q5: Mark a wastewater treatment plant (Utility networks map)
• Q6: Mark a protected area of water resources (Natural protection map)
2.1.3. Procedure

Figure 3. Design of the Experiment.

Figure 4. Boxplot of Trial Duration values. The smallest and most consistent values were observed for Hradec Králové and Olomouc.

Figure 5. Boxplot of Fixation Count values. According to the results of the Kruskal-Wallis test, statistically significant differences were found between the plans of Hradec Králové and Bohumín and between those of Hradec Králové and Jihlava.

Figure 7. The number of fixations in the AOIs "Map" and "Legend" for each task and city combination.

Figure 8. Boxplot of Dwell Time values for the AOIs around the Map and the Legend. Statistically significant differences between maps and legends were found for all four cities.

Figure 9. Boxplot of Dwell Time values for the AOIs around the Map and the Legend. Statistically significant differences between the maps and legends were found for all tasks except Q4.

Figure 10. Boxplot of Dwell Time values in seconds for the AOI "Legend".

Figure 11. Sequence Chart showing the differences in participant strategy while solving different tasks. The two most complex tasks (Q5 for Jihlava and Q6 for Olomouc) were selected for comparison.

Figure 12. FlowMaps showing the aggregated moves of participants' gaze while solving Task Q6. Only arrows representing more than three moves are displayed.

Figure 13. Boxplot of Fixation Count values. Differences between experts and students were found, especially for the Jihlava and Olomouc plans.

Figure 14. Boxplot of Fixation Count values. Differences between experts and students were found, especially for Tasks Q1, Q2, and Q3. These tasks focused on frequently used map elements; experts are more familiar with them and needed fewer fixations to solve the tasks.

Figure 15. A summary of incorrect answers throughout the experiment.

Figure 16. Student answers for Task Q3 (Hradec Králové) are marked with diamonds. Expert answers are marked with diamonds with a dot. The correct answer is the purple cross-hatched area on the left.

Figure 17. Student answers for Task Q4 (Olomouc) are marked with diamonds. Expert answers are marked with diamonds with a dot. The correct answer is the line at the top.

Figure 18. A clip from Task Q5 (Jihlava). The symbol in the legend (bottom) did not correspond to the symbol on the map.

Figure 19. Results of the subjective questionnaire. The respondents ranked all plans on a scale from 1 (best) to 5 (worst).
