Differences in Thematic Map Reading by Students and Their Geography Teacher

A school world atlas is likely the first systematic cartographic product which students encounter in their lives. However, only a few empirical studies have analysed school atlases in the context of map reading and learning geographical curricula. The present paper describes an eye-tracking study conducted on 30 grammar school students and their geography teacher. The study explored ten tasks using thematic world maps contained in the Czech school world atlas. Three research questions were posed: (i) Are students able to learn using these particular types of maps? (ii) Have the cartographic visualization methods in the school atlas been adequately selected? (iii) Does the teacher read the maps in the same manner as students? The results proved that the students were sufficiently able to learn using thematic maps. The average correctness of their answers exceeded 70%. However, the results highlighted several types of cartographic visualization methods which students found difficult to read. Most of the difficulties arose from map symbols being poorly legible. The most problematic task was estimating the value of the phenomenon from the symbol size legend. Finally, the difference between the students’ and teacher’s manner of reading maps in each task was analysed qualitatively and then quantitatively by applying two different scanpath comparison methods. The study revealed that the geography teacher applied a different method than her students. She avoided looking at the map legend and solved the task using her knowledge.


Map Reading
According to Pravda [1] and Pravda and Kusendová [2], reading a map (perceiving and understanding map content) is an essential indicator of intelligence in modern humans. Reading a map consists of perceiving the map, using the map's legend, and understanding the map's content. Reading a map is therefore a process of understanding its content through knowledge of the map's language and methods of its use. Reading a map would be meaningless if it were not followed by the use of information acquired from the map, such as standard navigation of the terrain, simple map measurements, or generating information which enhances human knowledge. A study of image-map reading by Vondráková and Voženílek [3] produced specific findings concerning map readers' preferences. The image map which users initially preferred was very often useless, and users then chose a map that had been subjectively evaluated as one of the worst but was much more suitable for solving the task. The literature review on map reading found that the vast majority of sources focused on reading topographic maps and navigation of the terrain. In the present paper, the authors understand "map reading" differently. It is not wayfinding, but rather how maps are used to obtain desired
School world atlases are teaching materials used by students from the sixth grade in elementary school (11 years) onwards to high school students (19 years). School world atlases have many complementary atlases which focus mainly on the Czech Republic, individual continents, and topics such as world energy or finance. All three of the above-mentioned publishers provide their school world atlases in digital versions.

Evaluation of Thematic Maps and School World Atlases
Only a few studies have focused on the educational aspects of map reading. Brychtova, et al. [17] indicate that a visually appealing map usually achieves higher preferences and popularity with the public, especially students. Carswell [18] analysed children's abilities in topographic map reading, summarizing the findings that teachers overestimate success in teaching map reading skills while also underestimating the map-reading abilities of children. van Dijk, et al. [19] and Schee and Dijk [20] tested the ability of students to use different types of map skills. Their studies revealed that giving students an opportunity to determine their own sequence of performing map assignments is a recommended strategy. Hanus and Marada [21] compared the curricular documents of different countries with special emphasis on map skills. The results showed that the potential of geography in map skills in the Czech curricula is not sufficiently fulfilled.
Havelková and Hanus [9] conducted research examining the effect of different (thematic) mapping methods. The results indicated that students experienced problems with maps using quantitative mapping methods. Students were more successful in tasks where qualitative or both qualitative and quantitative mapping methods were used. Working with thematic maps was also evaluated by Reyes Nuñez, et al. [22] on a sample of students from Argentina and Hungary. The evaluation was supplemented by a questionnaire for teachers. Thematic (political) atlases are used by teachers in Argentina, but mainly physical atlases are used in Hungary. Reyes Nuñez and Juhász [23] also analysed the effectiveness of cartograms. The results showed that geometric area cartograms were more suitable than geographic area cartograms for use in school cartography. The effectiveness of an area cartogram in visualizing spatial data was also evaluated by Sun and Li [24]. Their analysis showed that a pseudo-cartogram is the most preferred technique, and a Dorling cartogram is the least preferred.
Kubíček, et al. [25] measured response times and error rates in map-reading tasks relative to different variations of linear feature visualizations. The results confirmed that colour hue and size were more efficient than shape and colour value.
Gołębiowska [26] aimed to understand how the types of legend layout in thematic maps functioned during map reading. Study participants were asked to perform two sets of tasks using two thematic maps with different legend layouts. Three types of legend layouts were used in the study: list legend, grouped legend, and natural legend. The use of a natural legend required the most time, as this type of legend is not very common, and participants had to concentrate on understanding the legend principle. The arrangement of symbols in a grouped legend reduced the load on working memory. Pétera, et al. [27] conducted an empirical study exploring map drawing skills. Their empirical research showed less developed competence models of map drawing as opposed to map reading.
However, none of the above-mentioned studies tested maps from atlases. Słomska [28] created an overview of different types of maps used as stimuli in cartographical empirical research. The study summarized 103 empirical studies from four cartographic journals. The study substantiated that only one study [26] used maps from atlases. The atlas used in the study [29] was an interactive digital atlas which displayed a broad range of thematic data for the USA. This type of information is completely different material than a school world atlas.
One of the most comprehensive studies of school atlases was conducted by Bugdayci and Bildirici [16], who evaluated 22 atlases used in geography education and social studies. The authors examined generalization, symbology, fonts, colours, and common map elements. The final chapter of the study described the map legend, geographic location, and map scale. It also contained some examples and suggestions for improving the cartographic design of maps contained in the atlases.
Voženílek, et al. [30] investigated the awareness of Czech students of the symbol sets used in 11 different world school atlases. The research applied methods for literature search, comparison of atlases, online surveys, and statistical processing. The results confirmed that Czech students were able to understand the map symbols and cartographic methods used in European school atlases. These results were consistent with Michaelidou, et al. [31], who analysed the ability of elementary school children to analyse the map content of different thematic maps.
Blaha [32] highlighted the importance of aesthetics in the user-friendliness of cartographic products and proposed evaluation methods for map aesthetics, such as scoring, classification, expert estimations, and surveys. The scoring system was used in another study [33] on two Czech school world atlases and explored aesthetics and user-friendliness in maps.
Peresadko and Baltabaeva [34] evaluated the school atlases currently used in Turkmenistan. They indicated that the atlases were outdated and contained a large number of cartographic inaccuracies. The authors justified the need to create a new school atlas for Turkmenistan. Gómez Solórzano, et al. [35] conducted a survey of 50 respondents to compare printed and digital atlases. Using five tasks, the authors measured correctness, reaction time, satisfaction, perception, and emotions. The research showed that printed and digital atlases complement each other. Usability metrics varied slightly; those related to correctness and reaction time were higher for the digital atlas, while those related to satisfaction and perception were higher for the printed atlas.
Song, et al. [36] analysed the main factors affecting the design of symbols in the National Economic Atlas of China. Zhang and Chen [37] undertook an evaluation of the structure, content and design of the Shanxi Province tourist atlas.

The Use of Eye-Tracking
The first decade of the twenty-first century opened a new stage in perceptual research. This stage could be described as cognitive-digital since this type of research is based on computer software and deals with the cognitive aspects of map perception [38]. According to Rohrer [39], one of the most objective methods in evaluating (cartographic) stimuli is eye-tracking since it shows "what people do" instead of "what people say". Popelka and Vozenilek [40] described the common aspects of eye-tracking and space-time-cube and have encouraged joint studies in cartographic research.
Dong, et al. [41] applied eye-tracking in geographic education to evaluate the impact of geography courses on students' abilities to work with maps. However, the map used in the experiment was not from a school atlas but a terrain visualization. Biland and Çöltekin [42] used a similar type of stimuli. Havelková and Gołębiowska [43] evaluated thematic maps using eye-tracking. In their study, the stimuli were created by the authors but selected according to a content analysis of school geography atlases and textbooks.
Kiik, et al. [44] compared four different designs of area symbols in thematic maps in a study to determine whether area symbols are suitable in identifying the extent of polygons while not distracting the map reader. The best results were achieved with hatches. Popelka and Dolezalova [45] used three-dimensional thematic maps as stimuli in eye-tracking experiments. Brychtova and Vondrákova [46] evaluated sequential colour schemes used in thematic maps. Göbel, et al. [45] used eye-tracking to study the adaptation of legend content using gaze-based methods. The study showed that legend content changed according to gaze. The symbol types which had been fixated on previously were drawn with full opacity in the map's legend, while all others were reduced.
The present study compares the map reading strategy of students and their teacher, which focuses on the comparison between experts and novices. This issue was previously analysed in a topic related to cartography, for example, in a study by Burian, et al. [46]. The study evaluated the interpretation of four different urban plans and compared students and experts in urban planning. The results showed that the experts made a relatively large number of mistakes since they were too self-confident and did not look into the map legend. This might signify a parallel with the teacher's strategy in this study, who also avoided using a map legend and answered directly. The difference between the students' and teacher's map-reading strategy might be also studied as the singular value decomposition similarity between scanpath sets [47].
Anderson and Leinhardt [48] asked participants to draw the shortest distance between two locations as they would appear on the earth's surface (using a map with Mercator projection). The results showed that geography experts performed significantly better than novices and pre-service teachers. Their results contrast with the results of the present study, but it is necessary to acknowledge that the tasks were completely different. The participants in the study of Anderson and Leinhardt [48] were expected to use the rules according to their knowledge. In the present study, the participants were instructed to use the map to solve the task.
The difference between experts and novices in reading planimetric and contour maps was analysed by Thorndyke and Stasz [49] and Gilhooly, et al. [50]. More recently, the perception of interactive and static 3D maps was investigated by Herman, et al. [51]. In contrast to our findings, the authors uncovered a statistically significant difference in accuracy, with experts answering more correctly.
Other cartographic studies have been conducted by Ooms, et al. [52] and Ooms, et al. [53]. The participants in these studies worked with different types of map. Their results indicated that an expert's process of interpretation was much quicker than a novice's. In the present research, the teacher's trial duration was shorter in some tasks but longer in others.
To the best of our knowledge, no previous eye-tracking study has evaluated students working with school atlases.

Motivation and Research Questions
According to cartographic communication models [6,54-56], maps are products which aim to assist people in understanding the world. Generally, the first systematic cartographic product young people encounter in their lives is a school world atlas.
The most commonly used school world atlas in the Czech Republic is published by Kartografie PRAHA. The authors of the present study surveyed 600 Czech geography teachers with an online questionnaire, discovering that most of these teachers (94%) used this atlas in their geography lessons. One of the survey questions asked about the role of the school world atlas in teaching. On a 10-point Likert scale, 10 indicated the most important role; the median value of responses to this question was 9. Most of the teachers worked with the atlas every lesson (57%), while 29% of them worked with the atlas every second lesson. Only 3% of the teachers used the atlas less than every third lesson. These findings verify that the school world atlas is crucial material in geography teaching.
The atlas from Kartografie PRAHA contains 162 maps. Of these, 127 are thematic and 35 are generally geographic. From the 50 world maps contained in the atlas, 9 thematic maps were selected for the experiment. The selection criteria and detailed characteristics of the experiment's maps are described in Section 2.2. (Stimuli and Tasks).
These atlases should help students understand the natural and socio-economic environment of the Earth. School atlases should therefore be comprehensible, well-arranged, and clear and easy to use by students and their teachers.
Studies which examine school atlas map reading can reveal whether students are able to retrieve the information presented on these maps and can also potentially detect problems in map design. However, the process of understanding maps in school atlases has not yet been fully explored. As described above, no study from 103 cartographic user studies has focused on school world atlases, or atlases in general [28]. The objective of the present study is to begin to fill this gap.
The vast majority of cartographic communication models describe the process between the cartographer and map reader. However, these models do not describe whether readers interpret maps in the same manner. A comparison of map reading strategies between students and teachers might unveil a source of problems some students have with map reading. If the teacher and students read maps differently, educational processes might also be affected and disrupted.
The present paper describes an eye-tracking study using thematic maps from a school world atlas as stimuli. Participants solved several tasks using these maps. In terms of map skills, the tasks applied in the experiment fell into the categories of map reading (symbol detection, legend comprehension) and map analysis (extraction of phenomenon location and distribution, comparison of spatial phenomena distribution).
The main aim of the experiment in the present study was to analyse how students and their teacher read maps in a school world atlas. The task in the experiment was to locate a particular object on a thematic map. The present paper addresses three research questions: Q1: Are students able to learn with thematic maps and legends from a school world atlas by finding information and searching for specific objects on a map? Q2: Are the cartographic methods used in the school world atlas comprehensible to students? Q3: Do students read the thematic maps from the school world atlas in the same manner as their teacher?

Experiment Design
At the beginning of the testing session, the purpose of the experiment was explained to the participants, and basic information about the principle of eye-tracking technology was provided. The experiment was designed in the GazePoint Analysis software. A scheme of the study and experiment is given in Figure 1. The eye-tracker was calibrated before testing commenced, and the calibration results were then inspected by the technician responsible for testing. After successful calibration, each task was first presented to the respondent without the stimulus. The respondents had unlimited time to read and remember the task. A fixation cross was displayed for 600 ms between the task and the map stimulus so that each eye-movement trajectory started at the centre of the screen. The stimulus was displayed for a maximum of 60 s, and respondents were required to find particular objects on the map. Stimuli were presented in a fixed order, from the simplest to the most complex (according to the authors' opinions). In most of the tasks, the participants responded with one or more mouse clicks directly on the map. Only task 10 required the participants to search for specific information on the map and say it aloud. The technician registered these answers.

Stimuli and Tasks
All stimuli used in the study were obtained from the electronic version of the School Atlas of the World published by Kartografie PRAHA (4th edition) [57]. All the maps are identical to the print version of the atlas. Nine thematic world maps with different topics were selected for the experiment.
Because of the monitor's aspect ratio and resolution (4:3; 1280 × 1024), some maps required cropping for better legibility. No relevant parts or information that could affect the results of the experiment were removed. Each map always contained at least a map field and legend so that as much of the map as possible was preserved while remaining legible. All maps used as stimuli differ in visualization methods, data type (qualitative/quantitative) and style of legend. All maps are shown in Figure 2. Full-resolution previews are included in the Supplementary Materials.
The three research questions determined the strategic selection of map stimuli and the compilation of tasks. Q1 asks whether students are able to learn with thematic maps in a school world atlas. The atlas was thoroughly inspected for its coverage of a wide range of geography curriculum topics. Accordingly, various world maps focusing on different geographical themes (vegetation zones, urbanisation, geology, economy, etc.) were selected for the eye-tracking experiment. Q2 probes the comprehension of cartographic methods. Maps which applied different cartographic methods (graduated symbols, choropleth maps, area symbols, etc.) were therefore selected. Q3 investigates the similarities and differences in the map-reading strategies of the students and their teacher and builds on concepts of Q1 and Q2.
The tasks were formulated for each map stimulus according to the type of information displayed, the visualization method, and the legend style. They related directly to the research question of whether respondents could read thematic maps and use the legend to search for information and find a specific object on the map.
The maps used in the experiment fell into several types according to the type of data which they displayed: qualitative (Map01, Map05, and Map06), quantitative (Map02, Map04, and Map09), and both qualitative and quantitative (Map03, Map07, Map08, and Map10).
In the maps which displayed qualitative data, the assigned task was straightforward. Respondents were required to find an object in the legend and then identify it on the map. The task was to identify all the areas with temperate deciduous forests in Map01, a convergent plate boundary in Map05, and places where iron ore was mined in Map06.
Quantitative data were visualized using a choropleth map (Map02), graduated symbol map (Map04), and flow map (Map09). The task in the choropleth map (Map02) was to identify all the countries with less than 20% urban populations. The tasks in the two diagram maps were to find urban agglomerations (Map04) and shipping routes (Map09) with certain properties.
The remaining maps contained both qualitative and quantitative information. All of these maps included proportional symbols, and areas were displayed as either choropleth maps or area symbols. The tasks required participants to work with the diagrams and identify the country with the highest proportion of potatoes in total calorie consumption (Map03), countries with specific GDP (Map07), and countries with higher imports than exports (Map08). In the task for Map08, the answer could be discovered from the graduated symbols (showing values for imports and exports) or using area symbols (chorochromatic map showing trade balance). In the final task (Map10), participants estimated Brazil's exports according to a value scale.
Because the atlas is in the Czech language, all of the tasks were also formulated in Czech. Translations of these tasks are given in Table 1.

Table 1. List of the experiment's tasks (translated from Czech to English).
Task01: Identify all areas with temperate deciduous forests.
Task02: Identify all countries with less than 20% urban populations.
Task03: Identify the country with the highest proportion of potatoes in total calorie consumption.
Task04: Identify urban agglomerations with more than 20 million inhabitants in North America, Central America, and South America.
Task05: Identify a convergent plate boundary.
Task06: Identify a place on every continent where iron ore is mined.
Task07: Identify three countries with a total GDP of approximately USD 2500 billion.
Task08: Identify three countries whose imports exceed exports.
Task09: Identify three shipping routes with an annual capacity under 100 million tonnes.
Task10: Estimate Brazil's export volume in billions of USD.

Participants
Forty-one third-grade students (~18 years) from a Czech grammar school participated in the experiment. Testing was conducted in two stages over two weeks at the end of 2018. The students' geography teacher was tested in the first half of 2019. For all of the participants, the testing in this experiment was their first experience with eye-tracking technology. Some of them may have felt nervous, which may have affected the data quality. Eleven of the 41 students were removed from the dataset because of the inaccuracy of the device or problems with calibration. This data pre-processing stage is described later. The data recorded for 30 students (8 male and 22 female) and one geography teacher (female) were eventually included in the analysis.
The teacher who participated in the research has been teaching geography for over 30 years at grammar school with more than 400 students. She uses the school world atlas from Kartografie PRAHA (version from 2006) and older atlases (from around 1989) in her classes. Her students use atlases every lesson, primarily with general geographic maps and less with thematic maps (climate, hydrology, lithosphere, biosphere, pedosphere, etc.).

Apparatus
Eye trajectories were measured using three GazePoint 3 eye-trackers operated by three technicians. The GazePoint eye-tracker is an inexpensive device similar to TheEyeTribe tracker and Tobii EyeX. The accuracy and precision of all these low-cost eye-trackers have been previously tested in the studies by Dalmaijer [58], Ooms, et al. [59] and Popelka, et al. [60]. Janthanasub and Meesad [61] tested the accuracy of the GazePoint 3 eye-tracker in their study. The results showed it was suitable for use in research. GazePoint 3 has also been used in studies in the field of neurosciences [62], marketing [63], mathematics [64], physics [65], kinesiology, and sports science [66]. A comprehensive list of publications concerning the GazePoint tracker is available at https://www.gazept.com/meet-the-team/publications/.

Data Pre-Processing
Recorded eye-movement data were pre-processed and validated before data analysis. Recording was conducted in the classroom. The students had no previous experience with eye-tracking testing.
Data were recorded using GazePoint Analysis software. However, the application's capabilities for data analysis are minimal. The data were therefore converted into a format readable by the open-source application OGAMA [67] using the online tool at http://eyetracking.upol.cz/gp2ogama. The OGAMA application allows the ratio of samples with coordinates 0;0 (upper-left corner of the stimulus) to be calculated. These samples represent data loss caused by eye-blinking and lost signals. The ratio of samples recorded off-screen was another factor which required checking because of the GazePoint eye-tracker. In extreme cases, this ratio exceeded 60%. These data had to be removed from the dataset.
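The two quality ratios described above can be illustrated with a minimal Python sketch. It is not the OGAMA implementation. It assumes that gaze samples are available as (x, y) pixel pairs, that lost samples are stored exactly as (0, 0), and that "off-screen" means outside the 1280 × 1024 stimulus area. The function name quality_ratios is hypothetical.

```python
# Minimal sketch (not the OGAMA implementation): percentage of lost samples
# (alpha) and off-screen samples (beta) for one participant and one stimulus.
# Assumptions: samples are (x, y) pixel pairs; lost samples are stored as (0, 0);
# the screen is 1280 x 1024 pixels, as stated for the monitor used in the study.

SCREEN_W, SCREEN_H = 1280, 1024


def quality_ratios(samples, width=SCREEN_W, height=SCREEN_H):
    """Return (alpha, beta) in percent for a list of (x, y) gaze samples."""
    total = len(samples)
    if total == 0:
        return 0.0, 0.0
    # alpha: samples recorded with coordinates 0;0 (blinks, lost signal)
    lost = sum(1 for x, y in samples if x == 0 and y == 0)
    # beta: samples whose coordinates fall outside the screen rectangle
    off_screen = sum(
        1 for x, y in samples if not (0 <= x < width and 0 <= y < height)
    )
    return 100.0 * lost / total, 100.0 * off_screen / total
```

Under these assumptions, a recording in which 2 of 10 samples are (0, 0) and 2 further samples lie outside the screen yields α = 20% and β = 20%.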
The values of the ratio of data loss (α) and off-screen samples (β) for each participant and stimulus are given in Table 1. The instances where α or β ≥ 10% are highlighted in red. In the next step, the number of these instances was counted, and 11 students with more than three problematic stimuli were excluded from further analysis. The remainder of the participants were renamed consecutively S01-S30. The students' geography teacher also engaged in the testing. A summary of the quality of recorded data is depicted in Figure 3. The values in the table represent the ratio of data loss α (left) or off-screen samples β (right) for each participant. The TOTAL column contains the number of cases where the values exceeded 10%.
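The exclusion rule can be sketched in a few lines of Python. The sketch assumes that a stimulus counts as problematic when either α or β reaches the 10% threshold, and that a participant is excluded when more than three stimuli are problematic. The exact counting used for the TOTAL column may differ, and the function names and participant identifiers are hypothetical.

```python
# Minimal sketch of the exclusion rule, under the assumptions stated above.
# `quality` maps a participant id to a list of (alpha, beta) percentages,
# one pair per stimulus.


def problematic_count(ratios, threshold=10.0):
    """Number of stimuli where alpha or beta reaches the threshold (percent)."""
    return sum(
        1 for alpha, beta in ratios if alpha >= threshold or beta >= threshold
    )


def keep_participants(quality, max_problematic=3, threshold=10.0):
    """Return ids of participants with at most `max_problematic` bad stimuli."""
    return [
        pid
        for pid, ratios in quality.items()
        if problematic_count(ratios, threshold) <= max_problematic
    ]
```

With this rule, a participant with four problematic stimuli is dropped, while one with three is retained.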

Methods of Analyses
Fixations and saccades were identified before the analyses. The fixation detection algorithm (I-DT) thresholds were set to 20 pixels (distance between points) and 5 (minimum number of samples). The optimal fixation detection algorithm is described by Popelka [68] in more detail.
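To make the two thresholds concrete, the following is a minimal sketch of a textbook dispersion-threshold (I-DT) detector using the values quoted above (20 pixels, 5 samples). It is an illustration only: the dispersion definition used here (horizontal range plus vertical range of the window) is the classic formulation, and the implementation in OGAMA may measure distance differently. All names are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]            # (x, y) in pixels
Fixation = Tuple[float, float, int]     # (centroid x, centroid y, number of samples)


def dispersion(points: List[Sample]) -> float:
    """Classic I-DT dispersion: (max x - min x) + (max y - min y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt(samples: List[Sample], max_dispersion: float = 20.0, min_samples: int = 5) -> List[Fixation]:
    """Group consecutive samples into fixations with a sliding window."""
    fixations: List[Fixation] = []
    i, n = 0, len(samples)
    while i + min_samples <= n:
        j = i + min_samples
        if dispersion(samples[i:j]) <= max_dispersion:
            # Grow the window while the dispersion stays under the threshold.
            while j < n and dispersion(samples[i:j + 1]) <= max_dispersion:
                j += 1
            window = samples[i:j]
            cx = sum(x for x, _ in window) / len(window)
            cy = sum(y for _, y in window) / len(window)
            fixations.append((cx, cy, len(window)))
            i = j
        else:
            # The window is too spread out: drop its first sample and retry.
            i += 1
    return fixations
```

Samples that never form a window of at least five points within the dispersion limit are left unassigned, which in this sketch means they are treated as belonging to saccades or noise.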
Q1 (students' ability to learn with thematic maps) was analysed according to the correctness of the responses, and trial duration was analysed as a metric indicating the time required for respondents to give an answer. Participants marked their answers on the stimuli using mouse clicks. The online tool http://eyetracking.upol.cz/gp2vanalytics/ converted data from GazePoint Analysis into V-Analytics [69]. V-Analytics was used to visualize mouse clicks and can also be applied to eye-movement data analysis [70]. The Kruskal-Wallis post hoc Nemenyi test was applied to statistically evaluate the recorded data in RStudio at a significance level of 0.05.
Q2 (comprehension of cartographic methods) was answered based on a visual inspection of recorded scanpaths and data visualization using sequence charts. The results obtained in Q1 (correctness of answers and trial duration) were used for pointing to problematic cartographic tasks. In the next phase, two experimenters analysed eye-movement trajectories (scanpaths) and created sequence charts. From these visualizations, they tried to reveal the reason for low correctness or high trial duration. Typically, the distribution of attention between the map and the legend or focusing on specific parts of the map was analysed.
Sequence charts display the distribution of fixations across predefined areas of interest (AOI). Each participant's eye-movement data are represented by a coloured bar, in which the colour of each cell represents one fixation in a particular AOI. Unfortunately, neither OGAMA nor GazePoint Analysis offers this type of visualization; the charts were therefore created manually in MS Excel using the PART function and conditional formatting. Sequence charts for all tasks are available at http://eyetracking.upol.cz/atlases_thematic/SequenceCharts.pdf.
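The data behind one row of such a chart is simply the ordered list of AOIs visited by a participant. A minimal sketch of that step is given below; it is not the Excel workflow used in the study. The AOI rectangles, the single-letter labels ("M" for map field, "L" for legend), and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

# Hypothetical AOIs: map field on the left, legend on the right.
AOIS: Dict[str, Rect] = {
    "M": (0, 0, 1500, 1080),
    "L": (1500, 0, 1920, 1080),
}


def aoi_of(x: float, y: float, aois: Dict[str, Rect]) -> str:
    """Return the label of the AOI containing the point, or '-' if none does."""
    for label, (x0, y0, x1, y1) in aois.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return label
    return "-"


def aoi_sequence(fixations: List[Tuple[float, float]], aois: Dict[str, Rect]) -> str:
    """Encode a participant's fixations as one character per fixation."""
    return "".join(aoi_of(x, y, aois) for x, y in fixations)


def aoi_share(sequence: str, label: str) -> float:
    """Fraction of fixations that fell into the given AOI."""
    return sequence.count(label) / len(sequence) if sequence else 0.0
```

Each character of the resulting string corresponds to one coloured cell in the participant's bar, and `aoi_share` gives the kind of map versus legend proportion discussed for individual tasks.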
Q3 (comparison of difference of map-reading strategies of students and their teacher) addressed an analysis of eye-movement data using two approaches to calculate scanpath similarity. The first approach is based on the string-edit-distance using the ScanGraph tool [71,72], which is designed to process data exported from OGAMA directly. ScanGraph analyses the order of visited AOIs as a sequence of characters and calculates the similarity of these sequences using Levenshtein distance, the Needleman-Wunsch algorithm or Damerau-Levenshtein distance. Individual participants are visualized as nodes in a graph, and ScanGraph looks for cliques in this graph. The cliques represent groups of participants who were similar to each other at least to a specific (user-defined) degree. The second approach in analysing the scanpath similarity is based on the multimatch method introduced by Jarodzka, et al. [73] and Dewhurst, et al. [74]. This method represents scanpaths as mathematical vectors and allows the scanpath to retain a sequence of fixations and saccades and measure similarity using geometry. Multimatch similarity measurements are sensitive to the differences in shape, position, length, direction, and duration between two scanpaths [73]. As the authors of multimatch indicate, the method does have some drawbacks, the most significant being that measurements only compare two scanpaths.
In the present study, this drawback is addressed by using batch computation in a Python-based multimatch alternative called multimatch-gaze [75]. Batch computing was possible for all similarity measurements except duration. For duration, the results are normalized according to the length of the longer scanpath, so it is not possible to compare values across multiple pairs of scanpaths. The results from multimatch-gaze were transformed into separate matrices for each task and each type of similarity (vector, direction, length, position). These matrices can be either imported into ScanGraph for visual analysis or analysed directly (e.g., in MS Excel).
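The string-edit approach can be illustrated with a short, self-contained sketch. It computes the Levenshtein distance between two AOI sequences, converts it to a similarity, and lists the pairs of participants that would be connected in a graph at a user-defined threshold. This is not ScanGraph's code: normalizing by the length of the longer sequence is an assumption, the participant labels are hypothetical, and the clique search that ScanGraph performs on the resulting graph is omitted.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; assumes normalization by the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def similarity_edges(sequences: Dict[str, str], threshold: float) -> List[Tuple[str, str]]:
    """Pairs of participants whose similarity is at least the threshold."""
    return [
        (p, q)
        for p, q in combinations(sorted(sequences), 2)
        if similarity(sequences[p], sequences[q]) >= threshold
    ]
```

A participant who never visits the legend AOI produces a sequence without the legend character and therefore tends to end up without edges to participants who alternate between map and legend.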

Correctness of Answers-Students' Ability to Learn with Maps (Q1)
In the majority of tasks, participants marked their answers directly on the map using one or more mouse clicks. Only in Task10 did participants instead estimate a value according to the symbol size legend. The correctness of these answers was then determined and used to address the first research question. Figure 4 contains a summary of the answers. Correct answers are highlighted in green, incorrect in red, and partially correct answers (i.e., not all correct countries were marked) in orange. All missing answers were marked as incorrect. Correctness was summarized across all tasks and all participants: each correct answer was allocated 1 point, and each partially correct answer 0.5 points.
The most straightforward tasks were Task02 ("Identify all countries with less than 20% of the urban population") and Task05 ("Identify convergent plate boundary"), with a correctness of 92%. The most difficult task, however, was Task10, where participants estimated the volume of Brazil's exports according to the symbol size legend. Although the correct answer was USD 250 billion, the responses from participants varied from 3 to 5000. To allow for some tolerance, responses indicating a value between 200 and 300 were counted as partially correct. Participants also had problems with Task07, which required them to identify countries with a specific GDP value according to the symbol size legend.
The average correctness of answers across all students was 71%. It can therefore be said that, in general, students were sufficiently able to read thematic maps and learn from them.
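The scoring scheme (1 point for a correct answer, 0.5 for a partially correct one, and missing answers counted as incorrect) can be written down as a short sketch. The answer labels and the function name are hypothetical, and the example data in the usage note are invented for illustration.

```python
from typing import List, Optional

# Points per answer category, as described in the text.
POINTS = {"correct": 1.0, "partial": 0.5, "incorrect": 0.0}


def correctness_percent(answers: List[Optional[str]]) -> float:
    """Average correctness in percent; None (missing answer) scores 0 points."""
    if not answers:
        return 0.0
    total = sum(POINTS.get(a, 0.0) for a in answers)
    return 100.0 * total / len(answers)
```

For instance, one correct, one partially correct, one incorrect and one missing answer give 1.5 points out of 4, i.e., a correctness of 37.5%.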
The trial duration of each task was also investigated. The boxplots in Figure 5 chart the data for 30 students. The value for the geography teacher is indicated with a red dot. Statistically significant differences between the tasks according to the Kruskal-Wallis post hoc Nemenyi test are represented using blue lines.
The participants required the most time to solve Task07 and Task10. These two tasks also indicated problems with correctness. A high trial duration value was also observed for Task01. However, participants needed only around 19 s (median) to solve Task02. The trial duration value for this task differed significantly from four other tasks (Task01 (p < 0.001), Task04 (p = 0.004), Task07 (p < 0.001) and Task10 (p < 0.001)).
No clear connection for trial duration between the students and teacher was identified. In some tasks, the teacher was quicker than the students, but for other tasks, the teacher's trial duration was much higher.

Results of Individual Tasks-Comprehension of Cartographic Methods (Q2)
The next step analysed the participants' behaviour in solving individual tasks.

Task01
In the experiment's first task, the participants identified all areas with temperate deciduous forests. It was assumed this task would be very easy for the students since all that was required was identifying the correct symbol from the legend and recognizing all the areas indicated by this symbol. However, the accuracy of the answers was 61%, and only 13 students solved the task correctly and 11 students partially. The students marked temperate deciduous forests together with taiga or even subtropical and tropical forests. The reason was probably a poorly distinguishable legend, with all three types of vegetation being visualized using very similar symbols (see bottom-left section of Figure 6). Figure 6 indicates the fixations of participants in grey and the teacher's fixation in red. The answers (clicks) are visualized as blue dots. From the distribution of fixations, it is evident that participants did not focus their attention on the strip with climate belts at the edge of the map field.
The teacher answered partially by clicking on the temperate deciduous forests in Europe and also taiga in Canada. She did not focus her attention on the legend.

Figure 6. Fixations of students (grey) and their teacher (red), together with answers (blue dots). Similar symbols from the legend are enlarged in the lower-left corner.

Task02
In the second task, respondents found and marked all countries on the choropleth map with less than 20% of the urban population. Participants found this task very easy, demonstrating one of the highest accuracies in the entire experiment and requiring the least amount of time to solve the task (19.25 s as can be seen from Figure 5).
The sequence chart in Figure 7 shows the distribution of fixations in the map field (green) and legend (red). Only two students (S13 and S22) answered incorrectly (Figure 4) since they did not look at the legend at all (see Figure 7).
The correct answer for this task was to mark the eight countries with the brightest colour. The teacher marked only three of them. As is visible from the sequence chart, the teacher looked at the legend when she began to view the stimulus, although only for a brief moment. She probably selected countries according to her knowledge of urban geography.

Task03
In the third task, the participants identified the country with the highest proportion of potato consumption according to total calories. The map legend contained three sections, and the participants were required to discover the information from pie charts, where brown depicted potatoes. Each of the participants looked at the legend, and four of them answered incorrectly (S5, S13, S21, and S28).
The teacher looked at the legend only briefly compared to students (9 fixations, while the students' average was more than 21). Her answer was ranked as partially correct since she selected more than one country. This task took her the most time to solve in the whole experiment.

Task04
Task04 required identifying a particular graduated symbol on the map. The task was to identify an urban agglomeration with more than 20 million inhabitants in North America, Central America, and South America. The legend contained three different symbol sizes for urban agglomerations (see Figure 8). The participants looked for the biggest circle on the map. Ten students indicated incorrect answers, and four others were only partially correct. The problem was likely the difficulty of distinguishing the graduated symbols (Figure 8).
The teacher encountered the same problem. From the recordings of her eye-movements, it was evident that she had problems in distinguishing the size of the symbols in North and South America. She spent a considerable time on this task and answered only partially correctly since she mismatched the size of the symbols.

Task05
Task05 was one of the easiest in the whole experiment, with a correctness of 90%, as can be seen from Figure 4. Participants were required to identify the convergent plate boundary. The legend contained four different linear symbols for plate boundary types. Only one student (S13) responded incorrectly and was the only one who recorded no fixation on the correct part of the legend (with linear symbols for plate boundaries). This student achieved the worst results in the whole experiment.
The teacher recorded the quickest answer for Task05. Her trial duration of 7.7 s was also shorter than that of the students (p = 0.06). The teacher omitted the legend and spontaneously focused her attention on the plate boundaries on the map. Unfortunately, she mismatched the divergent and convergent boundary, so her answer was incorrect.

Task06
Task06 was also a simple task. The participants identified a place on every continent where iron ore was mined. Iron ore was indicated with a red "Fe" symbol, and many students needed only a few fixations to inspect the legend to find the right symbol. Only two students indicated an incorrect answer. One of them (S17) did not remember the task and searched for a different symbol (oil field).
The teacher again responded according to her knowledge, not according to the map. Although she looked briefly at the legend twice, she did not search for the correct symbol. She marked the countries where iron ore was mined (Canada, South Africa, Sweden, and Brazil), but her clicks were not near the "Fe" symbols.

Task07
Task07 was one of the most complicated in the experiment, with a correctness of only 35%, as can be seen from Figure 4. The participants identified three countries with a total GDP of approximately USD 2500 billion. GDP information was visualized in a proportional pie chart with a logarithmic scale. To find the correct answer, participants had to imagine how large the symbol depicting the value of USD 2500 billion was. This process is indicated in Figure 9. Participants had difficulties in estimating the pie chart size. Only eight indicated the correct answer. Almost all the pie charts on the map were marked at least once, which may denote that students misunderstood the legend. As evident from the sequence chart in Figure 10, a majority (55%) fixated on the legend (red), and yet they did not respond correctly.
In this task, the teacher used the legend for the first time in the experiment. It took a long time until she oriented herself in the map, but despite this, her trial duration was less than the median value of students. Her answers were recorded as correct.

Task08
Task08 was to identify three countries whose imports exceeded exports. Finding the correct answer was possible in two ways. The first was to search for the area symbols (chorochromatic map) where the information for the trade balance was depicted directly. The second was to use the bar charts (Figure 11) to find the countries where the bar for imports was taller than the one for exports.
Figure 11. Legend for Task08 (translated into English).
Only five participants (S12, S14, S23, S27, and S29) worked with area symbols. The charts were used by 19 other participants, who also indicated the correct answer. These numbers suggest that the task was relatively easy for the participants and was also one of those with low trial duration.
After the experience from the previous task, the teacher looked directly at the legend, and she spent a relatively long time there. She focused on the bar charts in the legend and selected countries accordingly. Her trial duration was slightly less than the median for students. Her answer was also correct.

Task09
Task09 was to identify three shipping routes with an annual capacity below 100 million tonnes. Information about the shipping routes was visualized using graduated linear symbols in blue. The colour was similar to the colour for parallels and meridians, and some participants mismatched these objects. Four participants marked the correct symbol in close proximity to the harbours. Discussion with the students revealed that they marked the lines near the ports to avoid confusion with symbols depicting parallels and meridians.
In general, the task was relatively easy; only four students marked the answer incorrectly. Participant S13 did not look at the legend at all. The teacher behaved similarly, and her response was also incorrect.

Task10
In the final task, participants estimated Brazil's export volume in billions of USD. The map was the same as the map used for Task08. To find the correct answer, participants inspected the bar chart's legend, where 1 mm corresponded to USD 50 billion (Figure 11). This was similar to Task07. The participants fixated mostly on the legend (51%), but only one-quarter of participants indicated a correct answer. Brazil's export value was approximately USD 250 billion, so values between 200 and 300 were considered partially correct. The responses varied from 3 to 5000. This wide range suggested that participants were completely lost with this task and that estimating the value caused them difficulties.
The teacher looked into the legend but did not fixate on Brazil. Thus, she could not estimate the size of the bar chart and her answer was incorrect.

Scanpath Similarity-Difference between Students and Their Teacher (Q3)
The third research question dealt with comparing the strategy used to inspect stimuli between students and their geography teacher. As was described in the previous part of the text, the teacher attempted to solve the tasks mainly using her knowledge, not using the map. The quantitative comparison of the strategies used by the students and the teacher was based on the results of the multimatch-gaze and ScanGraph tool. The similarity of the scanpaths for each pair of participants was evaluated according to four multimatch-gaze metrics (vector, direction, length, and position) and using string-edit-distance (Levenshtein distance) in ScanGraph. The resulting matrices (Figure 12) show the average mutual similarity between students and the average similarity between the teacher and her students. The subtracted average values (∆) indicate whether the teacher applied a unique strategy in inspecting stimuli or used a more conventional approach (similar to students). The higher the value, the more unique the teacher's scanpath, which meant the more dissimilar the map-reading strategy. Values higher than average + standard deviation are highlighted in red.
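The comparison described above reduces to simple arithmetic on a similarity matrix. The sketch below shows one plausible reading: Δ is the mean student-to-student similarity minus the mean teacher-to-student similarity, and a task is flagged when its Δ exceeds the mean plus one standard deviation of the Δ values. Computing the threshold across tasks within one metric, and using the population standard deviation, are assumptions; all names and example values are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import Dict, List

Matrix = Dict[str, Dict[str, float]]  # symmetric: sim[a][b] == sim[b][a]


def delta(sim: Matrix, students: List[str], teacher: str) -> float:
    """Mean student-student similarity minus mean teacher-student similarity."""
    student_student = mean(sim[a][b] for a, b in combinations(students, 2))
    teacher_student = mean(sim[teacher][s] for s in students)
    return student_student - teacher_student


def flag_unique(deltas: Dict[str, float]) -> List[str]:
    """Tasks whose delta exceeds mean + one (population) standard deviation."""
    values = list(deltas.values())
    limit = mean(values) + pstdev(values)
    return [task for task, d in deltas.items() if d > limit]
```

A large positive Δ means the teacher's scanpath resembled the students' scanpaths less than the students' scanpaths resembled each other, which corresponds to the cells highlighted in red.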
The third research question compared the strategies that the students and their geography teacher used to inspect the stimuli. As described in the previous part of the text, the teacher attempted to solve the tasks mainly using her knowledge, not the map. The quantitative comparison of the strategies used by the students and the teacher was based on the results of multimatch-gaze and the ScanGraph tool. The similarity of the scanpaths for each pair of participants was evaluated according to four multimatch-gaze metrics (vector, direction, length, and position) and using string-edit distance (Levenshtein distance) in ScanGraph. The resulting matrices (Figure 12) show the average mutual similarity between students and the average similarity between the teacher and her students. The subtracted average values (Δ) indicate whether the teacher applied a unique strategy in inspecting stimuli or used a more conventional approach (similar to students). The higher the value, the more unique the teacher's scanpath and hence the more dissimilar her map-reading strategy. Values higher than average + standard deviation are highlighted in red.
Figure 13 reveals the teacher's unique strategy in Task05 from Levenshtein distance and in Task09 from position measurements using multimatch-gaze. In these cases, the similarity between the teacher and students was clearly lower than between the students. Figure 13 depicts these two extreme examples together with Task02, where the differences were minimal. On the left side of the figure, the teacher's scanpath is highlighted in red. The scanpaths of the students are displayed in grey. In Task09 and Task05, the teacher used a different strategy than the students, because she did not look into the legend and focused her attention on different parts of the stimuli. On the other hand, in Task02, the teacher looked at the legend and focused her attention on Africa, where the correct answer was located. The same strategy was used by the students.
The middle part of Figure 13 displays the results of the position measurement calculated in multimatch-gaze and visualized with the ScanGraph tool. Each dot in the graph represents one participant. Participants with a similarity of at least 85% are connected. The teacher is visualized in red. Task09 and Task05 evidently show that the teacher is not connected to the students. By contrast, in Task02, the teacher used a strategy at least 85% similar to 27 students. The section at the right of the figure displays the results of string-edit distance using Levenshtein distance (similarity greater than 75%) and confirms a similarity in strategy. It was calculated from the sequence of visited areas of interest. In Task09 and Task05, the teacher did not inspect the legend; therefore, the similarity of her strategy to that of the students was low. In Task02, the teacher looked at the legend as the students did, and her similarity was higher than 75% with nine of them.
Figure 13. Comparison of students' and teacher's map-reading strategies. The left column indicates the scanpaths, the middle column shows the ScanGraph visualization of position measurement, and the right column depicts the results of Levenshtein distance (also visualized using ScanGraph). The teacher is in red.
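The thresholded graph described above can be illustrated with a minimal sketch. It assumes a symmetric similarity matrix with values between 0 and 1 and connects any two participants whose similarity reaches the chosen threshold (0.85 here). The matrix values, the function names, and the convention that the teacher is the last index are hypothetical and chosen only for illustration; this is not ScanGraph's actual implementation.

```python
from itertools import combinations

def similarity_edges(sim, threshold):
    """Return index pairs (i, j), i < j, whose similarity is >= threshold."""
    n = len(sim)
    return [(i, j) for i, j in combinations(range(n), 2) if sim[i][j] >= threshold]

def neighbours(edges, node):
    """Return the sorted list of participants directly connected to `node`."""
    linked = set()
    for i, j in edges:
        if i == node:
            linked.add(j)
        elif j == node:
            linked.add(i)
    return sorted(linked)

# Hypothetical data (not from the study): indices 0-2 are students, 3 is the teacher.
sim = [
    [1.00, 0.90, 0.88, 0.60],
    [0.90, 1.00, 0.86, 0.87],
    [0.88, 0.86, 1.00, 0.70],
    [0.60, 0.87, 0.70, 1.00],
]
edges = similarity_edges(sim, threshold=0.85)
teacher_links = neighbours(edges, node=3)  # students the teacher is connected to
```

An empty `teacher_links` list would correspond to the situation reported for Task09 and Task05, where the teacher's dot is not connected to any student.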

Discussion
The present paper describes an empirical study which evaluates student learning with a school world atlas. The research is one of the first eye-tracking studies using this kind of stimuli.
In designing the present study's experiment, the authors selected ten thematic maps which depicted the entire world. The maps were also selected to include different types of cartographic visualization methods. For use as stimuli, these maps were cropped to preserve the legibility on computer monitors with an aspect ratio of 4:3. Monitors with this aspect ratio were used to ensure good quality in the recorded eye-movements. With wide-screen monitors, the pupils of the eyes might have been obscured by eyelids.
Students had 60 s to solve each task. Sixty seconds was sufficient for most of the participants, and this limit was chosen to prevent students from spending too long on a single task. The correctness of the students' answers was consistent with the work of Havelková and Hanus [9]. They determined that the students were more successful in tasks with either qualitative or both qualitative and quantitative cartographic methods.
Students from two third-grade grammar school classes (~18 years) were selected as participants. Data for 41 students were recorded, but 11 were excluded from the experiment because of eye-tracker inaccuracies. All the students shared the same geography teacher, which allowed a comparison to be made between the teacher's and students' strategies. This determined the total number of study participants, which was limited to the total number of students in both classes.
Several approaches to data analysis were employed to compare the results of the students and the teacher. First, the teacher's eye-movement data were thoroughly inspected and qualitatively described. Two methods of quantitative similarity calculation were then applied. One approach involved a string-edit-distance method which had previously been used in many eye-tracking studies to compare different participant groups (e.g., [76–79]). Specifically, ScanGraph calculated the similarity of scanpaths according to Levenshtein distance (e.g., [43,80,81]) and visualized the results calculated using the multimatch method, which by itself can only indicate similarity between two scanpaths. The present study therefore used batch calculations to obtain the similarity between all possible pairs of participants, in other words, 961 calculations (31 × 31) for each of the ten stimuli in the experiment. The only problem encountered was with the duration metric, one of the five metrics used in multimatch. The results were normalised by the length of the longer of two analysed scanpaths. It was complicated to find a solution to this problem, and therefore this metric was excluded from the analysis.
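The string-edit-distance comparison and the batch calculation over all pairs can be sketched as follows. The sketch assumes that each scanpath has been encoded as a string with one character per visited area of interest, and that similarity is defined as one minus the Levenshtein distance divided by the length of the longer string. Both the encoding (M = map, L = legend, Q = question text) and the normalisation formula are illustrative assumptions, not details confirmed by the text.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Similarity in [0, 1], normalised by the longer sequence (assumed formula)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest

def similarity_matrix(scanpaths):
    """Compare every scanpath with every other one, including itself."""
    return [[similarity(a, b) for b in scanpaths] for a in scanpaths]

# Hypothetical AOI strings for three participants (not data from the study).
scanpaths = ["QMLML", "QMLM", "QMMM"]
matrix = similarity_matrix(scanpaths)
```

For 31 participants, the resulting matrix has 31 × 31 = 961 entries per stimulus, which matches the number of calculations stated above.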
The differences between the students and the teacher were summarized using the average similarity among all students and the average similarity between the teacher and all the students. These two values were then subtracted. A greater difference suggested greater uniqueness of the teacher's strategy. Although this approach directed the present study to instances in which the teacher applied a strategy very different from that of the students, the authors were aware that it might not be an ideal solution. Using a clustering method to calculate the difference between dissimilarity matrices might be a possible enhancement for future research.
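The subtraction of the two averages can be written out as a short sketch. It assumes a symmetric similarity matrix in which one index belongs to the teacher, and it flags tasks whose difference exceeds the mean plus one standard deviation across tasks. The use of the sample standard deviation, the function names, and all numbers are assumptions made for illustration, not values or choices taken from the study.

```python
from itertools import combinations
from statistics import mean, stdev

def teacher_delta(sim, teacher):
    """Mean student-student similarity minus mean teacher-student similarity."""
    students = [i for i in range(len(sim)) if i != teacher]
    among_students = mean(sim[i][j] for i, j in combinations(students, 2))
    with_teacher = mean(sim[teacher][i] for i in students)
    return among_students - with_teacher

def flag_unusual(deltas):
    """Return tasks whose delta exceeds mean + one sample standard deviation."""
    values = list(deltas.values())
    cutoff = mean(values) + stdev(values)
    return [task for task, value in deltas.items() if value > cutoff]

# Hypothetical matrix: indices 0-2 are students, 3 is the teacher.
sim = [
    [1.0, 0.9, 0.8, 0.5],
    [0.9, 1.0, 0.7, 0.4],
    [0.8, 0.7, 1.0, 0.3],
    [0.5, 0.4, 0.3, 1.0],
]
delta = teacher_delta(sim, teacher=3)

# Hypothetical per-task deltas with made-up task labels.
deltas = {"TaskA": 0.02, "TaskB": 0.03, "TaskC": 0.01, "TaskD": 0.20}
unusual = flag_unusual(deltas)
```

A larger `delta` means the teacher's scanpath resembled the students' scanpaths less than the students' scanpaths resembled each other.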
Qualitative analysis of the teacher's eye-movements in particular, together with an analysis of her answers, revealed that the teacher used a completely different strategy to solve the tasks. In the majority of cases, the teacher did not look at the legend and attempted to solve the tasks directly. Unfortunately, her answers were very often incorrect. In the discussion after the experiment, the teacher explained that she had a feeling that she should know the correct answers, and therefore she solved the tasks according to her knowledge, not with the aid of the map. This may have been caused by the tasks focusing on topics which were part of the geography curricula. A completely different scenario might occur if the map of an unknown territory or a fictitious map were used. Kulhavy and Stock [82] stated that people do not learn maps in a conceptual vacuum; their map representations are affected by information already retained in memory.
The students gave a considerable number of incorrect answers in several tasks. The problems with Task01, Task04, Task08, and Task10 may have been caused by a poor choice of cartographic visualization methods or barely legible symbols in the legend. These findings are important, and it may be beneficial to focus on them in future research. School atlases are used in most schools in the Czech Republic, and more user studies focusing on problematic maps may be helpful to publishers and improve the cartographic literacy of students.

Conclusions
School world atlases are crucial in geography education. However, only a few user studies have analysed student learning with a school atlas. The present paper aims to help fill this gap. An eye-tracking experiment with ten tasks using thematic maps from the Czech school world atlas was designed, and the eye-tracking data of 30 students were recorded using a GazePoint eye-tracker. The eye-movements of the students' geography teacher were also recorded and compared with those of the students.
The paper defined three research questions and explored the results of an experiment designed to provide answers to these questions.
The results for Q1 show that, in general, the students were able to learn with the maps effectively. For this research question, the accuracy of the answers of all participants was analysed together with the trial duration. The average correctness of answers from all students was 71%. This analysis pointed to several problematic tasks. Reading values from pie-charts with a logarithmic scale (Task07) posed the greatest difficulties. The topics of logarithmic scales and pie-charts could be addressed in geography education.
The results for Q2 revealed difficulties in solving tasks due to poorly chosen cartographic visualization methods; for example, some symbols were hard to distinguish (Task01, Task04). The most serious problems arose when students estimated values from the bar chart (Task10). Students barely understood the legend scale in which one millimetre of the bar represented USD 50 million in export volume. These issues should be considered in the next edition of the school world atlas.
Q3 targeted map-reading strategies. The study showed that the geography teacher solved the tasks differently from her students. The experiment revealed that the teacher felt she should know the correct solution to each task, so she answered according to her knowledge and did not read the map at all. This behaviour was observed in most of the tasks; the teacher looked at the legend in only a few. This strategy, however, resulted in few correct answers. The finding that a teacher reads a map and solves tasks differently from her students is a serious one. If teachers are not aware of this difference and select maps and compile tasks according to "their own strategies", student learning may not be effective. It is desirable that learning with an atlas be based on consistency between the tasks compiled for the maps and the students' ability to work with these maps. Geography curricula should focus on issues in map reading.
The present eye-tracking study highlighted several maps with poorly applied cartographic methods which created difficulties for students. Moreover, the research highlighted that the teacher used a different approach in map reading. She relied on her knowledge rather than on reading the map, answering directly instead of using the map legend.
The results can assist cartographers and map publishers in improving their maps to be more comprehensible to readers. Geography teachers can also use the results to understand how their students read the maps and how to teach geography more attractively and effectively.