OnMapGaze and GraphGazeD: A Gaze Dataset and a Graph-Based Metric for Modeling Visual Perception Differences in Cartographic Backgrounds Used in Online Map Services

Abstract: In the present study, a new eye-tracking dataset (OnMapGaze) and a graph-based metric (GraphGazeD) for modeling visual perception differences are introduced. The dataset includes both experimental and analyzed gaze data collected during the observation of different cartographic backgrounds used in five online map services, including Google Maps, Wikimedia, Bing Maps, ESRI, and OSM, at three different zoom levels (12z, 14z, and 16z).


Introduction
In our constantly developing world, the map has evolved from a static view of the world to an interactive, mobile, dynamic, and collaborative interface. Such an interface now serves as a bridge between individuals and the ever-changing environment. Nowadays, a map can be considered as a platform that allows users to manipulate spatial information. As reported by Roth et al. [1], web maps are adaptive to the use and user's context (e.g., [2,3]), interactive to user's requests (e.g., [4,5]), mobile (e.g., [6,7]), multiscale (e.g., [8,9]), and updated in real time (e.g., [10,11]). Ultimately, the map has become a powerful tool for enhancing communication and decision-making that values a dynamic approach to spatial information. The widespread presence of digital technologies has transformed the way the general audience accesses maps and spatial information [12]. Maps are ingrained in our daily lives, and their role is crucial in different tasks, ranging from navigation to trip planning purposes. However, their effectiveness depends on how well users can identify the information they need, such as points of interest and navigation cues [13]. Modern cartographic products are mainly distributed through online map services and are accessed by millions of web users every day. It is therefore important to understand how users observe and understand maps to improve their design and usability [9].
Eye tracking and eye movement analysis constitute a valuable research technique for assessing different aspects of the interaction between a map and its user [14,15]. Although map designers strive to create informative maps, there is a constant need to examine how map users perceive and prioritize the objects depicted on a map. The examination of visual behavior during map reading can be based on both the analysis and visualization of experimental gaze data [16]. Over the last decades, a considerable number of experimental studies have been conducted to investigate both the effectiveness and efficiency of cartographic products such as interactive maps [17,18], animated maps [19][20][21][22][23][24], and web maps [13,[25][26][27]]. The study of map users' behaviors requires developing simple or more sophisticated approaches that reveal crucial information related to user perception and cognition. In particular, specific metrics that could model the performed visual strategies (e.g., [28]), as well as gaze activity patterns during the execution of map tasks (e.g., [29]), are of high importance for the development of robust methodological frameworks for cartographic design evaluation processes. Thus, the need to develop cartographic-oriented user response measures is clear, and meeting it could contribute significantly to a better understanding of map usability issues. This need becomes even more apparent when considering the large amount of experimental data collected with modern eye-tracking equipment (i.e., modern devices can record gaze data at frequencies up to 2000 Hz).
Today, artificial intelligence (AI) has become an active part of state-of-the-art technology, science, and society. GeoAI is the use of AI and machine learning in geography [30] to develop approaches related to spatiotemporal problems [31]. AI could substantially support the production of personalized maps [32,33]. User data collected utilizing eye tracking or other user response methods (e.g., mouse tracking [34]) can be used as input for on-the-fly map design processes. In this direction, the existence of rich and open-access eye movement datasets is valuable, especially considering that they can serve as benchmarks [35]. Undoubtedly, freely distributed benchmark datasets facilitate the sharing of open data, encourage collaboration, empower the scientific community to tackle challenges, and support research related to AI applications (e.g., visual saliency prediction). Currently, there are several efforts to distribute open datasets to the scientific community.
The aim of the present study is twofold. Firstly, we present a new eye-tracking dataset, called OnMapGaze, which includes both experimental and analyzed gaze data collected during the observation of different cartographic backgrounds utilized in modern online map services. Our dataset aims to serve as an objective ground truth for feeding AI algorithms and developing computational models for predicting visual behavior during map reading. Secondly, we introduce a new graph-based metric, called GraphGazeD, suitable for modeling existing visual perception differences based on aggregated gaze behavior data. Both the OnMapGaze dataset and the source code for computing the GraphGazeD metric are freely distributed to the scientific community (see Supplementary Materials).
Section 2 describes the related work, which involves the basic quantitative eye-tracking metrics used in eye movement analyses, the processes for aggregated gaze behavior modeling utilizing statistical grayscale heatmaps, the use of the aforementioned products in cartographic research, and the existing gaze datasets. In Section 3, the approaches and methods for the design, execution, and analysis of an eye-tracking experiment are presented, including the introduced metric. Section 4 describes the results of the present study, which mainly involve the components of the new dataset. Section 5 discusses the results of the present study, outlines the conclusions, and includes suggestions for future work.

Basic Quantitative Eye-Tracking Metrics
Eye tracking is a valuable experimental technique that involves the recording of gaze coordinates spatiotemporally during the observation of visual scenes [36]. Using this technique, researchers can study eye movement events, changes in pupil size (pupillometry), and overall gaze behavior [37]. Fixations and saccades constitute the fundamental metrics utilized to analyze eye movements [38]. Fixations correspond to periods of time during which the eyes remain relatively still and focused on a specific location or object in the observed visual scene. Saccades occur between fixations and involve rapid eye movements from one point of interest to another.
Several derived metrics can be computed from the fundamental metrics, including the scan path, which corresponds to fixation-saccade-fixation sequences [39]. Scan paths reveal the patterns in which different areas or objects are visited. Additionally, pupil size (i.e., the measurement of pupil diameter) and the blink rate are metrics that provide insight into visual attention and user behavior during the execution of specific tasks [40]. Pupil size measurements are therefore connected to cognitive processes and cognitive load. The blink rate is the frequency at which individuals blink their eyes. It indicates attention levels, and changes in blink rate reflect shifts in cognitive states [41].

Aggregated Gaze Behavior Modeling Using Statistical Grayscale Heatmaps
Eye-tracking technology has advanced over the years, facilitating the collection of eye movement data and increasing the demand for enhanced visual analysis methods [42]. Several techniques have been developed to visualize eye-tracking data, including heatmaps and gaze plots [43]. Understanding the variations in eye movements among the data collected from different participants is a key goal in eye-tracking studies [17,44]. Hence, the visualization of aggregated gaze behavior data is important in eye-tracking studies [16]. Such visualizations highlight the allocation of visual attention during the observation of visual stimuli, thereby aiding in qualitative examination and providing a comprehensive overview of scanning behavior. Among the performed visualization methods, heatmaps provide a clearer representation of the spatial distribution of eye movement data [45]. This method is applicable to both static and dynamic stimuli and can be implemented either post-experimentally or in real-time applications (e.g., [46]). Statistical grayscale heatmaps indicate the distribution of visual attention during the observation of a visual stimulus, and they can be generated using either raw gaze data or the centers of fixation points [16]. Additionally, the aforementioned products could be used to model the overall spatial distribution of visual attention across multiple observers. The creation of statistical grayscale heatmaps is based on the application of a Gaussian filter that takes into account the visual range of foveal vision. The filtering process produces a grayscale image where pixel values are normalized within a predefined range, reflecting the likelihood of gaze activity occurring within each pixel area. Therefore, they can effectively represent, both qualitatively and quantitatively, the salient locations of the visual stimuli. Moreover, statistical indices can be computed to describe the visual strategies of participants during free-viewing conditions or during the execution of a
specific task (see e.g., [29]). Statistical grayscale heatmaps are one of the main products during the generation of an eye-tracking dataset. Such datasets could serve as the main ground truth for the production of saliency models (see e.g., [47][48][49][50][51]). Additionally, it is important to mention that grayscale statistical heatmaps could also be computed for the analysis of mouse-tracking behavioral experiments, since the nature of the corresponding data is similar (see e.g., [52]). Moreover, Krassanakis [16] presented a new method for aggregated gaze data visualization, which is based on the combined use of grayscale statistical heatmaps and contiguous irregular cartograms.

Statistical Grayscale Heatmaps for Cartographic Purposes
As mentioned above, grayscale statistical heatmaps could represent the cumulative spatial distribution of visual attention across different visual stimuli, as observed by multiple individuals. In cartography, however, even if different visual stimuli depict the same geographic area, their cartographic design (i.e., symbolization) may differ significantly. For example, cartographic backgrounds from different online map services are characterized by differences in their cartographic symbols [53]. Therefore, the result of the comparison of grayscale statistical heatmaps produced by such visual stimuli could reveal existing differences in the allocation of visual attention [54]. Comparing grayscale statistical heatmaps is a process that enhances comprehension and validation of the saliency of objects within the analyzed visual stimuli. While some grayscale statistical heatmaps may show similar gaze patterns, suggesting uniformity in visualization, closer examination through comparison reveals disparities. The information extracted from this type of product contributes to the description of the differences among cartographic products, enriching the knowledge to create more effective and efficient maps. In addition, statistical grayscale heatmaps can be used to examine gaze activity over the entire area of a map or over different areas of interest (AOIs) within a map. For example, in a recent study by Cybulski and Krassanakis [29], specific statistical indices were developed in order to examine the overall allocation of gaze activity during the visual search of labels on cartographic backgrounds.

Eye-Tracking Dataset Distribution in Cartography and Related Fields
As mentioned in Section 2.2, eye-tracking datasets are commonly used to predict salient locations on visual stimuli. However, it should be noted that eye-tracking datasets may differ in terms of the number of participants involved, the type and total number of visual stimuli used, the tasks that the participants were asked to perform, and the duration of each visual stimulus. Additionally, the age range and/or the level of expertise of the participants may vary. Furthermore, the existing eye-tracking datasets can be generated using different types of experimental equipment.
A number of eye-tracking datasets have been designed and distributed in the past years. The MIT/Tuebingen Saliency Benchmark [55] constitutes an extensive collection that provides eye-tracking data and saliency maps for different types of (audio)visual stimuli. The majority of models have been designed specifically to predict visual attention when viewing natural images. Nonetheless, a map, being an artificial and abstract depiction of the real world, poses a challenge for typical saliency models in predicting user behavior when interacting with maps [56]. Gaze datasets produced during the observation of cartographic products (and/or related products) could significantly aid in modeling visual attention and broaden the scope of cartographic research to explore emerging topics. EyeTrackUAV [57] and EyeTrackUAV2 [58] were created to study participants' visual attention while observing unmanned aerial vehicle (UAV) videos under free-viewing and task-based (surveillance) conditions. Tzelepis et al. [59] provided an eye-tracking dataset collected during the observation of hill-shading images. GeoEye [51] contains eye movement data during the observation, under free-viewing conditions, of geospatial images, such as thematic maps, remote sensing, and street view images. CartoGaze [13] consists of eye-tracking data acquired during the execution of a controlled memorability experiment on 2D web maps. As Keskin et al. [35] also point out, sharing open benchmark datasets as well as developing specific guidelines are of major importance in understanding map usability issues.

Visual Stimuli
The experimental visual stimuli consisted of cartographic backgrounds that were excerpted from web maps provided by five different online map services (Google Maps, Wikimedia, Bing Maps, ESRI, OSM) utilizing the open-source Geographic Information System software QGIS (v.3.22.4) and the QuickMapServices plugin (v.0.19.32). The visual stimuli were adjusted to the native resolution of the utilized display screen (1920 × 1080 px). The visual stimuli depict fifteen different cities in Greece at three different zoom levels (12z, 14z, and 16z). Hence, five different cities were selected for each zoom level, where cities with higher populations correspond to higher zoom levels. In total, 75 different visual stimuli were generated. All web map services use the WGS84 (EPSG: 4326) coordinate reference system and the Web Mercator projection, which was introduced by Google Maps and constitutes a newer variant of the Mercator projection [53]. Figure 1 depicts some indicative examples of the experimental visual stimuli.
The display order of the visual stimuli was randomized to reduce order bias, while each visual stimulus was displayed for three seconds. Between the experimental stimuli, a blank screen of white color (RGB 255, 255, 255) was displayed for one second to initialize the observer's visual exploration from the starting point, the center of the image. The display time was determined after several trials, taking into account the total duration of the experimental process, since longer experimental durations can lead to physical fatigue in the participants.

Participants
In total, thirty participants with normal or corrected-to-normal vision participated in the experiment (53% female). The right eye was dominant for 73% of the participants. The age of the participants ranged from 19 to 61 years, with the majority being between 21 and 23 years old. None of the participants had any previous experience with eye-tracking experiments. All the participants were volunteers. The total number of participants used in this study was in line with the typical number of volunteers used in map user studies [60]. Since the experimental process required the approval of the Research and Ethics Committee of the University of West Attica, each participant was given a consent form to read and complete before participation. This document outlined the objectives of this research study, detailed the experimental process, and explained the utilization of the collected data.


Experimental Equipment and Setup
The EyeLink® 1000 Plus [61] eye tracker is used in the present study. The experimental setup consists of two personal computers (PCs) and two display monitors. The first PC is used as the display computer, where the EyeLink® software is installed along with SR Research's tools, and the second PC is used as the host computer, where all the computations of true gaze and eye-motion event detection and analysis take place. Moreover, the host monitor (primary monitor) is manipulated by the operator of the experiment, and the display monitor (secondary monitor) serves as the projection of the visual stimuli. Both PCs communicate via Ethernet. For the present study, gaze samples are collected at 500 Hz. The distance between the display monitor and the eye-tracking camera is fixed during the execution of the experimental process, while the distance between the participant and the display monitor is approximately 60 cm (the participant is positioned at approximately 1.75 times the width of the display, ensuring that the entire monitor is visible without exceeding a visual angle of 32° and 25° in the horizontal and vertical dimensions, respectively).
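The reported horizontal visual angle can be checked with a short calculation. The sketch below is illustrative only; it assumes the geometry stated above (a viewing distance of roughly 1.75 times the display width, i.e., a display width of about 34 cm at 60 cm), which is not an exact specification of the monitor used.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Full visual angle (degrees) subtended by a dimension of the
    given physical size viewed from the given distance."""
    return math.degrees(2 * math.atan((size_cm / 2) / distance_cm))

# Assumed geometry: viewing distance ~1.75 times the display width.
distance_cm = 60.0
width_cm = distance_cm / 1.75  # ~34.3 cm (assumption, not a measured value)

horizontal_angle = visual_angle_deg(width_cm, distance_cm)  # ~31.9 degrees, i.e., ~32
```

Under these assumptions, the computed horizontal angle agrees with the 32° figure quoted in the setup description.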

Experimental Programming
The experiment's overall structure is designed using the SR Research Experiment Builder (SREB) [62]. The workflow consists of actions followed by triggers, and the same pattern is repeated throughout the experiment. This sequence results in the creation of a hierarchically organized flow diagram. The components of the SREB are categorized as "actions" and "triggers", and they contain nodes that are inserted directly into the flow diagram utilizing visual programming. In more detail, visual programming facilitates the experimental design and procedures, such as dragging nodes with different properties and functions, nesting different sequence blocks, and connecting them. The main experimental flow consists of three main blocks and a starting node. The first node added is the Camera Setup, which triggers the procedures of calibration, validation, and the setup of the eye-tracking device's camera for every participant at the beginning of the experimental procedure. The blocks are nested in one another, and each one serves a different purpose. The first block is called the Sequence Block, the second block is called the Trial Block, and the third block is called the Recording Block. The Sequence Block contains the Drift Correction node, the Trial Block contains the Prepare Sequence node and the data source of the experiment, and the Recording Block contains two Display Screen nodes, each followed by a Timer node. Figure 2 depicts a flowchart of the main experiment.
Calibration accuracy must be maintained throughout the experimental procedure. Drift checks between a certain number of presented visual stimuli aid in keeping track of the accuracy. The trial data source is divided into three parts, each with 25 visual stimuli. The drift correction procedure is started after the observation of each part of the visual stimuli. If the observed accuracy does not fall within the desired limits, the calibration procedure is performed again. Therefore, the Sequence Block is iterated as many times as the number of parts of the data source. The Trial Block contains the data that are recorded in real time, and the Sequence Block draws the visual stimuli from the source to the display screen.

Procedure
The experiment is conducted in a controlled environment where variables such as lighting conditions, environmental vibrations, ambient noise, and the presence of others are carefully manipulated to optimize the accuracy of gaze data collection. Only the participant and the experimenter are allowed in the experimental room. Moderate natural ambient lighting is present, and adjustments are made to the lighting, especially when participants wear glasses or mascara. Potential questions expressed by the participants are addressed prior to the experiment during their review and signing of the consent form (see also Section 3.1.2). Prior to the beginning of the main experiment, the participants undergo a demo experiment to familiarize themselves with the process. At this point, it is important to mention that since any changes may reduce the accuracy of the collected data and require recalibration, the participants must maintain a comfortable position throughout the experimental process.
After completing the necessary preparations, calibration and validation become essential steps in the process. Calibration involves the use of point targets with known locations, specific colors, sizes, and shapes. The target's size is determined by the following two parameters: outer size and inner size. The default target is a filled circle with a central hole, where the outer size represents the diameter of the outer circle and the inner size represents the diameter of the inner circle, contributing to calibration accuracy. For peripheral detectability, the outer circle size is set to 17 pixels, and the inner circle size is set to 5 pixels. The chosen colors are red (RGB 255, 0, 0) for the outer circle and white (RGB 255, 255, 255) for the inner circle, against a gray background (RGB 230, 230, 230) to ensure sufficient contrast. The calibration process is based on the use of thirteen point targets.
The main experiment's total duration is estimated at about six minutes, while the demo experiment lasts around one minute. The aforementioned durations are based on ideal conditions and do not include the required time to set up the participant in an optimal position. Calibration and validation together take one minute. Three drift corrections are incorporated between the three divisions of the data source, and a blank screen follows each stimulus appearance. The experiment is conducted under free-viewing conditions (i.e., participants do not have to answer any questions about the stimuli). This approach is followed in order to examine the salient locations of the investigated cartographic backgrounds, minimizing the cognitive parameters that may influence the procedure under the execution of specific map tasks.

Basic Eye Movement Indices
A maximum acceptable error of one degree of visual angle is set for the collection of raw binocular gaze data. It is important to report that the implementation of intermediate drift corrections was found to be an effective method of maintaining accuracy throughout the process, with the capabilities of the equipment facilitating this approach. Based on the collected raw data, both fixation and saccade events are calculated using the corresponding detection algorithms incorporated in EyeLink® Data Viewer software [63]. The minimum threshold for fixation duration and saccade amplitude is set to 80 ms and one degree of visual angle, respectively.
For each visual stimulus and for each one of the basic derived metrics of fixation duration, the total number of fixations, and saccade amplitude, five indices are calculated in order to quantitatively describe the overall visual behavior of the participants during the observation. Specifically, for the three metrics mentioned above, the minimum, the maximum, the average, the median, and the standard deviation values are computed for each visual scene. The computation of the reported eye movement indices is fully automated using the scripting language of MATLAB (MathWorks®) software (R2022a).
The calculation of such indices is important since it facilitates the analysis of the variables under consideration. By comparing the relationships between these variables, meaningful conclusions can be drawn. The minimum values indicate the smallest observation among the selected categories, helping to identify outliers and understand the lower boundary of the range. Conversely, the maximum values represent the largest observation, helping to identify outliers and the upper bound of the range. Average values are measures of central tendency and represent the typical value. The median values, as the middle values in a sorted dataset, provide a robust measure of central tendency, which is particularly useful in the presence of outliers. Finally, the standard deviation values quantify variability by measuring how spread out the values are around the mean, with a higher standard deviation indicating greater variability.
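The five indices described above can be computed in a few lines. The following is a minimal illustrative sketch (not the MATLAB implementation used in the study), assuming the values of one derived metric, e.g., fixation durations in milliseconds, are available as a plain array; the sample durations shown are hypothetical.

```python
import numpy as np

def summary_indices(values):
    """Minimum, maximum, average, median, and standard deviation of a
    derived eye movement metric for one visual stimulus."""
    v = np.asarray(values, dtype=float)
    return {
        "min": v.min(),
        "max": v.max(),
        "mean": v.mean(),
        "median": np.median(v),
        "std": v.std(ddof=1),  # sample standard deviation
    }

# Hypothetical fixation durations (ms) collected for one stimulus.
indices = summary_indices([120, 250, 90, 310, 180])
```

The same function can be applied per stimulus to fixation counts and saccade amplitudes to produce the full table of indices.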

Statistical Grayscale Heatmaps
Statistical grayscale heatmaps are produced to determine the salient objects observed by participants in the visual stimuli. Their generation is based on the execution of the corresponding function provided by the EyeMMV toolbox [64]. In more detail, the Gaussian parameter is determined by considering the average distance between the participant and the display screen and the dimensions of the display screen. Converting the result from millimeters to pixels yields an approximate sigma value of 38 pixels. The Gaussian kernel is then computed as 6 times the sigma value, while the maximum vertical and horizontal values correspond to the resolution of the display screen. All pixel values are normalized within the range of 0 to 255, resulting in the generation of 8-bit grayscale images, where each intensity represents the probability of occurrence of each point. Hence, higher intensity values (up to 255) indicate a higher probability of gaze.
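The generation step can be sketched as follows. This is an illustrative Python reimplementation, not the EyeMMV (MATLAB) code actually used in the study; the sigma value, the kernel width of roughly 6 times sigma, and the 8-bit normalization are taken from the description above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def grayscale_heatmap(points, width=1920, height=1080, sigma=38.0):
    """Statistical grayscale heatmap: accumulate gaze points on a grid
    matching the display resolution, smooth with a Gaussian filter
    (window spanning ~6*sigma), and normalize to the 8-bit range."""
    acc = np.zeros((height, width), dtype=float)
    for x, y in points:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < width and 0 <= yi < height:
            acc[yi, xi] += 1.0
    # truncate=3.0 cuts the kernel at 3*sigma per side, i.e., ~6*sigma total.
    smoothed = gaussian_filter(acc, sigma=sigma, truncate=3.0)
    if smoothed.max() > 0:
        smoothed = smoothed / smoothed.max() * 255.0
    return smoothed.astype(np.uint8)
```

Passing either raw gaze samples or fixation centers as `points` yields the two heatmap variants mentioned in Section 2.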

The GraphGazeD Metric and Python Tool
The produced statistical grayscale heatmaps are utilized to calculate a new metric, called GraphGazeD, that quantitatively describes the differences in gaze patterns among the visual stimuli. The calculation of this metric facilitates the description of the differences in the allocation of visual attention by comparing the generated heatmaps, while a Python (v.3) tool is developed in order to support the computation of the introduced metric. In practice, comparisons are made among heatmaps that refer to cartographic backgrounds depicting the same geographic areas at the same zoom level but derived from different online map services. However, instead of the result being a new map showing the differences, these differences are represented by a graph indicating the percentage of pixels that differ for each possible difference in pixel intensity. Moreover, to model the produced graphs, curve-fitting techniques are applied. The GraphGazeD Python tool involves two main functions. The first function utilizes the os, cv2, and NumPy Python libraries, while in order to execute the second function, the csv, matplotlib.pyplot, scipy.optimize, and sklearn.metrics Python libraries/modules are imported. The scipy.optimize module provides the curve_fit function, while the sklearn.metrics module provides the r2_score function.
The first function refers to the comparison of the generated statistical grayscale heatmaps. The input is the directory of the heatmap folder, and the output is a simple .csv file that contains the generated data used to produce the corresponding graphs. There are 75 statistical grayscale heatmaps in total, divided into sets of five that derive from cartographic backgrounds depicting the same area at the same zoom level and distributed by different services. There are ten possible comparison pairs in each set, producing 150 combinations. For each pair of heatmaps to be compared, one heatmap is subtracted from the other, generating another grayscale image whose pixels hold the absolute difference between the values of the compared heatmaps' pixels. A threshold is set, ranging from 0 to 255, matching the intensity range of the pixels of the initial statistical grayscale heatmaps. The differences that are equal to or lower than the threshold are counted, and this count is divided by the total number of pixels, thus calculating the percentage of pixels at or below the threshold's value. At the end of this process, an output file is created that contains the name of each comparison pair of heatmaps, the threshold value normalized in the range between 0 and 1, and the percentage of pixels at or below this threshold.
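The core of the first function can be sketched as follows. This is a minimal NumPy version of the comparison step only; the actual tool additionally handles folder traversal and .csv output via os, cv2, and csv, and the function name here is chosen for illustration. It assumes both heatmaps are 8-bit arrays of equal size.

```python
import numpy as np

def difference_curve(heatmap_a, heatmap_b):
    """GraphGazeD-style comparison of two grayscale heatmaps: for every
    threshold t in 0..255, the share of pixels whose absolute intensity
    difference is less than or equal to t."""
    # int16 avoids uint8 wraparound when subtracting.
    diff = np.abs(heatmap_a.astype(np.int16) - heatmap_b.astype(np.int16))
    thresholds = np.arange(256)
    shares = np.array([(diff <= t).mean() for t in thresholds])
    # Normalized thresholds (0..1) and the percentage of pixels as fractions.
    return thresholds / 255.0, shares
```

By construction, the resulting curve is non-decreasing and reaches 1.0 at the maximum threshold, which matches the shape of the graphs described below.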
The second function refers to graph generation. The input is the .csv file produced by the first function, and the output is a folder containing the graphs generated for each comparison pair. In each graph, the 256 threshold values, normalized to the range between 0 and 1, are on the x-axis, and the percentage of pixels whose absolute difference is equal to or lower than the corresponding value is on the y-axis. The interval of the values on the graph depends on the data. Figure 3 visualizes the aforementioned process in three successive steps. For this example, an interval of 0.2 is selected. It is important to mention that the final plot is generated automatically (in a standard format) by the Python tool developed within the present study. Curve-fitting techniques are applied to model the generated graphs. Specifically, three different mathematical models are employed, including (a) a hexic (6th-degree) polynomial function, (b) the rectangular hyperbola, and (c) the logistic function (Figure 4). These functions are selected based on the similarity of their curves to the graphs generated from the data derived from the computations of the first function.
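The plotting step can be sketched as below; the axis labels, output file name, and the stand-in s-shaped data are illustrative assumptions rather than the published implementation:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for batch figure export
import matplotlib.pyplot as plt
import numpy as np

# 256 normalized threshold values on the x-axis.
thresholds = np.linspace(0.0, 1.0, 256)
# Stand-in cumulative percentages; in practice these come from the .csv file.
percentages = 1.0 / (1.0 + np.exp(-12.0 * (thresholds - 0.3)))

fig, ax = plt.subplots()
ax.plot(thresholds, percentages)
ax.set_xlabel("Normalized threshold")
ax.set_ylabel("Fraction of pixels with difference <= threshold")
ax.set_xticks(np.arange(0.0, 1.2, 0.2))  # 0.2 interval, as in Figure 3
fig.savefig("pair_graph.png", dpi=150)
```

One such figure would be written per comparison pair into the output folder.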

Results
The results of this research, along with the materials used for their computation, are the components of the OnMapGaze dataset. The 75 cartographic backgrounds that served as visual stimuli in the experiment are included in their native resolution (1920 × 1080 px). The raw gaze data, collected in free-viewing conditions for each participant observing each visual stimulus, are stored in text files. The data were recorded at a sampling rate of 500 Hz, resulting in 1500 samples for each cartographic background observed by each participant. This implies that the total number of samples for all participants across all cartographic backgrounds is equal to 3,375,000. Furthermore, the aggregated data for each visual stimulus observed by all participants are included in the dataset as .txt files. The total number of samples for each cartographic background is 45,000. The 75 statistical grayscale heatmaps are included in their native resolution (1920 × 1080 px). The GraphGazeD metric is appended to the dataset, composed of a .csv file containing the difference data together with the plots of these data along with the fitted curves. The components of the dataset are presented in Figure 5 and in more detail in Sections 4.1-4.3.
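The reported sample counts are mutually consistent, as a quick arithmetic check shows (the 3 s viewing time per stimulus and the 30 participants are inferred from the figures quoted above, not stated directly in this passage):

```python
sampling_rate_hz = 500      # reported recording rate
viewing_seconds = 3         # inferred: 1500 samples / 500 Hz
samples_per_trial = sampling_rate_hz * viewing_seconds  # 1500 per stimulus

participants = 30           # inferred: 45,000 / 1,500
stimuli = 75
per_stimulus = samples_per_trial * participants         # 45,000 aggregated
total = per_stimulus * stimuli                          # 3,375,000 overall
```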

Basic Eye Movement Indices
Basic eye movement metrics generated from the collected data were first presented by Liaskos and Krassanakis [65]. These metrics are incorporated into the OnMapGaze dataset. However, the previous study was preliminary work aiming to investigate the representative eye movement metrics that characterize the observation of this specific type of visual stimulus. In the present study, further analysis is performed on these data in order to create three different lists that rank (from higher to lower value) the experimental stimuli based on the generated values of basic eye movement metrics. In more detail, the indices of median fixation duration, median number of fixations, and median saccade amplitude are used for this analysis. The results of the ranking process are shown in Table 1.
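The ranking step itself amounts to sorting the stimuli by the chosen index. In the sketch below, the numeric values are invented for illustration; only the top-ranked stimulus for median fixation duration reflects the result reported later in this section:

```python
# Hypothetical median fixation durations (seconds); real values are in Table 1.
median_fixation_duration = {
    "Katerini_16z_ESRI": 0.310,
    "Ioannina_14z_Google": 0.295,
    "Alexandroupoli_16z_Google": 0.280,
}

# Rank stimuli from higher to lower value of the index.
ranking = sorted(median_fixation_duration.items(),
                 key=lambda item: item[1], reverse=True)
```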

Statistical Grayscale Heatmaps
The OnMapGaze dataset contains the statistical grayscale heatmaps generated for each visual stimulus of the performed experiment. These heatmaps constitute a cumulative product, as they describe the visual behavior of all participants. Figure 6 shows three different examples of the highest-ranking visual stimuli presented in Table 1.

GraphGazeD Metric
GraphGazeD is calculated for 150 different comparison pairs of statistical grayscale heatmaps. The results are 150 graphs depicting the curve that quantitatively describes the difference between each pair. The graphs also depict the applied curve-fitting techniques, testing three different functions. The first test is performed with a hexic polynomial equation, which fits well, with a coefficient of determination ranging from 0.71 to 0.98. The second test is performed with the rectangular hyperbola, which fits in most cases with a coefficient of determination equal to 1.00; however, there are cases where the fit fails and the coefficient of determination is small. The last method is the logistic (sigmoidal) function, which fits the data with a coefficient of determination ranging from 0.91 to 0.99. Figure 7 depicts an example of a higher-difference pair (on the left side) and a lower-difference pair (on the right side).
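A hedged sketch of fitting the three candidate models to synthetic s-shaped data is given below; the exact parameterizations of the hyperbola and the logistic function are assumptions and may differ from the published tool:

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.metrics import r2_score

x = np.linspace(0.0, 1.0, 256)               # normalized thresholds
y = 1.0 / (1.0 + np.exp(-12.0 * (x - 0.3)))  # stand-in s-shaped data

# (a) hexic (6th-degree) polynomial
poly = np.polynomial.Polynomial.fit(x, y, 6)
r2_poly = r2_score(y, poly(x))

# (b) rectangular hyperbola, y = a * x / (b + x); positivity bounds keep
# the pole out of [0, 1]
def hyperbola(x, a, b):
    return a * x / (b + x)

p_hyp, _ = curve_fit(hyperbola, x, y, p0=[1.0, 0.1],
                     bounds=(0, np.inf), maxfev=10000)
r2_hyp = r2_score(y, hyperbola(x, *p_hyp))

# (c) logistic function
def logistic(x, L, k, x0):
    return L / (1.0 + np.exp(-k * (x - x0)))

p_log, _ = curve_fit(logistic, x, y, p0=[1.0, 10.0, 0.5], maxfev=10000)
r2_log = r2_score(y, logistic(x, *p_log))
```

On genuinely sigmoidal data, the logistic fit recovers the curve almost exactly, while the hyperbola may fit poorly, consistent with the spread of R² values reported above.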
Furthermore, Figure 8 presents the fitted curves that correspond to the highest and the lowest values of R².

Discussion and Conclusions
In this study, the examination of aggregated gaze data provides insights into the cumulative behavior of participants while observing cartographic backgrounds from different online map services. The GraphGazeD metric is an attempt to scale down the problem of quantitatively describing the differences between two statistical grayscale heatmaps, and thus, the distinction between the salient objects of two cartographic backgrounds becomes evident. Additionally, the curve-fitting functions are selected and applied based on their fit to the data visualized on the graphs. The functions are non-linear, and the coefficient of determination is used to measure their goodness of fit. In the case of non-linear models, the R² measure is often misinterpreted [66]. Although R² has high values, additional data are required to assess the suitability of a regression equation for modeling the generated graphs [67]. In the case of the rectangular hyperbola function specifically, R² is equal to 1 in the best-fitting case and equal to 0.39 in the worst-fitting case. This means that the coefficient may represent noise rather than the genuine relationship. In the case of the hexic polynomial function, the fitting is good and a further assessment of the suitability of the regression is needed, though the main setback is the computational capacity needed for such an equation. As for the logistic function, the fitting is good according to R², and the data visualization resembles the s-shaped curve in the first quadrant of the sigmoid function's curve. The relationship of the data was expected to be sigmoidal because the difference between heatmaps is actually the difference in the intensity values of each heatmap, which represent the probability of occurrence of each point.
The ability to describe the qualitative differences between two heatmaps is a valuable tool in the creation of user-oriented maps. Statistical grayscale heatmaps present the allocation of visual attention to the cartographic backgrounds observed by the participants. By comparing the differences in observation and gaze behavior based on the salient objects, a further understanding can be obtained concerning the design of the visual stimuli. The examination of the cartographic elements that differ in the same geographic area, but across different online map services, reveals the hierarchical organization of the map elements and how it changes with the symbolization used. In this way, evaluating the visual variables that make the visual stimuli differ strongly from each other will enhance the hierarchical organization and will achieve a balanced symbology [53]. Furthermore, the computation of the GraphGazeD metric can be utilized to model gaze behavior. Statistical grayscale heatmaps can serve as the main input for the creation of cartographic free-viewing saliency models [13,29,68] that predict participant gaze behavior, and the specific metric can serve to model the differences and similarities in gaze behavior [69]. If gaze behavior and its differences are known, the investigation of an effective and efficient map design is reinforced by evaluating alternative symbolization techniques on cartographic backgrounds and/or web maps, resulting in the design of user-oriented maps.
The basic eye movement indices contribute to describing the aggregate allocation of visual attention during the participants' observations of the experiment's visual stimuli. The computation of such indices refers to the gaze behavior produced and influenced by the salient objects of the cartographic backgrounds, because the experiment is conducted under free-viewing conditions. The ranking list of the selected indices (Table 1) can be used as a reference to describe the complexity of the cartographic design of the visual stimuli in relation to the zoom level and the online map service. The highest-ranking cartographic backgrounds regarding the median fixation duration, the median number of fixations, and the median saccade amplitude are "Katerini_16z_ESRI", "Alexandroupoli_16z_Google", and "Ioannina_14z_Google", respectively. The qualitative description of the basic eye movement indices allows for a cross-check with the statistical grayscale heatmaps and the cartographic backgrounds, hence enhancing our comprehension. Observing the "Katerini_16z_ESRI" visual stimulus, it is evident that visual attention is allocated to the complex road network with its contrasting color. As Skopeliti and Stamou [53] mention, color is the most frequently employed method of commenting upon the structural elements of the map [70]. In the case of "Alexandroupoli_16z_Google", the high number of fixations is due to the large number of pictorial symbols concentrated in the depicted area. As for "Ioannina_14z_Google", the high ranking in median saccade amplitude can be understood by observing the salient objects disclosed by the statistical grayscale heatmap and the symbology present at these locations. The high saccade amplitude value corresponds to the distance between the leftmost and the rightmost points on the heatmap. Both points follow the Gestalt principle of the focal point, attracting the participants' attention, because they are pictorial points with darker hues than their backgrounds. Consequently, the calculated statistics complement the qualitative analysis of the statistical grayscale heatmaps. No patterns were detected for the correlation between the ranking and the online map service or the zoom level of the visual stimuli in this phase of the analysis; therefore, further analysis is required to gain a more comprehensive understanding of this issue.
The source code developed for the computation of the GraphGazeD metric is provided in a GitHub repository (see Supplementary Materials). There are two different versions that can be used for different purposes. The first version is created for the comparison of images within a directory following the same naming convention as the files in this study. In this way, the flexibility of the script can aid the user in easily manipulating the data and making comparisons dedicated to the online map service used, the zoom level, or the geographic area depicted on the visual stimulus. The second version is designed for the comparison of images in general. Additionally, both versions can be found in a Jupyter notebook along with some indicative examples.
As mentioned above, the tool created to compute the GraphGazeD metric is provided freely. In this manner, users are able to adapt it to their data. For example, they can decide whether their data will be provided as input from a directory, as in the script, or as plain files. Additionally, the user can change the type of the output file where the differences are stored. The images to be compared can be of any type, not only uint8 images as in this study. The threshold used is another variable that can be modified for more targeted results. As for the curve-fitting functions, more coefficients can be added to examine the goodness of fit. Finally, the tool can be employed to create a comprehensive toolbox that will enable modeling the differences in aggregate gaze behavior among heatmaps. Hence, such a tool can be integrated with others, such as the PeyeMMV toolbox [71], in order to deliver complete and open-source software solutions to the scientific community.
The OnMapGaze dataset can be employed to model the visual behavior of participants who observed cartographic backgrounds derived from online map services in free-viewing conditions. The statistical grayscale heatmaps are produced by considering many samples collected with high accuracy, making them suitable to serve as objective ground truth in deep learning approaches [13,57,58]. Considering that cartographic backgrounds constitute, in practice, abstract (and not natural) images, modeling approaches can result in cartographic-oriented visual saliency models. The statistical indices provided offer insight into the visual strategies employed by the participants. The open-access nature of the collected dataset allows other researchers to analyze the data further and/or create new indices. Moreover, the designed approach for heatmap comparison can be implemented on any other type of visual stimuli, including natural or more abstract images. Although the proposed metric cannot be directly compared to typical similarity measures used in image recognition, since it is based on a graph-based approach that reveals the aggregated behavior over a range of threshold values, an interesting direction for future research involves the comparison of the modeled graphs with such measures (e.g., the Jaccard index, the Dice coefficient, and the BF score).
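For instance, a rough cross-check against such set-overlap measures could binarize two heatmaps and compute the Jaccard index and Dice coefficient; the intensity cutoff of 128 below is an arbitrary illustrative choice, not a value used in this study:

```python
import numpy as np

def jaccard_dice(a, b, cutoff=128):
    """Jaccard index and Dice coefficient of two uint8 grayscale heatmaps
    binarized at the given intensity cutoff."""
    fa, fb = a >= cutoff, b >= cutoff            # binarize both heatmaps
    inter = np.count_nonzero(fa & fb)            # overlap of salient regions
    union = np.count_nonzero(fa | fb)
    size_sum = np.count_nonzero(fa) + np.count_nonzero(fb)
    jaccard = inter / union if union else 1.0    # empty masks count as equal
    dice = 2.0 * inter / size_sum if size_sum else 1.0
    return jaccard, dice
```

Unlike GraphGazeD, which sweeps all 256 thresholds, these measures collapse each heatmap to a single binary mask, which is precisely why the two approaches are not directly comparable.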
The typical eye movement indices, the statistical grayscale heatmaps, and the computation of new metrics (such as GraphGazeD) could direct AI algorithms toward predicting visual behavior during the observation of different types of cartographic products.As a consequence, this will contribute to a better understanding of how map users react to cartographic backgrounds and, at the same time, serve as an objective evaluation approach for alternative choices in the design variables (i.e., visual, dynamic, and sound variables) and/or Graphical User Interfaces (GUIs).
The presented research study examines the visual perception of static cartographic backgrounds. This may be a limitation when modeling the visual behavior of dynamic products is needed. Therefore, as a next step, similar experiments could be performed considering animated maps and/or animated cartographic symbols [22]. Furthermore, the produced gaze dataset refers to different cities in Greece. Considering that cartographic symbol density may differ from region to region (even within the same map service and/or zoom level), the development of similar datasets based on cartographic backgrounds referring to different geographical regions will help to better model the visual behavior of map users.
Figure 1 depicts some indicative examples of the experimental visual stimuli.

Figure 1. Indicative samples of experimental visual stimuli.
Figure 2 depicts a flowchart of the main experiment.

Figure 2. A flowchart of the main experiment in the SR Research Experiment Builder environment.

Figure 3. Graph-based metric computation in three successive steps. For the illustrative example, an interval of 0.2 is selected.

Figure 4. Curve-fitting examples for modeling the graph-based metric. Blue lines represent the calculated values before the fitting process.

Figure 5. The components of the OnMapGaze dataset.

Figure 6. Aggregated statistical grayscale heatmaps produced for the highest-ranking visual stimuli.

Figure 7. An example of a higher-difference pair (on the left side) and a lower-difference pair (on the right side).

Table 1. Ranking of all experimental visual stimuli based on basic eye movement metrics.