EEG & Eye Tracking User Experiments for Spatial Memory Task on Maps

The aim of this research is to evaluate the use of ET and EEG for studying the cognitive processes of expert and novice map users and to explore these processes by comparing two types of spatial memory experiments through cognitive load measurements. The first experiment consisted of single trials and participants were instructed to study a map stimulus without any time constraints in order to draw a sketch map afterwards. According to the ET metrics (i.e., average fixation duration and the number of fixations per second), no statistically significant differences emerged between experts and novices. A similar result was also obtained with EEG Frontal Alpha Asymmetry calculations. On the contrary, in terms of alpha power across all electrodes, novices exhibited significantly lower alpha power, indicating a higher cognitive load. In the second experiment, a larger number of stimuli were used to study the effect of task difficulty. The same ET metrics used in the first experiment indicated that the difference between these user groups was not statistically significant. The cognitive load was also extracted using EEG event-related spectral power changes at alpha and theta frequency bands. Preliminary data exploration mostly suggested an increase in theta power and a decrease in alpha power.


Introduction
Developments in medical research allow scientists to observe neurons in the brain with a high spatial and temporal resolution. As "scientific cartography" emerged in the early 1900s, it became possible to borrow theories and methods from experimental psychology to study how map design influences map use in a formal, systematic and empirical way rather than the trial and error method [1]. In this respect, scientific cartography has long dealt with cognitive issues of maps and map use, and Eckert applied experimental psychology principles to establish the laws of map logic [1]. According to him, map logic complies with the map creation laws, which strongly influence cartographic perception [2].
To understand map users' behaviors, it is important to identify the cognitive procedures involved. Human memory functions within a sequence of three stages: sensory memory, working (short-term) memory, and long-term memory (LTM) [3]. Cognitive processes and strategies particularly occur during circumstances such as being aware of where to look in a map or interpreting map-related information with respect to other knowledge stored in LTM. Since different map users have different information stored in their memory, they are expected to have different strategies while reading maps [1]. Therefore, expertise is one of the major individual differences across map users. During a map-related task, all map users, especially novices, rely on general map knowledge (e.g., knowing that contour lines represent elevation), whereas experts mostly consult their specific map knowledge (e.g., knowing the direction of slope by interpreting the contour lines). Specific map knowledge enables experts to establish spatial relationships in a more structured and systematic way compared to non-experts [4]. By understanding the map knowledge of users, cartographers can focus on effective map designs that ideally do not cause a high cognitive load.
As fundamental units of cartographic design, Bertin's [5] visual variables (i.e., position, size, shape, value, color hue, orientation, and texture) maintain the visual hierarchy (as described in Gestalt theory) which is essential to improve map logic by distinguishing and grouping map symbols and encoding map-related information [1,4]. That is why map perception is, in a way, dependent on the decision of the visual variables used in its design. As important as design elements, the map task is fundamental in cognition because cognitive procedures generally occur instantaneously and within a specific task or context.
To measure real time cognitive responses of map users, cartographers have hitherto implemented many methods in cartographic usability research such as eye tracking (ET), sketch maps, thinking aloud, interviews or questionnaires e.g., [6][7][8]. The cartographic eye-tracking research has focused on the interpretation of visual information while performing a complex visual and cognitive task e.g., [9,10], visual interaction with highly interactive interfaces e.g., [11], cognitive processes linked with visual search in maps e.g., [12], and learning and remembering the information presented via maps e.g., [4]. The insights provided by eye tracking studies are promising for understanding how the human brain handles map-related cognitive tasks, yet there is still much research to do to elaborate how visual elements affect map use (e.g., perception, memory, cognition, etc.) and how to leverage visual variables to facilitate map use with less cognitive load. There is also a lack of empirical evidence on the users' cognitive processes involved in map tasks, especially on the sources of individual differences (i.e., expertise, gender, etc.) and the relationship between the organization of spatial thinking and geographic space e.g., [8,13,14].
We propose that non-invasive brain-imaging techniques (e.g., electroencephalography (EEG)) can benefit cartographic user studies by providing direct measures of brain activity during cognitive processes. EEG, which is used to monitor the electrical activity in the brain, can be combined with other quantitative methods, such as eye tracking, to gain a better understanding of the cognitive abilities and limitations of different groups of map users and of how visual elements influence map use. In particular, insights into the differences due to expertise can contribute to creating effective cartographic products.
Although little research has examined the relationship between ET and EEG within the cartographic context, these two methods might yield different outcomes in terms of cognitive load. For instance, Gedminas [15] explored how hurricane advisory maps are perceived by comparing existing maps with alternative map designs and found that the fixation durations and the number of fixations that these maps received did not differ significantly, while frontal EEG analysis indicated that the alternative maps had a larger and more positive effect on the user.
This study deals with cartographic user experiments that employ ET and EEG as simultaneous and synchronized data collection methods for spatial memory tasks on maps (see Figure 1). The goal of the study is twofold: (i) studying the cognitive processes of map users, and (ii) evaluating the use of EEG for these processes by comparing two types of experiments, which also allows us to triangulate ET and EEG findings and draw conclusions on the suitability of the methods, especially the contribution of EEG.
In this context, we introduce two user experiments, both aiming to explore the (cognitive) strategies of expert and novice map users through cognitive load measurements when they are asked to memorize and then remember (a part of) map content with varying levels of complexity. Due to the methodological differences in the experiment designs, in the first experiment, we used simple and exploratory measurements for cognitive load extraction. However, the findings of the first experiment contributed to the experimental design of the second one: some of them were fundamental to its motivation and served as inputs for formulating its hypotheses. Therefore, the second experiment was designed in a more complex way to allow an in-depth investigation of cognitive load. In other words, it became possible to identify how cognitive load affects the recall performance of both map user groups, and whether some features are recalled independently of task difficulty. If so, we can identify which features are recalled easily/primarily with respect to other features recalled within the task, especially when the task demands higher cognitive load. Moreover, we hope to contribute to cartographic usability research by introducing a brief overview of the methodology of ET and EEG experiments, because it enables us to explore the behavioral and neurophysiological responses of map users and helps with understanding the influence of cartographic design and task on individual map users. It has rarely been applied in a cartographic setting before, especially for map reading instead of map usability e.g., [15].

Methodology
The (spatial) memory task in both user experiments focuses on the study process of the main structuring map elements (i.e., roads, green areas, and hydrography) of a map stimulus to be retrieved later. Accordingly, the visual variables (e.g., shape, size, color, etc.) used for depicting those elements play an important role in the experimental design because we utilize them to design maps that impose either less or more cognitive load. While roads contain only linear features and green areas contain only polygon features, hydrography contains both linear and polygon features. Inherently, recalling one or a combination of those can be linked to different levels of task difficulty; hence, each different task (or group of tasks) is assumed to cause a different cognitive load. For instance, linear features are easier to learn and remember even without much attention, and besides the color, the shape and size of map elements have an equally important impact on visuospatial memory [4]. Table 1 summarizes all the aspects of the experiment design for the first and second experiment.

Table 1. Aspects of the experiment design for the first and second experiments.

Research question
Experiment 1: How does cognitive load vary between experts and novices while memorizing the main structuring elements of a map stimulus without any time constraints?
Experiment 2: How does cognitive load vary between experts and novices while memorizing a (part of) map content in a limited study period? How does the complexity/difficulty of the task influence the cognitive load?

Goal
Experiment 1: To evaluate the cognitive processes, abilities and/or limitations of map users when they first study a 2D static map and retrieve this information later.
Experiment 2: To test the effect of task difficulty on behavior, which is the retrieval of the main structuring elements with varying levels.

Hypothesis
Experiment 1: We expect that the spatial memory task will cause higher cognitive load in novices compared to experts.
Experiment 2: The tasks involving the retrieval of only linear features will cause less cognitive load for both groups compared to the other features. We additionally expect that experts would perform better at tasks demanding higher cognitive load.

Task procedures
Experiment 1: Participants studied one map stimulus for as long as they wanted to memorize all the main structuring elements included in the map they studied. Once they thought they had studied the map long enough, they pressed a certain key and then they had to draw this map from memory by using MS Paint. After drawing the sketch map, participants used a special key to terminate the task.

Independent variables
Experiment 1: 1 map design type (i.e., 2D static topographic map); 1 task difficulty level (i.e., retrieval of the main structuring elements of the whole map stimulus); 2 expertise levels (i.e., experts vs. novices)
Experiment 2: 1 map design type (i.e., Google Maps stimuli); 7 task difficulty levels (i.e., classified as easy, moderate, hard), linear and polygon features within blocks; 2 expertise levels (i.e., experts vs. novices)

Dependent variables
Experiment 1: Trial durations *, eye movements, EEG (alpha power, FAA), self-reported metrics (i.e., questionnaire) *
Experiment 2: Response time of correct answers, eye movements, EEG metrics (ERD-ERS), self-reported metrics (i.e., questionnaire) *
* Not mentioned in this paper, but published in [4].

Experiment 1
In the first experiment, participants were asked to study a map stimulus for as long as they liked in order to draw a sketch map of what they had studied. The map stimulus used in this experiment is a simplified topographic map that was produced by the Belgian National Mapping Agency, NGI/IGN (Nationaal Geografisch Instituut/Institut Géographique National), and was also used by Ooms [12] and Keskin et al. [4]. Based on the results of [4] and [12], we hypothesized that the spatial memory task would cause higher cognitive load in novices. To explore participants' recall strategies, we evaluated the drawn elements in the sketch maps and analyzed fixation-related and AOI-based eye-tracking metrics (for more detail about the experimental settings and results, see [4]).
This experiment resulted in single trials of one spatial memory task, but with long ET and EEG recordings due to the absence of time constraints. The ET metrics used as indicators of cognitive load were (i) average fixation duration and (ii) the number of fixations per second, which are frequently used by researchers studying individual differences e.g., [12,15,16]. Average fixation duration is useful to study attentional procedures to one specific stimulus, whereas the number of fixations per second reveals the speed of attention [17].
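As an illustration of the two ET metrics named above, the following minimal Python sketch computes them for a single trial. It assumes that fixation detection has already been done and that each fixation is available as a duration in milliseconds; the function name and the input layout are hypothetical and are not taken from the study's actual pipeline.

```python
from statistics import mean


def fixation_metrics(fixation_durations_ms, trial_duration_s):
    """Return (average fixation duration in ms, fixations per second).

    fixation_durations_ms: durations of the fixations detected in one trial
    (hypothetical input layout, assumed for this sketch).
    trial_duration_s: total duration of the trial in seconds.
    """
    if trial_duration_s <= 0:
        raise ValueError("trial duration must be positive")
    if not fixation_durations_ms:
        raise ValueError("at least one fixation is required")
    average_duration_ms = mean(fixation_durations_ms)
    fixations_per_second = len(fixation_durations_ms) / trial_duration_s
    return average_duration_ms, fixations_per_second
```

Because the trials in the first experiment had no time limit, dividing the fixation count by the trial duration keeps the second metric comparable across participants who studied the map for different lengths of time.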
To extract the cognitive load from EEG data, we first averaged alpha power across all recording EEG channels and calculated Frontal Alpha Asymmetry (FAA) using frontal channels. Cognitive load can be measured using the power spectrum of EEG activity, and several researchers have repeatedly shown that spectral power changes in the alpha and theta frequency bands are related to task difficulty and are therefore good predictors of cognitive load across a variety of working memory task demands [18,19]. These studies have found that alpha activity (particularly over the parietal and occipital areas) decreases with growing task demands that inherently cause working memory performance to decrease, whereas theta activity increases (especially over frontal midline areas) when encoding new information [19][20][21][22]. A decrease in alpha power is a sign of attentional demands or comparatively high neuronal excitability (i.e., processing visual information or responding to internal events, e.g., mental activation or cognitive effort), while an increase in alpha power reflects inhibition or cortical deactivation [23].
FAA is a commonly used measure for motivation, emotion, and cognitive control e.g., [24,25]. Greater relative left frontal activity is associated with increased memory and attentional performance and more-focused task performance [26]. FAA is the average hemispheric difference in EEG alpha power between the left and right frontal regions of the brain during EEG recording [27][28][29]. We computed the alpha asymmetry using the left (F3) and right (F4) frontal channels with the following formula [30] (Equation (1)):
$$\mathrm{FAA} = \ln(\text{alpha power at F4}) - \ln(\text{alpha power at F3}) \qquad (1)$$
Since EEG power is inversely correlated with activation, negative alpha asymmetry scores correspond to greater relative right frontal activation, whereas positive ones indicate greater relative left frontal activity [30][31][32][33]. More activity in the left-frontal hemisphere indicates approach and motivation, whereas greater relative right activation refers to withdrawal and avoidance [26].
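The asymmetry score can be sketched in a few lines of Python. The sketch assumes the commonly used log-difference formulation (natural log of right-frontal alpha power minus natural log of left-frontal alpha power), which matches the sign convention described above; the function and parameter names are hypothetical, and the alpha power values are assumed to have been computed beforehand.

```python
import math


def frontal_alpha_asymmetry(alpha_power_f3, alpha_power_f4):
    """Log-difference asymmetry score: ln(right, F4) - ln(left, F3).

    Assumed formulation for this sketch. Because alpha power is inversely
    related to activation, a positive score indicates greater relative left
    frontal activity and a negative score greater relative right activity.
    """
    if alpha_power_f3 <= 0 or alpha_power_f4 <= 0:
        raise ValueError("alpha power values must be positive")
    return math.log(alpha_power_f4) - math.log(alpha_power_f3)
```

Taking logarithms before subtracting makes the score depend on the ratio of the two powers, so it is unaffected by a participant's overall alpha level.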

Experiment 2
The goal of the second experiment is fundamentally the same as that of the first one, but its design is more complex, with important modifications to the participants, task and stimuli, procedures, and psychological measures used to extract cognitive load. In this section, we describe all these aspects in detail.
The theoretical background for formulating the hypotheses of the second experiment was based on the observations in Experiment 1 [4], whose outcomes will be presented in the results section, and the existing literature [34,35]. In this context, linear features are primary to construct the whole map, and therefore, they are easily accessible in working memory. We additionally expect that experts would perform better at tasks demanding higher cognitive load.

Participants
Since we intend to explore the influence of the task on cognitive load between expert and novice map users, within- and between-subject designs were combined. In this context, both experts and novices, matched in age and gender (N = 22, MED = 27.5, SD = 3.9), performed the same experiment under the same conditions (see Table 1 for more detail).

Task and Stimuli
The spatial memory task in this experiment focuses on the retrieval of the main structuring elements with varying difficulty levels. Compared to the first experiment, we increased the number of stimuli of interest and the reference stimuli (e.g., fixation cross), presented them as randomized blocks, added more levels of task difficulty, put time constraints in place, and allowed participants to select from multiple choices instead of drawing the sketch maps themselves as applied in the first experiment.
In addition to the fixation crosses used as a pre-stimulus reference, the experiment included two types of visual materials: (i) original map stimuli to be studied and (ii) the corresponding skeleton maps displayed on the graphical response screens. The original map stimuli were acquired from Google Maps at zoom level 15 with a 1 km scale bar (since the resolution of a map in the Mercator projection depends on the latitude, the scale of the maps, which were collected from regions all around the world, varies slightly but is approximately 1:40000). The skeleton maps are simplified representations of the map stimuli indicating the main structuring map elements of interest for that specific task and were prepared by digitizing the main structuring map elements on the original stimuli using GIS software. Throughout the design of the skeleton maps used in the experiment, we took care to depict each map feature class with a unique color and to make sure that these colors remained true to the ones used in the original stimuli. Accordingly, the main roads were assigned yellow, major hydrographic features light blue, and the green areas light green. The maps (1344 × 768 pixels, 14' × 8') and the graphical response screens including four panels (576 × 326 pixels, 6' × 3.4') were shown on a 22" color monitor with 1680 × 1050 spatial resolution.
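The latitude dependence mentioned above can be made concrete with the standard Web Mercator ground-resolution relation. The Python sketch below is illustrative only: it assumes 256-pixel tiles and the WGS84 equatorial radius, and it returns metres per image pixel. Converting this to a printed or on-screen map scale additionally requires the physical pixel size of the display and any rescaling of the stimuli, which is not attempted here.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 equatorial radius used by Web Mercator
TILE_SIZE_PX = 256          # assumption: standard 256-pixel tiles


def ground_resolution_m_per_px(latitude_deg, zoom_level, tile_size_px=TILE_SIZE_PX):
    """Ground distance covered by one map pixel at a given latitude and zoom."""
    if not -85.06 <= latitude_deg <= 85.06:
        raise ValueError("latitude outside the usable Web Mercator range")
    equator_circumference_m = 2.0 * math.pi * EARTH_RADIUS_M
    pixels_around_equator = tile_size_px * 2 ** zoom_level
    return math.cos(math.radians(latitude_deg)) * equator_circumference_m / pixels_around_equator
```

For example, `ground_resolution_m_per_px(60.0, 15)` is half of `ground_resolution_m_per_px(0.0, 15)`, which is why stimuli taken at the same zoom level from different regions do not share exactly the same scale.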
Tasks including the same number of trials related to the same map element were classified as blocks. For the randomization of stimuli used in trials, a randomized block design was used, and in total seven blocks of trials were designed. Each block consisted of one trial for each stimulus (i.e., 50 trials within a block) focusing on the similarity of one of the criteria listed in Figure 2: the main structuring elements of (b) the whole map, (c) roads and hydrography, (d) roads and green areas, (e) green areas and hydrography, (f) green areas, (g) hydrography, and (h) roads. The trials in Block 1 were designed to study the recall performance related to the entire map stimulus; therefore, the skeleton maps were prepared by digitizing all the main roads, all the major hydrographic features and the green areas on the original map stimuli. The trials included in Blocks 2, 3 and 4 were dedicated to the retrieval of a combination of two map feature classes. In this case, Block 2 refers to the main roads and major hydrographic features, whereas Block 3 addresses the main roads and green areas, and Block 4 involves major hydrographic features and green areas. The trials belonging to Blocks 5, 6 and 7 deal with a single map feature class (green areas, hydrographic features, or roads, respectively), and each class was digitized individually on the original stimuli.
One important concern about the design is that task difficulty may not be easily predicted in advance, because it depends on many factors rather than only on the number of object classes to remember. Based on the average reaction time of the correct answers provided by all participants, we observed clustering among some blocks and natural breaks between those clusters. Subsequently, Blocks 1 and 2 were designated as hard, Blocks 3 and 4 as moderate, and the rest as easy (Figure 3). In this way, the blocks falling into the same category can be treated similarly when analyzing and interpreting the gaze and neurophysiological data (i.e., eye tracking, EEG) collected during the entire experiment.

Procedures
Measuring the cognitive load is linked to how a participant indicates a correct answer on the response screen presented to them, and reaction times of key presses are a simple and rather reliable way to measure it. During each trial, participants were first asked to study a map stimulus, and during the stimulus presentation they were free to shift their gaze across the display. The response screen then appeared with four graphical response panels that show skeleton maps indicating specific main structuring map elements (Figure 4). Only one of the panels corresponded to the map that the participant had just seen (the correct response). Participants were instructed to press the space bar immediately when they found the panel with the correct skeleton map and to remember the corresponding letter.
Pressing the space bar indicated that the search was complete and moved participants to the second response screen, where they saw only the letters (i.e., no pictures) (Figure 4). They then clicked on the letter they were keeping in memory to complete the task. If multiple features needed to be remembered (e.g., roads and hydrography), a participant might remember only one type (e.g., hydrography) and then try to find the correct skeleton map based only on this type of information. The options in the graphical response panels were therefore designed so that a response based on partial information was impossible. Additionally, the correct skeleton maps appeared at different locations in consecutive trials, and the block orders were counterbalanced across participants. Overall, each participant had to complete all seven blocks.
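The two ordering constraints described above can be sketched as follows. This is only one possible scheme and not necessarily the one used in the study: it assumes that block orders are counterbalanced by cyclic rotation (a simple Latin-square style scheme), that the four panels are labeled A to D, and that "different locations" means the correct panel never occupies the same position in two consecutive trials. All names are hypothetical.

```python
import random

BLOCKS = [1, 2, 3, 4, 5, 6, 7]
PANELS = ["A", "B", "C", "D"]  # hypothetical panel labels


def rotated_block_order(participant_index, blocks=BLOCKS):
    """Cyclic rotation of the block list, one rotation step per participant."""
    shift = participant_index % len(blocks)
    return blocks[shift:] + blocks[:shift]


def correct_panel_sequence(n_trials, rng, panels=PANELS):
    """Random position of the correct panel, never repeated back to back."""
    sequence = []
    previous = None
    for _ in range(n_trials):
        candidates = [panel for panel in panels if panel != previous]
        previous = rng.choice(candidates)
        sequence.append(previous)
    return sequence
```

With rotation, every block appears in every serial position once per seven participants, which spreads practice and fatigue effects evenly over the blocks.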

Psychological Measures to Use: ET & EEG Metrics
We used the same eye-tracking metrics employed in the first experiment to extract the cognitive load: average number of fixations per second and fixation durations for each trial.
Events refer to the time points where the stimuli of interest are presented to the participants. During a cognitive task, event-related power changes in the EEG can be quantified in a specific frequency band. A decrease in event-related power corresponds to a reduction of amplitude in response to a stimulus and is therefore called event-related desynchronization (ERD), whereas an increase in power corresponds to an increase of amplitude with stimulus presentation and is referred to as event-related synchronization (ERS) [19,22]. Alpha desynchronization and theta synchronization are fundamental EEG phenomena that have been used in multiple studies on cognitive load and task difficulty e.g., [36][37][38][39]. ERD/ERS of the alpha band has been found to be especially sensitive to cognitive task performance and higher cognitive abilities e.g., [40,41]. For the theta band, in contrast, Gevins et al. [37] examined the changes in cortical activity during spatial and verbal working memory tasks and observed that theta activity increased in magnitude with higher task difficulty. These results suggest that alpha and theta oscillations are differently related to task difficulty; as task difficulty increases, alpha activity decreases (i.e., desynchronizes), whereas theta activity increases (i.e., synchronizes).
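To make the ERD/ERS terminology concrete, the Python sketch below expresses the event-related power change as a percentage of the power in a reference interval. The sign convention used here (activation minus reference, so that negative values denote ERD and positive values ERS) is an assumption for illustration; conventions differ between studies, and the function names are hypothetical.

```python
def event_related_power_change(reference_power, activation_power):
    """Percentage band-power change relative to the reference interval.

    Assumed convention: 100 * (A - R) / R, where R is the band power in the
    reference interval and A the band power in the activation interval.
    """
    if reference_power <= 0:
        raise ValueError("reference power must be positive")
    return 100.0 * (activation_power - reference_power) / reference_power


def classify_change(percent_change):
    """Label a power change as ERD (decrease) or ERS (increase)."""
    if percent_change < 0:
        return "ERD"
    if percent_change > 0:
        return "ERS"
    return "no change"
```

Under this convention, alpha power dropping from 10 to 8 units gives a change of -20% (ERD), while theta power rising from 4 to 5 units gives +25% (ERS).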
To be able to extract the alpha and theta spectral powers, the EEG data went through a series of preprocessing steps. For handling EEG and ET data together, we decided to use EEGLAB, an open source and interactive MATLAB toolbox [42], with the EYE-EEG extension [43]. EEGLAB processes continuous and event-related EEG and other electrophysiological data (it supports data from most of the commercially available software), and performs time/frequency analysis, artifact rejection, event-related statistics, and visualization of averaged or single-trial EEG data. Figure 5 demonstrates the pre-processed (i.e., filtered, bad channels removed, events added and modified based on correct responses, re-referenced, and segmented) EEG recordings belonging to an expert female participant. The vertical axis shows the amplitude (µV), i.e., the amount of energy in artifact-free EEG frequency bands listed on the left-hand side of the graph, whereas the horizontal axis represents time in seconds. The vertical lines on the graph labeled with vertical lettering (e.g., 148, 149, 150) are the event codes, and the intervals between the blue vertical lines, numbered above the upper border of the graph (e.g., 16, 17, 18), indicate the epochs.
Once EEG data had gone through preprocessing steps, we segmented it based on trials. Figure 6 demonstrates the trial sequence of the experiment. To be able to calculate event-related power change at an electrode, we created epochs from the events of interest based on two different intervals:
• [0 2] s for the events in the reference interval (fixation crosses)
• [0 7] s for the events in the activation interval (map stimuli)
Figure 6. Trial sequence. The fixation cross was followed by the stimulus presentation; the stimulus remained visible throughout the study time. The activation interval ended with the presentation of a graphical response screen.
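The epoching step described above can be illustrated with a minimal Python sketch. It is not the EEGLAB procedure used in the study; the sampling rate, the single-channel signal, and the event onsets are hypothetical values chosen only for illustration, and the function name `extract_epochs` is our own.

```python
def extract_epochs(signal, fs, event_samples, t_start, t_end):
    """Cut one window [t_start, t_end) seconds around each event onset.

    signal        -- list of samples from one channel
    fs            -- sampling rate in Hz
    event_samples -- event onsets given as sample indices
    Windows that would run past the recording are skipped.
    """
    first = int(round(t_start * fs))
    last = int(round(t_end * fs))
    epochs = []
    for onset in event_samples:
        lo, hi = onset + first, onset + last
        if lo >= 0 and hi <= len(signal):
            epochs.append(signal[lo:hi])
    return epochs


# Hypothetical single-channel recording: 20 s sampled at 100 Hz.
fs = 100
signal = [float(i) for i in range(20 * fs)]
fixation_onsets = [100, 1000]   # hypothetical fixation-cross events
stimulus_onsets = [300, 1200]   # hypothetical map-stimulus events

# [0 2] s reference epochs and [0 7] s activation epochs.
reference_epochs = extract_epochs(signal, fs, fixation_onsets, 0, 2)
activation_epochs = extract_epochs(signal, fs, stimulus_onsets, 0, 7)
```

Each reference epoch then holds 2 s of data and each activation epoch 7 s, which is the pairing needed later for the baseline comparison.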
Bad epochs containing blink or muscle artifacts were rejected based on visual inspection and the collected eye-tracking data. Prior to epoching, we synchronized the EEG recording with its corresponding ET recording through shared events present in both the ET and EEG data: the start-event and the end-event. Although the time synchronization accuracy of our system was not sufficient for studying eye-fixation-related potentials, detecting fixations and saccades on the EEG helps explain the EEG spikes elicited by eye movements. Therefore, we think offline synchronization of ET data is still useful for artifact rejection (Figure 7).
For the computation of the spectral power change of EEG activity, the band power of the EEG signal was first computed by means of a time-frequency analysis that employs a standard Fast Fourier Transform (FFT). The FFT transforms the EEG signal from the time domain into the frequency domain. In this way, any time-dependent signal can be broken down into a collection of sinusoids, and EEG recordings can be plotted as a frequency power spectrum. After the transformation, we averaged the spectral power of the alpha (8.5-12.5 Hz) and theta (4.5-6.5 Hz) bands over our 7-second-long EEG recording (i.e., the duration of the stimulus on the screen, the activation period) for valid trials in each block.
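As an illustration of the band-power step, the following Python sketch averages the spectral power of the frequency bins that fall inside a band. It is a simplified stand-in for the EEGLAB analysis: it uses a plain discrete Fourier transform evaluated only at the bins of interest (in practice an FFT routine would be used), and the sampling rate, the test signal, and the power normalization are assumptions made for this example.

```python
import cmath
import math


def band_power(signal, fs, f_lo, f_hi):
    """Mean power of the DFT bins whose frequency lies in [f_lo, f_hi] Hz."""
    n = len(signal)
    powers = []
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            # DFT coefficient for bin k (direct evaluation, no FFT).
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(signal))
            powers.append(abs(coeff) ** 2 / n ** 2)
    if not powers:
        raise ValueError("no frequency bins fall inside the requested band")
    return sum(powers) / len(powers)


# Hypothetical 7 s epoch at 128 Hz containing a pure 10 Hz (alpha) oscillation.
fs = 128
epoch = [math.sin(2 * math.pi * 10 * t / fs) for t in range(7 * fs)]

alpha_power = band_power(epoch, fs, 8.5, 12.5)  # alpha band used in the text
theta_power = band_power(epoch, fs, 4.5, 6.5)   # theta band used in the text
```

With a 7 s window the frequency resolution is 1/7 Hz, so the alpha band covers 28 bins and the theta band 14 bins in this example.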
To extract the cognitive load, event-related power changes can be quantified by contrasting the power in a specified frequency band during a cognitive task (e.g., spatial memory) with the power in a preceding reference interval (i.e., ERD and ERS; see [22] for details). In this context, the EEG power in the baseline (pre-stimulus) period was compared with the event-related EEG power dynamics during the activation intervals in each epoch [44]. The event-related power change (ERP) at an electrode was obtained by subtracting the log-transformed power during the pre-stimulus reference intervals from the log-transformed power during the activation intervals according to the following formula [45] (Equation (2)):
$$\mathrm{ERP} = \log(\mathrm{Pow}_{\mathrm{activation}}) - \log(\mathrm{Pow}_{\mathrm{reference}}) \qquad (2)$$
Note that this ERP should not be confused with the commonly used abbreviation for event-related potentials in the EEG domain. After computing ERPs in the alpha and theta frequency bands for all task difficulty levels and both expertise groups, the powers were compared to study the differences between experts and novices, particularly for tasks of low and high complexity.
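The subtraction of log-transformed powers described above can be sketched in a few lines of Python. The natural logarithm is an assumption here (the text does not state the base), and the power values are hypothetical; negative results correspond to ERD and positive results to ERS.

```python
import math


def event_related_power_change(power_activation, power_reference):
    """Log power in the activation interval minus log power in the reference interval."""
    return math.log(power_activation) - math.log(power_reference)


def label_change(change):
    """Name the direction of the change: desynchronization or synchronization."""
    if change < 0:
        return "ERD"
    if change > 0:
        return "ERS"
    return "no change"


# Hypothetical band powers for one electrode (arbitrary units).
alpha_change = event_related_power_change(power_activation=0.8, power_reference=1.6)
theta_change = event_related_power_change(power_activation=2.4, power_reference=1.2)

alpha_label = label_change(alpha_change)  # alpha power halved relative to baseline
theta_label = label_change(theta_change)  # theta power doubled relative to baseline
```

Because the measure is a difference of logarithms, halving and doubling the baseline power give changes of equal magnitude and opposite sign.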

Experiment 1
The average alpha power across all usable common EEG electrodes (i.e., C3, F3, F4, O1, P3, T5, T6) for all participants (usable data: 6 novices, 6 experts) was 0.000939 (SD = 0.000051, range = 0.000225 to 0.002218). The Shapiro-Wilk test was used to test the normality of the distribution of the data since our dataset is smaller than 2000 samples (N = 84). The test provided strong evidence that the data were not normally distributed (D(84) = 0.930, p < 0.001). The difference of 0.000171 in alpha power between experts (M = 0.001282, SD = 0.000064) and novices (M = 0.000853, SD = 0.0000777) was statistically significant according to the non-parametric Mann-Whitney U test (p = 0.024 < 0.05). Greater alpha power is associated with lower cognitive load; therefore, the results indicate that experts experienced considerably less cognitive load in this memory task than novices. This outcome is important because, while the sketch map evaluation and ET metrics suggested otherwise, EEG alpha power provided additional insight pointing to a significant difference in the spatial memory performance of experts and novices.
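For readers who want to reproduce this kind of group comparison, the sketch below implements a two-sided Mann-Whitney U test in plain Python. It is an illustration only: it uses the normal approximation without tie or continuity correction, which is rough for small samples, and the alpha-power values are hypothetical rather than taken from the study. In practice, library routines such as `scipy.stats.shapiro` and `scipy.stats.mannwhitneyu` would normally be used.

```python
import math


def average_ranks(values):
    """1-based ranks of values; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U, two-sided p) using the normal approximation (no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, p


# Hypothetical alpha-power values (arbitrary units), not the study data.
experts = [1.30, 1.25, 1.40, 1.10, 1.35, 1.20]
novices = [0.80, 0.90, 0.85, 0.70, 0.95, 0.75]
u_stat, p_value = mann_whitney_u(experts, novices)
```

In this made-up example the two groups do not overlap at all, so U is 0 and the approximate p-value falls below 0.05.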
For the memory task, the average FAA score across participants (usable data: 7 novices, 10 experts) was −0.149 (SD = 0.275, range = −0.810 to 0.160). According to the Shapiro-Wilk test, p = 0.006 showed that the data were normally distributed (D(17) = 0.870, p > 0.05); therefore, we applied two-way ANOVA to explore whether the difference between the expert and novice groups was statistically significant. Novices (M = −0.054, SD = 0.252) and experts (M = −0.216, SD = 0.283) showed no significant difference in FAA scores (F(1,15) = 0.199, p = 0.245). However, 70% of experts had negative scores on this metric, which reflects greater relative right activation, suggesting withdrawal-related motivation. Although the average FAA score was negative for novices, 57% of them exhibited larger left-hemispheric activation, which is an indicator of approach-oriented motivation and positive affective states.
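The sign interpretation of the FAA score can be made concrete with a small Python sketch. It assumes the commonly used definition, the natural log of right frontal alpha power minus the natural log of left frontal alpha power (e.g., F4 and F3); this section does not restate the formula, so the definition and the power values below are assumptions for illustration. Since alpha power is inversely related to cortical activity, a negative score means relatively lower right alpha power, i.e., greater relative right activation.

```python
import math


def frontal_alpha_asymmetry(alpha_left, alpha_right):
    """Assumed definition: ln(right alpha power) - ln(left alpha power)."""
    return math.log(alpha_right) - math.log(alpha_left)


# Hypothetical alpha powers at a left/right frontal pair (e.g., F3/F4).
faa_right_active = frontal_alpha_asymmetry(alpha_left=2.0, alpha_right=1.0)
faa_left_active = frontal_alpha_asymmetry(alpha_left=1.0, alpha_right=2.0)
faa_balanced = frontal_alpha_asymmetry(alpha_left=1.5, alpha_right=1.5)
```

Under this definition, `faa_right_active` is negative, `faa_left_active` is positive, and `faa_balanced` is zero.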

Experiment 2
ET results are shown in Figure 8. Fixation durations of novices were longer, and the difference between experts and novices increased as the difficulty increased; this difference was the highest for the hard tasks. In contrast, the number of fixations per second was higher for experts, and the difference increased as the difficulty decreased; therefore, the two groups differed the most for the easy tasks. The eye movement data for both metrics fit a normal distribution (Shapiro-Wilk test) for easy and moderate tasks. For these two categories of task difficulty, no statistically significant difference emerged between experts and novices in terms of the average fixation duration (F easy = 0.261, p = 0.232; F moderate = 0.174, p = 0.514).
The difference in the number of fixations per second was not significant for moderate tasks (F moderate = 1.861, p = 0.165), whereas it was significant for easy tasks (F easy = 0.006, p = 0.019). For the hard tasks, the average fixation durations were not normally distributed across participants (Shapiro-Wilk test) and we observed no statistically significant difference between the expert and novice groups (Mann-Whitney U, p = 0.886). The data for the number of fixations per second fit the normal distribution (Shapiro-Wilk test), and no significant difference occurred between the two groups based on the two-way ANOVA test (F hard = 0.064, p = 0.983).
Figure 9 depicts the ERPs in theta and alpha averaged for a novice male for Block 1 and Block 2. Here, we would like to show how individual data might include inconsistencies, although we observed negative alpha power and, in contrast, positive and relatively higher power in the theta frequency band in most EEG channels (see Table 2 for frequency values at each electrode). Frontal channels (e.g., Fp1, F3, F7, and F8) might not be trustworthy because they might still contain small blink artifacts acting as confounding effects; apart from these channels, however, we usually observed ERD (event-related desynchronization) in alpha and ERS (event-related synchronization) in theta power. Obviously, the cognitive load cannot be interpreted based on the data of one or two participants for a single block. The overall results will be obtained by aggregating blocks based on task difficulty (i.e., easy, moderate, hard) and averaging many trials of many participants for every difficulty level. Nevertheless, the preliminary data analysis seems promising for further analysis of the EEG power spectrum.
With this study, we attempted to verify the proposed methodology and to demonstrate that, with our experiment design and our hardware and software setup, it is possible to synchronize ET and EEG data to obtain more detailed insight into user behavior and to observe the EEG metrics of alpha and theta power.

Discussion
ET metrics in the first experiment showed that there was no significant difference between expert and novice map users, similar to what was found in the second experiment, except that novices exhibited fewer fixations per second for easy tasks. This finding is interesting considering our hypotheses, and it may be evidence that experts and novices use similar strategies for moderate and hard tasks, whereas novices might think about easy tasks more deeply or find even easy tasks more confusing. On the other hand, we observed more fixations and longer fixation durations for novices in hard tasks, and a similar situation applies for experts in easy tasks. To be able to interpret this outcome, we can look into saccade-related metrics, because increasing crowding and decreasing span size are associated with longer search times, more fixations, shorter saccades, and longer fixation durations [46]. Triangulating ET data with EEG data might also help in judging this result better; therefore, there is still a lot of work to do in terms of further analysis. For instance, ET metrics may not differ across conditions while EEG metrics suggest otherwise (e.g., [15]).
Studying the EEG metrics that indicate cognitive load provided an important insight into map users, and EEG seems promising as a complementary methodology and as a way of supporting the validity of research. However, acquiring, analyzing, and interpreting EEG data requires extensive experience, and one of the motivations of this paper was to emphasize the importance of the experiment design, especially when EEG comes into play.
On the one hand, as methodological decisions are highly dependent on the research questions and the hypotheses regarding them, it is important to define a solid objective for the user study with psychological design principles in mind and to identify the key metrics answering the research questions. On the other hand, although the experiment within this paper is limited to a spatial memory task and the methodological design of other experiments may vary widely, the technical issues to overcome and the preprocessing steps of the collected data are valid for almost all ET and EEG experiments. Recording EEG and ET data in free-viewing tasks has been a challenge and has rarely been applied, especially because it requires precise co-registration of gaze position. To minimize the muscle artifacts due to unnatural sitting positions, using a chin rest and adapting the position of the participant is crucial; it is also important to make sure that the participant has enough rest between blocks so that they do not exhibit fatigue and move as little as possible during the experiment. Electromagnetic artifacts that are elicited by other electrical devices and introduced as line noise in EEG data should be identified and filtered out. For accurate synchronization of EEG and ET data records, transistor-transistor logic (TTL) triggers are preferred, as this is the most straightforward and reliable method (e.g., [47,48]). Although proper synchronization can be achieved with the TTL trigger method, in our experiment the monitor offset value restricts studying eye-fixation-related potentials (EFRP), which require high temporal resolution in the synchronization of EEG and ET. However, our experiment setting allows for studying the power spectrum of EEG activity, and ET data can still be synchronized offline so that ET metrics can be correlated with EEG data on a trial basis. Therefore, the feasibility of the methodology should always be verified in advance, considering the possible technical constraints related to the recording equipment.
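Offline synchronization through shared events can be sketched as follows. The idea shown here, a linear mapping between the two clocks anchored at a shared start-event and end-event, is an assumption made for illustration and is not necessarily what the EYE-EEG extension does internally; all timestamps are hypothetical.

```python
def make_et_to_eeg_mapper(et_start, et_end, eeg_start, eeg_end):
    """Build a function that maps ET timestamps onto the EEG time axis.

    The mapping is linear and anchored at two events recorded by both
    systems (a start-event and an end-event), so a constant offset and a
    constant clock drift between the devices are both compensated.
    """
    if et_end == et_start:
        raise ValueError("start and end events must have different timestamps")
    scale = (eeg_end - eeg_start) / (et_end - et_start)

    def to_eeg_time(et_time):
        return eeg_start + (et_time - et_start) * scale

    return to_eeg_time


# Hypothetical shared events: the ET clock runs in milliseconds, the EEG
# clock in seconds, and the EEG clock drifts by 6 ms over a 10 min recording.
to_eeg_time = make_et_to_eeg_mapper(
    et_start=5000.0, et_end=605000.0,  # ET timestamps (ms)
    eeg_start=2.0, eeg_end=602.006,    # EEG latencies (s)
)

# A hypothetical fixation onset halfway through the recording.
fixation_onset_eeg = to_eeg_time(305000.0)
```

Once fixation and saccade onsets are expressed on the EEG time axis, they can be used to flag epochs contaminated by eye movements, as discussed above.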
Although some procedures such as data management (e.g., converting data into a format compatible with EEGLAB) and noise filtering (e.g., applying high- and low-pass filters on the fly) can be automated, many other steps, such as bad channel removal, which is mostly carried out by visual inspection, are performed manually. In addition, preprocessing and analyzing the data is inherently the most labor-intensive and complicated part of the study. Since each participant's data consists of a number of trials and should be handled individually, the processing stage is overall very time-consuming.

Conclusions
We presented two cartographic user experiments, first to demonstrate what is possible with the co-registration of EEG and ET and second to investigate the spectral characteristics of cognitive processes in free-viewing conditions, only within the frame of the specific spatial memory task described throughout the paper. Our results showed that EEG can be employed as a complementary technique to gain detailed insight into user actions and behaviors and to reveal information that we did not observe with eye tracking. While the eye tracking metrics in the first experiment demonstrated that the difference between experts and novices was not significant, the EEG alpha power analysis suggested that this difference was significant, indicating that this specific spatial memory task caused more cognitive load in novices. Therefore, triangulating EEG and ET data seems useful for drawing conclusions on users' behavior and also shows that the data require more investigation.
Although the analysis of the second experiment is still in progress, preliminary results of event-related power changes in alpha and theta allowed us to estimate the variations in the cognitive load that a certain task demands. Future work will focus on alpha and theta power computations considering both user groups and varying task difficulties. In this respect, alpha and theta power changes will be averaged for easy, moderate and hard tasks for experts and novices to explore the influence of expertise on the cognitive load. In this way, we will be able to tell whether there is a difference across participants, and if so, how large and how significant this difference is. Having calculated the ET metrics, we will then link and correlate them with EEG metrics to estimate the overall cognitive load.
Combining EEG and ET is not straightforward since there are numerous methodological and technical problems to overcome, yet it is a very valuable technique for exploring the individual differences and similarities of map users in terms of perceptual and cognitive processes. Staying engaged with experimental psychology and cognitive science research will contribute to the future progress of scientific cartography. The more we know about the limitations and capabilities of the visual perception and cognition of different map users, the better the chances of designing cartographic products in a more efficient, understandable and effective way.