Toward Gaze-Based Map Interactions: Determining the Dwell Time and Buffer Size for the Gaze-Based Selection of Map Features

Abstract: The modes of interaction (e.g., mouse and touch) between maps and users affect the effectiveness and efficiency of transmitting cartographic information. Recent advances in eye tracking technology have made eye trackers lighter, cheaper and more accurate, broadening the potential to interact with maps via gaze. In this study, we focused exclusively on using gaze to choose map features (i.e., points, polylines and polygons) via the select operation, a fundamental action preceding other operations in map interactions. We adopted an approach based on the dwell time and buffer size to address the low spatial accuracy and Midas touch problem in gaze-based interactions and to determine the most suitable dwell time and buffer size for the gaze-based selection of map features. We conducted an experiment in which 38 participants completed a series of map feature selection tasks via gaze. We compared the participants' performance (efficiency and accuracy) between different combinations of dwell times (200 ms, 600 ms and 1000 ms) and buffer sizes (point: 1°, 1.5° and 2°; polyline: 0.5°, 0.7° and 1°). The results confirmed that a larger buffer size raised efficiency but reduced accuracy, whereas a longer dwell time lowered efficiency but enhanced accuracy. Specifically, we found that a 600 ms dwell time was more efficient in selecting map features than 200 ms and 1000 ms but was less accurate than 1000 ms. However, 600 ms was considered to be more appropriate than 1000 ms because a longer dwell time has a higher risk of causing visual fatigue. Therefore, 600 ms supports a better balance between accuracy and efficiency. Additionally, we found that buffer sizes of 1.5° and 0.7° were more efficient and more accurate than other sizes for selecting points and polylines, respectively. Our results provide important empirical evidence for choosing the most appropriate dwell times and buffer sizes for gaze-based map interactions.


ISPRS Int. J. Geo-Inf. 2022, 11, 127

Introduction
Interactivity is a basic and indispensable characteristic of modern electronic/web maps and geographic information systems (GIS). The interactions between maps and users play an important role in the effective and efficient transmission of spatial information through web maps. Mouse and touch, two well-established methods in human-computer interactions (HCIs), are commonly used to interact with maps on computer screens and mobile devices, although recent advances in sensing technology (e.g., gaze tracking and gesture sensing) have broadened the potential of new map interaction methods, such as gaze- and gesture-based interactions [1,2]. In particular, eye trackers are becoming lighter, cheaper and more accurate; consequently, users of eye trackers now extend from scientific researchers to the general public. Indeed, eye trackers may become ubiquitous in the near future, with such sensors being utilized in everyday life. Therefore, gaze-controlled HCIs, which can provide more intelligent and personalized information to users, may become pervasive [3-6]. More importantly, gaze-controlled HCIs are a vital assistive technology since they only require users to move their eyes, freeing up their hands. This allows people with disabilities (e.g., paralysis and Lou Gehrig's disease) to interact with their smartphones and computers [7-9].
Select is a fundamental operation in HCI; it is a prerequisite of many other functions (e.g., deleting an item requires that the item be selected). Selected items are often highlighted (e.g., with a different color or in bold), and their locations can be easily tracked. Yi et al. [10] defined select as an operation that "provides users with the ability to mark a data item(s) of interest to keep track of it" (p. 1226). Yi et al. [10] also identified six other interactions (namely, explore, reconfigure, encode, abstract/elaborate, filter and connect) and considered select to be the action that precedes all other operations. Similarly, in map interactions, select acts as a prerequisite for numerous other operations, such as identifying, comparing and ranking objects [11]. For example, in common GIS applications, such as ArcMap and QGIS, after selecting a point (e.g., representing a city), we can identify its attributes (e.g., name, country and population).
This study focuses exclusively on the selection of map features (i.e., selecting a point, polyline or polygon) via gaze. Using gaze to select objects may be more efficient than using either mouse or touch because eye movements are initially faster [12]. However, gaze-based selection encounters at least two difficulties [13,14]: low spatial accuracy and the Midas touch problem (Figure 1). The former signifies that the spatial accuracy of gaze is not as high as that of a cursor being maneuvered by a mouse, leading to the selection of unwanted features. In contrast, the Midas touch problem occurs when users' unconscious eye movements (fixations, saccades and blinks) are captured by the system, thereby triggering unintentional interactions [15].
One possible solution for both of the abovementioned problems is to enlarge the target's size and increase the dwell time on the target needed to trigger the select operation [16]. Penkar et al. [17] compared different combinations of button sizes and dwell times used to trigger the selection of buttons and found that larger buttons and longer dwell times resulted in higher efficiency than smaller buttons and shorter dwell times. Likewise, Niu et al. [13] reported that larger buttons could increase the selection accuracy. In map interaction, button size can be replaced by the buffer size of map features (points and polylines): if one's gaze falls into the buffer area of a map feature and remains fixated in that area for a certain dwell time, then the feature is selected. However, increasing the buffer size could decrease the capacity of map information [18], and a longer dwell time may delay interactions and cause users to become visually fatigued more easily (Figure 2). Therefore, it is necessary to balance buffer size against map information capacity, and dwell time against interaction delay (and visual fatigue). As reviewed in the next section, researchers have suggested empirical values of button size and dwell time for gaze-controlled object selection in HCIs. However, maps are significantly different from general user interfaces with regard to HCIs, and thus, whether these values are appropriate for selecting map features remains unknown, particularly because empirical evidence is rare.
In this study, we compared different combinations of two key parameters for selecting map features: buffer size and dwell time. We intended to find the most appropriate parameter values for gaze-based map interactions. After reviewing the related work in Section 2, we developed a testing platform and conducted an experiment in which participants completed a series of map feature selection tasks using their gaze; details of this experiment are presented in Section 3. Next, as described in Section 4, we compared the task completion efficiency and accuracy of the participants and analyzed their experience with gaze-based selection. We then discuss the results in Section 5 and conclude the article in Section 6.

Background and Related Work

Gaze-Based Interactions in HCI
By tracking and processing users' eye movements (in xy coordinates with time t, called gazes) in real time, a computer system can respond to users' visual attention (i.e., gaze-based interaction) [19]; this technique is based on the eye-mind assumption that the focus of one's eye reflects what one is thinking [20]. Various methods have been developed to track the human gaze. The three dominant technologies widely used in research and commercial applications are video oculography, video-based infrared pupil-corneal reflection, and electrooculography [21,22]. In this study, we focused on remote, video-based eye trackers, such as the Tobii Eye Tracker 4C [23].
Multiple solutions have been proposed for eye movement-based HCIs, each with its own benefits and drawbacks. Blinking (i.e., closing and opening one or both eyes) to select objects is one possible approach [24-26]. However, blinking is subject mostly to unconscious control and is thus unnatural for interactions. Alternatively, gaze gestures (i.e., specific eye movement sequences such as a horizontal followed by a vertical stroke) are robust against unintentional interactions [27-29]. Hyrskykari et al. [14] compared the performance of gaze gestures and dwell-based interactions and discovered that the former was as good as or better than the latter. However, gaze gestures necessitate additional time after a gesture is finished before an action is initiated; furthermore, gaze gestures require users to learn and memorize the gestures.
Using longer dwell times with larger target sizes is a common, robust approach [30]. However, larger sizes and longer dwell times take up more screen space and cause visual fatigue, respectively. Consequently, researchers have suggested empirical values of button size and dwell time for object selection. For example, Paulus and Remijn [31] found that a dwell time of 600 ms was rated easiest by users to select objects. Niu et al. [13] found that selecting a button size of 256 × 256 px was more accurate than selecting a button size of 128 × 128 px, whereas Penkar et al. [17] identified a circular area of 150 px as being the easiest to select. Feit et al. [32] suggested that the shape and size of buttons should be designed individually according to the position of the button on the screen.
In summary, previous studies have suggested that gaze gestures and dwell-based methods are promising approaches to prevent the Midas touch problem and can both be combined with mouse or touch methods to realize multimodal interactions [19]. In this study, however, we focused specifically on using a dwell-based approach to interact with map features, leaving gaze gestures and multimodal interactions as topics for future research.

Gaze-Based Interactions in Geo-Application
Eye tracking has long been used in cartography and GIS to investigate pedestrian navigation and the usability and perception of maps [33-40]. In contrast, little research has explored the use of eye tracking for gaze-based map interactions and geo-applications. For example, Göbel et al. [2] designed a gaze-controlled campus map enabling users to identify buildings and check indoor and outdoor routes. In this map, users do not interact directly with the map features but instead fixate on hierarchical menus (buttons) to navigate the map. Similarly, Zhu et al. [41] developed a gaze-controlled map that enables users to identify features, zoom in or out, and pan the map. Giannopoulos et al. [42] presented a gaze-based pedestrian navigation system that communicates route information based on where and what users are looking at. Kwok et al. [5] designed a gaze-guided narrative system that provides voice content adaptation to city tourists; this system first determines which building a user is looking at through a head-mounted eye tracker and then provides relevant information about the building through voice.
Unlike previous studies that targeted building gaze-based geo-applications, this study is aimed at a more fundamental problem in gaze-based map interaction: how can a map feature (point, polyline or polygon) be selected via gaze? Although various approaches have been adopted in HCIs (as reviewed in Section 2.1), it is unknown whether these approaches are applicable to the selection of map features because map features differ significantly from menus and buttons in HCIs. Menu/button sizes, margins, and placements can all be adjusted methodically in gaze-based HCIs. Map features, on the other hand, are abstract representations of real-world objects; their sizes and placements cannot be altered. We therefore explored appropriate values for two parameters, namely, buffer size and dwell time, for dwell-based map feature selection. The details of this method are presented in the next section.

Participants
A total of 39 participants (25 males and 13 females) with normal or corrected-to-normal vision were recruited for this study. The age of the participants was between 19 and 32 years (M = 26, SD = 2.25). They were undergraduate or master's students from the School of Geographic Science, Hunan Normal University. All students had a background in geography. All participants, who were unaware of the purpose of the experiment, volunteered to participate and signed an informed consent form. Each participant was given a small gift as compensation.

Apparatus and Software
We implemented the test platform using a Tobii Eye Tracker 4C (Tobii, Sweden, www.tobii.com, accessed on 10 February 2022) [23] connected to a Lenovo Yoga 14C laptop (Lenovo, China, www.lenovo.com.cn, accessed on 10 February 2022; Intel i7 1165G7 CPU, 2.8 GHz and 16 GB RAM). The eye tracker had a sampling rate of 90 Hz. The laptop had a 14-inch LED screen with a full HD resolution of 1920 × 1080 px (30.8 cm × 17.4 cm) and was running Microsoft Windows 10. The platform was a Windows desktop application implemented in the C# programming language, based on ESRI ArcObjects 10.2 and the Tobii Interactor application programming interfaces (APIs) 0.7.3. The participants were positioned at a distance of approximately 50 cm and were allowed to move their heads naturally. A visual angle of 1° corresponds to a distance of ≈0.87 cm (≈54 px) on the screen.
The apparatus was set up in a GIS laboratory at the School of Geographic Sciences under good lighting conditions (Figure 3). The participants were not disturbed by others during the experiment. All participants performed the experiment in the same environment.
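The conversion between visual angle and on-screen distance quoted above can be reproduced with a few lines of Python. This is only an illustrative sketch, not the study's C# code: the function name and its default arguments are hypothetical, and it assumes the angle is measured from the line of sight perpendicular to a flat screen, with the pixel density taken from the screen width.

```python
import math


def visual_angle_to_screen(angle_deg, distance_cm=50.0,
                           screen_width_cm=30.8, screen_width_px=1920):
    """Return (size_cm, size_px) subtended on screen by a visual angle.

    Assumes a flat screen viewed head-on, so size = distance * tan(angle).
    Defaults mirror the reported setup (50 cm viewing distance, 14-inch
    1920 x 1080 px screen that is 30.8 cm wide).
    """
    size_cm = distance_cm * math.tan(math.radians(angle_deg))
    px_per_cm = screen_width_px / screen_width_cm
    return size_cm, size_cm * px_per_cm


if __name__ == "__main__":
    cm, px = visual_angle_to_screen(1.0)
    print(f"1 degree = {cm:.2f} cm = {px:.0f} px")
```

Because the result depends on viewing distance and display size, a buffer specified in degrees would need to be converted again for any other screen or seating position.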

Parameter Values to Be Tested
Only points and polylines were tested with different buffer sizes (radii). Several considerations were taken into account to determine appropriate buffer sizes to test. First, the spatial accuracy of the Tobii Eye Tracker 4C (and many other eye trackers) is 0.5° [32], and thus, the buffer size should be at least 0.5°. Second, neuroscience studies have shown that the foveal vision of adults is limited to the central 1.5-2° of the visual field [43,44]. Third, commonly tested button sizes in previous studies ranged from 3.3° to 7.7° (radii of approximately 1.6° to 3.8°) [13,32]. Fourth, points and polylines should be assigned different buffer sizes because they differ geometrically: points are 0-dimensional objects and polylines are 1-dimensional objects with an attribute of length. Figure 4 shows examples of three buffer sizes for points and polylines. For a 1° buffer size, the buffer areas of polylines (Figure 4f) are much larger and can produce more overlaps than the buffer areas of points (Figure 4a). Therefore, based on these considerations, we chose 1°, 1.5° and 2° as buffer size candidates for selecting points and 0.5°, 0.7° and 1° as candidates for selecting polylines (Table 1). Note that we used angular units rather than screen pixels to measure the buffer radius because the screen pixel size depends on the size of the physical display.
The determination of candidate dwell times similarly requires several considerations. First, the typical fixation duration of visual search ranges from 180 ms to 275 ms [44]. Liao et al. [45] showed that the mean fixation duration of map reading varied from 197 ms to 292 ms. Second, common dwell times tested by previous studies ranged from 200 ms to 2000 ms [13,17,31]. Therefore, based on previous research and human physiological characteristics, we selected 200 ms, 600 ms and 1000 ms as three dwell time candidates. These three candidates were applied to all three types of map features. Table 1 summarizes the tested dwell times and buffer sizes for the different types of map features.
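To make the resulting condition grid concrete, the candidate values can be enumerated as below. This is a hedged sketch: the constant and function names are hypothetical, polygons are represented with a single "no buffer" entry because no buffer size was tested for them, and the figure of eight base maps per feature type is taken from the stimuli description in this paper.

```python
from itertools import product

# Candidate values stated in the text (buffer radii in degrees of visual angle).
DWELL_TIMES_MS = (200, 600, 1000)
BUFFER_RADII_DEG = {
    "point": (1.0, 1.5, 2.0),
    "polyline": (0.5, 0.7, 1.0),
    "polygon": (None,),  # assumption: None marks "no buffer tested"
}
MAPS_PER_FEATURE = 8  # eight base maps per feature type


def conditions():
    """List every (feature type, buffer radius, dwell time) combination."""
    return [
        (feature, radius, dwell)
        for feature, radii in BUFFER_RADII_DEG.items()
        for radius, dwell in product(radii, DWELL_TIMES_MS)
    ]


if __name__ == "__main__":
    grid = conditions()
    print(len(grid), "conditions,", len(grid) * MAPS_PER_FEATURE, "tasks")
```

Points and polylines each contribute 3 × 3 combinations and polygons contribute 3, so 21 conditions over eight maps each give the 168 tasks used in the experiment.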

Table 1. Tested buffer sizes and dwell times for gaze-based map feature selection. The screen pixel distance and physical buffer size were calculated by assuming that the users' eyes were 50 cm away from a 14-inch screen (1920 × 1080 px, 30.8 cm × 17.4 cm). Note that the actual size (diameter) is twice the buffer size.

| Feature | Dwell Time (ms) | Buffer Size (Radius) |
|---|---|---|
| Point | 200, 600, 1000 | 1°, 1.5°, 2° |
| Polyline | 200, 600, 1000 | 0.5°, 0.7°, 1° |
| Polygon | 200, 600, 1000 | Not tested |

Stimuli and Tasks
We designed 24 maps (8 maps for each type of map feature) that were divided into two types: map layers (n = 12) and thematic maps (n = 12). The map layers were composed of a single type of feature (e.g., containing only points, polylines or polygons; see Figure 5, left column) and thus were incomplete because they did not contain other map elements, such as labels, titles, or legends. We designed these map layers to avoid the influences of other map elements on the selection of features. In contrast, the thematic maps were complete maps that contained all three feature types and other map elements (see Figure 5, right column). As only the select operation was considered in this study, the participants were not allowed to zoom in and out of or pan the maps.
Each map was associated with an instruction that was presented to the participants during the experiment, resulting in a total of 24 instructions. For the map layers, example instructions included "Please select the red point" (Figure 5a), "Please select the red polyline" (Figure 5c), and "Please select the yellow polygon" (Figure 5e). For the thematic maps, example instructions included "Please select Anqiu city" (Figure 5b), "Please select the river labeled 'L'" (Figure 5d), and "Please select the area with the highest precipitation" (Figure 5f).
For each map with point (n = 8) and polyline (n = 8) features, we generated three maps using the three buffer sizes in Table 1, resulting in 48 (= 8 × 3 + 8 × 3) maps. If the feature buffer areas overlapped, the feature that was closest to the users' gaze point was selected. The buffer areas were not shown to the participants. We did not generate buffers for the polygon maps because we did not intend to test the buffer size for polygon selection. Therefore, we employed a total of 56 maps (24 point maps, 24 polyline maps and 8 polygon maps), and 56 maps with three dwell times (200 ms, 600 ms, and 1000 ms) produced 168 tasks.
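The selection rule described above (the gaze must stay inside a feature's buffer for the dwell time, and the nearest feature wins where buffers overlap) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the study's implementation, which was written in C# on ArcObjects and the Tobii APIs: the class and function names are hypothetical, features are given as lists of screen coordinates in pixels (one vertex for a point, two or more for a polyline), polygons are omitted, and a feature fires once per continuous dwell.

```python
import math


def _dist_point_segment(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_feature(gaze, vertices):
    """Distance from gaze to a point (1 vertex) or polyline (>= 2 vertices)."""
    if len(vertices) == 1:
        return math.hypot(gaze[0] - vertices[0][0], gaze[1] - vertices[0][1])
    return min(_dist_point_segment(gaze, a, b)
               for a, b in zip(vertices, vertices[1:]))


class DwellSelector:
    """Hypothetical dwell-and-buffer selector fed with timestamped gaze samples."""

    def __init__(self, features, buffer_px, dwell_ms):
        self.features = features    # {feature_id: [(x, y), ...]}
        self.buffer_px = buffer_px  # buffer radius in pixels
        self.dwell_ms = dwell_ms    # dwell time needed to trigger a selection
        self._candidate = None      # feature currently under the gaze
        self._since = None          # time the gaze entered that feature's buffer
        self._fired = False         # whether this dwell already triggered

    def _hit(self, gaze):
        """Return the nearest feature whose buffer contains the gaze, or None."""
        best_id, best_d = None, self.buffer_px
        for fid, verts in self.features.items():
            d = distance_to_feature(gaze, verts)
            if d <= best_d:
                best_id, best_d = fid, d
        return best_id

    def update(self, t_ms, gaze):
        """Feed one sample; return a feature id when it becomes selected, else None."""
        hit = self._hit(gaze)
        if hit != self._candidate:
            # Gaze moved to another feature (or off all buffers): restart timing.
            self._candidate, self._since, self._fired = hit, t_ms, False
            return None
        if hit is not None and not self._fired and t_ms - self._since >= self.dwell_ms:
            self._fired = True
            return hit
        return None
```

In use, each gaze sample from the tracker would be passed to `update`; a non-`None` return value marks the moment a feature would be highlighted as selected. Restarting the timer whenever the nearest feature changes is one simple design choice, and a real system might instead tolerate brief excursions caused by tracker noise.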

Procedure
The experiment consisted of three sessions: calibration, training and selection (Figure 6).
1. The calibration session. The participants were first welcomed and were seated in front of a laptop in a comfortable position. They were given a brief introduction to eye tracking, and then a 6-point calibration method was used to calibrate the participants' eyes for the eye tracker. The calibration process was repeated during the experiment when necessary.
2. The training session. The participants had at least five minutes to learn how to interact with the computer using their gazes; they practiced with the demos provided by Tobii. They were then required to finish five training tasks to ensure that they were familiar with the experimental procedure.
3. The selection session. In this session, the participants were required to finish 168 tasks. We used a within-subject design, which means that all participants were presented with all maps. Each map was repeated with different combinations of buffer sizes and dwell times. To counter the learning effect, we adopted a Latin square-based order to present the tasks (see Figure 6). In each task, the task instruction was presented on the screen (Figure 7a). The participants were allowed to switch to the map interface by pressing the space bar after reading and understanding the task instruction correctly (Figure 7b). At this point, the participants needed to select the required map feature using their gaze. Any feature satisfying the parameter thresholds (see Section 3.3) was selected (highlighted in Figure 7c). Once they selected the required target, they were required to press the space bar to submit it as their final answer as soon as possible. To avoid the impact of interruptions (e.g., a participant's phone call or the accidental disconnection of the eye tracker) during the experiment, participants could press the enter key to skip the task. Whether the participant pressed the space bar or the enter key, the next task instruction was then presented (Figure 7d), but skipped tasks were not counted in the results.

Analysis

Data Quality Check
A total of 6530 trials from 39 subjects were collected in the experiment. One subject quit the experiment for personal reasons and thus was excluded from further analysis. In addition, due to technical issues (the eye tracker was disconnected in some trials) and other uncontrollable interruptions during the experiment, 775 invalid trials were excluded from the results. After careful inspection, a total of 5775 valid trials from 38 subjects were retained; 2486 were point selection trials, 2431 were polyline selection trials, and 858 were polygon selection trials.

Evaluation Metric
For each participant and for each type of map feature, we calculated the response time and accuracy to measure the participants' performance.
(1) Response time. The average task completion time (s) for each task (trial) was calculated to measure the participants' efficiency.
(2) Accuracy. In a task, when a feature was marked as a candidate (highlighted), it meant that the participant had triggered a select operation. Note that the selected feature may or may not have been the target required by the task because unwanted features might have been selected. Figure 8 shows an example. The accuracy of a task was calculated as 1 or 0 (depending on whether the participant selected the required target correctly, regardless of how many select operations were triggered) divided by the total count of select operations in that task. Finally, the mean accuracy over all tasks was taken as the accuracy of the participant. Unwanted selections were considered in the accuracy calculation because in practical gaze-based HCIs, such unwanted selections could trigger subsequent operations that disrupt users' intended interactions and therefore lower the interaction fluency. In our method, a task with 100% accuracy means that the user triggered only one (the correct) selection for this task, whereas an accuracy of <100% means that the user selected unwanted features. In particular, a task with 0% accuracy means that the participant submitted an incorrect answer for the task.
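The accuracy metric described above can be illustrated with a minimal sketch. The function names and the task values are hypothetical and are not taken from the experiment; the handling of a task submitted without any select operation (scored as 0) is likewise an assumption, since the text does not cover that case.

```python
from statistics import mean


def task_accuracy(target_correct: bool, n_select_ops: int) -> float:
    """Accuracy of one task: (1 or 0) divided by the number of select operations."""
    if n_select_ops <= 0:
        # Assumption: a task submitted without any select operation scores 0.
        return 0.0
    return (1.0 if target_correct else 0.0) / n_select_ops


def participant_accuracy(tasks) -> float:
    """Mean task accuracy over all tasks of one participant."""
    return mean(task_accuracy(correct, n_ops) for correct, n_ops in tasks)


# Hypothetical tasks: (submitted target correct?, select operations triggered)
tasks = [(True, 1), (True, 2), (False, 3), (True, 4)]
print([task_accuracy(c, n) for c, n in tasks])  # [1.0, 0.5, 0.0, 0.25]
print(participant_accuracy(tasks))              # 0.4375
```

In this sketch, one unwanted selection before the correct one halves the task accuracy, which reflects the penalty on unwanted selections that the metric is designed to impose.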
For both response time and accuracy (dependent variables), we applied a two-way analysis of variance (ANOVA) to detect the main effects and interaction effects of buffer size and dwell time. Following [46,47], we also calculated the partial eta-squared (ηp²; ANOVA) and Cohen's d value (d) [48] to quantify the effect sizes. We then conducted pairwise comparisons to detect whether there were statistically significant differences between different buffer sizes and between different dwell times. Note that only two independent variables (i.e., buffer size and dwell time) were included in the ANOVA, with each variable having three levels (3 × 3 combinations). We did not consider the type of stimulus (map layers versus thematic maps) as an independent variable because it was not the main focus of the study.
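For readers unfamiliar with the two effect size measures, the following sketch shows their textbook definitions. It is not the analysis code of this study: the pooling variant for Cohen's d (root mean square of the two standard deviations, which assumes groups of similar size) is an assumption, and the input values are hypothetical.

```python
import math


def cohens_d(m1: float, sd1: float, m2: float, sd2: float) -> float:
    """Cohen's d from summary statistics.

    Assumption: the pooled SD is the root mean square of the two SDs,
    which is appropriate when both groups have similar sizes.
    """
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd


def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta-squared: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical summary statistics and sums of squares.
print(cohens_d(10.0, 2.0, 9.0, 2.0))    # 0.5
print(partial_eta_squared(20.0, 80.0))  # 0.2
```

Values computed this way from rounded means and standard deviations may differ slightly from values computed on the raw trial data.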

Point Selection

General Performance of Efficiency and Accuracy
Figure 9 shows the response time (efficiency) and accuracy of point selection under different combinations of buffer size and dwell time. The mean (M) and standard deviation (SD) values of the response time and accuracy are shown in Tables 2 and 3, respectively. Two-way ANOVA showed that both dwell time (p = 0.000 < 0.001, ηp² = 0.527) and buffer size (p = 0.000 < 0.001, ηp² = 0.034) had a significant main effect on efficiency, whereas their interaction effect was not significant (p = 0.096 > 0.05, ηp² = 0.006). A significant main effect on accuracy was also observed for both dwell time (p = 0.032 < 0.05, ηp² = 0.012) and buffer size (p = 0.015 < 0.05, ηp² = 0.022), but again, their interaction effect was not significant (p = 0.477 > 0.05, ηp² = 0.003). Therefore, in the following, we first analyzed the effects of buffer size (regardless of dwell time) on efficiency and accuracy and then analyzed the effects of dwell time (regardless of buffer size).

Effects of Buffer Size on Efficiency and Accuracy
Figure 10 shows the results of pairwise comparisons between response time and accuracy under different buffer sizes. With increasing buffer size, the efficiency increased (i.e., the response time decreased), whereas the accuracy decreased. The response time under a 1° buffer (M = 3.69 s, SD = 3.22 s) was significantly longer than those under 1.5° (M = 3.02 s, SD = 3.15 s, p = 0.000 < 0.001, d = 0.224) and 2° (M = 2.94 s, SD = 2.91 s, p = 0.000 < 0.001, d = 0.252) buffers. Moreover, the accuracy of both the 1° buffer (M = 0.79, SD = 0.24) and the 1.5° buffer (M = 0.77, SD = 0.24) was significantly higher than that of the 2° buffer (M = 0.70, SD = 0.25) (p = 0.000 < 0.001, d = 0.368 and p = 0.006 < 0.01, d = 0.287, respectively). In summary, for point selection, the 1.5° and 2° buffers were significantly more efficient than the 1.0° buffer, but the 1.0° and 1.5° buffers were more accurate than the 2° buffer. Therefore, 1.5° is better than the other two sizes considering both efficiency and accuracy.

Effects of Dwell Time on Efficiency and Accuracy
For accuracy, 600 ms resulted in significantly higher accuracy (M = 0.85, SD = 0.15) than 200 ms (M = 0.47, SD = 0.15, p = 0.000 < 0.001, d = 2.533), while 1000 ms (M = 0.93, SD = 0.12) resulted in significantly higher accuracy than both 600 ms (p = 0.000 < 0.001, d = 0.584) and 200 ms (p = 0.000 < 0.001, d = 3.382). Combining the efficiency and accuracy results indicates that a dwell time threshold of 600 ms is better than 200 ms and 1000 ms for point selection.

Polyline Selection

General Performance of Efficiency and Accuracy
Figure 12 shows the response time and accuracy of polyline selection under different dwell times and buffer sizes. The detailed data are presented in Tables 4 and 5, demonstrating that a 600 ms dwell time and a 1° buffer resulted in the shortest response time (M = 2.74 s, SD = 2.52 s), whereas 1000 ms and 0.5° led to the longest response time (M = 5.18 s, SD = 5.60 s). However, 1000 ms and 0.5° had the highest accuracy (M = 0.94, SD = 0.10), whereas 200 ms in combination with 1° had the lowest accuracy (M = 0.45, SD = 0.14).

Effects of Buffer Size on Efficiency and Accuracy
Similar to point selection (Figure 10), in polyline selection, with an increase in buffer size, the efficiency increased, but the accuracy decreased (Figure 13). The response time of the 0.5° buffer (M = 4.33 s, SD = 5.03 s) was significantly longer than those of both the 0.7° (M = 3.54 s, SD = 3.92 s, p = 0.000 < 0.001, d = 0.175) and the 1° (M = 3.16 s, SD = 3.30 s, p = 0.000 < 0.001, d = 0.275) buffer sizes. However, the accuracy of the 0.5° buffer size (M = 0.80, SD = 0.22) was significantly higher than that of the 1° buffer (M = 0.72, SD = 0.24, p = 0.014 < 0.05, d = 0.347). In summary, buffer sizes of 0.7° and 1.0° exhibited similar performance in terms of efficiency and accuracy. However, considering that a larger buffer size may reduce the amount of information that a map can carry, 0.7° is preferred over 1.0° for polyline selection.

Polygon Selection
Only dwell time was tested for polygon selection, and the results are shown in Figure 15. One-way ANOVA showed that a 1000 ms dwell time resulted in the longest response time (M = 3.94 s, SD = 3.36 s), which was significantly longer than the response times under dwell times of 600 ms (M = 2.98 s, SD = 2.45 s, p = 0.000 < 0.001, d = 0.326) and 200 ms (M = 3.18 s, SD = 3.22 s, p = 0.000 < 0.001, d = 0.231). Moreover, the accuracy under 600 ms (M = 0.77, SD = 0.16) was significantly higher than that under 200 ms (M = 0.23, SD = 0.13, p = 0.000 < 0.001, d = 3.704), and that under 1000 ms (M = 0.88, SD = 0.14) was significantly higher than those under both 600 ms (p = 0.001 < 0.01, d = 0.732) and 200 ms (p = 0.000 < 0.001, d = 4.812). Collectively, these results suggest that a 600 ms dwell time achieves higher efficiency than the other dwell times and higher accuracy than 200 ms, and thus offers a better balance between efficiency and accuracy for polygon selection.

Choosing the Appropriate Dwell Time and Buffer Size
For points and polylines, we did not observe significant interaction effects between dwell time and buffer size on efficiency or accuracy (and polygons were tested only with dwell time). Therefore, we summarize the results and suggest appropriate values for these two parameters separately.

Buffer Size
It is unsurprising that for both points and polylines, a larger buffer size could increase efficiency but decrease accuracy (Tables 2-5). For point selection, we found that 1.5° is more appropriate than both 1.0° and 2° considering its efficiency and accuracy (Figure 10). For polyline selection, 0.7° is preferred over 1.0° and 0.5° (Figure 13).
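To make buffer sizes expressed in degrees of visual angle more concrete, the following sketch converts an angle to an on-screen radius in pixels and tests whether a gaze point falls within the buffer of a point or a polyline. It rests on several assumptions that are not taken from the experiment: the buffer angle is treated as a radius around the feature, the viewing distance (600 mm) and pixel density (3.6 px/mm) are hypothetical, and all function names are illustrative.

```python
import math


def angle_to_radius_px(angle_deg, viewing_distance_mm, pixels_per_mm):
    """On-screen radius (px) subtended by a visual angle at a given distance.

    Assumption: the buffer size is a radius measured from the feature.
    """
    radius_mm = viewing_distance_mm * math.tan(math.radians(angle_deg))
    return radius_mm * pixels_per_mm


def gaze_hits_point(gaze, point, radius_px):
    """True if the gaze position lies within the circular buffer of a point."""
    return math.dist(gaze, point) <= radius_px


def distance_to_segment(p, a, b):
    """Shortest distance from p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.dist(p, a)  # degenerate segment
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def gaze_hits_polyline(gaze, vertices, radius_px):
    """True if the gaze position lies within the buffer of any segment."""
    return any(
        distance_to_segment(gaze, a, b) <= radius_px
        for a, b in zip(vertices, vertices[1:])
    )


# Hypothetical setup: 600 mm viewing distance, 3.6 px/mm display.
point_radius = angle_to_radius_px(1.5, 600, 3.6)  # roughly 57 px
line_radius = angle_to_radius_px(0.7, 600, 3.6)   # roughly 26 px
print(gaze_hits_point((100, 100), (130, 140), point_radius))            # True
print(gaze_hits_polyline((50, 30), [(0, 0), (100, 0)], line_radius))    # False
```

Under these assumed viewing conditions, a 1.5° point buffer is roughly twice as wide as a 0.7° polyline buffer, which illustrates why larger buffers are easier to hit but more likely to overlap neighboring features.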

Dwell Time
A dwell time of 600 ms resulted in more efficient selection of all three types of map features (points, polylines and polygons) than dwell times of 200 ms and 1000 ms (Tables 2 and 4 and Figure 15a). For all three types of map features, the accuracy increased with increasing dwell time (Tables 3 and 5 and Figure 15b). However, the increase in accuracy from 600 ms to 1000 ms was less than that from 200 ms to 600 ms. This means that the added value provided by a dwell time of 1000 ms was less than that provided by 600 ms. Furthermore, a longer dwell time has a higher risk of causing visual fatigue. Therefore, we consider 600 ms to be the most appropriate dwell time for gaze-based map feature selection, which is consistent with the conclusions of Penkar et al. [17] and Paulus and Remijn [31].
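The role of the dwell time threshold can be illustrated with a minimal dwell-based selector. This is a sketch under stated assumptions, not the implementation used in the experiment: the class `DwellSelector` and the sample stream are hypothetical, gaze samples are assumed to be already mapped to a feature identifier (or `None` when no feature is under the gaze), and the timer is assumed to restart whenever the gaze leaves a feature.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DwellSelector:
    """Selects a feature once the gaze has rested on it for dwell_ms."""

    dwell_ms: float = 600.0
    _current: Optional[str] = field(default=None, init=False)
    _since_ms: float = field(default=0.0, init=False)
    _fired: bool = field(default=False, init=False)

    def update(self, feature_id: Optional[str], t_ms: float) -> Optional[str]:
        """Feed one gaze sample; return the feature id when it becomes selected."""
        if feature_id != self._current:
            # Gaze moved to another feature (or off all features): restart timer.
            self._current = feature_id
            self._since_ms = t_ms
            self._fired = False
            return None
        dwelled = t_ms - self._since_ms
        if feature_id is not None and not self._fired and dwelled >= self.dwell_ms:
            self._fired = True  # fire only once per continuous dwell
            return feature_id
        return None


# Hypothetical stream sampled every 100 ms: a 500 ms glance at "A",
# followed by a sustained look at "B".
samples = [("A", t) for t in range(0, 600, 100)]
samples += [("B", t) for t in range(600, 1400, 100)]

selector = DwellSelector(dwell_ms=600)
selected = [
    hit for fid, t in samples if (hit := selector.update(fid, t)) is not None
]
print(selected)  # ['B']
```

In this example the brief glance at feature "A" does not trigger a selection, which is how a dwell threshold mitigates the Midas touch problem; a shorter threshold such as 200 ms would have selected "A" as well.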

Limitations
We created a series of map tasks with the thematic maps in our experiment to imitate a real map reading environment (e.g., Figure 5f: "Please select the area with the highest precipitation"). Using these tasks could increase the ecological validity of the experiment because they were typical problems in map reading. However, this is an indirect method to select map features because the participants had to first search for a text (e.g., "Anqiu" city), letter or color before performing the select operation. As a result, the visual search process may have had an impact on the response time (i.e., the response time was not the actual select operation time because it contained both the visual search time and the select operation time). At this stage, we are unable to divide the response time into the visual search time and the actual select operation time. This is a major limitation of the experiment.

Conclusions and Future Work
This study explored the effects of dwell time and buffer size on the efficiency and accuracy of gaze-based map feature selection. The results showed that for the selection of both point and polyline features, the participants were more efficient but less accurate with a larger buffer size. However, for all three types of map features, the participants were less efficient but more accurate with longer dwell times. Our results showed that a 600 ms dwell time is better than dwell times of 200 ms and 1000 ms for map feature selection and that 1.5° and 0.7° buffer sizes are the best parameter values in the selection of points and polylines, respectively. These results provide initial evidence for choosing suitable dwell times and buffer sizes in the development of gaze-based map interactions.
In addition to efficiency and accuracy, user subjective evaluations of gaze-based map interactions are also important; nevertheless, this aspect remains unknown. In the future, we plan to apply these parameter values to gaze-based map interactions and evaluate their usability by comparing them with traditional mouse- and touch-based approaches.

Figure 2. Pros and cons of longer/shorter dwell times and larger/smaller buffer sizes.

Figure 4. Examples of three buffer sizes of points (a-c) and polylines (d-f).

Figure 5. Example tasks of (a,b) point selection, (c,d) polyline selection and (e,f) polygon selection. The left column shows map layers, and the right column shows thematic maps. All maps were designed using ESRI ArcMap 10.2 (https://www.esri.com/, accessed on 10 February 2022).

Figure 6. Procedure of the experiment.

Figure 7. The procedure of a task in the testing platform: task instruction (a), map presentation (b), feature selection (c) and another task instruction (d).

Figure 8. Example of calculating the accuracy of a task.

Figure 9. Response time (a) and accuracy (b) of point feature selection under different buffer sizes and dwell times.

Figure 10. ANOVA results of the response time (a) and accuracy (b) for point selection under different buffer sizes. Note: *** p < 0.001, ** p < 0.01 (the same below).

Figure 12. Response time (a) and accuracy (b) of polyline selection under different buffer sizes and dwell times.

Figure 13. ANOVA results of the response time (a) and accuracy (b) for polyline feature selection under different buffer sizes. Note: *** p < 0.001, * p < 0.05.

Effects of Dwell Time on Efficiency and Accuracy
Figure 14 shows the effects of dwell time on the efficiency and accuracy of polyline selection. The response time under a dwell time of 1000 ms (M = 4.43 s, SD = 4.77 s) was significantly longer than those under dwell times of both 600 ms (M = 3.26 s, SD = 3.24 s, p = 0.000 < 0.001, d = 0.287) and 200 ms (M = 3.34 s, SD = 4.17 s, p = 0.000 < 0.001, d = 0.243). The accuracy under a dwell time of 600 ms (M = 0.85, SD = 0.13) was significantly higher than that under a dwell time of 200 ms (M = 0.50, SD = 0.16, p = 0.000 < 0.001, d = 2.401), whereas the accuracy under 1000 ms (M = 0.91, SD = 0.13) was significantly higher than those under both 600 ms (p = 0.000 < 0.001, d = 0.462) and 200 ms (p = 0.000 < 0.001, d = 2.813). Combining these results, a 600 ms dwell time maintains a better balance between efficiency and accuracy than a dwell time of either 200 ms or 1000 ms.

Figure 14. ANOVA results of the response time (a) and accuracy (b) for polyline feature selection under different dwell times. Note: *** p < 0.001.

Table 2. Response time (mean and standard deviation) of point selection.

Table 3. Accuracy (mean and standard deviation) of point selection.

Table 4. Response time (mean and standard deviation) of polyline selection.

Table 5. Accuracy (mean and standard deviation) of polyline selection.