Influences of Global and Local Features on Eye-Movement Patterns in Visual-Similarity Perception of Synthesized Texture Images

Abstract: Global and local features are essential for visual-similarity texture perception. Therefore, understanding how people allocate their visual attention when viewing textures with global or local similarity is important. In this work, we investigate the influences of global and local features of a texture on eye-movement patterns and analyze the relationship between eye-movement patterns and visual-similarity selection. First, we synthesized textures by separately controlling global and local textural features through the primitive, grain, and point configuration (PGPC) texture model, a mathematical morphology-based texture model. Second, we conducted an experiment to acquire eye-movement data where participants identified the texture that was highly similar to the standard texture. Experiment data were obtained through an eye-tracker from 60 participants. The collected eye-tracking data were analyzed in terms of three metrics, including total fixation duration in each region of interest (ROI), fixation-point variance in each ROI, and fixation-transfer counts between different ROIs. Analysis results indicated the following. (1) The global and local features of a texture influenced eye-movement patterns. In particular, the texture image that was globally similar to the standard texture contained dispersed fixation points. By contrast, the texture image that was locally similar to the standard texture contained concentrated fixation points. The domination of global and local features influenced the viewers' similarity choice. (2) The final visual-similarity selection was related to the fixation-transfer count between different ROIs, but not to the fixation time in each ROI. This research also extends the applicability of the mathematical morphology-based texture model to human visual perception.


Introduction
Whether human visual processing is dominated by holistic or analytic processing is a hot issue in the field of human cognition. Many researchers have investigated the influences of global and local features on visual perception. According to the hypothesis advocated by Navon [1], people perceive a forest before seeing its trees; this assertion emphasizes the precedence of global features. Navon defined these effects as the "global precedence effect" (GPE) [2]. The GPE can be reduced or even reversed by factors such as task variables [3,4] and sparsity between local features [5]. In our experiment, when a texture was locally similar to the standard texture, the positions of fixation points in the texture were highly concentrated. We also verified that the final similarity choice was closely related to FTCs between ROIs of different textures viewed in the experiment.
Our research differs from conventional studies on visual perception. First, instead of artificial images (e.g., patterns or fabric images), the textures used in the eye-tracking experiment were synthesized by a mathematical morphology-based texture model. We artificially synthesized textures with the required features instead of simply analyzing and extracting features from existing textures. Second, it has been identified that viewing distance influences the visual perception of global and local texture features: viewers are attracted to local features when viewing images from a short distance and to global features when viewing images from a long distance. In [17], a logistic-regression model was built to construct a statistical relationship between viewing distance and the visual perception of global/local features. In this research, the viewing distances in the visual-similarity experiment were determined according to the distance equation proposed in [17]. We then set different viewing distances to ensure equal probabilities of similarity selection in the experiment.

PGPC Texture Model
Mathematical morphology is a theory of describing shapes. It is used to investigate the interaction between an image and a selected structuring element through the basic operations of erosion and dilation. The PGPC texture model is a mathematical morphology-based model. It was first proposed in [25] and subsequently improved in [26][27][28]. The PGPC texture model represents a texture as an image composed of regular or irregular arrangement of grains. The grains are considerably smaller than the image, resemble one another, and are derived from one or a few typical objects called primitives. The PGPC texture model enables the independent characterization of primitive shapes and grain configurations.
In the PGPC texture model, a nonempty texture image X is represented as

X = ⋃_{n=0}^{N} (Φ_n ⊕ nB),

where B is a primitive and nB represents the grains that are derived from n-times homothetic magnifications of the primitive B, with n being zero or a positive integer. The magnification nB is usually defined as

nB = B ⊕ B ⊕ ⋯ ⊕ B (n-fold), with 0B = {o},

where ⊕ denotes Minkowski set addition and o is the origin. This definition is, however, inconvenient since the difference between nB and (n + 1)B is too large if the original B is large. In this experiment, we used a single-sized structuring element (n = 1) as the grain for texture synthesis. Φ_n is a point configuration (skeleton), that is, a set indicating the pixel positions for locating grain nB. These locations were randomly generated, whereas the number and directional strength of the locations could be controlled. N is the maximal size of magnification. The PGPC model can be used to synthesize different textures by controlling different primitives and skeletons. For the synthesized textures, the primitive or grains of a texture constitute local features, whereas the skeleton of a texture and the grain-size distribution constitute global features. Therefore, global and local features can be modified separately in the PGPC texture model. In contrast to existing textures (e.g., Brodatz textures), textures that were globally and locally similar could be synthesized through the PGPC texture model.
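As a concrete illustration of the single-sized case (n = 1), placing a grain at every skeleton position via Minkowski addition can be sketched as follows. The grain shape and the uniformly random skeleton below are hypothetical stand-ins, not the exact parameters used for the paper's stimuli.

```python
import numpy as np

def minkowski_add(points, primitive):
    """Minkowski set addition A + B: translate every point of B by every point of A."""
    return {(ar + br, ac + bc) for (ar, ac) in points for (br, bc) in primitive}

def synthesize_texture(shape, skeleton, grain):
    """Stamp the grain at every skeleton position (the n = 1 case of the model)."""
    img = np.zeros(shape, dtype=np.uint8)
    for (r, c) in minkowski_add(skeleton, grain):
        if 0 <= r < shape[0] and 0 <= c < shape[1]:
            img[r, c] = 1
    return img

# Hypothetical parameters: a 3 x 3 square grain and a random point configuration.
rng = np.random.default_rng(0)
skeleton = set(zip(rng.integers(0, 64, 80).tolist(),
                   rng.integers(0, 64, 80).tolist()))
grain = {(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)}
texture = synthesize_texture((64, 64), skeleton, grain)
```

Directional strength of the skeleton (e.g., a horizontal bias) would be introduced by sampling the point positions non-uniformly rather than uniformly as here.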

Stimuli and Apparatus
Stimuli: This research investigated the influences of global and local features on eye-movement patterns in texture visual-similarity perception. Therefore, textures that were globally and locally similar had to be synthesized for use in this research. We synthesized the textures by separately controlling local (grain) and global (skeleton) parameters using the PGPC texture model. Here, global parameters refer to the directionality and density of skeletons, and local parameters refer to the shapes of the grain. The synthesized textures, their grains, and their skeletons are shown in Figure 1. The different grains of Textures 1 (Figure 1a) and 2 (Figure 1e) generated different local features.
The skeleton of Texture 1 was created with horizontal strength (Figure 1b), and the skeleton of Texture 2 was created with diagonal strength (Figure 1f). Both skeletons had the same density but different directionality. The synthesized textures (Figure 1c,d) were cropped into disks to reduce the visual effects of the horizontal and vertical borders of the texture frames. Using the two synthesized textures, we derived six texture stimuli and arranged them into two scenes, as shown in Figure 2. We synthesized the two other textures on the basis of the same idea as in the above synthesis procedure. The grains used for synthesis were the same as those in Figure 1a,e. However, the skeletons used for synthesizing the two textures had the same directionality (with horizontal strength) but different density. The synthesized texture scenes were Scenes 3 and 4, as shown in Figure 3. The density of the skeleton in Texture A was twice that in Texture B in Scenes 3 and 4. A total of four scenes were generated in this experiment (the synthesized textures are available for research upon request).
In this experiment, to counterbalance the influences of trial order (Scenes 1 and 3 used a similar synthesizing process, so the display of the first trial could influence the subsequent trial) and of presentation in the left or right visual field (some participants may have had attention-related disorders such as spatial hemineglect, so the locations of Textures A and B could influence visual attention), we adopted two strategies: (1) in both tests, the display orders of the two scenes were alternated, and (2) in each scene, Textures A and B were randomly placed on the left or right.

Apparatus: In this experiment, the participants' visual-attention patterns were tracked and recorded with a Tobii T60 eye-tracker (Tobii Technology AB, Danderyd, Stockholm, Sweden). The eye-tracker had a 17 inch monitor with a resolution of 1280 × 1024. The sampling rate of the eye-tracking system was 60 Hz. The eye-tracker had an accuracy of 0.2° and a spatial resolution of 0.3°. Tobii Studio software (Version 3.3, Tobii Technology AB, Danderyd, Stockholm, Sweden, 2016) was used in the initial processing of the eye-tracking data. A fixation was defined as a pause in eye-movements of at least 75 ms within a spatial area of 0.5°. We arranged the experiment settings according to related references on eye-movements [29,30]. A participant was seated in a soft chair while viewing the display of the eye-tracker. The distance from the screen was controlled by a head rack that served the same function as a chin rest. To avoid environmental influences, we eliminated external noise and disturbances from the experiment.

Participants
A total of 60 undergraduate students from Shanxi University participated in the experiment. The participants comprised 28 females and 32 males aged 19-23 years (average = 19.13, standard deviation (SD) = 1.11). They had normal or corrected-to-normal vision. Informed consent was obtained from all participants prior to the experiment.

Viewing Distance
Viewing distance is a critical factor for global/local perception [17]. In the experiment, subjects were asked to identify whether candidate Texture A or B was more similar to Texture S. To ensure that the viewing distance set for the experiment yielded the same probabilities of selecting Texture A or B, we referred to the equations proposed in [17]. In [17], a logistic-regression model was built to construct a statistical relationship between viewing distance and the visual perception of global/local features. It was also identified that the domination of global features increases with viewing distance. To evaluate the absolute position at which local and global features are equally dominant, two equations were proposed in [17]. In those equations, d_1 denotes the diameter of the texture in Scene 1, d_2 denotes the diameter of the texture in Scene 2, D_{1,P_A=0.5} is the viewing distance at which the probability of selecting Texture A of Scene 1 was 50%, and D_{2,P_A=0.5} is the viewing distance at which the probability of selecting Texture A of Scene 2 was 50%. By these equations, we estimated the viewing distances for the different visual stimuli. In the experiment, texture diameters were 10.5 cm. According to Equation (3), the viewing distances for Scenes 1 and 3 were set to 48 cm, and the viewing distances for Scenes 2 and 4 were set to 70 cm.
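The logistic-regression relationship between viewing distance and choice probability can be sketched as follows; the coefficients b0 and b1 below are hypothetical placeholders, not the fitted values from [17]. The equal-dominance distance (P_A = 0.5) follows directly from the model.

```python
import math

def p_select_global(distance_cm, b0, b1):
    """Logistic model: probability of selecting the globally similar texture
    at a given viewing distance (coefficients from a hypothetical fit)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * distance_cm)))

def equal_dominance_distance(b0, b1):
    """Distance at which P_A = 0.5, i.e., global and local features are
    equally dominant: solve b0 + b1 * D = 0 for D."""
    return -b0 / b1

# Made-up coefficients chosen so the 50% point lands at 48 cm:
b0, b1 = -4.8, 0.1
```

With these placeholder coefficients, the probability of a "global" choice rises above 0.5 for distances beyond 48 cm, matching the qualitative finding that global features dominate at longer distances.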

Procedure
Two tests were conducted in this experiment. Scenes 1 and 3 were used as visual stimuli in Test 1, and Scenes 2 and 4 were used as visual stimuli in Test 2. In each test, every participant (30 participants per test) first signed an informed consent form and was then seated in a soft chair at a suitable height at which the eye-tracker could easily capture the participant's eyes. After setting the viewing distance and chair height, a calibration test was performed in the eye-tracking system in accordance with the participant's eye-movement and visual acuity. The participant's eyes were calibrated with five markers on the display area: the participant was asked to gaze at and visually track the movement of a red dot on the screen of the eye-tracker. If calibration failed, the participant was excluded from the experiment.
When calibration was successful, the participant was required to view the screen on which the instructions of the experiment were shown. The visual angle of the presented pattern was 0.5°. The experiment task asked the subjects to select whether Texture A or B was more similar to Texture S. The participant was required to click the mouse to immediately change the visual stimuli when he or she made a decision. Lastly, the participants provided their answers to the similarity selection.

Eye-Tracking-Data Analysis
Before exporting the data from Tobii Studio [31], we divided each scene into three regions of interest (ROIs). In eye-tracking research, there is always a disparity between a person's actual gaze location and the location recorded by the eye-tracker [32,33]. The Tobii T60 eye-tracker has a reported accuracy of a 0.5° visual angle. Therefore, to eliminate the effect of systematic error, we took each texture and its external padding as one ROI. This strategy was adopted to avoid the problem of a participant viewing the edge of a texture and falling outside the ROI as a result of calibration issues. According to viewing distance, we calculated the external padding for the different scenes. For Scenes 1 and 3, the external padding was around 12 pixels. For Scenes 2 and 4, the external padding was around 18 pixels. The sizes and locations of the marked ROIs are shown in Figure 4.
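The conversion from a visual angle to on-screen pixels can be sketched as follows. The pixel pitch of a 17-inch 1280 × 1024 display (roughly 0.264 mm) is an assumption here, and the exact angular margin behind the reported paddings is not stated in the text, so the function illustrates the geometry rather than reproducing the exact 12- and 18-pixel values.

```python
import math

def angle_to_pixels(distance_cm, angle_deg, pixel_pitch_mm=0.264):
    """On-screen extent, in pixels, subtended by a visual angle at a viewing
    distance: extent = distance * tan(angle), divided by the pixel pitch."""
    extent_mm = 10.0 * distance_cm * math.tan(math.radians(angle_deg))  # cm -> mm
    return extent_mm / pixel_pitch_mm
```

Because the extent is proportional to distance, the padding at 70 cm is about 1.46 times that at 48 cm, consistent with the ratio of the reported paddings (18 vs. 12 pixels).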
In the Introduction, we wondered whether global/local features and final similarity selection influence eye-movement patterns in texture recognition. Therefore, we used three metrics to identify the participants' visual behavior on the basis of the outcomes of the eye-tracking experiment: (1) total fixation duration (TFD) in each ROI, (2) fixation-point variance (FPV) in each ROI, and (3) fixation-transfer counts (FTCs) between different ROIs [34,35]. As the final selection was determined between Textures A and B, we mainly considered the eye-movement data in Textures A and B.
(1) TFD in each ROI represents the duration of fixation within an ROI. It was calculated as the average value in seconds. For Scenes 1-4, we calculated the average TFDs and the SDs of TFDs in different ROIs (as shown in Table 1). The TFD in Texture B was longer than that in Texture A for the four scenes. This result was in accordance with that of Deng et al. [36], who demonstrated that horizontal displays were easier to process. Therefore, the TFD in Texture A was shorter than that in Texture B. To verify whether global and local texture features influenced the average TFD in Textures A and B, we analyzed the average TFD through ANOVA (Analysis of Variance) and paired t-tests (Table 1). Analysis results showed that, for Scene 1, TFD in Texture A was significantly greater (in the paired t-test, t(16) = 2.434, p = 0.027) than that for Texture B when the similarity choice was Texture A under the significance level of 0.05. TFD in Texture B, on the other hand, was not significantly greater (in the paired t-test, t(12) = −1.327, p = 0.209) than that for Texture A when the choice of similarity was Texture B. For Scene 2, TFD in Texture A was not significantly greater (in the paired t-test, t(11) = −0.327, p = 0.750) than that for Texture B when the choice of similarity was Texture A. TFD in Texture B was also not significantly greater (in the paired t-test, t(17) = −1.79, p = 0.091) than that for Texture A when the choice of similarity was Texture B. For Scene 3, TFD in Texture A was significantly greater (in the paired t-test, t(23) = 3.059, p = 0.006) than that for Texture B when the choice of similarity was Texture A. However, TFD in Texture B was not significantly greater (in the paired t-test, t(5) = −2.274, p = 0.072) than that for Texture A when the choice of similarity was Texture B. For Scene 4, TFD in Texture A was not significantly greater (in the paired t-test, t(16) = 1.674, p = 0.114) than that for Texture B when the choice of similarity was Texture A.
However, TFD in Texture B was significantly greater (in the paired t-test, t(12) = −3.286, p = 0.007) than that for Texture A when the choice of similarity was Texture B. Analysis results showed that there was no significant relationship between final similarity choices and TFDs in the ROIs.
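The scene-by-scene comparisons above are paired t-tests. A minimal sketch of the paired t statistic (with n − 1 degrees of freedom), assuming two equal-length lists of per-participant values:

```python
import math

def paired_t(x, y):
    """Paired t statistic: t = mean(d) / (sd(d) / sqrt(n)), with d = x - y.

    Returns (t, degrees of freedom = n - 1)."""
    n = len(x)
    d = [a - b for a, b in zip(x, y)]
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)  # sample variance of differences
    return mean / math.sqrt(var / n), n - 1
```

For example, the reported t(16) values correspond to 17 participants who made a given similarity choice in that scene.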
Furthermore, we added Cohen's d (on Social Sciences Statistics website [37]) to reflect the effect size when interpreting our results. Cohen's d is the difference between two means divided by the standard deviation. Cohen [38] gave useful rules of thumb about what to regard as a "large", "medium", or "small" effect. On the basis of the above analysis of the p-values and Cohen's d-values, the relationships between final similarity choices and TFDs in ROIs were not significant.
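Cohen's d, as described above, is the difference between two means divided by a standard deviation. A minimal sketch using the pooled sample standard deviation (one common convention; the online calculator may use a slightly different denominator):

```python
import math

def cohens_d(x, y):
    """Cohen's d: difference of means over the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    pooled_sd = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / pooled_sd
```

By Cohen's rules of thumb, |d| around 0.2 is small, 0.5 medium, and 0.8 large.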
(2) FPV in each ROI refers to the degree of fixation concentration when the participants were viewing the stimuli. In our research, this indicator was used to reflect whether a participant's eye-movements constituted local attention or a global scan. Since Texture S had the same global features as Texture A and the same local features as Texture B, if participants captured the global features of the visual stimuli, the fixation points were dispersed; if the participants paid more attention to the local features of the visual stimuli, the fixation points were concentrated. Therefore, we hypothesized that, during the process of visual searching, the fixation points in Texture A were more dispersed than those in Texture B. To verify this hypothesis, we calculated the FPV in each ROI.
The coordinates of all fixation points in each ROI could be accessed from Tobii Studio. We first computed the FPVs in Textures A and B through the following steps.
I. Calculate the central point of all fixations in each ROI:

a = (1/m) ∑_{i=1}^{m} x_i,  b = (1/m) ∑_{i=1}^{m} y_i,

where (x_i, y_i) is one of the m fixations in each ROI and (a, b) is the central point among all fixations in each ROI.

II. Calculate the Euclidean distance between each fixation and the central point:
d_i = √((x_i − a)² + (y_i − b)²),

where d_i is the Euclidean distance between fixation (x_i, y_i) and the central point in each ROI.

III. Calculate the variance of all distances:
Var = (1/m) ∑_{i=1}^{m} (d_i − d̄)²,

where d̄ is the mean of the distances d_i and Var is the variance of all distances. It represents the concentration of all fixations in each ROI. The averages and SDs of FPVs in the ROIs of Textures A and B for the four scenes are shown in Table 2. We conducted a paired t-test to estimate differences in global and local features on the basis of the FPVs for Textures A and B. The independent variable used in the t-test was the FPVs, which were calculated from 30 participants' eye-movements. The p-values and Cohen's d-values indicated that significant differences existed between global and local features on the basis of the FPVs for Textures A and B at the significance level of 0.05. Results validated the hypothesis that fixation points were more dispersed in Texture A than in Texture B. Overall, the fixation points in Texture A were scattered because Textures A and S had the same global features; the fixation points in Texture B were concentrated because Textures B and S had the same local features. An example of the fixation-point distribution is shown in Figure 5. From this figure, it is obvious that the fixation points in Texture A were more dispersed than those in Texture B. In the four scenes, Texture A contained dispersed fixation points because participants paid more attention to its global features (horizontally similar to Texture S). Horizontal displays were easier to process, which subsequently led to higher choice counts [36,39]. Therefore, we hypothesized that the final similarity choices and the FPVs in the ROIs (especially for Textures A and B) were highly related. That is, if the FPV in Texture A was larger than that in Texture B, the final similarity choice tended to be Texture A; by contrast, if the FPV in Texture A was smaller than that in Texture B, the final similarity choice tended to be Texture B. To verify this hypothesis, we conducted a paired t-test to verify differences between choices (Textures A and B) on the basis of the FPVs in Textures A and B (shown in Table 1).
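The three steps above (central point, distances to it, variance of the distances) can be sketched as:

```python
import math

def fixation_point_variance(fixations):
    """FPV of a list of (x, y) fixation coordinates in one ROI."""
    m = len(fixations)
    a = sum(x for x, _ in fixations) / m                     # I. central point (a, b)
    b = sum(y for _, y in fixations) / m
    dists = [math.hypot(x - a, y - b) for x, y in fixations] # II. distances d_i
    mean_d = sum(dists) / m
    return sum((d - mean_d) ** 2 for d in dists) / m         # III. variance of d_i
```

Concentrated fixations (all near the centroid) yield a small Var, whereas fixations scattered at widely varying distances from the centroid yield a large Var, matching the dispersed-vs-concentrated interpretation in the text.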
The independent variable used in the t-test was the FPVs in ROIs under a certain final similarity choice (Texture A or B).
Analysis results showed that, for Scene 1, the FPV in Texture A was significantly greater (in the paired t-test, t(14) = 3.661, p = 0.001) than that for Texture B when the similarity choice was Texture A under the significance level of 0.01. The FPV in Texture B, on the other hand, was not significantly greater (in the paired t-test, t(7) = −2.034, p = 0.040) than that for Texture A when the choice of similarity was Texture B. For Scene 2, the FPV in Texture A was not significantly greater (in the paired t-test, t(5) = 2.603, p = 0.024) than that for Texture B when the choice of similarity was Texture A. The FPV in Texture B was also not significantly greater (in the paired t-test, t(5) = −1.835, p = 0.063) than that for Texture A when the choice of similarity was Texture B. For Scene 3, the FPV in Texture A was significantly greater (in the paired t-test, t(14) = 4.512, p < 0.001) than that for Texture B when the choice of similarity was Texture A. However, the FPV in Texture B was not significantly greater (in the paired t-test, t(5) = −2.487, p = 0.028) than that for Texture A when the choice of similarity was Texture B. For Scene 4, the FPV in Texture A was significantly greater (in the paired t-test, t(8) = 3.561, p = 0.004) than that for Texture B when the choice of similarity was Texture A. The FPV in Texture B was not significantly greater (in the paired t-test, t(6) = −2.768, p = 0.016) than that for Texture A when the choice of similarity was Texture B. Analysis results showed that there was no significant relationship between final similarity choices and FPVs in the ROIs.
(3) FTCs between different ROIs represent how often a participant's gaze shifted from one ROI to another. In this research, the subjects viewed three synthesized texture images in one scene. Their final similarity choices between Textures A and B were determined by checking which of the two textures was more similar to Texture S. Therefore, we mainly considered the FTCs between Textures S and A (FTCs_S,A) and the FTCs between Textures S and B (FTCs_S,B).
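Counting FTCs from a chronological fixation sequence can be sketched as follows; the scanpath shown is a hypothetical example, not recorded data.

```python
from collections import Counter

def fixation_transfer_counts(roi_sequence):
    """Count gaze transfers between different ROIs in a chronological
    sequence of ROI labels, one label per fixation."""
    counts = Counter()
    for prev, cur in zip(roi_sequence, roi_sequence[1:]):
        if prev != cur:                       # same-ROI refixations are not transfers
            counts[frozenset((prev, cur))] += 1
    return counts

# Hypothetical scanpath over the three ROIs of one scene:
scanpath = ["S", "A", "S", "A", "B", "S", "B"]
ftc = fixation_transfer_counts(scanpath)
# FTCs_S,A = ftc[frozenset({"S", "A"})]; FTCs_S,B = ftc[frozenset({"S", "B"})]
```

Using an unordered pair (frozenset) as the key counts transfers in either direction between two ROIs as the same pair, which is what the S-A vs. S-B comparison requires.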
As per Question (2) in the Introduction, we hypothesized that visual-similarity perception may influence eye-movement patterns and that there are two relationships between the final similarity choice and the FTCs between different textures (especially Textures A and S, and Textures B and S). Moreover, Atalay et al. [40] demonstrated that gaze duration and fixation frequency tend to influence the final choice. Therefore, the hypothesized relationships are as follows:
I. If the final similarity choice was Texture A, then FTCs_S,A > FTCs_S,B.
II. If the final similarity choice was Texture B, then FTCs_S,A < FTCs_S,B.
To verify the hypothesis, we conducted a paired t-test to evaluate differences in FTCs_S,A and FTCs_S,B for the similarity choice between Textures A and B in the four scenes. The independent variable used in the t-test was the FTCs between different ROIs under a certain final similarity choice (Texture A or B). Results are shown in Table 3. For Scene 1, FTCs_S,A was significantly greater (in the paired t-test, t(16) = 4.883, p < 0.001) than FTCs_S,B when the similarity choice was Texture A. FTCs_S,A was much less (in the paired t-test, t(12) = −3.32, p = 0.006) than FTCs_S,B when the choice of similarity was Texture B. The same results were obtained in Scene 2. FTCs_S,A was significantly greater (in the paired t-test, t(11) = 2.966, p = 0.013) than FTCs_S,B when the similarity choice was Texture A. FTCs_S,A was considerably less (in the paired t-test, t(17) = −2.247, p = 0.038) than FTCs_S,B when the similarity choice was Texture B. In addition, in Scene 3, FTCs_S,A was significantly greater (in the paired t-test, t(23) = 6.075, p < 0.001) than FTCs_S,B when the similarity choice was Texture A. FTCs_S,A was much less (in the paired t-test, t(5) = −3.464, p = 0.018) than FTCs_S,B when the choice of similarity was Texture B. Lastly, in Scene 4, FTCs_S,A was significantly greater (in the paired t-test, t(16) = 3.289, p = 0.005) than FTCs_S,B when the similarity choice was Texture A. FTCs_S,A was less (in the paired t-test, t(12) = −2.241, p = 0.045) than FTCs_S,B when the choice of similarity was Texture B. Examples of FTCs for the four scenes are shown in Figure 6. On the basis of the p- and Cohen's d-values, analysis results verified the hypothesis that the final visual-similarity selection was related to the FTCs between the different ROIs; that is, the similarity judgment was reflected in the fixation transfers between different ROIs.

Discussion and Conclusions
Two main issues were studied in this experiment. The first was related to how global and local features influence eye-movement patterns under similarity perception, and the second was related to the relationship between similarity choice and eye-movement patterns.
Regarding the first issue, TFD and FPV in each ROI were calculated to analyze eye-movement patterns. Generally, a high TFD value is attributed to the attractiveness of an ROI [41]. For Scenes 1-4, the TFD in Texture B was longer than that in Texture A, reflecting that Texture B received more attention. However, some studies offer a different view: a long fixation duration on an ROI can indicate that the ROI is too complex to understand [42,43]. In this research, the textures in each scene were synthesized with the PGPC texture model. The grains and skeletons used in the synthesis were at the same level of complexity. Therefore, the only difference in complexity between Textures A and B lay in directionality (for Scenes 1 and 2) and density (for Scenes 3 and 4). Deng et al. [36] demonstrated that horizontal displays are easier to process. This also supports our finding that the TFD in Texture A was shorter than the TFD in Texture B. Previous research [24] demonstrated that similarity has a different effect on processing global and local features. In the present study, fixations in Textures A and B showed different distributions under similarity perception. Textures A and S shared global features, and the fixation points in Texture A were dispersed. Textures B and S shared local features, and the fixation points in Texture B were relatively concentrated.
In terms of the second issue, we summarized the experiment data corresponding to the two types of relationships between the final similarity choice and the FTCs between the different ROIs; results are shown in Table 4. A total of 88.2% (15/17) of subjects in Scene 1, 75.0% (9/12) of subjects in Scene 2, 87.5% (21/24) of subjects in Scene 3, and 76.5% (13/17) of subjects in Scene 4 transferred more frequently between Textures A and S than between Textures B and S; these subjects believed that Texture A was more similar to Texture S. Meanwhile, a total of 61.5% (8/13) of subjects in Scene 1, 72.2% (13/18) of subjects in Scene 2, 83.3% (5/6) of subjects in Scene 3, and 61.5% (8/13) of subjects in Scene 4 transferred more frequently between Textures B and S than between Textures A and S; these subjects regarded Texture B as more similar to Texture S. Furthermore, we conducted a paired t-test to evaluate differences in FTCs_S,A and FTCs_S,B in the similarity choice between Textures A and B for the four scenes. On the basis of the paired t-test results, we concluded that the final similarity choice was closely related to the FTCs of the different textures viewed in the tests. This is in accordance with previous research showing that transitions between different fixations are related to the search behavior and expectations of observers [44]. In conclusion, we investigated whether global and local features influence eye-movement patterns in texture-similarity perception. When a texture was globally similar to the texture being compared, the fixation points in the texture were highly dispersed. By contrast, when a texture was locally similar to the texture being compared, the fixation points in the texture were highly concentrated. Furthermore, the domination of global and local features influenced viewers' similarity choice. The final visual-similarity selection was related to FTCs between different ROIs, but not related to the TFD in each ROI.
This research contributes to analyzing patterns of texture recognition using an eye-tracker and extends the application of the mathematical morphology-based texture model to human visual perception.
The texture stimuli utilized in the experiment were limited. In our future work, we aim to synthesize different types of textures for experimental research. We will further analyze other metrics of eye-movements to investigate the relationship between similarity perception and global/local processing.