Eye Tracking Research on the Influence of Spatial Frequency and Inversion Effect on Facial Expression Processing in Children with Autism Spectrum Disorder

Facial expression processing mainly depends on whether the facial features related to expressions can be fully acquired, and whether appropriate processing strategies can be adopted under different conditions. Children with autism spectrum disorder (ASD) have difficulty accurately recognizing facial expressions and responding appropriately, which is regarded as an important cause of their social disorders. This study used eye tracking technology to explore the internal processing mechanism of facial expressions in children with ASD under the influence of spatial frequency and inversion effects, with the aim of helping to improve their social disorders. The facial expression recognition rate and the eye tracking characteristics of children with ASD and typically developing (TD) children on the facial areas of interest were recorded and analyzed. The multi-factor mixed experiment results showed that the facial expression recognition rate of children with ASD under all conditions was significantly lower than that of TD children. TD children paid more visual attention to the eyes area. However, children with ASD preferred the features of the mouth area, and lacked visual attention to and processing of the eyes area. When the face was inverted, TD children showed the inversion effect under all three spatial frequency conditions, which was manifested as a significant decrease in expression recognition rate. However, children with ASD only showed the inversion effect under the LSF condition, indicating that they mainly used a featural processing method and had the capacity for configural processing under the LSF condition. The eye tracking results showed that when the face was inverted or facial feature information was weakened, both children with ASD and TD children adjusted their facial expression processing strategies accordingly, increasing their visual attention to and information processing of their preferred areas. The fixation counts and fixation duration of TD children on the eyes area increased significantly, while the fixation duration of children with ASD on the mouth area increased significantly. The results of this study provide theoretical and practical support for facial expression intervention in children with ASD.


Introduction
Autism spectrum disorder (ASD) is a developmental disability that can cause significant social, communication, and behavioral challenges [1]. The symptoms of ASD begin in early childhood and typically last a lifetime, placing a heavy burden on families and society. The incidence of ASD has been increasing in recent years, with an estimated 1 in 44 children diagnosed with the disorder, according to the Centers for Disease Control of the USA [2]. Facing the growing demand for diagnosis and treatment, more and more are differences in processing methods between children with ASD and TD children. For example, children with ASD performed better than TD children on embedded figures tests [23], suggesting that they prefer local rather than holistic processing method. The Weak Central Coherence theory proposed by Frith and Happé provides a certain explanation [24]. TD children usually process information at the expense of ignoring local details to form meaning and gestalt configuration. The face is just a gestalt representation integrated by various local features such as eyes, mouth, etc., which is considered a typical case of configural processing [25]. TD children tend to perceive and process the face as a whole. However, children with ASD tend to interpret multiple complex stimuli as independent parts, and then independently perceive and process these local features. It is difficult for them to integrate the local features into a meaningful whole, and perform holistic facial processing like TD children [26].
Researchers have designed many methods, such as the spatial frequency paradigm and the inversion paradigm, to further analyze experimentally the internal processing mechanism of children with ASD. These paradigms include:
1. The spatial frequency paradigm mainly uses different spatial filters to transform facial expression images [27]. A change of spatial frequency causes a change of the expression features in the facial image, which has an impact on different facial expression processing methods. It is generally believed that after the low spatial frequency (LSF) filter blurs the facial image, the configural information of the face is retained, which is beneficial to the configural processing method. The high spatial frequency (HSF) filter highlights the local features of the face, which is beneficial to the featural processing method. Additionally, the broad spatial frequency (BSF) condition is the original image itself [28]. Exploring the performance of individuals under different spatial frequency conditions is helpful for analyzing their facial expression processing methods.
Some researchers presented facial expression images under different spatial frequency conditions, and asked the participants to make recognition judgments [29]. Their research found that TD individuals processed facial information more effectively under the LSF condition than the HSF condition, indicating that they mainly adopted the configural processing method. Deruelle et al. [30] first used this paradigm to find that children with ASD were weaker than TD children in recognizing various expressions, and generally relied on HSF information to process facial expressions, confirming that they mainly used featural processing method. However, some researchers believed that changes in spatial frequency would not directly affect the facial expression recognition of individuals with ASD. Vanmarcke et al. [28] found that teenagers with ASD performed worse than TD teenagers with the same age in facial expression classification tasks, but the level of spatial frequency did not significantly affect the performance of these two groups. Goffaux et al. [31] found that although featural information was enhanced under the HSF condition, it still retained certain configural information and still supported the occurrence of configural processing. Therefore, simply presenting facial images with different spatial frequencies does not provide a good insight into how children with ASD process facial expressions.

2. The inversion paradigm presents the entire facial image upside down, and then asks the participants to perceive and process it [32]. Inverting the facial image breaks the original layout of the face and has a greater impact on the configural processing method [33]. Participants need to reintegrate featural information from various areas of the face, such as the eyes and mouth. Therefore, participants have more difficulty in recognizing inverted facial images than upright facial images.
There is a huge contrast in their reactions, known as the inversion effect [34]. If there is an inversion effect, it can be inferred that this participant mainly adopts a configural processing method. For TD children, inverted faces are more difficult to recognize than upright faces, and there are significant differences between these two facial presentations. TD children are affected by the inversion effect, which is consistent with their predominant use of configural processing. Langdell [35] first used this paradigm and found no inversion effect in children with ASD during the experiment. This conclusion supported that children with ASD did not rely on configural processing. Many researchers supported this conclusion and believed that children with ASD were not affected by the inversion effect, and were more inclined to use a featural processing method [36]. However, some others considered that children with ASD could also be affected by the inversion effect, supporting the conclusion that children with ASD had the capacity of configural processing [37]. Pallett et al. [38] found that with increasing age and IQ, children with ASD would be affected by the inversion effect like TD children. They would be sensitive to faces and encode the facial information using a configural processing method. The controversy in these studies might be due to the heterogeneity of participants and the differences in experimental tasks. Therefore, the impact of the inversion effect on children with ASD remains to be further investigated.
Some researchers thought that a single paradigm experiment could not fully reveal the inner processing mechanism of children with ASD. The combination of spatial frequency and inversion effect might provide new clues about the characteristics of facial expression processing in children with ASD. The findings of Kikuchi et al. [39] showed that children with ASD had an inversion effect under the LSF condition, demonstrating the capacity for configural processing of facial expressions in children with ASD. Furthermore, it is worth in-depth exploring whether children with ASD can fully fixate and acquire facial expression features from the core areas of the face, and whether they can make adaptive adjustments under different conditions.
As reviewed above, researchers generally believe that children with ASD have certain disorders in the processing of facial expressions. There are still some issues worthy of in-depth study on the internal processing mechanism of children with ASD. (1) There is still controversy about what facial expression processing strategies children with ASD use. The combination of multiple experimental paradigms is a beneficial research avenue to reveal their processing mechanisms. It is worth further studying the relationship of different conditions and their influence on the facial expression processing of children with ASD. (2) There is still controversy about the visual processing mechanisms of the face and specific areas of interest in children with ASD. Eye tracking technology is an important tool to reveal their tacit emotional and cognitive processing. It deserves further research on the eye tracking indicators of children with ASD in core areas of the face, such as eyes and mouth, under different conditions, as well as their adjustment strategies affected by different factors. (3) The age of the participants had a wide distribution across experiments. Young children with ASD are in the golden age of brain development and can be effectively improved through intervention. They need to be the subjects of more research.
In view of this, this research is based on previous studies and intends to explore the influence of spatial frequency and inversion effect on facial expression processing of children with ASD through eye tracking technology, so as to further explore their facial expression processing mechanism. This study intends to use comparative experimental research. The experimental group is children with ASD, and the control group is matched TD children. The differences in facial expression processing and eye tracking characteristics between children with ASD and TD children are compared and analyzed. This experiment adopts a multi-factor mixed experimental scheme to analyze the facial expression recognition rates of the two groups of children under different spatial frequencies and orientations, as well as eye tracking indicators for core areas such as eyes and mouth, and their correlation, so as to deeply study the facial expression processing mechanism of children with ASD.
This research uses eye tracking technology and different experimental paradigms to explore the facial expression processing characteristics of children with ASD, and reveal their processing mechanism, which has theoretical significance. On the other hand, it can provide a basis for the design of facial expression intervention materials and their presentation forms, which has practical value for children with ASD.

Participants
The experiment of facial expression processing required participants to have certain abilities in visual attention, cognition, and comprehension. Due to the prevalence of developmental delay in children with ASD, the inclusion criteria for the two study groups were specified as two groups of children with matched abilities rather than matched ages. In addition, children with attention deficit and difficulty completing the experiment were excluded.
The participants consisted of 12 children with ASD from a special education institution in Wuhan (ASD group: 9 males and 3 females; 5-7 years old; mean age = 5.6 years, SD = 0.5 years) and 11 TD children from a kindergarten in Xinyang (TD group: 7 males and 4 females; 3-5 years old; mean age = 4.1 years, SD = 0.3 years). The Peabody Picture Vocabulary Test Revised (PPVT-R) was used to assess their abilities [40,41], and the results were analyzed by t-test. It was found that there was no significant difference in the level of verbal IQ between these two groups (ASD group: mean score = 51.08, SD = 14.66; TD group: mean score = 53.54, SD = 4.23; t = −0.56, p = 0.59 > 0.05), which met the experimental requirements.
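The group comparison of PPVT-R scores above can be rechecked from the reported summary statistics alone. The following is a minimal sketch, assuming an unequal-variance (Welch) t-test; the text only says "t-test", so the variant is an assumption, and the function name `welch_t` is introduced here for illustration. Under that assumption the statistic rounds to the reported t = −0.56.

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from group means, standard deviations, and sample sizes."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics as reported in the text (ASD group first, TD group second).
t, df = welch_t(51.08, 14.66, 12, 53.54, 4.23, 11)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The p-value would then be read from a t distribution with the computed degrees of freedom, for example with a statistics package.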
Before the experiment, all children with ASD had been diagnosed double-blind by two expert physicians in child development and behavior, and the diagnoses were confirmed according to DSM-5 criteria [42]. After parental interviews and clinical observations, childhood schizophrenia, epilepsy, and other organic brain diseases were ruled out for both groups of participants, and all participants were confirmed to have normal vision (or corrected vision) and normal intelligence.
Privacy protection agreements were signed with the special education institution and the kindergarten. Informed consent was obtained from the participants' parents. This study only collected relevant data anonymously when the participants completed the experimental tasks. No personally identifiable information or portraits of participants were involved.

Design
A multi-factor mixed experiment of 2 (group) × 3 (spatial frequency) × 2 (orientation) was designed. When analyzing eye tracking data, another factor (area of interest) would be added. The between-subject variable was the group, divided into two levels of ASD group and TD group. The others were within-subject variables. The spatial frequency includes three levels: broad spatial frequency (BSF), which is the original image, low spatial frequency (LSF), and high spatial frequency (HSF). The orientation includes two levels: upright and inverted. The area of interest includes two levels: eyes and mouth.

Materials
In order to obtain high-quality images for facial expression processing, standardized facial expression datasets were required, with standardized facial angles, expression types, and expression strengths, and a rich source of subjects. Several datasets satisfied the experimental needs. The formal experimental materials used the BU-4DFE database of the State University of New York at Binghamton [43,44], which had been purchased and licensed for non-profit research use. This is a high-resolution facial expression database presenting fine-grained structural variation of expressions, including multiple ethnicities, a broad age range, more than 100 subjects, and 6 basic expressions, each with 4 intensity levels. The facial expression images of 4 young Asians (2 males and 2 females, with an average age of 22 years) were selected as experimental materials. Each person's facial images contained 4 basic expressions: happiness, sadness, anger, and fear, all of which were facial expressions at the highest intensity level for easy identification by children. Photoshop CS6 software (Adobe, San Jose, CA, USA) was used for normalization and grayscale processing, and then MATLAB 2018b software (MathWorks, Natick, MA, USA) was used to process the images: first with a Fourier transform, and then with a Gaussian filter for LSF processing and HSF processing. The filtering followed the commonly used standard [30]: LSF parameter < 2 cycles/face, HSF parameter > 6 cycles/face. Finally, a total of 48 facial expression images in the upright state were formed, including 16 BSF images (the original images), 16 LSF images, and 16 HSF images. These images were then rotated by 180° to obtain another 48 facial expression images in the inverted state. Figure 1 shows an example of the formal experimental materials.
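The Fourier-domain Gaussian filtering described above can be illustrated as follows. This is a minimal sketch in Python with NumPy rather than the MATLAB processing actually used, and several details are assumptions not given in the text: the image is assumed to be cropped so that its width spans one face (so cycles per image equal cycles per face), the Gaussian standard deviation is set equal to the cutoff, and the function name `gaussian_sf_filter` is hypothetical.

```python
import numpy as np

def gaussian_sf_filter(image, cutoff_cycles, kind):
    """Filter a 2D grayscale image in the Fourier domain.

    cutoff_cycles is expressed in cycles per image, which is assumed here
    to equal cycles per face. kind is "low" (LSF) or "high" (HSF).
    """
    if kind not in ("low", "high"):
        raise ValueError("kind must be 'low' or 'high'")
    h, w = image.shape
    # Frequency of each FFT bin in cycles per image along each axis.
    fy = np.fft.fftfreq(h) * h
    fx = np.fft.fftfreq(w) * w
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    # Gaussian low-pass gain; using the cutoff as sigma is an assumption.
    lowpass = np.exp(-(radius ** 2) / (2.0 * cutoff_cycles ** 2))
    gain = lowpass if kind == "low" else 1.0 - lowpass
    filtered = np.fft.ifft2(np.fft.fft2(image) * gain)
    return np.real(filtered)

# Synthetic stand-in for a normalized grayscale face image.
rng = np.random.default_rng(0)
face = rng.random((128, 128))
lsf = gaussian_sf_filter(face, 2.0, "low")
# The high-pass gain removes the mean luminance, so it is added back.
hsf = gaussian_sf_filter(face, 6.0, "high") + face.mean()
inverted_lsf = np.rot90(lsf, 2)  # 180-degree rotation for the inverted condition
```

With a shared cutoff the low-pass and high-pass outputs sum back to the original image, which is a convenient sanity check on the filter gains.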

Equipment
An all-in-one computer with a 23-inch multi-touch screen was used to present the experimental materials on the screen and record the participants' responses. The screen resolution was 1920 × 1080 pixels.
An Eye Tracker 4C (Tobii, Stockholm, Sweden) was mounted directly below the computer screen, connected to and controlled by this computer. It had a sampling frequency of 90 Hz and offered a software development kit for eye tracking data acquisition. The calibration of the eye tracker followed the standard procedure provided by the Tobii device driver, called the 7-point positioning method. That is, the participants were required to gaze at 7 target points on the screen in sequence (a central point, then three peripheral points, then another set of three peripheral points), staring at each point until it disappeared to complete the calibration. Failure to calibrate at any point resulted in a recalibration of all 7 points. Only after successfully calibrating all 7 points were the participants allowed to take part in further steps of the experiment.

Procedure
The experiment consisted of two blocks, upright and inverted. Six children with ASD and six TD children observed the upright block first, and the others observed the inverted block first. Each block had 48 trials, that is, 48 facial expression images covering 3 spatial frequency conditions, 4 Asian models, and 4 basic expressions per model. All trials were presented in random order. Each participant was required to complete a total of 96 trials across the two blocks. The facial expression processing task was a two-alternative forced-choice (2AFC) matching task, containing one target image of the facial expression and two probes (a matching option and a non-matching option).
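The block structure described above (3 spatial frequencies × 4 models × 4 expressions, shuffled within each block) can be sketched as a trial list. This is only an illustration: the model labels, the function names, and the fixed random seed are hypothetical and do not come from the experimental program.

```python
import random
from itertools import product

SPATIAL_FREQUENCIES = ["BSF", "LSF", "HSF"]
MODELS = ["model_1", "model_2", "model_3", "model_4"]  # hypothetical labels
EXPRESSIONS = ["happiness", "sadness", "anger", "fear"]

def build_block(orientation, rng):
    """Return the 48 trials of one block in random order."""
    trials = [
        {"orientation": orientation, "sf": sf, "model": m, "expression": e}
        for sf, m, e in product(SPATIAL_FREQUENCIES, MODELS, EXPRESSIONS)
    ]
    rng.shuffle(trials)
    return trials

def build_session(upright_first, seed=0):
    """Return the two blocks in the counterbalanced order for one participant."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    order = ["upright", "inverted"] if upright_first else ["inverted", "upright"]
    return [build_block(o, rng) for o in order]

session = build_session(upright_first=True)
print([len(block) for block in session], sum(len(block) for block in session))
```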
The procedure was as follows: first, a red dot was presented in the center of the black screen for 0.5 s to attract the participant's attention. Next, the experimental materials were presented with the target image at the top middle and the 2AFC probes at the bottom of the screen. Each trial lasted for 6 s. The participant was asked to carefully observe the target image and find the correct probe. In the interval between two trials, a black screen appeared for 2 s as a rest. This process was repeated until the block ended. Each block took about 7 min. The participant was allowed to rest for half an hour or more in the interval between the two blocks. Figure 2 shows an example of the experimental procedure.
The experiment was carried out in a quiet and comfortable room. Each participant was required to sit on a chair 60-65 cm away from the computer screen and to perform the eye tracking calibration. Two operators guided the participant to complete the experimental task. Operator A was responsible for controlling the computer program and eye tracker, presenting each target image on the screen. Operator B prompted the participant to watch the target image and gave instructions to the participant in a language adapted to his/her level, for example, "Hello kid! Look! Which one of the following expressions do you think is the same as the target image?" The participant needed to touch the correct probe or verbalize the expression type within the specified time. The operators assisted the participant in the tasks until the end of the experiment.
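As a quick consistency check on the timing given in the procedure, the block duration can be recomputed from the per-trial intervals. The sketch assumes that the 0.5 s red dot precedes every trial, which the text suggests but does not state explicitly; the constant and function names are introduced here for illustration.

```python
FIXATION_DOT_S = 0.5   # red dot before the stimulus (assumed to occur on every trial)
STIMULUS_S = 6.0       # target image and probes on screen
BLANK_S = 2.0          # black screen between trials
TRIALS_PER_BLOCK = 48

def block_duration_min(trials=TRIALS_PER_BLOCK):
    """Total duration of one block in minutes."""
    return trials * (FIXATION_DOT_S + STIMULUS_S + BLANK_S) / 60.0

print(f"{block_duration_min():.1f} min per block")
```

Under this assumption a block lasts a little under seven minutes, in line with the stated "about 7 min".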
The computer program automatically recorded the participant's experimental data, including score, elapsed time and eye tracking data. The experiment adopted the 0/1 scoring method: 1 point for correctness, 0 for errors or no response. When the participant chose the correct probe, a cartoon character would appear on the screen as reward feedback.

Data Analysis Indicators
The experiment recorded the resultant data of the participants completing the experimental task, that is, the facial expression recognition rate, which comprehensively reflected their facial expression processing ability. This article focused on the influence of spatial frequency and inversion effect on facial expression processing of children with ASD and TD children. Statistical analysis was performed using SPSS 27 (IBM, Armonk, NY, USA) to discuss their differences and infer their respective processing methods of facial expressions. The influence of other factors (different characters, different expression types) of the experimental materials would be discussed in another article. The indicator of facial expression recognition rate in this article was the average recognition rate of participants for different characters and different types of expressions.
The eye tracker recorded the procedural data of the participants during the experimental task. The eye tracking indicators were: (1) fixation count, the total number of fixation points in the target area; and (2) fixation duration, the total duration of fixation on the target area. These indicators were used to explore the facial expression processing characteristics of the two groups of children on the target image (face) and on specific areas of interest (eyes, mouth). Ogama 5.1 (open-source software) was used for the division of areas of interest, eye tracking data statistics, and result visualization. The fixation calculation was performed using the default fixation detection algorithm built into Ogama 5.1, which originates from LC Technologies. The parameters were also the default values (maximum distance of 20 pixels and minimum number of samples of 5).
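To make the two indicators and the two detection parameters concrete, the following is a simplified dispersion-based sketch. It is not the Ogama or LC Technologies implementation; it only assumes that a fixation is a run of at least 5 consecutive samples that each stay within 20 pixels of the running centroid, and that duration follows from the 90 Hz sampling rate. The area-of-interest rectangle and all names are hypothetical.

```python
from dataclasses import dataclass
from math import hypot

SAMPLE_RATE_HZ = 90    # sampling frequency reported for the eye tracker
MAX_DISTANCE_PX = 20   # default maximum distance cited in the text
MIN_SAMPLES = 5        # default minimum number of samples cited in the text

@dataclass
class Fixation:
    x: float
    y: float
    n_samples: int

    @property
    def duration_s(self):
        return self.n_samples / SAMPLE_RATE_HZ

def centroid(points):
    n = len(points)
    return sum(p[0] for p in points) / n, sum(p[1] for p in points) / n

def detect_fixations(samples):
    """Group consecutive gaze samples (x, y) into fixations."""
    fixations, cluster = [], []
    for x, y in samples:
        if cluster:
            cx, cy = centroid(cluster)
            if hypot(x - cx, y - cy) > MAX_DISTANCE_PX:
                # The sample left the cluster: keep it only if long enough.
                if len(cluster) >= MIN_SAMPLES:
                    fixations.append(Fixation(cx, cy, len(cluster)))
                cluster = []
        cluster.append((x, y))
    if len(cluster) >= MIN_SAMPLES:
        cx, cy = centroid(cluster)
        fixations.append(Fixation(cx, cy, len(cluster)))
    return fixations

def aoi_metrics(fixations, aoi):
    """Fixation count and total fixation duration (s) inside a rectangle
    given as (left, top, right, bottom) in pixels."""
    left, top, right, bottom = aoi
    inside = [f for f in fixations
              if left <= f.x <= right and top <= f.y <= bottom]
    return len(inside), sum(f.duration_s for f in inside)
```

For example, `aoi_metrics(detect_fixations(samples), eyes_rect)` would give the fixation count and fixation duration for a hypothetical eyes rectangle `eyes_rect`.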
The eye tracking indicators of the two groups of children on the eyes and mouth areas under different conditions reflected the amount of facial expression information they obtain from the target area, which could comprehensively reflect their facial expression processing ability and reveal their tacit emotional and cognitive processing. It could also be inferred whether and how they adaptively adjusted facial expression processing strategies under the influence of different spatial frequencies and inversion effects. Using SPSS for statistical analysis, the correlation between the attention to eyes/mouth area and the facial expression recognition rate was obtained, which is helpful to deeply explore the internal processing mechanism of the two groups of children, and reveal the causes of facial expression processing disorders in children with ASD.

Facial Expression Recognition Rate
For each condition of spatial frequency and orientation, the mean facial expression recognition rates of the two groups of children and the Mann-Whitney U test results (p-values) are shown in Table 1. The distribution of the facial expression recognition rate made the use of parametric tests inappropriate. Therefore, the Mann-Whitney U test was performed on these data of the two groups of children. Bonferroni corrections were adopted for all comparisons. It could be seen that the facial expression recognition rates of the two groups of children under different conditions were significantly different (p < 0.05). That is, the average expression recognition rate of children with ASD under each condition was significantly lower than that of TD children.
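The between-group comparison described above can be sketched as a Mann-Whitney U test followed by a Bonferroni adjustment. The analysis itself was run in SPSS; the pure-Python version below uses a normal approximation without tie or continuity correction, the recognition rates are invented placeholder values, and treating the six spatial frequency × orientation cells as the family of comparisons is an assumption.

```python
from math import sqrt, erfc

def mann_whitney_u(a, b):
    """Return (U, two-sided p) using average ranks for ties and a normal
    approximation without tie or continuity correction."""
    pooled = sorted((v, g) for g, vals in enumerate((a, b)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j
    n1, n2 = len(a), len(b)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, erfc(abs(z) / sqrt(2))

def bonferroni(p, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_comparisons)

# Placeholder recognition rates for one condition (not data from the study).
asd = [0.50, 0.56, 0.44, 0.62, 0.38, 0.50]
td = [0.81, 0.88, 0.75, 0.94, 0.69, 0.81]
u, p = mann_whitney_u(asd, td)
print(u, bonferroni(p, n_comparisons=6))
```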
The Friedman test results showed that the conditional effect was significant in both groups (p < 0.05). Then the Wilcoxon test with Bonferroni correction was performed in each group. The results showed that the facial expression recognition rate of children with ASD under the LSF condition was significantly lower than that under the other two spatial frequency conditions (p < 0.01). When the face was upright, the recognition rate of children with ASD under the HSF condition was significantly higher than that under the other two spatial frequency conditions (p < 0.05), while the recognition rate of TD children under the BSF condition was significantly higher than that under the other two spatial frequency conditions (p < 0.05). When the face was inverted, the recognition rate of TD children under the LSF condition was significantly lower than that under the other two spatial frequency conditions (p < 0.05). Additionally, TD children had the inversion effect under all the three spatial frequency conditions, which was manifested as a significant decrease in recognition rate (p < 0.05). However, children with ASD were less affected by the inversion effect. They only had the inversion effect under the LSF condition, that is, the recognition rate decreased significantly when the face was inverted (p < 0.05). They had no inversion effect under the BSF and HSF conditions (p > 0.05).
A simple effect analysis of the interaction was carried out and showed that when the face was upright, the fixation counts of children with ASD on the target image under all three spatial frequency conditions were significantly fewer than those of TD children (p < 0.05). The fixation counts of children with ASD on the target image under the HSF condition were significantly more than those under the other two spatial frequency conditions (p < 0.05), while the fixation counts of TD children on the target image under the LSF condition were significantly fewer than those under the other two spatial frequency conditions (p < 0.05). When the face was inverted, the fixation counts of children with ASD on the target image were significantly reduced under the LSF condition (p < 0.01) and significantly increased under the HSF condition (p < 0.05), while the fixation counts of TD children on the target image were significantly reduced under the BSF condition (p < 0.05). The fixation counts of children with ASD on the target image under the LSF condition were significantly fewer than those of TD children (p < 0.01).

Fixation Counts on the Areas of Interest
For each condition of spatial frequency and orientation, the mean fixation counts of the two groups of children on different areas of interest (eyes, mouth) in the target image and the t-test results (p-values) are shown in Table 3. The proportion of children with ASD's fixation counts on the eyes area to the target facial image was from 16.5% to 27.0%, and on the mouth area was from 44.0% to 70.0%. The proportion of TD children's fixation counts on the eyes area to the target facial image was from 34.9% to 70.1%, and on the mouth area was from 15.0% to 27.1%.
A simple effect analysis of the interaction was carried out and showed that the fixation counts of children with ASD on the mouth area under each condition were significantly more than those on the eyes area (p < 0.05). However, the fixation counts of TD children on the eyes area under each condition were significantly more than those on the mouth area (p < 0.05). In addition, TD children had significantly more fixation counts on the eyes area than children with ASD under each condition (p < 0.01). The fixation counts of children with ASD on the mouth area under the HSF condition were significantly more than those of TD children (p < 0.05). When the face was upright, the fixation counts of children with ASD on the eyes and mouth area under the HSF condition were significantly more than those under the LSF condition (p < 0.05). When the face was inverted, the fixation counts of children with ASD on the mouth area under the HSF condition were significantly more than those under the other two spatial frequency conditions (p < 0.05). The fixation counts of TD children on the eyes and mouth area under the BSF condition were significantly more than those under the LSF condition (p < 0.05). When the face was inverted, the fixation counts of children with ASD on the mouth area were significantly increased under the HSF condition (p < 0.05), while the fixation counts of TD children on the eyes area were significantly increased under all three spatial frequency conditions (p < 0.05). Children with ASD had significantly more fixation counts on the mouth area than TD children under all three spatial frequency conditions (p < 0.05) when the face was inverted.
A simple effect analysis of the interaction was carried out and showed that the fixation duration of children with ASD on the target image was significantly reduced under the LSF condition (p < 0.05) and significantly increased under the HSF condition (p < 0.05). When the face was upright, the fixation duration of children with ASD on the target image under the BSF and LSF conditions was significantly less than that of TD children (p < 0.01). When the face was inverted, the fixation duration of children with ASD on the target image under the LSF condition was significantly less than that of TD children (p < 0.01). The fixation duration of TD children on the target image was significantly reduced under all the three spatial frequency conditions (p < 0.05).

Fixation Duration on the Areas of Interest
For each condition of spatial frequency and orientation, the mean fixation duration of the two groups of children on different areas of interest (eyes, mouth) in the target image and the t-test results (p-values) are shown in Table 5. The proportion of children with ASD's fixation duration on the eyes area to the target facial image was from 10.1% to 21.9%, and on the mouth area was from 29.2% to 74.2%. The proportion of TD children's fixation duration on the eyes area to the target facial image was from 34.3% to 67.0%, and on the mouth area was from 16.7% to 27.5%.
A simple effect analysis of the interaction was carried out and showed that the fixation duration of children with ASD on the mouth area under each condition was significantly more than that on the eyes area (p < 0.05). However, the fixation duration of TD children on the eyes area under each condition was significantly more than that on the mouth area (p < 0.05). In addition, TD children had significantly more fixation duration on the eyes area than children with ASD under each condition (p < 0.01). The fixation duration of children with ASD on the mouth area under the HSF condition was significantly more than that under the other two spatial frequency conditions (p < 0.05). When the face was upright, the fixation duration of children with ASD on the mouth area under the HSF condition was significantly more than that of TD children (p < 0.05). When the face was inverted, the fixation duration of children with ASD on the mouth area under the BSF and HSF conditions was significantly more than that of TD children (p < 0.05). Under all three spatial frequency conditions, the fixation duration of children with ASD on the mouth area was significantly increased (p < 0.05), while the fixation duration of TD children on the eyes area was significantly increased (p < 0.05) when the face was inverted.

Eye Tracking Visualization
The Ogama 5.1 (Opensource software, http://www.ogama.net/ accessed on 6 January 2022) was used to analyze the eye tracking data of the two groups of children and visualize the results, as shown in Figure 3. TD children mainly focused their visual attention on the core area of the target face, especially the eyes area. However, children with ASD had more distracted visual attention, and they preferred to stare at the mouth area. The visualized heat map reflected the facial expression processing characteristics of children with ASD and their preference for mouth features.

Overall Analysis
This study explored the differences in facial expression processing and eye tracking features between children with ASD and TD children. The overall results showed that the facial expression recognition rate of children with ASD under various experimental conditions (spatial frequency, orientation) was significantly lower than that of TD children. It could be inferred that the facial expression processing ability of children with ASD was weaker than that of TD children. Due to the prevalence of developmental delay in children with ASD, the participants in this study consisted of two groups of children with no significant difference in the level of verbal IQ. Children with ASD were older than TD children, but their performance in experimental tasks was still significantly weaker than that of TD children, which further indicated that children with ASD had facial expression processing disorders.
The eye tracking results showed that the fixation counts and fixation duration of children with ASD on the mouth area under each condition were significantly more than those on the eyes area. In contrast, the fixation counts and fixation duration of TD children on the eyes area under each condition were significantly more than those on the mouth area.
In addition, the fixation counts and fixation duration of TD children on the eyes area under each condition were significantly more than those of children with ASD on the eyes area.
The correlation between the attention to eyes/mouth area and the facial expression recognition rate under each condition was statistically analyzed, and the results are shown in Table 6. The results of Pearson correlation analysis showed that there was a significant positive correlation between children's attention (fixation counts and fixation duration) to eyes area and facial expression recognition rates under both BSF and LSF conditions (r > 0.5, p < 0.01). Additionally, there was a significant positive correlation between children's fixation counts on the eyes area and facial expression recognition rate under the condition of inverted face and HSF (r = 0.544, p < 0.01). There was a significant positive correlation between children's fixation duration on the eyes area and facial expression recognition rate under the condition of upright face and HSF (r = 0.505, p < 0.05). It could be inferred that children with more visual attention to the eyes area would achieve higher facial expression recognition rates. There was a significant negative correlation between children's attention (fixation counts and fixation duration) to the mouth area and facial expression recognition rates under the HSF condition (r < −0.4, p < 0.05) and under the condition of inverted face and BSF (r < −0.4, p < 0.05). It could be inferred that children with more visual attention to the mouth area would achieve lower facial expression recognition rates.
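For readers unfamiliar with the statistic, the Pearson correlation coefficient used above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis pipeline of the study: the function name `pearson_r` and the per-child values below are invented for the example, and the significance test (p-value) is omitted.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Made-up per-child values (not study data): fixation counts on the eyes area
# and facial expression recognition rates under one condition.
eye_fixation_counts = [2, 4, 5, 7, 9, 10]
recognition_rates = [0.30, 0.45, 0.40, 0.60, 0.75, 0.70]

r = pearson_r(eye_fixation_counts, recognition_rates)
print(f"r = {r:.3f}")
```

A positive value of r, as with the invented data here, corresponds to the pattern described for the eyes area; a negative value would correspond to the pattern described for the mouth area.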
From the comparison results of the two groups of children, it could be inferred that different deployment of visual attention to eyes/mouth area in the two groups of children might lead to different abilities to process and recognize facial expressions. TD children had more visual attention to the eyes and could perceive and acquire relatively more facial expression information, so as to perform relatively more adequate facial expression processing. However, children with ASD preferred features of the mouth area, and lacked visual attention and processing of the eyes area, which might lead to their relatively weaker ability to process and recognize facial expressions than TD children. Therefore, it could be inferred that the facial expression processing disorders of children with ASD were mainly due to their atypical facial expression processing methods and strategies.
The eye avoidance hypothesis provided a certain explanation for why children with ASD had the manifestation of a lack of attention to the eyes [45]. Individuals with ASD perceived the eyes as socially threatening. Direct eye contact would trigger their strong physiological response, such as an increase in skin conductance or amygdala activity [46]. Avoiding the eyes was an adaptive strategy for them.

The Influence of Spatial Frequency on Facial Expression Processing
The change of spatial frequency would cause the change of expression features in the facial image, which would have an impact on different facial expression processing methods. It was generally believed that the low spatial frequency (LSF) was beneficial to the configural processing method. The high spatial frequency (HSF) was beneficial to the featural processing method. Additionally, the broad spatial frequency (BSF) was the original image itself, which contained all the facial information [28]. This study explored the effects of these three spatial frequencies on the facial expression processing of two groups of children. They exhibited different facial expression processing characteristics.
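The relation between the three kinds of image can be illustrated with a simple decomposition: a low-pass (blurred) version keeps the coarse layout, and the residual keeps edges and fine detail. The Python sketch below uses a box blur purely as a stand-in; the filter type and cutoff frequencies actually used to prepare the stimuli are not stated in this section, and the function names and the tiny synthetic image are assumptions made for the example.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def low_pass(image: Image, radius: int = 1) -> Image:
    """Box blur: each pixel becomes the mean of its neighbourhood.

    Neighbourhoods are clipped at the image border. A box filter is only a
    stand-in for whatever low-pass filter a real stimulus pipeline would use.
    """
    height, width = len(image), len(image[0])
    blurred = []
    for row in range(height):
        out_row = []
        for col in range(width):
            rows = range(max(0, row - radius), min(height, row + radius + 1))
            cols = range(max(0, col - radius), min(width, col + radius + 1))
            window = [image[r][c] for r in rows for c in cols]
            out_row.append(sum(window) / len(window))
        blurred.append(out_row)
    return blurred


def high_pass(image: Image, radius: int = 1) -> Image:
    """Residual after subtracting the low-pass component: edges and fine detail."""
    low = low_pass(image, radius)
    return [[p - l for p, l in zip(img_row, low_row)]
            for img_row, low_row in zip(image, low)]


# Tiny synthetic "image" with one sharp vertical edge (not a face stimulus).
bsf = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
lsf = low_pass(bsf)   # edge is smeared: coarse layout only
hsf = high_pass(bsf)  # non-zero only near the edge: local detail only
```

By construction, adding the low-pass and high-pass components pixel by pixel recovers the original (broad spatial frequency) image, which mirrors the statement that the BSF image contains all the facial information.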
The experimental results showed that the recognition rate of TD children under the BSF condition was significantly higher than that under the other two spatial frequency conditions (p < 0.05) when the face was upright. It indicated that rich facial feature information could help TD children to process facial expressions. The change of spatial frequency weakened the facial feature information, affected the visual perception and information processing of TD children, thereby causing difficulties in facial expression recognition [27].
When the face was inverted, the recognition rate of TD children under the LSF condition was significantly lower than that under the other two spatial frequency conditions (p < 0.05). Although the LSF condition was beneficial to the configural processing method, TD children still suffered from the inversion effect under this condition. The eye tracking results showed that the fixation counts of TD children on the target image under the LSF condition were significantly fewer than those under the other two spatial frequency conditions (p < 0.05), and the fixation counts of TD children on the eyes and mouth area under the LSF condition were significantly fewer than those under the BSF condition (p < 0.05). The decrease in the acquisition of facial information was considered to be an important reason for the decline in their recognition rate. In addition, Deruelle et al. [30] offered an explanation related to age. The configural processing ability of TD children gradually increased with age. When children were younger, as in the case of this study, their configural processing abilities were weaker. They failed to interpret facial expressions under the LSF condition, resulting in a decrease in facial expression recognition rates.
The experimental results showed that the facial expression recognition rate of children with ASD under the LSF condition was significantly lower than that under the other two spatial frequency conditions (p < 0.01). In contrast, children with ASD had the highest recognition rate in the HSF condition, which was significantly higher than that under the other two spatial frequency conditions (p < 0.05) when the face was upright. It could be seen that changes in spatial frequency had different effects on children with ASD. The HSF condition with more prominent local features was more conducive to the use of the featural processing method for children with ASD to process facial expressions, which was reflected in the high recognition rate [31]. However, the LSF condition with more blurred facial features prevented children with ASD from using their own processing methods, resulting in a significant decrease in facial expression recognition rate [29].
The eye tracking results showed that under the HSF condition, children with ASD significantly increased the fixation counts and fixation duration on the target image, as well as on the mouth area. This demonstrated that, in response to the changes of spatial frequency, children with ASD made certain strategic adjustments, increasing visual attention to the core areas under the HSF condition to enhance the interpretation of facial expressions [47].
These results supported the view that children with ASD spontaneously adopted the featural processing method to process facial expressions, relied more on local features than on configural information, and were more accustomed to processing facial information under the HSF condition, which enhanced local features [48]. They also provided a certain basis for expression intervention for children with ASD using HSF faces.
Comparing the characteristics of the two groups of children, children with ASD had significantly less fixation duration on the target image than TD children (p < 0.01) under the LSF condition, and significantly more fixation counts and fixation duration on the mouth area than TD children (p < 0.05) under the HSF condition. It could be seen that there were more differences in facial expression processing methods between the two groups of children, and their adjustment strategies under different spatial frequencies were also different.

The Influence of Inversion Effect on Facial Expression Processing
When the face was inverted, the spatial configuration of the face was affected, and the configural information from various facial areas needed to be reintegrated [34]. Inversion had a greater impact on the configural processing method, resulting in a decrease in the facial expression recognition rate [33]. However, the local features were not affected by inversion, so it had little effect on individuals who adopted the featural processing method. They could still rely on the local features for facial expression processing. Therefore, the inversion effect could be used to evaluate the relative dependence of individuals on the configural processing method [32].
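To make the measure concrete, the inversion effect can be expressed as the drop in recognition rate from the upright to the inverted orientation. The Python sketch below computes the mean drop and a paired t statistic for one group under one spatial frequency condition. It is a simplified stand-in and not the simple effect analysis of the mixed design used in this study; the function name `inversion_effect` and the recognition rates are invented for the example.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def inversion_effect(upright: Sequence[float],
                     inverted: Sequence[float]) -> Tuple[float, float]:
    """Return (mean upright-minus-inverted difference, paired t statistic).

    Each position holds one participant's recognition rate in the two
    orientations. A positive mean difference indicates an inversion effect.
    """
    if len(upright) != len(inverted) or len(upright) < 2:
        raise ValueError("need paired samples with n >= 2")
    diffs = [u - i for u, i in zip(upright, inverted)]
    mean_diff = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    t_stat = mean_diff / (sd / math.sqrt(len(diffs)))
    return mean_diff, t_stat


# Made-up recognition rates for four participants (not study data).
upright_rates = [0.80, 0.75, 0.90, 0.85]
inverted_rates = [0.60, 0.65, 0.70, 0.70]

mean_diff, t_stat = inversion_effect(upright_rates, inverted_rates)
print(f"mean drop = {mean_diff:.3f}, t = {t_stat:.2f}")
```

A larger drop suggests greater reliance on configural processing, whereas a drop near zero is what would be expected from a predominantly featural strategy.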
The results of this study showed that when the face was inverted, TD children had the inversion effect under all three spatial frequency conditions, which was manifested as a significant decrease in expression recognition rate. However, children with ASD were less affected by the inversion effect. They had the inversion effect only under the LSF condition, and no inversion effect under the BSF and HSF conditions. This result was consistent with Kikuchi et al.'s conclusion that the occurrence of inversion effect in children with ASD was related to the spatial frequency of the face [39]. However, Kikuchi et al.'s experimental research only used the behavioral indicator of facial expression recognition rate. This study not only analyzed the recognition rates of the two groups of children under different conditions, but also explored their eye tracking characteristics in core facial areas such as the eyes and mouth, and the correlation between the attention to the eyes/mouth area and the facial expression recognition rate, so as to reveal their internal processing mechanism of facial expressions.
This result was different from the study of Pallett et al. [38]. They found that children with ASD and TD children adopted the same configural processing method and were also affected by the inversion effect. The differences between the studies might be due to differences in participants' demographic characteristics such as age and verbal IQ level. Pallett et al. studied adolescents aged 13 to 18 with higher average verbal IQ. However, the children with ASD who participated in this experiment were in the younger age group of 5-7 years old, and their average verbal IQ was also relatively low. Further research over a larger age range is warranted on how the processing of facial expressions in children with ASD changes with age and IQ, and whether they would adopt more configural processing and be affected by the inversion effect like TD children.
Under the LSF condition, the local features were blurred, and the featural processing method commonly used by children with ASD was difficult to use. As a result, under the condition of the upright face and LSF, the facial expression recognition rates of children with ASD were reduced. Whether children with ASD would change their processing strategy and switch to the configural processing method needed further analysis. When the face was inverted, the facial configural information was also affected. Children with ASD needed to reintegrate feature information from areas such as the eyes and mouth to process facial expressions. Compared with the condition of the upright face, children with ASD maintained the fixation counts and fixation duration on the eyes area, and significantly increased the fixation duration on the mouth area, but the facial expression recognition rate was significantly reduced. This meant that children with ASD did have an inversion effect under the LSF condition. It could be inferred that children with ASD might have the capability of configural processing, and might adopt the same configural processing as TD children under the LSF condition. Because children with ASD usually relied more on the featural processing method, they tended not to actively use the configural processing method. Only when the local features were ambiguous or absent, that is, under the condition of inverted face and LSF, would children with ASD use the configural processing method. In this situation, the weak central coherence theory was not applicable.
The eye tracking results showed that when the face was inverted, TD children significantly increased the fixation counts and fixation duration on the eyes area (p < 0.05) under all three spatial frequencies. In contrast, children with ASD significantly increased the fixation duration on the mouth area (p < 0.05) under all three spatial frequencies, and significantly increased the fixation counts on the mouth area (p < 0.05) under the HSF condition. Comparing the two groups of children, it was found that children with ASD had significantly more fixation counts on the mouth area than TD children (p < 0.05), and significantly fewer fixation counts and shorter fixation duration on the eyes area than TD children (p < 0.05). It could be seen that when the face was inverted, both children with ASD and TD children were able to adjust their processing strategies accordingly. However, their strategies and preferred processing areas were different. TD children could consciously concentrate their visual attention on their preferred eyes area. On the other hand, although children with ASD were less affected by the inversion effect, they would also consciously adjust their strategies and focus their attention on the mouth area, to perceive and acquire more expression information. Therefore, their facial expression recognition rate did not decrease significantly under the BSF and HSF conditions.

Conclusions
Facial expression processing disorder was one of the core causes of social disorder in children with ASD. In this study, eye tracking technology was used to analyze the facial expression processing methods and eye tracking characteristics of children with ASD and TD children. The influence of spatial frequency and inversion effect on the facial expression processing of children with ASD was explored. The main conclusions of this study were as follows:
1. The facial expression processing ability of children with ASD was significantly weaker than that of TD children, that is, the facial expression recognition rate of children with ASD under various experimental conditions (spatial frequency, orientation) was significantly lower than that of TD children.
2. The facial expression processing disorders of children with ASD were mainly due to their atypical facial expression processing methods and strategies. TD children paid more visual attention to the eyes area. However, children with ASD preferred the features of the mouth area and lacked visual attention and processing of the eyes area, which might lead to their relatively weaker ability to process and recognize facial expressions than TD children.
3. Children with ASD mainly used the featural processing method to process facial expression information. HSF highlighted the local feature information of the face, which was more conducive to the use of the featural processing method for children with ASD, reflected in the increase in visual attention to facial feature areas and the improvement in expression recognition rate.
4. TD children had the inversion effect under all three spatial frequency conditions, which was manifested as a significant decrease in expression recognition rate, indicating that TD children mainly used the configural processing method. However, children with ASD only had the inversion effect under the LSF condition, indicating that children with ASD had the capacity of configural processing under the LSF condition. Therefore, the weak central coherence theory was not applicable under this condition.
5. When the face was inverted or facial feature information was weakened, both children with ASD and TD children would adjust their facial expression processing strategies accordingly, to increase the visual attention and information processing of their respective preferred processing areas. The fixation counts and fixation duration of TD children on the eyes area increased significantly, while the fixation duration of children with ASD on the mouth area increased significantly.
The results of this study provided theoretical and practical support for facial expression intervention in children with ASD. It is possible to consider using HSF images for early intervention training, and then use LSF images for learning transfer to develop their configural processing ability. Meanwhile, children with ASD need to be guided to increase visual attention and information processing on the eyes area.
In addition, this study had some shortcomings that need further improvement. The number of participants in this experiment was similar to that of previous studies, and it would be worthwhile to expand the sample size and the types of participants in future research. Whether and when children with ASD develop characteristics closer to those of TD children as age increases and abilities improve is also worthy of study. Furthermore, future in-depth research will consider the use of electrodermal (skin conductance) sensors or EEG devices, combined with eye tracking data.