Towards-Person Vocalization Effect on Screening for Autism Spectrum Disorders in the Context of Frustration

The purpose of this study is to investigate the vocalization characteristics of infants with autism spectrum disorder (ASD) in the context of frustration. The duration and frequency of vocalizations were measured in 48 infants with ASD and 65 infants with typical development (TD), and the infants were followed up to 24 months for subsequent diagnosis. Vocalization types, namely speech-like vocalizations, nonspeech vocalizations, vocalizations towards the person and non-social vocalizations, were retrospectively analyzed. The results showed that, compared with the TD group, infants with ASD produced fewer typical vocalizations and fewer vocalizations with social intention during the still-face period, and that these characteristics were closely related to the clinical symptoms of ASD; among them, vocalizations towards the person accompanied by social intention had discriminative efficacy.


Introduction
Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by deficits in social interaction and communication and by restricted, repetitive patterns of behavior and interests [1]. The prevalence of ASD has been increasing in recent years and is about 1.5% in the general population [2]. The prevalence is as high as 10-20% in high-risk groups, such as siblings of children with ASD [3,4]. At present, the pathological mechanism of ASD is unknown and there is no specific drug treatment. Early detection and diagnosis can promote early intervention and thus effectively improve the prognosis of children with ASD [5][6][7]. The diagnosis of ASD is stable at 2 years of age [8,9], and a recent study found that diagnostic stability was 79% at 14 months and 83% at 16 months [10]; in practice, however, the average age of diagnosis is after 3 years [11]. Therefore, research on early behavioral indicators of ASD is of great significance for promoting early detection and diagnosis.
The early abnormal behavior of ASD can be observed as early as 12 months of age, but the heterogeneity is significant and no single behavioral indicator has been found to predict a stable diagnostic outcome. The early abnormal behavior of ASD mainly manifests as difficulties in social communication and stereotyped behavior [12,13]. Studies have found that nearly 60% of children aged 4-6 years with ASD have moderate to severe language problems [14]. An abnormal language development trajectory is usually the earliest abnormal behavior reported by parents of children with ASD, with 56% of parents showing concern about their children's insufficient language communication ability as
Table 1. Assessment tools for ASD vocalization experiment.

Vocalization Acquisition and Coding
Vocalizations were captured from SFP behavioral videos recorded in a standardized behavior observation room, with unified instructions and environment settings. Infants were seated in fixed dining chairs and mothers sat opposite and interacted with them for 2 min without toys or physical contact (interaction period); this 2-min period served as the baseline for vocalization collection. The mothers then stopped interacting, looked straight at the infants' heads and maintained a neutral facial expression for 1 min (static period) (Figure 1).
Based on the behavioral coding system for children developed by Werner and Dawson and the coding definitions of vocalizations in previous studies [37], the vocalization behavior indicators in this study included two types: (1) Typical vocalizations: speech-like vocalizations, which contain standardized syllables such as babbling; and nonspeech-like vocalizations, in which no words or syllables can be recognized, mainly including three types: sad vocalizations (e.g., crying), happy vocalizations (e.g., laughter) and atypical vocalizations (e.g., screams). (2) Vocalizations towards object/person: vocalizations towards a person carry social intention, for example when accompanied by gaze at others' faces or by pointing, giving, sharing and similar acts; in contrast, vocalizations towards an object carry no social intention.
Two trained professionals coded the duration and frequency of the above two types of vocalizations using the Observer XT 12.0 (an analysis system for behavior observation recording) in the SF episode. The two types of 140 vocalizations were randomly assigned to two trained coders to control evaluator bias. The two coders did not participate in the SFP experiment. Both of them were blind to the grouping of the coded children.
Vegetative sounds such as yawning, burping and coughing were excluded from the analysis [38]. In this study, 20% of the videos were randomly selected for a coder consistency test. Intraclass correlation coefficients (ICC) showed that the coding consistency of the two graduate students was in the good to excellent range: (1) typical nonspeech-like vocalizations: duration = 0.897, frequency = 0.680; speech-like vocalizations: duration = 0.976, frequency = 0.969; (2) vocalizations towards person: duration = 0.830, frequency = 0.701; vocalizations towards object: duration = 0.816, frequency = 0.914.
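The ICC form used for the consistency test is not stated above. As a minimal sketch, assuming a two-way random-effects, absolute-agreement, single-rater coefficient (often written ICC(2,1)), agreement between two coders could be computed as follows. The function name and the duration values are hypothetical and are not taken from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per coded video, one column per coder.
    This ICC form is an assumption; the text does not say which form was used.
    """
    n = len(ratings)        # number of coded videos
    k = len(ratings[0])     # number of coders
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (videos, coders, residual)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-video mean square
    msc = ss_cols / (k - 1)                # between-coder mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical durations (seconds) of one vocalization type, coded twice
coder_a = [12.0, 3.5, 0.0, 8.2, 5.1, 20.4]
coder_b = [11.4, 4.0, 0.5, 8.0, 6.0, 19.8]
icc = icc_2_1(list(zip(coder_a, coder_b)))
print(f"ICC(2,1) = {icc:.3f}")
```

An absolute-agreement form penalizes a coder who is systematically longer or shorter than the other; a consistency form would not, so the choice affects how the reported coefficients should be read.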

Statistical Treatment
SPSS 23.0 was used for the statistical data analysis. Gender was expressed as the number of cases (%) and compared using the chi-square test. The Shapiro-Wilk test was used as a normality test. Normally distributed measurement data were expressed as the mean ± standard deviation (x̄ ± s) and an independent-samples t-test was used for intergroup comparison. Non-normally distributed measurement data were expressed as the median (interquartile range) [M (P25, P75)] and comparison between groups was performed by the Mann-Whitney U test. Pearson's correlation analysis was used to explore the relationship between vocalization behavior and age, development level and clinical symptoms of ASD (p < 0.05 was considered to be statistically significant). A binary logistic regression model was used for regression analysis and a multilayer perceptron (MLP) was used to build an early screening model for ASD.
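For readers who want to see how a Mann-Whitney comparison yields a Z statistic, the following is a minimal sketch using the tie-corrected normal approximation without continuity correction. It is not the SPSS procedure itself; the function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_z(x, y):
    """Return (U for x, z, two-sided p) from the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this group of tied values
        avg_rank = (i + 1 + j) / 2     # mean of ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for _, g in pooled[i:j] if g == 0)
        tie_term += t ** 3 - t
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, z, p


# Hypothetical frequencies of vocalizations towards the person (not study data)
asd = [0, 0, 1, 1, 2, 0, 3]
td = [2, 4, 1, 5, 3, 6, 2]
u, z, p = mann_whitney_z(asd, td)
print(f"U = {u}, Z = {z:.3f}, p = {p:.4f}")
```

With this sign convention, a negative Z means the first group tends to have lower values than the second.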

Experimental Process
First, both groups completed development-level assessment and SFP behavioral test voice collection and the HR-ASD group additionally completed ASD symptom assessment. Four professionals followed the HR-ASD group up to 24 months to complete the diagnostic assessment of ASD. If assessed as ASD, the diagnosis was made by two senior child psychiatrists according to the DSM-5. From the development, clinical symptom and voice data of the groups, the characteristics of ASD vocalization behavior were analyzed, the relationship between ASD vocalization behavior and clinical symptoms was studied and the indicators of vocalization behavior with distinguishing efficiency were determined (Figure 2).

Comparison of General Data between the ASD and TD Groups
There were statistically significant differences in age and developmental quotient (DQ) (adaptive, fine-motor, language and personal-social) between the ASD and TD groups (p < 0.05) but there were no statistically significant differences in gender or gross motor DQ (p > 0.05), as shown in Table 2.

Comparison of Static Vocalization Behavior in the ASD and TD Groups
In the SF episode, comparison of the SFP vocalization results of the ASD and TD groups (Figure 3a,b) showed that the duration and frequency of speech-like vocalizations and of vocalizations towards the person were higher in the TD group than in the ASD group; the differences were statistically significant (Z = −2.183, −3.179, −2.275, −3.707; p < 0.05). However, the duration and frequency of total vocalizations, nonspeech-like vocalizations and vocalizations towards an object did not differ significantly between the two groups (p > 0.05).

Correlation Analysis of SFP Vocalizations with Age, DQ and Clinical Symptoms in the ASD Group
According to the correlation analysis of SFP vocalizations with age, development level and clinical symptoms (Table 3), the duration and frequency of vocalizations towards the person in the ASD group were positively correlated with adaptive DQ (p < 0.05) and the frequency of vocalizations towards the person was positively correlated with fine motor DQ (p < 0.05). There was a positive correlation between the duration of speech-like vocalizations and age (p < 0.05) but no significant correlation between other indices and age or DQ (p > 0.05).
The duration and frequency of vocalizations towards the person were negatively correlated with the social communication and total scores of the CARS and ADOS (both p < 0.05). The frequency of vocalizations towards the person was also negatively correlated with the limited and rigid behavior score of the ADOS (p < 0.05). The duration of speech-like vocalizations was positively correlated with symbolic behavior in the CSBS-DP-ITC (p < 0.05). The duration and frequency of speech-like vocalizations were also positively correlated with the social communication, language factor and total scores of the CSBS-DP-ITC (p < 0.05). However, the total vocalization frequency in the ASD group was negatively correlated with the CARS (p < 0.05). There was no significant correlation between other indicators and clinical symptoms (p > 0.05).
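The correlations above are Pearson coefficients. As a minimal sketch of how one such coefficient and its test statistic are obtained, the code below computes r and the associated t value with n − 2 degrees of freedom; the p value would then be read from a t distribution. The function names and data are hypothetical, not study values.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r = 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical data: person-directed vocalization frequency vs. a symptom score
frequency = [0, 1, 1, 2, 3, 4, 6]
symptom_score = [40, 38, 36, 35, 33, 31, 30]
r = pearson_r(frequency, symptom_score)
print(f"r = {r:.3f}, t = {t_statistic(r, len(frequency)):.3f}")
```

A negative r here corresponds to the pattern described in the text: more vocalizations towards the person go with lower symptom scores.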

Regression Analysis of ASD Screening in SFP Static Vocalizations
A binary logistic regression equation was constructed that included speech-like vocalizations and the duration and frequency of vocalizations towards the person. Binary logistic regression analysis of ASD diagnosis in the SF episode showed that the frequency of vocalizations towards the person had discriminative efficacy (OR = 1.609, 95% CI = 1.143-2.266; p = 0.006) (Table 4).
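An odds ratio and its Wald confidence interval follow from a logistic regression coefficient β and its standard error as OR = exp(β) and CI = exp(β ± z·SE). The sketch below illustrates this arithmetic. The coefficient and standard error are not reported in the text; the values used here are back-derived from the reported OR and interval purely for illustration, and the function name is hypothetical.

```python
from math import exp
from statistics import NormalDist


def odds_ratio_ci(beta, se, level=0.95):
    """Convert a logistic coefficient and its standard error to (OR, lower, upper)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


# Assumed values, back-derived from OR = 1.609 and 95% CI = 1.143-2.266
beta, se = 0.4756, 0.1746
odds_ratio, lower, upper = odds_ratio_ci(beta, se)
print(f"OR = {odds_ratio:.3f}, 95% CI = {lower:.3f}-{upper:.3f}")
```

Read this way, each one-unit increase in the frequency of vocalizations towards the person multiplies the odds of the outcome coded as the event by about 1.6; which group was coded as the event is not stated in the text.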

Use of an MLP to Build an Early Screening Model for ASD
The duration of vocalizations towards the person, age, and the duration and frequency of speech-like vocalizations in the SF episode were taken as the behavioral features of each sample. The ASD and TD groups were taken as samples, with the ASD group labeled as 0 and the TD group as 2; 70% of the data were randomly selected as the training set and 30% as the test set. The classification accuracy was 93% in the ASD group and 75% in the TD group, with an average accuracy of 84% (Table 5). The receiver operating characteristic (ROC) curve is a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied. The ROC curve of the proposed MLP is shown in Figure 4, which plots the true positive rate (TPR) against the false positive rate (FPR). The area under the curve (AUC) is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one. The proposed MLP shows good recognition performance, with an AUC greater than 0.75.
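The ranking interpretation of the AUC can be made concrete with a short sketch that counts, over all positive-negative pairs, how often the positive instance receives the higher score. The same sketch shows that the reported 84% is the unweighted mean of the two per-group accuracies. The function names and classifier scores are hypothetical, and treating ASD as the positive class is an assumption for illustration.

```python
def auc_by_ranking(pos_scores, neg_scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mean_per_class_accuracy(acc_by_class):
    """Unweighted mean of per-class accuracies."""
    return sum(acc_by_class.values()) / len(acc_by_class)


# Hypothetical classifier scores (higher = more likely positive), not study data
pos = [0.9, 0.7, 0.4]
neg = [0.6, 0.3, 0.2]
print(f"AUC = {auc_by_ranking(pos, neg):.3f}")

# Per-group accuracies as reported in the text
reported = {"ASD": 0.93, "TD": 0.75}
print(f"mean accuracy = {mean_per_class_accuracy(reported):.2f}")
```

The pairwise count is quadratic in sample size, which is acceptable for a test set of this scale; larger datasets would normally use a rank-based computation instead.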

Discussion
In this study, a comparative analysis of the ASD and TD groups showed that early vocalizations were indeed abnormal in ASD and that these abnormalities have significant potential for early screening. The duration and frequency of speech-like vocalizations and of vocalizations towards the person in the SF episode were lower in the ASD group than in the TD group. The TD group showed more paralanguage and more vocalizations towards the person with social intention, such as pointing, sharing and giving, indicating that TD infants use more active social behaviors to attract and interact with their mothers in a frustrating situation. In contrast, the ASD group showed lower social motivation in a frustrating situation, which suggests that frustrating situations are more likely to highlight impaired active social interaction based on joint attention in infants with ASD. This is consistent with the findings of Yirmiya et al., in which infants with ASD already showed less requesting behavior and social participation behavior before 14 months of age [39]. The results of this study support exploring a combination of language maturity and social intention as an early behavioral indicator of ASD [40]. There is a social feedback loop between children's language development and adults' vocalizations: adults tend to respond to children's typical vocalizations, and the adults' responses in turn influence children's vocalizations. Children with ASD produce more nonspeech-like vocalizations and vocalizations towards the object, and these atypical vocalizations are easily ignored by adults, which leads to fewer responses from adults.
Meanwhile, the core social impairment of ASD prevents these children from learning from the already reduced responses of adults. The incomplete social feedback loop between children with ASD and adults further deprives them of early language and social learning opportunities, thereby exacerbating the core impairment of ASD [41]. Therefore, studying the behavioral characteristics of early ASD vocalization is of great significance for understanding the early impairment of ASD core symptoms, providing valuable behavioral indicators for early detection and identifying critical goals for early clinical intervention. In recent years, there has been a trend to combine the maturity of language expression (typical vocalizations) with vocalizations of social intention to find more reliable early vocalization indicators of ASD [42]. In this study, the atypical vocalization behaviors of children with ASD, especially those involving social intention, were more prominent in the context of frustration than in the context of normal interaction.
Although the average duration of vocalizations towards the object was longer in the ASD group than in the TD group, the difference was not statistically significant. This is consistent with previous literature. Ozonoff et al. found that the vocalizations of infants with ASD at 12-36 months of age were accompanied by less gaze, but that there was no abnormality in their vocalizations towards objects at 6-36 months of age [43]. Whether in normal social situations, such as free play, or in frustrating situations, infants with ASD show abnormal vocalizations towards the person but no significant difference in vocalizations towards objects compared with TD infants. These results suggest that vocalizations towards objects may not have early discriminative value.
Whether the typical vocalizations of children with ASD are abnormal is still controversial [40,42]. Chenausky et al. reported that children with ASD produced significantly fewer speech-like vocalizations than TD infants in the early stage [22], whereas other researchers found no difference [44]. The results of this study support the former view. Comparing the vocalizations of infants in the ASD and TD groups in a frustrating situation showed that the duration and frequency of speech-like vocalizations were lower in the ASD group than in the TD group, whereas nonspeech-like vocalizations did not differ significantly between the two groups. These results suggest that children with ASD show abnormalities in typical vocalizations in the context of frustration, mainly manifested as fewer speech-like vocalizations, and that nonspeech-like vocalizations may not play a discriminative role.
Correlation analysis showed that the duration and frequency of speech-like vocalizations were positively correlated with age, social communication, language factor and symbolic behavior, but not with the CARS and ADOS scores, which indicates that speech-like vocalizations reflect the typical developmental process. The duration and frequency of vocalizations towards the person were correlated with the CARS and ADOS scores, which indicates that vocalizations towards the person reflect social indicators. They were also correlated with individual indicators of the developmental scale, but these results were not representative. There was no correlation between other vocalization behaviors and age or developmental level.
The results showed that ASD vocalizations were correlated with clinical symptoms, language and social development ability to varying degrees, and that the duration and frequency of vocalizations towards the person were negatively correlated with the social core symptoms and the limited, stereotyped behavior of ASD. The duration and frequency of speech-like vocalizations were positively correlated with the development of children's social communication and language competence but, apart from the association between duration and age, showed no correlation with other developmental dimensions. The lower the total vocalization frequency of children with ASD, the higher the symptom score, and vice versa. There was no correlation between nonspeech-like vocalizations and clinical symptoms, which is consistent with Chericoni's results [45]. The results of this study support vocalizations towards the person (especially their duration and frequency) as potentially valuable behavioral indicators, and speech-like vocalizations as reflecting whether the language and cognitive development of infants with ASD is delayed. Regression analysis further found that the frequency of vocalizations towards the person had discriminative efficacy and that, in frustrating situations, vocalizations towards the person with social intention could predict the diagnosis of ASD better than typical vocalizations. These results suggest that, in the context of frustration, vocalizations with social intention are more valuable than typical vocalizations for the early identification and screening of core symptoms in infants with ASD. An MLP was used to construct an early screening model for ASD based on the duration of vocalizations towards the person, age, and the duration and frequency of speech-like vocalizations in the context of frustration, and the average accuracy of predicting the outcome of ASD was 84%.

Limitations and Future Directions
There are several limitations to the current study. First, the HR-ASD group was older than the TD group, although age was not correlated with most indicators of vocalization behavior. Second, environmental and intervention factors that may affect infant vocalization were not included in this study, and the 16 HR-ASD children who no longer met the diagnostic criteria for ASD during follow-up may include false-negative cases. Moreover, this is a preliminary study of ASD vocalization. Vocalizations towards the person showed a certain effectiveness in predicting ASD in this study, but they are not the only predictor of diagnosis. Future studies with longer follow-up should include children with suspected but unconfirmed ASD and analyze family environment and intervention factors to better evaluate the utility of these predictors.

Conclusions
This study showed that children with ASD had abnormal vocalizations at an early stage, manifesting in both typical vocalizations and socially intended vocalizations towards people. The former may be related to children's language development process, whereas the latter are closely related to the severity of the clinical symptoms of ASD and have important discriminative value for predicting the outcome of ASD. Parents' concerns and reports about the vocalizations of infants with ASD have a clinical basis that should be taken into account in early screening.
According to this study, vocalizations towards people and typical vocalizations are of great significance for children with ASD. Parents should provide a rich language environment, for example by imitating and responding promptly to their children's vocalizations, guiding joint attention during parent-child interaction and reinforcing vocalizations towards people, in order to promote language development and improve social interaction in children with ASD.