Social Distance in Interactions between Children with Autism and Robots

The use of non-industrial robots, called service robots, is increasing in the welfare field to meet the demand for robot therapy among individuals with autism. The simpler communication structures and repetitive behaviors of robots, compared to humans, make it easier for children with autism to interpret communication and respond appropriately. Interacting with a robot allows social distance to be designed and maintained according to a person's social interaction needs. To simulate natural social interactions, robots need to account for social distance in some way. When interacting with autistic children, understanding their social response levels is crucial for the robot to make decisions about the distance kept during the interaction. In this study, an experiment was conducted to examine the accuracy of a detection program and to explore the correlation between children's social responsiveness and social distance; 15 autistic children each interacted with a robot one-to-one for about 20 min. The results revealed that both programs implemented in the robot-assisted autism therapy were effective in detecting social distance in a natural HRI situation.


Introduction
In recent years, there has been increasing attention on non-industrial robots called "service robots" in the welfare and humanities fields. This technology differs from industrial robots, which perform assembly and other tasks in factories to replace human labor. With recent technological advancements, domestic robots were introduced for home use in Japan and other countries in 2015, with expectations for future expansion [1]. Service robots can be used to simulate social interactions with individuals who may be experiencing social anxiety or difficulties in communicating or reading social cues. These robots can be designed to interact with humans and communicate in a simulated natural manner without generating fear or anxiety in humans [2]. The field of research pursuing this goal is called "human-robot interaction" (HRI). HRI assumes that humans can coexist with robots in daily life and that robots can support humans in daily life activities without harm. This requires robots to have the ability to read human emotions. Robotics research is actively pursuing practical applications of mental-care robots that assist people in medical, long-term care, and welfare settings [3]. Socially assistive robots are one branch of this research field, and some studies have focused on robot-assisted autism therapy [4]. In 2016, 1 in 54 children (aged 8 years) in the United States was diagnosed with autism spectrum disorder (ASD) [5], and the prevalence has been increasing [6]. In Japan, a recent report estimated the prevalence of children (aged 6-9 years) with neurodevelopmental disorders, including autism, at 1.9% according to parents' ratings and 9.3% according to teachers' ratings [7]. Other studies have shown that Japan is also witnessing a surge in the number of children with autism [8]. Therefore, the social demand for effective programs to support the development of children with autism is extremely high.
To address this problem, robot therapy is now used for children with autism [4,9]. It has proven effective because children with autism face difficulty in reading facial expressions, and since robots have simpler structures and expressions than humans, children with autism can interpret them more easily [10,11]. Studies of robot-assisted autism therapy have focused not only on facial expressions but also on other social skills [12], leading to more positive communication.
In this study, we focus on social distance as an evaluation measure for robot therapy. Social distance depends on the nature of the relationship between people and their environment or on interpersonal relationships [13]. As such, this study proposes a program that automatically estimates social distance during robot-assisted autism therapy. Moreover, this study explores the correlation between the social responsiveness of children with autism and the detected social distance. The remainder of this article is organized as follows. Section 2 describes the developed programs to estimate social distance in natural HRI environments. Section 3 describes the experiments in which children with autism interacted with a robot. Section 4 discusses the impact of the developed programs and the social distance between children with autism and the robot, as well as the correlation between social responsiveness and distance. Section 5 describes some unanticipated findings and the limitations of this study. Section 6 provides the conclusions.

HRI
HRI is the study of understanding, designing, and evaluating robot systems that humans actively use or that coexist with humans. In recent years, robots have become more complex and sophisticated and are being integrated into workplaces, homes, hospitals, remote areas, dangerous environments, and battlefields [14]. The more they become part of our daily lives, the more we need to investigate human-robot interactions. However, HRI presents many challenges because humans are directly involved: it must consider not only interactions with robots that simulate human communication, such as voice dialogue, gestures, gaze, and positioning, but also the complex processes by which these simulations stimulate and affect human beings [14]. Interaction in HRI is used in a fairly broad sense. What constitutes appropriate and effective HRI has not yet been clarified or defined; however, knowledge of human-to-human communication is growing, which informs HRI. HRI studies are often designed or analyzed based on findings from behavioral and clinical psychology [15]. Robot technology for clinical use has mainly targeted care for older adults in nursing homes, and experiments on the social effects of robot therapy have confirmed its effectiveness in improving the moods of older adults and reducing depression [16]. In a previous study on robot therapy, Wada et al. examined the interactions of animal-type robots with older cancer patients and patients with depression, showing improvements in the patients' psychological conditions [16]. This method evaluated the moods of the older adults using face scales that depicted seven facial expressions as responses to the animal robot. Their recorded mood profile states were based on a questionnaire completed before and after the experiment. Tetsui et al. performed robotic therapy as an alternative to animal-assisted therapy.
Their evaluation was based on the decrease in diastolic blood pressure and on gaze observations to verify the validity of the therapeutic effects [17]. In a group-type robot therapy study conducted by Hamada, the evaluation was divided into positive and negative reactions and was performed visually at regular intervals [18]. Evaluations in these robot therapies were not quantitative and were performed visually after the experiment.

HRI in Autism Therapy
ASD is a developmental disorder characterized by speech delay, difficulty with interpersonal interactions, and strongly fixated interests in specific subjects [19]. According to a 2012 report, among seven major countries, Japan ranks second after the United States in the prevalence of autistic individuals [20]. ASD includes lifelong disorders that affect communication skills and the ability to understand and respond to social cues. ASD may also be associated with intellectual disabilities or other conditions such as hyperesthesia. Against this background, ASD was redefined as a continuum of various autism disorders, thus eliminating sharp boundaries with typical development [21]. ASD is also considered a congenital brain disorder. Other features include non-responsiveness to calls, hypersensitivity and dullness, poor concentration, and poor attention and memory. Symptoms such as difficulty in controlling emotions, immediate panic, and dislike of changes in the environment are also observed [22].
Robot therapy is a form of psychotherapy aimed at improving symptoms of autism and dementia through mediation by robots. Several experiments applying this method confirmed the positive effects of using robots in therapy with children, and these experiments have also produced knowledge that better supports people with ASD [23]. For individuals with autism, reading complex human facial expressions is difficult. Since robots have simpler structures than humans, children with autism find it easier to interpret their social and communication cues, and thus actively communicate with the robot [10]. Therefore, by communicating with a robot, children with autism can improve their communication skills [12,24]. For example, Lee et al. examined the effects of verbal communication between autistic children and humans, and then robots [10]. Their results revealed that autistic children followed the words of the robots more readily than those of humans. Rudovic et al. similarly examined the effectiveness of robot therapy across three modalities (audio, visual, and autistic physiology) [25]. Miyamoto et al. gave blocks to a child with autism while the robot gave commands asking the child to perform tasks, such as stacking and breaking [26]. The analysis distinguished a cooperative state, in which the participant performed the intended actions, from a conflict state, in which the participant took non-intended actions. It was suggested that the participant could understand the intention and independence of the robot's behaviors and communicate naturally through gaze.
The following is a brief description of the most well-known robots used in robot therapy for autistic children. KASPAR is a child-sized robot developed by the Adaptive Systems Research Group at the University of Hertfordshire [27]. It is capable of making human facial expressions and gestures; games are played with autistic children, and KASPAR is used for emotional training. Keepon was developed by the CareBots Project and has a yellow dango-shaped body (height 120 mm, diameter 80 mm) [28]. The upper dango comprises the head, with wide-angle video cameras marking the left and right eyes and a microphone attached as the nose; the lower dango corresponds to the belly. Keepon has four degrees of freedom and expresses its state of mind through gestures. QTrobot, made by LuxAI, is a robot driven by artificial intelligence with a liquid-crystal-display face and robot arms [29]. It enhances the emotion recognition abilities of autistic children by imitating children's emotions with its expressive face and moving arms. The robot's behavior is partly fixed, and it never denies the child. Using the attached tablet terminal, QTrobot interacts with the child, arousing the child's interest and reducing confusion.

HRI and Social Distance
The concept of interpersonal distance deals with how the distance between people during an interaction indicates facets of the relationship between them; this is called "social distance". In proxemics theory, the space between people is shaped by awareness of the other person's presence and has psychological and communicative properties [30]. Social distance consists of four types: intimate distance, personal distance, social distance, and public distance [31,32]. In a close relationship, such as with a family member or an intimate partner, the distance between people is 0 m to 0.45 m, within touching distance (intimate distance). Personal distance is usually the distance one maintains with friends, while social distance, around 1.2 m to 3.5 m, is kept when conversations are conducted outside touching distance. Public distance is the distance maintained in business negotiations or between an audience and a speaker in a public forum, usually 3.5 m or more (see Figure 1, adapted from Hall [31] and Argyle [32]; 'A' denotes an agent, either human or robot). We implemented this theory in our robot to maintain personal distance during the interactions; however, this study does not use the function of keeping a specific distance between the human and the robot.
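As a minimal illustration, the zone boundaries above can be encoded as a simple classifier. This helper is ours, not part of the study's software; only the numeric boundaries come from the text.

```python
def proxemic_zone(distance_m: float) -> str:
    """Classify an interpersonal distance (meters) into the four proxemic zones.

    Illustrative sketch: boundaries follow the ranges described in the text;
    the function name and labels are assumptions, not the study's code.
    """
    if distance_m < 0.45:
        return "intimate"   # 0 m to 0.45 m: family, intimate partners
    elif distance_m < 1.2:
        return "personal"   # distance maintained with friends
    elif distance_m < 3.5:
        return "social"     # conversations outside touching distance
    else:
        return "public"     # speeches, business negotiations

print(proxemic_zone(0.3))   # intimate
print(proxemic_zone(2.0))   # social
```

A robot could use such a classifier to label each estimated distance before deciding how to react.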
Although social distance is defined above, the size of one's preferred personal space varies; introverts, for example, tend to prefer a larger personal space. Invading one's desired personal space can cause discomfort and tension in social interactions. Personal space, therefore, extends in front of a person during social interactions. Nakashima and Sato conducted an experiment in which a robot advanced toward a participant in a standing or sitting position from a distance of 10 m; when the participant did not want the robot to approach any further, they raised their hand and the robot stopped. It was found that the distance between the robot and the participant is easier to estimate when the latter is sitting; moreover, the distance increases in proportion to the speed of the moving robot [33]. Yoda and Siota measured the distance between two humans by attaching a marker to the top of the participant's head and taking pictures with a camera mounted on the ceiling [34]. Suga et al. measured the distance in a room using omnidirectional cameras installed in the corners and on the ceiling to locate people from the captured images [35]. These two methods involved attaching devices to the participant or using special cameras. Koay et al. measured the comfort of participants during interactions with a social robot; the interaction distance was divided into three zones (less than 1 m, 1 m to 3 m, and more than 3 m), and participants felt discomfort when the distance between them and the robot fell below 3 m [36]. As for social distance in HRI, Kim and Mutlu considered not only proxemics but also power relationships (such as a supervisor robot with a subordinate robot) and the nature of the task (such as a cooperating robot versus a competing robot). Their study showed that participants preferred more distance from the robot when the human and the robot were in a cooperative relationship [13]. Kang et al. explored users' acceptance of robots that touched them at an intimate distance; the results showed that people felt discomfort when physically contacted by the robots [37]. In these studies, however, measuring only the appropriate social distance between humans and robots does not in itself establish natural communication in HRI.

Methodology
This study aims to explore the tendencies of social distance when autistic children interact with robots and to verify the accuracy of a skin-color-detection program by automatically detecting the social distance in the interaction. There are numerous pixel-based skin-color-detection techniques, and one was modified and used for this study [38]. In the skin-color-detection program, when the background is close to skin color, it is falsely detected as skin, and the error becomes large. We hypothesized that this error can be reduced by specifying a region for skin-color detection and extracting only that region for processing. Here, we consider the accuracy of distance measurement with Program B, which specifies and extracts the detection region, and measure the difference between it and Program A. From this, we examine whether the distance between the robot and the participant differs depending on the extent of the child's autistic traits.
The distances between the robot and the children were estimated from video footage. The software (called ESD version 2.0) was programmed in Python and used OpenCV: the cvtColor function creates a mask for extracting the skin color of children, images read as BGR are converted to HSV, the inRange function extracts the target HSV values so that the mask tracks the target object, and the output is saved in csv form. An earlier method (ESD version 1.0) calculated the area of the target color over the whole image but produced errors because of similar shades in the environment (e.g., walls, tables); ESD version 2.0 instead created a frame to track the face and calculated the target-colored area inside that frame. In more detail, the distance-estimation program ran on Python and OpenCV, using the cvtColor function for masking to extract only the specified color from the camera image. BGR is a color index based on blue, green, and red components; the image captured in this format was converted into an HSV model image in which hue (color), saturation (vividness), and brightness are separated. In the inRange function, H (hue) from 0 to 20, S (saturation) from 10 to 140, and V (brightness) from 60 to 210 acted as thresholds close to skin color. The area of the extracted skin-color part was stored as a score in a variable prepared for each frame. After the program ended, the scores were output in csv file format, usable by spreadsheet software, and then analyzed. This baseline is Program A. We then modified the program and named it Program B: the basic skin-color-detection component was the same as in Program A, but the range for skin-color detection was specified with the upper left of the image as a reference, and only the specified range was extracted and processed. In addition, to reduce measurement errors caused by the laboratory background, a low-pass filter was applied to smooth the image before skin-color detection. Figure 2 shows the flowchart of the distance measurement in Programs A and B.
Distance estimation was performed using video or images from the camera. Each video frame was read as an image and grayscaled, and facial recognition was performed with a cascade classifier. The same image was also converted into an HSV model, and only colors close to skin color were masked. Next, the number of pixels recognized as skin color by the mask, within the region estimated to be the face, was recorded frame by frame to estimate the distance. For distance estimation from pixels, the skin-color pixel counts of the participant's face-recognized region were recorded before the experiment at about 0.5 m, 1 m, and 1.5 m. A quadratic function was then fitted to the three recorded points, and the distance was estimated from the resulting expression. Next, the distance was measured using the video footage; since the video contains 30 frames per second, the average pixel count over 30 frames was taken as the pixel count for that second. In addition, to measure the accuracy of the programs, we assessed the difference between the distance visually annotated from the video and that measured by the two programs.
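A minimal sketch of the calibration and per-second averaging described above. The calibration pixel counts here are invented for illustration; only the three calibration distances, the quadratic fit, and the 30-frame averaging come from the text.

```python
import numpy as np

# Hypothetical calibration: skin-pixel counts recorded with the child at
# 0.5 m, 1.0 m, and 1.5 m before the experiment (values are made up).
calib_pixels = np.array([14000.0, 3600.0, 1600.0])
calib_dist = np.array([0.5, 1.0, 1.5])

# Fit the quadratic described in the text: distance as a function of pixels.
# With three points and degree 2, the fit passes through the points exactly.
coeffs = np.polyfit(calib_pixels, calib_dist, 2)

def estimate_distance(pixel_count: float) -> float:
    """Estimate the robot-child distance (m) from a skin-pixel count."""
    return float(np.polyval(coeffs, pixel_count))

def per_second_distances(frame_counts, fps: int = 30):
    """Average pixel counts over each second (30 frames) before estimating,
    as the paper does, to smooth frame-to-frame noise."""
    frame_counts = np.asarray(frame_counts, dtype=float)
    n_sec = len(frame_counts) // fps
    per_sec = frame_counts[: n_sec * fps].reshape(n_sec, fps).mean(axis=1)
    return [estimate_distance(c) for c in per_sec]

print(round(estimate_distance(3600.0), 3))  # 1.0 (a calibration point)
```

Accuracy would then be assessed by subtracting the human-annotated distance for each second from the program's estimate.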

Experiments
As the robot communicating with the children, we used NAO (see Figure 3), a small humanoid robot that walks independently on two legs, developed by Aldebaran Robotics (currently SoftBank Robotics). This robot is used in therapy for older adults and children with autism, and as a programming tool in many educational institutions and research facilities. The height of the robot is 574 mm, which is sufficient for playing touch games with children. It speaks numerous languages, including Japanese, and has seven touch sensors on its head, hands, and feet, which were used in our experiment [39]. The robot was controlled by the Wizard of Oz (WoZ) technique in this study. Cameras are embedded in the robot's face, but we used the data collected by a high-resolution webcam (full HD 1080p) that recorded from behind the robot, because the robot needed to keep moving for the games and the built-in resolution was too low to run the detection programs discussed in Section 2. In total, 15 children with ASD (age: 4-14 (M = 9.3, SD = 3.9), including 2 girls and 13 boys) participated in the trials. Two interactive movement sessions, the Touch Body Game [40] and Dancing with Me [41], designed in our previous work, were conducted in consultation with the occupational therapists; that work showed that robot-play therapy helped children with ASD improve their prosocial behavior. As shown in Figure 4, in all trials, either the caregiver or the therapist played the games together with the child. Before starting the experiment, the researcher explained the protocol to the caregiver or therapist and the child. After the explanation, the participants entered the play room and followed NAO's instructions. While they interacted with NAO in the play room, the WoZ operator controlled NAO and collected the webcam data from a hidden room. The Touch Body Game was used to investigate whether a child with autism had a solid understanding of body parts.
The children were asked by the robot to touch specific body parts. Five questions were asked at 30 s intervals. For example, NAO says, "Please touch the top of your head slowly", and waits 30 s. When the child touches the specified part, NAO praises the child and moves to the next question. If the response is wrong, NAO says, "Please try again". If there is no response from the child, NAO automatically moves on to the next question after 30 s. The sensors and bumpers of the body parts used were the head, left hand, right hand, left foot, and right foot, in that order. During the dance session, autistic children danced with NAO to theme songs from the popular animation shows "Yokaitaiso Daiichi" and "Head, Shoulders, Knees, and Toes", and NAO jumped along with the participants. Each song was played for 90 s, as it would be on TV. When a song begins, NAO says, "Follow me and dance with me". The dance imitates the actual dances of "Yokaitaiso Daiichi" and "Head, Shoulders, Knees, and Toes", with the head and arms swinging to the rhythm. All these interactions were recorded with the webcam (see Figure 5). The social responsiveness scale (SRS) was used to measure the strength of autistic traits in the children [42]. The total SRS scores were converted to T scores, which represent the level of autistic traits: symptoms were considered severe when the T score was 76 or higher, moderate between 66 and 75, and mild between 60 and 65, while T scores of 59 and lower indicated typical levels [42,43]. The SRS has five treatment subscales: social awareness, social communication, social cognition, social motivation, and restricted interests and repetitive behavior. In this study, the SRS was administered for the 15 participants, and the average total T score was 68.47 ± 11.06; the maximum T score was 95 (participant 14) and the minimum was 50 (participant 2). Table 1 shows the results of the SRS test.
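The SRS severity bands above can be expressed as a small helper. This function is illustrative only and not part of the study's analysis code; the band cut-offs are those stated in the text.

```python
def srs_severity(t_score: int) -> str:
    """Map an SRS total T score to the severity bands used in the study:
    >= 76 severe, 66-75 moderate, 60-65 mild, <= 59 typical."""
    if t_score >= 76:
        return "severe"
    elif t_score >= 66:
        return "moderate"
    elif t_score >= 60:
        return "mild"
    else:
        return "typical"

print(srs_severity(95))  # severe  (participant 14's score)
print(srs_severity(50))  # typical (participant 2's score)
```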

Results
The distances estimated by Programs A and B are displayed in Figure 6, together with the distance measured by a human coder. The horizontal axis of the graph is time (seconds), and the vertical axis is the distance (meters) between the participant and the robot. In addition, Table 2 compares the maximum distances estimated by the different measurement methods (human coder, Program A, and Program B) and shows the average error of the distances estimated by the two programs.

[Table 2. Columns: P (participant), SRS Total, Pro A(H) [m], Pro B(H) [m], Pro A(A) [m], Pro B(A) [m]; the data rows were not recoverable from the extracted text.]

Correlation between Social Distance and SRS
The distance measurement was performed with both Programs A and B, and the average distance error was estimated as shown in Figure 7. The horizontal axis of the graph is the participants, and the vertical axis is the average distance error (meters). Across the 15 participants, the average measurement distance error was −0.0164 m with Program A and −0.0035 m with Program B. To investigate whether the two programs differed, a t-test was performed on the participants' average distance errors. Since the p value was 0.3509 (> 0.05), there was no significant difference between the two programs.
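The comparison described above can be sketched with NumPy alone. The error values below are invented for illustration (the study's per-participant errors are in Table 2), and since the text does not state whether the t-test was paired or unpaired, this sketch assumes a paired design over the same participants.

```python
import numpy as np

def paired_t_statistic(errors_a, errors_b):
    """Paired t statistic over per-participant mean distance errors from
    Programs A and B. The two-sided p value would then be read from a t
    distribution with n - 1 degrees of freedom (e.g., via scipy.stats.t).
    """
    d = np.asarray(errors_a, dtype=float) - np.asarray(errors_b, dtype=float)
    n = len(d)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))  # mean difference / SE
    return t, n - 1

# Illustrative average errors (m) for five hypothetical participants.
a = [-0.02, -0.01, -0.03, 0.00, -0.02]
b = [-0.01, 0.00, -0.01, -0.01, 0.00]
t_stat, dof = paired_t_statistic(a, b)
print(dof)                  # 4
print(round(t_stat, 3))     # -1.826
```

With the study's 15 participants, dof would be 14 and the reported p value (0.3509) exceeds 0.05, hence no significant difference.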
In Figure 7, the results of the SRS subscales (social awareness, social communication, social cognition, social motivation, and restricted interests and repetitive behavior) are indicated for each participant. The results show a tendency for participants with lower SRS scores (i.e., less severe autistic traits) to prefer a smaller distance to the robot.

Discussion
The average distance between the robot and the child with the highest SRS T score was 0.70 ± 0.42 m, and that between the robot and the child with the lowest T score was 0.83 ± 0.30 m. The distances of these two children are plotted with time (s) on the horizontal axis and the distance (m) between child and robot on the vertical axis. In this experiment, the child with the lowest T score (very mild autistic traits) kept the robot within personal distance, while the child with the highest T score (severe autistic traits) kept the robot within intimate distance during the interaction. However, the average distance of the three participants with severe autistic traits by SRS total score was 0.62 ± 0.62 m, and that of the three participants with very mild autistic traits was 0.71 ± 0.38 m; both averages fall within personal distance. It may therefore be concluded that the severity of autistic traits may not affect social distance. Moreover, participants 9, 10, 12, and 14 stayed at an intimate distance longer than the other participants. The SRS total score of participant 12 was very low, so this participant could be considered neurotypical based on the total score; however, the social awareness scores were higher in participants 10, 12, and 14, and they demonstrated the same tendencies of proximity toward the robot during the interactions (see Figure 8). Using a t-test to examine whether the distance measured by the human coder differed from that measured by the programs, the p value was larger than 0.05 (no significant difference) for all 15 children.
Therefore, NAO can automatically detect social distance in robot-assisted autism therapy using the simple skin-color-detection program, avoiding extra sensors. The disadvantage of the skin-color-detection program is that if the background is similar to skin color, or if two or more people appear on the same screen, the region detected as skin, and hence the error, becomes large. This work also had limitations; for example, even when the distance between the child and the robot did not change, such as when the child crouched down, the skin-colored region shrank and the estimated distance grew. Restricting the detection region largely solved this problem in this study, but a more robust detection program would be better for implementation in natural HRI. We did not run the program on the robot directly and instead used the webcam behind the robot, but we believe this setup would allow the robot to infer the child's emotions from the perspective of social distance and react according to this distance. In this study, NAO provided limited feedback through verbal responses triggered by the touch sensors; during the dance interaction, NAO did not provide feedback to the children. By implementing the detection program on NAO, social distance could become part of NAO's feedback during interaction.

Conclusions
From this work, it is clear that the severity of autism symptoms did not affect the social distance between the children and the robot during their interactions with the robot. Since each participant placed the robot at a distance of their own choosing, the robot needed to interact according to the social distance maintained by the participant. As such, the robot needs to automatically detect the social distance to the other party from camera images in real time in order to interact appropriately. We believe that this will make it possible for the robot to facilitate more natural communication during real-time interactions based on the social distance implied by the participants.