IoT and AI-Based Application for Automatic Interpretation of the Affective State of Children Diagnosed with Autism

Given that humanoid robots have been shown to be effective in helping children diagnosed with autism explore their affective state, this paper demonstrates the effectiveness of PandaSays, a previously developed machine learning-based mobile application that was improved and integrated with an Alpha 1 Pro robot. It also discusses performance evaluations using deep convolutional neural networks and residual neural networks. The model trained with the MobileNet convolutional neural network had an accuracy of 56.25%, performing better than ResNet50 and VGG16. A strategy for commanding the Alpha 1 Pro robot without its native application was also established, and a robot module was developed that includes the communication protocols with the PandaSays application. The output of the machine learning algorithm in PandaSays is sent to the humanoid robot, which executes actions such as singing, dancing, and so on. Alpha 1 Pro has its own programming language, Blockly. To give the robot specific commands, Bluetooth programming is used with the help of a Raspberry Pi, so the robot's motions can be controlled through the corresponding protocols. The tests have demonstrated the robustness of the whole solution.


Introduction
Autism spectrum disorder is a condition of brain development that can appear before the age of 3 years [1]. Children diagnosed with autism have difficulties expressing their feelings and communicating nonverbally. Children can express emotions through drawings, and this is where the PandaSays mobile application can help. It aims to improve communication between parents or tutors and their children and to help adults better understand the children's emotional state, by offering indicators based on machine learning algorithms. Currently, the model trained with MobileNet has an accuracy of 56.25%, and the database of drawings is constantly growing.
The main feature of the application is the use of machine learning algorithms to predict the affective state of the child from his or her drawings. Analysis of the drawings could give parents, tutors, or therapists a deeper understanding of the child's psychological or mental state. In addition, drawings can aid the comprehension of the child's personality. This is the gap that the PandaSays application aims to fill.
Other features of the application are label recognition for identifying objects, activities, products, and so on; an augmented reality module; a drawing module; a sign language module; and Google Text-To-Speech. The sign language and Google Text-To-Speech modules might be helpful in cases of aphasia or other types of speech impairment.
The latest studies have demonstrated that humanoid robots are effective in helping children diagnosed with autism. Recent studies have also shown that the feedback children obtain from touching the device is more effective than the feedback resulting from the caregiver's touch.
All of the papers mentioned above reflect the importance of humanoid robots in helping children with autism. Our contribution is the interpretation of the affective state from drawings, connected to the humanoid robot, whose capabilities are used to help children express themselves and communicate better with their parents and with other children. In addition, autism spectrum disorder can be diagnosed from an early age (2 years). As soon as the child begins drawing, the PandaSays mobile application can help the parent or tutor monitor the child's affective state as early as possible.
In our previous work [16], we discussed the Android application PandaSays, which predicts the affective state of children with autism from their drawings, as well as its integration with humanoid robots in order to activate their movements. More specifically, [16] mentioned the possibility of controlling the Alpha 1 Pro robot [17], from the Ubtech company (Shenzhen, China), using the Bluetooth communication protocol, a subject that is explored in the current paper. The robot performs an action based on the output received from the machine learning algorithm. For example, if the affective state resulting from a drawing is "sad", the robot starts dancing in order to distract the child from their current state and cheer them up. In this context, the current paper reaches the following objectives: testing the PandaSays mobile application; discussing performance evaluation using deep convolutional neural networks and residual neural networks; establishing a strategy for commanding the Alpha 1 Pro robot without its native application; and developing a robot module that includes communication protocols with the PandaSays application. The output of the machine learning algorithm in PandaSays is sent to the humanoid robot, which executes actions such as singing, dancing, and so on. Alpha 1 Pro has its own programming language, Blockly, and in order to give the robot specific commands, Bluetooth programming is used in conjunction with a Raspberry Pi, so that the robot's motions can be controlled through the corresponding protocols. After setting up the communication with the robot, the code for controlling it is written in Python 3. The other robots that will be used with the PandaSays application are NAO (SoftBank) and Marty (robotical.io); they are used for testing the mobile application and for further performance evaluations.
The article has five sections. Section 1 is this Introduction. Section 2 describes the PandaSays mobile application, including performance tests using deep convolutional neural networks and residual neural networks. Section 3 presents the Alpha 1 series Bluetooth communication protocol, emphasizing the whole strategy for establishing Bluetooth serial communication between the Raspberry Pi and the Alpha 1P robot using an SPP (Serial Port Profile) application. Section 4 compares the Raspberry Pi 4, the native Alpha 1P Android app, and the BlueSPP Android app, highlighting the evaluation results based on their connectivity time. The last section is dedicated to conclusions and future work.

Description of "PandaSays" Mobile Application and Performance Tests Presentation Using Deep Convolutional Neural Networks and Residual Neural Networks
PandaSays is a mobile application dedicated to children diagnosed with autism and to their parents and tutors, helping the adults better understand their children's feelings and expressions [18]. In the application, the child can draw. This is a simple action, but studies have shown that drawings can reveal emotions and feelings. Our app goes further: we have tried to automate the understanding and interpretation of the emotion associated with the child's drawing by using a neural network. We train the network on a corpus of annotated drawings, and we present the results of the obtained model in predicting the affective state of the child.
The dataset is composed of 1279 drawings, divided into 5 affective states representing 5 classes: "happy", "sad", "fear", "insecure", and "angry" (Figure 1). The whole training dataset has been labeled by a professional psychologist specialized in interpreting children's drawings. In our previous work [18], we carried out a critical analysis of Convolutional Neural Networks and Feedforward Neural Networks in order to find the best model for predicting the affective state of children from their drawings, and the conclusion was that MobileNet performed better than an artificial neural network. In the current work, after increasing the number of drawings in the dataset, we trained the model again using Convolutional Neural Networks and ResNet Neural Networks; the new results are presented in this article. The dataset was split as follows: 80% for training and 20% for testing. For image data processing, the Keras ImageDataGenerator [19] has been used. Figure 2 shows the original drawing and Figure 3 shows the drawings processed with ImageDataGenerator from the Keras library. The parameters and their values applied to the original image were the following:
• rotation_range: the image is rotated randomly by up to 40 degrees in either direction;
• width_shift_range: the image is shifted horizontally (left or right) by up to 0.1 of the total image width, i.e., 10%;
• height_shift_range: the drawing is shifted vertically by up to 0.1 of the total image height, i.e., 10%;
• shear_range: the image is distorted along an axis with a shear intensity of 0.2, in order to simulate, for the computer, the way humans see things from different angles;
• zoom_range: the drawings are randomly zoomed in;
• horizontal_flip: set to True, which means that the images are randomly flipped horizontally;
• fill_mode: determines how the new pixels created by a rotation or a vertical/horizontal shift are filled in;
• brightness_range: the value 0.7 means that the image can be darkened and the value 1.2 means that it can be brightened.
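To make the scale of these settings concrete, the sketch below collects the stated values in a dictionary whose keys match the keyword arguments of Keras ImageDataGenerator and computes the largest shift in pixels. It assumes a 224 × 224 input (the size used in the ResNet50 experiment; the augmentation input size is not stated), and it omits zoom_range and fill_mode because their values are not given in the text.

```python
# Augmentation values stated in the text. The keys match the keyword arguments of
# Keras ImageDataGenerator; zoom_range and fill_mode are omitted because their
# values are not given.
AUGMENTATION = {
    "rotation_range": 40,            # random rotation in [-40, 40] degrees
    "width_shift_range": 0.1,        # fraction of the image width
    "height_shift_range": 0.1,       # fraction of the image height
    "shear_range": 0.2,              # shear intensity
    "horizontal_flip": True,         # random left-right flips
    "brightness_range": (0.7, 1.2),  # <1 darkens, >1 brightens
}

# Assumption: 224 x 224 drawings, as in the ResNet50 input shape.
IMAGE_WIDTH, IMAGE_HEIGHT = 224, 224


def max_shift_pixels(width, height, cfg=AUGMENTATION):
    """Largest horizontal and vertical shift, in pixels, implied by the config."""
    return cfg["width_shift_range"] * width, cfg["height_shift_range"] * height


dx, dy = max_shift_pixels(IMAGE_WIDTH, IMAGE_HEIGHT)
print(f"shift up to {dx:.1f} px horizontally and {dy:.1f} px vertically")
```

With TensorFlow installed, ImageDataGenerator(**AUGMENTATION) would build a generator with these settings. A shift range of 0.1 on a 224-pixel image moves the drawing by up to about 22 pixels, i.e., 10% of its size.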
First, VGG16 (Visual Geometry Group) was used for training the model. The fully connected output layers of the model were not included, and weights pre-trained on ImageNet were used. The ImageNet dataset was chosen because it contains approximately 14 million images, and every node of its hierarchy is illustrated by hundreds to thousands of images [20]. This dataset is relevant because it contains classes of animals, trees, and sports that children can draw. In addition, the database of drawings in the PandaSays application has a small number of images, and transfer learning helped to reach valuable results. Next, three dense layers with a ReLU (rectified linear unit) activation function were added, in order to allow the model to learn more complex functions and to obtain better results. The model was compiled with the Adam (adaptive moment estimation) optimizer, an optimization algorithm that can handle sparse gradients on noisy problems, with a learning rate of 0.001, and with "categorical_crossentropy" as the loss. The VGG16 model summary is given in Table 1. Next, the first 20 layers of the network were set to be non-trainable, and the result is presented in Table 2: there are 14,714,688 non-trainable parameters and 2,102,277 trainable parameters. The results of the VGG16 model were loss = 1.8508 and accuracy = 50.9375% (Figure 4). The batch size was 16 and the number of epochs was 40.
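The parameter counts in Table 2 can be checked by hand. A 3 × 3 convolution with c_in input and c_out output channels has 9·c_in·c_out weights plus c_out biases, and a dense layer has n_in·n_out weights plus n_out biases; pooling layers have no parameters. Summed over the 13 convolutional layers of VGG16, this gives exactly the 14,714,688 frozen parameters. The widths of the trainable head are not stated in the text: the head below (512 pooled features, dense layers of 1024, 1024, and 512 units, and a 5-unit output) is a hypothetical configuration, chosen only because it happens to reproduce the reported 2,102,277 trainable parameters.

```python
# Convolutional layers of VGG16 as (input channels, output channels); all are 3x3.
VGG16_CONV_LAYERS = [
    (3, 64), (64, 64),                       # block 1
    (64, 128), (128, 128),                   # block 2
    (128, 256), (256, 256), (256, 256),      # block 3
    (256, 512), (512, 512), (512, 512),      # block 4
    (512, 512), (512, 512), (512, 512),      # block 5
]

# Hypothetical trainable head (widths are an assumption, not stated in the text):
# 512 pooled features -> 1024 -> 1024 -> 512 (ReLU) -> 5 classes (softmax).
HYPOTHETICAL_HEAD = [(512, 1024), (1024, 1024), (1024, 512), (512, 5)]


def conv_params(c_in: int, c_out: int, kernel: int = 3) -> int:
    """Weights (kernel * kernel * c_in * c_out) plus one bias per output channel."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per unit."""
    return n_in * n_out + n_out


frozen = sum(conv_params(c_in, c_out) for c_in, c_out in VGG16_CONV_LAYERS)
trainable = sum(dense_params(n_in, n_out) for n_in, n_out in HYPOTHETICAL_HEAD)
print(frozen, trainable, frozen + trainable)  # 14714688 2102277 16816965
```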

VGG16 Model Summary: First 20 Layers Non-Trainable
Total params: 16,816,965
Trainable params: 2,102,277
Non-trainable params: 14,714,688

The test loss (red line in Figure 5) increases while the training loss decreases, which indicates that the model is overfitting. The loss represents the difference between the expected output and the output produced by the machine learning model, and it is a double-precision floating-point value. In the classification report in Figure 6, the labels 0-4 represent the 5 classes (this mapping applies to all classification reports, including those for MobileNet and ResNet): 0, insecure; 1, happy; 2, fear; 3, angry; 4, sad. The highest precision was obtained by the class "happy", followed by "insecure", and the lowest precision by the class "angry". The "happy" class has 337 drawings, while "angry" has only 248. One of the reasons for the lower precision of "angry" is the color of the images: some have multiple colors, and some are black/gray. We are using the RGB (red-green-blue) color space, and in future training we will take into consideration other color spaces, such as:

The best F1-scores are obtained by the classes "fear" and "sad", and the lowest F1-score, 0.34, by the class "insecure", which indicates some false positives and a high number of false negatives for that class. The F1-score is computed as F1 = 2 × (precision × recall)/(precision + recall) [22]. It is the harmonic mean of precision and recall, and because it does not depend on true negatives, it is useful when there is a large number of actual negatives.
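A short sketch of the F1-score, written from the formula above rather than taken from the paper's code (which relies on a classification report). The count-based version shows that true negatives never enter the computation; the example values are illustrative only.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Per-class F1 from true positives, false positives and false negatives.

    True negatives are not an input: F1 = 2*TP / (2*TP + FP + FN).
    """
    if tp == 0:
        return 0.0
    return f1(tp / (tp + fp), tp / (tp + fn))


# Illustrative values only: a lopsided precision/recall pair.
print(f1(0.9, 0.1))          # harmonic mean, about 0.18
print((0.9 + 0.1) / 2)       # arithmetic mean, 0.5
print(f1_from_counts(8, 2, 8))
```

The harmonic mean stays low unless both precision and recall are high, which is why a class such as "insecure" can have a low F1-score even if one of the two metrics is acceptable.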
For MobileNet, the ImageNet weights were used; the batch size was 16 and the number of epochs was 40. The model contained the fully connected classifier, and its summary is given in Table 3. Because we used a pre-trained model, we trained only the last layer and froze the other layers; the trainable parameters belong to the last layer, named "autismOutput" (Figure 7). There is a significant difference in the number of trainable parameters between the VGG16 model and the MobileNet model: MobileNet has 5005 trainable parameters, whereas VGG16 has 2,102,277. The metric values were loss = 1.5260 and accuracy = 56.2500%, as seen in Figures 8 and 9. In Figure 9, the validation loss is only slightly greater than the training loss, which indicates that the model is performing well and is not overfitting.
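The figure of 5005 trainable parameters can be reproduced with the dense-layer formula (weights plus biases). In our reading, it is consistent with "autismOutput" being a 5-unit layer placed on top of MobileNet's 1,000-way ImageNet classifier, which matches the statement that the model kept the fully connected classifier. A 5-unit layer attached directly to the 1,024 pooled MobileNet features (v1, width multiplier 1.0) would instead have 5125 parameters. The constant names are ours.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus biases (n_out) of a fully connected layer."""
    return n_in * n_out + n_out


N_CLASSES = 5                 # happy, sad, fear, insecure, angry
IMAGENET_OUTPUTS = 1000       # size of MobileNet's ImageNet classifier output
MOBILENET_FEATURES = 1024     # pooled feature width of MobileNet (v1, alpha = 1.0)

on_classifier = dense_params(IMAGENET_OUTPUTS, N_CLASSES)
on_features = dense_params(MOBILENET_FEATURES, N_CLASSES)
print(on_classifier, on_features)  # 5005 5125
```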
It is also important to mention that the dataset is constantly changing as new drawings are added. Figure 10 indicates that the highest precision was obtained by the class "angry" (66%), followed by "insecure" (60%), and the lowest precision by the class "sad" (48%). The highest F1-score is obtained by the class "insecure" and the lowest by the class "sad". Figure 11 shows the main output of the PandaSays app with the MobileNet model: the drawing is classified as representing a "happy" state, with a predicted probability of 94%.
Another neural network used for training the model was ResNet50 (Residual Neural Network) from Keras. The ImageNet weights were used; the input tensor has shape (224, 224, 3), representing the width (224 pixels), the height (224 pixels), and the number of channels (3); and the fully connected classifier at the top of the network was excluded (include_top = False). The ResNet base model is created as follows:

    input_tensor = tf.keras.Input(shape=(224, 224, 3))  # width, height, RGB channels
    base_model = tf.keras.applications.ResNet50(weights='imagenet', include_top=False, input_tensor=input_tensor)

The first 143 layers of the network were set to non-trainable, as seen in the code below:

    for layer in base_model.layers[:143]:
        layer.trainable = False  # freeze the early, generic feature extractors

The pre-trained model (base_model) is then connected to the new layers of a new model: a global pooling layer, a flatten layer, and a dense layer with a softmax classifier were added.
The model summary is presented in Table 4. The ResNet50 model's loss was 1.5041 and its accuracy was 47.6562%. The model was compiled with the Keras RMSprop (Root Mean Square Propagation) optimizer, with a learning rate of 2 × 10^-4. The accuracy is displayed in Figure 12 and the model loss in Figure 13. The validation loss is significantly greater than the training loss, which indicates that the model is overfitting.
The model summary is presented in Table 4. The ResNet50 model's loss was 1.5041 and the accuracy was 47.6562%. As an optimizer for compiling the model, RMSprop (Root Mean Square Propagation) was used from Keras, with a learning rate of: 2 × 10 −4 . The accuracy is displayed in Figure 12 and the model loss is presented in Figure 13. The validation loss is significantly greater than the training loss, which means that the model is overfitting.   Figure 14 illustrates that the highest precision was represented by the class "happy"-62%, followed by "sad" with 59%, and the lowest precision was represented by the class "angry", with 33%. The highest F1-score is represented by the class "insecure" class and the lowest by the classes "sad" and "angry"-0.36. As noticed in Table 5, the highest accuracy is obtained by MobileNet neural network with 56.25%, followed by VGG16 with 50.93%. Although the smallest accuracy is obtained by ResNet50 neural network, its loss is the smallest, followed by MobileNet's loss of 1.5260%. For the PandaSays application, this was chosen as the model trained with Mo-bileNet neural network because of its accuracy results.   Figure 14 illustrates that the highest precision was represented by the class "happy"-62%, followed by "sad" with 59%, and the lowest precision was represented by the class "angry", with 33%. The highest F1-score is represented by the class "insecure" class and the lowest by the classes "sad" and "angry"-0.36. As noticed in Table 5, the highest accuracy is obtained by MobileNet neural network with 56.25%, followed by VGG16 with 50.93%. Although the smallest accuracy is obtained by ResNet50 neural network, its loss is the smallest, followed by MobileNet's loss of 1.5260%. For the PandaSays application, this was chosen as the model trained with Mo-bileNet neural network because of its accuracy results.  
As shown in Table 5, the highest accuracy is obtained by the MobileNet neural network, with 56.25%, followed by VGG16, with 50.93%. Although ResNet50 obtains the lowest accuracy, its loss is the smallest (1.5041), followed by MobileNet's loss of 1.5260. For the PandaSays application, the model trained with the MobileNet neural network was chosen because of its accuracy results. Table 6 shows that VGG16 is the most complex model, because it has the largest number of parameters (138 million) [23], followed by ResNet50, with over 23 million parameters [24], and MobileNet, the least complex model, with 13 million parameters [25]. In terms of latency, ResNet50 has the highest latency and VGG16 the lowest. Regarding the time of convergence, ResNet50 converges fastest, while MobileNet has the longest convergence time (422.074 seconds).
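The choice in Table 5 depends on the selection criterion. The sketch below uses only the accuracy and loss values reported in this section and shows that ranking by accuracy selects MobileNet, while ranking by loss would select ResNet50; PandaSays follows the accuracy criterion.

```python
# Test-set metrics reported in this section (accuracy in %, loss unitless).
REPORTED = {
    "VGG16": {"accuracy": 50.9375, "loss": 1.8508},
    "MobileNet": {"accuracy": 56.25, "loss": 1.5260},
    "ResNet50": {"accuracy": 47.6562, "loss": 1.5041},
}

best_by_accuracy = max(REPORTED, key=lambda name: REPORTED[name]["accuracy"])
best_by_loss = min(REPORTED, key=lambda name: REPORTED[name]["loss"])
ranking = sorted(REPORTED, key=lambda name: REPORTED[name]["accuracy"], reverse=True)

print(best_by_accuracy, best_by_loss)  # MobileNet ResNet50
print(ranking)                         # ['MobileNet', 'VGG16', 'ResNet50']
```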

Alpha 1 Pro Server-Client Connection and Bluetooth Communication Protocol
One of the objectives of the current research is to show how the Alpha 1 Pro robot can be commanded without its native application, with a view to its integration with the PandaSays application. Alpha 1 Pro has 16 servo joints, equivalent to human joints. The robot can dance, move, and fight. Based on the interpretation of the drawings, the robot executes these actions, helping with children's therapy and mood interpretation. This section presents the communication protocol based on Bluetooth. The Alpha 1 Pro robot can be programmed over Bluetooth using its Bluetooth communication protocol [26].
The command format is:

Header + Length + Command + [first parameter][second parameter][third parameter] + Check + End Character
BT handshake: the frame fields are as follows.
• Header (2 bytes): set to 0xFB 0xBF.
• Length (1 byte): the total number of bytes of header + length + command + parameters + CHECK (the end character is not counted).
• Command (1 byte): the specific command.
• Parameter (n bytes): at least one parameter; if no parameter is needed, the default value 0x00 is used.
• CHECK (1 byte): the sum of length + command + parameters (for example, 0x06 + 0x0C + 0x00 = 0x12).
• End character (1 byte): set to 0xED.
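The frame layout can be expressed as a small Python function. This is a sketch based only on the field description above. It assumes that CHECK keeps only the low byte when the sum exceeds 0xFF, a case the text does not cover. It reproduces the two command frames given later in this section: powering off all servos (command 0x0C) and reading the robot status (command 0x0A).

```python
HEADER = bytes([0xFB, 0xBF])
END_CHARACTER = 0xED


def build_frame(command: int, params: bytes = b"\x00") -> bytes:
    """Build Header + Length + Command + Parameters + CHECK + End character."""
    params = bytes(params) or b"\x00"   # at least one parameter; 0x00 by default
    # Length counts header, length, command, parameters and CHECK (not the end byte).
    length = len(HEADER) + 1 + 1 + len(params) + 1
    # CHECK = length + command + parameters; keeping the low byte is an assumption.
    check = (length + command + sum(params)) & 0xFF
    return HEADER + bytes([length, command]) + params + bytes([check, END_CHARACTER])


power_off_servos = build_frame(0x0C)
robot_status = build_frame(0x0A)
print(power_off_servos.hex(" ").upper())  # FB BF 06 0C 00 12 ED
print(robot_status.hex(" ").upper())      # FB BF 06 0A 00 10 ED
```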
The following actions can be implemented by the robot by sending the specified command code:

The Alpha 1P robot has dual-mode Bluetooth 3.0/4.0: BLE (Bluetooth Low Energy) + EDR (Enhanced Data Rate). Bluetooth Classic (or BR/EDR) is the Bluetooth radio mostly used in streaming applications, in devices such as wireless headsets, wireless speakers, wireless printers, and keyboards, and for file transfers between devices. Bluetooth LE (Low Energy) is used in low-bandwidth applications that only occasionally need to transmit data between devices. Bluetooth LE is known for its very low power consumption and is present in devices and applications such as smartphones, computers, and tablets [27], mobile payment applications, healthcare systems, and ticketing or access control applications.
Every Bluetooth chip is assigned a globally unique 48-bit address, known as the Bluetooth address, device address, or MAC (Media Access Control) address; it is identical in nature to an Ethernet MAC address [27]. Some Bluetooth protocols are used in the same contexts as internet protocols. To connect the Raspberry Pi to the Alpha 1P robot, we use the RFCOMM (Radio Frequency Communication) protocol, a transport protocol that provides emulated RS-232 (Recommended Standard 232) serial ports. RFCOMM provides largely the same service as TCP (Transmission Control Protocol); the main difference is that TCP supports up to 65,535 open ports on a single machine, whereas RFCOMM allows only 30 [28].
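To make these addressing constraints concrete, the sketch below validates a 48-bit device address and an RFCOMM channel number before a connection is attempted. The helpers `bdaddr_to_int` and `check_rfcomm_channel` are hypothetical (they are not PyBluez functions), and the 1 to 30 channel range reflects the 30-channel limit of RFCOMM mentioned above.

```python
import re

# Hypothetical helpers, not part of PyBluez or the robot's protocol.
# Six colon-separated hex pairs = 6 bytes = 48 bits.
_BDADDR = re.compile(r"[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}")


def bdaddr_to_int(addr: str) -> int:
    """Convert 'XX:XX:XX:XX:XX:XX' to its 48-bit integer value."""
    if not _BDADDR.fullmatch(addr):
        raise ValueError(f"not a 48-bit Bluetooth address: {addr!r}")
    return int(addr.replace(":", ""), 16)


def check_rfcomm_channel(channel: int) -> int:
    """Accept only RFCOMM channels 1-30 (TCP, by contrast, has 65,535 ports)."""
    if not 1 <= channel <= 30:
        raise ValueError(f"RFCOMM channel must be between 1 and 30, got {channel}")
    return channel
```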

Establishing Bluetooth Communication with Python
PyBluez is a Python extension module that provides access to Bluetooth system resources on GNU/Linux computers. It offers a high-level socket interface for establishing a connection between two Bluetooth devices, one acting as the client and the other as the server. In our case, the Raspberry Pi is the server and the Alpha 1P is the client. The Python code for establishing the client-server connection between the robot and the Raspberry Pi is available at the links given in [29,30].
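As an illustration of one side of such a connection, the sketch below opens an RFCOMM socket, sends one command frame, and returns the robot's raw reply. It uses Python's built-in `socket` module (`AF_BLUETOOTH`, `BTPROTO_RFCOMM`; Linux with a Bluetooth-enabled Python build only) rather than PyBluez, whose `bluetooth.BluetoothSocket(bluetooth.RFCOMM)` offers the same connect/send/recv pattern. It is not the authors' code from [29,30]: in this sketch the Raspberry Pi initiates the connection, and `ROBOT_ADDR` and `RFCOMM_CHANNEL` are placeholders.

```python
import socket
from typing import Callable

# Hypothetical placeholders: use the robot's real address and RFCOMM channel.
ROBOT_ADDR = "00:00:00:00:00:00"
RFCOMM_CHANNEL = 1


def rfcomm_socket() -> socket.socket:
    """Create an unconnected RFCOMM stream socket (Linux, Bluetooth-enabled Python)."""
    return socket.socket(socket.AF_BLUETOOTH, socket.SOCK_STREAM,
                         socket.BTPROTO_RFCOMM)


def send_command(frame: bytes,
                 addr: str = ROBOT_ADDR,
                 channel: int = RFCOMM_CHANNEL,
                 make_socket: Callable[[], socket.socket] = rfcomm_socket,
                 reply_size: int = 1024) -> bytes:
    """Connect, send one command frame and return the raw reply."""
    sock = make_socket()
    try:
        sock.connect((addr, channel))  # RFCOMM addresses are (bdaddr, channel)
        sock.sendall(frame)            # write the whole frame
        return sock.recv(reply_size)   # the reply is binary, not text
    finally:
        sock.close()                   # always release the Bluetooth link


# Example (requires real hardware): send the status query used in this section.
# reply = send_command(bytes.fromhex("FBBF060A0010ED"), addr="XX:XX:XX:XX:XX:XX")
```

Passing the socket factory as a parameter keeps the function testable without a robot, since any object with `connect`, `sendall`, `recv`, and `close` can stand in for the Bluetooth socket.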

Establishing Bluetooth Serial Communication between Raspberry Pi and Alpha 1P Robot Using an SPP Application
Bluetooth programming in Python follows the socket programming model, in which a socket is an endpoint of a communication channel. A newly created socket is not connected; to establish a connection, the client application calls connect and the server application calls accept [31]. To send messages to the Alpha 1P over Bluetooth, a Samsung Note 9 mobile phone running the BlueSPP Android application was used. The communication between the server and the client is illustrated in Figure 15. According to the Bluetooth communication protocol document, the robot can perform numerous actions. To power off all the servos, the following command is sent to the robot: FB BF 06 0C 00 12 ED (Figure 16). To query the status of the robot, the following code is sent: FB BF 06 0A 00 10 ED. Sending these messages is illustrated in Figure 16. The question marks in this figure appear because the Alpha 1 Pro replies with raw binary bytes, which the BlueSPP app interprets as ASCII characters; bytes without a printable ASCII representation are displayed as question marks.
As can be seen in Figure 17, the robot replies with the code associated with each status, which confirms that it responded to the message sent using the Bluetooth communication protocol.
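Because the robot's replies are binary, printing them as hex pairs avoids the question marks produced by an ASCII view such as the one in Figure 16. The sketch below assumes, without confirmation from the protocol text, that replies use the same frame layout as commands, and validates the Header, Length, Check, and End fields; `parse_frame` and `as_hex` are illustrative names, not part of any library.

```python
from typing import Tuple

HEADER = b"\xfb\xbf"
END = 0xED


def as_hex(data: bytes) -> str:
    """Render raw bytes as upper-case hex pairs, e.g. 'FB BF 06 ...'."""
    return data.hex(" ").upper()


def parse_frame(frame: bytes) -> Tuple[int, bytes]:
    """Validate a frame and return (command, parameters).

    Assumes replies follow the command layout:
    Header(2) + Length(1) + Command(1) + Parameters(n>=1) + Check(1) + End(1).
    """
    if len(frame) < 7 or frame[:2] != HEADER or frame[-1] != END:
        raise ValueError(f"malformed frame: {as_hex(frame)}")
    length, command = frame[2], frame[3]
    if length != len(frame) - 1:  # Length counts every byte except End
        raise ValueError("Length field does not match frame size")
    params, check = frame[4:-2], frame[-2]
    if check != (length + command + sum(params)) & 0xFF:
        raise ValueError("Check byte mismatch")
    return command, params


if __name__ == "__main__":
    status_cmd = bytes.fromhex("FBBF060A0010ED")
    print(as_hex(status_cmd))       # FB BF 06 0A 00 10 ED
    print(parse_frame(status_cmd))  # (10, b'\x00')
```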

Comparison of the Communication Times from Candidate Devices
The BlueSPP Android app is a Bluetooth Serial Port Profile (SPP) communication tool. It was run on a Samsung S10+ Android phone, and the times were measured with a digital watch. With BlueSPP, connecting to the robot took 3.8 s, and the Bluetooth latency, i.e., the time until the Alpha 1P robot executed a given action, was 5 s (Figure 18). To connect the Raspberry Pi 4 to the Alpha 1P robot, a Bluetooth adapter operating in the 2.4 to 2.4835 GHz band was used, and the PyBluez library was used to establish the client-server connection. Figure 19 shows the resulting connection time of 3.23 s.
The Alpha 1P robot also has a native Android application, called "Alpha 1" [32], available at https://play.google.com/store/apps/details?id=com.ubt.alpha1s (accessed on 10 February 2022).
With this app, running on a Samsung S10+, the total time for pairing and connecting to the robot was 28.18 s, and the Bluetooth latency, i.e., the time until the Alpha 1P robot executed a given action, was 31.22 s. Table 7 compares the connection times of the robot. The Raspberry Pi, connected through its Bluetooth adapter, achieved the best time (3.23 s), lower than that of the other devices. Table 8 shows that the Raspberry Pi also gives the shortest time for the Alpha 1P robot to execute an action, followed by the BlueSPP Android app.
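The times above were taken with a digital watch. As a possible refinement, not part of the reported experiments, connection and command times could be measured in code with a monotonic clock and summarized by the median of several runs, which is less sensitive to an occasional slow pairing. The helper names `timed` and `median_time` are our own.

```python
import statistics
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(action: Callable[[], T]) -> Tuple[T, float]:
    """Run action() once and return (result, elapsed time in seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = action()
    return result, time.perf_counter() - start


def median_time(action: Callable[[], object], runs: int = 5) -> float:
    """Median elapsed time over `runs` repetitions of action()."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    return statistics.median(timed(action)[1] for _ in range(runs))
```

Here `action` would wrap whatever opens the RFCOMM connection or sends a command; each device would then be timed with the same code path instead of a hand-held watch.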

Conclusions
Humanoid robots play an important part in helping children diagnosed with autism. As several studies have demonstrated, children are keener to interact with a robot than with their parents, doctors, or tutors. Robots can teach children how to express feelings, how to interact with other children, and how to communicate better. Interaction with the robot is an essential part of the PandaSays application, completing the chain that starts with the machine learning algorithm.
A Raspberry Pi with PyBluez can be used to establish a fast Bluetooth connection and to program a robot to execute actions. The longer times observed on the Android devices are attributed to Bluetooth latency, which is analyzed in detail in [33].
The contributions of this paper are:
• It was shown that the best model was the one trained with MobileNet, which achieved the highest accuracy, 56.25%. With 13 million parameters, MobileNet is the least complex of the networks compared (MobileNet, VGG16, and ResNet50). Because of its low complexity and small size, MobileNet is better suited to mobile applications; for these reasons, the MobileNet model was chosen for the PandaSays mobile application.
• Establishment of a control methodology for connecting the Alpha 1 Pro robot with the PandaSays application, using the Bluetooth communication protocol.
• Development of a robot module that includes the communication protocols with the PandaSays app, which will be used to control the robot and send it the machine learning output so that it performs a specific action.
• Implementation of a Python module for setting up the client-server communication.
• The configuration setup of the Raspberry Pi and the robot's Bluetooth communication protocol, used to measure latency and connection time.
• Demonstration of the efficiency of using a Raspberry Pi with PyBluez to create a client-server connection, with the lowest latency (3.66 s) and connection time (3.23 s), faster than the other devices (Android device, BlueSPP app).
• Emphasis on the importance of humanoid robots in helping children diagnosed with autism.
Given the small data set, we used a pretrained model and applied transfer learning. In future work, we plan to enlarge the data set and to train models with the following algorithms: k-nearest neighbors, decision trees, and support vector machines.
A further step will be to use the PandaSays application together with one of the humanoid robots (Alpha, NAO, or Marty) in autism centers in Romania, in order to gather feedback and improve the machine learning algorithm.