Article
Peer-Review Record

Critical Examination of the Parametric Approaches to Analysis of the Non-Verbal Human Behavior: A Case Study in Facial Pre-Touch Interaction

Appl. Sci. 2020, 10(11), 3817; https://doi.org/10.3390/app10113817
by Soheil Keshmiri 1,*, Masahiro Shiomi 1, Kodai Shatani 2, Takashi Minato 1 and Hiroshi Ishiguro 1,2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 21 April 2020 / Revised: 27 May 2020 / Accepted: 28 May 2020 / Published: 30 May 2020

Round 1

Reviewer 1 Report

The paper tackles the assumption that all humans have common personal space boundaries (body areas or distances that they perceive as off-limits). It is meant to shed light on an approach to modeling pre-touch space and to highlight the assumptions researchers might fall for when applying these types of interactions to their research.

Overall, the authors offer a better way to estimate an individual's facial pre-touch space than the current literature offers, by using a machine learning model based on the support vector regression (SVR) algorithm.

However, some concerns and questions may be considered for enhancing the research outcomes.
Concerns -
1) Sample size is too small - forty younger adults were used (M = 21.83, SD = 1.53) : only 10 pairs per group
2) Lack of detail on the elevation angles
3) No information on the physical characteristics of the “toucher” - wouldn’t height and appearance matter?
* Notes - Overfitting noted in line 277, could be due to their small sample size

Questions -
1) What about cultural differences?
2) What about in relation to acquaintance / social relationship?
3) What was their allocated interaction time?


There are some typos such as line-156 (RMSE - missing "root"), line-195 (GR - typo in figure ==> GP), etc.

Author Response

First and foremost, the authors would like to take this opportunity to thank the reviewer for the time spent and the kind consideration to review our manuscript. The comments by the reviewer helped us improve the quality of our results and their presentation substantially.

 

In what follows, we provide our responses to the reviewer’s comments and concerns.

Sincerely,

 

 

Reviewer 1

 


Concerns -
Reviewer’s Comment: 1) Sample size is too small - forty younger adults were used (M = 21.83, SD = 1.53) : only 10 pairs per group

 

Authors’ Response: We agree with the reviewer’s comment on the small number of individuals who participated in our study. On the other hand, although the number of participants was limited, the data collected from these individuals were moderately large. As we noted in Section 2.4. Analysis, line 126, our study included 5664 facial area pre-touch samples.

However, since these data were all collected from the same individuals, the measurements could possibly have been affected by such factors as habituation (e.g., the touchee felt more relaxed after a few trials and therefore let the toucher’s hand get closer) or sensitization (i.e., the opposite of habituation: the repeated attempts by the toucher may have heightened the touchee’s sensitivity, leading the touchee to ask the toucher to stop at an ever greater distance from the touchee’s face in consecutive trials).

 

Therefore, we addressed this concern in the newly added section (5. Limitations and Future Direction). We discussed the necessity for including more participants in lines 339-357 of this Section. It reads as follows. An important consideration with regards to the present study is the limited number of participants and the toucher-touchee pairs that they formed. On the one hand, although the number of participants was limited, we collected a moderately large sample (i.e., 5664 facial area pre-touch data points) from them. On the other hand, since these data were all collected from the same individuals, our data acquisition procedure could have been affected by two important factors: habituation and sensitization effects. For instance, in the context of the present study, habituation could have taken place if the touchee felt more relaxed after a few trials (due to repeated facial pre-touch attempts by the same toucher) and subsequently allowed the toucher's hand to approach closer to the face area. Conversely, the sensitization effect could have occurred if the repeated attempts by the toucher resulted in higher sensitivity of the touchee, thereby asking the toucher to stop ever sooner and therefore at farther distances from the touchee's face. The occurrence of such effects could also become more likely, considering the limited number of toucher-touchee pairs in the present study whose members were fixed. Specifically, although the participants in our study were all strangers to one another, it is foreseeable that the toucher-touchee pairs could feel more comfortable (or conversely more sensitive to the presence of a stranger) after the first few trial attempts. Therefore, it is crucial for future research to further investigate the potential effect of such factors by including a larger number of participants. 
In this context, it is also interesting to study how swapping the individuals between the pairs may affect the responses of the touchees on the one hand and the confidence of the touchers for approaching new individuals' faces on the other hand.


Reviewer’s Comment: 2) Lack of detail on the elevation angles

 

Authors’ Response: To provide further information about the azimuth and elevation angles, we first included their respective mean, median, standard deviation, and 95% confidence interval values in the current version of the manuscript (Section 2.4. Analysis, lines 136-139). It reads as follows.

 

“The resulting modified grid included 5664 facial area pre-touch samples whose azimuth and elevation angles were [-80.00, . . . , +80.00] (mean (M) = 0.54, median (Mdn) = 0.10, standard deviation (STD) = 38.99, 95.0% confidence interval (CI95.0%) = [-0.11 1.20]) and [-32.00, . . . , +53.00] (M = 9.83, Mdn = 9.17, STD = 24.17, CI95.0% = [9.33 10.31]).”

 

We then elaborated on their statistics (Section 2.4. Analysis, lines 140-144 in the current version of the manuscript) as follows.

 

Considering that the mean and median of the azimuth angles associated with these data points are closely centered about zero, the participants’ data around the azimuth axis were fairly uniformly distributed between the two sides of the face. In the case of elevation angles, we observed that the touchers mostly approached the touchee’s face above the lips area.

 

We also included a new subplot to Figure 2 (i.e., Figure 2 (D)) that depicts the distribution of touchers’ pre-touch data points around the touchees’ face area. These data points correspond to the locations where the touchers were asked by the touchee to stop approaching their face. Coordinates of these data points correspond to toucher's hand distance (x-axis, in cm), azimuth angle (y-axis, in degree), and elevation angle (z-axis, in degree) with respect to the touchees' face. We added the following to the caption of Figure 2.

 

(D) Distribution of touchers’ pre-touch data points around the touchees’ face area. Points are plotted in different shades of grey for better visualization purpose. These data points correspond to the locations where the touchers were asked by the touchee to stop approaching their face. Coordinates of these data points correspond to toucher’s hand distance (x-axis, in cm), azimuth angle (y-axis, in degree), and elevation angle (z-axis, in degree) with respect to the touchees’ face.

 

We further referred to this new subplot within the manuscript as follows (Section 2.4. Analysis, lines 144-148).

 

Figure 2 (D) visualizes the distribution of touchers’ pre-touch distances around the touchees’ face area (i.e., the location at which the touchers were asked by the touchees to stop approaching). The coordinates of these data points correspond to the toucher’s hand distance from the touchee’s face (x-axis, in cm), azimuth angle (y-axis, in degree), and elevation angle (z-axis, in degree) with respect to the touchees’ face.

 

We further discussed the observed effect and the necessity for future research to analyze its potential impact on our results in Section 5. Limitations and Future Direction (lines 358-369 in the current version of the manuscript). It reads as follows.

 

“With regards to the distribution of touchers' data around the face area of the touchees, we observed that the mean and median of their azimuth angles were close to zero. This indicated that the touchers' data were fairly evenly distributed along the azimuth axis. On the other hand, we observed that their elevation angles were more elongated towards the above-lips area. Perhaps such a stretch of data towards the area above the lips is to an extent expected when we consider the fact that the touchees were seated and the touchers were standing in front of the touchees' seat. However, this observation also highlights a shortcoming in our setting in which the fixed height of the seat did not allow for the lower portion of the faces to be more conveniently exposed to touchers' reach. This indicated that our consideration for the position of the touchees' seat according to the touchers' arm length needed to be complemented with subsequent adjustment of the seat's height as well. Therefore, future research is necessary to analyze the potential effect of this shortcoming on the data collection and the performance of the different regression models that were examined in the present study.”

 


Reviewer’s Comment: 3) No information on the physical characteristics of the “toucher” - wouldn’t height and appearance matter?

 

Authors’ Response: In the present study, we unfortunately did not collect information about the participants’ height. We also did not acquire participants’ subjective feelings about touchers’ appearances. We asked our touchees to look straight ahead throughout the experiment to avoid any eye-contact between the toucher and touchee. We also asked them not to intentionally look at and/or follow the approaching hand to prevent its potential effect on touchee’s decision. These instructions also allowed for controlling the potential impact of face direction effects on the toucher’s dexterity. However, we also agree with the reviewer’s comment that such information as physical characteristics could have affected our toucher/touchee interaction.

 

In the current version of the manuscript, we discussed this issue in Section 5. Limitations and Future Direction, lines 370-378. Our discussion reads as follows.

 

“Another issue that requires further consideration is the physical characteristics of the touchers. In the present study, we did not collect information about the participants' height. We also did not assess the effect of the touchees' subjective impression of the touchers' appearance on their decision to allow or stop the reaching hand. In this respect, it is worthy of note that we did ask the touchees to constantly look straight ahead (i.e., to avoid eye-contact and/or looking at/following the approaching hand) to alleviate such potential effects as touchers' appearance. However, these measures would probably be insufficient to completely eliminate these effects. Therefore, future research should consider collecting data about participants' subjective feelings (e.g., in the form of questionnaires), thereby allowing for more robust analysis of such non-verbal human behaviour.”


Reviewer’s Comment: * Notes - Overfitting noted in line 277, could be due to their small sample size

 

Authors’ Response: As the reviewer rightly notes, the sample size indeed affects the performance of such analyses and certainly needs to be taken into consideration. However, with regards to our results, there are two points that alleviate its effect on our observations. First, in the present study, we repeated our cross-validation strategy, which randomly split the data (i.e., 5664 distance data) into 80.0% train (i.e., 4531 pre-touch data) and 20.0% test (i.e., 1133) sets, for 1000 simulation runs. This allowed for more reliable estimation of the RMSE and R2 of the models that we utilized in our study. This information is included in Section 2.4. Analysis, lines 173-180 as follows.

 

“For each of these two cases, we adopted a cross-validation strategy in which we randomly split the pre-touch surface that included 5664 distance data (Figure 2 (B)) into 80.0% train (4531 pre-touch data) and 20.0% test (i.e., 1133) sets. We repeated this procedure for 1000 simulation runs during which we recorded the root-mean-squared-error (RMSE) and coefficient of determination (R2) of the models as measures of their goodness of estimates. The RMSE allowed us to determine how well each model was estimating the pre-touch distance values on the test set. On the other hand, the R2 enabled us to evaluate whether the use of averaged pre-touch distance in the train set might yield better estimates of the test set’s pre-touch distances than the models’ estimates based on their training.”
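The repeated random-split procedure described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the synthetic data, the helper names (`make_synthetic`, `fit_train_mean`, `fit_fixed_boundary`, `repeated_holdout`), and the choice of the training-set mean as the R2 baseline are hypothetical stand-ins, and the GP/SVR models from the manuscript are not reproduced here.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-squared error between observed and estimated distances."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred, baseline):
    """R2 relative to a constant baseline (assumed here: the training-set mean).

    A negative value means the model did worse than the constant baseline.
    """
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_base = sum((t - baseline) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_base


def make_synthetic(n=5664, seed=1):
    """Hypothetical (azimuth, elevation, distance) samples; NOT the study data."""
    rng = random.Random(seed)
    return [(rng.uniform(-80, 80), rng.uniform(-32, 53), rng.gauss(20.0, 5.0))
            for _ in range(n)]


def fit_train_mean(train):
    """Placeholder model: always predicts the mean training distance."""
    mean = sum(d for _, _, d in train) / len(train)
    return lambda az, el: mean


def fit_fixed_boundary(train):
    """Placeholder model: a fixed 20.0 cm spherical boundary."""
    return lambda az, el: 20.0


def repeated_holdout(samples, fit, n_runs=1000, train_frac=0.8, seed=0):
    """Repeat a random train/test split and record (RMSE, R2) for each run."""
    rng = random.Random(seed)
    n_train = int(train_frac * len(samples))  # 4531 of 5664 for an 80/20 split
    scores = []
    for _ in range(n_runs):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        predict = fit(train)
        y_test = [d for _, _, d in test]
        y_pred = [predict(az, el) for az, el, _ in test]
        train_mean = sum(d for _, _, d in train) / len(train)
        scores.append((rmse(y_test, y_pred), r_squared(y_test, y_pred, train_mean)))
    return scores
```

Calling `repeated_holdout(make_synthetic(), fit_fixed_boundary, n_runs=1000)` would yield 1000 (RMSE, R2) pairs for the fixed-boundary placeholder. Any regression model can be substituted through the `fit` argument, provided it returns a function of the two angles.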

 

Second, we carried out our analyses on these models’ metrics through 1000 rounds of random sampling (without replacement) of 100 of these RMSE and R2 values (per round), applying our statistical tests of significance at each round. We reported the average of these 1000 simulation rounds. This further allowed us to reduce the possibility of biased analysis due to the sample size. This information is provided in Section 2.4. Analysis, lines 184-197. It reads as follows.

 

“For the analysis, we performed 1000 rounds of random sampling in which we selected (without replacement) 100 of these RMSE and R2 values for our statistical tests of significance. At each round, we first applied pairwise Wilcoxon rank sum test between each pair of polynomial degrees (i.e., 0 versus 4) for each model. Next, we applied Wilcoxon rank sum test between models and the fixed pre-touch distance (i.e., 20.0 cm). In the case of models, we considered their pairs of equal degrees (e.g., GP vs. SVR using degree 0). In the case of fixed pre-touch distance, R2 indicated whether the use of averaged pre-touch distance might yield better estimates than such a fixed spherical boundary. We also applied Wilcoxon signed rank (i.e., one-sample test) on R2 values of these models (for both polynomial degrees 0 and 4) to determine how significantly better or worse (in the case of R2 < 0) than the use of averaged pre-touch distance they performed. For each round, we also calculated the effect size r = W/√N [29] with N denoting the sample size (N = 100, i.e., RMSE and R2 associated with each of the 1000 random sampling without replacement rounds) and W denoting the Wilcoxon test-statistics. We Bonferroni-corrected all p-values by multiplying them with N, given the use of non-parametric tests. We reported the average of these 1000 simulation rounds.”
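The subsampled significance testing described above can be sketched as follows, using only the Python standard library. Several points are assumptions for illustration rather than details confirmed by the response: the rank-sum test uses the normal approximation without tie correction, the standardized z-statistic stands in for the test statistic W in the effect size, and the Bonferroni multiplier is exposed as a hypothetical `n_tests` parameter. A library routine such as `scipy.stats.ranksums` could replace the hand-written test.

```python
import math
import random


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0        # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided p-value
    return z, p


def subsampled_comparison(scores_a, scores_b, n_rounds=1000, n=100, n_tests=1, seed=0):
    """Average Bonferroni-corrected p-value and effect size over subsampling rounds.

    Each round draws n values without replacement from each score list.
    The effect size is |z| / sqrt(n), an assumed reading of r = W / sqrt(N).
    """
    rng = random.Random(seed)
    p_sum = 0.0
    r_sum = 0.0
    for _ in range(n_rounds):
        a = rng.sample(scores_a, n)
        b = rng.sample(scores_b, n)
        z, p = rank_sum_test(a, b)
        p_sum += min(1.0, p * n_tests)       # Bonferroni correction, capped at 1
        r_sum += abs(z) / math.sqrt(n)
    return p_sum / n_rounds, r_sum / n_rounds
```

Here `scores_a` and `scores_b` would be, for example, the 1000 RMSE values of two models obtained from the repeated cross-validation runs. Averaging over rounds mirrors the reporting described in the response.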

 

On the other hand, we also appreciate the importance of the reviewer’s comment with regards to the effect of sample size on the performance of machine learning algorithms and their results. Therefore, in addition to our discussion on the limited number of participants with respect to data collection (please see our response to the reviewer’s comment 1) Sample size is too small - forty younger adults…), we also discussed its effect specifically on the performance of these algorithms in Section 5. Limitations and Future Direction, lines 385-398. Our discussion reads as follows.

 

From a broader perspective, it is a well-known fact that the performance of machine learning algorithms is highly dependent on the size of the sample on which these algorithms are trained. It is also widely demonstrated that limited data can degrade these models' performance and consequently bias and/or reduce their accuracy. To (at least partially) control for such undesirable effects, we opted for a strategy in which we repeated our cross-validation for 1000 random splits of our 5664 distance data between train and test sets. We also carried out our statistical tests of significance on these models' RMSE and R2 values through 1000 rounds of random sampling (without replacement). These steps enabled us to more reliably estimate these models' RMSE and R2 values as well as to obtain a more accurate estimate of their differences. However, to draw a more informed conclusion on our results, it is necessary to repeat these analyses with larger sample sizes and in scenarios in which more comprehensive sets of non-verbal human behaviour are considered. Such analyses will not only let researchers realize the extent to which our results can be generalized but also help them acquire a more thorough understanding of the domains of application for these algorithms.

 

Questions -
Reviewer’s Comment: 1) What about cultural differences?

 

Authors’ Response: This is indeed an important factor to be considered. In the current version of the manuscript, we discussed this matter in Section 5. Limitations and Future Direction, lines 379-384 as follows.

 

“Cultural backgrounds and social norms are other important issues that will inevitably affect the outcome of such analyses as human (non-verbal) interaction. In this respect, the scope of our study was also limited in that our participants were not multinationals. Although the fact that these individuals were total strangers to each other helps validate the interpretation of our findings, this factor on its own is not sufficient for generalization of our results across cultures. Therefore, future research must take into account such differences, thereby focusing on studies that introduce such cultural variabilities.”


Reviewer’s Comment: 2) What about in relation to acquaintance / social relationship?

 

Authors’ Response: In our study, all the participants were Japanese nationals who were strangers to each other. None of these participants had any (social) relation/connection with the other participants. We added this information to Section 2.1. Participants, lines 85-86. It reads as follows.

 

“In the present study, all the participants were Japanese nationals who were strangers to each other. None of these participants had any (social) relation/connection with the other participants.”

 


Reviewer’s Comment: 3) What was their allocated interaction time?

 

Authors’ Response: The total interaction time, per pair, was two hours. However, we did not allow free-interaction (e.g., talking to each other, etc.) between them during the experiment session. We added this information to Section 2.2. Paradigms, lines 106-107 as follows.

 

“The total interaction time, per pair, was two hours. However, we did not allow free-interaction (e.g., talking to each other, etc.) between them during the experiment session.”

 


Reviewer’s Comment: There are some typos such as line-156 (RMSE - missing "root"), line-195 (GR - typo in figure ==> GP), etc.

 

Authors’ Response: We added the missing “root” and also checked the manuscript for any further potential misspellings. We also corrected the mislabeled “GR” by changing it to “GP” in both Figures 3 and 4.

 

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper investigates the socially relevant "pre-touch" thresholds around a person's face. Using preexisting data, the paper explores three different modelling techniques to find the best modeling approach.

The paper is highly technical in nature. Utmost clarity is therefore very relevant.

The final sentences in the abstract are rather vague --- they deserve more precision. I here refer to "... such assumptions as averaged/fixed boundary thresholding or imposing an unjustified distribution on data do not warrant an acceptable estimate of the human subjects’ facial pre-touch behaviour. In a broader sense, our results suggest that the use of such simple mathematical models as ordinary regression to unveil rather than impose underlying similarities can yield more reliable results that reflect the observed human behaviour." Can this text be sharpened?

 

The literature overview is somewhat fragmented, in my opinion. Especially the sentences: "One important aspect of these studies is to identify the universal dimensions along which human behaviour is expressed. In this regards, Goldberg [3] asserts that "the variety of individual differences are insignificant in people’s daily interactions;" a viewpoint that echoes that of Galton [18]. On the other hand, although it appears plausible to presume a certain degree of similarity among individuals, this presumption does not necessarily warrant such simplifying assumptions as average or normally distributed [19–21] behavioural response. For instance, Micceri [22] analyzed the distributional characteristics ..." 

Which aspects from the literature have been guiding in the design of the modeling in your paper? The paper states you are reusing the data:

"For this purpose, we utilize the data from our previous study [24] and perform three types of regression-based estimation of such a distance boundary on it." but it is less clear to what extent you reuse modelling techniques.

 

It is unclear what fig 2B exactly shows.

 

"...the set of either azimuth or elevation angles and identifies a specific angle that is scaled". I guess the text should be "to be scaled". 

 

"We examined polynomial degrees 0 (i.e., no polynomial degree or control condition) through 10." What about the risk of overfitting?

 

"At each round, we first applied pairwise Wilcoxon rank sum between each pair of polynomial degrees (i.e., 0 versus 4) for each model. " Please discuss a need for Bonferroni correction.

 

Sections from line 196 and further: this reads like a technical report, rather than a mature paper. I would suggest adjusting it to make it more readable for the reader.

Fig 3 is quite unclear to me.

Why is degree 4 chosen?

Fig 5: a bit unclear what we exactly see.

 

 

Author Response

First and foremost, the authors would like to take this opportunity to thank the reviewer for the time spent and the kind consideration to review our manuscript. The comments by the reviewer helped us improve the quality of our results and their presentation substantially.

 

In what follows, we provide our responses to the reviewer’s comments and concerns.

Sincerely,

 

Reviewer 2



Reviewer’s Comment: The final sentences in the abstract are rather vague --- they deserve more precision. I here refer to "... such assumptions as averaged/fixed boundary thresholding or imposing an unjustified distribution on data do not warrant an acceptable estimate of the human subjects’ facial pre-touch behaviour. In a broader sense, our results suggest that the use of such simple mathematical models as ordinary regression to unveil rather than impose underlying similarities can yield more reliable results that reflect the observed human behaviour." Can this text be sharpened?

 

Authors’ Response: We rewrote this part of the Abstract (lines 16-21 in the current version of the manuscript) to read as follows.

 

We show that within the context of facial pre-touch interaction, normal distribution does not capture the variability that is exhibited by human subjects during such non-verbal interaction. We also provide evidence that such interactions can be more adequately estimated by considering the individuals’ variable behaviour and preferences through such estimation strategies as ordinary regression that solely relies on the distribution of their observed behaviour which may not necessarily follow a parametric distribution.

 

Reviewer’s Comment: The literature overview is somewhat fragmented, in my opinion. Especially the sentences: "One important aspect of these studies is to identify the universal dimensions along which human behaviour is expressed. In this regards, Goldberg [3] asserts that "the variety of individual differences are insignificant in people’s daily interactions;" a viewpoint that echoes that of Galton [18]. On the other hand, although it appears plausible to presume a certain degree of similarity among individuals, this presumption does not necessarily warrant such simplifying assumptions as average or normally distributed [19–21] behavioural response. For instance, Micceri [22] analyzed the distributional characteristics ..."

Which aspects from the literature have been guiding in the design of the modeling in your paper? The paper states you are reusing the data:

"For this purpose, we utilize the data from our previous study [24] and perform three types of regression-based estimation of such a distance boundary on it." but it is less clear to what extent you reuse modelling techniques.

 

Authors’ Response: We addressed the reviewer’s comment in two steps. First, we broke this paragraph into two. The first paragraph (Section Introduction, lines 34-46, in the current version of the manuscript) contained the findings in the literature on the non-normal characteristics of individuals’ psycho-behavioural and personality traits. We concluded this paragraph with the following sentence (Section Introduction, lines 43-45, in the current version of the manuscript).

 

“These findings indicated that the differences in individuals’ psycho-behavioural and personality traits are associated with types of variability whose trend may not readily and necessarily follow a normal distribution.”

 

The entire paragraph reads as follows (lines 33-45).

 

“One important aspect of these studies is to identify the universal dimensions along which human behaviour is expressed. In this regards, Goldberg [3] asserts that "the variety of individual differences are insignificant in people’s daily interactions;" a viewpoint that echoes that of Galton [18]. On the other hand, although it appears plausible to presume a certain degree of similarity among individuals, this presumption does not necessarily warrant such simplifying assumptions as average or normally distributed [19–21] behavioural response. For instance, Micceri [22] analyzed the distributional characteristics of over 440 samples of achievement and psychometric measures. His results revealed that several classes exhibit deviations from the normal distribution. Similarly, analysis of the 693 psychological data by Blanca et al. [23] identified that merely 5.50% of the distributions were close to expected values under normality while 74.40% exhibited slight to moderate and 20.00% more extreme deviations. These findings indicated that the differences in individuals’ psycho-behavioural and personality traits are associated with types of variability whose trend may not readily and necessarily follow a normal distribution.”

 

Second, we provided further evidence for the above observation based on individuals’ more observable behaviour. This paragraph starts with the following sentence (lines 46-47).

 

In the same vein, our recent study [24] also found that such non-normal characteristics in individuals’ personality were also present in their more observable behaviour.

 

This paragraph then proceeds to briefly describe the findings on these individuals’ facial pre-touch interaction that were documented in [24].

 

The entire paragraph reads as follows (lines 46-53).

 

In the same vein, our recent study [24] also found that such non-normal characteristics in individuals’ personality were also present in their more observable behaviour. Specifically, we observed that these individuals’ facial pre-touch interaction did not follow a specific distribution pattern. Our results further identified that the non-parametric cluster analysis of their facial pre-touch interaction resulted in formation of well-defined facial pre-touch subspaces that significantly correlated with the participants’ openness in the five-factor model (FFM) [25]. Additionally, they indicated that such a non-parametric modeling was able to predict individuals’ openness with a significantly above chance level accuracy.

 

Reviewer’s Comment: It is unclear what fig 2B exactly shows.

 

Authors’ Response: To better describe this subplot, we added “3-D heatmap” at the opening of its caption. We also added a brief additional explanation about what the z-axis of this subplot represents. In the current version of the manuscript, its caption reads as follows (Figure 2, page 5).

 

3-D heatmap of pre-touch surface around the face using the modified grid that includes 5664 facial area pre-touch data points. The azimuth and elevation angles of these samples are within [-80.00, ..., +80.00] and [-32.00, ..., +53.00], respectively. The z-axis corresponds to the distance of the touchers' hand from the touchees' face (i.e., the distance at which the touchers were asked by the touchees to stop their approaching hand from getting any closer).

 

We also included a new subplot to Figure 2 (i.e., Figure 2 (D)) that depicts the distribution of touchers’ pre-touch data points around the touchees’ face area. These data points correspond to the locations where the touchers were asked by the touchee to stop approaching their face. Coordinates of these data points correspond to toucher's hand distance (x-axis, in cm), azimuth angle (y-axis, in degree), and elevation angle (z-axis, in degree) with respect to the touchees' face.

We added the following to the caption of Figure 2.

 

(D) Distribution of touchers’ pre-touch data points around the touchees’ face area. Points are plotted in different shades of grey for better visualization purpose. These data points correspond to the locations where the touchers were asked by the touchee to stop approaching their face. Coordinates of these data points correspond to toucher’s hand distance (x-axis, in cm), azimuth angle (y-axis, in degree), and elevation angle (z-axis, in degree) with respect to the touchees’ face.

 

We further referred to this new subplot within the manuscript as follows (Section 2.4. Analysis, lines 144-148).

 

Figure 2 (D) visualizes the distribution of touchers’ pre-touch distances around the touchees’ face area (i.e., the location at which the touchers were asked by the touchees to stop approaching). The coordinates of these data points correspond to the toucher’s hand distance from the touchee’s face (x-axis, in cm), azimuth angle (y-axis, in degree), and elevation angle (z-axis, in degree) with respect to the touchees’ face.

 

Reviewer’s Comment: "...the set of either azimuth or elevation angles and identifies a specific angle that is scaled". I guess the text should be "to be scaled".

 

Authors’ Response: We changed “that is scaled” to “to be scaled”.

 

Reviewer’s Comment: "We examined polynomial degrees 0 (i.e., no polynomial degree or control condition) through 10." What about the risk of overfitting?

 

Authors’ Response: This is indeed an important observation by the reviewer. Since the use of polynomial degree was common to all the models and conditions, any such effect would have applied to all of them rather than to a subset of these models. Furthermore, the fact that these models did not opt for higher polynomial degrees (please see our response to the reviewer’s comment “Why is degree 4 chosen?” below) indicates that the use of polynomial degree 4 was not inflating/biasing their performance (e.g., through overfitting). This was also evidenced by the fact that all the models achieved their best results with polynomial degree 4. This is quite reasonable considering the fact that the feature space of these models was quite low dimensional (i.e., 2-D: azimuth and elevation angles) while the sample was comparably larger (i.e., 5664 distance data points).
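The low-dimensionality argument can be made concrete with a small count. The sketch below assumes that "polynomial degree" refers to a full polynomial expansion of the two input angles, which the response does not state; if the degree instead parameterizes a polynomial kernel, the count would describe the implicit feature space only. The function name `n_polynomial_terms` is a hypothetical helper.

```python
from math import comb


def n_polynomial_terms(n_features, degree):
    """Number of monomials of total degree <= degree in n_features variables.

    The constant (bias) term is included in the count.
    """
    return comb(n_features + degree, degree)


# Two input features (azimuth and elevation angles), as stated in the response.
terms_degree_4 = n_polynomial_terms(2, 4)    # 15 terms
terms_degree_10 = n_polynomial_terms(2, 10)  # 66 terms

# Training samples available per polynomial term in an 80/20 split of 5664 points.
n_train = 4531
samples_per_term_degree_4 = n_train / terms_degree_4
samples_per_term_degree_10 = n_train / terms_degree_10
```

Under this assumption, even degree 10 leaves dozens of training samples per term, which is consistent with the claim that the sample is large relative to the feature space. It does not by itself rule out overfitting.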

 

 

There are also two additional points to consider. First, in the present study, we repeated our cross-validation strategy, which randomly split the data (i.e., 5664 distance data) into 80.0% train (i.e., 4531 pre-touch data) and 20.0% test (i.e., 1133) sets, for 1000 simulation runs. This allowed for more reliable estimation of the RMSE and R2 of the models that we utilized in our study. This information is included in Section 2.4. Analysis, lines 173-180 as follows.

 

“For each of these two cases, we adapted a cross-validation strategy in which we randomly split the pre-touch surface that included 5664 distance data (Figure 2 (B)) into 80.0% train (4531 pre-touch data) and 20.0% test (i.e., 1133) sets. We repeated this procedure for 1000 simulation runs during which we recorded the root-mean-squared-error (RMSE) and coefficient of determination (R2) of the models as measures of their goodness of estimates. The RMSE allowed us to determine how well each model was estimating the pre-touch distance values on the test set. On the other hand, the R2 enabled us to evaluate whether the use of the averaged pre-touch distance in the train set might yield better estimates of the test set’s pre-touch distances than the models’ estimates based on their training.”
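The repeated random-split procedure described in this passage can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the data are synthetic stand-ins for the 5664 pre-touch distances, the "model" is a placeholder that simply predicts the training-set mean, and the names split_sizes, rmse, and repeated_holdout are hypothetical.

```python
import math
import random


def split_sizes(n_samples, train_fraction=0.8):
    """Return (n_train, n_test) for a single random split."""
    n_train = int(n_samples * train_fraction)
    return n_train, n_samples - n_train


def rmse(y_true, y_pred):
    """Root-mean-squared error between observed and estimated values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def repeated_holdout(distances, n_runs=1000, train_fraction=0.8, seed=0):
    """Repeat a random train/test split and record the test RMSE of each run."""
    rng = random.Random(seed)
    n_train, _ = split_sizes(len(distances), train_fraction)
    scores = []
    for _ in range(n_runs):
        shuffled = distances[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        # Placeholder model (assumption): predict the mean training distance.
        prediction = sum(train) / len(train)
        scores.append(rmse(test, [prediction] * len(test)))
    return scores


# Synthetic stand-in data (assumption): 5664 distances in cm.
data_rng = random.Random(42)
synthetic = [data_rng.gauss(20.0, 8.0) for _ in range(5664)]

print(split_sizes(5664))  # (4531, 1133)
scores = repeated_holdout(synthetic, n_runs=20)
print(len(scores), round(sum(scores) / len(scores), 2))
```

With 5664 samples, an 80/20 split yields the 4531 training and 1133 test points quoted in the response. In practice, the placeholder would be replaced by the fitted regressors and n_runs set to 1000.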

 

Second, we carried out our analyses on these models’ metrics through 1000 rounds of random sampling (without replacement) of 100 of these RMSE and R2 values (per round) and applied our statistical tests of significance. We reported the average of these 1000 simulation rounds. This further allowed us to reduce the possibility of biased analysis due to the sample size. This information is provided in Section 2.4. Analysis, lines 184-197. It reads as follows.

 

“For the analysis, we performed 1000 rounds of random sampling in which we selected (without replacement) 100 of these RMSE and R2 values for our statistical tests of significance. At each round, we first applied pairwise Wilcoxon rank sum between each pair of polynomial degrees (i.e., 0 versus 4) for each model. Next, we applied Wilcoxon rank sum test between models and the fixed pre-touch distance (i.e., 20.0 cm). In the case of models, we considered their pairs of equal degrees (e.g., GP vs. SVR using degree 0). In the case of fixed pre-touch distance, R2 indicated whether the use of averaged pre-touch distance might yield better estimates than such a fixed spherical boundary. We also applied Wilcoxon signed rank (i.e., one-sample test) on R2 values of these models (for both polynomial degrees 0 and 4) to determine how significantly better or worse (in the case of R2 < 0) than the use of averaged pre-touch distance they performed. For each round, we also calculated the effect size r = W/√N [29], with N denoting the sample size (N = 100, i.e., RMSE and R2 associated with each of the 1000 random sampling without replacement rounds) and W denoting the Wilcoxon test-statistics. We Bonferroni-corrected all p-values by multiplying them with N, given the use of non-parametric tests. We reported the average of these 1000 simulation rounds.”
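The subsampling scheme in this passage (draw 100 of the 1000 recorded values without replacement, test, repeat, then average) can be sketched as below. This is an illustration only, with several assumptions: the rank-sum test uses the normal approximation without a tie correction, the effect size follows the common convention |z|/sqrt(n1 + n2) (which may differ from the exact statistic and N used in the manuscript), the score lists are synthetic, and all function names are hypothetical.

```python
import math
import random


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (W, z, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # sum of the ranks of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))
    return w, z, p


def subsampled_tests(scores_a, scores_b, n_rounds=1000, n_pick=100, seed=0):
    """Average p-value and effect size over rounds of random subsampling."""
    rng = random.Random(seed)
    p_values, effects = [], []
    for _ in range(n_rounds):
        a = rng.sample(scores_a, n_pick)  # sampling without replacement
        b = rng.sample(scores_b, n_pick)
        _, z, p = rank_sum_test(a, b)
        p_values.append(p)
        effects.append(abs(z) / math.sqrt(2 * n_pick))  # assumed convention
    return sum(p_values) / n_rounds, sum(effects) / n_rounds


# Synthetic RMSE values for two hypothetical models (1000 runs each).
data_rng = random.Random(1)
rmse_a = [data_rng.gauss(5.8, 0.2) for _ in range(1000)]
rmse_b = [data_rng.gauss(7.7, 0.2) for _ in range(1000)]
mean_p, mean_effect = subsampled_tests(rmse_a, rmse_b, n_rounds=200)
print(mean_p, round(mean_effect, 3))
```

Averaging over rounds reduces the dependence of the reported statistics on any single subsample of the 1000 recorded values.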

 

 

Reviewer’s Comment: "At each round, we first applied pairwise Wilcoxon rank sum between each pair of polynomial degrees (i.e., 0 versus 4) for each model. " Please discuss a need for Bonferroni correction.

 

Authors’ Response: We apologize for omitting this information. Our results are indeed Bonferroni-corrected. Specifically, our analyses were between four settings: “GP”, “SVR”, “Lasso”, and “Fixed.” As a result, the Bonferroni-corrected p-value for significance level p = 0.05 was 0.05/4 = 0.0125. We added this information to the current version of the manuscript (Section 2.4. Analysis, lines 206-209). It reads as follows.

 

We applied Bonferroni-correction to all our tests. Specifically, our analyses were between four settings: "GP", "SVR", "Lasso", and "Fixed" pre-touch distance boundary. As a result, the Bonferroni-corrected p-value for significance level p = 0.05 was 0.05/4 = 0.0125.

 

For reporting these values, we included the actual p-values of our test results (since they were all p < 0.001). Alternatively, we can update them by multiplying them by 4 in case the reviewer considers it necessary.
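The two equivalent ways of applying the correction mentioned here (lowering the significance threshold, or multiplying each p-value by the number of comparisons) can be illustrated with a short sketch. The function names are hypothetical, and the p-value of 0.001 is only an example value, not a reported result.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Significance threshold after Bonferroni correction."""
    return alpha / n_comparisons


def bonferroni_adjust(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


settings = ["GP", "SVR", "Lasso", "Fixed"]
threshold = bonferroni_threshold(0.05, len(settings))
print(threshold)  # 0.0125

example_p = 0.001  # illustrative value only
adjusted = bonferroni_adjust(example_p, len(settings))
# Both formulations lead to the same decision.
print(example_p < threshold, adjusted < 0.05)  # True True
```

Either route gives the same conclusion, so reporting the raw p-values against the threshold 0.0125 is consistent with reporting the p-values multiplied by 4 against 0.05.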

 

 

Reviewer’s Comment: Sections from line 196 and further: this reads like a technical report, rather than a mature paper. I would suggest adjusting this to make it more readable for the reader.

 

Authors’ Response: We modified the writing of these subsections.

 

Reviewer’s Comment: Fig 3 is quite unclear to me.

 

Authors’ Response: The two parts of this figure refer to the case where no polynomial degree was used (i.e., polynomial degree 0) and the case where we used polynomial degree 4, with which all models achieved their best results (for the choice of polynomial degree, please see our response to the reviewer’s comment “Why is degree 4 chosen?” below). We put the two parts adjacent to each other for ease of comparison. However, we can display them in two separate figures if the reviewer finds it necessary.

 

Reviewer’s Comment: Why is degree 4 chosen?

 

Authors’ Response: We chose this polynomial degree based on the performance gain of the models on the test set while updating its value from 0 (i.e., no polynomial degree) to 10. For instance, we set its value to 0 (i.e., no polynomial degree) and obtained the models’ performance on the test set. Then, we updated the polynomial degree to 1 and acquired these models’ performance on the test set. We did this for all polynomial degrees 0 through 10. We observed that all these models obtained their best performance when they used polynomial degree 4. Therefore, we used this value. We explained this in the manuscript (Section 2.4. Analysis, lines 165-173, in the current version of the manuscript) as follows.

 

“We also examined whether the use of polynomial degree on input features (i.e., azimuth and elevation angles) benefited these models’ estimate of participants’ facial area pre-touch boundary. The use of polynomial degree was a natural consideration since any such pre-touch distance boundary was analogous to approximation of a three dimensional volume around the face (e.g., a perfect spherical space in the case of fixed pre-touch distance). We examined polynomial degrees 0 (i.e., no polynomial degree or control condition) through 10. All models attained their best estimates using polynomial degree 4. Therefore, we used 0 (i.e., control setting in the case of the models) and 4 (i.e., feature vectors of length 2^4 = 16, given the azimuth and elevation angles as original input features) polynomial degrees in our analyses.”
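The idea of sweeping over polynomial degrees and keeping the one with the lowest test error can be sketched on a toy problem. This is not the authors’ pipeline: it uses a single synthetic 1-D input rather than azimuth and elevation angles, ordinary least squares rather than GP, SVR, or Lasso, a single split rather than 1000 runs, and degrees 1 through 6 only. All names are hypothetical.

```python
import math
import random


def solve_linear(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * w[c] for c in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def fit_poly(xs, ys, degree):
    """Least-squares polynomial coefficients via the normal equations."""
    powers = range(degree + 1)
    a = [[sum(x ** (i + j) for x in xs) for j in powers] for i in powers]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in powers]
    return solve_linear(a, b)


def predict(w, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(w))


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Synthetic data (assumption): a noisy cubic on [-1, 1].
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(300)]
ys = [2 * x ** 3 - x + rng.gauss(0.0, 0.1) for x in xs]
train_x, test_x = xs[:240], xs[240:]
train_y, test_y = ys[:240], ys[240:]

test_rmse = {}
for degree in range(1, 7):
    w = fit_poly(train_x, train_y, degree)
    test_rmse[degree] = rmse(test_y, [predict(w, x) for x in test_x])

best_degree = min(test_rmse, key=test_rmse.get)
print(best_degree, round(test_rmse[best_degree], 3))
```

On data of this kind, the test error typically drops sharply once the degree is high enough to capture the underlying shape and then flattens, which is the pattern a degree sweep is meant to reveal.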

 

 

 

Reviewer’s Comment: Fig 5: a bit unclear what we exactly see.

 

Authors’ Response: To increase the clarity of the results presented in this figure, we first removed the subplots that were depicted in the middle row (i.e., the top-view of the models’ 2-D heatmaps that showed their estimates of the facial pre-touch area around the face). On the other hand, subplot (D) corresponds to the statistical analysis of the estimates by these models during their training phase (i.e., results reported in Section 3.3. Investigation of the Training Process, lines 269-275, in the current version of the manuscript). We further modified the caption of this figure to better explain its content. It reads as follows (Figure 5, page 10, in the current version of the manuscript).

 

“Estimated 2-D heatmap surfaces by models trained on the entire facial area pre-touch distances. Column-wise: (A) GP appears to perfectly fit the data during its learning, thereby resulting in an overfitted model. (B) SVR discards the differences between pre-touch distances and instead finds the hyper/plane that best suits its optimization criterion, i.e., minimization of the orthogonal distances between pre-touch data and the candidate boundary. It is apparent that in 3D space of the facial pre-touch distances a hemisphere is the best candidate. (C) Lasso’s learning process appears to better balance the bias-variance tradeoff while satisfying its criterion that is to keep the entries of its learned weight matrix small (i.e., potentially as coarse-grained as possible), thereby allowing for a better generalization to novel cases. The middle row gives the top-view of the surfaces that were estimated by each of these models. (D) Distribution of estimated pre-touch distances by each of these models.”

 

Furthermore, we modified the first paragraph of this section (Section 3.3. Investigation of the Training Process, lines 246-257). It reads as follows.

 

Considering these models’ respective training phases, we noticed that GP perfectly fitted (Figure 5(A)) its training data (RMSE = 0.00, R2 = 1.00), thereby yielding an overfitted model. On the other hand, we observed that SVR learning process overtly discarded (Figure 5(B)) the differences between these facial pre-touch distances (RMSE = 7.69, R2 = .10). The SVR’s performance can be explained in light of its optimization criterion that attempts to minimize the orthogonal distances between data and its fitted hyper/plane. Considering the distribution of facial area pre-touch distances around the face area (i.e., an approximately hemispheric surface), it is foreseeable for SVR’s optimization criterion to find one such candidate hemisphere (i.e., analogous to a plane in 3D that is bent along the azimuth angle). On the other hand, Lasso (Figure 5(C)), whose only criterion was to penalize the growth of its learned weight matrix, appeared to achieve a balance between bias and variance (RMSE = 4.90, R2 = .64). This suggestion is clarified by comparing its RMSE and R2 in the case of full data and its average performance on test cases in Sections 3.1 and 3.2 (MRMSE = 5.79 and MR2 = .44).

 

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Lines 180 and further: "Precisely, R2 < 0.0 would mean that the use of the averaged pre-touch distances as their estimated pre-touch boundary was better than the model estimates, R2 = 0.0 would indicate no difference between the two, and R2 > 0.0 would justify the use of a model for estimating this boundary."

Comment: I would be careful: the R2  = (amount explained variation/amount total variation) itself is usually presented as a proportion. I would reshape this section to avoid confusion.

 

In the discussion, I would certainly spend a few words about e.g. using more modern techniques such as linear mixed effects modelling as a statistical approach to analyze this dataset.

lmer: https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf

 

 

 

 

 

Author Response

First and foremost, the authors would like to take this opportunity to thank the reviewer for the time spent and the kind consideration to review our manuscript. The comments by the reviewer helped us improve the quality of our results and their presentation substantially.

 

In what follows, we provide our responses to the reviewer’s comments and concerns.

 

Sincerely,

 

Reviewer 2

 

Reviewer’s Comment: Lines 180 and further: "Precisely, R2 < 0.0 would mean that the use of the averaged pre-touch distances as their estimated pre-touch boundary was better than the model estimates, R2 = 0.0 would indicate no difference between the two, and R2 > 0.0 would justify the use of a model for estimating this boundary."

Comment: I would be careful: the R2 = (amount explained variation/amount total variation) itself is usually presented as a proportion. I would reshape this section to avoid confusion.

 

Authors’ Response: The reviewer’s observation with regard to the range of the coefficient of variation is indeed valid and adequate. On the other hand, we reported the coefficient of determination in the present manuscript. However, we believe that the similar notation that is used by these two measures (i.e., R2) is the source of this misunderstanding. Therefore, to prevent any future misinterpretation by the readers, we modified our reference to the coefficient of determination by including its formula and also adding further description (Section 2.4. Analysis, lines 181-183, in the current version of the manuscript). It reads as follows.

Precisely, in the case of regression, coefficient of determination (i.e., R2 = 1 - SSresidual/SStotal, where SSresidual and SStotal refer to the residual and total sums of squares, respectively) measures how well the regression predictions approximate the real data points [28,29].
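A small worked example may help show why the coefficient of determination, unlike a proportion, can fall below zero. The sketch assumes the standard definition R2 = 1 - SSresidual/SStotal, with SStotal computed around the mean of the observed values; the function name and the three data points are purely illustrative.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_total = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_residual / ss_total


observed = [10.0, 20.0, 30.0]  # illustrative distances in cm

print(r_squared(observed, [10.0, 20.0, 30.0]))  # 1.0: perfect estimates
print(r_squared(observed, [20.0, 20.0, 20.0]))  # 0.0: same as using the mean
print(r_squared(observed, [30.0, 20.0, 10.0]))  # -3.0: worse than the mean
```

In the last case SSresidual = 800 and SStotal = 200, so R2 = 1 - 4 = -3. A negative value therefore signals that the estimates are worse than simply using the averaged distance, which matches the interpretation given in the manuscript.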

 

Reviewer’s Comment: In the discussion, I would certainly spend a few words about e.g. using more modern techniques such as linear mixed effects modelling as a statistical approach to analyze this dataset.

lmer: https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf

 

Authors’ Response: We added the following paragraph to Section 5. Limitations and Future Direction, lines 397-401, in the current version of the manuscript.

Last, we limited our analyses of the facial pre-touch distances to GP, SVR, and Lasso regressors. It is apparent that the choice of regression models for approximating non-verbal human behaviour is not limited to these options. Future research can benefit from the use of other sophisticated predictive approaches such as linear mixed-effects models [40] and Bayesian linear regression [41], thereby further advancing the findings that are presented in the present study.

Author Response File: Author Response.pdf
