Article
Peer-Review Record

Person Independent Recognition of Head Gestures from Parametrised and Raw Signals Recorded from Inertial Measurement Unit

Appl. Sci. 2020, 10(12), 4213; https://doi.org/10.3390/app10124213
by Anna Borowska-Terka * and Pawel Strumillo
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Reviewer 5: Anonymous
Reviewer 6: Anonymous
Submission received: 10 May 2020 / Revised: 14 June 2020 / Accepted: 17 June 2020 / Published: 19 June 2020
(This article belongs to the Special Issue Machine Learning for Biomedical Application)

Round 1

Reviewer 1 Report

This article is very well written and very clear. 

There have been many works on human activity recognition, but not so many on head movements. However, the problem as presented in this work does not look too challenging.

The high accuracy and F1-score values obtained show that simple feature extraction and even the use of the raw data can be enough for recognition. I am particularly surprised by the best results in the NSP setup when compared with the SP setup. I would expect some comments about this... is it that there is so little noise in the signals that the raw data can be directly exploited and not fool the learning algorithms?

To provide more significant results, I would expect an evaluation of the models in a real-life setup where there can be movements shorter than 5 seconds and no a-priori fixed pauses between the different movements. Moreover, the computational times to process the signals once the algorithms are trained may also be included to support their usefulness in the context of the proposed application.

The above comments will provide a better view of the performance of the presented approaches for real-time processing of the sensor signals. 

Author Response

This article is very well written and very clear.

There have been many works on human activity recognition, but not so many on head movements. However, the problem as presented in this work does not look too challenging.

The high accuracy and F1-score values obtained show that simple feature extraction and even the use of the raw data can be enough for recognition. I am particularly surprised by the best results in the NSP setup when compared with the SP setup. I would expect some comments about this... is it that there is so little noise in the signals that the raw data can be directly exploited and not fool the learning algorithms?

We thank the Reviewer for the comments about our manuscript.

Yes, we were also surprised that raw signals directly used for training the classifiers gave superior results to the parametrised approach. We hypothesise that the six signal channels recorded from the IMU carry rich enough information for the classifiers to confidently recognize the head gesture. Also, we conclude that the inherent measurement noise (which occurs randomly) is averaged out during the training procedures of the classifiers. We have added a comment on the above reasoning in the Conclusion section. Note also that the authors of [13] arrived at similar conclusions.

[13] M. Dobrea, D. Dobrea and I. Severin, "A New Wearable System for Head Gesture Recognition Designed to Control an Intelligent Wheelchair," 2019 E-Health and Bioengineering Conference (EHB), Iasi, Romania, 2019, pp. 1-5, doi: 10.1109/EHB47216.2019.8969993.
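As an illustration of why raw samples can suffice, the following sketch trains a toy classifier directly on 6-element raw IMU vectors. The data are simulated and the nearest-centroid rule is chosen here only for brevity; the study itself used decision trees, random forests, k-NN and SVMs.

```python
import numpy as np

rng = np.random.default_rng(0)
GESTURES = ["yaw", "pitch", "roll", "immobility"]

# Hypothetical raw IMU sample vectors: 6 channels (ax, ay, az, wx, wy, wz).
# Each gesture is simulated with a different mean pattern plus sensor noise.
MEANS = {
    "yaw":        np.array([0.0, 0.0, 1.0, 0.0, 0.0, 2.0]),
    "pitch":      np.array([0.0, 1.0, 0.5, 2.0, 0.0, 0.0]),
    "roll":       np.array([1.0, 0.0, 0.5, 0.0, 2.0, 0.0]),
    "immobility": np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0]),
}

def simulate(gesture, n, noise=0.3):
    return MEANS[gesture] + rng.normal(0.0, noise, size=(n, 6))

# Train: per-gesture centroids learned from raw samples, no feature extraction.
train = {g: simulate(g, 200) for g in GESTURES}
centroids = {g: x.mean(axis=0) for g, x in train.items()}

def classify(sample):
    # Nearest-centroid decision made directly on the raw 6-element vector.
    return min(centroids, key=lambda g: np.linalg.norm(sample - centroids[g]))

# Evaluate on fresh samples.
correct = sum(classify(s) == g for g in GESTURES for s in simulate(g, 50))
accuracy = correct / (4 * 50)
print(f"accuracy on raw samples: {accuracy:.2f}")
```

Even this crude rule separates the simulated gestures, which is consistent with the idea that the raw channels already carry class-discriminative information and that zero-mean noise is averaged out.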

To provide more significant results, I would expect an evaluation of the models in a real-life setup where there can be movements shorter than 5 seconds and no a-priori fixed pauses between the different movements. Moreover, the computational times to process the signals once the algorithms are trained may also be included to support their usefulness in the context of the proposed application.

Please note that the experiment reported in the T2 testing scenario (see Section 2.1) addresses exactly these comments: the head gestures were executed in random order, with no fixed gaps between them, and periods of head immobility occurred at random. Also, in our understanding, because of the random sequence of the performed gestures, the duration of the experiment does not matter much.

The recognition time of a gesture for a single input data vector:

| No. | Classifier | Recognition of parametrised signals | Recognition of raw IMU signals |
|-----|------------|-------------------------------------|--------------------------------|
| 1. | Decision tree | ~1 ms | ~1 ms |
| 2. | Decision tree with a minimum of 5 samples in the leaf | ~1 ms | ~1 ms |
| 3. | Random forest | ~10 ms | ~6 ms |
| 4. | Random forest with a minimum of 5 samples in the leaf | ~6 ms | ~7 ms |
| 5. | k-NN for k ∈ {1, 3, 5, 7, 9, 11, 13, 15, 17, 19} | ~5 ms | ~2 ms |
| 6. | SVM with an RBF kernel for C ∈ {0.1, 1, 5, 10} | ~10 ms | ~5–11 ms |
| 7. | SVM with a 3rd-degree polynomial kernel for C ∈ {0.1, 1, 5, 10} | ~10 ms | ~4–10 ms |

Note, however, that for the parametrised approach we must first compute the statistical parameters: for the shortest time window, consisting of 10 signal samples, we need to wait 0.1 s before we can feed these parameters to the classifiers. Thus, an extra time delay is introduced.

The above comments will provide a better view of the performance of the presented approaches for real-time processing of the sensor signals.

Yes, we agree; our ultimate goal is to achieve real-time performance of the system. The computation times required for processing the data by the trained classifiers (for a PC with an Intel Core i5-7500 processor) vary from 1 ms for the decision tree classifier to 11 ms for the SVM classifier with an RBF kernel.
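Per-sample latencies of this kind can be measured with a harness along the following lines. The brute-force 1-NN stand-in classifier, data shapes and trial counts are illustrative, not the study's actual code, and absolute timings depend entirely on hardware.

```python
import time
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in classifier: brute-force 1-NN over stored training vectors.
X_train = rng.normal(size=(5000, 6))     # raw 6-channel IMU training samples
y_train = rng.integers(0, 4, size=5000)  # 4 gesture classes

def predict(sample):
    # Label of the nearest stored training vector (Euclidean distance).
    return y_train[np.argmin(np.linalg.norm(X_train - sample, axis=1))]

def per_sample_latency_ms(n_trials=200):
    samples = rng.normal(size=(n_trials, 6))
    t0 = time.perf_counter()
    for s in samples:
        predict(s)
    return (time.perf_counter() - t0) / n_trials * 1e3

print(f"mean per-sample latency: {per_sample_latency_ms():.3f} ms")
```

Averaging over many single-vector predictions, as above, is what makes millisecond-scale figures like those in the table meaningful.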

We have added the table with the recognition times in the manuscript. [lines: 403 - 411]

Reviewer 2 Report

The authors propose a hands-free interface enabling blind persons to control the menu of navigation device by means of head movements.

Although the authors conducted various experiments and present a comparison of the performance of the different classifiers with the parametric and non-parametric representations of IMU signals, I do not find the scientific contributions of this study or the novelty of the system design.

  1. What is the main contribution of this study? Making a new public benchmark dataset for this area or proposing novel methods? The authors do not propose a new methodology, but only focus on experimentation using existing methodologies.
  2. The experiment design is too naive. What is the purpose of the experiments? What would you want to show through experiments?
  3. What insight do you find through this study? The authors simply listed the experimental results and did not present a meaningful analysis.

Author Response

1. What is the main contribution of this study? Making a new public benchmark dataset for this area or proposing novel methods? The authors do not propose a new methodology, but only focus on experimentation using existing methodologies.

Our objective is to seek alternative human-machine interfaces that do not involve manual handling (the importance of such interfaces has grown during the Covid-19 pandemic). Using head gestures for this purpose is one of the research directions worth exploring. Another foreseen application of our study might be in the rehabilitation of patients with neck stiffness.

An interesting result of our experiment is that raw samples of the IMU signals can be directly used for training the classifiers, with better recognition results than those obtained for the classifiers trained on the parametrised signals.

Finally, in our view, the database of IMU recordings that we made publicly accessible can be a valuable source of machine learning data.

 

2. The experiment design is too naive. What is the purpose of the experiments? What would you want to show through experiments?

In the Introduction section, we review other studies in which acceleration sensors are used for tracking human body movements and note that this research field is growing in popularity. Monitoring head gestures is one of the approaches in this line of research. Yes, we agree that only simple head gestures are recognised. However, this was in fact our aim: to simplify the gestures and show that they can be confidently recognised and used in human-machine interaction applications, e.g. for controlling electronic travel aids for the visually impaired.

3. What insight do you find through this study? The authors simply listed the experimental results and did not present a meaningful analysis.

Please note that in two sections of the manuscript, each about two pages long, i.e. the Discussion and the Conclusions, we analyse the achieved results, provide a point-by-point discussion of the study, and formulate (in our view) valuable conclusions.

 

Thank you for your comments. We hope that, at least partly, we have explained the points raised in your review.

Reviewer 3 Report

Dear authors,

My opinion is that the manuscript is well structured and presented. Gesture recognition is a topical issue, with important applications for impaired persons. Therefore, your article is, undoubtedly, of high importance. However, I propose the following corrections/suggestions:

  1. The template used for your article is an old one (2019), please update to the current template.
  2. The Abstract should begin with a clear background and placing of the issues addressed in the article. Also, make sure you do not exceed the limit of 200 words.
  3. In the Introduction and Discussion parts, please change citation [14, 15, 16, 22] to [14 - 16, 22].
  4. In the Classifier training and results
    • Explain the meaning of "F1-score"
    • Give more insight about how the testing procedure was conducted, how were the movements detected during the test
    • How long (ms) does it take to detect a certain gesture? Does it depend on the training algorithm or on the method SP/NSP? If yes, please provide a comparative table similar to Table 7
    • Explain if the "C" parameter can influence/improve the confusion matrix
  5. In the Conclusion part you state that the application is more suitable for paralyzed/tetraplegic persons. Just as a suggestion, you could reformulate your initial ideas and set as target group for your interface the paralyzed/tetraplegic persons. The blind persons group can be referred to as other domain of application.
  6. Please revise, according to the template, the following sections: Author contributions and Acknowledgment (add the same Horizon project).
  7. In the References part, I suggest you change the accessed date of the URLs, since the time frame between the accessing and the uploading of the article is too small.

Comments for author File: Comments.pdf

Author Response

Dear authors,

My opinion is that the manuscript is well structured and presented. Gesture recognition is a topical issue, with important applications for impaired persons. Therefore, your article is, undoubtedly, of high importance.

Response: We thank you for this opinion about our work and your valuable comments.

However, I propose the following corrections/suggestions:

1. The template used for your article is an old one (2019), please update to the current template.

Response 1: Corrected

2. The Abstract should begin with a clear background and placing of the issues addressed in the article. Also, make sure you do not exceed the limit of 200 words.

Response 2: We have added one sentence at the beginning of the abstract to position our study in a wider context.

3. In the Introduction and Discussion parts, please change citation [14, 15, 16, 22] to [14 - 16, 22].

Response 3: Corrected

4. In the Classifier training and results

  • Explain the meaning of "F1-score"

    Response 4.1: The F1 score is defined as a harmonic average of the sensitivity and positive predictive value. It is a good measure of overall classifier performance. We have added an appropriate comment in the manuscript.

  • Give more insight about how the testing procedure was conducted, how were the movements detected during the test

    Response 4.2: In our mind quite a detailed description of the testing procedure is given in the first paragraph of Materials and Methods section. Note, however, that the movements were not detected on-line during the test. Recognition of head gestures was done off-line after training the classifiers. However, by evaluating the response time of the trained classifiers we conclude that the proposed system is suitable for on-line recognition of the head gestures.

  • How long (ms) does it take to detect a certain gesture? Does it depend on the training algorithm or on the method SP/NSP? If yes, please provide a comparative table similar to Table 7

    Response 4.3: The recognition time of a gesture for a single input data pattern:

    | No. | Classifier | Recognition of parametrised signals | Recognition of raw IMU signals |
    |-----|------------|-------------------------------------|--------------------------------|
    | 1. | Decision tree | ~1 ms | ~1 ms |
    | 2. | Decision tree with a minimum of 5 samples in the leaf | ~1 ms | ~1 ms |
    | 3. | Random forest | ~10 ms | ~6 ms |
    | 4. | Random forest with a minimum of 5 samples in the leaf | ~6 ms | ~7 ms |
    | 5. | k-NN for k ∈ {1, 3, 5, 7, 9, 11, 13, 15, 17, 19} | ~5 ms | ~2 ms |
    | 6. | SVM with an RBF kernel for C ∈ {0.1, 1, 5, 10} | ~10 ms | ~5–11 ms |
    | 7. | SVM with a 3rd-degree polynomial kernel for C ∈ {0.1, 1, 5, 10} | ~10 ms | ~4–10 ms |



  • Explain if the "C" parameter can influence/improve the confusion matrix

    Response 4.4: For smaller values of the parameter “C” (in particular for C = 0.1), the detection rates of the gestures improved, but we observed more frequent cases of immobility being recognised as a gesture; thus, we had more false positive detections of gestures. In order to have fewer of these false positives (in which the system activates an action without the user’s intention), we judged the best trade-off to be C = 1.0 for the SP training procedure and C = 10.0 for training the SVM classifier on raw IMU signal samples. In terms of the confusion matrix entries, we do not want to maximise the sensitivity, because this would decrease the specificity and result in increased rates of false positive detections of the gestures.
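This sensitivity/specificity trade-off can be made concrete by deriving per-class rates from a multiclass confusion matrix. The counts below are made up for illustration, not taken from the manuscript.

```python
import numpy as np

# Rows = true class, columns = predicted class;
# class order: yaw, pitch, roll, immobility (made-up counts).
cm = np.array([
    [90,  2,  3,  5],
    [ 1, 88,  4,  7],
    [ 2,  3, 89,  6],
    [ 4,  5,  3, 88],
])

def per_class_rates(cm):
    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp   # missed gestures -> the user must repeat the action
    fp = cm.sum(axis=0) - tp   # false activations of the interface
    tn = cm.sum() - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

sens, spec = per_class_rates(cm)
print(np.round(sens, 3), np.round(spec, 3))
```

Pushing a classifier to catch more gestures raises the off-diagonal column counts (false positives) of the other classes, so sensitivity gains come at the cost of specificity, exactly the effect described for small C.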

5. In the Conclusion part you state that the application is more suitable for paralyzed/tetraplegic persons. Just as a suggestion, you could reformulate your initial ideas and set as target group for your interface the paralyzed/tetraplegic persons. The blind persons group can be referred to as other domain of application.

Response 5: Yes, our predominant application is to aid the visually impaired in handling assistive devices. This idea was inspired by discussions with blind individuals, who prefer to control the device without the need to use their hands. Thus, we would like to keep this order of envisioned applications. However, we do indeed see other potential applications in rehabilitation and as an aid for persons with physical disabilities, e.g. for controlling a wheelchair or other appliances.

6. Please revise, according to the template, the following sections: Author contributions and Acknowledgment (add the same Horizon project).

Response 6: Revised according to template.

7. In the References part, I suggest you change the accessed date of the URLs, since the time frame between the accessing and the uploading of the article is too small.

Response 7: The URL links were verified and the access dates were updated.

Reviewer 4 Report

The authors discuss an interesting application of IMUs as a possible decision making tool for the disabled. They show that for seated personnel, head movement IMU signals are better classified when raw sensor signals are used versus when these signals are parameterized. Below comments highlight some aspects which should be included in the manuscript:

1) Is there any particular reason why the SVM appears to perform better overall? What features of the input dataset might cause these results?

2) It would be useful to include a little more text about how the parameters were computed. Did seating postures change during the duration of a single participant data acquisition? How were different head sizes and baseline head positions accounted for when forming the parameters? Were the motion ranges normalized by the head sizes?

3) Furthermore, in terms of signal parametrization (SP), have the authors considered correlations between the parameters as possible additional input parameters?

4) Possible use of this system is discussed for the visually impaired. In such a case, it is useful to keep realistic scenarios in mind. For example, how will this framework adapt when the user is walking? How will the signals be decoupled from those resulting just from head motion due to walking? In such cases, one might have to reconsider whether nonparametric is better than parametric. For example, parametric features might allow for adequate baseline subtraction/normalization during formulation of the different parameters, which might make it a more robust system to different possible walking styles than perhaps a nonparametric framework? What are the authors’ thoughts about these realistic scenarios?

Author Response

1) Is there any particular reason why the SVM appears to perform better overall? What features of the input dataset might cause these results?

Response 1: The SVM is recognised as one of the best classifiers for solving nonlinear classification problems. This observation was confirmed by our study. Note, however, that the SVM performed just marginally better than other classifiers.

2) It would be useful to include a little more text about how the parameters were computed. Did seating postures change during the duration of a single participant data acquisition? How were different head sizes and baseline head positions accounted for when forming the parameters? Were the motion ranges normalized by the head sizes?

Response 2: The computed parameters were basic statistical quantities computed for the IMU signals, as listed in Section 2.1. These parameters were calculated for different window widths ranging from 0.1 s to 2 s. We did not take anthropometric measurements of the individuals taking part in the trials; thus, these parameters were not included in the training sets. Also, the motion ranges were not normalised to account for different head sizes. Including these extra parameters would probably slightly improve the recognition performance; however, it would make the system user-dependent.

For a more complete description of the experiment we have added the following text into the manuscript: “Each participant was sitting straight in a chair and did not change position during the experiment and performed only the given motions (yaw, pitch, roll, and immobility). Each user had the DUO MLX device mounted rigidly on their forehead. The participant did not touch the sensors or change its position on the forehead during the experiment.” [lines: 118-121]

3) Furthermore, in terms of signal parametrization (SP), have the authors considered correlations between the parameters as possible additional input parameters?

Response 3: Yes, in fact we have included correlation coefficients computed for pairs of signals from the accelerometer and pairs of signals from the gyroscope. This is feature no. 6 listed in section 2.1 of the manuscript.
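A parametrisation of this kind can be sketched as follows. The particular statistics, window length and sampling rate here are illustrative assumptions; the manuscript's Section 2.1 lists the full seven-feature set, which yields 42-element vectors rather than the 30 produced by this sketch.

```python
import numpy as np

FS = 100  # assumed IMU sampling rate, Hz (illustrative)

def window_features(win: np.ndarray) -> np.ndarray:
    """Statistical features for one window of shape (n_samples, 6):
    columns 0-2 accelerometer (ax, ay, az), columns 3-5 gyroscope (wx, wy, wz)."""
    feats = []
    for ch in range(6):
        x = win[:, ch]
        feats += [x.mean(), x.std(), x.min(), x.max()]
    # Correlation coefficients for channel pairs within each sensor
    # (cf. feature no. 6 in the manuscript's list).
    for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
        feats.append(np.corrcoef(win[:, a], win[:, b])[0, 1])
    return np.asarray(feats)

# A 0.5 s window of hypothetical IMU data.
rng = np.random.default_rng(2)
window = rng.normal(size=(int(0.5 * FS), 6))
f = window_features(window)
print(f.shape)  # 6 channels * 4 statistics + 6 correlations = 30 features
```

Note that the window must be filled before the features can be computed, which is the source of the extra latency mentioned for the SP approach.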

4) Possible use of this system is discussed for the visually impaired. In such a case, it is useful to keep realistic scenarios in mind. For example, how will this framework adapt when the user is walking? How will the signals be decoupled from those resulting just from head motion due to walking? In such cases, one might have to reconsider whether nonparametric is better than parametric. For example, parametric features might allow for adequate baseline subtraction/normalization during formulation of the different parameters, which might make it a more robust system to different possible walking styles than perhaps a nonparametric framework? What are the authors’ thoughts about these realistic scenarios?

Response 4: Thank you for these very relevant comments. We have addressed these points in the last paragraph of the Discussion section of the manuscript, where we have pointed out the limitations of our study. At this stage of development of the system, the trials are conducted with the participants remaining in a stationary position. This is how the blind individuals envisage using the system. However, mobility tests are worth exploring; we have already recorded users who walked, sat down and stood up with the sensors mounted on their foreheads. Our further work will focus on recognising head gestures during normal daily movements. Obviously, an open question is which of the recognition approaches will perform better in these new experimental conditions. Our view is that the raw signal approach will also perform well, since in the supervised training scenario the classifiers will learn to disregard the signal components due to body movements.

 

Reviewer 5 Report

The paper investigates the performance of a hands-free head-gesture controlled interface for supporting people with disabilities. Recognition of three head movements (pitch, roll, yaw) along with head immobility was performed based on raw and processed signals recorded from the inertial measurement unit. The sufficiently large number of individuals (65 persons) employed in the experiments assures statistically reliable results. All the classification experiments were carefully designed and performed, thus the presented conclusions are convincing and confirmed by the obtained results.

There are just minor issues that should be corrected to improve the quality of this paper.

  1. Please consider in the literature review (now the last cited publications are from 2019) some recent papers that cover similar research topic, e.g.:

 Ascari, R., Silva, L., Pereira, R., Personalized gestural interaction applied in a gesture interactive game-based approach for people with disabilities, 2020, International Conference on Intelligent User Interfaces, Proceedings IUI, pp. 100-110

Solea, R., Margarit, A., Cernega, D., Serbencu, A., Head movement control of powered wheelchair, 2019, 23rd International Conference on System Theory, Control and Computing, ICSTCC 2019, 8885844, pp. 632-637;

  2. Please correct: ax. -> ax, (lines 125, 155, 167);
  3. In the Fig. 3, “Time window T” is covered by dashed line, please modify;
  4. Obtained sensitivity for both classification experiments is significantly lower (by at least 5%) than specificity. Please make a comment how this can influence a general performance of the gesture recognition system;
  5. There is something wrong in the statement “…F1-score of 0.92%...” (line 415), please check.

Author Response

The paper investigates the performance of a hands-free head-gesture controlled interface for supporting people with disabilities. Recognition of three head movements (pitch, roll, yaw) along with head immobility was performed based on raw and processed signals recorded from the inertial measurement unit. The sufficiently large number of individuals (65 persons) employed in the experiments assures statistically reliable results. All the classification experiments were carefully designed and performed, thus the presented conclusions are convincing and confirmed by the obtained results.

Response: Thank you for your kind comments about our study.

There are just minor issues that should be corrected to improve the quality of this paper.

1. Please consider in the literature review (now the last cited publications are from 2019) some recent papers that cover similar research topic, e.g.:

 Ascari, R., Silva, L., Pereira, R., Personalized gestural interaction applied in a gesture interactive game-based approach for people with disabilities, 2020, International Conference on Intelligent User Interfaces, Proceedings IUI, pp. 100-110

Solea, R., Margarit, A., Cernega, D., Serbencu, A., Head movement control of powered wheelchair, 2019, 23rd International Conference on System Theory, Control and Computing, ICSTCC 2019, 8885844, pp. 632-637;

Response 1: Thank you for drawing our attention to these publications. We have cited both publications and commented on them in the review section (bibliographic items [9] and [12]).

2. Please correct: ax. -> ax, (lines 125, 155, 167);

Response 2: Thank you, corrected.

3. In the Fig. 3, “Time window T” is covered by dashed line, please modify;

Response 3: Corrected.

4. Obtained sensitivity for both classification experiments is significantly lower (by at least 5%) than specificity. Please make a comment how this can influence a general performance of the gesture recognition system;

Response 4: Maximising the sensitivity rates means we maximise the detection rates of the head gestures. At the same time, however, the specificity will be lowered and we will have more false positive detections of the gestures, i.e. the system will be more frequently falsely activated. The consequence of slightly lower sensitivity values is that some of the gestures will not be detected and the user will have to repeat the action. Nevertheless, both measures are on average above 90%.

5. There is something wrong in the statement “…F1-score of 0.92%...” (line 415), please check.

Response 5: Corrected, this was a mistake. Thank you for pointing it out.
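For completeness, the corrected quantity is a dimensionless fraction in [0, 1], not a percentage: as noted in Response 4.1 to Reviewer 3, the F1-score is the harmonic mean of sensitivity (recall) and positive predictive value (precision). A minimal sketch:

```python
def f1_score(sensitivity: float, ppv: float) -> float:
    """Harmonic mean of sensitivity (recall) and positive predictive value (precision)."""
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)

# The harmonic mean penalises imbalance: a classifier with high sensitivity
# but low precision (or vice versa) still gets a low F1.
print(f1_score(0.90, 0.90))  # 0.9
print(f1_score(0.99, 0.50))  # ≈ 0.66
```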

Reviewer 6 Report

Overview

The manuscript proposes a hands-free head-gesture controlled interface
to help persons with disabilities to communicate. They aim to develop
a user-friendly and efficient electronic travel aid that will help the
visually impaired retain orientation and mobility in unfamiliar environments.

The proposed method is clearly presented and the experimental results are clear.
The manuscript is well written and it is easy to understand its key ideas and the steps
taken by the authors on their research.

The proposal is interesting.

Please see my comments below.
I have technical comments as well as some comments on the writing.


Some technical comments

1 - On Section 2.1, on item 2, the total time of the recording is 20+40+20+40+20+40+20=200 seconds.
However, in section 2.1, in line 171, the authors write that the total recording time is 150 seconds.
Please check on this.

2 - Line 205. We have "Random forests are efficient classifiers for very large data sets."
Please add a proper literature reference for this statement.

3 - Section 2.2.3. Please add a proper literature reference to the KNN classifier.
I suggest this paper https://link.springer.com/article/10.1007/BF00153759

4 - On the experimental results of Table 1 and Table 2, please add the standard deviation value along with the mean value.
Please, highlight in bold face the best result.

5 - Since the data taken from the NSP method yields the best results, I wonder if the feature extraction process is adequate,
considering the 7 features addressed. Since the feature-based representation consistently produces worse results, the reason for this may lie in the
feature extraction process. Those seven features may not be adequate choices to represent the data for this task. I suggest
that the authors look into this in future work.

6 - On Line 155 and 156 we have "the following statistical parameters were computed"
I would say the "the following features were extracted"

7 - Regarding the performed statistical tests

On Lines 255 and 256, we have "Because we consider more than two classifiers and our data do not
have a normal distribution, the Friedman test was used for this purpose [33]."

On Lines 445 and 446, we have "The Friedman test revealed that the classifiers yield
statistically different results (p<0.05)."

This is a bit confusing. Please explain.

8 - On Section 5 - conclusions. The item number 5 (from line 457 to 460) seems to belong to future work.
Thus, I suggest to place it on the following paragraph.

 

Some comments on writing

1 - Please do not use acronyms in the title (Applied Sciences is not devoted specifically to the paper topic).
Please use inertial measurement unit instead of IMU

2 - The choice of the terms "parametrised" and "non-parametrised" may not be the best choice.
A suggestion: On the entire manuscript, I would replace "parametrised" -> "feature based" and "non-parametrised" -> direct time domain

Notice that you are not putting parameters on the signal. In the first method, you use a standard pattern recognition
approach, by performing feature extraction from the input signal. On the second method, by the description on the paper
you use the direct time domain samples acquired from the IMU.

3 - On the entire manuscript it seems to have a mixture of US English and UK English.
Here are some examples:

UK
On the title, we have "parametrised"
Line 231, we have "non-parametrised and parametrised"

US
Line 98, we have "organized"
Line 214, we have "non-parametrized"

4 - Abstract
Line 15. Please change
Two approaches to recognizing
->
Two approaches to recognize

Line 24. Please change
for a Support Vector Machine (SVM) classifier
->
for Support Vector Machines (SVM) classifier

Line 25
Marginally worse results,
->
Slightly worse results,

Line 26
Achieved high recognition rates
->
The achieved high recognition rates


5 - Section 1
Line 35 and 36
Human-Computer Interfaces (HCIs)
->
Human-Computer Interfaces (HCI)

Line 69
In [17] a system that can be
->
In [17], a system that can be

Line 82
very recent work [20], IMUs were applied
->
very recent work [20], IMU were applied

Line 85
In our study we propose a hands-free
->
In our study, we propose a hands-free

Line 91
reported in [21], in which IMUs were used
->
reported in [21], in which IMU were used

Line 93
In our work we show
->
In our work, we show

Line 98
The rest of this paper is organized as follows.
->
The remainder of this paper is organized as follows.

This sentence on line 98 should start a new paragraph.

In Section 2 we describe
->
In Section 2, we describe

Line 100
In Section 3 we present the head
->
In Section 3, we present the head


6 - Section 2
Line 104
we have applied a DUO MLX device equipped with a
->
we have applied a DUO MLX device (see Figure 1b) equipped with a

Line 111
After fixing the DUO MLX device on their forehead the users
->
After fixing the DUO MLX device on their forehead, the users

Line 113
During collection of the data the participants remained
->
During collection of the data, the participants remained


7 - Section 2.1
Line 135
yaw, roll, pitch and immobility.
->
yaw, roll, pitch, and immobility.

Line 146
We have "testing datasets were recorded:" after "testing scenarios applied:" on line 142.
Please do not use ":" inside ":" because it gets confusing.

Line 154
1. With signal parameterisation (SP): for each of the six
->
1. With signal parameterisation (SP) - for each of the six

Again, please do not use ":" inside ":" because it gets confusing.

Line 181
1. For the SP procedure: set with 42-element vectors.
->
1. For the SP procedure - set with 42-element vectors.

Again, please do not use ":" inside ":" because it gets confusing.

Line 184
2. For the NSP procedure: set of vectors each with 6 signal samples.
->
2. For the NSP procedure - set of vectors each with 6 signal samples.


8 - Section 2.2.3
Line 207
The k-Nearest Neighbour Classifier (k-NN) another non-parametric data classification
->
The k-Nearest Neighbour Classifier (k-NN) is another non-parametric data classification

Line 214
for the non-parametrized case (NSP). In all the tested k-NN classifiers the Euclidean distance was
->
for the non-parametrized case (NSP). In all the tested k-NN classifiers, the Euclidean distance was


9 - Section 2.2.4
Line 220
called support vectors.
->
called the support vectors.

Line 221 and 222
for the multiclass problem a scheme in which one-vs-rest classification approach was adopted.
->
for the multiclass problem, a scheme in which the one-vs-rest classification approach was adopted.

Line 224
types of kernels, an RBF kernel and
->
types of kernels, a RBF kernel and

Line 226
Thus we present
->
Thus, we present

Line 239 and 240
Tables 1 and 2 present the results of 10-fold cross-validation training for the SP and NSP training procedures correspondingly.
->
Tables 1 and 2 report the results of the 10-fold cross-validation for the SP and NSP training procedures, respectively.


10 - Section 3
Line 239
The 10-fold cross validation method was used to evaluate classifiers’ performance
->
The 10-fold cross validation method was used to evaluate the classifiers’ performance


11 - Section 3.1
Line 280
slightly worse than achieved by the random forest classifier
->
slightly worse than the one achieved by the random forest classifier

12 - Caption of Figure 5, 6, 7, and 8
The comparison of the classifiers’...
->
Comparison of the classifiers’...

trained classifiers on T_53_set and
->
trained classifiers on the T_53_set and

13 - Section 3.4
Line 327
We also verified how the classifiers’ accuracy depended on the type
->
We also verified how the classifiers’ accuracy depends on the type

Line 330
For this testing scenario the accuracy o
->
For this testing scenario, the accuracy o

Line 331
classifiers were the SVMs and the random forests.
->
classifiers were the SVM and the random forests.

Line 348
for the 12 trial participants who did not take part in
->
for the 12 trial participants who did not took part in

Line 361
Note that F1-score, sensitivity and specify rates are similar
->
Note that F1-score, sensitivity and specificity rates are similar

Lines 363 and 364
Please check on the font size of the caption of Table 5.

Line 382
for training the classifiers are presented in Table 7:
->
for training the classifiers are presented in Table 7.


14 - Section 4
Line 400
In [26] we have shown
->
In [26], we have shown

Line 408
and select the setting appropriate for the encountered
->
and select the appropriate setting for the encountered

Line 415
we obtained a comparable F1-score of 0.92% for the
->
we obtained a comparable F1-score of 92% for the


15 - Section 5
Line 450
the IMU signal samples it achieved the recognition accuracy better than 95%.
->
the IMU signal samples it achieved a recognition accuracy above 95%.

Line 458 and 459
with every arriving IMU signal sample
->
with each arriving IMU signal sample

 

Author Response

Thank you very much for a positive evaluation of our research efforts.

 

I have technical comments as well as some comments on the writing.

Some technical comments

T1 - On Section 2.1, on item 2, the total time of the recording is 20+40+20+40+20+40+20=200 seconds.
However, in section 2.1, in line 171, the authors write that the total recording time is 150 seconds.
Please check on this.

Response T1: In the conducted experiments we made two types of recordings: the training data T1 and the test data T2. The length of each training recording was about 150 s, as noted in line 178, whereas the test recordings took 200 s, as described in lines 150-152.

T2 - Line 205. We have "Random forests are efficient classifiers for very large data sets."
Please add a proper literature reference for this statement.

Response T2: We added new literature items cited as [33, 34]:

[33] Random Forests. https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm. Accessed 31 May 2020

[34] Zakariah, Mohammed. Classification of large datasets using Random Forest Algorithm in various applications: Survey. International Journal of Engineering and Innovative Technology (IJEIT) 2014, 4, 189-198.

T3 - Section 2.2.3. Please add a proper literature reference to the KNN classifier.
I suggest this paper https://link.springer.com/article/10.1007/BF00153759

Response T3: Thank you for your suggestion. We included this reference, cited as item [35].

T4 - On the experimental results of Table 1 and Table 2, please add the standard deviation value along with the mean value.
Please highlight the best result in boldface.

Response T4: Corrected, standard deviations were computed and added.

T5 - Since the data taken from the NSP method yields the best results, I wonder if the feature extraction process is adequate,
considering the 7 features addressed. Since the feature-based representation consistently produces worse results, the reason for this may lie in the
feature extraction process. Those seven features may not be adequate choices to represent the data for this task. I suggest
that the authors look into this in future work.

Response T5: In our view, the chosen parameters were quite versatile, and we wanted to keep the dimension of the classification problem low. Nevertheless, we will consider this suggestion in our future work.

T6 - On Line 155 and 156 we have "the following statistical parameters were computed"
I would say the "the following features were extracted"

Response T6: Corrected.

T7 - Regarding the performed statistical tests

On Lines 255 and 256, we have "Because we consider more than two classifiers and our data do not
have a normal distribution, the Friedman test was used for this purpose [33]."

On Lines 445 and 446, we have "The Friedman test revealed that the classifiers yield
statistically different results (p<0.05)."

This is a bit confusing. Please explain.

Response T7: We are not sure whether we have understood you correctly. We chose the Friedman test because the data have a non-Gaussian distribution and more than two classifiers were considered. The test confirmed that the differences between the classifiers’ results are statistically significant.
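Response T7 concerns the Friedman test for comparing more than two classifiers over the same cross-validation folds. The sketch below shows how the test statistic is computed; the fold accuracies are illustrative placeholders, not the paper's actual results.

```python
# Friedman test sketch: compare k classifiers over the same n CV folds.
# The accuracy values below are illustrative, not the paper's results.

def friedman_statistic(scores):
    """scores: one list of per-fold scores per classifier (equal lengths)."""
    k = len(scores)           # number of classifiers (treatments)
    n = len(scores[0])        # number of folds (blocks)
    rank_sums = [0.0] * k
    for fold in range(n):
        fold_scores = [s[fold] for s in scores]
        for j, v in enumerate(fold_scores):
            smaller = sum(1 for w in fold_scores if w < v)
            equal = sum(1 for w in fold_scores if w == v)
            rank_sums[j] += smaller + (equal + 1) / 2.0  # average rank on ties
    # Chi-square distributed with k-1 degrees of freedom under H0
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

svm = [0.96, 0.95, 0.97, 0.96, 0.95, 0.96, 0.97, 0.95, 0.96, 0.96]
rf  = [0.95, 0.94, 0.96, 0.95, 0.94, 0.95, 0.96, 0.94, 0.95, 0.95]
knn = [0.93, 0.92, 0.94, 0.93, 0.92, 0.93, 0.94, 0.92, 0.93, 0.93]

chi2 = friedman_statistic([svm, rf, knn])
print(chi2)  # 20.0 for these values; > 5.991 (chi-square, 2 dof) => p < 0.05
```

In practice a library routine such as `scipy.stats.friedmanchisquare` would be used; the point here is only to illustrate why the test applies when more than two classifiers are compared on non-normally distributed fold scores.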

T8 - On Section 5 - conclusions. The item number 5 (from line 457 to 460) seems to belong to future work.
Thus, I suggest to place it on the following paragraph.

Response T8: Thank you, we have moved this text fragment to the suggested place [lines: 489 - 492].

Some comments on writing

Response: We wish to thank the Reviewer for all these editorial and language corrections. We have updated the text according to all these suggestions.

W1 - Please do not use acronyms in the title (Applied Sciences is not devoted specifically to the paper topic).
Please use inertial measurement unit instead of IMU.

Response W1: Thank you for your suggestion. Corrected.

W2 - The choice of the terms "parametrised" and "non-parametrised" may not be the best choice.
A suggestion: On the entire manuscript, I would replace "parametrised" -> "feature based" and "non-parametrised" -> direct time domain

Notice that you are not putting parameters on the signal. In the first method, you use a standard pattern recognition
approach, by performing feature extraction from the input signal. On the second method, by the description on the paper
you use the direct time domain samples acquired from the IMU.

Response W2: Thank you, we have partly followed your suggestion: we have kept the term “parametrised”, whereas the term “non-parametrised” has been replaced with “time domain”. These changes are highlighted in the text.

Round 2

Reviewer 1 Report

I recognize the great effort from the authors to answer all the questions and issues raised by the reviewers. All my suggestions and questions were considered by the authors, and their answers are present in the new version of the paper. However, the recognition task still does not appear to be very challenging, which is why I maintain my impression that their contribution is "average". Perhaps a demonstration of their system in a real-time processing setup could better show the impact of their solution.

Author Response

I recognize the great effort from the authors to answer all the questions and issues raised by the reviewers. All my suggestions and questions were considered by the authors, and their answers are present in the new version of the paper. However, the recognition task still does not appear to be very challenging, which is why I maintain my impression that their contribution is "average". Perhaps a demonstration of their system in a real-time processing setup could better show the impact of their solution.

Response: Thank you for your final comments. We will certainly pursue real-time functionality of the system, building on the foundation laid by the present study.

 

Reviewer 2 Report

I am somewhat satisfied with the authors' responses.

I also acknowledge the contribution of this paper in that it provides preliminary research results and experimental data that can be used by subsequent researchers.

 

Author Response

I am somewhat satisfied with the authors' responses.

I also acknowledge the contribution of this paper in that it provides preliminary research results and experimental data that can be used by subsequent researchers.

 

Response: Thank you for accepting our efforts to improve the manuscript. 

Reviewer 4 Report

Normalizing for head sizes is meant to make the features user-independent, not user-dependent as the authors mention in their response. One can imagine larger heads providing greater displacement data versus smaller heads. Not accounting for this will inherently introduce errors in the features which propagate through into the results. The authors should mention this as an additional limitation of the study.

Furthermore, there are no results testing the optimality of the functional form of the features. That is, there is no evidence that this set of 42 parameters represents the feature set which provides the best performance. To test this, various feature sets should be formed and evaluated.

The authors should acknowledge these drawbacks of the study in their manuscript. However, I still believe this work has some merit therefore I am happy for this paper to be accepted once the study's limitations are more clearly highlighted in the paper.

 

Author Response

Point 1: Normalizing for head sizes is meant to make the features user-independent, not user-dependent as the authors mention in their response. One can imagine larger heads providing greater displacement data versus smaller heads. Not accounting for this will inherently introduce errors in the features which propagate through into the results. The authors should mention this as an additional limitation of the study.

Response 1: We mean a user-independent system that does not require any user-specific measurements or system tuning. Normalizing for head sizes would require individual anthropometric measurements.  

However, we agree that we might normalize the range of signal amplitudes. In such a case, we might improve system performance even further while still keeping the system user-independent.

Point 2: Furthermore, there are no results testing the optimality of the functional form of the features. That is, there is no evidence that this set of 42 parameters represents the feature set which provides the best performance. To test this, various feature sets should be formed and evaluated.

Response 2: Yes, we agree that we have not carried out extensive exploration and testing of various possible signal features, which would probably further improve the recognition performance of the system.

We have commented on this limitation of the system in the Discussion section of the manuscript by inserting the following text [lines 452-455]:

“Also, we should underline that our choice of IMU signal parameters, e.g. statistical parameters, was arbitrary and one might hypothesise that a different set of parameters might be composed that would even further improve the recognition performance of the head gestures.”

Point 3: The authors should acknowledge these drawbacks of the study in their manuscript. However, I still believe this work has some merit therefore I am happy for this paper to be accepted once the study's limitations are more clearly highlighted in the paper.

Response 3: Thank you again for your final kind comments about our work.
