Technical Note
Peer-Review Record

Heart ID: Human Identification Based on Radar Micro-Doppler Signatures of the Heart Using Deep Learning

Remote Sens. 2019, 11(10), 1220; https://doi.org/10.3390/rs11101220
by Peibei Cao, Weijie Xia * and Yi Li
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 1 April 2019 / Revised: 15 May 2019 / Accepted: 16 May 2019 / Published: 23 May 2019

Round 1

Reviewer 1 Report

The paper introduces human identification through analysis of the heartbeat in micro-Doppler signatures. The paper would need some improvement before being ready for publication. 


1) The authors should explain if the subjects were breathing or not when the data were acquired.

2) The authors claim that the slide step for the STFT is 1/2000 seconds, but the sampling frequency is 1 kHz, which means the sliding step is only half a sample. Is this correct?

3) Sections 3.2 and 3.3 can be removed and probably replaced with references. 


Author Response

Response to Reviewer 1 Comments


Point 1: The authors should explain if the subjects were breathing or not when the data were acquired.

Response 1: Thanks for your recommendation. When the data were acquired, the subjects held their breath. Details are in the newly submitted manuscript (line 69).


Point 2: The authors claim that the slide step for the STFT is 1/2000 seconds, but the sampling frequency is 1kHz, it means that the sliding step is only half a sample. Is it correct?

 

Response 2: We are very grateful for your comment. Actually, the sampling rate we set on the acquisition hardware was 50 kHz, and the data were then downsampled during processing. We confirmed in the program that the downsampling factor was 25, so the effective sampling rate was 2 kHz. We apologize for this mistake. Details are in the newly submitted manuscript (lines 64, 83-84).
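The relationship between the acquisition rate, the downsampling factor, and the effective rate described above can be sketched as follows. This is a minimal Python illustration; the cosine test signal and the two-stage decimation are our assumptions, not the authors' actual processing chain.

```python
import numpy as np
from scipy.signal import decimate

fs_raw = 50_000        # sampling rate set on the acquisition hardware (Hz)
factor = 25            # overall downsampling factor used during processing
fs = fs_raw // factor  # effective sampling rate: 2000 Hz

# Hypothetical one-second raw return; real data comes from the acquisition card.
t = np.arange(fs_raw) / fs_raw
raw = np.cos(2 * np.pi * 240 * t)

# decimate() low-pass filters before downsampling to avoid aliasing;
# SciPy recommends staging factors above 13, so we decimate by 5 twice.
slow = decimate(decimate(raw, 5), 5)
```

At the resulting 2 kHz rate, a 1/2000 s slide step corresponds to exactly one sample, which resolves the reviewer's concern.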

 

Point 3: Sections 3.2 and 3.3 can be removed and probably replaced with references.

 

Response 3: Thanks for the Reviewer’s recommendation. However, Reviewers 2 and 3 requested a more detailed description of the conventional supervised learning methods, so we have expanded that description instead. Details are in the newly submitted manuscript (Section 3.2).

 


Author Response File: Author Response.pdf

Reviewer 2 Report

Nice and well-prepared experiment. However, I have doubts about the technique applied. The authors use an RF radar working at 24 GHz. What is the skin depth of the RF wave with respect to the human body? How deeply does the radar penetrate the human body? Are the Doppler signals related to micro-movements of the human thorax rather than the heart muscle inside? What is the influence of the clothes a person is wearing? What will the radar detect in rain? What are the signals at frequencies of ~240 Hz, ~380 Hz, ~360 Hz, and ~200 Hz in the spectrograms of Fig. 2 (a, b, c, d respectively) related to? What kind of data were used for the non-neural-network methods? The authors just describe the methods without providing details. Since the DCNN is used as an image classifier, what data are used for the other methods? The authors claim that these use a much smaller amount of data; I think a more detailed description of the data used for the SVM- and Bayes-based methods should be provided.

Author Response

Response to Reviewer 2 Comments

 

 

 

Point 1: Authors utilize RF radar working at 24GHz frequency. What is skin depth of RF wave in confrontation to human body? How deep radar is penetrating human body?

 

Response 1: Thank you for your comments. We performed an experiment on the penetrability of the radar in the laboratory of the College of Electronic and Information Engineering on our campus, as shown in Fig. 1. During the experiment, a steel plate was moved back and forth by a rope, and pork was used as a substitute for human tissue. The pork samples are shown in Fig. 2; their thicknesses are 9 mm and 13 mm respectively, with a measurement error of about two millimeters. The pork was placed between the radar and the swinging steel plate. When there was no subject in front of the radar, no micro-Doppler signal could be observed, as shown in Fig. 3(a). With no pork between the radar and the plate, the radar received the reflection from the steel plate, as shown in Fig. 3(b). We then repeated the experiment with pork A alone, and with pork A and B together, between the radar and the plate. Fig. 3(c) shows the result with pork A alone: the micro-Doppler effect is still obvious, indicating that the shielding effect of pork A is weak and most of the signal penetrates. Fig. 3(d) shows the result with pork A and B together: the micro-Doppler effect is clearly weakened, indicating a stronger shielding effect. In conclusion, as the thickness increases, less signal reaches the radar. Considering the thickness of the human thorax, the signal we receive comes from the heart. Details are in the newly submitted manuscript (Section 2.3).

Fig.1 Setup of the experiment on the penetrability of radar.

Fig.2 The pork we employed in the experiment.


Fig.3 Time-frequency graphs for different conditions. (a) No subjects before radar; (b) Only steel plate before radar; (c) Pork A between radar and steel plate; (d) Pork A and B between radar and steel plate.

 

Point 2: Are the Doppler signals related to micro-movements of the human thorax rather than the heart muscle inside?

 

Response 2: We are sorry that we did not explain this clearly. When the data were acquired, the subjects held their breath, so the micro-motions do not come from the thorax.

 

Point 3: What is the influence of the clothes a person is wearing?

 

Response 3: We are very grateful for your comment. First of all, the clothes the subjects wore were made of cotton or linen, with no metal on them. In addition, the subjects held their breath during data collection and made no extra movement, so the clothes did not move and therefore produced no micro-motion. Furthermore, each person's data was collected over several days, which further reduced the influence of clothing on recognition.

 

Point 4: What will the radar detect in rain?

 

Response 4: We are very grateful for your comments. We are sorry that, due to the limitation of our equipment, we cannot collect data in heavy rain, as it might damage the acquisition card and the ACME system. However, we have consulted the literature on the effect of rainfall on radar. The figure below, provided by the International Radio Consultative Committee (CCIR), shows the attenuation of the radar signal under different conditions. Since in our experiments the subjects sat about 1.5 meters in front of the radar, the attenuation caused by rain is very small.

 

 

Point 5: What are the signals at frequencies of ~240 Hz, ~380 Hz, ~360 Hz, and ~200 Hz in the spectrograms of Fig. 2 (a, b, c, d respectively) related to?

 

Response 5: We are sorry that we did not explain this clearly. These components correspond to the micro-Doppler of the heartbeat.

 

Point 6: What kind of data were taken for the non-neural-network methods? The authors just describe the methods without providing details. Since the DCNN is used as an image classifier, what data are used for the other methods? The authors claim that these use a much smaller amount of data; I think a more detailed description of the data used for the SVM- and Bayes-based methods should be provided.

 

Response 6: The input data of the non-neural-network methods are features extracted from the raw radar signals, namely:

(1) The period of the heartbeat;

(2) The energy of the heartbeat;

(3) The bandwidth of the Doppler signal.

80% of the extracted features for each subject are used for training, and the remaining 20% for testing.
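The split described above can be sketched as follows. This is a minimal Python illustration with random stand-in data: the feature values, the number of segments, and the four-subject label set are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature matrix: one row per heartbeat segment, columns =
# [heartbeat period (s), heartbeat energy, Doppler bandwidth (Hz)].
n_segments = 100
features = rng.normal(size=(n_segments, 3))
labels = rng.integers(0, 4, size=n_segments)   # stand-in labels for 4 subjects

# Random 80/20 train/test split, as described in the response.
idx = rng.permutation(n_segments)
split = int(0.8 * n_segments)
train_idx, test_idx = idx[:split], idx[split:]

X_train, y_train = features[train_idx], labels[train_idx]
X_test, y_test = features[test_idx], labels[test_idx]
```

The resulting arrays would then be fed to a classifier such as SVM or naive Bayes in place of the spectrogram images used by the DCNN.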

 


Author Response File: Author Response.pdf

Reviewer 3 Report

The manuscript "Heart ID: Human Identification Based on Radar Micro-Doppler Signatures of Heart Using Deep Learning" proposes a successful combination of three different fields of research. The idea to use Doppler radar for monitoring the heartbeat, however, may not be new. A tiny web search gave this: https://sctracy.github.io/chensong.github.io/publication/mobicom17/. I guess there is much more literature on this topic, which should be referenced. For me the topic of a unique heartbeat signal is new, and I think it will be for many of the "Remote Sensing"-readers.

It occurs that the authors are experts in the field of radar and therefore all details of the measurement (penetration of waves through skin and tissue) are missing here, because they are clear to them. A detailed description of the experiment setup is missing (including a discussion of possible side effects).

The main part of the manuscript is dedicated to the different methods used to classify the data. In all four sub-chapters the description of the algorithms is held too general. Please reference a paper or textbook and describe the methods graphically with respect to the application to your data.

The results chapter misses some very important basics, when it comes to machine learning: the training and test data sets are only briefly described and both a little small for very good results, as is shown with the accuracy going down when more people are involved. It is not clear to me how the data sets for the different groups of people were treated: did you train one network and test with different groups, or did every group get its own specifically trained network? This is crucial for the assessment of the validity of results. What is also missing are the two important quality measures of "precision" and "recall". Depending on the input data, a network can reach 99% accuracy, but still fail.

My main issue with this study is that the focus is on comparing different machine learning methods, one deep-learning method (DCNN) and three feature based more basic machine learning methods. In line 302, the authors state that the "DCNN performs better than the conventional supervised learning methods", which for me is no wonder at all. In my opinion this is comparing apples to oranges, because the DCNN naturally has many more degrees of freedom to fit the data than the other methods. In addition, the other methods use a reduced data set of extracted features, which might not at all be the relevant features for human identification. One feature is heart beat rate, which may change during measurement (see Fig. 2b) another is beat energy, which also may change (see Fig. 2a). In lines 261-262 the authors state that a loss tending to 0 indicates a not-overfit network. This statement is a little awkward as the training algorithm is supposed to minimize the loss. If during training the loss becomes very small AND during testing the accuracy gets worse with a larger group of people (whatever this exactly meant in this context) indicates overfit, i.e. the network is able to replicate the training data, but with unknown data it cannot correctly classify results.

The last part of the results and discussion chapter deals with a completely new topic of noise resistance. It is completely unclear, what was done here, because "different level of noise" is not explained.

The manuscript needs substantial language editing. In passages I could not follow the argumentation or explanations because I did not understand the sentences. There are many sentence fragments and unexplained acronyms and symbols. The captions to the figures are too short and any symbol explanation is missing. Some fragments of caption-like sentences appear in the main text but e.g. the explanation of figure 2, which shows the data, is too scarce to help anybody not familiar to time-lines of Doppler radar data to understand it.



Author Response

Response to Reviewer 3 Comments

 

Point 1: The idea to use Doppler radar for monitoring the heartbeat, however, may not be new. A tiny web search gave this: https://sctracy.github.io/chensong.github.io/publication/mobicom17/. I guess there is much more literature on this topic, which should be referenced. For me the topic of a unique heartbeat signal is new, and I think it will be for many of the "Remote Sensing"-readers.

 

Response 1: We are very grateful for your comments. We have read the literature on this topic and referenced it in the newly submitted manuscript (lines 39-41).

 

Point 2: It occurs that the authors are experts in the field of radar and therefore all details of the measurement (penetration of waves through skin and tissue) are missing here, because they are clear to them. A detailed description of the experiment setup is missing (including a discussion of possible side effects).

 

Response 2: Thank you for your comments. We performed an experiment on the penetrability of the radar in the laboratory of the College of Electronic and Information Engineering on our campus, as shown in Fig. 1. During the experiment, a steel plate was moved back and forth by a rope, and pork was used as a substitute for human tissue. The pork samples are shown in Fig. 2; their thicknesses are 9 mm and 13 mm respectively, with a measurement error of about two millimeters. The pork was placed between the radar and the swinging steel plate. When there was no subject in front of the radar, no micro-Doppler signal could be observed, as shown in Fig. 3(a). With no pork between the radar and the plate, the radar received the reflection from the steel plate, as shown in Fig. 3(b). We then repeated the experiment with pork A alone, and with pork A and B together, between the radar and the plate. Fig. 3(c) shows the result with pork A alone: the micro-Doppler effect is still obvious, indicating that the shielding effect of pork A is weak and most of the signal penetrates. Fig. 3(d) shows the result with pork A and B together: the micro-Doppler effect is clearly weakened, indicating a stronger shielding effect. In conclusion, as the thickness increases, less signal reaches the radar. Considering the thickness of the human thorax, the signal we receive comes from the heart. Details are in the newly submitted manuscript (Section 2.3).

Fig.1 Setup of the experiment on the penetrability of radar.

Fig.2 The pork we employed in the experiment.


Fig.3 Time-frequency graphs for different conditions. (a) No subjects before radar; (b) Only steel plate before radar; (c) Pork A between radar and steel plate; (d) Pork A and B between radar and steel plate.

 

The system used to collect the experimental micro-Doppler signatures of targets consists of an IVS-179 radar, an M2i.4912 eight-channel parallel data acquisition card, and an ACME industrial portable computer. The radar is connected to the acquisition card, which in turn is connected to the computer. Details are given in the newly submitted manuscript (Sections 1, 2.1, and 2.3).

 

Point 3: The main part of the manuscript is dedicated to the different methods used to classify the data. In all four sub-chapters the description of the algorithms is held too general. Please reference a paper or textbook and describe the methods graphically with respect to the application to your data.

 

Response 3: Thank you for your careful reading of our manuscript, and we are sorry that we did not explain this clearly. We have added further explanation of the SVM. Details are in the newly submitted manuscript (Section 3.2).

 

Point 4: The results chapter misses some very important basics, when it comes to machine learning: the training and test data sets are only briefly described and both a little small for very good results, as is shown with the accuracy going down when more people are involved. It is not clear to me, how the data set for the different groups of people were treated: did you train one network and test with different groups or did every group get its one specifically trained network. This is crucial for the assessment of the validity of results. What is also missing are the two important quality measures of "precision" and "recall". Depending on the input data, a network can reach 99% accuracy, but still fail.

 

Response 4: We are very grateful for your comments. For the DCNN, after data augmentation there are 4000 time-frequency graphs for each subject; 80% of the graphs are used to train the network and the remaining 20% to test its stability. For the traditional machine-learning algorithms, the original radar signal is used directly for feature extraction; 80% of the extracted features for each subject are used for training and the remaining 20% for testing. Unlike with the DCNN, we do not augment the dataset for these methods.
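The augmentation step mentioned above can be sketched as follows. The specific transformations here (a small circular time shift plus amplitude scaling) are hypothetical, since this record does not list the exact image transformations the authors used.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(spec, rng):
    """Apply hypothetical basic transformations to a time-frequency image:
    a small shift along the time axis and a slight amplitude rescaling."""
    shift = int(rng.integers(-5, 6))          # shift by up to 5 time bins
    scaled = spec * rng.uniform(0.9, 1.1)     # +/- 10% amplitude scaling
    return np.roll(scaled, shift, axis=1)     # circular shift on time axis

spec = rng.random((64, 64))                   # stand-in spectrogram image
augmented = [augment(spec, rng) for _ in range(10)]
```

Repeating such transformations per subject is one way to expand a spectrogram set toward the 4000 images per subject quoted in the response.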

Every group gets its own specifically trained network.

Since “precision” and “recall” can be derived from the confusion matrix, we did not report them separately.

 

Point 5: My main issue with this study is that the focus is on comparing different machine learning methods, one deep-learning method (DCNN) and three feature based more basic machine learning methods. In line 302, the authors state that the "DCNN performs better than the conventional supervised learning methods", which for me is no wonder at all. In my opinion this is comparing apples to oranges, because the DCNN naturally has many more degrees of freedom to fit the data than the other methods. In addition, the other methods use a reduced data set of extracted features, which might not at all be the relevant features for human identification. One feature is heart beat rate, which may change during measurement (see Fig. 2b) another is beat energy, which also may change (see Fig. 2a). In lines 261-262 the authors state that a loss tending to 0 indicates a not-overfit network. This statement is a little awkward as the training algorithm is supposed to minimize the loss. If during training the loss becomes very small AND during testing the accuracy gets worse with a larger group of people (whatever this exactly meant in this context) indicates overfit, i.e. the network is able to replicate the training data, but with unknown data it cannot correctly classify results.   

 

Response 5: Thank you for your careful reading of our manuscript. The choice of features is discussed in Section 2.2. It is true that the conventional supervised learning methods rely on the extracted features, which requires the operator to have domain knowledge of each problem, and it is also true that the features we chose are not perfect. The heartbeat rate and beat energy can change, but we believe these two features of a person do not change over a short time without stimulation. Moreover, the choice of features can always be refined to improve the accuracy, so this needs further research.

As for overfitting of the network, we give the loss values for both training and testing; if the network overfits, the testing loss will not approach 0.

 

Point 6: The last part of the results and discussion chapter deals with a completely new topic of noise resistance. It is completely unclear, what was done here, because "different level of noise" is not explained.

 

Response 6: We are sorry that we did not explain this clearly. We added different levels of random noise to the raw radar signals collected in Section 2, and then performed the STFT on, or extracted features from, the noisy signals. Details are in the newly submitted manuscript (Section 4.3).
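The procedure described in this response (add noise to the raw signal, then process it) can be sketched as follows. The SNR-based noise model, the target level of 10 dB, and the cosine test signal are all our assumptions, since the record does not specify the noise levels used.

```python
import numpy as np

def add_noise(signal, snr_db, rng=None):
    """Add white Gaussian noise so the result has the requested SNR in dB."""
    if rng is None:
        rng = np.random.default_rng()
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return signal + rng.normal(scale=np.sqrt(p_noise), size=signal.shape)

fs = 2000
t = np.arange(fs) / fs
clean = np.cos(2 * np.pi * 240 * t)   # stand-in for a raw radar return
noisy = add_noise(clean, snr_db=10, rng=np.random.default_rng(0))
```

The noisy signal would then go through the same STFT or feature-extraction pipeline as the clean data, allowing accuracy to be measured at each noise level.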

 

Point 7: The manuscript needs substantial language editing. In passages I could not follow the argumentation or explanations because I did not understand the sentences. There are many sentence fragments and unexplained acronyms and symbols. The captions to the figures are too short and any symbol explanation is missing. Some fragments of caption-like sentences appear in the main text but e.g. the explanation of figure 2, which shows the data, is too scarce to help anybody not familiar to time-lines of Doppler radar data to understand it. 

 

Response 7: We are sorry for our vague expression. We have revised the language of the article and added explanations to the parts that were not clear.

 

Author Response File: Author Response.pdf

Reviewer 4 Report

This is an interesting manuscript that applies machine learning algorithms to radar gesture recognition. Following are some questions and comments:

1. Equation (2): what is the length of the Gaussian window? And why pick a Gaussian window, as it shows lower spectral resolution compared with other windows such as the Hamming window?

2. In this paper, the information of heart beats is analyzed to identify different people. But the heartbeat signal is much weaker than respiration and is often overwhelmed by thoracic motions. Then how do you get such a clean Doppler signal as shown in Fig. 2?

3. In equation (3), what is θ?

4. If I understand well, "SGD" and the "dropout operation" (Page 2, lines 125-130) are the new parts the authors added to the conventional DCNN. If so, I think the authors need to explain in detail how these two steps work.

5. Page 5, line 151: I think the authors are missing an operator or variable between "while" and "are".

6. What are the pros and cons of SVM and NB? And why do the authors use SVM as the first feature-extraction stage and NB as the interface?

7. In the results section, the authors explicitly explain the experimental procedure and parameters of the DCNN experiment, while barely mentioning this information for the SVM, NB, and SVM-NB fusion experiments. Did all of them employ the same test conditions (100 spectrograms/person, repeated 100 times, etc.)?

8. On page 9, line 278, the authors mention that SVM and NB require a smaller amount of data compared with the DCNN. Can the authors explain more about this?

9. In Section 4.1, the DCNN was tested in a much more complex condition involving up to 10 people (Fig. 8(d)), while SVM, NB, and SVM-NB were tested in a 4-person scenario. This is not a fair comparison.

10. In Fig. 9, the NB algorithm shows the lowest accuracy among all three methods. However, at the beginning of Section 3.3, the authors mention that NB "has the smallest misclassification rate" even in "many conditions which are opposite to the assumption". Can the authors explain this disagreement?

11. The fusion SVM-NB algorithm does not significantly improve the accuracy over the SVM-alone algorithm. Is there any other advantage to implementing this fusion algorithm?

12. One disadvantage of the DCNN is that it is time-consuming, so I think it is necessary to put the time cost in Table I. Otherwise, the DCNN appears superior in all other respects.

Author Response

Response to Reviewer 4 Comments

 

Point 1: Equation (2), what is the length of the Gaussian window? And why pick Gaussian window as it shows lower spectrum resolution compared with other windows such as Hamming window.

 

Response 1: We are very grateful for your comments. The length of the Gaussian window is 263/2000 s (263 samples at the 2 kHz sampling rate). When choosing the window function, we tried both the Hamming window and the Gaussian window and found that, in this case, the results of the two windows were not noticeably different, so we chose the Gaussian window.
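A 263/2000 s window at a 2 kHz sampling rate is 263 samples. A minimal sketch of such an STFT follows; the Gaussian window's standard deviation and the 240 Hz test tone are our assumptions, as the record only states the window length.

```python
import numpy as np
from scipy.signal import stft, get_window

fs = 2000        # effective sampling rate (Hz)
win_len = 263    # 263 samples = 263/2000 s, as stated in the response

t = np.arange(2 * fs) / fs
sig = np.cos(2 * np.pi * 240 * t)   # stand-in heartbeat micro-Doppler tone

# Gaussian window; the standard deviation (win_len / 6) is an assumption.
window = get_window(("gaussian", win_len / 6), win_len)
f, frames, Z = stft(sig, fs=fs, window=window, nperseg=win_len)

# The strongest frequency bin should sit near the 240 Hz tone.
peak_hz = f[np.argmax(np.abs(Z).mean(axis=1))]
```

With this window length, the frequency resolution is fs/nperseg ≈ 7.6 Hz, which is why the observed effect of swapping Gaussian for Hamming can be small at this scale.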

 

Point 2: In this paper, the information of heart beats is analyzed to identify different people. But heart beat signal is much weaker than respiration and more often is overwhelmed by thoracic motions. Then how to get such clean Doppler signal shown in Fig. 2?

 

Response 2: Thank you for your comments. Actually, when the data were acquired, the subjects held their breath, so the heartbeat is not overwhelmed by thoracic motions.

 

Point 3: In equation (3), what is θ?

 

Response 3: Thank you for your careful reading of our manuscript. θ represents the parameters of the training model.

 

Point 4: If understand well, “SGD” and “dropout operation” (Page 2 line 125 -130) are the new part author added to conventional DCNN. If so, I think author need to explain in detail on how these two steps work.

 

Response 4: We are sorry that we did not explain this clearly. The “SGD” and “dropout” operations are part of the original AlexNet design rather than our additions.

 

Point 5: Page 5, line 151, I think authors miss operator or variant between “while” and “are”.   

 

Response 5: We apologize for our carelessness. The missing variable between “while” and “are” is …, and we have added it to the manuscript. Details are in the newly submitted manuscript (line 193).

 

Point 6: What are the pros and cons of SVM and NB? And why authors use SVM as the 1st feature extraction and NB as the interface.

 

Response 6: Thank you for your careful reading of our manuscript. The advantage of SVM and NB over the DCNN is that they can achieve satisfying results with less data and less time. The drawback of SVM is that it is difficult to apply to large-scale training samples. The drawbacks of NB are as follows. First, the conditional-independence assumption often does not hold, and the result will not be satisfying when the attributes are closely related to each other. Second, a prior probability is needed, which is derived from a hypothesis; there are many possible hypothesized models, so the results can be unsatisfying due to the drawbacks of the chosen model.

The reason we use SVM as the first feature-extraction stage and NB as the decision interface is that, in Bayesian reasoning, when no empirical data are available, subjective probabilities can replace the prior probability and likelihood functions of the hypothesized event, which makes Bayesian inference very suitable for the decision level of a fusion algorithm.

 

Point 7: In result section, authors explicitly explained the experimental procedure and parameters of DCNN experiment; while barely mention these information in SVN, NB and SVN-NB fusion experiments. Did all of them employ the same test condition, 100 spectrograms/person, repeated 100 times, etc.?   

 

Response 7: We are sorry that we did not explain this clearly. All of these methods use the same dataset. However, for the DCNN we use time-frequency graphs, whereas for the traditional machine-learning algorithms 2/3 of the raw radar signal is used directly for feature extraction; 80% of the extracted features for each subject (rather than spectrograms) are used for training, and the remaining 20% for testing.

 

Point 8: In Page9, line 278, authors mention that SVN and NB require less amount of data compared with DCNN. Can author explain more about it?   

 

Response 8: We are sorry that we did not explain this clearly. It is well known that a DCNN needs plenty of data, so we used some basic image transformations to augment the dataset. For SVM and NB, only 2/3 of the raw radar signal was used for recognition. Therefore, compared with SVM and NB, the DCNN needs more data.

 

Point 9: Section 4.1 DCNN was tested in a much more complex condition which involves up to 10 people (Fig. 8 (d)), while SVM, NB and SVM-NB were tested for 4 people scenario. It is not a fair comparison.  

 

Response 9: We are very grateful for your comments. We have analysed the accuracy of the four algorithms, as shown in the figure below. As can be observed, recognition accuracy decreases as the number of people increases. The DCNN performs best among the four algorithms. The accuracy of the SVM-Bayes fusion algorithm is slightly higher than that of SVM, but as the number of people increases it drops more quickly than SVM's. Details are in the newly submitted manuscript (Section 4.3).

Fig. The impact of the number of people on the four classification algorithms.

 

Point 10: In Fig. 9, NB algorithm shows the lowest accuracy among all three methods. However, in the beginning of section 3.3, author mention that NV “has the smallest misclassification rate” even in “many conditions which are opposite to the assumption”. Can author explain this disagreement?  

 

Response 10: Thank you for your comments. The first statement means that, among the conditions of the same method, the misclassification rate of NB reaches its minimum when the conditional-independence assumption is satisfied. The second statement means that NB generalizes to some extent and can still be used in practice.

 

Point 11: The fusion SVN-NB algorithm doesn’t significantly improve the accuracy to SVN alone algorithm. Is there any other advantage of implementing this fusion algorithm?  

 

Response 11: Thank you for your careful reading of our manuscript. The accuracy of the fusion algorithm is higher than that of SVM alone, which shows that traditional algorithms can be improved through fusion; this can be a research direction in the future.

 

Point 12: One disadvantage of DCNN is time-consuming. So I think it is necessary to put time cost in table I. Otherwise, DCNN shows superior in all other performances.   

 

Response 12: We are very grateful for your comments. We have provided the training time and the per-subject identification time of the four algorithms in the table below. Training time is the time needed to train in the four-person condition, while identification time is the time needed to identify one spectrogram or one set of features. From the table it can be observed that the DCNN takes much longer to train, while the identification times of the four methods are similar. Details are in the newly submitted manuscript (Section 4.3).

Table. Comparison of the results of DCNN, SVM-Bayes, SVM, and NB.

Method       Accuracy   Training time   Identification time
DCNN         98.5%      12 min          1.539 s
SVM-Bayes    91.25%     4.267 s         0.771 s
SVM          88.75%     1.982 s         0.643 s
NB           80.75%     1.674 s         0.558 s

 


Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Dear Authors,

thank you for this very interesting study and for the various clarifications you inserted in the revised version of the manuscript. Several questions, which remained open after my first read are now answered. However, I do have some more comments, which mainly deal with language issues. As I pointed out in my last review, I had trouble understanding several passages and sentences due to some unconventional formulations. I attached a pdf with comments and suggestions and hope that you find them stimulating.

I still have one major issue with this study. The main outcome is somewhat unsatisfying in that all algorithms perform worse with an increasing number of test persons. This is very bad, as there are billions of people on this planet and identifying them is (at least in your introduction) the key of your research. Could you please compare your results to other methods like video? Is the accuracy of other methods better or worse? Maybe you can discuss different scenarios: identifying one specific individual from a large group vs. identifying each person in that group. This is clearly all given as numbers in the confusion matrices, but it would be good to have this as a small conclusion in words.


Comments for author File: Comments.pdf

Author Response

Point 1: Your acquisition rate is 50Hz. How can you choose a step size much smaller?

 

Response 1: Thank you for your comments. However, our acquisition rate is 50 kHz, as stated in Section 2.1.

 

Point 2: What is the color? What unit? Are the colors normalized to the maximum? What are the horizontal lines in each picture (e.g. at +200 and -200 Hz in (d))?

 

Response 2: We are very grateful for your comments. The energy can be read from the color of the picture: for example, blue indicates less energy than red. The unit is W/Hz, and the colors are normalized to the maximum. The horizontal lines in each picture represent the micro-Doppler frequency components.

 

Point 3: I still have one major issue with this study. The main outcome is somewhat unsatisfying in that all algorithms perform worse with an increasing number of test persons. This is very bad, as there are billions of people on this planet and identifying them is (at least in your introduction) the key of your research. Could you please compare your results to other methods like video? Is the accuracy of other methods better or worse? Maybe you can discuss different scenarios: identifying one specific individual from a large group vs. identifying each person in that group. This is clearly all given as numbers in the confusion matrices, but it would be good to have this as a small conclusion in words.

 

Response 3: We are very grateful for your comments. The accuracy decreases as the number of people increases; this limitation is shared by many methods and should be studied in the future. We have compared our results with other methods, including image-, WiFi-, and footstep-based ones. When we test a method such as the DCNN, we randomly select 100 spectrograms for each person from the test data, and the network must verify each picture, which we think is similar to identifying one specific individual from a large group. Details are in the newly submitted manuscript (Section 4.4).


Author Response File: Author Response.pdf

Reviewer 4 Report

Authors have addressed my questions and comments explicitly. I think this paper is good to go.

Author Response

Point 1: Authors have addressed my questions and comments explicitly. I think this paper is good to go.

 

Response 1: We are very grateful for your recommendation. Some modifications have been made to the language and the conclusion. Details are in the newly submitted manuscript.


Author Response File: Author Response.pdf
