Emotion Estimation Using Noncontact Environmental Sensing with Machine and Deep Learning Models
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The authors presented an original and interesting method for assessing the level of emotional arousal using a non-contact environmental sensor. The method consists in analyzing a facial image using artificial intelligence - machine learning and deep learning. Their accuracy in assessing the emotional state was studied. The authors' results show a fairly good efficiency of the developed approaches for assessing the emotional state based on environmental data.
- What is the main question of the study?
The authors presented an original and interesting method for assessing the level of emotional arousal using a contactless environmental sensor. They used a set of experimental environmental data and developed artificial intelligence models (Random Forest, Gradient Boosting Decision Trees, CNN-LSTM) to analyze such data.
- Do you think the topic is original or relevant to the industry? Does it address a specific gap in the industry? Please also explain why it does or does not.
The topic of the article is relevant due to the various applications of emotional AI methods in the tasks of assessing human behavior and privacy and security issues, and expands the range of tasks that can be effectively researched. Publications on this topic have been quite popular in recent years, and the authors point this out in the literature review.
- What does this add to the subject area compared to other published material?
The original feature of the paper is that the authors prepared the experimental equipment and conducted the experiments to collect the data. The use of the prepared datasets (not only the so-called emotional ones, but also, as the authors claim, environmental ones) together with the models built for emotion estimation using random forest, gradient boosting decision trees and CNN-LSTM and the ability to assess the accuracy of such models using the coefficient of determination (R²) is a sufficient contribution to the subject area.
- Are the conclusions consistent with the evidence and arguments presented and do they address the main question posed? Also explain why this is/is not the case.
The authors' comparison of three models - Random Forest, Gradient Boosting Decision Trees and CNN-LSTM - showed that the RF model achieved the highest accuracy, the GBDT model was less accurate than RF, and the CNN-LSTM model showed significantly lower accuracy compared to the other two models with a greater variation in results. It should be noted that the presented text and conclusions are fully consistent with the data results and the authors' arguments.
- Are the references appropriate?
The references in the literature review are appropriate. As a drawback, it should be noted that the authors omitted references to some sources in the text.
- Any additional comments on tables and figures.
Some comments on the text format.
Line 13, 50. Real time → real time?
Line 143. In another room - extra space after ‘room’.
Line 197. deep learning → deep learning?
Figure 1.18. The text is not aligned.
References in the text are not in ascending order.
Not all references are used: 3, 12, 13. I suggest that the authors check the entire text.
Tables are mentioned in different styles. Not as Table 2, Table 3, but as the table below.
Table 4 is also not mentioned.
The article can be accepted after minor corrections in the text.
Author Response
Comment 1. The references in the literature review are appropriate. As a drawback, it should be noted that the authors omitted references to some sources in the text.
Reply
Thank you for pointing out the issues with references. We have reviewed and corrected the order of references in the text to ensure they are in ascending order.
Other comments
Line 13, 50. Real time → real time?
Line 143. In another room - extra space after ‘room’.
Line 197. deep learning → deep learning?
Figure 1.18. The text is not aligned.
Tables are mentioned in different styles. Not as Table 2, Table 3, but as the table below.
Table 4 is also not mentioned.
We have corrected the formatting inconsistencies, including "Real time" and "deep learning," and removed extra spaces (e.g., after "room" on line 143). Figure 1.18 has been realigned, and all figures and tables now follow a consistent referencing style.
Reviewer 2 Report
Comments and Suggestions for Authors
The article addresses an interesting and timely research topic. The results presented are encouraging but require some adjustments. Below are some of my comments that may help the authors to improve the publication.
Introduction
The publication lacks a defined purpose for the article, structure and main contribution.
Results
The results only present the results of the measurements carried out without their interpretation. What conclusions can be drawn from the results? How do these results translate into applying the analysed models in research?
Discussion
The text in lines 296 - 315 should be included in the results, as it summarises the results presented in the diagrams and repeats some of the data from Section 4. The discussion should consist of a causal analysis and interpretation of the differences in the results obtained. The discussion should also reference the results obtained by other authors. The discussion should also answer the question: why was the GBDT not optimised when it could have improved the accuracy of the measurement? Can such results be compared with the results of other models, and can conclusions be drawn from them?
Conclusion
What contribution do the obtained results make to research related to measuring emotions? How can other researchers use them? Can the results obtained be used by industry? Do the directions for possible future development indicated in the conclusion represent directions for further research by the authors?
Author Response
Comments for Introduction
The publication lacks a defined purpose for the article, structure and main contribution.
Reply
Thank you for your valuable feedback on the introduction. In response to your suggestions, I have added more detailed descriptions regarding the objectives, structure, and contributions of this study. Specifically, I have clarified that the primary goal of this research is to investigate the feasibility of emotion estimation based on environmental data, rather than simply comparing the accuracy of various methods. Additionally, I have emphasized the practical advantages of our approach, such as its non-invasive nature and the absence of privacy concerns, making it particularly suitable for real-world applications in homes, educational settings, and workplaces. Furthermore, I have elaborated on the contributions of the study, particularly in terms of identifying environmental factors that influence emotions and how this can lead to a deeper understanding of emotions and more effective environmental adjustments in everyday life.
Comments for Results
The results only present the results of the measurements carried out without their interpretation. What conclusions can be drawn from the results? How do these results translate into applying the analysed models in research?
Reply
Thank you for your comment. We have added a concise discussion to the results section to enhance its content. This addition clarifies how the obtained results are interpreted and how the analyzed models can be applied in research.
Comments for Discussion
The text in lines 296 - 315 should be included in the results, as it summarises the results presented in the diagrams and repeats some of the data from Section 4.
Reply
Thank you for your feedback. The section you referred to has been moved to the results section, and the discussion has been revised to focus on the analysis and interpretation of the results, as well as future challenges. Additionally, we have included a response to the question you raised regarding the optimization of GBDT and its comparison with other models.
Comments for Conclusion
What contribution do the obtained results make to research related to measuring emotions? How can other researchers use them? Can the results obtained be used by industry? Do the directions for possible future development indicated in the conclusion represent directions for further research by the authors?
Reply
Thank you for your feedback. We have revised the conclusion section to make the contributions of the study, along with future directions and implications, clearer. This revision highlights how the obtained results contribute to emotion measurement research, how other researchers can use them, their potential applications in industry, and the directions for future research that align with the study's objectives.
Reviewer 3 Report
Comments and Suggestions for Authors
This paper proposes a novel method to estimate human emotions (arousal and valence) using non-contact environmental sensing. This approach addresses the limitations of traditional methods, such as discomfort and privacy concerns. The researchers employed machine learning models, including Random Forest and Gradient Boosting Decision Trees, to analyze environmental data and accurately estimate emotional states.
However, please take into consideration the next remarks:
1. The authors try to classify emotions (Relax, Angry, Sad... as in line 170). This means it is a categorical classification problem, which needs four outputs in your neural network, each with a probability value. However, your network has only one output. The most suitable loss function for this kind of problem is cross-entropy, and the metrics should be a confusion matrix, a classification report (accuracy, F-score...), or AUC, rather than R-Square (since this is not regression).
2. Add some details about the training results, loss convergence curve, accuracy evolution curve...
3. The conclusion that decision tree-based models, particularly Random Forest, are highly effective for this task should be reconsidered. Deep learning (a powerful non-linear classifier) has so far been the most accurate approach in most ML problems, but the network must be designed well.
Comments on the Quality of English Language
No comments.
Author Response
Comment 1. The authors try to classify emotions (Relax, Angry, Sad... as in line 170). This means it is a categorical classification problem, which needs four outputs in your neural network, each with a probability value. However, your network has only one output. The most suitable loss function for this kind of problem is cross-entropy, and the metrics should be a confusion matrix, a classification report (accuracy, F-score...), or AUC, rather than R-Square (since this is not regression).
Reply
Thank you for your comment. However, our study differs from classification tasks as it employs two continuous values, Arousal and Valence, as ground truth data. Therefore, this study addresses a regression problem rather than a classification problem. For this reason, we believe the use of R-Square as a metric is appropriate in our context. Additionally, we have revised the introduction to emphasize that this study focuses on a regression problem.
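For readers following this exchange, the metric in question can be illustrated with a minimal sketch. It assumes the standard definition of the coefficient of determination, R² = 1 - SS_res / SS_tot, computed separately for each continuous target. The function name `r_squared` and the sample values are hypothetical and are not taken from the manuscript.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the targets are constant")
    return 1.0 - ss_res / ss_tot


# Made-up continuous arousal ratings and predictions (illustrative only)
arousal_true = [0.1, 0.4, 0.35, 0.8]
arousal_pred = [0.15, 0.35, 0.4, 0.7]
arousal_r2 = r_squared(arousal_true, arousal_pred)
```

A value of 1 means perfect prediction, and 0 means the model does no better than predicting the mean of the targets. This is why the metric fits continuous Arousal and Valence targets, whereas categorical labels would call for accuracy or cross-entropy instead.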
Comment 2. Add some details about the training results, loss convergence curve, accuracy evolution curve...
Reply
Thank you for your valuable suggestion. We have added graphs showing the training loss and accuracy progression for the GBDT and CNN-LSTM models to the results section. These additions provide a clearer view of the model training process and performance trends. We hope this addresses your request.
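As an illustration of the kind of convergence curve requested here, the sketch below records the training loss after every epoch. It is not the authors' GBDT or CNN-LSTM code: it assumes a simple linear model trained by gradient descent as a stand-in, and the function name `train_linear_model`, the learning rate, and the data are all hypothetical.

```python
def train_linear_model(xs, ys, epochs=50, lr=0.1):
    """Fit y = w*x + b by gradient descent, recording the MSE after each epoch."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []
    for _ in range(epochs):
        # Prediction errors with the current parameters
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
        # Store the loss after the update so it can be plotted per epoch
        mse = sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / n
        history.append(mse)
    return w, b, history


# Made-up data lying on the line y = 0.8x + 0.1 (illustrative only)
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [0.1, 0.3, 0.5, 0.7, 0.9]
w, b, loss_history = train_linear_model(xs, ys)
```

Plotting `loss_history` against the epoch index gives a loss convergence curve. Recording the same quantity on a held-out set alongside it would show whether the model is overfitting.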
Comment 3. The conclusion that decision tree-based models, particularly Random Forest, are highly effective for this task should be reconsidered. Deep learning (a powerful non-linear classifier) has so far been the most accurate approach in most ML problems, but the network must be designed well.
Reply
Thank you for pointing this out. In response, we have included a discussion on the potential reasons for the lower performance of the CNN-LSTM model compared to the tree-based methods. This discussion explores factors such as the dataset characteristics, data size, and potential limitations in the model design, such as insufficient tuning of hyperparameters. We hope this provides a satisfactory explanation.
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
Thank you for the corrections made. I have no further comments.