Hybrid Machine Learning for IoT-Enabled Smart Buildings
Round 1
Reviewer 1 Report (Previous Reviewer 1)
Comments and Suggestions for Authors
Kindly refrain from presenting a sequence of five to six figures consecutively without providing any explanatory context in between. It would be convenient to include a paragraph of text that explains each figure either immediately before or after the figure itself. Although you have explanatory text, it is presently situated collectively before all the figures. Positioning these explanations in proximity to the corresponding figures will enhance clarity and legibility for the audience.
Author Response
Dear reviewer, thank you for your pertinent observations and useful suggestions, which will help us improve the quality and readability of our manuscript. The manuscript has been revised based on your review, and we hope the modifications meet the journal's requirements for publication.
Reviewer #1
Comment 1. Kindly refrain from presenting a sequence of five to six figures consecutively without providing any explanatory context in between. It would be convenient to include a paragraph of text that explains each figure either immediately before or after the figure itself. Although you have explanatory text, it is presently situated collectively before all the figures. Positioning these explanations in proximity to the corresponding figures will enhance clarity and legibility for the audience.
Response: Thank you for your feedback. We rearranged the material and reorganized the figures in Subsection 7.2 to improve clarity and give the reader a better experience.
Thanks again for your effort and for spending your precious time making such important comments. We hope that, in this round, our manuscript has reached the level expected by you and the journal.
Sincerely yours,
Robert CRACIUN
Faculty of Automatic Control and Computers, National University of Science and Technology Politehnica Bucharest, Romania
Reviewer 2 Report (Previous Reviewer 2)
Comments and Suggestions for Authors
The authors addressed all my comments.
Author Response
Dear reviewer, thank you for your review and for acknowledging our efforts. We appreciate your feedback and are glad we could address all your comments.
Thanks again for your effort and for spending your precious time making such important comments. We hope that, in this round, our manuscript has reached the level expected by you and the journal.
Sincerely yours,
Robert CRACIUN
Faculty of Automatic Control and Computers, National University of Science and Technology Politehnica Bucharest, Romania
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper follows a clear, logical structure for its research design.
1. Evaluate the hybrid model against alternative models (e.g., random forests, support vector machines, or neural networks) to substantiate the superiority of the hybrid strategy. Conduct a baseline comparison with more rudimentary IDS models to demonstrate the additional advantages of the hybrid methodology.
2. Specify whether and how data cleaning was performed (e.g., outlier elimination, handling of missing values). Explain the rationale for selecting the 80/20 split over k-fold cross-validation.
3. Provide a rationale for the incorporation of Hamming loss. Calculate confidence intervals for accuracy, precision, recall, and F1-score.
4. Explicitly describe the meanings of "type," "class," and "subclass" to prevent confusion. Detail the CPU, RAM, and training duration necessary without PCA to demonstrate the benefits of dimensionality reduction. Incorporate annotations regarding significant metrics, such as "optimal PCA threshold = 70%."
5. Include a table that summarizes the enhancements in RAM consumption, CPU usage, and training duration when comparing PCA 100 to PCA 70 (or a non-hybrid XGBoost methodology) to clearly illustrate the efficiency gains.
Author Response
Response to Reviewers’ Comments – Reviewer 1
Dear reviewer, thank you for your pertinent observations and useful suggestions, which will help us improve the quality and readability of our manuscript. The manuscript has been revised based on your review, and we hope the modifications meet the journal's requirements for publication.
Reviewer #1
Comment 1. Evaluate the hybrid model against alternative models (e.g., random forests, support vector machines, or neural networks) to substantiate the superiority of the hybrid strategy. Conduct a baseline comparison with more rudimentary IDS models to demonstrate the additional advantages of the hybrid methodology.
Response: Thank you for your feedback. In response, we have added a detailed comparison between our hybrid model and alternative machine learning models, including Random Forests, Naive Bayes and Decision Trees, to demonstrate the superiority of the hybrid strategy. We also conducted a baseline comparison among these traditional IDS models to show the advantages of the hybrid approach. Our results indicate that, while the alternative models perform well, the hybrid model consistently outperforms them, particularly at the subclass level, where it efficiently handles complex network traffic patterns. This comparison highlights the higher accuracy and reliability of the hybrid method in detecting a wide range of intrusions and in providing more precise classifications. We introduced a new table with the metrics, together with the discussion, at the beginning of Chapter 7.
Comment 2. Specify whether and how data cleaning was performed (e.g., outlier elimination, handling of missing values). Explain the rationale for selecting the 80/20 split over k-fold cross-validation.
Response: Thank you for your comment. In response, we clarified that data cleaning was performed by the Argus tool, which filtered the data and removed duplicate data and missing values. This addition can be found in Chapter 5, under Table 2. Partitioning the data into 80% for training and 20% for testing is considered the “golden rule” in today's machine learning applications, since it offers a very good balance between training effort and time and the relevance of the performance metrics obtained by the model (accuracy, precision, recall, etc.). We added the clarification in Chapter 6, after Table 5.
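The two evaluation schemes contrasted in this comment can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names `holdout_split` and `k_fold_splits`, the seed, and the sample count are assumptions introduced here.

```python
import random


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Single hold-out split (80/20 by default): returns (train_idx, test_idx)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_splits(n_samples, k=5, seed=0):
    """k-fold cross-validation: yields k (train_idx, test_idx) pairs.

    Every sample lands in the test set exactly once, so the k scores
    give a spread as well as a mean.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    train, test = holdout_split(1000)  # hypothetical dataset size
    print(len(train), len(test))
    for fold_no, (tr, te) in enumerate(k_fold_splits(1000, k=5)):
        print(fold_no, len(tr), len(te))
```

With k = 5, each fold also trains on 80% of the data and tests on 20%, but the model is fitted five times. The hold-out split is cheaper, while k-fold makes the estimate less dependent on one particular partition.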
Comment 3. Provide a rationale for the incorporation of Hamming loss. Calculate confidence intervals for accuracy, precision, recall, and F1-score.
Response: Thank you for your suggestion. We removed the mention of Hamming loss from the text. The confidence intervals are now reported in Table 6, and a discussion of the results precedes the table.
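This record does not state how the confidence intervals in Table 6 were computed. One common option for a proportion-type metric such as accuracy is the Wilson score interval, sketched below under that assumption. The function name `wilson_interval` and the example counts are hypothetical, and z = 1.96 corresponds to an approximate 95% level.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion.

    successes: number of correct predictions (assumed input)
    n:         number of test samples
    z:         normal quantile; 1.959964 gives roughly a 95% interval
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Hypothetical example: 950 correct predictions out of 1000 test samples.
    low, high = wilson_interval(950, 1000)
    print(f"accuracy = 0.950, 95% CI = [{low:.4f}, {high:.4f}]")
```

The same function applies to precision and recall if `n` is taken as the relevant denominator (predicted positives or actual positives). F1-score is not a simple proportion, so it is usually handled by resampling, for example a bootstrap over the test set.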
Comment 4. Explicitly describe the meanings of "type," "class," and "subclass" to prevent confusion. Detail the CPU, RAM, and training duration necessary without PCA to demonstrate the benefits of dimensionality reduction. Incorporate annotations regarding significant metrics, such as "optimal PCA threshold = 70%."
Response: Thank you for your feedback. We added a description of the dataset classification in Chapter 5, after Table 4. PCA 100 refers to keeping 100% of the dimensions of the dataset, that is, the full dataset without dimensionality reduction, so this case is already covered.
Comment 5. Include a table that summarizes the enhancements in RAM consumption, CPU usage, and training duration when comparing PCA 100 to PCA 70 (or a non-hybrid XGBoost methodology) to clearly illustrate the efficiency gains.
Response: Thank you for the suggestion. We added Table 7, which compares the metrics for both PCA values.
Thanks again for your effort and for spending your precious time making such important comments. We hope that, in this round, our manuscript has reached the level expected by you and the journal.
Sincerely yours,
Robert CRACIUN
Faculty of Automatic Control and Computers, National University of Science and Technology Politehnica Bucharest, Romania
Reviewer 2 Report
Comments and Suggestions for Authors
The structure of the paper is a little bit confusing. The introduction is very long with many referenced papers, and also in the section "Methodology" there are many references to the state of the art, but there is a "Related work" section. Authors should move the text into the proper sections: "introduction" introduces, Related works contain the analysis of the state of the art, and methodology explains the proposed approach.
Section "Results" is very detailed but needs some improvement. Figures "Metrics Variation" (1, 2, etc.) are not very clear. I do not suggest removing them (that is the authors' choice), but the results are difficult to interpret and should be better explained.
The authors did several tests, but the Results section is confusing. My advice is to summarize all the experiments, perhaps with the help of a table, describing the aim of each and what it demonstrates.
Results are conducted on a single dataset and show a very high accuracy. The authors should discuss the limitations of the results, and suggest how to test the proposed approach to further challenge that, such as in a real use case scenario.
Minor remarks:
- Pseudocode in Table 5 contains lines such as "Load the dataset from the specified file path, Read the data and encode labels, Split data ..." and seems a little trivial and useless.
- Can the authors discuss some practical use case scenarios of existing centralized control systems for smart buildings? Take, for example, DOI 10.1109/ICAISC56366.2023.10085312
Author Response
Response to Reviewers’ Comments – Reviewer 2
Dear reviewer, thank you for your pertinent observations and useful suggestions, which will help us improve the quality and readability of our manuscript. The manuscript has been revised based on your review, and we hope the modifications meet the journal's requirements for publication.
Reviewer #2
Comment 1. The structure of the paper is a little bit confusing. The introduction is very long with many referenced papers, and also in the section "Methodology" there are many references to the state of the art, but there is a "Related work" section. Authors should move the text into the proper sections: "introduction" introduces, Related works contain the analysis of the state of the art, and methodology explains the proposed approach.
Response: Thank you for your feedback. In response, we moved the reference from Methodology to Related Work so we have consistency between chapters.
Comment 2. Section "Results" is very detailed but needs some improvement. Figures "Metrics Variation" (1, 2, etc.) are not very clear. I do not suggest removing them (that is the authors' choice), but the results are difficult to interpret and should be better explained.
Response: Thank you for your feedback. We appreciate your comments on the Results section. The figures are intended to illustrate how the performance metrics vary across 100 iterations of the model training and evaluation process and across PCA values. They show the stability and consistency of the models over multiple runs, as stated in the description of each individual figure.
Comment 3. The authors did several tests, but the Results section is confusing. My advice is to summarize all the experiments, perhaps with the help of a table, describing the aim of each and what it demonstrates.
Response: Thank you for your suggestion. Only one experiment was conducted, and it produced a number of different metrics. Each metric is explained alongside the corresponding figure in the Results section. From model performance metrics to gateway performance metrics, all the performance indicators are described step by step to show the benefits of having a hybrid identification system.
Comment 4. Results are conducted on a single dataset and show a very high accuracy. The authors should discuss the limitations of the results, and suggest how to test the proposed approach to further challenge that, such as in a real use case scenario.
Response: Thank you for your feedback. We used a real environment to build the dataset, and a sentence was added in Section 5 to clear up the confusion. There is no limitation: the data were curated with the Argus network analyzer, which provided a robust dataset for use with machine learning classifiers.
Comment 5. - Pseudocode in Table 5 contains lines such as "Load the dataset from the specified file path, Read the data and encode labels, Split data ..." and seems a little trivial and useless.
Response: Thank you for the suggestion. We would like to keep the explicit lines in the pseudocode, because we want other researchers to fully understand every step we took to obtain the results of the proposed approach, so that they can adopt the hybrid strategy more quickly.
Comment 6. - Can the authors discuss some practical use case scenarios of existing centralized control systems for smart buildings? Take, for example, DOI 10.1109/ICAISC56366.2023.10085312
Response: Thank you for the suggestion. Our hybrid strategy is based on a real scenario, with all the devices deployed in a building, and the dataset already contains real-world data from a “production ready” environment.
Thanks again for your effort and for spending your precious time making such important comments. We hope that, in this round, our manuscript has reached the level expected by you and the journal.
Sincerely yours,
Robert CRACIUN
Faculty of Automatic Control and Computers, National University of Science and Technology Politehnica Bucharest, Romania
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The authors merely replied to my comments, without taking any action in the text for most of them.
Moreover, when you present a paper with only a single dataset and no comparison, you cannot state that it "has no limitations".
Therefore, my suggestions remain the same as in the previous round.
Author Response
Please see the attachment. Thank you.
Author Response File: Author Response.pdf