Open Access Article

Peer-Review Record

Causality-Driven Feature Selection for Calibrating Low-Cost Airborne Particulate Sensors Using Machine Learning

Sensors 2024, 24(22), 7304; https://doi.org/10.3390/s24227304

by Vinu Sooriyaarachchi, David J. Lary *, Lakitha O. H. Wijeratne and John Waczak

Reviewer 1: Tao Liu

Reviewer 2: Anonymous

Submission received: 8 October 2024 / Revised: 8 November 2024 / Accepted: 14 November 2024 / Published: 15 November 2024

(This article belongs to the Section Sensor Networks)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors employed a causality analysis method, convergent cross mapping (CCM), to select features for calibrating a low-cost air quality sensor. This method was chosen for its ability to identify causal relationships between variables, which is crucial for sensor calibration. Several features were selected based on a metric computed from CCM, and the subsequent odor classification results validated the effectiveness of the proposed method. While the use of CCM for feature selection is a novel approach, the authors should address the following concerns:

1 – Section 2.1 needs to be rewritten. The current version describes the proposed method in text only, which falls short of the rigor expected in scientific writing. A mathematical description is also needed.
2 – A more detailed explanation of the mathematical assumptions and procedure of CCM would improve the readability of the manuscript and ensure that readers fully understand the method.
3 – Why can CCM be used in optical gas sensor calibration? Please provide the rationale in view of the sensing principle of an optical gas sensor.
4 – Information on the optical sensors and the feature generation method should be presented.
5 – Other causality analysis methods, such as Granger causality analysis, should be included in the performance comparison alongside CCM. The results of these methods may help resolve the doubt raised in Comment 3.
6 – The presented results resemble the output of a classification process rather than a calibration process. There is no comparison of results with and without calibration.
7 – The sensor type should be emphasized in the title, abstract, and introduction. Different odor-sensing principles produce signals with different characteristics, and a single method cannot handle all of them.
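
For readers unfamiliar with the method, the kind of procedure the first two comments ask the authors to formalize can be sketched roughly as follows. This is a generic illustration of the usual CCM recipe (delay embedding of one series, nearest neighbours on the resulting shadow manifold, an exponentially weighted estimate of the other series, and correlation as the skill score), not the authors' implementation. The function names, the embedding dimension `E`, the lag `tau`, and the coupled logistic-map toy data are all assumptions chosen for illustration.

```python
import math


def embed(series, E, tau):
    """Time-delay embedding: the row for time t is (s[t], s[t-tau], ..., s[t-(E-1)*tau])."""
    start = (E - 1) * tau
    return [[series[t - j * tau] for j in range(E)] for t in range(start, len(series))]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cross_map_skill(source, target, E=2, tau=1, lib_size=None):
    """Skill of estimating `source` from the shadow manifold of `target`.

    In CCM, skill that is high and grows with library size is read as
    evidence that `source` influences `target`.
    """
    start = (E - 1) * tau
    manifold = embed(target, E, tau)
    aligned = source[start:]  # source values at the same times as the manifold rows
    if lib_size is not None:  # restrict both library and predictions to the first points
        manifold, aligned = manifold[:lib_size], aligned[:lib_size]
    n = len(manifold)
    estimates = []
    for i in range(n):
        # E + 1 nearest neighbours of point i, excluding the point itself
        nearest = sorted(
            (math.dist(manifold[i], manifold[j]), j) for j in range(n) if j != i
        )[: E + 1]
        d1 = max(nearest[0][0], 1e-12)  # guard against zero distance
        weights = [math.exp(-d / d1) for d, _ in nearest]
        total = sum(weights)
        estimates.append(sum(w * aligned[j] for w, (_, j) in zip(weights, nearest)) / total)
    return pearson(aligned, estimates)


def coupled_logistic(n, x0=0.4, y0=0.2):
    """Toy data (assumption): two coupled logistic maps in which x drives y more strongly."""
    xs, ys = [x0], [y0]
    for _ in range(n - 1):
        x, y = xs[-1], ys[-1]
        xs.append(x * (3.8 - 3.8 * x - 0.02 * y))
        ys.append(y * (3.5 - 3.5 * y - 0.1 * x))
    return xs, ys


x, y = coupled_logistic(300)
for L in (50, 100, 200):
    # Skill of recovering x from y's shadow manifold at increasing library sizes
    print(L, round(cross_map_skill(x, y, E=2, tau=1, lib_size=L), 3))
```

In a feature-selection setting, a score like this would be computed for each candidate input against the reference measurement, and the check for convergence with library size is what distinguishes CCM from a plain correlation ranking.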

Author Response

Thanks, we really appreciate your time! Please see the attached file with our detailed replies to the reviewers' comments.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

My general evaluation of “Causality-Driven Feature Selection for Calibrating Low-Cost Air Quality Sensors Using Machine Learning” is as follows. This manuscript proposes causality-driven feature selection for a machine learning regression model, with the aim of enabling low-cost air quality sensing. I believe the following corrections would strengthen the article.

1 – Was overfitting adequately considered when training the Gradient Boosting Regressor used for prediction? This is important when comparing accuracy.

2 – In regression problems, an R-squared value above 0.99 is generally considered to indicate an effective model. For PM1, the SHAP-based and causality-based models improved to 0.991 and 0.993, respectively, and for PM2, to 0.984 and 0.985, which are nearly identical improvements. Can the difference between the two methods be considered statistically significant? In addition, is it possible that the proposed method shows improved performance only for a particular split of the dataset? It is necessary to validate whether the proposed method remains effective on training and test datasets that are randomly shuffled and split.
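
One way to run the check described in this comment is sketched below: score both feature sets on the same repeated random splits and look at the paired differences. This is a minimal, model-agnostic sketch, not the procedure used in the manuscript. The function name `repeated_split_comparison` and the two scorer callables are hypothetical; each scorer is assumed to train a model on `train_idx`, evaluate it on `test_idx`, and return a score such as R-squared.

```python
import random
import statistics


def repeated_split_comparison(n_samples, score_a, score_b,
                              n_repeats=30, test_fraction=0.2, seed=0):
    """Score two pipelines on identical shuffled splits and summarise B minus A."""
    rng = random.Random(seed)
    n_test = max(1, round(test_fraction * n_samples))
    diffs = []
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random shuffle for every repeat
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        # Both pipelines see exactly the same split, so the difference is paired
        diffs.append(score_b(train_idx, test_idx) - score_a(train_idx, test_idx))
    return {
        "mean_diff": statistics.fmean(diffs),
        "std_diff": statistics.stdev(diffs) if len(diffs) > 1 else 0.0,
        "b_wins": sum(d > 0 for d in diffs) / len(diffs),  # fraction of splits where B scores higher
    }
```

If the mean difference is small compared with its spread across splits, or B wins in only about half of the splits, a gap such as 0.991 versus 0.993 is hard to interpret as a real improvement. Because repeated random splits overlap, the differences are not independent, so a formal significance test on them would need an appropriate correction.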

3 – Was scaling applied to the input and output data for model training? If so, an explanation of the scaling method used should be added to the manuscript. In addition, was the MSE calculated using values that had been scaled back to the original units? Furthermore, units need to be added to the axes of each plot.
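
The question about inverse scaling matters because an MSE computed on standardized targets is not in physical units. The sketch below illustrates this with plain standardization; the helper names and the four target values are made up for illustration, and the manuscript may use a different scaler or none at all.

```python
import statistics


def standardize(values):
    """Return the z-scored values together with the mean and population std used."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values], mean, std


def inverse_standardize(scaled, mean, std):
    """Map z-scored values back to the original units."""
    return [s * std + mean for s in scaled]


def mse(truth, pred):
    """Mean squared error of two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth)


# Made-up reference concentrations, for illustration only
y_true = [5.0, 10.0, 15.0, 20.0]
y_scaled, mean, std = standardize(y_true)

# Pretend a model predicts each scaled target with a constant error of 0.1
pred_scaled = [v + 0.1 for v in y_scaled]

mse_scaled = mse(y_scaled, pred_scaled)
mse_original = mse(y_true, inverse_standardize(pred_scaled, mean, std))

# For standardization, the two differ by a factor of std squared
print(mse_scaled, mse_original, mse_scaled * std ** 2)
```

Reporting which of the two quantities is shown, together with units on the axes, lets readers compare the errors with the measured concentrations.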

4 – In this study, the authors used 10 input features determined by the causality method. To match this number of input features, the 10 features with the highest SHAP values were selected for comparison. Although the general threshold is 0.5, the high variance among scores may result in an inappropriate comparison. A comparison with 4, 5, or 6 input features may also be necessary.

5 – The introduction could cite additional research on machine learning techniques for air quality prediction. Some suggested references are DOI: 10.1016/j.buildenv.2023.110191, DOI: 10.1016/j.chemosphere.2023.139518, DOI: 10.1016/j.envpol.2024.123371, DOI: 10.1038/s41598-023-28287-8

Author Response

Thanks, we really appreciate your time! Please see the attached file with our detailed replies to the reviewers' comments.
