Review Reports
- Xiangbin Kong1,
- Zhihang Gui1,* and
- Zhe Luo1
- et al.
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Dear authors, here are my observations.
Comments for author File:
Comments.pdf
Author Response
Reviewer Comment 1:
The introduction is quite general and it is not clearly explained why the specific architecture (CNN–BiLSTM–Transformer) is the right solution.
Response 1:
Thank you for this valuable comment. We agree that the motivation for the proposed architecture needed to be clarified. In the revised manuscript, we have expanded the last paragraph of the Introduction to explicitly explain why the CNN–BiLSTM–Transformer architecture is an appropriate solution for NILM. Specifically, we now clearly relate the key challenges of NILM—local feature extraction, long-term temporal dependency modeling, and global contextual learning—to the respective roles of the CNN, BiLSTM, and Transformer modules. This revision provides a clearer rationale for the architectural design and strengthens the motivation of the proposed approach. The corresponding revisions can be found in Lines 113–127 of the revised manuscript.
Reviewer Comment 2:
In the state of the art analysis, the authors mention that current methods have problems with long-term dependencies. However, models based exclusively on Transformer already solve long-term dependencies much better than LSTMs.
Response 2:
Thank you for this valuable comment. We agree that Transformer-based models are generally more effective than LSTM-based methods in capturing long-term dependencies. In the revised manuscript, we have clarified this point in both the state-of-the-art analysis and the Method Framework section. Specifically, in Section 2 (Method Framework), first paragraph, we revised the description to explicitly acknowledge the advantage of Transformer architectures in modeling long-range dependencies, while also noting that, in NILM scenarios, Transformer-only models may still face challenges in handling fine-grained local power variations, low-power appliances, and complex multi-state operating patterns. This clarification improves the precision of our discussion and better motivates the use of the proposed hybrid CNN–BiLSTM–Transformer architecture, without contradicting the strengths of Transformer-based approaches. The corresponding revisions are provided in Lines 147–159 of the revised manuscript.
Reviewer Comment 3:
In recent years, it has been shown that NILM in combination with ML can provide extremely good results as long as the set of features is chosen properly. For example, the paper Smart Non-Intrusive Appliance Load-Monitoring System Based on Phase Diagram Analysis captures this well. It should be mentioned in the text in the introduction and referred to as well.
Response 3:
We thank the reviewer for suggesting the reference “Smart Non-Intrusive Appliance Load-Monitoring System Based on Phase Diagram Analysis.” This study provides a representative example of how effective feature design can enhance NILM performance when combined with machine learning techniques. In response, we have incorporated this reference at two locations in the Introduction. First, in the paragraph describing the core process of NILM and the importance of feature extraction, we emphasize that appropriate feature engineering plays a crucial role in accurate load disaggregation. Second, in the paragraph discussing machine learning–based NILM methods, we cite this work to illustrate how feature design directly influences model performance. These additions strengthen the background discussion and better position our work within the existing NILM literature. The corresponding revisions are provided in Lines 63–66 and 88–91 of the revised manuscript.
Reviewer Comment 4:
In section 2 and 3 the equations must be rewritten so that they are visible, aligned and clearly structured, possibly with a software.
Response 4:
We thank the reviewer for the suggestion regarding the visibility and structure of equations in Sections 2 and 3. In response, we have carefully revised all formulas to ensure that they are clearly visible, properly aligned, and structurally clear in the manuscript. During this process, we also noticed that equation 12 was missing and the numbering had skipped directly to 13; this has now been corrected. We believe that these revisions improve the readability and clarity of the mathematical expressions.
Reviewer Comment 5:
Fig 1 is not clear and it is not understood how those values were reached.
Response 5:
Thank you for this valuable comment. We have revised Fig. 1 to improve its clarity and to clearly explain how the values shown in the figure were obtained. The figure now presents the cumulative power consumption over time for selected appliances in House 1 from the low-frequency subset of the REDD dataset, rather than the relationship between a single appliance and the total load. The data were obtained by aligning the power readings of all appliance channels by timestamp, cleaning the data by replacing zero values with NaN and removing invalid channels, and then computing the cumulative power for each selected appliance. For better readability, only odd-numbered channels were retained, the time range was truncated up to April 18, 2011, and half of the appliance channels were removed from the visualization. The figure legend and axis labels have also been updated to clearly indicate the corresponding appliances, time, and power units. These revisions ensure that Fig. 1 is clear and that the origin and calculation of all values are fully traceable.
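For readers who wish to trace the values in Fig. 1, a minimal pandas sketch of the preprocessing steps described above is given below. It assumes the REDD House 1 low-frequency channels have already been loaded into a timestamp-indexed DataFrame with one column per appliance channel; the loading step, column handling, and the `prepare_cumulative_power` helper are illustrative assumptions, not the authors' actual code.

```python
# Minimal sketch of the Fig. 1 preprocessing described above, assuming `house1`
# is a timestamp-indexed DataFrame of REDD House 1 appliance channels.
import numpy as np
import pandas as pd

def prepare_cumulative_power(house1: pd.DataFrame) -> pd.DataFrame:
    # Align all appliance channels on a common timestamp index.
    df = house1.sort_index()

    # Treat zero readings as missing and drop channels that are entirely invalid.
    df = df.replace(0, np.nan).dropna(axis=1, how="all")

    # Keep only odd-numbered channels and truncate the time range to April 18, 2011.
    odd_cols = [c for i, c in enumerate(df.columns, start=1) if i % 2 == 1]
    df = df.loc[:"2011-04-18", odd_cols]

    # Cumulative power consumption over time for each retained appliance.
    return df.fillna(0).cumsum()
```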
Reviewer Comment 6:
There is a problem in the methodology. Transformer already has the ability to model global and local dependencies (through self-attention). BiLSTM adds a sequential complexity that slows down training (it cannot be parallelized as well as Transformer). Why was Transformer alone not enough?
Response 6:
Thank you for this insightful comment. We agree that Transformer-based models are capable of modeling both global and local dependencies through self-attention, and we appreciate the reviewer’s concern regarding the additional sequential complexity introduced by BiLSTM. In the revised manuscript, we have clarified this point in Section 3.5. Specifically, we now explicitly compare our model with a Transformer-only baseline (BERT-NILM) under identical experimental settings and discuss the role of the BiLSTM module in the NILM context. The experimental results show that introducing BiLSTM improves recall and stability for appliances with strong temporal continuity, short-duration high-power patterns, and low-frequency operation. We have also added a clearer explanation that, while the Transformer effectively captures global dependencies, the explicit bidirectional sequential modeling provided by BiLSTM complements self-attention and enhances robustness in modeling appliance state transitions. These clarifications have been incorporated into the revised Section 3.6.
Reviewer Comment 7:
No data is mentioned about Positional Encoding
Response 7:
Thank you for this valuable comment. We agree that the description of the positional encoding should be made more explicit. In the revised manuscript, we have clarified this point in Section 2, in the paragraph immediately above Eq. (3). Specifically, we now state that positional encoding is implemented using learnable positional embeddings, which are treated as trainable model parameters and optimized during training, rather than being derived from additional data. These embeddings are added to the BiLSTM output before being fed into the Transformer encoder. The corresponding revisions can be found in Lines 243–247 of the revised manuscript.
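To illustrate what “learnable positional embeddings added to the BiLSTM output” means in practice, here is a minimal PyTorch-style sketch; the module name, zero initialization, and default dimensions are assumptions for illustration only.

```python
# Minimal sketch of learnable positional embeddings, assuming a fixed window
# length and a BiLSTM output of shape (batch, seq_len, d_model).
import torch
import torch.nn as nn

class LearnablePositionalEncoding(nn.Module):
    def __init__(self, seq_len: int = 480, d_model: int = 256):
        super().__init__()
        # One trainable embedding per time step, optimized with the rest of the model.
        self.pos_embed = nn.Parameter(torch.zeros(1, seq_len, d_model))

    def forward(self, bilstm_out: torch.Tensor) -> torch.Tensor:
        # Added to the BiLSTM output before the Transformer encoder.
        return bilstm_out + self.pos_embed[:, : bilstm_out.size(1), :]
```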
Reviewer Comment 8:
No details about critical hyperparameters (Input Sequence Length?)
Response 8:
Thank you for this valuable comment. In this work, the input sequence length is defined by the sliding window size used to construct each sample. Specifically, a fixed window size of 480 time steps is adopted for all appliances, which directly corresponds to the input sequence length of the model. This clarification has been added to the revised manuscript in Lines 403–407.
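For illustration, a minimal sketch of the sliding-window sample construction with a 480-step window follows; the stride value and the `sliding_windows` helper are assumptions for illustration, not the paper's exact implementation (the actual stride is reported in the paper's Table 5).

```python
# Minimal sketch of sliding-window sample construction with a fixed 480-step
# window; the stride here is a placeholder.
import numpy as np

def sliding_windows(mains: np.ndarray, window: int = 480, stride: int = 1) -> np.ndarray:
    # Each window of the aggregate (mains) signal becomes one model input sample,
    # so the window size directly defines the model's input sequence length.
    starts = range(0, len(mains) - window + 1, stride)
    return np.stack([mains[s : s + window] for s in starts])
```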
Reviewer Comment 9:
The chosen models (BERT, LSTM+, Seq2Seq) are relevant, but the description of their implementation is vague. Was BERT trained correctly?
Response 9:
Thank you for this valuable comment. We would like to clarify that the implementations of all three baseline models—BERT [33], LSTM+ [35], and the CNN-based Seq2Seq model [34]—are now described in detail in Section 3.3 of the revised manuscript (Lines 393–407).
The BERT model was trained correctly under the same experimental settings as the other baseline models and our proposed method. To ensure a fair and reproducible comparison, all models were trained using consistent configurations, including the sliding window strategy, stride, optimizer, batch size, and early stopping criteria. The detailed hyperparameter settings for all appliances are summarized in Table 5, and the complete training procedure has been explicitly documented.
In addition, we identified and corrected incorrect references to the LSTM+ and Seq2Seq models in the original manuscript. These citations have now been revised to properly reference the original works [34, 35].
We sincerely thank the reviewer for their careful reading and constructive suggestions, which helped us improve the clarity and correctness of the manuscript.
Reviewer Comment 10:
MRE values are very high for some devices in the case of competing models, which suggests that perhaps the baselines were not optimized enough.
Response 10:
We thank the reviewer for the insightful comment regarding the MRE results. As reported in Tables 7 and 8, the revised manuscript now provides a clearer analysis of MRE performance on both the REDD and UK-DALE datasets (Lines 530–537).
On the REDD dataset, although the MRE of Our-NILM for certain appliances (e.g., washing machine and dishwasher) is slightly higher than that of BERT, the average MRE (0.234) remains comparable to BERT (0.231) and is lower than those of LSTM+ (0.243) and Seq2Seq (0.244), indicating robust overall power estimation performance.
On the UK-DALE dataset, Our-NILM achieves the lowest average MRE (0.155) among all compared methods, outperforming BERT (0.167), LSTM+ (0.215), and Seq2Seq (0.193). In particular, high-power appliances such as the refrigerator, dishwasher, and kettle exhibit significantly lower MRE, while short-duration appliances such as the microwave show comparable estimation accuracy.
All baseline models were implemented following their original publications, with hyperparameters set according to standard recommendations to ensure a fair comparison. These results demonstrate that Our-NILM maintains strong generalization capability and reliable power estimation accuracy across different appliance types and datasets.
Reviewer Comment 11:
There is no information about complexity, computational time, limitations, implementation in practical systems, etc.
Response 11:
We thank the reviewer for highlighting the importance of providing detailed information on model complexity, computational efficiency, practical deployment, and limitations. In the revised manuscript, we have added Section 3.5: Model Complexity and Practical Considerations to explicitly address these points. Specifically, we (i) quantify the computational complexity of the proposed model, showing that BiLSTM and Transformer modules dominate the cost; (ii) report actual inference times on the same dataset and hardware, including comparisons with BERT and TransUNet-NILM (Tables 6–7), demonstrating a favorable trade-off between accuracy and efficiency; (iii) discuss practical deployment scenarios, such as GPU-enabled edge devices or cloud platforms, indicating suitability for near-real-time NILM applications; and (iv) acknowledge limitations, including sensitivity to unseen appliances or highly fluctuating loads, as well as challenges with very short-duration appliance operations. These additions provide a comprehensive assessment of the proposed model’s applicability and performance.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
- The proposed method combines different neural networks for research, which is a common or ordinary approach in NILM applications. What are the innovations?
-The description of the method needs to be further detailed. Additionally, are there targeted improvements for NILM?
-The paper should compare with existing advanced methods to highlight the advantages of the proposed approach.
-The images in the paper are blurry and need to be improved for clarity.
Author Response
Summary
We sincerely thank the reviewer for the thoughtful and constructive comments. The reviewer’s suggestions have been highly valuable in improving the clarity, rigor, and practical relevance of the manuscript. We carefully considered each comment and have revised the manuscript accordingly. Detailed responses to all points are provided below.
Reviewer Comment 1:
The proposed method combines different neural networks for research, which is a common or ordinary approach in NILM applications. What are the innovations?
Response 1:
Thank you for this insightful comment. We agree that hybrid neural network architectures are commonly used in NILM research. The novelty of our work lies in the task-driven hierarchical integration of these components. Specifically, the model first employs a BiLSTM to explicitly capture short-term bidirectional temporal patterns related to appliance switching, followed by a Transformer module that models long-term dependencies and periodic behaviors at a higher feature level. This sequential, feature-level integration is tailored to the characteristics of NILM signals and differs from conventional parallel or loosely coupled hybrid models. These innovation points have been clarified in the revised manuscript (lines 113–133) to highlight the methodological contributions.
Reviewer Comment 2:
The description of the method needs to be further detailed. Additionally, are there targeted improvements for NILM?
Response 2:
Thank you for this valuable comment. To address the challenges in NILM, such as poor disaggregation of low-power and multi-state appliances, limited integration of local and global information, and high reconstruction errors, we have clarified the design and targeted improvements of our framework in the revised manuscript. Specifically, the proposed hybrid CNN–BiLSTM–Transformer architecture extracts local features via CNN, captures medium-term temporal dependencies with BiLSTM, and models long-term contextual relationships using the Transformer module. Deconvolution with non-linear transformations further enhances signal reconstruction. This integrated design strengthens both load disaggregation accuracy and robustness. The corresponding modifications can be found in lines 147–159 of the revised manuscript.
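To make the pipeline concrete, a minimal PyTorch-style sketch of the CNN–BiLSTM–Transformer–deconvolution flow described above is given below. All layer sizes, kernel widths, head counts, and the `HybridNILM` class itself are illustrative placeholders, not the paper's actual hyperparameters or code.

```python
# Minimal sketch of the hybrid CNN–BiLSTM–Transformer pipeline, assuming a
# single-channel aggregate input of length 480.
import torch
import torch.nn as nn

class HybridNILM(nn.Module):
    def __init__(self, d_model: int = 128, seq_len: int = 480):
        super().__init__()
        # CNN: local feature extraction from the aggregate power signal.
        self.cnn = nn.Sequential(nn.Conv1d(1, d_model, kernel_size=5, padding=2), nn.ReLU())
        # BiLSTM: medium-term bidirectional temporal dependencies.
        self.bilstm = nn.LSTM(d_model, d_model // 2, batch_first=True, bidirectional=True)
        # Learnable positional embeddings plus Transformer encoder: long-term context.
        self.pos = nn.Parameter(torch.zeros(1, seq_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Deconvolution with a non-linear transformation for signal reconstruction.
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(d_model, d_model, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(d_model, 1, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len) aggregate power readings.
        h = self.cnn(x.unsqueeze(1)).transpose(1, 2)        # (batch, seq_len, d_model)
        h, _ = self.bilstm(h)                               # (batch, seq_len, d_model)
        h = self.encoder(h + self.pos[:, : h.size(1), :])   # (batch, seq_len, d_model)
        return self.decoder(h.transpose(1, 2)).squeeze(1)   # (batch, seq_len)
```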
Reviewer Comment 3:
The paper should compare with existing advanced methods to highlight the advantages of the proposed approach.
Response 3:
We thank the reviewer for the suggestion regarding comparison with advanced NILM methods. In response, we have included TransUNet-NILM as an additional comparative baseline. TransUNet-NILM is a recently proposed deep learning approach for NILM that enhances local feature extraction via residual and attention mechanisms based on the TransUNet architecture, and has demonstrated improved performance on standard public datasets such as REDD and UK-DALE. Under identical experimental conditions, Tables 6–7 present performance (F1 and MAE) and computational efficiency comparisons among Our-NILM, BERT, and TransUNet-NILM. The results show that Our-NILM achieves a favorable balance between accuracy and runtime, highlighting the practical advantages of our proposed framework. The corresponding modifications can be found in lines 445–488 of the revised manuscript.
Reviewer Comment 4:
The images in the paper are blurry and need to be improved for clarity.
Response 4:
We thank the reviewer for the comment regarding figure clarity. Some of the figures have been adjusted to improve readability. The apparent blurriness in the manuscript may be partly due to Word software rendering, which can reduce on-screen resolution. To ensure full clarity in the published version, we can provide all original high-resolution images to the journal.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
1. Incomplete or Truncated Sentences. There are several sentences that appear incomplete or truncated. For example: “The model integrates CNN to extract local features, enhancing the recognition capability …” Such sentences should be completed to ensure grammatical correctness and clarity.
2. Inconsistent Terminology. Some technical terms are used inconsistently throughout the manuscript (e.g., Seq2seq vs. Seq2Seq, BiLSTM vs. Bilstm). These should be standardized across the entire paper.
3. Typographical and Grammatical Errors. Minor grammatical errors and typographical mistakes are present in several sections, particularly in the Introduction and Methodology. A thorough language proofreading is recommended.
4. Figure and Table Referencing Errors. Some figures and tables are not referenced in the correct order or are mentioned multiple times with inconsistent numbering. For instance, duplicate figure numbers (e.g., Fig. 5 and Fig. 6) should be corrected.
Author Response
Summary
We sincerely thank the reviewer for the thoughtful and constructive comments. The reviewer’s suggestions have been highly valuable in improving the clarity, rigor, and practical relevance of the manuscript. We carefully considered each comment and have revised the manuscript accordingly. Detailed responses to all points are provided below.
Reviewer Comment 1:
Incomplete or Truncated Sentences. There are several sentences that appear incomplete or truncated. For example: “The model integrates CNN to extract local features, enhancing the recognition capability …” Such sentences should be completed to ensure grammatical correctness and clarity.
Response 1:
We thank the reviewer for pointing out the issue of incomplete or truncated sentences. We have carefully reviewed the manuscript and completed all such sentences to ensure grammatical correctness and clarity. For example, the sentence “The model integrates CNN to extract local features, enhancing the recognition capability …” has been revised to fully describe the intended meaning.
Reviewer Comment 2:
Inconsistent Terminology. Some technical terms are used inconsistently throughout the manuscript (e.g., Seq2seq vs. Seq2Seq, BiLSTM vs. Bilstm). These should be standardized across the entire paper.
Response 2:
We appreciate the reviewer’s comment regarding inconsistent terminology. We have thoroughly reviewed the manuscript and standardized all technical terms, including “Seq2Seq” and “BiLSTM,” to ensure consistent usage throughout the paper.
Reviewer Comment 3:
Typographical and Grammatical Errors. Minor grammatical errors and typographical mistakes are present in several sections, particularly in the Introduction and Methodology. A thorough language proofreading is recommended.
Response 3:
We appreciate the reviewer’s careful reading and helpful comments. The manuscript has been thoroughly proofread to correct typographical and grammatical errors, especially in the Introduction and Methodology sections, ensuring improved clarity and readability.
Reviewer Comment 4:
Figure and Table Referencing Errors. Some figures and tables are not referenced in the correct order or are mentioned multiple times with inconsistent numbering. For instance, duplicate figure numbers (e.g., Fig. 5 and Fig. 6) should be corrected.
Response 4:
Thank you for pointing this out. We have carefully reviewed all figures and tables, corrected numbering inconsistencies, and ensured that each figure and table is referenced in the correct order throughout the manuscript.
Author Response File:
Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
No more comments.