A Survey of Traditional and Emerging Deep Learning Techniques for Non-Intrusive Load Monitoring

Huzzat, Annysha; Khwaja, Ahmed S.; Alnoman, Ali A.; Adhikari, Bhagawat; Anpalagan, Alagan; Woungang, Isaac

doi:10.3390/ai6090213


by Annysha Huzzat ¹, Ahmed S. Khwaja ¹,²,*, Ali A. Alnoman ³, Bhagawat Adhikari ¹, Alagan Anpalagan ¹ and Isaac Woungang ²

¹ Department of Electrical, Computer and Biomedical Engineering, Toronto Metropolitan University, Toronto, ON M5B 2K3, Canada

² Department of Computer Science, Toronto Metropolitan University, Toronto, ON M5B 2K3, Canada

³ Department of Computer Science and Engineering, American University of Ras Al Khaimah, Ras Al Khaimah 72603, United Arab Emirates

* Author to whom correspondence should be addressed.

AI 2025, 6(9), 213; https://doi.org/10.3390/ai6090213

Submission received: 12 August 2025 / Revised: 28 August 2025 / Accepted: 30 August 2025 / Published: 3 September 2025

(This article belongs to the Section AI Systems: Theory and Applications)


Abstract

To cope with the increasing global demand for energy and the significant energy wastage caused by the use of different home appliances, smart load monitoring is considered a promising solution to promote proper activation and scheduling of devices and reduce electricity bills. Instead of installing a sensing device on each electric appliance, non-intrusive load monitoring (NILM) enables the monitoring of each individual device using the total power reading of the home smart meter. However, for high-accuracy load monitoring, efficient artificial intelligence (AI) and deep learning (DL) approaches are needed. To that end, this paper thoroughly reviews traditional AI and DL approaches, as well as emerging AI models proposed for NILM. Unlike existing surveys that are usually limited to a specific approach or a subset of approaches, this review paper presents a comprehensive survey of an ensemble of topics and models, including deep learning, generative AI (GAI), emerging attention-enhanced GAI, and hybrid AI approaches. Another distinctive feature of this work compared to existing surveys is that it also reviews actual cases of NILM system design and implementation, covering a wide range of technical enablers including hardware, software, and AI models. Furthermore, a range of open challenges and future research directions is discussed, such as the heterogeneity of energy sources, data uncertainty, privacy and safety, cost and complexity reduction, and the need for a standardized comparison.

Keywords:

non-intrusive load monitoring; deep learning; hybrid models; generative AI; attention-enhanced; system implementation

1. Introduction

Energy consumption has been increasing globally due to rapid urbanization [1]. The proliferation and heavy usage of consumer appliances to improve comfort levels and achieve automation in smart cities have especially contributed to this increase. Specifically, in developed countries, electricity usage in the residential sector amounts to 40% of the total electricity demand [2]. It was further stated in [2] that up to one third of this electricity could be wasted involuntarily. Given the ever-increasing importance of addressing climate change and reaching net-zero emission goals [3], it is important to reduce the overconsumption and wastage of electricity. One possible means to achieve this goal is to raise consumers’ awareness of their electricity consumption patterns, allowing them to voluntarily adjust their habits to reduce unnecessary and avoidable electricity consumption.

Load monitoring is a tool to achieve the aforementioned consumer awareness. It is the process of acquiring and identifying load measurements in a power system, and it can be used to determine the electricity consumption of individual appliances, providing insight into their contribution to the overall load [4]. This understanding can help reduce the usage of high-energy-consuming devices, eliminate unwanted activities, and encourage the scheduling of devices to reduce electricity bills. It was stated in [5] that real-time feedback on household consumption as well as the disaggregated consumption of individual appliances could result in energy savings in the 5–15% range. Similarly, according to [6], energy bills could be reduced by up to 10–20% thanks to load monitoring. Furthermore, when this change in consumption behavior accumulates over a large scale such as a city or a country, it can reduce the energy deficit and decrease the emission of greenhouse gases.

Load monitoring can be divided into two categories: intrusive load monitoring (ILM) and non-intrusive load monitoring (NILM). The former utilizes a number of low-cost electricity meters to monitor the energy consumption of individual appliances. Internet of things (IoT) devices, such as wireless sensors or smart plugs, can be used for this purpose by installing a sensing device for each appliance that needs to be monitored. In this case, the number of installed sensing devices, and hence the cost of monitoring, will increase according to the number of appliances that need to be monitored.

This is considered an intrusive approach, as it requires actual installation of the monitoring devices inside the premises of the consumer, and in proximity to each appliance being monitored [4,7]. This approach can provide accurate results; however, it requires complex installations within the household [8] with internal wiring, sub-metering, data storage units, and smart devices, to name a few. An optimization-based technique to reduce the number of monitoring devices is presented in [9], where each monitoring device can be connected to more than one appliance with a unique power consumption signature. In this way, one monitoring device can be used to indicate the usage of different devices connected to it.

Unlike ILM, NILM technology monitors the energy consumption of individual appliances using the total aggregated energy of appliances. This energy is available at a single point, which is the smart meter. The authors in [10] stated that the IoT and artificial intelligence (AI) techniques can be combined to develop an NILM infrastructure in a smart city scenario. They presented an example NILM solution in which low-cost IoT devices are used to measure the electricity consumption, transmit the data, and separate the data at the appliance level. Using an example with a computer, a monitor, and a lamp, they illustrated how NILM technology can provide real-time energy consumption data within a smart grid, which is an essential part of smart cities. Not only can it provide online feedback to consumers, but it can also help in the optimization of power generation and dispatch [11].

NILM is also known as energy disaggregation [4], and it is represented visually in Figure 1. It involves training an AI model to estimate the power demand of each individual appliance from the aggregated electricity data using machine learning (ML) algorithms. This approach is simpler to install than ILM, as it does not require any sensors to be installed on the monitored appliances. It can use data such as power, voltage, and current measurements, as well as harmonic analysis corresponding to different appliances [2], to train the disaggregation models using different ML algorithms. It allows for the analysis of energy consumption without making a permanent change to the electrical infrastructure [7].

A few different uses of NILM can be summarized as follows:

1. It may be installed temporarily at a customer’s premises upon their request to perform an energy audit and generate a detailed energy consumption report. This report can help the customer to understand and change their electricity usage habits in order to reduce their electricity usage and bills. This process can be followed up by another temporary installation to confirm the electricity consumption savings achieved after the change.
2. It can also be used to detect unusual patterns of appliance usage and thus prevent failure of the appliances. These features are useful in home automation. In [12], a preliminary proposal was made to use the NILM in disaster and emergency scenarios to help the first responders in identifying victims.
3. Electric utilities can use NILM to monitor specific loads from up to hundreds of consumers in a non-intrusive manner. The monitoring data can be used for statistical analysis purposes by load forecasters, policy makers, etc. This feature can especially be useful in the estimation of renewable energy generation patterns at a highly aggregated level, such as at a regional level. It can help improve the planning and operations of electricity distribution in the presence of an increasing proliferation of renewable energy resources.

The NILM problem can be formulated mathematically by considering a total of N appliances, $i = 1, 2, \dots, N$, where at a time-instant t, the power consumed by the ith appliance is given by $x_{i}(t)$. This power can be considered as an appliance’s signature, which is measurable and can provide information about the operating state of the appliance [7]. We consider a single smart meter that measures the total power consumed by these appliances. We consider a binary variable $a_{i}(t)$, which is equal to 1 if the ith appliance is switched on at a time-instant t; otherwise, it is equal to 0. At any time-instant t, a vector $a(t) = [a_{1}(t), a_{2}(t), \dots, a_{N}(t)]$ contains these binary variables corresponding to all the appliances. Using these variables, the power $y(t)$ measured by the smart meter at a time-instant t can be written as

$$y(t) = \sum_{i=1}^{N} a_{i}(t)\, x_{i}(t) + \varepsilon(t), \qquad (1)$$

where $\varepsilon(t)$ refers to the measurement noise or any residual load at the time-instant t that is not measured. If $x_{i}(t)$ is known for each appliance, the solution to the NILM problem based on (1) can be considered as the estimation of $a(t)$ as follows [7]:

$$\hat{a}(t) = \underset{a(t)}{\arg\min} \left| y(t) - \sum_{i=1}^{N} a_{i}(t)\, x_{i}(t) \right|. \qquad (2)$$

That is, given the knowledge of $x_{i}(t)$, we attempt to find the combination of the appliances that are switched on and off such that their cumulative sum at the time-instant t matches the actual measured power $y(t)$. This is a combinatorial optimization problem, which can be solved using exhaustive search. However, an exhaustive search-based solution becomes computationally infeasible for a large number of appliances [7].
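To make the combinatorial nature of (2) concrete, the following minimal Python sketch enumerates every on/off combination for a single time-instant and keeps the one whose summed power is closest to the meter reading. The function name `disaggregate_exhaustive`, the appliance labels, and the wattages are illustrative assumptions and do not come from any reviewed work.

```python
from itertools import product


def disaggregate_exhaustive(y, x):
    """Brute-force solution of Eq. (2) for one time-instant.

    y: aggregate power read by the smart meter (W).
    x: list of known per-appliance powers x_i (W).
    Returns the binary tuple a minimising |y - sum_i a_i * x_i| and the residual.
    """
    best_a, best_err = None, float("inf")
    # Enumerate all 2**N on/off combinations of the N appliances.
    for a in product((0, 1), repeat=len(x)):
        err = abs(y - sum(a_i * x_i for a_i, x_i in zip(a, x)))
        if err < best_err:
            best_a, best_err = a, err
    return best_a, best_err


# Hypothetical appliance powers: lamp, fridge, microwave, kettle (W).
x = [60.0, 150.0, 1000.0, 2000.0]
y = 2215.0  # hypothetical meter reading, including a small residual load

a_hat, err = disaggregate_exhaustive(y, x)
print(a_hat, err)  # (1, 1, 0, 1) 5.0
```

With four appliances the search visits only 16 candidates, but the count doubles with every additional appliance (over a million candidates for N = 20, per time-instant), which is the scaling issue noted above. The sketch also assumes each appliance draws one fixed power level when on; multi-state or continuously variable appliances enlarge the search space further.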

In [13], it was proposed that deep learning (DL) algorithms such as long short-term memory (LSTM) networks could be used to solve computationally intractable problems. The proposed solution was applied to two well-known combinatorial problems, and it was shown that the proposed algorithm could obtain a good approximation of the global solution while consuming only a fraction of the computational resources. The authors in [14] stated that, given limited computing capabilities, exact solutions based on the branch-and-bound algorithm could not solve combinatorial optimization problems in a reasonable amount of time. The authors further surveyed DL algorithms to solve combinatorial optimization problems, including pointer networks (Ptr-Nets), graph neural networks (GNNs), Transformers, reinforcement learning (RL), etc. The authors also highlighted that the popularity of deep neural networks (DNNs) for solving combinatorial optimization problems has increased in the last decade. This is thanks to their speed, their low reliance on expert knowledge, and the possibility of implementing them using parallel computing on graphics processing units (GPUs), to name a few.

The authors in [15] stated that attention-enhanced methods exhibit some advantages over traditional DL methods such as recurrent neural networks (RNNs). Furthermore, Ptr-Nets were the most commonly used models for solving combinatorial optimization problems. The authors also mentioned the suitability of GNNs for solving certain combinatorial optimization problems and further surveyed the use of RL for solving combinatorial optimization problems. The authors also studied the use of DL techniques in the field of energy. Specifically, with respect to the power field, the authors showed that RNNs, convolutional neural networks (CNNs), Ptr-Nets, and RL, among others, were used to solve scheduling problems.

This review is motivated by the effectiveness of DL algorithms for solving the combinatorial optimization problems in NILM. We provide a survey of recent developments in the use of DL for NILM, including attention-enhanced algorithms, which have demonstrated advantages over traditional DL methods. To the best of our knowledge, these attention-enhanced algorithms have been scarcely discussed in the existing literature. We also review practical hardware implementations for NILM applications, highlighting real-world system deployments and the integration of hardware, software, and AI, rather than focusing solely on laboratory-based experiments as in previous studies. Furthermore, we address key challenges such as energy source heterogeneity, aggregate uncertainty, multi-layered privacy and safety, edge deployment constraints, and inconsistent evaluation metrics, highlighting perspectives that are not comprehensively covered in previous reviews. Figure 2 summarizes the DL models for NILM that are reviewed in this paper. A list of abbreviations used is provided at the end of this paper.

The rest of this paper is organized as follows: Section 2 presents a survey of existing review papers on NILM and presents the novelty of this paper compared to existing papers. Section 3 presents the use of traditional DL algorithms for NILM, including DNNs, CNNs, sequence models, and hybrid models. Section 4 reviews more recent “non-traditional” techniques for NILM, including generative adversarial networks (GANs), autoencoders, attention-enhanced models, and transfer learning (TL). Section 5 presents studies where a complete NILM system was implemented. Section 6 presents some challenges and future directions. Finally, Section 7 concludes this paper.

2. Existing Review Papers on NILM Using DL

Recent works have placed significant emphasis on the application of DL techniques to NILM, driving notable advancements in energy disaggregation and appliance identification. In this context, we present an overview of these works and the methods highlighted in similar review papers, as summarized in Table 1. This section categorizes the NILM review papers into three subsections, namely traditional DL, hybrid DL, and NILM architectures.

2.1. Traditional DL for NILM

Tokam et al. [2] provided a comparative analysis of ILM and NILM approaches, highlighting the strengths and limitations of each. They emphasized NILM’s cost-efficiency in hardware deployment and pointed out its challenges in achieving high disaggregation accuracy. The study highlighted the capability of DL models to eliminate the need for manual feature engineering, thereby improving the NILM’s effectiveness. Specifically, the authors discussed the application of various DL architectures, including CNNs, LSTM, RNNs, Autoencoders, and DNNs in NILM tasks. Furthermore, they addressed critical cybersecurity concerns associated with load monitoring, such as data privacy, network security, authentication and authorization, and the integrity of collected data in both ILM and NILM. Additionally, they emphasized the necessity of securing physical sensors and devices within NILM frameworks to mitigate vulnerabilities against unauthorized access and tampering.

The study further identified key factors that improve the performance, including shorter sampling intervals (less than 10 s), larger field-of-view input windows, and the use of post-processing techniques to smooth outputs and mitigate noise. Multi-task learning approaches were particularly highlighted for their effectiveness in simultaneously predicting multiple appliance states or power values. The authors also discussed the limitations, such as the lack of standardized datasets, variability in appliance behaviors, and the need for better cross-domain TL. They emphasized the need for systematic evaluations of architectures under consistent conditions to address these challenges. Future directions were specified as exploring hybrid architectures, leveraging TL, and improving real-world scalability by integrating NILM systems with smart grid technologies.

Kahl et al. [19] presented a comprehensive exploration of ML applications in NILM for energy disaggregation. The work addressed two core challenges in NILM processes: event detection and appliance classification. For event detection, the study proposed a supervised, multivariate approach that distinguishes between user-relevant appliance events and unrelated transients, significantly reducing false positives using adaptive training. For appliance classification, the research evaluated a wide range of hand-crafted and automated features across multiple datasets, including a novel high-frequency dataset named WHITED [27], which records transient energy consumption for a diverse set of appliances. Furthermore, the paper compared classical ML techniques with DL methods such as CNNs, convolutional autoencoders (CAEs), and representation learning, demonstrating the potential of DL to replace manual feature extraction with automated, high-performance models. The study also explored cross-dataset validation and smart meter configurations to assess real-world applicability, providing valuable insights for enhancing NILM systems.

The paper by Dash et al. [22] provided a comprehensive overview of NILM technology, tracing its evolution from clustering algorithms to advanced DL techniques. It systematically discussed event-based and eventless approaches, appliance signatures, performance evaluation metrics, and key challenges in NILM implementation. The study explored various DL models, including CNNs for feature extraction, RNNs and LSTM for capturing temporal dependencies, autoencoders for unsupervised feature learning, and DNNs for modeling complex energy disaggregation tasks. Additionally, it highlighted important challenges such as scalability, privacy concerns, and the integration of renewable energy, proposing future directions like IoT-enabled systems, privacy-preserving methods, and improved adaptability for real-world scenarios.

The paper by Nalmpantis et al. [23] presented a comprehensive review of ML-based NILM methods, emphasizing DL techniques. It highlighted methods such as one-dimensional (1D) CNNs for appliance classification, RNNs for capturing temporal dependencies, and TL models for improved generalization across datasets. The study also introduced DL-powered dictionary learning models and adaptive weighted recurrence graph blocks for advanced feature extraction. Factorial hidden Markov models (FHMMs) were noted for their ability to use low-frequency data, operate in real-time, and function in an unsupervised manner, reducing the need for training. However, FHMMs face challenges in complexity and scalability. On the other hand, DNNs leverage low-frequency data, operate in real-time, and achieve higher accuracy while handling scalability requirements, though they require extensive training and computational resources. The paper also identified critical requirements for robust NILM, including processing low-frequency data, achieving high accuracy, minimizing training efforts, and ensuring scalability for diverse operating conditions, underscoring the role of DL in advancing NILM research.

Reference [16] presents a comprehensive review of NILM applications, covering popular datasets, preprocessing techniques, feature extraction methods, and existing approaches, with a strong emphasis on state-of-the-art (SOTA) DL techniques. It categorizes appliance load signatures into four types: two-state appliances (e.g., lamps), finite-state appliances with repeating operational patterns (e.g., washing machines), continuously variable appliances (e.g., dimmer lights), and permanent appliances that remain constantly active (e.g., smoke detectors). The review highlights the critical role of DL in automating feature extraction and enhancing NILM performance. It explores various DL architectures such as CNNs and RNNs (including LSTM), which are effective at modeling spatial and temporal dependencies in power consumption data. Autoencoders and other generative models are utilized for feature learning and dimensionality reduction. A key contribution of the paper is its examination of generative AI (GAI) techniques including autoencoders, variational autoencoders (VAEs), and GANs for NILM tasks such as synthetic data generation and signal reconstruction. It also discusses the use of hybrid architectures (e.g., CNN-LSTM, variational RNNs) and attention mechanisms for improved sequence modeling, along with privacy-preserving approaches like federated learning (FL), which enable distributed model training without compromising data privacy. The review also compares supervised and unsupervised learning methods, discussing their respective strengths and limitations. Finally, it emphasizes NILM’s value in identifying energy-inefficient devices, promoting energy optimization, and supporting the development of intelligent, privacy-aware smart energy grids.

Schirmer et al. [24] presented a comprehensive review of NILM methodologies, highlighting the advancements driven by DL in energy disaggregation and appliance identification. Their paper categorized NILM approaches into three main classes: ML-based, pattern matching-based, and source separation-based methods. Among these, DL models such as CNNs, LSTM networks, bidirectional LSTM, and GANs have demonstrated superior performance in extracting appliance-level consumption from the aggregated signals. The review highlighted that recent works have increasingly leveraged large-scale datasets, improving NILM accuracy by capturing both spatial and temporal dependencies in power signals. Notably, CNN-based architectures excel in feature extraction, while LSTM networks and gated recurrent units (GRUs) enhance sequential modeling capabilities. Additionally, the paper discussed the rising importance of TL in NILM, aiming to improve model generalization across different households and datasets. Furthermore, the review examined benchmark datasets, performance metrics, and hardware implementations, providing a comparative analysis of NILM methods. The study underscored that while DL-driven NILM achieves high accuracy, challenges such as overfitting, computational complexity, and dataset biases remain open research problems. The future directions mentioned include edge AI for real-time processing, domain adaptation for transferability, and hybrid NILM models.

The review by Rafiq et al. [25] provided a comprehensive overview of load monitoring techniques aimed at promoting energy conservation in residential buildings. The paper discussed methods such as DNNs, CNNs, RNNs, and LSTM models, which have shown promising results in terms of disaggregation accuracy. Despite the progress made through these DL models, issues such as data requirements, real-time performance, and interpretability remain as barriers to effective deployment. Following a structured assessment of relevant literature, the review examined the advantages and disadvantages of various data-driven NILM methods, available datasets, and performance evaluation metrics.

Liu et al. [26] conducted a comprehensive review of the role of NILM in smart grids and smart homes. They provided a compilation of electricity consumption datasets from residential, industrial, and commercial sectors across multiple countries, commonly used in published studies. The review outlined key NILM features, categorizing them into steady-state, transient-state, and non-traditional features such as weather, temperature, and day of the week, along with methods for feature selection and extraction. Non-intrusive load monitoring approaches were classified into various types, including ML, pattern recognition, and blind source separation problems, as well as supervised, semi-supervised, and unsupervised methods. The review highlighted the increasing adoption of GNNs for structural pattern recognition in NILM and the application of DNNs in supervised and semi-supervised learning. Metrics used in NILM were also discussed, covering those for state/event estimation and power/energy estimation. The paper further reviewed NILM applications in areas such as comfort enhancement, security, demand response, and privacy preservation. Finally, the authors identified several challenges facing NILM, including the need for models that can disaggregate multiple appliances, the difficulty of handling appliances with irregular usage patterns, the absence of baseline data, and the need to improve generalization for unknown appliances, as well as further research on practical implementation for smart grid integration.

2.2. Hybrid DL for NILM

The paper by Herrero et al. [4] provided an in-depth review of NILM solutions, focusing on their strengths and limitations in energy disaggregation tasks. It categorized NILM as a time-series classification problem and explored key approaches, including autoencoders, hidden Markov models (HMMs), and DL models like CNNs and LSTM. Autoencoders are noted for their simplicity and unsupervised learning capabilities but suffer from low F1-scores due to challenges in identifying overlapping appliance signals. Hidden Markov models, while effective at modeling the appliance states and transitions, are computationally intensive and less scalable, with extensions like FHMMs offering limited improvement. Deep learning models are highlighted as the most promising, with CNNs excelling at feature extraction from raw signals, LSTM capturing temporal dependencies for sequential data, and hybrid CNN-LSTM models leveraging both strengths for high accuracy in NILM tasks. Despite their computational demands and reliance on large datasets, DL methods are identified as the future of NILM due to their scalability, adaptability, and superior performance in noisy and diverse data scenarios. The paper also discussed NILM’s broader applications, including detailed billing, energy management, occupancy detection, and energy theft identification, while addressing challenges such as high sampling rate requirements and household-specific model recalibration.

An extensive examination of DNN-based architectures tailored for NILM tasks utilizing low-frequency data is provided by Huber et al. [17]. Their review outlined the evolution of NILM research and emphasized the practical advantages of low-frequency approaches, such as reduced infrastructure costs and easier deployment via smart meters. The authors analyzed a variety of DNN architectures, including denoising autoencoders (DAEs), which are used for noise reduction and signal reconstruction and are particularly effective in separating appliance signals from aggregated power data. The review also highlighted CNNs for their ability to extract spatial features from input windows, which makes them suitable for detecting distinct appliance patterns in time-series data. Variations include CNN-DAE and CNN-GAN hybrids, which integrate convolutional layers with autoencoders or GANs for better reconstruction and generation capabilities. The authors further discussed RNNs and LSTM for their proficiency in modeling temporal dependencies in appliance usage patterns. Bidirectional LSTM (biLSTM) and gated recurrent units (GRUs) were also reviewed for their efficiency in sequential data processing.

Ouzine et al. [21] presented a comprehensive review of NILM methodologies, focusing on probabilistic and AI-based techniques for energy disaggregation. The review explored probabilistic models, such as FHMMs, and advanced AI techniques such as CNNs, LSTM, and hybrid architectures, e.g., CNN-LSTM, which improve accuracy, particularly in unseen environments. The key challenges mentioned included detecting low-power signals, lack of standardized evaluation metrics, and limited labeled data, with recommendations for scalable NILM algorithms and unsupervised learning methods. Similarly, the study by Monteiro et al. [28] evaluated the performance of various AI-based classifiers for NILM and found that 1D CNNs outperformed other approaches for load disaggregation. The paper also emphasized the importance of electrical current signals over voltage and power signals for accurate appliance classification, underscoring their effectiveness in residential NILM applications. Together, these two latter papers highlighted the potential of AI-driven models, particularly CNNs, in advancing NILM by addressing key challenges and enabling sustainable energy management.

2.3. Non-Intrusive Load Monitoring Architectures

Kaselimi et al. [18] presented a comprehensive review of NILM advancements, challenges, and methodologies. Their work identified critical challenges such as generalization to unseen data, noise handling, user feedback integration, model explainability, fairness in evaluation, and data privacy concerns. The review traced the evolution of NILM from Hart’s combinatorial optimization to modern DL approaches like GANs, Transformers, and U-Nets, emphasizing their strengths and limitations. It also categorized NILM techniques into classification, e.g., support vector machines (SVMs), artificial neural networks (ANNs), etc., and regression tasks, noting that while classification handles binary states (ON/OFF), regression predicts energy consumption. Advanced architectures like sequence-to-sequence (Seq2seq), sequence-to-point (Seq2point), and sequence-to-subsequence (Seq2subseq) were highlighted for their trade-offs in prediction granularity. Explainability was explored to understand ML model decisions, while pre-processing methods like feature extraction and balancing were reviewed for handling low and high frequency data. Additionally, the paper reviewed evaluation metrics such as precision, recall, and F1-score for classification and mean absolute error/root mean squared error (MAE/RMSE) for regression. It noted the limitations in current NILM approaches, particularly for multi-state and unknown appliances, and emphasized U-Net’s unique capacity for multi-appliance detection.
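The two families of evaluation metrics mentioned above can be illustrated with a short Python sketch: MAE and RMSE score the regression output (estimated appliance power), while precision, recall, and F1-score evaluate the classification output (ON/OFF states). The helper names, the sample values, and the 15 W ON/OFF threshold are assumptions chosen for illustration only.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between true and estimated appliance power."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between true and estimated appliance power."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def precision_recall_f1(s_true, s_pred):
    """Precision, recall, and F1-score for binary ON (1) / OFF (0) states."""
    tp = sum(1 for t, p in zip(s_true, s_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(s_true, s_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(s_true, s_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ground-truth and estimated power of one appliance (W).
p_true = [0.0, 100.0, 100.0, 0.0, 50.0]
p_pred = [0.0, 90.0, 110.0, 20.0, 5.0]

# Assumed ON/OFF threshold (W); real studies choose it per appliance.
THRESHOLD_W = 15.0
s_true = [int(p > THRESHOLD_W) for p in p_true]
s_pred = [int(p > THRESHOLD_W) for p in p_pred]

print(mae(p_true, p_pred))                  # 17.0
print(rmse(p_true, p_pred))                 # sqrt(525), about 22.9
print(precision_recall_f1(s_true, s_pred))  # each value equals 2/3
```

In this toy case the RMSE is noticeably larger than the MAE because a single 45 W error dominates the squared terms, which is why the two regression metrics are often reported together.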

Reference [20] highlighted the NILMTK-API toolkit introduced in 2019, designed to standardize benchmarking and comparison of NILM algorithms. It reviewed several DL techniques applied to NILM, including DAEs, RNNs, Seq2seq models, Seq2point models, and online GRU. These models demonstrated strong potential in energy disaggregation tasks, excelling at capturing temporal dependencies and non-linear patterns. However, their performance varied across datasets (IAWE [16], REDD [29], UK-DALE [30]) due to differences in appliance usage patterns and dataset quality. For instance, Seq2seq outperformed others for UK-DALE, while Seq2point performed best for REDD. Challenges such as high error rates for irregular appliances, e.g., air conditioners, and performance variability across datasets, were identified. The review emphasized the NILMTK-contrib repository, which provides code, documentation, and examples, enabling researchers to implement and test these DL models effectively. The work also underscored the importance of standardized tools and DL techniques in advancing the NILM research.
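The difference between the Seq2seq and Seq2point formulations mentioned above lies mainly in how the training targets are built from sliding windows of the aggregate signal. The sketch below shows one common way to construct them: Seq2seq maps an input window to the appliance power over the whole window, whereas Seq2point maps it to the appliance power at the window midpoint. The function `sliding_windows` and the sample series are hypothetical, and this is not code from NILMTK or NILMTK-contrib.

```python
def sliding_windows(aggregate, appliance, window):
    """Yield (input_window, seq2seq_target, seq2point_target) triples.

    aggregate: aggregate power series from the smart meter.
    appliance: ground-truth power series of one appliance, same length.
    window:    odd window length, so that a unique midpoint exists.
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    if len(aggregate) != len(appliance):
        raise ValueError("series must have the same length")
    half = window // 2
    for start in range(len(aggregate) - window + 1):
        x = aggregate[start:start + window]      # model input
        y_seq = appliance[start:start + window]  # Seq2seq target: whole window
        y_point = appliance[start + half]        # Seq2point target: midpoint only
        yield x, y_seq, y_point


# Hypothetical series (W): a 10 W base load plus one appliance.
aggregate = [10, 10, 110, 110, 60, 10, 10]
appliance = [0, 0, 100, 100, 50, 0, 0]

samples = list(sliding_windows(aggregate, appliance, window=3))
print(len(samples))  # 5
print(samples[2])    # ([110, 110, 60], [100, 100, 50], 100)
```

Because Seq2seq windows overlap, each time step receives several predictions that must be combined (for example, by averaging), whereas Seq2point yields exactly one prediction per time step.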

Compared to prior surveys that mainly focused on CNN-LSTM architectures, this paper extends the discussion of hybrid models to include recent CNN-Transformer and GRU-Transformer hybrids, variational recurrent neural networks (VRNNs), convolutional recurrent neural networks (CRNNs), and gated hybrids such as the Subtask Gated Network (SGN). These newer models, particularly Transformer-based hybrids, were largely absent in earlier reviews. While existing surveys often treat CNN-LSTM models as baseline hybrids, our review expands this coverage to advanced variants, including ensemble, gated, and transfer learning-augmented forms. Additionally, unlike previous surveys [20,26] that primarily summarize lab-based hardware platforms, our paper highlights recent hardware case studies and actual implementations, providing detailed insights into microcontroller optimization, edge/cloud deployment, and IoT-enabled NILM setups. This broader scope captures both the architectural evolution and practical deployment challenges, distinguishing our survey from earlier works.

3. Non-Intrusive Load Monitoring Using Traditional DL Techniques

3.1. Deep Neural Networks and Multilayer Perceptrons

Deep neural networks play a pivotal role in NILM due to their ability to model complex non-linear relationships in time-series energy data. By learning hierarchical features from raw input data such as voltage, current, and power, DNNs can enhance load disaggregation accuracy, handle noisy data, and differentiate appliances with similar power signatures. Among DNN architectures, multilayer perceptrons (MLPs) serve as a fundamental model in NILM, effectively capturing non-linear dependencies for appliance classification and load disaggregation [31]. Multilayer perceptrons use features such as voltage, current, and power factor to distinguish appliances, particularly those with overlapping power profiles. While MLPs provide a strong foundation, deeper DNN architectures with additional hidden layers can further improve feature extraction and representation, leading to more robust NILM performance [31].

For instance, DNNs have demonstrated their efficiency in NILM by effectively disaggregating the aggregate power consumption into individual appliance loads. The ability of DNNs to model complex, non-linear relationships enables them to capture intricate patterns in time-series energy data, making them particularly useful for NILM applications, since real-time monitoring is crucial for energy efficiency. Moreover, optimizing DNN architectures by reducing the number of layers and neurons ensures a balance between computational efficiency and model accuracy, which is especially relevant in NILM systems deployed on resource-constrained smart meters [32]. A similar optimization strategy has been applied in other domains, such as estimating the state of charge (SOC) of lithium-ion battery cells in electric vehicles (EVs), where DNNs utilize voltage, current, and temperature data for precise predictions. In NILM, this architectural refinement, often guided by greedy search algorithms, helps enhance the disaggregation accuracy while mitigating overfitting risks. As a result, error metrics like MAE and RMSE are minimized, reinforcing the adaptability of DNNs in handling domain-specific challenges. These findings not only underscore the versatility of DNNs but also highlight their potential for improving energy disaggregation performance and scalability in NILM applications.

The paper by Zhang et al. [33] explored the integration of differential privacy techniques with a DNN-based energy disaggregation system. The study focused on addressing privacy concerns associated with collecting and analyzing personal electricity usage data in online energy disaggregation applications. The paper highlighted the advanced capabilities of DNNs in energy disaggregation tasks, including their ability to analyze high-frequency smart meter data, such as active/reactive power, current, voltage, and harmonics, for load disaggregation, peak load shaving, and valley filling. However, it also identified the privacy risks posed by the continuous collection of sensitive personal data, such as revealing household behaviors and activities. To mitigate these risks, the authors proposed a privacy-preserving approach by integrating Gaussian noise into the DNN training process. This application of differential privacy ensures that the training data and model outputs are protected, preventing the inference of sensitive personal information while maintaining the utility of the energy disaggregation system.
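The idea of injecting Gaussian noise during training can be illustrated with a minimal sketch. The source does not specify where the noise enters the training loop, so the version below assumes a DP-SGD-style step (per-example gradient clipping followed by Gaussian noise on the summed gradient); the function names, the clipping bound `max_norm`, and `noise_multiplier` are hypothetical and not taken from [33].

```python
import math
import random

def clip_by_l2_norm(grad, max_norm):
    """Scale a gradient vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]

def noisy_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    Assumption: this mirrors a generic DP-SGD step, not necessarily the
    exact mechanism used in the reviewed paper.
    """
    clipped = [clip_by_l2_norm(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm  # noise std scales with the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
print(noisy_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds the influence of any single household sample before noise is added, which is what allows the noise scale to be tied to a privacy guarantee; the actual privacy accounting is outside the scope of this sketch.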

Liu et al. [34] proposed a DNN-based NILM approach using low-sampling-rate data, with a 15 min interval between consecutive data points, to adapt the NILM algorithm to the typical sampling rate available in commercial power meters. Unlike most studies, which rely on high sampling rates such as 1 s or 1 min, this lower sampling rate makes it harder to extract meaningful power features and identify individual appliances. The authors used the NILMTK-contrib API to evaluate the performance of various NILM algorithms on the dataset, employing a rolling training and testing approach in which the data from the last two months were used to predict the following month. The study also explored model transferability across different seasons and datasets, demonstrating the applicability and robustness of the proposed approach for NILM in diverse real-world scenarios.
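The rolling evaluation scheme described above can be sketched as follows. This is only an illustration of the splitting logic, assuming the data are already grouped into chronologically ordered months; the helper name `rolling_month_splits` and the month labels are hypothetical.

```python
def rolling_month_splits(months, train_len=2):
    """Yield (train_months, test_month) pairs over a chronologically ordered list.

    Each test month is preceded by the train_len months used for training.
    """
    for i in range(train_len, len(months)):
        yield months[i - train_len:i], months[i]

months = ["2021-01", "2021-02", "2021-03", "2021-04", "2021-05"]  # illustrative labels
for train, test in rolling_month_splits(months):
    print(train, "->", test)
```

Because every test month lies strictly after its training months, this kind of split avoids leaking future consumption patterns into training.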

3.2. Convolutional Neural Networks

Convolutional neural networks have proven to be highly effective in NILM, particularly for processing time-series energy data. Unlike traditional MLPs, CNNs excel at identifying spatial and localized patterns within data, making them suitable for tasks that involve sequential appliance power signatures. The key advantage of CNNs lies in their convolutional operations, which capture relevant features such as peaks, transitions, and variations in energy usage. Convolutional layers are also computationally efficient because they reduce data dimensionality while retaining critical information, making CNNs particularly useful for high-resolution energy datasets and real-time monitoring systems, where both accuracy and speed are essential. Leveraging this strength, Du et al. [35] employed CNNs as a shared feature representation network to learn temporal and localized patterns from aggregated main meter readings. By effectively capturing subtle variations and temporal dependencies, CNNs enabled the model to distinguish between overlapping appliance signatures. To handle the multimode signal structures of various appliances, the framework incorporated multiple discriminators, each tasked with fine-grained separation of appliance-specific features. The generator competed against these discriminators, progressively learning to extract the shared features that represent the multimodal characteristics of the aggregated signals. This adversarial training approach, powered by the localized feature extraction capabilities of CNNs, enabled precise and efficient disaggregation of complex energy usage patterns.

The study by Santoro [36] explored the application of DL models for NILM using DAEs, CNN-based DAEs, U-Net, and Seq2seq architectures. The models were trained and evaluated on a dataset collected from five residential houses in Rio de Janeiro, Brazil, with power consumption data sampled at 1 Hz and later downsampled to 1/6 Hz. A key focus of the study was the impact of input data types on NILM performance. The models were tested using single-channel input (apparent power demand only) and two-channel input (active and reactive power demand separately). The results showed that the two-channel U-Net model slightly outperformed the other architectures, demonstrating the advantage of incorporating richer data representations for improved appliance-level disaggregation. The study also replicated the Seq2seq model proposed by Kelly et al. [30] as a baseline and highlighted overfitting challenges, particularly in the Seq2seq model, due to its large number of parameters. To mitigate this, regularization techniques such as dropout and batch normalization were applied. Additionally, the U-Net model was designed to improve performance without requiring fully connected output layers, thus reducing parameter complexity.

Electricity disaggregation is challenging, particularly for appliances with similar energy profiles or complex multi-state behaviors. Traditional methods like LSTM perform well for simple two-state appliances but are computationally inefficient and struggle with multi-state or unpredictable power patterns. To address these challenges, Edmonds et al. [37] proposed an innovative CNN-based approach that transforms electricity time-series data into heatmaps, where higher energy readings are depicted as hotter colors. These heatmaps, generated by aggregating high-frequency data hourly, enable the use of image-based DL techniques to extract spatial features from the energy patterns. Evaluated on the UK-DALE dataset, the model achieved the highest accuracy for refrigerators in single-house scenarios and for dishwashers across multiple houses. While the approach showed promising performance, its reliance on correlation-based heatmaps, rather than raw energy data, risks overlooking temporal dynamics, requiring further investigation. Despite its limitations, the study highlighted the potential of CNNs in NILM while identifying opportunities for refinement, particularly in maintaining temporal structure and handling similar energy profiles.

Massidda et al. [38] proposed a CNN-based NILM approach that improves accuracy by disaggregating the aggregated household energy consumption into individual appliance usage profiles. Their model, inspired by the pyramid scene parsing network (PSPNet), incorporated a temporal pooling module that aggregates multi-scale temporal features, enabling the CNN to capture both global and localized patterns in the aggregated load data. The encoder–decoder architecture processes time-series data, where the encoder extracts features through convolutional layers and pooling modules, and the decoder reconstructs the appliance activation states. Using the UK-DALE dataset, the model demonstrated SOTA performance in both seen (in-house) and unseen (cross-house) scenarios, outperforming traditional and existing DL models. By achieving high accuracy in estimating appliance-specific consumption while maintaining computational efficiency, this model offered a robust and cost-effective solution for real-time NILM applications in smart energy management systems.

Convolutional neural networks have also been leveraged in data augmentation to enhance NILM model performance. Delfosse et al. [39] employed a CNN-powered multi-agent-based simulator (MAS) to generate synthetic training data derived from real datasets, addressing the challenge of varying appliance types, manufacturers, and regional energy usage patterns. By transforming real-world energy data into diverse synthetic representations, the simulator improved the model’s generalization and accuracy, especially when real data are scarce or inconsistent. The combination of synthetic and real data during the training stage significantly enhanced the NILM model’s flexibility and robustness in diverse real-world scenarios. The study also examined FHMM, which, despite its effectiveness, suffered from scalability issues and high computational complexity. Cross-validation results demonstrated that CNN-based architectures, particularly time-distributed convolutional networks and temporal convolutional networks, outperformed FHMM. Furthermore, models that were trained with a mix of synthetic and real data achieved the highest accuracy, underscoring the impact of CNN-driven data augmentation in NILM.

Current CNN-based models often struggle to capture the dependencies in energy consumption behavior, such as sequential appliance usage patterns, e.g., drying after washing or repeated microwave use, leading to inaccuracies in energy disaggregation. To address this, Chen et al. [40] introduced a CNN-driven scale- and context-aware approach for NILM. Their model, built upon a multi-branch CNN architecture, employed different dilation rates across parallel branches to enhance the scale awareness, allowing it to process the signals at varying time resolutions. Context awareness was further incorporated through a self-attention mechanism, enabling the network to consider long-range dependencies across the entire input sequence. Additionally, they integrated an adversarial loss module that combines the CNN with a GAN to refine the disaggregation performance by reducing errors and improving generalization. Extensive evaluations on the REDD and UK-DALE datasets demonstrated that this hybrid CNN-GAN approach can surpass conventional DNN-based models, achieving SOTA performance in NILM tasks.

Unlike traditional NILM systems that rely on extensive data processing over extended periods, the approach by Athanasiadis et al. [41] focused on identifying appliance turn-on events through transient power signals, allowing immediate energy consumption estimation. This approach offers a scalable solution for large-scale deployment in residential and commercial settings. The proposed system comprises three primary components: an event detection algorithm that identifies the precise moment an appliance is turned on by analyzing transient changes in active power; a CNN classifier that determines whether the detected event corresponds to a specific target appliance; and a power estimation algorithm that calculates the appliance’s real-time power consumption based on its operational characteristics. This methodology was designed to operate with minimal hardware requirements, making it suitable for integration into low-cost smart meters and edge devices. By processing data at a sampling rate of 100 Hz and focusing on short-duration transients, the system achieved real-time performance without the need for extensive data storage or high computational resources. Experimental evaluations demonstrated that the system could accurately detect appliance turn-on events and estimate power consumption in real-time, even with reduced data sampling intervals.

Seq2point learning, a promising CNN-based approach in NILM, maps a sequence of aggregate power data to a target appliance’s power consumption at the midpoint of the input sequence. However, this method often demands significant computational resources due to the extensive number of parameters in CNN-based architectures, which exceed thirty million weights [42]. To address this challenge, Barber et al. [42] proposed a pruned Seq2point learning model that leverages CNNs with reduced complexity, making it highly suitable for deployment in resource-constrained environments such as smart meters. By eliminating weights that minimally impact network output, their pruning process significantly enhanced computational efficiency without compromising performance. Four pruning techniques were evaluated, namely low magnitude pruning (LMP), which removes a fixed proportion of the smallest absolute-value weights; relative threshold pruning, which adjusts pruning thresholds based on the weight distribution within layers; structured probabilistic pruning (SBP), which is less effective for CNNs due to its reliance on structural constraints; and entropy-based pruning (EBP), which prioritizes the weights based on their information contribution.
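As a rough illustration of the simplest of these techniques, low magnitude pruning can be sketched on a flat list of weights. This is a generic sketch of magnitude-based pruning, not the implementation from [42]; the function names and the toy weight values are hypothetical.

```python
def low_magnitude_prune(weights, fraction):
    """Zero out the given fraction of weights with the smallest absolute value."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    n_prune = int(len(weights) * fraction)
    # Indices sorted by ascending magnitude; the first n_prune are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)

layer = [0.5, -0.01, 0.2, -0.9, 0.03, 0.1]  # toy layer weights
pruned = low_magnitude_prune(layer, fraction=0.5)
print(pruned, sparsity(pruned))
```

In practice, pruning of this kind is usually applied per layer and followed by fine-tuning so that the remaining weights can compensate for the removed ones.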

Moreover, the lightweight Seq2point model [42] incorporated dropout and a simplified CNN architecture to enhance sparsity and reduce complexity. Experimental results using the REFIT dataset demonstrated that EBP and SBP achieved the best error metrics, while LMP yielded the highest sparsity improvement. The approach combining reduced LMP with dropout achieved an 87% reduction in model weights and outperformed the control models [18,43] in error metrics, proving effective for appliances like kettles and dishwashers. Hence, this optimized CNN-based model balances efficiency and accuracy, supporting its practicality for NILM applications.

Accurate prediction of individual appliance energy consumption from aggregated household energy data is challenging due to the complexity of energy consumption patterns. Jiang et al. [44] proposed an energy disaggregation mechanism using the WaveNet model, a variant of CNNs that utilizes dilated convolutions. Unlike traditional CNNs or RNNs, WaveNet efficiently processes long sequences without relying on recurrent structures, making it computationally efficient and easier to parallelize. The study evaluated WaveNet on the REFIT dataset, predicting the appliance ON/OFF states using two frameworks: one based on aggregated energy readings, and the other using binarized readings. Additionally, the independent training of models for each appliance improved the disaggregation accuracy. The study employed a fast Seq2point method for training, further enhancing performance while maintaining computational efficiency.
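To make the role of dilation concrete, the following sketch implements a single dilated 1D convolution and computes the receptive field of a stack of such layers. It assumes causal convolutions (each output depends only on current and past samples) and one layer per dilation value; both are assumptions for illustration and are not details confirmed for the model in [44].

```python
def dilated_causal_conv1d(x, kernel, dilation):
    """Compute y[t] = sum_k kernel[k] * x[t - k * dilation], with zeros before t = 0."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated convolutions (one layer per dilation)."""
    return 1 + (kernel_size - 1) * sum(dilations)

print(dilated_causal_conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], dilation=2))
print(receptive_field(kernel_size=2, dilations=[1, 2, 4, 8]))
```

With kernel size 2 and dilations doubling per layer, the receptive field grows exponentially with depth while the number of weights grows only linearly, which is why such stacks can cover long sequences without recurrence.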

Luan et al. [45] proposed a lightweight Seq2seq neural network model for NILM, where Seq2seq refers to models that map an input sequence to an output sequence, commonly used for time-series tasks. Designed for edge devices, this model addresses challenges in real-time energy monitoring while preserving user privacy. With only two convolutional layers and dense layers, it leverages the efficiency of CNNs to perform energy disaggregation with minimal computational and memory requirements. By processing the aggregated energy data into appliance-level consumption patterns, the model enables real-time feedback on energy usage. The study evaluated its performance using low-frequency data from the UK-DALE and REFIT datasets, demonstrating comparable or superior results to existing methods while significantly reducing model size and computational demands. This makes it a promising solution for deploying NILM in edge devices without compromising privacy.

A Seq2point CNN architecture was also utilized by Murray et al. [46] to estimate the energy consumption of individual appliances from aggregated smart meter data. The primary focus was on enhancing the explainability and transparency of such models, which is often lacking in traditional NILM approaches. The authors introduced the use of heatmaps generated through occlusion techniques to visualize the parts of the input data that are critical for energy consumption predictions, helping to increase the interpretability of the model. The results showed improved accuracy, with the proposed method achieving lower error than SOTA models, but the work also highlighted the challenges in making these methods comprehensible for non-expert users. In particular, while the heatmaps offer valuable insights into the model’s decision-making, they may still be difficult for end-users to interpret without domain knowledge. The work underscored the importance of transparency in AI models for practical applications like smart grids and suggested further research into interactive methods for user-centric evaluation and performance metrics.
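The occlusion idea can be sketched in a few lines: mask one patch of the input window at a time and record how much the prediction changes. The sketch below is a generic occlusion-sensitivity routine, not the procedure of [46]; `toy_model` is a hypothetical stand-in for a trained Seq2point network, and the fill value of zero is an assumption.

```python
def occlusion_sensitivity(model, window, patch_len, fill_value=0.0):
    """Score each patch by how much masking it changes the model's prediction."""
    baseline = model(window)
    scores = []
    for start in range(0, len(window), patch_len):
        occluded = list(window)
        for i in range(start, min(start + patch_len, len(window))):
            occluded[i] = fill_value
        scores.append(abs(model(occluded) - baseline))
    return scores

def toy_model(window):
    """Hypothetical stand-in for a trained network: returns the midpoint reading."""
    return window[len(window) // 2]

window = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]  # toy aggregate power window
print(occlusion_sensitivity(toy_model, window, patch_len=2))
```

The resulting scores, one per patch, are what would be rendered as a heatmap over the input window; larger values mark the regions the model relies on most.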

Zhang et al. [47] proposed a Seq2point model that primarily leverages CNNs as an alternative to Seq2seq methods, effectively mitigating the vanishing gradient problem and the computational overhead associated with long input sequences. The study systematically demonstrated the effectiveness of convolutional architectures in learning appliance signatures for energy disaggregation. However, conventional Seq2point learning often requires deep convolutional architectures with millions of parameters, leading to redundancy, overfitting, and high computational costs. To overcome these challenges, Yu et al. [48] proposed an augmented lightweight parallel network that embeds an attention mechanism over the Seq2point framework to improve feature extraction and load disaggregation. Instead of relying solely on stacked CNN layers, the model employs multiple lightweight convolutional branches in parallel to capture diverse receptive fields while keeping the parameter count low. Additionally, the integration of an attention module allows the network to dynamically focus on the most relevant temporal features of aggregated power signals, thereby improving prediction accuracy. Evaluated on UK-DALE and REFIT datasets, their model demonstrated superior performance compared to baseline Seq2point approaches, achieving improved disaggregation accuracy across various appliances while reducing training complexity.

3.3. Sequential and Hybrid Models

In this subsection, we review various research works that demonstrate the application of sequential and hybrid models in NILM. Unlike prior surveys, this review provides an in-depth discussion of how these models are implemented, their performance across multiple real-world datasets, integration with attention mechanisms, and deployment considerations, offering a more comprehensive perspective on sequential and hybrid architectures.

LSTM networks have been widely adopted in NILM due to their ability to learn long-range dependencies and overcome vanishing gradient issues. Cho et al. [49] employed LSTM with smart meter data sampled at one-minute intervals to classify the ON/OFF states and power consumption of air-conditioning (A/C) units. The study addressed the challenge of accurately disaggregating A/C power consumption from aggregated residential power usage, a critical issue given the increasing global demand for air conditioning. Traditional methods like HMMs struggle to model appliances with fluctuating power patterns, prompting the authors to leverage DL techniques, specifically LSTM and MLPs. While MLPs achieved slightly better RMSE and MAE results, LSTM excelled at capturing the temporal dependencies and long-term patterns inherent in time-series data. Using the Pecan Street dataset, the authors in [49] demonstrated that LSTM could effectively identify A/C operating cycles and estimate power usage, achieving high accuracy across multiple houses. This highlights the potential of sequential models like LSTM in NILM for disaggregating complex appliance loads, improving real-time energy management, and addressing the challenges of fluctuating power consumption. Another study by Zhang et al. [50] introduced a probabilistic feeder-level energy disaggregation model using multi-quantile LSTM (MQ-LSTM), which disaggregates loads like rooftop photovoltaic (PV) systems, thermostatically controlled loads (TCLs), and non-TCLs. The model integrated features such as historical demand, meteorological data, and calendar information, demonstrating high adaptability across diverse datasets.
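A multi-quantile model outputs several quantiles of the target rather than a single point estimate. The source does not state which loss MQ-LSTM minimizes; the sketch below assumes the pinball (quantile) loss, which is the standard objective for quantile regression. The function names and the toy values are hypothetical.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss for quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)

def multi_quantile_loss(y_true, preds_by_quantile):
    """Average pinball loss over a dict {quantile level: predictions}."""
    losses = [pinball_loss(y_true, p, q) for q, p in preds_by_quantile.items()]
    return sum(losses) / len(losses)

y = [10.0, 12.0]  # toy feeder-level load values
preds = {0.1: [8.0, 9.0], 0.5: [10.0, 12.5], 0.9: [13.0, 14.0]}
print(multi_quantile_loss(y, preds))
```

The asymmetric weighting is what pushes each output toward its own quantile: a 0.9-quantile prediction is penalized far more for falling below the observed load than for exceeding it.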

In hybrid approaches, CNNs are combined with other architectures, such as LSTM or GRUs, to improve performance by integrating temporal dependencies with localized feature extraction. In particular, CNNs excel at feature extraction, while LSTM and GRUs capture the temporal aspects of energy usage, addressing challenges like data sparsity, noisy signals, the complexity of identifying appliance states, and multi-state appliance classification. For example, Bimenyimana et al. [51] combined LSTM and random forest (RF) models to disaggregate energy consumption for seven appliances and forecast future consumption. The ensemble-based approach showed promising results for NILM, particularly in multi-state appliance recognition. Moving toward ensemble models, the work by Junfei et al. [52] integrated LSTM and feedforward neural networks (FFNNs) to address multi-state appliance recognition. By processing aggregated consumption data alongside temporal features like day, time, and month, the model effectively classified appliance states and mitigated data imbalance issues. The ensemble approach demonstrated superior performance compared to standalone LSTM or DAE models.

In [53], a DL approach for energy disaggregation that combines CNNs with bidirectional LSTM layers was proposed. The model, which was applied to appliances like kettles, fridges, and washing machines, incorporated a denoising encoder and demonstrated improved accuracy and noise reduction compared to traditional LSTM-based models. Similarly, Tan et al. [54] presented a behind-the-meter (BTM) NILM framework that integrates renewable energy sources, particularly PV and wind generation, to enhance load decomposition and monitoring. The proposed approach addressed the challenges posed by the intermittent nature of renewable energy sources by incorporating environmental parameters, such as temperature and wind speed, into the load disaggregation process. The study employed LSTM, RNN combined with CNNs, and GRUs to analyze and predict appliance states based on power consumption patterns. Specifically, the CNN was utilized for feature extraction, reducing the computational complexity, while the GRU provided a faster alternative to the LSTM with slightly lower performance in large-scale load decomposition. Their model was validated on the UK-DALE dataset, demonstrating improved accuracy in identifying appliance operations despite the variability introduced by renewable energy sources.

Bejarano et al. [55] used VRNNs to jointly disaggregate individual appliance signals from the aggregate power signal. By incorporating latent variables, the VRNNs captured the underlying state transitions, similar to traditional models like HMMs, but within a DNN framework. This enabled the VRNNs to effectively model the highly variable sequential data. Similarly, CRNNs were employed by Serafini et al. [56], where they proposed a multiple instance regression approach. This approach reduced reliance on strongly labeled data by using coarse-grained information and pooling functions to derive weakly labeled datasets. The CRNN architecture, as shown in Figure 3, combined convolutional layers for feature extraction with bidirectional GRUs for temporal modeling, demonstrating effective performance in NILM tasks.

As an alternative to LSTM, GRUs have also been applied in NILM due to their efficiency in modeling sequential data with a smaller number of parameters. Xuan et al. [57] introduced a hybrid GRU-Transformer model enhanced by convolutional layers and time-aware self-attention mechanisms. This architecture effectively addressed the challenges associated with multi-stage appliances by extending the receptive fields and improving the responsiveness to state transitions. The model was evaluated against several SOTA approaches, including LSTM+, regularized bidirectional GRU (GRU+), Seq2seq CNN, and BERT4NILM, using consistent input sequence lengths, hidden sizes, and training conditions. The results provided in Table 2 and Table 3 highlight the model’s superior performance on both the UK-DALE and REDD datasets. The model, referred to as CTA-BERT because it combines time-aware self-attention with BERT, improved the F1-score by 5% over BERT4NILM while reducing the relative and absolute errors by 20% and 3%, respectively. It outperformed other baselines by up to 30% in several metrics. The proposed loss function further enhanced the performance across most appliances compared to mean squared error (MSE), with CTA-BERT showing the most consistent overall accuracy despite occasional metric advantages demonstrated by BERT4NILM.

Similarly, the authors in [58] demonstrated that GRUs outperformed other recurrent architectures like simple RNNs and LSTM in the presence of persistent noise, achieving the highest disaggregation accuracy while maintaining computational efficiency. Moreover, GRUs showed superior performance in terms of model convergence speed and robustness when applied to real-world household energy consumption data.

Studies have demonstrated the effectiveness of hybrid CNN-LSTM architectures. For instance, Matindife et al. [31] introduced disaggregation networks utilizing CNN and LSTM layers combined with deep MLPs for appliance classification. This method employs three signal parameters (voltage, current, and power) together with TL to address data scarcity, achieving robust disaggregation of data sampled at 1 Hz. Similarly, the authors in [59] proposed a Seq2point learning framework using CNNs for appliance state detection and LSTM for power prediction. This lightweight model showed superior performance on the REDD dataset compared to Seq2seq and CNN-based methods, making it suitable for deployment on embedded devices.

To tackle the limited availability of supervised data, Delfosse et al. [39] introduced a simulator that is capable of generating synthetic datasets from real appliance signatures. This synthetic data enabled effective training of CNN-LSTM networks, improving the disaggregation accuracy when evaluated using metrics like MAE and symmetric mean absolute percentage error (SMAPE). Conversely, TL can degrade the performance in some cases, highlighting the complexity of domain adaptation for NILM. The use of hybrid models can also be extended to low-sampling-rate data and multi-state appliances. For example, applications of LSTM-RNN models to datasets such as UK-DALE and REDD achieved better performance compared to HMMs [60]. In another hybrid application, Yu et al. [61] employed a combination of 1D-CNNs and LSTM in a sliding window framework for feature extraction and temporal modeling, achieving enhanced load disaggregation accuracy. In this context, the combination of CNNs and LSTM with dropout layers to process the sliding windows of energy data can facilitate the identification of multiple appliances with reduced error rates.
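For reference, the two error metrics mentioned above can be computed as in the following sketch. Several SMAPE conventions exist in the literature; the version here assumes the form that divides the absolute error by the mean of the absolute true and predicted values and reports a percentage, which may differ from the convention used in [39].

```python
def mae(y_true, y_pred):
    """Mean absolute error between true and predicted appliance power."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def smape(y_true, y_pred):
    """SMAPE in percent, using the |t - p| / ((|t| + |p|) / 2) convention."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2.0
        # Samples where both values are zero contribute no error.
        total += 0.0 if denom == 0.0 else abs(t - p) / denom
    return 100.0 * total / len(y_true)

y_true = [100.0, 0.0]  # toy appliance power in watts
y_pred = [50.0, 0.0]
print(mae(y_true, y_pred), smape(y_true, y_pred))
```

Handling the zero-denominator case explicitly matters in NILM, since appliances are OFF for most of the day and both the ground truth and the prediction are often exactly zero.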

Najafi et al. [62] explored low sample rate hybrid architectures that incorporated CNN-LSTM-based DL models for NILM, specifically targeting the disaggregation of A/C energy consumption. The study highlighted how the combination of convolutional layers for feature extraction and LSTM layers for temporal sequence learning can enhance model accuracy while effectively addressing the challenges related to data sparsity and cost constraints. Similarly, He et al. [63] addressed the challenge of energy disaggregation by employing DL models to enhance the accuracy of NILM. Their approach involved a hybrid architecture combining feedforward and backward propagation models within a CNN framework. In contrast to traditional cascaded architectures, their model employed multiple parallel convolution layers followed by pooling layers for feature extraction. The use of an LSTM-based backpropagation model overcame the vanishing gradient problem commonly faced by RNNs. This Seq2seq model improved energy disaggregation by effectively capturing both the spatial and temporal dependencies in the data.

Similarly, Shin et al. [64] proposed a hybrid SGN for NILM. It combined a Seq2seq CNN-based regression subnetwork, which extracted the temporal dependencies from the aggregated power signals to estimate appliance-level consumption, with a classification subnetwork that predicted the ON/OFF state of appliances. Unlike conventional multitask learning, the SGN incorporated a gating mechanism in which the regression output is multiplied by the classification network’s probability prediction, enhancing the disaggregation accuracy by enforcing the appliance state constraints. Leveraging the efficiency of CNNs, this hybrid model effectively disaggregated the energy usage from aggregated measurements, providing feedback to consumers to influence their energy consumption behaviors. Evaluations on the REDD and UK-DALE datasets showed that SGN achieved a 15–30% error reduction compared to SOTA models, including Zhang et al.’s CNN-based NILM approach [47]. The hybrid architecture improved both the accuracy and computational efficiency, making it well-suited for real-world NILM applications.
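The gating step itself is a simple element-wise product, which the following sketch illustrates on toy outputs. The two input lists stand in for the outputs of the regression and classification subnetworks; the function name and the values are hypothetical.

```python
def sgn_gate(regression_out, on_probability):
    """Multiply per-timestep power estimates by the predicted ON probabilities."""
    if len(regression_out) != len(on_probability):
        raise ValueError("both subnetwork outputs must have the same length")
    return [p * o for p, o in zip(regression_out, on_probability)]

power_estimate = [100.0, 80.0, 5.0]  # toy regression subnetwork output (watts)
on_prob = [1.0, 0.5, 0.0]            # toy classification subnetwork output
print(sgn_gate(power_estimate, on_prob))  # [100.0, 40.0, 0.0]
```

When the classifier is confident the appliance is OFF, the gate suppresses small spurious power estimates, which is how the state constraint is enforced on the final output.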

4. Emerging Deep Learning Techniques

4.1. Generative Adversarial Networks

In the context of NILM, GANs are increasingly being utilized to address the challenges of energy disaggregation by generating realistic appliance-level energy consumption profiles from the aggregate power signals. Generative adversarial networks enhance the robustness of NILM models against noisy and inconsistent signals while improving their generalization to unseen data, making them suitable for both offline and online applications. For instance, adversarial training was used by Du et al. [35] to learn discriminative feature representations, enabling better disaggregation by distinguishing the overlapping appliance signatures. During adversarial training, the generator challenges the discriminators by producing outputs that closely resemble real appliance consumption patterns, ensuring that the shared feature space captures the multimodal characteristics of appliance-specific energy consumption. Their method also employed CNNs to extract appliance features, and a classifier was trained on the shared features for the purpose of appliance-level energy disaggregation. This adversarial learning framework highlights the potential of GANs in capturing complex relationships in NILM data.

The combination of GANs with sequential modeling approaches, such as RNNs or GRUs, can improve temporal understanding of appliance usage as studied by Kaselimi et al. [65]. The authors combined CNNs and GRU-based recurrent networks within a GAN framework to model the temporal dependencies and recurrent patterns of appliance energy consumption. Specifically, the generator, which was implemented as a convolutional autoencoder, created synthetic appliance energy time series based on a compressed noise signal, while the discriminator, enriched with GRU layers, classified the outputs as real or synthetic. This approach models both the operational dynamics and power consumption of appliances, achieving a high accuracy in estimating the appliance loads and demonstrating robustness against noisy aggregate signals.

A novel energy disaggregation framework by Çimen et al. [66] combined GANs with a variational autoencoder (VAE) and a U-Net structure to enhance generalization and handle uncertainties in the NILM data. In particular, the generator utilized a Gaussian prior distribution to address variations in unseen data, while the discriminator validated the generated appliance power signals. The proposed method outperformed both traditional approaches such as HMMs and DL approaches such as CNNs, achieving SOTA results on datasets such as REDD and UK-DALE. Furthermore, the model demonstrated superior performance even with limited data and supported online analysis with practical delay constraints for both long- and short-duration appliances.

The Seq2subseq GAN framework can further enhance granularity by mapping the aggregate energy sequences to subsequences of appliance-level consumption, addressing the trade-off between model accuracy and computational complexity. To balance computational complexity and training efficiency, Pan et al. [67] introduced a Seq2subseq GAN-based method. This framework predicted shorter time windows of appliance energy consumption, aligning them with the center of the input window. The generator, which was built on a U-Net architecture, synthesized realistic appliance power signals, while the discriminator refined the generator’s performance by automatically learning the loss function through adversarial training. Additionally, instance normalization was applied to stabilize training and improve feature extraction. This approach offers a middle ground between the Seq2seq and Seq2point NILM techniques, optimizing accuracy and efficiency for energy disaggregation tasks.
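The center-aligned windowing behind Seq2subseq can be sketched in a few lines of Python. This is an illustration of the data preparation only, under the assumption that the target subsequence leaves equal margins on both sides of the input window; the function name, window lengths, and stride are hypothetical and do not reproduce the settings of [67].

```python
def seq2subseq_pairs(aggregate, appliance, input_len, target_len, stride=1):
    """Pair each aggregate window with the appliance subsequence at its center."""
    if target_len > input_len or (input_len - target_len) % 2:
        raise ValueError("target must fit in the input window with equal margins")
    margin = (input_len - target_len) // 2
    pairs = []
    for start in range(0, len(aggregate) - input_len + 1, stride):
        window = aggregate[start:start + input_len]
        target = appliance[start + margin:start + margin + target_len]
        pairs.append((window, target))
    return pairs


# Toy signals: the appliance trace is simply ten times the aggregate index.
aggregate = list(range(10))
appliance = [10 * v for v in aggregate]
pairs = seq2subseq_pairs(aggregate, appliance, input_len=6, target_len=2)
print(pairs[0])  # ([0, 1, 2, 3, 4, 5], [20, 30])
```

With `target_len` equal to `input_len`, the same function yields Seq2seq pairs, and with `target_len=1` and an odd `input_len` it yields Seq2point pairs, which is one way to see why Seq2subseq sits between the two.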

Robust GAN architectures can help overcome challenges like imbalanced datasets and noisy measurements, offering reliable load predictions by leveraging their ability to model complex distributions. For instance, a GAN-based model by Kaselimi et al. [68] employed a generator to produce realistic appliance energy signals based on encoded aggregate power data, while a discriminator validated the outputs. The generator incorporated a seeder module, comprising three convolutional layers, to transform random noisy input into a more informative subspace, ensuring realistic and valid load profiles. This process refined the generator’s output to match the desired distribution of the appliance power signals, while the discriminator, consisting of two convolutional layers and two fully connected layers, validated the responses by estimating whether the signal originated from the generator or the real training data. The model’s performance underscored the robustness of GANs against noisy aggregate signals and demonstrated their ability to generate accurate appliance status predictions, such as ON/OFF states.

Additionally, GANs excel at refining the appliance load, as shown by Bao et al. [69], delivering high-quality disaggregation outputs that facilitate better energy management. Their model utilized a generator and a discriminator, each comprising five convolutional layers and a fully connected layer, to learn both the temporal patterns and the shape of appliance energy consumption. The generator was trained to deceive the discriminator by producing energy profiles that closely resemble real-world appliance consumption patterns, while the discriminator worked to distinguish real measurements from the generated samples. The model was validated on real-world NILM datasets collected from five households, sampled at 1/6 Hz, and demonstrated its ability to accurately detect the time intervals when appliances consume energy and reproduce the shape of the target appliance load. These results highlight the robustness of GANs in handling noisy aggregate signals and generating realistic load profiles, making them a reliable tool for energy disaggregation and management.

Autoencoders

Autoencoders have emerged as a robust tool for NILM, thanks to their ability to learn efficient data encodings and decode them to reconstruct the original inputs. They are particularly well-suited for tasks involving noise removal, feature extraction, and energy disaggregation. The effectiveness of DAEs in NILM was emphasized by Sewwandi et al. [70] by framing the problem as a denoising task, where the autoencoder isolates clean appliance-specific power signals from aggregated noisy data. The model employed a convolutional architecture, where the encoder extracts the key features from normalized 48 × 48 input signals using multiple two-dimensional (2D) convolutional layers with ReLU activation and max-pooling operations to reduce dimensionality and capture essential patterns. The decoder mirrors this structure with up-sampling layers to reconstruct appliance-specific power signals. The model, evaluated on both the public (REDD) and local datasets, achieved promising results, particularly for refrigerators, with high R² values and better performance than traditional DNN models. These findings reinforce the versatility of autoencoders in energy disaggregation and their potential to improve NILM applications by handling noise and extracting meaningful features.

In another approach, Bousbiat et al. [71] transformed the input sequence into an image representation, which was then processed using modified DAEs. Specifically, the model replaced the standard 1D convolutional layer with a 2D convolutional layer to better handle the image data. The study explored three imaging techniques: Gramian angular fields (GAF), Markov transition fields (MTF), and recurrence plots (RP). The GAF technique, which encodes temporal dependencies in the angular space, consistently outperformed the other techniques in most cases. Furthermore, RP, a method that visualizes repeating patterns in time series, was also found to be a viable alternative in some scenarios. Using these techniques, the model improved appliance-specific energy estimation, accurately predicting the power consumption of individual devices from the aggregate household signal. The approach demonstrated enhanced performance in 19 out of 24 experiments compared to traditional methods.
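To make the imaging step more concrete, the following Python sketch computes a Gramian angular field in its summation form. It is a generic illustration of the technique, assuming min-max rescaling to [-1, 1]; whether [71] used the summation or the difference variant, and which rescaling it applied, is not stated here, and the function name is hypothetical.

```python
import math


def gramian_angular_summation_field(series):
    """Encode a 1D series as a 2D image with entries cos(phi_i + phi_j)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled to [-1, 1]")
    # Rescale to [-1, 1] so that every value has a valid arccos.
    scaled = [2.0 * (v - lo) / (hi - lo) - 1.0 for v in series]
    # Clamp against rounding error before mapping each value to an angle.
    angles = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    return [[math.cos(a + b) for b in angles] for a in angles]


image = gramian_angular_summation_field([0.0, 1.0, 2.0])
print(len(image), len(image[0]))  # 3 3
```

A window of N power samples becomes an N × N image, which is why the 1D convolutional layers of the DAE have to be replaced by 2D ones.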

Traditional NILM techniques require high-frequency data acquisition and complex models, both of which are computationally demanding and costly to implement. In response, He et al. [72] introduced a novel NILM approach that leverages DAEs to enhance load disaggregation. Specifically, the model processes low-frequency active power measurements, making it more cost-effective and less complex than conventional NILM methods. The DAEs were employed to reconstruct the appliance-level power signals from the aggregated household energy consumption, effectively filtering out noise and improving disaggregation accuracy. Additionally, the method enhanced the performance by reconfiguring the overlapping sliding window segments and applying a median filter for data processing. The effectiveness of this approach was validated using real-world residential datasets, including REDD and TraceBase, demonstrating its superiority over FHMM-based methods. Notably, the proposed model can disaggregate energy data regardless of appliance brand or model, making it a versatile and scalable solution for residential load monitoring.
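One plausible reading of the overlapping-window post-processing is that each time step receives several estimates, one from every window that covers it, and that a median combines them. The Python sketch below illustrates that reading only; it is an assumption for illustration and may differ from the exact procedure in [72]. All names and values are hypothetical.

```python
from statistics import median


def merge_overlapping_windows(window_outputs, stride, total_len):
    """Combine per-window estimates into one series using a per-sample median."""
    buckets = [[] for _ in range(total_len)]
    for k, window in enumerate(window_outputs):
        for offset, value in enumerate(window):
            buckets[k * stride + offset].append(value)
    if any(not bucket for bucket in buckets):
        raise ValueError("some time steps are not covered by any window")
    return [median(bucket) for bucket in buckets]


# Three length-3 windows with stride 1; the second window contains an outlier.
outputs = [[1, 2, 3], [2, 100, 4], [3, 4, 5]]
print(merge_overlapping_windows(outputs, stride=1, total_len=5))
```

In this toy example, the outlier of 100 at the third time step is outvoted by the two neighbouring windows, which both estimate 3.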

Langevin et al. [73] discussed a novel energy disaggregation approach using VAEs, which include a probabilistic encoder, enhancing the model’s ability to encode the relevant features for accurate appliance consumption reconstruction. This approach is particularly effective in handling multi-state appliances, such as washing machines or dishwashers, by improving power signal reconstruction. The regularized latent space of the VAE aids in better generalization across different houses, addressing the scalability and generalization challenges of traditional NILM techniques. Additionally, the implementation of skip connections between the encoder and decoder, inspired by the U-Net architecture, allows the decoder to benefit from high-level features extracted by the encoder, leading to enhanced load reconstruction performance. The model was evaluated on the UK-DALE and REFIT datasets, demonstrating an average reduction in MAE by 18% and improvement in F1-score by 11%, outperforming five SOTA algorithms. This approach offered significant improvements in both appliance detection and load reconstruction, showcasing its effectiveness in real-world scenarios with multi-state appliances.

Gkalinikis et al. [74] proposed a variational regression framework for multi-target energy disaggregation that learns to disaggregate several appliances simultaneously within a single model. The authors employed a variational encoder to capture latent representations of the aggregated energy signal, followed by a reparameterization step that sampled appliance-specific latent variables. Subsequently, these representations were fused using lightweight mechanisms such as element-wise addition, dot attention, or dense mappings before being passed to regression heads that predicted both the real-time power consumption and operational states of multiple appliances. The model avoided overfitting and improved generalization to unseen households by leveraging variational inference, effectively addressing the limitations of one-vs-one NILM strategies. Experimental results on benchmark datasets showed that the proposed method achieved competitive or superior accuracy compared to SOTA single-target and multi-target models while reducing training overhead and enabling more scalable deployment in smart grid applications.
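The reparameterization step and the regularization of the latent space mentioned for the variational models above can be sketched as follows. This is the standard textbook formulation for a diagonal Gaussian posterior and a standard normal prior, not the specific implementation of [73] or [74]; the function names are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over the latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


rng = random.Random(0)
z = reparameterize([0.5, -1.0], [0.0, 0.0], rng)  # one latent sample
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0
```

Because the noise `eps` is drawn separately from the encoder outputs `mu` and `log_var`, gradients can flow through the sampling step, and the KL term pulls the latent codes toward the prior, which is the regularization credited with better generalization across houses.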

Data collection for NILM is challenging due to intrusive and costly appliance-level monitoring. The long short-term memory network autoencoders introduced by Verma et al. [75] aimed to overcome this challenge by capturing the temporal patterns in aggregated smart meter data, enabling accurate and non-intrusive appliance disaggregation. The model addressed the dynamic nature of NILM signals, which was inadequately captured by earlier static methods. It employed an LSTM autoencoder architecture where the encoder processes time-series data to extract latent representations capturing temporal dependencies, while the decoder reconstructs a time-reversed version of the input signal to ensure robust feature learning. The latent embeddings from the encoder are mapped to appliance-specific one-hot-encoded labels via a fully connected layer, indicating the ON/OFF states of appliances. The model optimized a combined loss function comprising a reconstruction loss and a classification loss, prioritizing disaggregation accuracy. Evaluated on the REDD and Pecan Street datasets, the model outperformed SOTA techniques like SGN and deep dilated residual networks (DDRN), achieving higher F1-scores and lower average energy errors, particularly for appliances like refrigerators and kitchen outlets. By effectively capturing the complex temporal patterns and reducing the need for intrusive data collection, the proposed approach sets a new standard for dynamic multi-label classification in NILM.
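A combined objective of this kind can be sketched in Python as a weighted sum of a reconstruction term on the time-reversed input and a multi-label classification term. The mean squared error, the binary cross-entropy, and the weight `alpha` are assumptions chosen for illustration; the exact loss terms and weighting used in [75] are not reproduced here.

```python
import math


def combined_loss(window, reconstruction, state_probs, state_labels, alpha=0.5):
    """Weighted sum of a reversed-input reconstruction loss and a multi-label BCE."""
    target = window[::-1]  # the decoder is asked to reproduce the input in reverse
    recon = sum((r - t) ** 2 for r, t in zip(reconstruction, target)) / len(target)
    eps = 1e-12
    bce = 0.0
    for p, y in zip(state_probs, state_labels):
        p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
        bce -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    bce /= len(state_labels)
    return alpha * recon + (1.0 - alpha) * bce


# Perfect reversed reconstruction and confident, correct ON/OFF predictions.
print(combined_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [1.0, 0.0], [1, 0]))
```

Setting `alpha` closer to 0 shifts the emphasis toward the ON/OFF classification term, and setting it closer to 1 emphasizes reconstruction.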

Autoencoders have also been widely utilized for dimensionality reduction and feature extraction in various domains, including industrial NILM applications. Li et al. [76] presented a novel NILM approach tailored for commercial power data, leveraging an autoencoder-based Transformer model. The study began by measuring the operating power of various electrical appliances across different modes, subsequently combining these operational modes to enhance monitoring accuracy. Given the complexity and variety of industrial electrical appliances, the authors utilized an autoencoder to recode and reduce the dimensionality of the combined data, ensuring precision in the analysis. The Transformer model was then employed to learn the relationship between the total power consumption information sequence and the corresponding states of the appliances. The latter model translated electrical signals into distinct electrical state codes, facilitating effective load energy decomposition. When tested on real-world gas station field data, the proposed model achieved an impressive accuracy of 90.17%, demonstrating its potential for efficient energy monitoring in industrial settings.

4.2. Attention-Enhanced Models

Attention-based models, such as encoder–decoder architectures with attention layers, have demonstrated superior performance in energy disaggregation tasks. Attention mechanisms, particularly in encoder–decoder architectures, allow models to dynamically assign importance to specific regions of the aggregated power signal, focusing on areas corresponding to appliance activation or state changes. A notable study by Piccialli et al. [43] proposed a hybrid DL architecture that combines a regression sub-network with a classification sub-network, each enhanced with attention mechanisms. The regression sub-network employs a CNN-bidirectional LSTM (BiLSTM) encoder with an attention unit to emphasize relevant temporal patterns, enabling effective extraction of appliance-specific power usage. A fully connected decoder reconstructs the disaggregated signals, while the classification sub-network estimates the appliance states through convolutional and dense layers. The combination of regression and classification outputs enhanced generalization and achieved SOTA performance, surpassing traditional models like HMMs, DAEs, and standalone CNNs. This approach highlights the scalability and effectiveness of attention mechanisms in NILM for extracting appliance-level insights from aggregate power data.

In another study, Pan et al. [77] proposed a parallel multi-scale attention mechanism for NILM, combining CNN-based feature extraction with attention layers to enhance the model’s ability to capture both local and long-range dependencies in aggregated signals. The parallel multi-scale structure allowed the model to process input at different receptive fields simultaneously, while the attention mechanism adaptively emphasized appliance-related activations. This CNN–attention hybrid demonstrates the effective integration of attention modules with convolutional architectures to improve accuracy and robustness in energy disaggregation tasks, further underscoring the versatility of attention in NILM beyond Transformer-based approaches. Following this, Yao et al. [78] introduced another multiscale attention mechanism, where CNN-based feature extraction was combined with attention modules to dynamically focus on informative regions of the input across multiple temporal scales. This design further improved disaggregation accuracy and robustness by addressing overlapping appliance usage and long-range dependencies, reinforcing the versatility of attention mechanisms in CNN-based NILM architectures.

Self-attention is a powerful tool for modeling complex dependencies in NILM tasks. It enables the model to focus on important elements across the entire sequence, regardless of their position, which is especially beneficial for appliances with unpredictable or overlapping usage patterns. Data-efficient techniques such as sample augmentation, as used by Xiong et al. [79], allow NILM systems to achieve strong performance even with limited training data. Specifically, their approach introduced a Transformer-based architecture with a 2D multi-attention mechanism, which included temporal attention for capturing time-dependent patterns and appliance-wise attention for modeling the interactions between different appliances at the same time step. The model utilized multi-head self-attention, residual connections, and layer normalization, inspired by Transformer designs, to enhance NILM performance. Combined with sample augmentation, this multi-appliance-task NILM approach effectively disaggregated the appliance-level power consumption with limited training data, achieving significant improvements over baseline models. Furthermore, meta-learning approaches [80] and few-shot learning can significantly enhance model adaptability across datasets, paving the way for scalable and practical NILM solutions.
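The core computation of self-attention can be sketched in plain Python. For brevity, this sketch uses a single head and identity projections (queries, keys, and values are all the raw input vectors), which is a simplification; the models surveyed here use learned projections and multiple heads. The function names and the toy sequence are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Scaled dot-product self-attention with identity projections (Q = K = V)."""
    dim = len(sequence[0])
    outputs, weights = [], []
    for query in sequence:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in sequence]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all positions in the sequence.
        outputs.append([sum(w * value[d] for w, value in zip(row, sequence))
                        for d in range(dim)])
    return outputs, weights


# Toy sequence of three time steps with two features each.
outputs, weights = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print([round(w, 3) for w in weights[0]])
```

In the toy sequence, the first time step attends equally to itself and to the third step, which carries the same features, regardless of the distance between them; this position-independent weighting is what makes self-attention attractive for overlapping appliance usage.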

The use of time-sensing self-attention by Xuan et al. [57] further boosts the performance by focusing on active appliances and handling complex, multi-stage operational patterns, making these models more adaptable to diverse energy consumption scenarios. The method incorporates a bi-directional Transformer to capture local feature dependencies, enhancing the detection of simultaneous appliance operations, and uses GRUs to model the temporal sequences. Additionally, a novel masking strategy based on device states improved the model’s sensitivity to state transitions in multi-stage appliances, such as washing machines and HVAC systems. Experimental results showed a 25% reduction in MAE and a 20% increase in F1-score, demonstrating the model’s superior performance on the REDD and UK-DALE datasets.

Similarly, hybrid approaches that integrate Transformers with CNNs and GRUs leverage multi-scale information to improve feature extraction and generalization, though computational overhead remains a challenge. Cheng et al. [81] introduced a Transformer-based DL approach to address challenges in NILM, aiming to improve energy efficiency in demand-side management. The authors proposed a hybrid model that integrates Transformer-enhanced CNNs with a temporal scaling module to capture multi-scale information, and the decoder uses residual GRU modules for enhanced feature learning. Tested on both residential and commercial datasets, the model outperformed SOTA methods in terms of accuracy and generalization, although it faced limitations such as high computational cost and reliance on only active power data.

Zhou et al. [82] proposed another expanded architecture, the Transformer-Temporal Pooling-RethinkNet (TTRNet) model, for commercial load monitoring in NILM. The model addresses the complexities of monitoring commercial loads, which involve multiple simultaneous devices and high correlations between appliances such as HVAC (Heating, Ventilation, and Air Conditioning) systems and elevators [83]. The architecture combines convolutional layers, a Transformer module, and a temporal pooling layer to enhance contextual feature extraction. Its RethinkNet module incorporates an LSTM for memory retention over long input sequences, enabling effective multi-label classification of appliance loads. Evaluated on the commercial building energy dataset (COMBED), TTRNet achieved an impressive F1-score of 0.957, outperforming other multi-label classification models. This study highlights the combination of convolutional, attention-based, and memory-augmented modules to significantly improve multi-label classification performance in commercial NILM settings. Together, these studies underscore the potential of advanced sequential models, including LSTM and Transformer-based architectures, to address the diverse challenges of NILM in residential and commercial applications.

Self-attention mechanisms were also used in [84] to improve both computational efficiency and predictive performance. The proposed model combines convolutional layers to extract local features, a self-attention mechanism to capture long-range dependencies, a bidirectional GRU layer to model temporal sequences, and a dense regression layer for final power estimation. Evaluated on real-world datasets such as UK-DALE, REDD, and REFIT across multiple appliances including fridges, washing machines, kettles, microwaves, and televisions, this model demonstrates up to 7.5 times faster training and 6.5 times faster inference compared to the Window GRU model while achieving comparable accuracy in metrics like F1-score and MAE. The model particularly excels in disaggregating multi-state appliances and generalizes well across households.

A recent study by Varanasi et al. [85] addressed the challenges of high computational cost by proposing a switch Transformer model that reduces computational costs through sparse activation of model parameters, enhancing efficiency. This model incorporated both active and reactive power data, improving load monitoring accuracy and overcoming the limitations of using only active power. The aforementioned switch Transformer architecture integrates switching and routing layers in place of traditional Transformer components, enhancing the model’s ability to capture both short-term signal patterns and long-range dependencies in the energy data. Adapted for NILM, it also incorporates self-attention layers to model global dependencies between aggregate and appliance-specific power signals, improving scalability and efficiency. These enhancements enable robust disaggregation, even for appliances with variable usage patterns. Experimental results demonstrated notable gains in accuracy and F1-score, highlighting the potential of this approach for energy disaggregation in both residential and industrial settings.
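Sparse activation in a switch layer is usually described as top-1 routing: a small router scores each token, and only the highest-scoring expert processes it. The Python sketch below illustrates that general idea under this assumption; the router weights and the two toy experts are hypothetical, and the sketch does not reproduce the architecture of [85].

```python
import math


def route_top1(token, router_weights, experts):
    """Send a token to the single expert with the highest router probability."""
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    chosen = max(range(len(probs)), key=probs.__getitem__)
    # Only the chosen expert runs; its output is scaled by the router probability.
    return chosen, [probs[chosen] * y for y in experts[chosen](token)]


# Two hypothetical experts: one doubles the features, the other negates them.
experts = [lambda t: [2.0 * v for v in t], lambda t: [-v for v in t]]
router = [[5.0, 0.0], [0.0, 5.0]]
chosen, output = route_top1([1.0, 0.0], router, experts)
print(chosen)  # 0
```

Since each token activates a single expert, the number of parameters can grow with the number of experts while the computation per token stays roughly constant, which is the efficiency argument made above.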

These Transformer-based NILM models utilize self-attention to effectively capture complex dependencies across time sequences, enabling superior modeling of long-range interactions between multiple appliances. However, these models often involve high computational costs and a large number of parameters, which can hinder real-time deployment or use in resource-constrained environments. To address these challenges, more efficient Transformer-based architectures have been developed, maintaining strong disaggregation performance while reducing computational overhead. Wang et al. [86] proposed one such model, the Midformer, a compact NILM architecture that integrates a lightweight 1D CNN with a Transformer encoder and eliminates recurrent layers. This architecture consists of a 1D CNN to extract local appliance features, a Transformer encoder to capture temporal patterns via multi-head self-attention, and a two-layer feed-forward network for sequence-to-point regression. By avoiding LSTMs and GRUs, the Midformer reduces computational overhead while maintaining efficient sequential modeling through attention mechanisms. When evaluated on the UK-DALE and REFIT datasets, it demonstrated strong cross-domain generalization and achieved consistently lower MAE and SAE values across multiple appliances, including significant improvements for devices such as kettles, microwaves, and fridges. It also surpassed full-scale Transformer and RNN-based models in disaggregation performance, offering a significantly smaller number of parameters and faster inference, making it suitable for real-time NILM in resource-constrained settings.
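Since MAE and SAE recur throughout the reported results, a short Python sketch of both metrics may help. It assumes the common definitions, with MAE as the mean per-sample absolute error and SAE as the relative error in total energy over the evaluation period; individual papers may normalize SAE differently. The traces are hypothetical.

```python
def mean_absolute_error(truth, estimate):
    """Average absolute difference per sample."""
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / len(truth)


def signal_aggregate_error(truth, estimate):
    """Relative error of the total energy over the whole evaluation period."""
    total = sum(truth)
    if total == 0:
        raise ValueError("SAE is undefined when the true total energy is zero")
    return abs(sum(estimate) - total) / total


# Hypothetical appliance trace (watts) and its estimate.
truth = [0, 100, 100, 0]
estimate = [10, 90, 110, 0]
print(mean_absolute_error(truth, estimate))     # 7.5
print(signal_aggregate_error(truth, estimate))  # 0.05
```

The example shows why both metrics are reported together: per-sample errors partly cancel in the total, so SAE can be small even when MAE is not.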

Sykiotis et al. [87] proposed another Transformer-based architecture for NILM, offering an efficient solution for energy disaggregation tasks. Unlike traditional recurrent models, their proposed model, called the ELECTRIcity model, relied entirely on attention mechanisms to extract the global dependencies between the aggregate and individual appliance signals, improving the disaggregation accuracy. The model used Transformer layers to estimate the power consumption patterns while requiring minimal dataset preprocessing, hence addressing issues such as data imbalance. Additionally, ELECTRIcity employed a two-stage training routine, consisting of unsupervised pre-training followed by fine-tuning for downstream tasks, resulting in an improved predictive accuracy and a reduced training time. Experimental results showed that ELECTRIcity outperformed several SOTA methods, offering significant advantages in performance and training efficiency.

The advent of advanced models in NILM has significantly improved disaggregation accuracy and generalization. These models benefit from the ability to adapt quickly to specific NILM tasks, even with minimal labeled data. Architectures like bidirectional encoder representations from Transformers (BERT) leverage Seq2seq learning capabilities to perform effectively in NILM tasks. The BERT4NILM model proposed by Yue et al. [88] introduced a tailored objective function and masked training techniques, which could enhance the model’s robustness and generalization. Particularly, BERT4NILM utilizes a bidirectional Transformer architecture to capture dependencies across time and appliances, enabling more accurate energy disaggregation. This model, evaluated on datasets such as UK-DALE and REDD, outperformed SOTA methods, further advancing DL-based solutions for energy disaggregation. Recent advancements in Transformer architectures have further underscored the significance of attention mechanisms in NILM. For instance, Rong et al. [89] demonstrated that the self-attention layers of Transformer models could effectively capture intricate temporal dependencies in energy consumption data. This improves disaggregation accuracy and robustness, especially for appliances with overlapping usage patterns.
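Masked training can be illustrated with a short Python sketch: a random subset of the input positions is replaced by a mask value, and the loss is evaluated only at those positions. The mask ratio, the mask value, and the plain squared-error loss are assumptions made for illustration; the tailored objective function of [88] is not reproduced here.

```python
import random


def mask_sequence(aggregate, mask_ratio, mask_value, rng):
    """Replace a random subset of positions with a mask value; return the positions."""
    count = max(1, round(mask_ratio * len(aggregate)))
    positions = sorted(rng.sample(range(len(aggregate)), count))
    masked = list(aggregate)
    for i in positions:
        masked[i] = mask_value
    return masked, positions


def masked_mse(prediction, target, positions):
    """Squared error averaged over the masked positions only."""
    return sum((prediction[i] - target[i]) ** 2 for i in positions) / len(positions)


rng = random.Random(0)
series = [float(v) for v in range(8)]
masked, positions = mask_sequence(series, mask_ratio=0.25, mask_value=-1.0, rng=rng)
print(len(positions))  # 2
```

Because the model sees the context on both sides of each masked position, it has to use past and future samples together to fill the gap, which is what the bidirectional architecture is designed to exploit.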

Additionally, a novel AdaX optimization method was introduced in [90], which enhanced the performance of BERT by adaptively adjusting the learning rates based on the accumulated long-term gradient information. The latter method improved the convergence and helped achieve more stable training results. When combined with BERT, AdaX enabled the model to classify and monitor energy consumption from household appliances. It was tested on datasets like REDD and TEAD [91], demonstrating its effectiveness in real-time smart metering and monitoring systems while improving the energy demand-side management in an efficient manner. Specifically, the model disaggregates appliance-level energy data by capturing both past and future dependencies within the aggregate load data, enhancing accuracy without heavy reliance on extensive labeled data. By pretraining on large datasets and fine-tuning for specific NILM tasks, these models can significantly reduce the need for labeled data while maintaining a high disaggregation accuracy, marking a step forward in scalable and efficient energy management.

4.3. Transfer Learning

Transfer learning has gained significant attention in NILM due to its ability to leverage knowledge from one domain and apply it to another, thus improving model generalization and reducing reliance on large amounts of labeled data. In NILM, TL can be especially beneficial, as it allows models trained on one set of appliances or dataset to be used for other appliances or domains, overcoming the challenges posed by the limited labeled data for each appliance or environment. Two primary forms of TL have been widely explored, namely appliance TL and cross-domain TL.

Appliance TL, as proposed by D’Incecco et al. [92], focuses on transferring the latent features learned from complex appliances to simpler appliances. This approach leverages learned knowledge from high-complexity appliances, which can be reused for appliances with simpler operational patterns. On the other hand, cross-domain TL facilitates the application of models across different domains, where the training data originate from one domain and the model is tested on data from another. In this case, the authors found that Seq2point learning could be applied directly to similar domains without fine-tuning. However, when there were significant domain differences, fine-tuning the fully connected layers was required to adapt the model to the new data domain. The authors’ experiments with the REFIT, UK-DALE, and REDD datasets demonstrated that the features learned by CNNs were invariant across appliances and domains. This property of TL enables the creation of universal models for energy disaggregation, reducing the need for extensive sensor installation in households and significantly lowering computational cost by reusing pre-trained models across different appliances or regions.

Building on the concept of TL, Li et al. [93] proposed a distributed federated NILM (DFNILM) system, which integrates FL with TL for the purpose of privacy-preserving energy disaggregation. The system trained the models locally within individual households, which were then periodically uploaded to a central parameter server to create a global model. This approach helped ensure that local data privacy was maintained, as only the model updates, not raw data, were shared. The authors applied TL in this context, showing that the models trained on one dataset, e.g., UK-DALE, could be transferred to another, e.g., REDD, with varying success. While the performance of TL between the datasets was slightly degraded, this experiment highlighted the challenge of applying models across regions with different appliance usage patterns. Nevertheless, the DFNILM approach proved effective in balancing data privacy concerns with the potential for improved energy disaggregation using TL and FL.
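The server-side aggregation in such a federated setup can be sketched with the widely used federated averaging rule, in which each household's parameters are weighted by its local sample count. This is an assumption made for illustration, since the aggregation rule of DFNILM is not detailed here; the parameter names and values are hypothetical.

```python
def federated_average(local_models, sample_counts):
    """Average parameter vectors, weighting each household by its local data size."""
    total = sum(sample_counts)
    averaged = {}
    for name in local_models[0]:
        length = len(local_models[0][name])
        averaged[name] = [
            sum(model[name][i] * n for model, n in zip(local_models, sample_counts)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical households that upload only parameters, never raw meter data.
house_a = {"w": [1.0, 2.0]}
house_b = {"w": [3.0, 6.0]}
print(federated_average([house_a, house_b], sample_counts=[100, 300]))  # {'w': [2.5, 5.0]}
```

Only the parameter dictionaries cross the network in this sketch, which mirrors the privacy argument above: the raw consumption data never leave the household.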

In a recent study, Li et al. [94] introduced a novel TL approach for multi-objective NILM in smart buildings. They combined both appliance TL and cross-domain TL to create a one-to-many model that estimated the power consumption for multiple appliances using a smaller number of measurements. This method enabled a trained model to be transferred across different appliances within the same data domain (appliance TL) or between different domains (cross-domain TL). Along with reporting improvements in accuracy, the study also analyzed factors influencing transferability, e.g., appliance heterogeneity, operational pattern similarity, and dataset characteristics, and it proposed targeted strategies such as selective fine-tuning and domain adaptation to enhance model generalization. The work demonstrated its effectiveness through experiments on public datasets such as REFIT, REDD, and UK-DALE, showing significant improvements in accuracy and practicality compared to traditional one-to-one models. The results indicated the potential to reduce sensor installation costs and to use a single model across various appliances, providing a cost-effective and scalable solution for smart buildings.

Despite these successes, the transferability of TL models in NILM remains a critical concern. As Klemenjak et al. [95] emphasized, a model’s ability to generalize depends on the similarity of appliance signatures, operational patterns, and data distributions between source and target domains. Attention-enhanced models, while powerful for extracting appliance-specific temporal features, may overfit to the source domain and exhibit reduced generalization when the target domain presents significantly different activation patterns. To address this challenge, strategies such as selective fine-tuning of attention layers, domain adaptation, and meta-learning or few-shot learning can improve transferability, enabling robust performance on unseen datasets or appliances. Transformer-based models within a TL framework, as explored by Rong et al. [89], demonstrated that pre-trained models could be effectively fine-tuned for new appliances or domains. This approach improved generalization, reduced dependence on large labeled datasets, and highlighted the synergy between attention mechanisms and TL for scalable NILM solutions.

Additionally, feature selection strategies, as proposed by Houidi et al. [96], can provide more informative inputs to attention-enhanced TL models, allowing the network to better focus on appliance-specific activations and maintain robustness across domains. Self-adaptive TL frameworks that incorporate pseudo-labeling and partially labeled target data further enhance model generalization, offering practical solutions to the challenges of heterogeneous appliance usage and diverse datasets. In this direction, Yasodya et al. [97] introduced a self-adaptive TL framework that integrates pseudo-labeling to refine models as appliance behavior evolves with aging. By using synthetic aging datasets and a dynamic learning mechanism, their approach maintained high disaggregation accuracy over time, addressing the practical challenge of appliance aging in real-world deployments. Together, these self-adaptive and feature-enhanced TL strategies highlight promising directions to enhance model robustness across heterogeneous datasets, domains, and appliance lifecycles.

In summary, TL presents a promising pathway toward universal and scalable NILM solutions; however, critical challenges remain, including domain shifts between datasets, heterogeneity in appliance usage, privacy constraints in cross-household learning, and long-term degradation due to appliance aging. Solutions explored in recent literature include selective fine-tuning, domain adaptation, meta-learning, and attention-based TL for handling domain shifts; federated TL for privacy-preserving learning; and self-adaptive frameworks with pseudo-labeling for managing evolving appliance behaviors. Collectively, these approaches underscore that achieving robust and transferable NILM models requires balancing generalization, adaptability, and privacy considerations in real-world environments.

5. Studies on NILM System Implementation

This section presents an overview of various NILM system implementations, highlighting key methodologies, features, and results across different studies. Table 4 summarizes the highlights of these implementations, showcasing the diversity in approaches, hardware configurations, and performance metrics. The section also provides practical insights into the integration of hardware, software, and AI for NILM, covering aspects such as edge/cloud deployment, microcontroller-level optimizations, and semi-automatic data labeling, which are areas that prior surveys have not fully explored. Unlike earlier surveys that primarily focus on laboratory-based or dataset-oriented hardware studies, our review underscores recent real-world deployments, demonstrating practical NILM system implementations and their effective integration with edge and cloud architectures.

In the following, we divide NILM system implementations into categories based on their focus, namely NILM data acquisition and labeling, edge and cloud implementations of NILM algorithms, and scalable edge implementations of NILM algorithms.

5.1. Non-Intrusive Load Monitoring Data Acquisition and Labeling

We review [100,101] in this subsection. These references primarily propose hardware and software implementations for NILM data acquisition and labeling. In the former reference, device data signatures were collected via manual switching, while in the latter reference, plug-level metering was used to acquire these signatures. In [101], a semi-automatic approach for labeling device data was further proposed and validated in residential settings.

5.1.1. A Custom-Built IoT Device to Monitor Energy Pulses for Load Disaggregation

McCrory et al. [100] investigated the application of NILM in industrial settings, particularly within Northern Ireland Water (NIW), a utility company facing substantial energy costs. Using a custom-built IoT device, they demonstrated how NILM can disaggregate the energy consumption into appliance-level data, showcasing its ability to enhance energy efficiency and operational performance in industrial settings. Specifically, the study employed ML techniques, including HMMs, CNNs, and RNNs.

The device hardware was designed to monitor the energy pulses, each of which corresponds to a defined amount of energy passing through a meter. It included a resistor for regulating current, a photoresistor for detecting electric meter pulses, and an ESP8266 microcontroller with wireless connectivity for data transmission. The device measured the electric meter pulses so that the pulse frequency could be correlated with the appliance energy consumption. It was deployed in a residential setting for 24 h, during which the appliances were manually switched on and off to capture the load signatures and consumption patterns. Pulse frequencies were then converted into energy consumption metrics (kWh) using mathematical models, and the energy costs for individual appliances were estimated based on the electricity tariffs.
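
The pulse-to-energy conversion described above can be illustrated with a minimal sketch. The meter constant (pulses per kWh) and the flat tariff below are hypothetical values chosen for illustration, and the names `MeterConfig`, `pulses_to_kwh`, `mean_power_kw`, and `energy_cost` are our own; the mathematical models actually used in [100] are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class MeterConfig:
    # Both values are illustrative assumptions, not taken from the study.
    pulses_per_kwh: float = 1000.0  # meter constant (pulses per kWh)
    tariff_per_kwh: float = 0.30    # flat electricity price per kWh


def pulses_to_kwh(pulse_count: int, cfg: MeterConfig) -> float:
    """Energy represented by a number of meter pulses."""
    return pulse_count / cfg.pulses_per_kwh


def mean_power_kw(pulse_count: int, interval_s: float, cfg: MeterConfig) -> float:
    """Average power over an interval, derived from the pulse rate."""
    hours = interval_s / 3600.0
    return pulses_to_kwh(pulse_count, cfg) / hours


def energy_cost(pulse_count: int, cfg: MeterConfig) -> float:
    """Cost of the metered energy at the configured flat tariff."""
    return pulses_to_kwh(pulse_count, cfg) * cfg.tariff_per_kwh


cfg = MeterConfig()
# Example: 50 pulses counted in 60 s while a single appliance was switched on.
print(pulses_to_kwh(50, cfg))        # energy in kWh
print(mean_power_kw(50, 60.0, cfg))  # average power in kW
print(energy_cost(50, cfg))          # cost in tariff currency units
```

Under these assumptions, a higher pulse frequency maps directly to a higher average power, which is what allows appliance switching to be observed from pulse data alone.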

The study demonstrated the ability of IoT devices to accurately capture the pulse data and identify the appliance-specific load signatures, enabling reliable energy disaggregation even at higher pulse frequencies. It effectively analyzed the energy consumption patterns, highlighting the cost variability based on runtime and demand. While residential NILM research benefits from datasets like REDD, industrial studies face challenges due to the scarcity of datasets. To address these gaps, the authors recommended lightweight disaggregation models for reduced computational overhead, standardized evaluation metrics, and expanded industrial research with high-frequency data collection and sub-metering. These findings reinforce NILM’s potential to enhance energy management and reduce costs, particularly in industrial settings like NIW.

5.1.2. A Hybrid Hardware–Software Approach via Semi-Automatic Tools for NILM Dataset Labeling

A lack of fully labeled datasets has slowed NILM research, as manual data labeling is time-consuming and error-prone. To address this limitation, Pereira et al. [101] introduced a hardware and software platform for collecting and labeling NILM datasets. This platform combined aggregate and plug-level monitoring systems with semi-automatic labeling tools, streamlining the process and enabling more accurate and efficient dataset creation.

To monitor the aggregate energy consumption, the authors developed a custom hardware system using a LabJack U6 data acquisition system (DAQ) as well as current and voltage transformers. The data were processed by a Toshiba NB300 unit and stored in the EMD-DF format, which is a specialized format for NILM datasets. For appliance-level monitoring, a commercially available Plugwise system with plug-level meters was used, which was connected via the ZigBee protocol. The system was enhanced by modifying an open source Plugwise Python package in three ways: the sampling interval was reduced to zero for continuous data collection at up to 0.5 Hz, appliances could be enabled or disabled during runtime, and a background process automated real-time data transfer to a database. These improvements enabled high-resolution data collection for more accurate energy disaggregation.

The platform was deployed in a 10-day pilot study in a residential apartment, monitoring the whole house’s energy consumption with the DAQ setup and tracking 17 individual appliances using the Plugwise system. Appliances included high-energy devices like refrigerators and dishwashers and lower-energy ones like laptops and TVs. Fixed-wired loads such as ceiling lights were excluded. The whole-house system operated at 3.2 kHz, while the plug-level system sampled the appliances’ consumption at 0.5 Hz. The platform captured 92% of expected appliance data samples with minimal variation, collecting 93% of samples at the expected two-second intervals. It explained 82% of the total aggregate energy consumption but could not account for the remaining share (about 18%) of the household energy due to non-plug-connected loads.

The study also introduced a semi-automatic labeling application with an event detection algorithm and a human supervision interface. The event detection algorithm used a modified heuristic method with a sliding detection window to flag power changes above a predefined threshold (PT). The events were then filtered based on a minimum time spacing between events (TS) to reduce noise. The authors compared the detected events to those given by the Plugwise ground truth and reported the best PT and TS values for automatically detecting each type of appliance, or combinations of appliances, with high accuracy. The human supervision interface displayed the detected events graphically, enabling the users to confirm, delete, and adjust the labels.
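
A threshold-based detector of this kind can be sketched as follows. This is a simplified illustration and not the modified heuristic of [101]: the function name `detect_events`, the window length, and the comparison of mean power before and after each candidate point are our own assumptions, while `pt` and `ts` play the roles of PT and TS.

```python
import numpy as np


def detect_events(power, pt, ts, window=3):
    """Return sample indices where the mean power changes by more than `pt`.

    power  : 1-D array of aggregate power samples (W)
    pt     : power threshold (W)
    ts     : minimum spacing between accepted events, in samples
    window : number of samples averaged on each side of a candidate point
    """
    power = np.asarray(power, dtype=float)
    events = []
    last = -ts  # ensures the first candidate is never rejected by spacing
    for i in range(window, len(power) - window + 1):
        before = power[i - window:i].mean()
        after = power[i:i + window].mean()
        # Accept only changes above PT that respect the minimum spacing TS.
        if abs(after - before) > pt and i - last >= ts:
            events.append(i)
            last = i
    return events


# Synthetic trace: an appliance drawing 100 W switches on, then off.
signal = np.concatenate([np.zeros(10), np.full(10, 100.0), np.zeros(10)])
events = detect_events(signal, pt=70.0, ts=5)
print(events)
```

In this sketch, lowering `pt` makes the detector fire on partial window overlaps around each edge, and `ts` then suppresses the resulting duplicates, which mirrors the interplay between the two parameters described above.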

By tuning PT and TS, the system achieved a 94% detection rate, correctly identifying 2066 out of 2196 transitions. False positives and false negatives were minimized, requiring only 293 manual corrections, i.e., cases where the system misclassified or missed events and human intervention was needed. This resulted in an 86% reduction in manual labeling effort. Appliances like kettles, ovens, and refrigerators were labeled with perfect accuracy, while noisy devices like laptops and TVs had higher false positive rates because their fluctuating power signatures were misinterpreted by the algorithm. Overall, the study demonstrated the effectiveness of the platform in collecting and labeling NILM datasets. However, it faced challenges such as an inability to monitor hard-wired loads and difficulty in handling noisy appliances. The authors planned to address these challenges in future work by integrating sensors for non-plug-connected loads, improving noise handling, and leveraging crowd-sourcing for labeling to enhance adoption in energy monitoring.

5.2. Edge and Cloud Implementations of NILM Algorithms

In this subsection, we review [98,103]. In [98], the author acquired individual appliance data, applied NILM locally, then uploaded the NILM results to the cloud for storage. In [103], both edge- and cloud-based implementations of NILM algorithms were presented along with data acquisition hardware. Data labeling was not considered. The authors used the WHITED dataset current waveforms to validate the suitability of their hardware for NILM data acquisition.

5.2.1. Low-Power IoT for NILM: A Case Study Using SensiML and QuickFeather

The project in [98] demonstrated a proof-of-concept for a user-friendly NILM system leveraging the QuickFeather Development Kit and the SensiML Analytics Toolkit to monitor and classify the energy usage of household appliances. This NILM implementation collected voltage and current waveform data using the QuickFeather board’s integrated low-power microcontroller. The primary focus was to build a low-power IoT device capable of running ML applications, making it suitable for real-time energy management scenarios.

The system employed the SensiML toolkit for developing ML models. Voltage and current signals were acquired from various household appliances, such as beaters, hair dryers, heaters, irons, and lights. The collected data were labeled manually by the user and then processed within the SensiML toolkit to build classification models. Once trained, the models were deployed onto the QuickFeather development board for local execution. The classification results were then uploaded to a Firebase cloud database, providing a centralized platform for data storage. Additionally, a smartphone application was developed using Google Firebase integration to enable remote monitoring of the appliance classification results, showcasing a practical and interactive approach to NILM.

The deployment strategy emphasized simplicity, efficiency, and scalability. The voltage and current data acquisition were facilitated by the QuickFeather board, which handled real-time data processing and classification tasks. By leveraging the low-power nature of the QuickFeather board, the system ensured minimal energy consumption, making it suitable for continuous operation in IoT environments. The classification results for different household appliances were transmitted to the Firebase cloud database, enabling seamless access to energy usage insights.

The results demonstrated the system’s effectiveness in accurately classifying appliances based on their voltage and current signatures. Deployed on a development board, the system performed reliably in real-world conditions, with cloud connectivity enabling efficient data visualization through a custom smartphone app. Overall, this work highlights the innovative combination of the SensiML toolkit and QuickFeather hardware, bridging edge-based energy monitoring with cloud management.

5.2.2. Non-Intrusive Load Monitoring Prototype for Smart Home and Assisted Living Applications

Trobiani et al. [103] outlined the development of a hardware prototype for NILM, focusing on its implementation in real-world applications such as home energy management systems (HEMSs) and ambient assisted living (AAL). The prototype was designed to detect appliance activation events, extract features, and classify loads using ML models, ensuring seamless integration with modern IoT ecosystems via Wi-Fi. Built around the MSP430FR5994 microcontroller, the system prioritizes low power consumption, digital signal processing (DSP) capabilities, and real-time classification. It pairs with the TIDA-00929 evaluation board for signal conditioning and voltage spike protection.

The appliance recognition system uses a low-complexity FFNN with a single hidden layer, ideal for resource-constrained microcontrollers. The input layer processes 14-element harmonic feature vectors, while the output layer identifies the appliances. Neurons in the hidden layer are optimized for accuracy and efficiency using a heuristic formula. The model training employs a supervised learning approach with data augmentation to prevent overfitting. Two prototypes were developed: cloud-based classification, which processes data on cloud servers but incurs latency and bandwidth costs, and edge-based classification, which performs all processing locally, reducing the latency but requiring greater computational power.
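
The structure of such a single-hidden-layer network can be sketched as a forward pass in NumPy. Only the 14-element input is taken from the text; the hidden-layer size, the number of appliance classes, the tanh activation, and the random weights are hypothetical placeholders, since the trained parameters and the heuristic sizing formula of [103] are not reproduced here.

```python
import numpy as np

N_FEATURES = 14  # harmonic feature vector length stated in the text
N_HIDDEN = 8     # hypothetical; the paper sizes this layer with a heuristic formula
N_CLASSES = 5    # hypothetical number of appliance classes

rng = np.random.default_rng(0)
# Random weights stand in for trained parameters.
W1 = rng.normal(scale=0.1, size=(N_FEATURES, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(scale=0.1, size=(N_HIDDEN, N_CLASSES))
b2 = np.zeros(N_CLASSES)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def forward(x):
    """Class probabilities for a batch of feature vectors of shape (n, 14)."""
    h = np.tanh(x @ W1 + b1)     # single hidden layer (activation is an assumption)
    return softmax(h @ W2 + b2)  # one probability per appliance class


x = rng.normal(size=(3, N_FEATURES))
probs = forward(x)
print(probs.shape, probs.argmax(axis=1))
```

With these sizes the model holds 14 × 8 + 8 + 8 × 5 + 5 = 165 parameters, which indicates why this class of network is attractive for resource-constrained microcontrollers.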

The NILM prototype demonstrated strong performance, with edge-based classification achieving 92% accuracy and cloud-based classification reaching 90% accuracy. The higher edge-based accuracy was attributed to reduced transmission noise and the absence of cloud latency. Moreover, reducing the dataset to 105 appliances improved the classification accuracy by removing overlapping signals. The cloud-based approach offered scalability for complex models but required a higher bandwidth and introduced latency, while the edge-based model was faster and self-contained but demanded more computational resources. The validation process confirmed that 10-bit precision, where each sample value of the data used for appliance classification is represented with 10 bits, effectively balances accuracy and efficiency. This level of precision reduces memory and computational overhead while maintaining over 90% classification accuracy in the NILM prototype. The study also highlighted the trade-offs between computational load, resource allocation, and latency, with future work aimed at optimizing hardware performance, improving model robustness, and standardizing evaluation metrics.
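
To give a sense of what 10-bit sample precision implies, the following sketch quantizes a normalized waveform to signed 10-bit codes and measures the worst-case reconstruction error. The uniform, symmetric quantizer and the function names `quantize` and `dequantize` are our own assumptions; the exact number format used in [103] may differ.

```python
import numpy as np


def quantize(x, n_bits=10, full_scale=1.0):
    """Uniformly quantize samples in [-full_scale, full_scale] to signed n-bit codes."""
    levels = 2 ** (n_bits - 1) - 1  # 511 for 10 bits
    x = np.clip(np.asarray(x, dtype=float), -full_scale, full_scale)
    return np.round(x / full_scale * levels).astype(np.int16)


def dequantize(codes, n_bits=10, full_scale=1.0):
    """Map signed n-bit codes back to floating-point sample values."""
    levels = 2 ** (n_bits - 1) - 1
    return codes.astype(float) / levels * full_scale


# One cycle of a normalized sinusoidal waveform as a stand-in for measured data.
t = np.arange(200) / 200.0
signal = 0.8 * np.sin(2 * np.pi * t)
codes = quantize(signal)
error = np.abs(dequantize(codes) - signal).max()
print(codes.min(), codes.max(), error)
```

Under these assumptions, the reconstruction error is bounded by half a quantization step, i.e., about 0.1% of full scale, which is consistent with the observation that reduced precision need not harm classification accuracy.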

5.3. Scalable Edge Implementations of NILM Algorithms

The primary emphasis of the references reviewed in this subsection is on providing scalable and low-cost edge-based energy consumption data acquisition and NILM solutions. In these solutions, the acquired data are processed locally on a device that is installed at the residence being monitored. The authors in [99] demonstrated the feasibility of their proposed system by training an NILM model on data from the UK and applying the trained model to Italian household data. The authors in [102] concentrated on reducing the number of features to execute NILM models on edge devices.

5.3.1. Edge-Based NILM Solutions: A Year-Long Deployment in Italian Households

Mari et al. [99] proposed a novel approach to NILM by designing and deploying an embedded DL system for real-world applications. Unlike traditional cloud-based systems, their edge-based solution leverages a low-power microcontroller to address key challenges such as privacy concerns, latency, and installation complexities. The core of the system is a Seq2point DL model (shown in Figure 4) based on a 1D CNN proposed by Zhang et al. [47]. Trained using normalized data from the REFIT dataset, the model adapts to varying appliance consumption patterns. This approach provides a practical and scalable solution for real-time energy disaggregation in smart homes, enhancing privacy by processing the data locally, reducing latency through edge computing, and lowering deployment costs compared to traditional cloud-based alternatives.
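
The data arrangement behind a Seq2point model can be sketched as follows: each training input is a window of the aggregate signal, and the target is the appliance power at the midpoint of that window. The function name `seq2point_pairs` and the window length are illustrative assumptions; normalization and the 1D CNN itself are omitted.

```python
import numpy as np


def seq2point_pairs(aggregate, appliance, window=5):
    """Build (input window, midpoint target) pairs; `window` must be odd."""
    aggregate = np.asarray(aggregate, dtype=float)
    appliance = np.asarray(appliance, dtype=float)
    half = window // 2
    # Each row is one window of consecutive aggregate samples.
    windows = np.lib.stride_tricks.sliding_window_view(aggregate, window)
    # The target for each window is the appliance power at its center sample.
    targets = appliance[half:len(appliance) - half]
    return windows, targets


aggregate = np.arange(10.0)   # toy aggregate power trace
appliance = 0.5 * aggregate   # toy appliance trace aligned in time
X, y = seq2point_pairs(aggregate, appliance, window=5)
print(X.shape, y.shape)
```

Predicting a single midpoint value per window keeps the output layer small, which is one reason this formulation is convenient for embedded inference.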

The hardware design incorporates advanced components to optimize performance. An EVALSTPM32 board captures the active and reactive power, voltage, and current with high precision, while the NUCLEO-H743ZI2 board, featuring an Arm Cortex-M7 microcontroller, handles real-time disaggregation. An ESP32 Wi-Fi module ensures reliable communication within a star-topology WLAN network. A real-time measurement and processing unit collects and analyzes the aggregate power data, while appliance-level power meters validate the model’s accuracy. The data were aggregated by a central concentrator and stored in a MySQL database for analysis.

The system was deployed in two real households in central Italy over six months, monitoring appliances like dishwashers, washing machines, and refrigerators. A web-based interface provided real-time monitoring and visualization, enhancing ease of use in practical settings. This edge-based solution reduces the reliance on cloud servers, hence mitigating privacy risks and improving robustness against connectivity issues in addition to ensuring reliable performance in real-world scenarios. The system’s results demonstrated its exceptional performance and adaptability. The signal aggregate error (SAE) remained below 12% for most appliances, while the MAE indicated effective real-time performance. Despite being trained on UK data, the system showed a strong generalization to Italian households, identifying variations in the appliance cycles. These outcomes highlight the feasibility of embedded DL for real-time NILM, paving the way for scalable applications in smart homes and industries.
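
For reference, the two metrics mentioned above can be computed as in the sketch below. The definitions follow the forms commonly used in the Seq2point literature (MAE as the mean absolute per-sample error, SAE as the relative error in total energy); whether [99] applies exactly the same normalization is an assumption.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error between true and estimated appliance power."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true)))


def sae(y_true, y_pred):
    """Relative error between total estimated and total true energy."""
    total_true = float(np.sum(y_true))
    total_pred = float(np.sum(y_pred))
    return abs(total_pred - total_true) / total_true


y_true = np.array([0.0, 100.0, 100.0, 0.0])   # toy ground-truth power (W)
y_pred = np.array([10.0, 90.0, 110.0, 10.0])  # toy model output (W)
print(mae(y_true, y_pred), sae(y_true, y_pred))
```

The two metrics are complementary: MAE penalizes every per-sample deviation, whereas SAE allows over- and under-estimates to cancel and therefore reflects only the accuracy of the total energy estimate.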

5.3.2. Edge-Based NILM for MCU Systems: A Feature Trimming Approach

Tabanelli et al. [102] categorized NILM systems into two types: cloud-based and edge-based (or smart meter-based). They highlighted the drawbacks of cloud-based approaches, such as latency and privacy concerns, and suggested that edge-based NILM systems can mitigate these issues, offering the potential for greater energy savings. However, to make edge-based NILM systems feasible, the cost must be kept low, necessitating the use of low-cost microcontroller unit (MCU)-based systems for data acquisition and classification. These MCUs come with limitations, including restricted memory, computational power, and sampling frequencies, making it challenging to directly implement SOTA NILM algorithms.

To address these challenges, the authors optimized and evaluated an NILM framework designed for resource-constrained edge devices. Specifically, they focused on an STM32F4 MCU-based system using an ARM Cortex-M4 processor, referred to as a smart measurement node. This system acquired voltage and current load data at a sampling frequency of 10 kHz, utilizing 100 ms of acquired data for feature extraction. The acquired data correspond to the Domestic Appliance Dataset (DAD), which provides both single-appliance and multi-appliance measurements suitable for evaluating edge-based NILM algorithms. The authors evaluated the performance of four traditional ML-based NILM algorithms, namely RF, SVMs, two-layer MLP, and k-nearest neighbors (KNN), within the NILM framework, focusing on both single- and multi-appliance disaggregation scenarios. They began by analyzing the theoretical time and space complexities of these algorithms based on the number of features used. Additionally, the computational and memory requirements for extracting various time- and frequency-domain features on the ARM Cortex-M4 core were addressed.

To optimize the system, the authors reduced the number of features used for appliance classification and examined the impact on the performance. In single-appliance disaggregation tests, they found that using only five features significantly lowered the computational and memory requirements, with a minor decrease in the precision, accuracy, and recall compared to using 71 features. For the SVM, a reduction to 36 features improved the precision, recall, and accuracy compared to using the full 103 features. Furthermore, the authors demonstrated that relying solely on frequency-domain features still yielded satisfactory appliance disaggregation performance, enabling the elimination of voltage measurement sensors and further reducing system costs.
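
As an illustration of frequency-domain features that require only a current measurement, the sketch below extracts the amplitudes of the first odd mains harmonics from a 100 ms window sampled at 10 kHz, matching the acquisition settings stated above. The 50 Hz mains frequency, the choice of odd harmonics, and the function name `harmonic_magnitudes` are our own assumptions and do not reproduce the feature set of [102].

```python
import numpy as np

FS = 10_000    # sampling frequency stated in the text (Hz)
N = 1_000      # 100 ms of samples at 10 kHz
MAINS_HZ = 50  # assumed mains frequency


def harmonic_magnitudes(current, n_harmonics=5):
    """Amplitudes of the first odd harmonics of the mains frequency."""
    # Scale the one-sided spectrum so that a sinusoid of amplitude A gives A.
    spectrum = np.abs(np.fft.rfft(current)) * 2.0 / len(current)
    bin_hz = FS / len(current)  # 10 Hz per bin for a 100 ms window
    orders = [2 * k + 1 for k in range(n_harmonics)]  # 1, 3, 5, 7, 9
    bins = [int(round(order * MAINS_HZ / bin_hz)) for order in orders]
    return spectrum[bins]


# Synthetic current: 2 A fundamental plus a 0.5 A third harmonic.
t = np.arange(N) / FS
current = 2.0 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 150 * t)
features = harmonic_magnitudes(current)
print(np.round(features, 3))
```

Because the 100 ms window contains an integer number of mains cycles at 50 Hz, each harmonic falls on a single FFT bin, so a handful of bins suffices as a compact feature vector.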

The authors also evaluated the performance of the MLP algorithm in a multi-appliance disaggregation scenario involving five devices, namely a fan, electric coffee maker, light bulb, monitor, and power bank. They demonstrated that using 34 features reduced the computational and memory requirements while maintaining reasonable disaggregation performance compared to using the full set of 100 features. Although deploying the full-feature NILM on the smart measurement node was not feasible, the authors showed that reducing the number of features made the implementation possible. This reduction in features simplified the NILM model while still providing satisfactory appliance disaggregation results. Finally, the authors compared the algorithms in terms of latency, memory usage, and accuracy, concluding that RFs were the most suitable for edge device implementation, followed by SVM, MLP, and KNN.

6. Challenges and Future Directions

The usefulness of NILM in finding appliance energy usage and reducing energy bills has already been demonstrated in various works in the literature. In this section, we categorize some future research directions of NILM, including the development and consolidation of new applications and challenges faced in the adoption of NILM. We also identify and review some works that can serve as a basis for building these future research directions.

6.1. Energy Source Heterogeneity and Aggregate Data Uncertainty

Distribution network operators face challenges due to the continual increase in the integration of distributed energy resources (DERs), such as PV systems and EVs, in energy grids. These challenges arise due to several reasons including the volatility of renewable energy sources such as behind-the-meter solar generation, wind energy, and other fluctuating loads, lack of complete load status information, and limited monitoring infrastructure on the low-voltage side. NILM can also help in improving grid management and distribution operations by detecting DER consumption from aggregated data at substations and low-voltage networks. However, this process faces issues, since the aggregation process masks the individual contributions of various loads, making it challenging to distinguish unique consumption patterns. This issue is compounded by the “partial labels” problem, where the operators may only have partial knowledge of the active loads during specific time intervals. Furthermore, the overlapping load patterns and the lack of pure measurements for individual loads hinder the effectiveness of traditional disaggregation methods, which typically depend on fully labeled datasets or high-resolution measurements. Future research must address the challenge of developing NILM models that can learn effectively under weak supervision, tolerate uncertainty, and operate reliably with minimal labeling. In the following, we review some references that present the ongoing work in this field and which can be used for further development in this direction.

To cope with the aforementioned challenges, Li et al. [104] proposed an innovative dictionary learning-based approach that leverages partially labeled aggregate data to extract distinct load patterns. Their method introduced column-sparsity to identify unlabeled loads and incorporated an incoherence regularization term to differentiate between the patterns of various load types. This approach facilitated both the offline learning of load patterns and real-time disaggregation, offering a robust and practical solution to the limitations of conventional methods. However, scalability to high-frequency data streams and adaptation to unseen load behaviors remain open questions.

Yi et al. [105] addressed these challenges by proposing a Bayesian dictionary learning-based framework that enhances the disaggregation accuracy while introducing an uncertainty index to evaluate the reliability of the estimated loads. Using probabilistic modeling, this approach learns the load patterns and decomposition coefficients from the partially labeled data and computes the mean and covariance of the resulting probability distributions to model the uncertainty. This innovative method is robust to noisy data. It also eliminates the need for fully labeled datasets, and it equips operators with a tool to make informed decisions. Compared to deterministic approaches, the framework demonstrated superior performance, particularly in noisy and incomplete data environments. Nonetheless, future implementations will need to consider the trade-offs between computational cost and real-time uncertainty estimation for large-scale deployment.

Wang et al. [106] extended NILM from the household to the substation level, incorporating traditional loads, PVs, and EVs. They used spectral analysis to forecast the traditional loads, improving the signal-to-noise ratio, and applied peak coincidence analysis to estimate the PV load. The residual was then used to estimate the EV load and the number of EVs using limited activation matching pursuit, a sparse coding technique that improves the load estimation accuracy. This approach outperformed traditional methods like orthogonal matching pursuit and non-negative matching pursuit. The testing of the proposed method was carried out using traditional load data from 800 UK substations, DER data from 35 PV panels, EV data from Manitoba, Canada, and EV charging profiles from the Low Carbon London projects, with a 30 min sampling interval. The future challenge lies in refining such approaches for finer-grained sampling rates and ensuring interoperability across geographically and temporally diverse data sources.

Jaramillo et al. [107] adapted NILM, traditionally used to identify household appliances, to detect the PV generation and EV consumption from aggregated network measurements. By utilizing the IEEE European low-voltage test feeder and OpenDSS simulation tool, the authors applied the KNN algorithm to classify the data sampled at one-minute intervals. Their results showed that the NILM could accurately identify the PV and EV profiles, offering a valuable tool to support DER integration and improve the management of low-voltage power networks. However, classification models like KNN still face limitations when subjected to highly dynamic and unseen load profiles, necessitating future research on adaptive learning mechanisms.
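
The KNN classification step can be sketched with a plain NumPy implementation. The two-dimensional feature vectors (active and reactive power per one-minute sample), the numerical values, and the labels below are invented for illustration only; they are not taken from the simulations in [107].

```python
import numpy as np


def knn_predict(train_x, train_y, query_x, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    preds = []
    for q in np.atleast_2d(np.asarray(query_x, dtype=float)):
        dist = np.linalg.norm(train_x - q, axis=1)
        nearest = train_y[np.argsort(dist)[:k]]
        labels, counts = np.unique(nearest, return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


# Hypothetical samples: [active power (kW), reactive power (kvar)].
train_x = [[-3.0, 0.1], [-2.5, 0.0], [-2.8, 0.2],  # net export, labeled PV
           [7.0, 0.5], [6.5, 0.4], [7.2, 0.6]]     # high demand, labeled EV
train_y = ["pv", "pv", "pv", "ev", "ev", "ev"]
queries = [[-2.9, 0.1], [6.8, 0.5]]
print(knn_predict(train_x, train_y, queries))
```

Since KNN stores the training samples and compares each query against them, its behavior on load profiles unlike any stored sample depends entirely on the nearest available neighbors, which relates to the limitation noted above.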

Similarly, the authors in [108] introduced a novel use of NILM for identifying the electrical signatures of DERs, specifically EVs and rooftop PV systems, from the aggregated measurements at low-voltage distribution transformers. The authors evaluated three supervised ML algorithms, namely KNN, RF, and MLP across 100 DER integration scenarios using a dataset of minute-level electric current, voltage, active power, and reactive power measurements. The model achieved high accuracy, with F1-scores of 73% for EVs and 93% for PV systems, while maintaining fast processing times of under 314 μs. This work demonstrates the feasibility of NILM techniques for improving observability and enabling real-time intelligent control of DERs, ultimately supporting the efficient management of low-voltage distribution networks. Despite the promising results, future work must address generalization across new device types, evolving load patterns, and deployment across heterogeneous infrastructures.

Addressing data uncertainty and the lack of standardized datasets remains a critical challenge. Yaniv et al. [109] highlighted several practical strategies to improve data quality and preprocessing. These include incorporating power quality features such as voltage harmonics and current transients to enhance appliance differentiation; applying noise reduction techniques like filtering and smoothing to mitigate measurement errors; normalizing features to ensure comparability across devices and operational conditions; handling missing data through interpolation or imputation; and selecting relevant features to reduce dimensionality and prevent overfitting. The authors also recommended scenario-specific data collection tailored to operational characteristics, as well as collaborative efforts to standardize datasets across industrial and distribution contexts. Collectively, these preprocessing strategies are considered crucial for mitigating data uncertainty and enhancing the reliability of NILM system deployment in heterogeneous and partially labeled industrial environments.

Another possible research direction to aid in the application of NILM to aggregated network measurements involves the utilization of weather parameters. The use of weather parameters with probabilistic ML models was demonstrated in [110,111] for forecasting net aggregated load and providing estimates of forecasting uncertainty caused by the presence of PV generation. We believe that such a concept can be studied to achieve NILM from aggregated network measurements in the presence of uncertain load fluctuations caused by DERs, possibly also providing an estimate of the uncertainty affiliated with the disaggregated load.

6.2. Privacy and Safety

Non-intrusive load monitoring techniques can raise significant privacy concerns, as they may reveal sensitive information about household activities, such as being at home or away, sleep patterns, and daily routines. This level of detail poses a substantial privacy risk, making it essential for NILM researchers to find ways to protect users’ privacy while still gathering valuable energy data. To address these concerns, we have developed a four-layered conceptual framework that categorizes privacy and safety strategies in NILM systems based on our review of recent studies discussed throughout this subsection. The framework is illustrated in Figure 5 and organizes the strategies into (1) model-level techniques focusing on privacy-preserving learning architectures; (2) system-level defenses such as secure infrastructures and encrypted communications; (3) regulatory and policy-based interventions that govern the ethical use of smart meter data; and (4) safety-related NILM applications that demonstrate broader societal benefits while requiring careful privacy handling.

Building on the first layer, Wang et al. [112] addressed privacy and efficiency concerns in NILM by proposing a blockchain-based clustered FL framework. Non-intrusive load monitoring relies on disaggregating household energy consumption into appliance-level data, but traditional centralized ML approaches raise privacy risks. To mitigate this issue, the authors introduced a decentralized FL approach that allows edge devices to train the models locally, reducing the need for direct data sharing. Blockchain technology was also integrated to enhance security and trust, while clustering techniques were used to create multiple global NILM models, improving personalization. Additionally, differential privacy was enforced by injecting Laplace noise into the first layer of neural networks, hence balancing privacy protection with performance retention. Lightweight optimization techniques such as data quantization and weight pruning further improved the computational efficiency. Case studies demonstrated that this approach outperforms traditional FL methods in terms of accuracy, robustness, and security. However, deploying such layered model-level solutions at scale remains a challenge, particularly in low-resource edge environments where computing capabilities are constrained.
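
The noise-injection step can be illustrated with the standard Laplace mechanism, in which the noise scale equals the sensitivity divided by the privacy budget ε. The sketch below perturbs a matrix of bounded first-layer outputs; the clipping range, the sensitivity value, the value of ε, and the function name `laplace_perturb` are assumptions made for illustration and are not the settings of [112].

```python
import numpy as np


def laplace_perturb(values, sensitivity, epsilon, rng):
    """Add Laplace noise with scale = sensitivity / epsilon (Laplace mechanism)."""
    scale = sensitivity / epsilon
    return values + rng.laplace(loc=0.0, scale=scale, size=np.shape(values))


rng = np.random.default_rng(42)
# Stand-in for first-layer outputs, clipped so that each value lies in [-1, 1].
first_layer_out = np.clip(rng.normal(size=(4, 8)), -1.0, 1.0)
# Sensitivity 2.0 reflects the assumed per-value range; epsilon is hypothetical.
noisy = laplace_perturb(first_layer_out, sensitivity=2.0, epsilon=1.0, rng=rng)
print(np.abs(noisy - first_layer_out).mean())
```

A smaller ε yields a larger noise scale and thus stronger privacy at the expense of accuracy, which is the trade-off between privacy protection and performance retention referred to above.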

Continuing with model-level defense strategies, Ibrahem et al. [113] proposed a hybrid DL defense model that integrates CNNs and GRUs to mitigate presence-privacy attacks. By sending spoofing transmissions to obscure energy patterns, the model effectively reduced the attacker’s success rate. The hybrid CNN-GRU model leverages temporal patterns to decide on spoofing actions, offering robust protection for energy disaggregation systems. The authors used convolutional layers for feature extraction, max pooling for feature combination, and fully connected layers with a softmax output for presence classification. After the model was trained on consumer presence data, it was activated when consumers were away, providing real-time spoofing responses. While this approach showcases strong technical potential, future implementations should address the adaptive strategies of adversaries and the generalizability of spoofing defenses across varied household profiles.

Expanding beyond model-level approaches, Zhang et al. [114] proposed smart meter data aggregation across multiple neighboring homes to enhance privacy and reduce appliance detectability. Several AI-based adversary models, including CNN, CNN-LSTM, GRU, and KNN, were employed to examine the effectiveness of this aggregation strategy. Complementing this work, the study in [115] highlighted privacy challenges in the Norwegian power distribution system, focusing on uncertainties regarding ethical boundaries in data collection, transmission, and usage. These studies underscore the need for system-level interventions that can safeguard smart meter networks from increasingly sophisticated attack vectors. Future research should therefore prioritize resilient aggregation mechanisms and intelligent filtering techniques that can dynamically adapt to adversarial threats in real time.

At the infrastructure level, system-oriented privacy solutions were extensively reviewed in [116], where the authors proposed a layered architecture consisting of the control center, distribution layer, and smart meter layer. Each layer was capable of verifying the signature of others without accessing the underlying data content. Other defense solutions included energy masking using renewable and battery energy sources, smart privacy gateways for access control and data isolation, and encryption mechanisms like hash-based keys and orthogonal chip sequences. These infrastructural designs add critical defensive depth to the NILM pipeline. However, ensuring compatibility across diverse utility infrastructures and maintaining operational efficiency without compromising security remains an open challenge that warrants further interdisciplinary investigation.

Shifting focus to the policy layer, the studies in [117,118] emphasized the importance of robust legislative support for privacy enforcement. For example, the Ontario Energy Data Act and other proposed policies advocate for informed consent, data minimization, and the establishment of independent regulatory bodies. These initiatives reflect growing recognition of the need to formalize data governance. Yet, legal frameworks must evolve in tandem with technological developments, especially as NILM finds new use-cases such as real-time monitoring, behavioral analytics, and energy trading—areas where regulatory clarity is still lacking.

Finally, the safety-related applications of NILM demonstrate its potential in public safety contexts. Wang et al. [119] presented the Bats algorithm, capable of identifying battery charging and discharging hazards using dictionary learning and sparse coding. Additionally, ref. [12] explored the use of NILM for victim localization during disasters and emergencies. These applications underscore the dual-use nature of NILM technology, presenting both opportunities and risks that must be carefully balanced through ethical design and regulatory oversight. Future research should explore standardized frameworks for safe NILM deployment in critical response scenarios.

Table 5 presents a structured overview of these identified privacy threats and the corresponding mitigation strategies across four key layers in NILM systems. While these strategies represent significant progress, future challenges persist in developing privacy-preserving models that scale to real-world conditions, aligning legal frameworks with rapidly evolving technologies, ensuring trustworthy system-level enforcement of privacy, and ethically regulating NILM applications for safety and emergency use cases. Addressing these challenges will require multidisciplinary collaboration to ensure privacy protection, user trust, and operational resilience in NILM-enabled smart grid systems.

6.3. Cost and Implementation Complexity Reduction

Energy consumption data acquisition for NILM faces several challenges. One is the data sampling frequency: it is preferable to collect data at a high frequency to make load signatures more distinguishable; however, doing so increases data storage, data acquisition, and computational hardware costs. Acquiring data at a lower frequency reduces hardware costs but may cause information loss, making it more difficult to distinguish between different loads. A possible solution is the use of compressed sensing (CS) theory, which allows for data acquisition at a reduced sampling rate while minimizing information loss. The authors in [120] studied the use of different CS measurement and reconstruction techniques in power engineering. The authors in [121] proposed the use of CS in NILM and showed that, with CS, a smaller number of samples could still yield higher NILM accuracy than traditional NILM techniques. However, fully exploiting CS in NILM requires a corresponding hardware implementation for electricity consumption data acquisition. Therefore, we suggest using CS principles to design low-complexity data acquisition (DAQ) hardware for NILM, similar to [122], where a low-complexity CS architecture was proposed for an analog-to-digital converter.
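For readers unfamiliar with CS, the textbook measurement and recovery model can be written as follows. The symbols are introduced here for illustration only and are not taken from [120,121,122]: x is one window of the aggregate signal at the full sampling rate, Ψ is an assumed sparsifying basis, s is the sparse coefficient vector, and Φ is the measurement matrix.

```latex
% x \in R^N : one window of the aggregate signal at the full sampling rate
% \Psi      : sparsifying basis (assumed), s : sparse coefficient vector
% \Phi      : M x N measurement matrix with M << N
\begin{aligned}
x &= \Psi s, \qquad \|s\|_0 = K \ll N, \\
y &= \Phi x = \Phi \Psi s, \qquad y \in \mathbb{R}^{M},\; M \ll N, \\
\hat{s} &= \arg\min_{s} \|s\|_1 \quad \text{subject to} \quad y = \Phi \Psi s,
\qquad \hat{x} = \Psi \hat{s}.
\end{aligned}
```

Only the M measurements in y need to be acquired and stored, which is where the potential hardware savings come from; the reconstruction cost moves to the processing side.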

Energy disaggregation is computationally intensive and is considered NP-hard. This complexity is further influenced by factors such as differences among datasets, sampling frequency of energy data, prediction timeframe, and number of active devices.
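To illustrate why the search space grows so quickly with the number of devices, the following minimal sketch treats a single time step as a combinatorial problem: choose the ON/OFF combination whose rated powers best match the aggregate reading. The function name, the rated powers, and the single-state appliance model are assumptions made for the example and are not drawn from the surveyed works.

```python
from itertools import product


def brute_force_disaggregate(aggregate_w, rated_w):
    """Exhaustively search ON/OFF states that best explain one aggregate reading.

    The loop visits 2 ** len(rated_w) combinations, so the cost doubles with
    every additional appliance.
    """
    best_states, best_err = None, float("inf")
    for states in product((0, 1), repeat=len(rated_w)):
        total = sum(s * p for s, p in zip(states, rated_w))
        err = abs(aggregate_w - total)
        if err < best_err:
            best_states, best_err = states, err
    return best_states, best_err


# Hypothetical rated powers in watts: lamp, microwave, kettle, laptop charger.
rated = [100, 1200, 2000, 60]
states, err = brute_force_disaggregate(1360, rated)
```

With four appliances the search covers 16 combinations; with 30 it would already exceed a billion per time step, which is one reason learned approximations are attractive.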

Another challenge is the collection of device signatures and the labeling of each device’s signature. Traditional supervised methods rely on sub-metering the individual appliances to collect ground-truth data and involve significant human intervention, which is inconvenient and time-consuming. On the other hand, unsupervised methods require manual labeling after disaggregation or assume prior knowledge of the number of household appliances, limiting their scalability. Promising solutions to deal with these challenges can be developed based on the work in [100,101]. The former reference required the use of plug-wise monitors but reduced the need for human intervention by making use of semi-automatic labeling. The latter reference, on the other hand, used existing data from the UK to train NILM models that were applied to energy consumption data acquired in Italy. The authors in [101] demonstrated that by using available training data from a different country, it was still possible to accurately carry out NILM. We believe that this process can be improved by combining it with TL and semi-automated labeling concepts [123]. An initial model trained on data belonging to a different location can be refined based on human–machine collaboration, which would use limited feedback from a user about the NILM results to iteratively improve the initial model. This direction of research requires further study and analysis.

The ANNs that are typically used for NILM require significant computational resources, as they take aggregated energy consumption as input and aim to predict the energy usage of individual appliances. To address these challenges, Nalmpantis et al. [124] introduced the so-called neural Fourier energy disaggregator (NFED) model, which replaces the attention mechanism of the self-attentive energy disaggregator [42] with the Fourier transform. This model reduced the computational complexity by utilizing a lower number of learning parameters while achieving a performance comparable to that of the SOTA NILM systems. The authors in [125] proposed a probabilistic graphical model-based approach that does not require sub-metering or complete appliance knowledge. Each appliance was modeled using a variant of the difference HMM, and the training process involved learning the model parameters. Such a framework can minimize the need for intrusive data collection, reducing the complexity and improving the scalability.
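The idea of replacing attention with a Fourier transform can be sketched as a parameter-free token-mixing step: take the two-dimensional discrete Fourier transform of the (time step × feature) matrix and keep the real part. Whether NFED [124] uses exactly this form is an assumption here; the function names are hypothetical, and the naive transform below is written for readability (a practical model would use an FFT routine).

```python
import cmath


def dft(seq):
    """Naive discrete Fourier transform of a 1D sequence."""
    n = len(seq)
    return [
        sum(seq[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fourier_mix(x):
    """Mix a (time steps x features) matrix with a 2D DFT and keep the real part.

    No weights are learned in this step, unlike an attention layer.
    """
    n_steps, n_feats = len(x), len(x[0])
    rows = [dft(row) for row in x]                      # transform along features
    cols = [dft([rows[t][f] for t in range(n_steps)])   # then along time
            for f in range(n_feats)]
    return [[cols[f][t].real for f in range(n_feats)] for t in range(n_steps)]


mixed = fourier_mix([[1.0, 2.0], [3.0, 4.0]])
```

Because the mixing step has no trainable parameters, the parameter count of a block built around it comes only from its other layers.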

Distant processing of NILM gives rise to latency and privacy concerns, which can be overcome through edge-based NILM systems. However, to maintain the low cost of such a system, edge-based devices with low memory and computational capacities need to be utilized [102], which require the optimization of the NILM algorithms. Athanasoulias et al. [126] tackled the computational complexity of deploying NILM systems on edge IoT devices by introducing a novel method for compressing DNNs. The authors proposed an iterative magnitude pruning strategy based on the $\ell_1$-norm to identify an optimal compressed DNN model before training, as opposed to traditional post-training compression methods. This pre-training pruning reduced the computational costs by up to 95% while maintaining model accuracy. The method isolates the sub-networks containing only 5% of the original model’s parameters, achieving comparable accuracy on the UK-DALE dataset. By balancing the computational complexity, measured in multiply-and-accumulate units, with the performance, this strategy enhanced the feasibility of deploying NILM applications on edge devices. It ensured efficient energy disaggregation while addressing the constraints of storage and computational power in real-world IoT environments.
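The ranking-and-masking step behind magnitude pruning can be sketched as below. This is only an illustration under stated assumptions: the function names, the 20% per-round pruning fraction, and the toy weight vector are hypothetical, and the sketch omits any training or re-initialization between rounds, so it does not reproduce the procedure of [126].

```python
def magnitude_prune(weights, mask, fraction):
    """Mask out the given fraction of still-active weights with the smallest |w|."""
    active = sorted(
        (abs(w), i) for i, (w, m) in enumerate(zip(weights, mask)) if m
    )
    n_drop = max(1, int(len(active) * fraction))  # always make progress
    new_mask = list(mask)
    for _, i in active[:n_drop]:
        new_mask[i] = 0
    return new_mask


def iterative_prune(weights, target_keep, fraction_per_round=0.2):
    """Repeat magnitude pruning until at most `target_keep` of the weights remain."""
    mask = [1] * len(weights)
    while sum(mask) / len(weights) > target_keep:
        mask = magnitude_prune(weights, mask, fraction_per_round)
    return mask


# Toy weight vector with distinct magnitudes 0.01, 0.02, ..., 1.00.
weights = [((-1) ** i) * (i + 1) / 100 for i in range(100)]
mask = iterative_prune(weights, target_keep=0.05)
kept = [i for i, m in enumerate(mask) if m]
```

In this toy run only the five largest-magnitude weights survive, mirroring the 5% sub-network size mentioned above.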

Tan et al. [127] explored the application of self-supervised learning in NILM, addressing the challenge of utilizing massive amounts of unlabeled monitoring data. The authors proposed a self-supervised learning approach that eliminated the need for labeled data by leveraging an encoder–decoder DL model. Specifically, the encoder extracts high-dimensional features from the input data and compresses them into a dense low-dimensional vector, which the decoder then reconstructs, allowing the model to autonomously learn meaningful load characteristics. The method was tested on the AMPds2 dataset [128] and compared with other ML approaches, including HMM, CNNs, and LSTM networks. The results demonstrated that the self-supervised learning-based approach achieved comparable or superior accuracy in load identification while also offering some advantages in terms of computational efficiency. The research results highlight the potential of self-supervised learning to enhance NILM by enabling autonomous training from real-world data, reducing the reliance on labeled datasets, and improving the scalability for large-scale applications.

Ding et al. [129] proposed an NILM approach using few-shot TL with meta-learning and relational networks (MRNILR-TL). This approach addresses the challenge of limited labeled data due to the growing variety of appliances and high data collection costs. The method constructs an episode-based task dataset using meta-learning, enhancing the generalization and improving the NILM performance in low-data scenarios. Subsequently, a relational network was utilized to enhance the model’s ability to compare similarities between the appliance features, improving the classification accuracy without requiring fine-tuning in the target domain. The approach was evaluated using the PLAID [130] and WHITED datasets, demonstrating superior performance over existing NILM TL methods. The results show that the MRNILR-TL achieved higher accuracy in few-shot multi-classification tasks while eliminating the need for large labeled datasets in the target domain. This makes it a practical solution for real-world NILM applications, ensuring efficient load recognition even with minimal labeled training data.

6.4. Standardized Comparison of Different Methods

One of the key challenges in NILM research lies in the lack of standardized methodologies for evaluating model performance. In [26], the authors reviewed the metrics used for performance evaluation of NILM algorithms and emphasized that different methods proposed in the literature may use only a subset of these metrics. This inconsistency makes it difficult to compare and benchmark NILM models effectively. In [25], the authors further noted that there is no universally accepted criterion for selecting NILM evaluation metrics, as their significance often depends on various contextual factors such as the type of appliance, data sample size, and noise levels in the dataset.

As an illustrative example, the disaggregation accuracy of a microwave oven appears very high, as shown in Figure 6, where the model consistently maintains an accuracy of 0.99 across 20 epochs. However, this is not reflective of the actual disaggregation performance depicted in Figure 7, where considerable discrepancy is observed between the predicted and actual energy consumption. In this context, the F1-score is more informative, as it combines precision and recall and therefore exposes missed and false detections for low-power or short-duration devices, whose ON samples are heavily outnumbered by OFF samples. Despite the high accuracy, the lower F1-score in Figure 6 suggests the model struggles with reliably detecting microwave usage events. This discrepancy underscores the need to adopt evaluation metrics that reflect real-world NILM performance and usage constraints, particularly for appliances with sparse or bursty usage patterns.
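The gap between accuracy and F1-score for a rarely used appliance can be reproduced with a small synthetic example. The trace below is hypothetical (an appliance that is ON for 10 of 1000 samples) and is not the data behind Figures 6 and 7; the function name is likewise an assumption.

```python
def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and F1-score for binary ON/OFF labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Hypothetical trace: the appliance is ON for 10 of 1000 samples.
y_true = [1] * 10 + [0] * 990
always_off = [0] * 1000  # a degenerate model that never predicts ON

acc, f1 = accuracy_and_f1(y_true, always_off)
```

Here the degenerate model reaches an accuracy of 0.99 while its F1-score is 0, since it never detects a single usage event.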

Another critical concern, as discussed in [131], is the variability in NILM output formats across different studies. While some models are developed to classify device types, others focus on detecting ON/OFF events or estimating energy consumption. These varying objectives and output types present a fundamental barrier to standardized comparison. A method optimized for event detection may not perform well in energy estimation, and vice versa. Consequently, without a uniform output specification, the effectiveness of NILM algorithms remains difficult to evaluate on a level playing field.

While various studies have proposed evaluation metrics and output formats for NILM, the field still lacks a universally adopted standard. Therefore, future research must focus on consolidating these efforts into standardized evaluation protocols that define a core set of metrics and unify the expected output types. Such standardization is essential to enable fair benchmarking, ensure reproducibility, and support the real-world deployment of NILM systems.

7. Conclusions

In conclusion, NILM has emerged as an effective solution for energy management by enabling efficient monitoring and control of the energy consumption across various devices in a household or commercial setting. This paper highlighted the growing importance of utilizing advanced AI and DL techniques to enhance the accuracy of NILM systems. Various AI models, including hybrid architectures and emerging techniques, such as TL, GANs, and attention-enhanced models, were identified as key enablers for improving load detection and addressing the issue of data scarcity. A review of existing NILM system implementations showed the feasibility of DL-based NILM. However, some challenges remain, such as the heterogeneity of energy sources, data uncertainty, privacy, and complexity. Furthermore, the need for standardization remains, requiring further research and innovation in both the technical and regulatory aspects. Continued advancements in AI, hardware, and software integration will play a critical role in overcoming these barriers, ultimately leading to more efficient, reliable, and cost-effective NILM-based solutions.

Author Contributions

Conceptualization, A.H., A.S.K., and A.A.A.; methodology, A.H., A.S.K., and A.A.A.; formal analysis, A.H., A.S.K., A.A.A., and B.A.; investigation, A.H., A.S.K., and A.A.A.; writing—original draft preparation, A.H., A.S.K., and A.A.A.; writing—review and editing, A.H., A.S.K., A.A.A., B.A., A.A., and I.W.; visualization, A.H.; supervision, A.A. and I.W.; funding acquisition, A.A. and I.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by a research grant from the Natural Sciences and Engineering Research Council of Canada (NSERC) held by Alagan Anpalagan. This work is also supported in part by the NSERC grant Ref. no. 1-51-48574, held by Isaac Woungang.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

NILM	Non-Intrusive Load Monitoring	ILM	Intrusive Load Monitoring
IoT	Internet of Things	AI	Artificial Intelligence
ML	Machine Learning	DL	Deep Learning
1D	One Dimension	FHMM/HMM	(Factorial) Hidden Markov Model
DAE	Denoising Autoencoder	LSTM	Long Short-Term Memory
GAI	Generative AI	A-GAI	Attention-enhanced Generative AI
PTr-Nets	Power Transformer Networks	GNN	Graph Neural Network
RL	Reinforcement Learning	GRU	Gated Recurrent Unit
CNN	Convolutional Neural Network	GAN	Generative Adversarial Network
TL	Transfer Learning	DNN	Deep Neural Network
CAE	Convolutional Autoencoder	FL	Federated Learning
SVM	Support Vector Machine	ANN	Artificial Neural Network
Seq2point	Sequence-to-Point Learning	Seq2seq	Sequence-to-Sequence Learning
Seq2subseq	Sequence-to-Subsequence Learning	MAE	Mean Absolute Error
RMSE	Root Mean Squared Error	MLP	Multi-Layer Perceptron
SOC	State of Charge	EV	Electric Vehicle
PSPNet	Pyramid Scene Parsing Network	SOTA	State of the Art
MAS	Multi-Agent-based Simulator	LMP	Low Magnitude Pruning
SBP	Stacked Bidirectional Predictor	EBP	Event-Based Processing
MQ-LSTM	Multi-Quantile Long Short-Term Memory	PV	Photovoltaic Load
TCL	Thermostatically Controlled Load	RF	Random Forest
FFNN	Feed-Forward Neural Network	BTM	Behind-The-Meter
VRNN	Variational Recurrent Neural Networks	CRNN	Convolutional Recurrent Neural Network
TTRNet	Transformer-Temporal Pooling-RethinkNet	SMAPE	Symmetric Mean Absolute Percentage Error
SGN	Subtask Gated Network	VAE	Variational Autoencoder
GAF	Gramian Angular Fields	MTF	Markov Transition Fields
RP	Recurrence Plots	DDRN	Deep Dilated Residual Networks
NIW	Northern Ireland Water	HEMs	Home Energy Management Systems
AAL	Ambient Assisted Living	PT	Predefined Threshold
TS	Time Spacing	SAE	Signal Aggregate Error
MCU	Microcontroller Unit	KNN	K-Nearest Neighbors
DER	Distributed Energy Resources	NFED	Neural Fourier Energy Disaggregator
CTA-BERT	Combined With Time-Sensing Self-Attention with BERT	DAD	Domestic Appliance Dataset

References

Ma, S.; Li, S.; Luo, Q.; Yu, Z.; Wang, Y. Revisiting the relationships between energy consumption, economic development and urban size: A global perspective using remote sensing data. Heliyon 2024, 10, e27318.
Tokam, L.W.; Ouro-Djobo, S.S. Comparative Study on Load Monitoring Approaches. Appl. Sci. 2023, 13, 5755.
United Nations. For a Livable Climate: Net-Zero Commitments Must Be Backed by Credible Action. Available online: https://www.un.org/en/climatechange/net-zero-coalition#:~:text=To%20keep%20global%20warming%20to,reach%20net%20zero%20by%202050 (accessed on 1 September 2024).
Revuelta Herrero, J.; Lozano Murciego, Á.; López Barriuso, A.; Hernández de la Iglesia, D.; Villarrubia González, G.; Corchado Rodríguez, J.M.; Carreira, R. Non Intrusive Load Monitoring (NILM): A State of the Art. In Trends in Cyber-Physical Multi-Agent Systems, Proceedings of the PAAMS Collection - 15th International Conference, PAAMS 2017, Porto, Portugal, 21–23 June 2017; De la Prieta, F., Vale, Z., Antunes, L., Pinto, T., Campbell, A.T., Julián, V., Neves, A.J.R., Moreno, M.N., Eds.; Springer: Cham, Switzerland, 2018; pp. 125–138.
Neenan, B.; Robinson, J.; Boisvert, R.N. Residential Electricity Use Feedback: A Research Synthesis and Economic Framework; Technical Report; Electric Power Research Institute, Inc.: Palo Alto, CA, USA, 2009.
Heffernan, T. Do You Really Need a Home Energy Monitor? 2024. Available online: https://www.nytimes.com/wirecutter/reviews/home-energy-monitor/ (accessed on 29 November 2024).
Hart, G. Nonintrusive appliance load monitoring. Proc. IEEE 1992, 80, 1870–1891.
Aladesanmi, E.; Folly, K. Overview of non-intrusive load monitoring and identification techniques. IFAC-PapersOnLine 2015, 48, 415–420.
Khwaja, A.S.; Naeem, M.; Anpalagan, A.; Venetsanopoulos, T.A.; Venkatesh, B. Smart meter deployment optimisation and its analysis for appliance load monitoring. J. Eng. 2015, 2015, 116–124.
Buzachis, A.; Fazio, M.; Galletta, A.; Celesti, A.; Villari, M. Intelligent IoT for Non-Intrusive Appliance Load Monitoring Infrastructures in Smart Cities. In Proceedings of the AI&IoT@AI*IA, Rende, Italy, 22 November 2019.
Shi, Y.; Li, W.; Chang, X.; Yang, T.; Sun, Y.; Zomaya, A. On enabling collaborative non-intrusive load monitoring for sustainable smart cities. Sci. Rep. 2023, 13, 6569.
Alnoman, A.; Khwaja, A.S.; Anpalagan, A.; Woungang, I. Emerging AI and 6G-Based User Localization Technologies for Emergencies and Disasters. IEEE Access 2024, 12, 197877–197906.
Milan, A.; Rezatofighi, S.; Garg, R.; Dick, A.; Reid, I. Data-Driven Approximations to NP-Hard Problems. Proc. AAAI Conf. Artif. Intell. 2017, 31.
Wang, F.; He, Q.; Li, S. Solving Combinatorial Optimization Problems with Deep Neural Network: A Survey. Tsinghua Sci. Technol. 2024, 29, 1266–1282.
Yang, X.; Wang, Z.; Zhang, H.; Ma, N.; Yang, N.; Liu, H.; Zhang, H.; Yang, L. A Review: Machine Learning for Combinatorial Optimization Problems in Energy Areas. Algorithms 2022, 15, 205.
Angelis, G.F.; Timplalexis, C.; Krinidis, S.; Ioannidis, D.; Tzovaras, D. NILM applications: Literature review of learning approaches, recent developments and challenges. Energy Build. 2022, 261, 111951.
Huber, P.; Calatroni, A.; Rumsch, A.; Paice, A. Review on Deep Neural Networks Applied to Low-Frequency NILM. Energies 2021, 14, 2390.
Kaselimi, M.; Protopapadakis, E.; Voulodimos, A.; Doulamis, N.; Doulamis, A. Towards Trustworthy Energy Disaggregation: A Review of Challenges, Methods, and Perspectives for Non-Intrusive Load Monitoring. Sensors 2022, 22, 5872.
Kahl, M. Machine Learning for Non-Intrusive Load Monitoring. Ph.D. Thesis, Technische Universität München, Munich, Germany, 2019.
Verma, A.; Anwar, A.; Mahmud, M.A.P.; Ahmed, M.; Kouzani, A. A Comprehensive Review on the NILM Algorithms for Energy Disaggregation. arXiv 2021, arXiv:2102.12578.
Ouzine, J.; Marzouq, M.; Bennani, S.D.; Lahreche, K.; El Fadili, H. Overview of Non-Intrusive Load Monitoring: Probabilistic and Artificial Intelligence approaches. E3S Web Conf. 2022, 351, 01021.
Dash, S.; Sahoo, N. Electric energy disaggregation via non-intrusive load monitoring: A state-of-the-art systematic review. Electr. Power Syst. Res. 2022, 213, 108673.
Nalmpantis, C.; Vrakas, D. Machine learning approaches for non-intrusive load monitoring: From qualitative to quantitative comparation. Artif. Intell. Rev. 2019, 52, 217–243.
Schirmer, P.A.; Mporas, I. Non-intrusive load monitoring: A review. IEEE Trans. Smart Grid 2022, 14, 769–784.
Rafiq, H.; Manandhar, P.; Rodriguez-Ubinas, E.; Ahmed Qureshi, O.; Palpanas, T. A review of current methods and challenges of advanced deep learning-based non-intrusive load monitoring (NILM) in residential context. Energy Build. 2024, 305, 113890.
Liu, Y.; Wang, Y.; Ma, J. Non-Intrusive Load Monitoring in Smart Grids: A Comprehensive Review. arXiv 2024, arXiv:2403.06474.
Kahl, M.; Haq, A.U.; Kriechbaumer, T.; Jacobsen, H.A. Whited-a worldwide household and industry transient energy data set. In Proceedings of the 3rd International Workshop on Non-Intrusive Load Monitoring, Vancouver, BC, Canada, 14–15 May 2016; pp. 1–4.
Monteiro, R.; de Santana, J.; Teixeira, R.; Bretas, A.; Aguiar, R.; Poma, C. Non-intrusive load monitoring using artificial intelligence classifiers: Performance analysis of machine learning techniques. Electr. Power Syst. Res. 2021, 198, 107347.
Kolter, J.Z.; Johnson, M.J. REDD: A public data set for energy disaggregation research. In Proceedings of the Workshop on Data Mining Applications in Sustainability (SIGKDD), San Diego, CA, USA, 21 August 2011; Citeseer: Princeton, NJ, USA, 2011; Volume 25, pp. 59–62.
Kelly, J.; Knottenbelt, W. The UK-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five UK homes. Sci. Data 2015, 2, 1–14.
Matindife, L.; Sun, Y.; Wang, Z. A Machine-Learning Based Nonintrusive Smart Home Appliance Status Recognition. Math. Probl. Eng. 2020, 2020, 9356165.
Schirmer, P.A.; Mporas, I. Binary versus Multiclass Deep Learning Modelling in Energy Disaggregation. In Energy and Sustainable Futures; Mporas, I., Kourtessis, P., Al-Habaibeh, A., Asthana, A., Vukovic, V., Senior, J., Eds.; Springer: Cham, Switzerland, 2021; pp. 45–51.
Zhang, X.Y.; Kuenzel, S. Differential Privacy for Deep Learning-based Online Energy Disaggregation System. In Proceedings of the 2020 IEEE PES Innovative Smart Grid Technologies Europe (ISGT-Europe), The Hague, The Netherlands, 26–28 October 2020; pp. 904–908.
Liu, S.J. Deep Learning Approach for Load Disaggregation of Residential Electricity Consumption Data with Low-Sampling Rate. Master’s Thesis, Eindhoven University of Technology, Eindhoven, The Netherlands, 2021.
Du, Z.; Li, J.; Zhu, L.; Lu, K.; Shen, H.T. Adversarial Energy Disaggregation. ACM/IMS Trans. Data Sci. 2021, 2, 1–16.
Eduardo Santoro, M. Applications of Deep Learning for Load Disaggregation in Residential Environments. Bachelor Thesis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil, 2017.
Edmonds, J.; Abdallah, Z.S. IMG-NILM: A Deep learning NILM approach using energy heatmaps. In Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, SAC ’23, Tallinn, Estonia, 27–31 March 2023; pp. 1151–1153.
Massidda, L.; Marrocu, M.; Manca, S. Non-Intrusive Load Disaggregation by Convolutional Neural Network and Multilabel Classification. Appl. Sci. 2020, 10, 1454.
Delfosse, A.; Hébrail, G.; Zerroug, A. Deep Learning Applied to NILM: Is Data Augmentation Worth for Energy Disaggregation? In Proceedings of the European Conference on Artificial Intelligence, Santiago de Compostela, Spain, 29 August–8 September 2020.
Chen, K.; Zhang, Y.; Wang, Q.; Hu, J.; Fan, H.; He, J. Scale- and Context-Aware Convolutional Non-Intrusive Load Monitoring. IEEE Trans. Power Syst. 2020, 35, 2362–2373.
Athanasiadis, C.L.; Papadopoulos, T.A.; Doukas, D.I. Real-time non-intrusive load monitoring: A light-weight and scalable approach. Energy Build. 2021, 253, 111523.
Barber, J.; Cuayáhuitl, H.; Zhong, M.; Luan, W. Lightweight Non-Intrusive Load Monitoring Employing Pruned Sequence-to-Point Learning. In Proceedings of the 5th International Workshop on Non-Intrusive Load Monitoring, NILM’20, Virtual, 18 November 2020; pp. 11–15.
Piccialli, V.; Sudoso, A.M. Improving Non-Intrusive Load Disaggregation through an Attention-Based Deep Neural Network. Energies 2021, 14, 847.
Jiang, J.; Kong, Q.; Plumbley, M.D.; Gilbert, N.; Hoogendoorn, M.; Roijers, D.M. Deep Learning-Based Energy Disaggregation and On/Off Detection of Household Appliances. ACM Trans. Knowl. Discov. Data 2021, 15, 1–21.
Luan, W.; Zhang, R.; Liu, B.; Zhao, B.; Yu, Y. Leveraging sequence-to-sequence learning for online non-intrusive load monitoring in edge device. Int. J. Electr. Power Energy Syst. 2023, 148, 108910.
Murray, D.; Stankovic, L.; Stankovic, V. Transparent AI: Explainability of deep learning based load disaggregation. In Proceedings of the 8th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, BuildSys ’21, Coimbra, Portugal, 17–18 November 2021; pp. 268–271.
Zhang, C.; Zhong, M.; Wang, Z.; Goddard, N.; Sutton, C. Sequence-to-Point Learning With Neural Networks for Non-Intrusive Load Monitoring. Proc. AAAI Conf. Artif. Intell. 2018, 32.
Yu, W.; Yang, L.; Liu, X. AugLPN-NILM: Augmented lightweight parallel network for NILM embedding attention module over sequence to point. Sustain. Energy Grids Netw. 2024, 38, 101378.
Cho, J.; Hu, Z.; Sartipi, M. Non-Intrusive A/C Load Disaggregation Using Deep Learning. In Proceedings of the 2018 IEEE/PES Transmission and Distribution Conference and Exposition (T&D), Denver, CO, USA, 16–19 April 2018; pp. 1–5.
Zhang, X.Y.; Watkins, C.; Kuenzel, S. Multi-quantile recurrent neural network for feeder-level probabilistic energy disaggregation considering roof-top solar energy. Eng. Appl. Artif. Intell. 2022, 110, 104707.
Bimenyimana, T. Using Machine Learning and Deep Learning for Load Disaggregation and Recognition of Activities in Household. Master’s Thesis, Carleton University, Ottawa, ON, Canada, 2020.
Junfei, W. Deep Learning on Smart Meter Data: Non-Intrusive Load Monitoring and Stealthy Black-Box Attacks. Master’s Thesis, The University of Western Ontario, London, ON, Canada, 2020.
Kelly, J.; Knottenbelt, W. Neural NILM: Deep Neural Networks Applied to Energy Disaggregation. In Proceedings of the 2nd ACM International Conference on Embedded Systems for Energy-Efficient Built Environments, BuildSys ’15, Seoul, Republic of Korea, 4–5 November 2015; pp. 55–64.
Tan, W.; Guo, W.; Rao, F.; Che, L. Machine-learning based decomposition and monitoring of behind-the-meter resources. Electr. J. 2022, 35, 107131.
Bejarano, G.; DeFazio, D.; Ramesh, A. Deep Latent Generative Models for Energy Disaggregation. Proc. AAAI Conf. Artif. Intell. 2019, 33, 850–857.
Serafini, L.; Tanoni, G.; Principi, E.; Spinsante, S.; Squartini, S. A Multiple Instance Regression Approach to Electrical Load Disaggregation. In Proceedings of the 2022 30th European Signal Processing Conference (EUSIPCO), Belgrade, Serbia, 29 August–2 September 2022; pp. 1666–1670.
Xuan, Y.; Pang, C.; Yu, H.; Zeng, X.; Chen, Y. Load Energy Decomposition Algorithm Based on Improved Bidirectional Transformer Combined With Time-Sensing Self-Attention. IEEE Access 2024, 12, 75625–75639.
Praneeth, V.B. Deep Neural Networks Based Disaggregation of Swedish Household Energy Consumption. Master’s Thesis, Blekinge Institute of Technology, Karlskrona, Sweden, 2020.
Aghera, R.; Chilana, S.; Garg, V.; Reddy, R. A Deep Learning Technique using Low Sampling rate for residential Non Intrusive Load Monitoring. arXiv 2021, arXiv:2111.05120.
Kim, J.; Le, T.T.H.; Kim, H. Nonintrusive Load Monitoring Based on Advanced Deep Learning and Novel Signature. Comput. Intell. Neurosci. 2017, 2017, 4216281.
Yu, H.; Jiang, Z.; Li, Y.; Zhou, J.; Wang, K.; Cheng, Z.; Gu, Q. A Multi-objective Non-intrusive Load Monitoring Method Based on Deep Learning. IOP Conf. Ser. Mater. Sci. Eng. 2019, 486, 012110.
Najafi, B.; Di Narzo, L.; Rinaldi, F.; Arghandeh, R. Machine learning based disaggregation of air-conditioning loads using smart meter data. IET Gener. Transm. Distrib. 2020, 14, 4755–4762.
He, W.; Chai, Y. An Empirical Study on Energy Disaggregation via Deep Learning. In Proceedings of the 2016 2nd International Conference on Artificial Intelligence and Industrial Engineering (AIIE 2016), Beijing, China, 20–21 November 2016; Atlantis Press: Dordrecht, The Netherlands, 2016; pp. 338–342.
Shin, C.; Joo, S.; Yim, J.; Lee, H.; Moon, T.; Rhee, W. Subtask gated networks for non-intrusive load monitoring. Proc. AAAI Conf. Artif. Intell. 2019, 33, 1150–1157.
Kaselimi, M.; Voulodimos, A.; Doulamis, N.; Doulamis, A.; Protopapadakis, E. A Generative Adversarial Gated Recurrent Network for Power Disaggregation & Consumption Awareness. In Proceedings of the NeurIPS 2020 Workshop on Tackling Climate Change with Machine Learning, Virtual, 11–12 December 2020.
Çimen, H.; Wu, Y.; Wu, Y.; Terriche, Y.; Vasquez, J.C.; Guerrero, J.M. Deep Learning-Based Probabilistic Autoencoder for Residential Energy Disaggregation: An Adversarial Approach. IEEE Trans. Ind. Inform. 2022, 18, 8399–8408.
Pan, Y.; Liu, K.; Shen, Z.; Cai, X.; Jia, Z. Sequence-To-Subsequence Learning with Conditional Gan for Power Disaggregation. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 3202–3206. [Google Scholar] [CrossRef]
Kaselimi, M.; Voulodimos, A.; Protopapadakis, E.; Doulamis, N.; Doulamis, A. EnerGAN: A generative adversarial Network for energy disaggregation. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 1578–1582. [Google Scholar] [CrossRef]
Bao, K.; Ibrahimov, K.; Wagner, M.; Schmeck, H. Enhancing neural non-intrusive load monitoring with generative adversarial networks. Energy Inform. 2018, 1, 18. [Google Scholar] [CrossRef]
Sewwandi, W.A.H.M.; Anthony, C.N.D.; Disanayaka, D.M.A.M.; Boralessa, M.A.K.S.; Hemapala, K.T.M.U. Non-Intrusive Load Monitoring Using Denoising Autoencoder Neural Networks. In Proceedings of the 2022 IEEE 10th Region 10 Humanitarian Technology Conference (R10-HTC), Hyderabad, India, 16–18 September 2022; pp. 408–413. [Google Scholar] [CrossRef]
Bousbiat, H.; Klemenjak, C.; Elmenreich, W. Exploring Time Series Imaging for Load Disaggregation. In Proceedings of the 7th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, BuildSys ’20, Virtual, 18–20 November 2020; pp. 254–257. [Google Scholar] [CrossRef]
He, X.; Dong, H.; Yang, W.; Hong, J. A Novel Denoising Auto-Encoder-Based Approach for Non-Intrusive Residential Load Monitoring. Energies 2022, 15, 2290. [Google Scholar] [CrossRef]
Langevin, A.; Carbonneau, M.A.; Cheriet, M.; Gagnon, G. Energy disaggregation using variational autoencoders. Energy Build. 2022, 254, 111623. [Google Scholar] [CrossRef]
Virtsionis Gkalinikis, N.; Nalmpantis, C.; Vrakas, D. Variational regression for multi-target energy disaggregation. Sensors 2023, 23, 2051. [Google Scholar] [CrossRef]
Verma, S.; Singh, S.; Majumdar, A. Multi-label LSTM autoencoder for non-intrusive appliance load monitoring. Electr. Power Syst. Res. 2021, 199, 107414. [Google Scholar] [CrossRef]
Li, C.; Guo, F.; Yang, R.; Wang, H.; Yao, B. End-to-end NILM model of industrial power data based on autoencoder transformer. Intell. Control Syst. Eng. 2024, 2, 427. [Google Scholar]
Pan, G.; Wang, H.; Tian, T.; Luo, Y.; Xia, S.; Li, Q. Research on non-intrusive load decomposition model based on parallel multi-scale attention mechanism and its application in smart grid. Energy Build. 2024, 312, 114210. [Google Scholar] [CrossRef]
Yao, L.; Wang, J.; Zhao, C. Non-Intrusive Load Monitoring Based on Multiscale Attention Mechanisms. Energies 2024, 17, 1944. [Google Scholar] [CrossRef]
Xiong, J.; Hong, T.; Zhao, D.; Zhang, Y. MATNilm: Multi-Appliance-Task Non-Intrusive Load Monitoring With Limited Labeled Data. IEEE Trans. Ind. Inform. 2024, 20, 3177–3187. [Google Scholar] [CrossRef]
Wang, L.; Mao, S.; Wilamowski, B.M.; Nelms, R.M. Pre-Trained Models for Non-Intrusive Appliance Load Monitoring. IEEE Trans. Green Commun. Netw. 2022, 6, 56–68. [Google Scholar] [CrossRef]
Cheng, X.; Zhao, M.; Zhang, J.; Wang, J.; Pan, X.; Liu, X. TransNILM: A Transformer-based Deep Learning Model for Non-intrusive Load Monitoring. In Proceedings of the 2022 International Conference on High Performance Big Data and Intelligent Systems (HDIS), Tianjin, China, 10–11 December 2022; pp. 13–20. [Google Scholar] [CrossRef]
Zhou, M.; Shao, S.; Wang, X.; Zhu, Z.; Hu, F. Deep Learning-Based Non-Intrusive Commercial Load Monitoring. Sensors 2022, 22, 5250. [Google Scholar] [CrossRef]
Ang, J.H.; Yusup, Y.; Zaki, S.A.; Salehabadi, A.; Ahmad, M.I. Comprehensive energy consumption of elevator systems based on hybrid approach of measurement and calculation in low- and high-rise buildings of tropical climate towards energy efficiency. Sustainability 2022, 14, 4779.
Virtsionis-Gkalinikis, N.; Nalmpantis, C.; Vrakas, D. SAED: Self-attentive energy disaggregation. Mach. Learn. 2023, 112, 4081–4100.
Varanasi, L.S.; Karri, S.P.K. STNILM: Switch Transformer based Non-Intrusive Load Monitoring for short and long duration appliances. Sustain. Energy Grids Netw. 2024, 37, 101246.
Wang, L.; Mao, S.; Nelms, R.M. Transformer for Nonintrusive Load Monitoring: Complexity Reduction and Transferability. IEEE Internet Things J. 2022, 9, 18987–18997.
Sykiotis, S.; Kaselimi, M.; Doulamis, A.; Doulamis, N. Electricity: An efficient transformer for non-intrusive load monitoring. Sensors 2022, 22, 2926.
Yue, Z.; Witzig, C.R.; Jorde, D.; Jacobsen, H.A. BERT4NILM: A Bidirectional Transformer Model for Non-Intrusive Load Monitoring. In Proceedings of the 5th International Workshop on Non-Intrusive Load Monitoring, NILM’20, Virtual, 18–20 November 2020; pp. 89–93.
Rong, J.; Wang, C.; Zhou, Q.; He, Y.; Wu, H. Enhancing non-intrusive load monitoring through transfer learning with transformer models. Energy Build. 2025, 330, 115334.
Çavdar, İ.H.; Feryad, V. Efficient Design of Energy Disaggregation Model with BERT-NILM Trained by AdaX Optimization Method for Smart Grid. Energies 2021, 14, 4649.
Singhal, R.; Mehta, D.; Sharma, K. Efficient energy optimization techniques for smart grids which uses ML and DL algorithms. In Proceedings of the 2023 International Conference on Power Energy, Environment & Intelligent Control (PEEIC), Greater Noida, India, 19–23 December 2023; pp. 1300–1304.
D’Incecco, M.; Squartini, S.; Zhong, M. Transfer Learning for Non-Intrusive Load Monitoring. IEEE Trans. Smart Grid 2020, 11, 1419–1429.
Li, Q.; Ye, J.; Song, W.; Tse, Z. Energy Disaggregation with Federated and Transfer Learning. In Proceedings of the 2021 IEEE 7th World Forum on Internet of Things (WF-IoT), New Orleans, LA, USA, 14 June–31 July 2021; pp. 698–703.
Li, D.; Li, J.; Zeng, X.; Stankovic, V.; Stankovic, L.; Xiao, C.; Shi, Q. Transfer learning for multi-objective non-intrusive load monitoring in smart building. Appl. Energy 2023, 329, 120223.
Klemenjak, C.; Faustine, A.; Makonin, S.; Elmenreich, W. On metrics to assess the transferability of machine learning models in non-intrusive load monitoring. arXiv 2019, arXiv:1912.06200.
Houidi, S.; Fourer, D.; Auger, F.; Sethom, H.B.A.; Miègeville, L. Comparative evaluation of non-intrusive load monitoring methods using relevant features and transfer learning. Energies 2021, 14, 2726.
Yasodya, W.; Arampola, S.; Nisakya, M.; Logeeshan, V.; Kumarawadu, S.; Wanigasekara, C. Self-Adaptive Deep Learning Framework for Non-Intrusive Load Monitoring: Addressing Aging Appliance Challenges with Transfer Learning and Pseudo Labeling. IEEE Access 2025, 13, 106524–106539.
Alam, M.K. Non-Intrusive Load Monitoring using SensiML & QuickFeather. 2021. Available online: https://www.hackster.io/taifur/non-intrusive-load-monitoring-using-sensiml-quickfeather-e489cf#toc-building-the-app-and-uploading-the-binary-to-quickfeather-board-8 (accessed on 26 December 2024).
Mari, S.; Bucci, G.; Ciancetta, F.; Fiorucci, E.; Fioravanti, A. An embedded deep learning NILM system: A year-long field study in real houses. IEEE Trans. Instrum. Meas. 2023, 72, 1–15.
McCrory, M.; Marshall, A.H.; Novakovic, A.; Collins, G. Reviewing Non-intrusive Load Monitoring Using a Pilot Study of an IoT Device to Disaggregate Energy Usage. In Proceedings of the International Congress on Information and Communication Technology; Springer: Singapore, 2023; pp. 293–307.
Pereira, L.; Ribeiro, M.; Nunes, N. Engineering and deploying a hardware and software platform to collect and label non-intrusive load monitoring datasets. In Proceedings of the 2017 Sustainable Internet and ICT for Sustainability (SustainIT), Funchal, Portugal, 6–7 December 2017; pp. 1–9.
Tabanelli, E.; Brunelli, D.; Acquaviva, A.; Benini, L. Trimming feature extraction and inference for MCU-based edge NILM: A systematic approach. IEEE Trans. Ind. Inform. 2021, 18, 943–952.
Trobiani, V. Realization of a Prototype for the Implementation of Non-Intrusive Load Monitoring. Ph.D. Thesis, Politecnico di Torino, Turin, Italy, 2021.
Li, W.; Yi, M.; Wang, M.; Wang, Y.; Shi, D.; Wang, Z. Real-Time Energy Disaggregation at Substations With Behind-the-Meter Solar Generation. IEEE Trans. Power Syst. 2021, 36, 2023–2034.
Yi, M.; Wang, M. Bayesian Energy Disaggregation At Substations With Uncertainty Modeling. In Proceedings of the 2023 IEEE Power & Energy Society General Meeting (PESGM), Orlando, FL, USA, 16–20 July 2023; p. 1.
Wang, S.; Li, R.; Evans, A.; Li, F. Regional nonintrusive load monitoring for low voltage substations and distributed energy resources. Appl. Energy 2020, 260, 114225.
Moreno Jaramillo, A.F.; Lopez-Lorente, J.; Laverty, D.; Martinez-del Rincon, J.; Foley, A.M. Identification of Distributed Energy Resources in Low Voltage Distribution Networks. In Proceedings of the 2021 IEEE PES Innovative Smart Grid Technologies Europe (ISGT Europe), Espoo, Finland, 18–21 October 2021; pp. 1–6.
Jaramillo, A.F.M.; Lopez-Lorente, J.; Laverty, D.M.; Brogan, P.V.; Velasquez, S.H.H.; Martinez-Del-Rincón, J.; Foley, A.M. Distributed Energy Resources Electric Profile Identification in Low Voltage Networks Using Supervised Machine Learning Techniques. IEEE Access 2023, 11, 19469–19486.
Yaniv, A.; Beck, Y. Advances in non-intrusive load monitoring for the industrial domain: Challenges, insights, and path forward. Renew. Sustain. Energy Rev. 2025, 210, 115136.
Wang, Y.; Zhang, N.; Chen, Q.; Kirschen, D.S.; Li, P.; Xia, Q. Data-Driven Probabilistic Net Load Forecasting With High Penetration of Behind-the-Meter PV. IEEE Trans. Power Syst. 2018, 33, 3255–3264.
Faustine, A.; Nunes, N.J.; Pereira, L. Efficiency Through Simplicity: MLP-Based Approach for Net-Load Forecasting With Uncertainty Estimates in Low-Voltage Distribution Networks. IEEE Trans. Power Syst. 2025, 40, 46–56.
Wang, T.; Dong, Z. Blockchain-Based Clustered Federated Learning for Non-Intrusive Load Monitoring. IEEE Trans. Smart Grid 2024, 15, 2348–2361.
Ibrahem, M.I.; Mahmoud, M.; Fouda, M.M.; Alsolami, F.; Alasmary, W.; Shen, X. Privacy Preserving and Efficient Data Collection Scheme for AMI Networks Using Deep Learning. IEEE Internet Things J. 2021, 8, 17131–17146.
Zhang, X.Y.; Watkins, C.; Took, C.C.; Kuenzel, S. Privacy boundary determination of smart meter data using an artificial intelligence adversary. Int. Trans. Electr. Energy Syst. 2021, 31, e13020.
Gran, A. Privacy Challenges in AMS/Smart Grid. Master’s Thesis, Department of Computer Science and Media Technology, Gjøvik University College, Gjøvik, Norway, 2013.
Zeadally, S.; Pathan, A.S.; Alcaraz, C.; Badra, M. Towards Privacy Protection in Smart Grid. Wirel. Pers. Commun. 2013, 73.
Bill 140, Smart Grid Cyber Security and Privacy Act; Legislative Assembly of Ontario: Toronto, ON, Canada, 2015.
Lee, D.; Hess, D.J. Data privacy and residential smart meters: Comparative analysis and harmonization potential. Util. Policy 2021, 70, 101188.
Wang, W.; Wang, Z.; Chen, Y.; Guo, M.; Chen, Z.; Niu, Y.; Liu, H.; Chen, L. Bats: An appliance safety hazards factors detection algorithm with an improved nonintrusive load disaggregation method. Energies 2021, 14, 3547.
Chandran, L.R.; Karuppasamy, I.; Nair, M.G.; Sun, H.; Krishnakumari, P.K. Compressive Sensing in Power Engineering: A Comprehensive Survey of Theory and Applications, and a Case Study. J. Sens. Actuator Netw. 2025, 14, 28.
Bo, Y.; Hong, L.; Shaoyun, G.; Guoping, L. Improved method of non-intrusive load monitoring based on compressed sensing. Int. J. Electr. Power Energy Syst. 2025, 170, 110736.
Carmine, P.; Luciano, P.; Fabio, P.; Mauro, M.; Riccardo, R.; Gianluca, S. A passive and low-complexity Compressed Sensing architecture based on a charge-redistribution SAR ADC. Integration 2020, 75, 40–51.
Desmond, M.; Duesterwald, E.; Brimijoin, K.; Brachman, M.; Pan, Q. Semi-Automated Data Labeling. In Proceedings of the NeurIPS 2020 Competition and Demonstration Track, Virtual, 6–12 December 2020; Escalante, H.J., Hofmann, K., Eds.; PMLR: 2021; Volume 133, pp. 156–169.
Nalmpantis, C.; Virtsionis Gkalinikis, N.; Vrakas, D. Neural Fourier Energy Disaggregation. Sensors 2022, 22, 473.
Parson, O.; Ghosh, S.; Weal, M.; Rogers, A. Non-Intrusive Load Monitoring Using Prior Models of General Appliance Types. Proc. AAAI Conf. Artif. Intell. 2021, 26, 356–362.
Athanasoulias, S.; Sykiotis, S.; Temenos, N.; Doulamis, A.; Doulamis, N. A Pre-Training Pruning Strategy for Enabling Lightweight Non-Intrusive Load Monitoring On Edge Devices. In Proceedings of the 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), Seoul, Republic of Korea, 14–19 April 2024; pp. 249–253.
Tan, Z.; Liu, B.; Xu, Y.; Feng, S.; Zhao, H. Application of self-supervised learning in non-intrusive load monitoring. J. Phys. Conf. Ser. 2023, 2425, 012037.
Makonin, S.; Popowich, F.; Bartram, L.; Gill, B.; Bajić, I.V. AMPds: A public dataset for load disaggregation and eco-feedback research. In Proceedings of the 2013 IEEE Electrical Power & Energy Conference, Halifax, NS, Canada, 21–23 August 2013; pp. 1–6.
Ding, D.; Li, J.; Wang, H.; Wang, K. Load recognition with few-shot transfer learning based on meta-learning and relational network in non-intrusive load monitoring. IEEE Trans. Smart Grid 2024, 15, 4861–4876.
Gao, J.; Giri, S.; Kara, E.C.; Bergés, M. Plaid: A public dataset of high-resoultion electrical appliance measurements for load identification research: Demo abstract. In Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings, Memphis, Tennessee, 3–6 November 2014; pp. 198–199.
Parkash, B.; Tito, S.R.; Ahmed, M.D.; Ur Rehman, A.; Nieuwoudt, P.; Soltic, S.; Lie, T.T.; Pandey, N. Challenges Associated with Standardization of NILM Methodology. In Proceedings of the 2022 IEEE PES 14th Asia-Pacific Power and Energy Engineering Conference (APPEEC), Melbourne, Australia, 20–23 November 2022; pp. 1–7.

Figure 1. Overview of NILM.

Figure 2. Deep learning techniques for NILM reviewed in this paper.

Figure 3. The CRNN architecture from [56]. Used with permission of the authors of [56].

Figure 4. Sequence-to-point approach proposed by Zhang et al. [47], illustrated in [99]. Used with permission of the corresponding author of [99].

Figure 5. A multi-layered overview of privacy and safety strategies in NILM systems, categorized into four key areas: (1) model-level privacy-preserving techniques such as FL and differential privacy; (2) system-level defenses including layered architectures, gateways, and signal encryption; (3) regulatory and policy-based interventions addressing legal and ethical concerns; and (4) safety-driven NILM applications for hazard detection and emergency response. This layered taxonomy highlights the interplay between technical, infrastructural, and legislative solutions to ensure privacy protection and operational resilience in NILM-enabled smart grids.

Figure 6. Accuracy and F1-score for the microwave at each training epoch using the BERT4NILM model. The plot shows that while accuracy remains consistently high at 0.99 across 20 epochs, the F1-score fluctuates at a relatively low level, indicating that the model has difficulty detecting microwave usage events.
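
The gap between accuracy and F1-score noted in the Figure 6 caption is typical of an appliance that is off for most samples. The following minimal Python sketch illustrates the effect on a made-up on/off trace (1000 samples, 10 of them on) with a hypothetical predictor that catches only one on-sample. The helper name `accuracy_and_f1` and all values are illustrative assumptions, not data from the BERT4NILM experiment.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary on/off labels given as 0/1 sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # on, predicted on
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # off, predicted off
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # off, predicted on
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # on, predicted off
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # true negatives never enter F1
    return accuracy, f1


# Hypothetical trace: the appliance is on in 10 of 1000 samples.
y_true = [1] * 10 + [0] * 990
# Hypothetical predictor: detects a single on-sample, predicts off elsewhere.
y_pred = [1] * 1 + [0] * 999

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, f1={f1:.3f}")
```

On this toy trace the accuracy is 0.991 while the F1-score is about 0.18: the 990 correctly predicted off-samples dominate accuracy but do not contribute to the F1-score.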

Figure 7. Actual vs. predicted microwave load comparison on the test set of the UK-DALE dataset using the BERT4NILM model.

Table 1. Comparison of recent review papers based on topics surveyed.

Ref.	DNN	CNN	Sequence	Generative AI (GAI)	Attention-Enhanced Generative AI (A-GAI)	Hybrid	Hardware Case Study	Challenges Discussed
[2]	Yes	Yes	Yes	Yes	No	Yes, CNN-LSTM	No	Cybersecurity risks, data privacy, network security, authentication, tamper resistance and sensor security.
[4]	Yes	Yes	Yes	No	No	Yes, CNN-LSTM	No	High sampling rate requirements, model recalibration.
[16]	No	Yes	Yes	Yes	No	Yes, CNN-LSTM, Variational RNN	No	High sampling rate, overlapping signals, household-specific recalibration, computational cost, large datasets.
[17]	Yes	Yes	Yes	Yes	No	Yes, CNN-GAN	No	Dataset standardization, noise reduction, cross-domain TL.
[18]	Yes	Yes	Yes	Yes	No	Yes, CNN-RNN	No	Generalization, explainability, data privacy.
[19]	Yes	Yes	No	No	No	Yes, CAEs	Yes, highlights only lab-based experimental hardware setups	Scalability, cross-dataset validation.
[20]	No	Yes	Yes	Yes	No	No	Yes, summarizes NILM hardware platforms and configurations	Standardized benchmarking, toolkit limitations.
[21]	Yes	Yes	Yes	No	No	Yes, CNN-LSTM	No	Low-power signal detection, labeled data scarcity.
[22]	Yes	Yes	Yes	No	No	No	No	Scalability, privacy, economic constraints.
[23]	Yes	Yes	Yes	No	No	No	No	Real-time feedback, scalability, cost-effectiveness.
[24]	No	Yes	Yes	Yes	No	No	Yes, covers hardware used in NILM experiments, generally in lab-oriented studies	Model generalization, overfitting, computational complexity, dataset biases, real-time processing.
[25]	Yes	Yes	Yes	Yes	Yes, only briefly mentioned	Yes, GAN-CNN-GRU	No	Feature selection, accuracy requirements, minimal user training, real-time processing, scalability.
[26]	Yes	Yes	No	No	No	No	Yes, discusses general hardware platforms for NILM systems	Data quality, expanding cost-effectively, hardware compatibility, minimizing power consumption.
This paper	Yes	Yes	Yes	Yes	Yes	Yes, CNN–LSTM, CNN–GRU, CNN–Transformer, VRNN, CRNN, and gated hybrid models with Transformer and self-attention.	Yes, highlights recent hardware case studies and actual real-world implementations, including edge/cloud deployment, microcontroller optimization, and IoT-enabled NILM systems	Handling energy source heterogeneity and aggregate uncertainty, multi-layered privacy and safety, computational/edge deployment challenges, and lack of standardized evaluation metrics.

Table 2. Comparison of models on the UK-DALE dataset [57]. The best results are highlighted in bold. Data used with permission of the corresponding author of [57].

Device	Model	Acc.	F1-Score	MRE	MAE
Kettle	LSTM+	0.994	0.531	0.007	21.26
	GRU+	0.993	0.425	0.008	27.22
	CNN	0.997	0.850	0.003	9.64
	BERT4NILM	0.998	0.907	0.002	6.82
	CTA-BERT	0.999	0.963	0.001	3.36
Fridge	LSTM+	0.573	0.174	0.956	43.74
	GRU+	0.636	0.401	0.901	39.54
	CNN	0.772	0.718	0.758	29.20
	BERT4NILM	0.813	0.756	0.732	32.35
	CTA-BERT	0.812	0.796	0.608	25.32
Washing machine	LSTM+	0.938	0.150	0.067	15.66
	GRU+	0.342	0.018	0.062	68.65
	CNN	0.913	0.173	0.094	11.90
	BERT4NILM	0.966	0.325	0.040	6.98
	CTA-BERT	0.959	0.340	0.046	8.83
Microwave	LSTM+	0.995	0.060	0.014	6.55
	GRU+	0.996	0.266	0.014	6.14
	CNN	0.995	0.341	0.014	6.36
	BERT4NILM	0.995	0.014	0.014	6.57
	CTA-BERT	0.996	0.209	0.013	6.27
Dishwasher	LSTM+	0.976	0.605	0.033	36.36
	GRU+	0.977	0.639	0.035	38.42
	CNN	0.947	0.560	0.069	25.43
	BERT4NILM	0.966	0.667	0.049	16.18
	CTA-BERT	0.979	0.669	0.042	13.32
Average	LSTM+	0.895	0.304	0.215	24.71
	GRU+	0.789	0.350	0.324	32.25
	CNN	0.925	0.528	0.188	16.51
	BERT4NILM	0.948	0.536	0.167	12.41
	CTA-BERT	0.950	0.595	0.142	11.42

Table 3. Comparison of models on the REDD dataset [57]. The best results are highlighted in bold. Data used with permission of the corresponding author of [57].

Device	Model	Acc.	F1-Score	MRE	MAE
Refrigerator	LSTM+	0.789	0.709	0.841	44.82
	GRU+	0.794	0.705	0.829	44.28
	CNN	0.796	0.689	0.822	35.69
	BERT4NILM	0.841	0.756	0.806	32.35
	CTA-BERT	0.887	0.761	0.796	30.69
Washer dryer	LSTM+	0.989	0.125	0.020	35.73
	GRU+	0.922	0.216	0.090	27.63
	CNN	0.970	0.274	0.042	36.12
	BERT4NILM	0.991	0.559	0.022	34.96
	CTA-BERT	0.993	0.694	0.017	18.02
Microwave	LSTM+	0.989	0.604	0.058	17.39
	GRU+	0.988	0.574	0.059	17.72
	CNN	0.986	0.378	0.060	18.59
	BERT4NILM	0.989	0.476	0.057	17.58
	CTA-BERT	0.997	0.599	0.056	17.61
Dishwasher	LSTM+	0.956	0.421	0.056	25.25
	GRU+	0.955	0.034	0.042	25.29
	CNN	0.953	0.298	0.053	25.29
	BERT4NILM	0.969	0.523	0.039	20.49
	CTA-BERT	0.975	0.659	0.045	19.88
Average	LSTM+	0.933	0.465	0.244	30.80
	GRU+	0.915	0.382	0.255	28.73
	CNN	0.926	0.410	0.244	28.92
	BERT4NILM	0.948	0.579	0.231	26.35
	CTA-BERT	0.960	0.632	0.229	22.18

Table 4. Highlights of different NILM system implementations.

Ref.	Methodology	Key Features	Applications	Dataset Used	Evaluation Metrics and Results
[98]	Leveraged QuickFeather board for voltage/current data acquisition and SensiML toolkit for model training.	Low-power IoT device; local edge-based ML processing; cloud-based Firebase integration for remote monitoring.	Appliance classification, energy monitoring, real-time load disaggregation for residential applications.	Custom dataset collected via QuickFeather	Accuracy, precision, recall; accurate classification of appliances like beater, hair dryer, heater, iron, and light; effective cloud data visualization through Firebase.
[99]	Utilized a Seq2point DL model based on a 1D CNN. Edge-based deployment using Arm Cortex-M7 microcontroller to process aggregate power data in real time.	Compact, edge-based NILM system; Seq2point CNN model; sliding-window preprocessing; real-time processing.	Smart homes, real-time energy monitoring, and energy disaggregation for household appliances like dishwashers and fridges.	REFIT	SAE, MAE, accuracy; achieved high accuracy in energy disaggregation; SAE <12% for most appliances; demonstrated adaptability to new environments.
[100]	Developed a custom IoT device for energy pulse monitoring, using manual switching to create 24 h load signatures and applying CNNs and RNNs for data disaggregation.	Custom IoT hardware for energy pulse detection, leveraging CNNs for noise reduction and RNNs for sequential data analysis.	Residential energy disaggregation; potential industrial NILM applications; cost estimation for individual appliances and operational optimization for utilities.	Custom IoT data	Accuracy, precision, recall, F1-score. High accuracy in energy disaggregation; identified appliance-specific energy consumption patterns.
[101]	Developed a hybrid NILM platform with LabJack U6 and Plugwise monitors. Used semi-automatic labeling and a sliding window algorithm.	Scalable hybrid hardware–software system with custom DAQ setup for aggregate monitoring. Enhanced Plugwise system for real-time data collection. Achieved 86% reduction in labeling effort using semi-automatic tools.	Addressing the labeling bottleneck for NILM datasets; residential energy monitoring; potential scalability for industrial energy analysis.	Residential data (17 appliances)	Event detection accuracy, labeling accuracy; reduced labeling time by 86%; high labeling accuracy with semi-automatic tools.
[102]	Optimized edge-based NILM using an MCU.	Low-cost microcontroller, time/frequency feature extraction, multi-appliance disaggregation.	Edge-based NILM for smart homes.	DAD (Domestic Appliance Dataset)	Precision, recall, accuracy (RF vs. SVM); 82% accuracy with reduced features.
[103]	Developed an MSP430FR5994-based NILM prototype with edge/cloud classification, fast Fourier transform feature extraction.	Edge and cloud-based NILM systems; harmonic content analysis up to 16th odd harmonic; real-time classification via MQTT-enabled IoT hardware.	Smart HEMS and AAL; real-time appliance detection and energy optimization.	WHITED	Event detection accuracy, energy estimation error; real-time high-performance classification with cloud scalability.
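
The sliding-window preprocessing listed in Table 4 for the Seq2point system of [99] can be sketched as follows. This is a minimal illustration, assuming the usual sequence-to-point pairing in which each window of aggregate power is matched with the appliance power at the window midpoint. The function name, window length, and sample values are hypothetical, and the padding, normalization, and CNN of the deployed system are omitted.

```python
def seq2point_windows(aggregate, appliance, window=5):
    """Pair each aggregate-power window with the appliance power at its midpoint.

    Assumes an odd window length and no padding, so a series of length n
    yields n - window + 1 training pairs.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    if len(aggregate) != len(appliance):
        raise ValueError("aggregate and appliance must have the same length")
    half = window // 2
    inputs, targets = [], []
    for start in range(len(aggregate) - window + 1):
        inputs.append(list(aggregate[start:start + window]))  # network input
        targets.append(appliance[start + half])               # midpoint target
    return inputs, targets


# Hypothetical readings in watts: a kettle-like load switches on for two samples.
aggregate = [100, 120, 2100, 2150, 130, 110, 105]
appliance = [0, 0, 2000, 2000, 0, 0, 0]

inputs, targets = seq2point_windows(aggregate, appliance, window=3)
for x, y in zip(inputs, targets):
    print(x, "->", y)
```

Because each window contributes a single target, consecutive windows overlap heavily, which is what allows a prediction for every time step at inference time.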

Table 5. Overview of privacy threats, techniques, and solutions in NILM across four layers.

Layer	Ref.	Threats Addressed	Techniques and Purpose
Learning Models	[112]	Data leakage from centralized training	Federated Learning (FL): Enables local model training without sharing raw data.
		Inference from neural activations	Differential Privacy: Laplace noise is added to obfuscate sensitive signals.
		Resource constraints on edge devices	Lightweight Optimization: Quantization and pruning improve efficiency.
	[113]	Presence inference attack	CNN-GRU Hybrid + Spoofing: Obscures presence using adversarial energy traces.
	[114]	Appliance usage detection	Smart Meter Aggregation: Combines data from multiple homes to mask individual loads.
System-Level Defenses	[116]	Data tampering across layers	Layered Architecture + Signature Verification: Secures inter-layer communication.
		Unauthorized consumer data access	Smart Gateway: Local access control and isolation of private data.
		Load pattern eavesdropping	Energy Masking: Uses renewable or battery-based noise injection.
		Message spoofing or interception	Hash-Based Keys: Protects data integrity in communication.
		Vulnerable in-network aggregation	Orthogonal Chip Sequences: Secure signal-level encryption.
Regulation/Policy	[115,117]	Secondary use of smart meter data	Informed Consent Protocols: Require user approval for data use.
	[117]	Lack of privacy laws in smart grids	Ontario Energy Data Act: Provincial regulation of smart meter data.
	[118]	Weak governance mechanisms	Independent Regulatory Body: Supervises smart grid data collection.
		Emergency use of NILM without safeguards	Context-Aware Privacy Laws: Tailored to emergency/disaster use cases.
		User discomfort with monitoring	Opt-out/Low Sampling Options: Reduces granularity or allows analog fallback.
Safety Applications	[119]	Battery charging/discharging hazards	Bats Algorithm: Detects unsafe battery usage using sparse coding.
	[12]	Victim tracking in emergencies	NILM for Disaster Response: Identifies activity patterns to aid rescue.
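
The differential-privacy entry in Table 5 (Laplace noise added to obfuscate sensitive signals) can be illustrated with a short Python sketch of the Laplace mechanism. It assumes the standard noise scale of sensitivity divided by epsilon and applies it to a generic list of values, which could stand for meter readings or model updates. The function names, parameter values, and readings are hypothetical and are not taken from [112].

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def privatize(values, sensitivity, epsilon, seed=None):
    """Add Laplace noise with scale = sensitivity / epsilon to every value.

    A smaller epsilon gives a larger noise scale, hence stronger obfuscation
    and lower utility. Both parameters are assumptions chosen by the caller.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


# Hypothetical aggregate readings in watts.
readings = [310.0, 305.0, 2290.0, 2310.0, 320.0]
noisy = privatize(readings, sensitivity=50.0, epsilon=1.0, seed=42)
print([round(v, 1) for v in noisy])
```

The seed is fixed here only to make the sketch reproducible; a real deployment would not use a fixed seed, since predictable noise could be subtracted again.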

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

