Non-Intrusive Load Monitoring: A Systematic Review of Methods, Scenario-Specific Challenges, and Pathways to Practical Deployment

Xiang, Haotian; Su, Wenjing; Zong, Yi

doi:10.3390/en19081883

Open Access Systematic Review


by Haotian Xiang ¹, Wenjing Su ¹ and Yi Zong ²,*

¹ School of Electrical and Information, Wuhan Institute of Technology, Wuhan 430205, China
² Wind and Energy Systems Department, Technical University of Denmark, 4000 Roskilde, Denmark
* Author to whom correspondence should be addressed.

Energies 2026, 19(8), 1883; https://doi.org/10.3390/en19081883

Submission received: 27 February 2026 / Revised: 7 April 2026 / Accepted: 9 April 2026 / Published: 13 April 2026

(This article belongs to the Section A1: Smart Grids and Microgrids)


Abstract

Non-intrusive load monitoring (NILM), as a key technology for decomposing power loads by analyzing aggregate electrical signals, holds significant importance for advancing refined energy management and achieving carbon peaking and carbon neutrality goals. This paper systematically reviews the technical processes of event-based and state-based NILM methods. It focuses on analyzing key technical challenges in typical application scenarios, such as real-time feedback, energy efficiency optimization, and demand response. These challenges include balancing high real-time performance with accuracy, leveraging edge computing while ensuring privacy protection, and addressing issues like unknown load identification and user behavior modeling. Furthermore, this paper discusses cross-cutting challenges related to data quality, algorithm transferability, system integration, and cost. This review aims to provide a systematic, scenario-based analytical framework to facilitate the transition of NILM from theoretical research to practical application, offering insights for subsequent technological development and engineering implementation.

Keywords: non-intrusive load monitoring; real-time feedback; energy efficiency optimization; demand response

1. Introduction

Against the strategic backdrop of the global energy transition and the Dual Carbon Goals, achieving the fine-grained management and efficient utilization of energy on the user side has become a crucial component in building a new-type power system [1,2,3]. Non-intrusive load monitoring (NILM) technology can identify the operational states and decompose the energy consumption of individual electrical appliances merely by analyzing the voltage and current signals at the main power entry point.

The core task of an NILM system is to accurately disaggregate diverse electrical loads from the aggregated signal. To systematically describe and handle these loads, ref. [4] classified loads into four types: on/off, finite-state, continuously variable, and always-on. On/off appliances have only two states, “on” and “off”; finite-state appliances operate in a limited number of discrete modes; continuously variable appliances exhibit power consumption that changes smoothly and continuously over a certain range; and always-on appliances theoretically run 24/7, with nearly constant or minimally fluctuating power.

Existing NILM approaches can be broadly divided into two categories: event-based methods and state-based methods. The former starts by detecting abrupt changes in the aggregated load and relies on feature engineering and conventional classification models; the latter is characterized by end-to-end learning, directly mapping continuous data to load states [5,6].

As these approaches have matured, the widespread deployment of smart meters, rapid advances in computing power, and the deep integration of artificial intelligence have gradually moved NILM from early theoretical exploration and algorithm validation toward piloting and application in real-world scenarios. NILM has demonstrated significant potential in diverse fields such as residential electricity analysis [7], commercial building energy conservation [8], industrial load monitoring [9], and grid-interactive response [10]. However, as application scenarios continue to expand and deepen, NILM systems still face numerous challenges in terms of real-time performance [11], accuracy [12], interpretability [13], privacy protection [14], and cross-scenario generalization [15], which constrain their large-scale deployment and commercial adoption.

Currently, a large body of research has focused on fundamental NILM algorithms—including event detection, feature extraction, load identification, and disaggregation—and considerable progress has been made. Nevertheless, systematic reviews of NILM in practical application scenarios remain relatively scarce, especially regarding key technical requirements across different scenarios, the applicability of existing solutions, and common unresolved challenges. A clear research roadmap and practical guidance have yet to be established.

To highlight the distinction between this review and existing works, Figure 1 classifies representative NILM surveys published in recent years into four categories and summarizes the core idea of each category.

Unlike the 11 reviews above, which center on algorithms, datasets, or generic system architectures, this paper adopts a scenario-driven analytical framework. It examines NILM technology through three typical application scenarios (real-time feedback, energy efficiency optimization, and demand response) and identifies the core technical tensions in each scenario (e.g., the trade-off between real-time performance and accuracy, unknown load identification, and user acceptance personalization). On this basis, we construct a three-dimensional “scenario–challenge–solution” mapping and systematically organize differentiated technical pathways ranging from edge–cloud collaboration and model compression to user profiling. This review addresses the lack of scenario-oriented guidance for engineering deployment in the NILM field. For the first time, it incorporates a human-centric perspective (satisfaction quantification, implicit behavior inference) together with hardware deployment (TinyML, NPU) into a unified analytical framework, thereby providing an actionable technical roadmap for researchers and practitioners targeting different application goals.

Before delving into the technical workflows, it is necessary to clarify the inherent challenges that load identification faces in practical applications, as these fundamentally constrain NILM system performance. Specifically, these challenges include: (i) load diversity: appliances can be categorized into on/off, finite-state, continuously variable, and always-on types [4]; (ii) multiple operating modes and power fluctuations: the same appliance may operate in multiple states with varying power levels, and transient fluctuations can obscure event boundaries; (iii) concurrent appliance events: multiple devices switching simultaneously produce overlapping electrical signatures, increasing disaggregation difficulty; (iv) unknown or newly added appliances: deployed systems must handle device types not seen during training; and (v) cross-household generalization: due to differences in appliance brands, usage habits, and installation environments, a model that performs well in one household often suffers significant performance degradation when transferred to another. These challenges are addressed in the relevant sections: multi-state appliances and low-power fluctuations are covered in Section 4.1.3 and Section 4.2.2 (feature extraction); the limitations of concurrent events are analyzed in Section 4.3 (Technical Summary and Discussion); unknown appliance identification is discussed in Section 5.2.2; and cross-household transferability is addressed in Section 5.4.2.

To provide a structured roadmap, the remainder of this paper is organized as follows. Section 2 outlines the methodology used for literature retrieval and analysis. Section 3 traces the historical evolution of NILM, from its conceptual origins to contemporary intelligent paradigms, establishing the context for subsequent technical discussions. Building on this foundation, Section 4 systematically dissects the two dominant technical workflows—event-based and state-based methods—detailing their data acquisition, feature extraction, and load disaggregation processes, along with representative algorithms and a comparative assessment of their respective strengths and limitations. This technical groundwork then serves as a prism through which Section 5 examines NILM’s deployment across three critical real-world scenarios: real-time feedback, energy efficiency optimization, and demand response. For each scenario, we pinpoint the specific technical bottlenecks (e.g., the real-time vs. accuracy trade-off, fine-grained state awareness, reliable closed-loop control) and evaluate how existing solutions address them, before synthesizing persistent cross-cutting challenges such as data privacy, algorithmic transferability, and system integration costs. Finally, Section 6 synthesizes these insights to propose promising future research directions, advocating for hybrid algorithmic models, distributed edge–cloud architectures, and multimodal sensing as key pathways toward large-scale practical deployment.

2. Materials and Methods

This study aims to conduct a critical analysis and synthesis of the existing literature in the field of non-intrusive load monitoring (NILM) and its application scenarios. To ensure comprehensive and representative coverage, we adopted a structured literature identification and analysis framework.

2.1. Literature Search Strategy

The relevant literature was primarily obtained from mainstream academic databases, including Web of Science, IEEE Xplore, and CNKI. The search covered all years from the inception of the NILM concept up to the submission date to capture the full evolution of the field. Core search terms included “non-intrusive load monitoring”, “NILM”, “load monitoring”, “deep learning”, “machine learning”, “application scenarios”, “demand response”, “energy efficiency”, and “real-time feedback”, using various combinations. In addition, we performed backward citation tracing from key references to supplement potentially missed studies.

2.2. Inclusion and Exclusion Criteria

To ensure systematic and representative selection, we established the following criteria:

Research type: Journal articles, conference papers, dissertations, and technical reports were included. Conference abstracts and news items were excluded. Preprints were considered only if they contained original content not yet published elsewhere and were clearly marked as preprints; if a later formal publication existed, the formal version was cited instead.
Language: Only English and Chinese publications were included.
Publication status: Preference was given to formally published articles. Preprints were included only when they provided novel insights and had not been superseded by a published version.
Content relevance: The work must be directly related to NILM technology or its typical application scenarios (real-time feedback, energy efficiency optimization, demand response) and must provide clear algorithm descriptions, experimental designs, or system implementation details.
Methodological requirements: Algorithm-oriented papers were required to employ reproducible experimental setups and be validated on public datasets or real-world measurements. Application-oriented papers were required to present technical implementation paths and effectiveness evaluations in specific scenarios.

2.3. Literature Screening and Selection Process

After initial retrieval, a two-stage screening process was applied as follows:

Relevance screening: Based on titles and abstracts, papers not directly related to NILM or that merely mentioned the topic without substantive contribution were excluded.
Full-text review: The remaining papers were read in full, with priority given to those presenting concrete application cases, system architectures, or empirical findings. Purely conceptual discussions or redundant works were excluded.

A total of 95 core references were finally selected, covering theoretical methods, applications, system implementations, and reviews.

2.4. Thematic Classification Framework

To support a scenario-driven analysis, we developed a multi-dimensional classification framework. Each selected publication was categorized according to the following dimensions:

Research type: methodological studies (new algorithms or technical improvements), application studies (validation in specific scenarios), reviews/surveys, dataset/tool releases, and system implementations.
Technical paradigm: event-based methods (explicit event detection + feature engineering), state-based methods (end-to-end deep learning), and hybrid/emerging paradigms (meta-learning, few-shot learning, federated learning, zero-shot identification, etc.).
Application scenario: real-time feedback, energy efficiency optimization, demand response, or cross-cutting challenges (privacy, transferability, cost, system integration).
Data characteristics: high-frequency (>1 kHz), medium-frequency (1 Hz–1 kHz), low-frequency (<1 Hz), or multimodal (electrical + auxiliary sensing).
Maturity level: theoretical exploration (proof-of-concept), algorithm validation (evaluation on public datasets), prototype system (lab-scale demonstration), and large-scale deployment (field trials or commercial systems).

This framework guided both the literature selection and the subsequent thematic synthesis, enabling identification of gaps between algorithmic advances and practical deployment requirements.

2.5. Distribution of Selected Literature

Figure 2 shows the annual distribution of the selected references by research type (methodology, application, review, system implementation, dataset/tool) for the period 2020–2026. Of the 95 selected references, 13 (approximately 13.7%) were published before 2020. These earlier works are deliberately retained because they not only laid the theoretical groundwork for NILM [25] and established key benchmark datasets [26,27,28], but also proposed early methods for event detection and data acquisition [29,30,31]. Importantly, they documented valuable deployment experiences from early smart meter pilot projects and long-term field trials, including hardware reliability, communication stability, and maintenance challenges. These practical insights are essential for understanding the gap between laboratory performance and real-world durability, and they inform the scenario-specific challenges analyzed in Sections 5.1 to 5.4.

Based on the classification framework, the 95 references can be categorized as summarized in Table 1. Event-based methods account for 15 papers, state-based deep learning methods for 23 papers, and hybrid/emerging paradigms for 13 papers. Reviews, datasets/tools, system implementation, and feature/clustering-based studies constitute the remaining methodological contributions. In addition, four background papers are cited in the Introduction to contextualize the broader energy policy and business landscape but are not included in the technical classification.

The PRISMA flow diagram is available in the Supplementary Materials.

3. History of Non-Intrusive Load Monitoring Development

The concept of NILM can be traced back to the 1980s. In 1986, a research team at the Massachusetts Institute of Technology (MIT) in the United States filed a patent for separating the energy consumption of individual appliances by analyzing the aggregate signal at the main electrical entry point, laying the conceptual foundation for NILM [32,33].

The seminal paper “Nonintrusive Appliance Load Monitoring” by George Hart in 1992 systematically elaborated on load models and disaggregation algorithms, formally establishing the NILM research methodology [25]. At this stage, the technology mainly relied on manually extracted features such as steady-state power changes and harmonic components for identification. Although its feasibility was theoretically verified, its accuracy and generalization ability in practical deployment were limited by algorithmic complexity and low-sampling-rate data [34].

With the widespread adoption of smart meters and improvements in computing power, research entered the data-driven phase. The release of public datasets such as REFIT [27], UK-DALE [26], and ECO [28] provided data support for algorithm training and fair comparison [35]. Researchers began to widely adopt traditional machine learning methods such as Hidden Markov Models (HMMs) [36] and Support Vector Machines (SVMs) [37], automatically learning load features from data and significantly improving identification capabilities.

Inspired by the success of deep learning in computer vision and natural language processing, NILM research underwent a further transformation. Convolutional neural networks (CNNs), recurrent neural networks (RNNs), and their variants were introduced to directly process electrical sequence data, enabling end-to-end learning from raw data to load disaggregation results [20]. In particular, the Seq2Seq (Sequence-to-Sequence) architecture combined with attention mechanisms [38] effectively addressed long-term dependency issues, achieving breakthrough improvements in accuracy in scenarios with complex overlapping loads.

In recent years, the research frontier has exhibited characteristics of multi-technology integration and practical deployment orientation:

Model Architecture Evolution: Advanced architectures such as the Transformer have been explored to better model global dependencies [39,40].

Privacy Protection Requirements: To address stringent data privacy regulations, federated learning has been systematically introduced. Studies have demonstrated that by sharing model parameters instead of raw data, high-performance collaborative model training can be achieved while protecting user privacy [41,42].

Trustworthiness and Interpretability: Explainable Artificial Intelligence (XAI) techniques have been applied to explain the decision-making basis of deep models, enhancing the credibility and user acceptance of NILM results [43].

Adaptive Learning Strategies: Reinforcement learning has begun to be used to optimize event detection and disaggregation strategies, enabling systems to adaptively adjust through interaction with the environment [18,44].

Application Scenario Expansion: NILM technology has expanded from residential energy consumption analysis to broader fields such as commercial building energy conservation [45], grid demand response [46], and fault warning [29], providing key technical support for refined energy management and the achievement of carbon peaking and carbon neutrality goals.

The development of NILM technology has progressed from theoretical conception to data-driven and then to intelligent model-driven approaches, continuously evolving toward practicality, privacy protection, and trustworthiness. In the future, with the integration of edge computing, cross-domain transfer learning, and other technologies, NILM is expected to become an indispensable intelligent sensing layer for new power systems while ensuring user privacy and data security.

4. Key Technologies of Non-Intrusive Load Monitoring

Over time, two distinct approaches have emerged in the NILM process. The first is the event-based method, which incorporates an explicit event detection step and primarily employs traditional machine learning techniques. The second is the state-based method, which directly analyzes features within the aggregate signal, predominantly utilizing deep learning approaches.

4.1. Event-Based Methods

Figure 3 illustrates the flowchart of the event-based NILM method. Event detection is its core task, aiming to identify significant change points in the aggregate power sequence that typically correspond to the switching on/off or state transitions of appliances. The feature library is constructed by modeling the electrical characteristics of appliances within a specific usage scenario, serving to resolve load identification problems. If an unsupervised clustering approach is adopted, pre-building a feature library is unnecessary; however, this often results in reduced accuracy and poorer interpretability.

4.1.1. Data Acquisition

Event-based methods rely heavily on medium-to-high-frequency data; in practice, however, data acquisition faces a fundamental trade-off between cost and accuracy. Ref. [47] pointed out that while high-sampling-rate data acquisition can extract rich transient features, it relies on specialized hardware, incurs high costs, and is difficult to deploy widely in existing smart meters. Conversely, low-sampling-rate data, while easier to acquire, limits the granularity of feature extraction, affecting the accuracy of load disaggregation.

To address this trade-off, researchers are committed to developing efficient data acquisition and processing systems. For example, the NILM Dashboard system proposed by [30] constructs an efficient, real-time data acquisition and preprocessing pipeline through high-sampling-rate multi-sensor collection, the Sinefit algorithm for data compression, NilmDB for event storage, and the Joule streaming framework, significantly enhancing event detection sensitivity and system response speed. The authors successfully applied the system to ship power monitoring, validating its reliability in complex industrial scenarios.

Therefore, for event-based methods relying on transient features, the core contradiction in their data acquisition strategy lies in balancing feature richness, system cost, and deployment feasibility. The aforementioned work demonstrates that designing intelligent acquisition–compression–processing pipelines can partially alleviate the tension between high-frequency data demands and low-cost deployment.

4.1.2. Event Detection

Event detection is a crucial step for precisely locating appliance state transitions from the aggregate power signal. The methods can be categorized mainly into threshold-based, statistical, signal processing, and machine learning-based approaches. However, with the increasing complexity of electricity usage scenarios, traditional methods face severe challenges in generalization capability, adaptability to low sampling rates, and robustness to high baseload, driving detection algorithms towards more intelligent and adaptive development.

To address the insufficient generalization and low detection accuracy of traditional event detection methods caused by appliance diversification and complex usage scenarios, ref. [48] proposed an adaptive NILM event detection algorithm based on the Bayesian Information Criterion (BIC). By setting different detection thresholds for different types of appliances, it enables the dynamic adjustment of algorithm parameters, enhancing adaptability to various appliance load characteristics.

While many methods rely on high-frequency data, data sampling rates are often low in practical applications. Ref. [49] proposed a low-complexity sliding window event detection algorithm based on variance and mean absolute deviation, specifically targeting low-sampling-rate NILM systems, significantly reducing computational overhead while maintaining detection performance. Furthermore, under low sampling rates, traditional fixed-threshold event detection methods are susceptible to noise and transient fluctuations. To address this, ref. [50] proposed a statistical anomaly detection method by calculating the min/max ratio of adjacent samples and identifying statistical outliers, effectively improving event detection accuracy in low-frequency data without requiring preset thresholds.
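A sliding-window detector of the kind described above can be sketched as follows. This is a minimal illustration rather than the algorithm of ref. [49]: the function names, the window length, and both thresholds are assumptions chosen for a noise-free toy signal, and a real deployment would need to tune them per household.

```python
from statistics import mean, pvariance


def mean_abs_dev(xs):
    """Mean absolute deviation of a window around its own mean."""
    m = mean(xs)
    return mean([abs(x - m) for x in xs])


def detect_events(power, window=4, var_thresh=400.0, mad_thresh=15.0):
    """Flag windows whose variance AND mean absolute deviation are both high.

    Consecutive flagged windows are merged into one event, reported at the
    sample with the largest step inside the merged span. All thresholds are
    illustrative assumptions, not values from the cited work.
    """
    flagged = []
    for start in range(len(power) - window + 1):
        w = power[start:start + window]
        if pvariance(w) > var_thresh and mean_abs_dev(w) > mad_thresh:
            flagged.append(start)

    events = []
    i = 0
    while i < len(flagged):
        j = i
        # extend the run while the flagged window starts are contiguous
        while j + 1 < len(flagged) and flagged[j + 1] == flagged[j] + 1:
            j += 1
        lo, hi = flagged[i], flagged[j] + window - 1
        # locate the largest sample-to-sample jump inside the merged span
        k = max(range(lo + 1, hi + 1), key=lambda t: abs(power[t] - power[t - 1]))
        events.append(k)
        i = j + 1
    return events


# Toy aggregate signal: a 1 kW appliance switches on at index 10, off at 20.
toy_power = [100] * 10 + [1100] * 10 + [100] * 10
print(detect_events(toy_power))  # [10, 20]
```

Both statistics need only a few additions per window, which is the property that makes this family of detectors attractive for low-sampling-rate, low-power hardware. Events closer together than one window length would be merged by this sketch.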

Traditional event detection methods, such as the standard chi-square goodness-of-fit (χ² GOF) test, perform well when the baseload in household aggregate power signals is low, but their performance degrades significantly when the baseload is high, leading to missed or false detections. Ref. [31] enhanced detection capability for small power changes in high-baseload environments by introducing a sliding window and voting mechanism. Additionally, by transforming the power signal to the frequency domain and using cepstrum smoothing for event detection, the method becomes insensitive to baseload variations, offering better robustness.

Traditional evaluation of NILM event detection typically uses a “tolerance interval”-based matching method, representing an event with a single point in time and using metrics like precision and recall for assessment. This method cannot measure the algorithm’s ability to capture the complete transient process of an event. Ref. [51] was the first to incorporate “detection completeness” as a core dimension into quantitative evaluation, promoting a shift in event detection research from pursuing the “detected moment” to pursuing the “reconstructed process.” This provides a more scientific evaluation tool that better aligns with the needs of downstream NILM tasks.
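The conventional tolerance-interval evaluation described above can be illustrated with a short sketch. The greedy nearest-neighbor matching rule, the function name, and the tolerance value are assumptions made for illustration; the completeness-oriented metric of ref. [51] is not reproduced here.

```python
def match_events(detected, truth, tolerance=2):
    """Score point events with a tolerance interval.

    A detection counts as a true positive if an unmatched ground-truth event
    lies within `tolerance` samples; each ground-truth event is used once.
    Returns (precision, recall).
    """
    unmatched = sorted(truth)
    tp = 0
    for d in sorted(detected):
        # closest ground-truth event that has not been matched yet
        best = min(unmatched, key=lambda t: abs(t - d), default=None)
        if best is not None and abs(best - d) <= tolerance:
            unmatched.remove(best)
            tp += 1
    fp = len(detected) - tp  # detections with no nearby true event
    fn = len(truth) - tp     # true events that were never detected
    precision = tp / (tp + fp) if detected else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return precision, recall


precision, recall = match_events(detected=[10, 21, 40], truth=[10, 20, 30])
print(precision, recall)
```

Because each event is reduced to a single timestamp, a detector that fires at the right moment but captures only part of the transient scores the same as one that captures all of it, which is the limitation the paragraph above points out.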

Event detection for NILM has undergone a clear transition from a reliance on high-sampling-rate data to the pursuit of low-complexity, high-robustness algorithms. A summary is provided in Table 2.

4.1.3. Feature Extraction

Feature extraction can be understood as a mapping $\Phi : \mathcal{X} \to \mathbb{R}^{d}$, where $\mathcal{X}$ is the space of raw aggregate signals and $d$ is the feature dimension. In event-based NILM, this mapping is explicitly constructed using handcrafted features, which can be categorized into the following types: time-domain features, frequency-domain features, waveform features, sequence features, and derived features.
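As a concrete instance of an explicitly constructed mapping with d = 3, the sketch below computes three simple time-domain features around a detected event: the steady-state change in active power, the steady-state change in reactive power, and the transient overshoot. The function name, the window length, and the choice of features are illustrative assumptions, not a feature set taken from any cited work.

```python
from statistics import mean


def event_features(p, q, event_idx, window=3):
    """Map the signal around one event to a 3-dimensional feature vector.

    p, q       : active and reactive power series (same length)
    event_idx  : index of the first sample after the switching instant
    window     : number of samples used for each steady-state average
    """
    pre = slice(event_idx - window, event_idx)
    # skip the event sample itself so the transient does not bias the average
    post = slice(event_idx + 1, event_idx + 1 + window)

    delta_p = mean(p[post]) - mean(p[pre])   # steady-state active power step
    delta_q = mean(q[post]) - mean(q[pre])   # steady-state reactive power step
    overshoot = max(p[event_idx:event_idx + 1 + window]) - mean(p[post])
    return (delta_p, delta_q, overshoot)


# Toy switch-on event with a brief inrush peak at index 4.
p = [100, 100, 100, 100, 1500, 1100, 1100, 1100]
q = [20, 20, 20, 20, 300, 220, 220, 220]
print(event_features(p, q, event_idx=4))  # (1000, 200, 400)
```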

To address the issues of insufficient recognition accuracy and robustness in traditional NILM methods when facing similar load characteristics, multi-state appliances, and modern smart home environments, ref. [52] proposed a deep learning method based on color-mapped Voltage–Current (V-I) trajectories fused with frequency-domain features. Specifically, this method embeds instantaneous power into V-I trajectories via color mapping and adaptively fuses multi-dimensional features using a channel attention mechanism, significantly improving load classification accuracy and stability.
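To make the V-I trajectory idea concrete, the following sketch rasterizes one cycle of voltage and current into a small binary image. It is a simplified stand-in: the color mapping of instantaneous power and the channel attention fusion of ref. [52] are not reproduced, and the function name and image size are assumptions.

```python
import math


def vi_image(voltage, current, size=16):
    """Rasterize one V-I cycle into a size x size binary image.

    Both signals are normalized by their peak magnitude so the trajectory
    shape, not the absolute power level, determines the image.
    """
    v_max = max(abs(v) for v in voltage) or 1.0
    i_max = max(abs(i) for i in current) or 1.0
    img = [[0] * size for _ in range(size)]
    for v, i in zip(voltage, current):
        # map [-1, 1] to a pixel index in [0, size - 1]
        col = min(size - 1, int((v / v_max + 1) / 2 * size))
        row = min(size - 1, int((i / i_max + 1) / 2 * size))
        img[row][col] = 1
    return img


n = 64
v = [325 * math.sin(2 * math.pi * k / n) for k in range(n)]
resistive = [x / 50 for x in v]                                      # in phase
shifted = [6.5 * math.cos(2 * math.pi * k / n) for k in range(n)]    # 90 degrees out of phase

for row in vi_image(v, shifted):
    print("".join("#" if px else "." for px in row))
```

In this toy case, an in-phase (resistive) load collapses to a diagonal line, while a 90-degree phase shift traces a closed loop, which is why the trajectory shape is a useful discriminative feature.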

Due to the varying usage frequencies of different appliances (e.g., washing machines used infrequently, lights used frequently), NILM datasets often suffer from a severe imbalance between majority and minority class samples, causing traditional machine learning models to perform poorly on minority appliances. Ref. [53] addressed the class imbalance problem in NILM by proposing a multi-domain feature extraction method. The authors constructed a comprehensive set of 39 features drawn from four domains (the P-Q plane, the current waveform, the V-I trajectory, and harmonic currents) to enhance discriminative power for appliance identification. This research also emphasized the necessity of high sampling rates for V-I trajectory and harmonic features.
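Harmonic current features of the kind mentioned above can be obtained from a discrete Fourier transform of one current cycle. The sketch below is a minimal illustration under stated assumptions: the input holds exactly one fundamental period, the harmonic orders are below half the number of samples, and the function name is hypothetical. It also shows why high sampling rates matter, since the h-th harmonic is only resolvable with more than 2h samples per cycle.

```python
import cmath
import math


def harmonic_magnitudes(cycle, orders=(1, 3, 5)):
    """Amplitude of selected harmonics in one full cycle of current samples."""
    n = len(cycle)
    mags = {}
    for h in orders:
        # single-bin DFT at harmonic order h
        coeff = sum(x * cmath.exp(-2j * math.pi * h * k / n)
                    for k, x in enumerate(cycle))
        mags[h] = 2 * abs(coeff) / n  # scale the bin to a sine amplitude
    return mags


# Synthetic current: 10 A fundamental plus a 2 A third harmonic.
n = 64
cycle = [10 * math.sin(2 * math.pi * k / n)
         + 2 * math.sin(2 * math.pi * 3 * k / n) for k in range(n)]
print(harmonic_magnitudes(cycle))
```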

To address the challenge of identifying low-power appliances in NILM, ref. [54] proposed a novel feature extraction method based on fractal and multifractal analysis. Traditional features have limited discriminative power in characterizing weak transient signals from low-power, nonlinear loads. Therefore, the authors extracted a hybrid feature set from appliance startup current transients, including monofractal features (e.g., fractal dimension, Hurst exponent, lacunarity) and multifractal features (e.g., singularity spectrum, Hölder exponent), capable of finely depicting signal complexity, self-similarity, and local singularity. Experiments showed that this feature set significantly improved classification performance for low-power appliances, achieving up to 98.3% recognition accuracy on an optimized deep neural network.

At the feature extraction stage, researchers have employed various strategies like steady-state and transient fusion features, multi-domain features, and fractal features, each with its own advantages and disadvantages. A summary is provided in Table 3.

4.1.4. Load Disaggregation and Identification

This stage primarily employs traditional machine learning methods, i.e., after extracting handcrafted features, models are used for classification or regression analysis. It can also be categorized into supervised and unsupervised methods based on the availability of labeled training data.

To investigate the effectiveness and practicality of traditional machine learning in low-computational-resource environments, ref. [37] proposed a low-frequency sampling-based NILM method using active and reactive power as features and an SVM classifier for complex state identification. This system achieved real-time load disaggregation on an embedded platform with 91% accuracy.

The traditional machine learning disaggregation process involves detecting significant changes in aggregate power or current, extracting steady-state electrical features from the post-event period, and then feeding them into a pre-trained classification model for appliance identification. For instance, ref. [55] systematically compared traditional classifiers like Random Forest (RF), Support Vector Machine, and K-Nearest Neighbors (KNN) for identifying five common household appliances using active power and current as core steady-state features. Their experiments showed that under this setting, the Random Forest algorithm achieved the best overall performance, with an F1-score exceeding 0.99, highlighting the advantage of ensemble learning models in handling such classification tasks.
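The final classification step of this pipeline can be illustrated with a small K-Nearest Neighbors classifier over two steady-state features. The training values below are invented for illustration and do not come from ref. [55]; the per-feature scaling rule and the function name are likewise assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train : list of ((active_power_w, current_a), label) pairs
    query : (active_power_w, current_a)
    Features are scaled by their largest training magnitude so that watts
    do not dominate amperes in the distance.
    """
    dims = range(len(query))
    scale = [max(abs(f[d]) for f, _ in train) or 1.0 for d in dims]

    def dist(a, b):
        return math.sqrt(sum(((a[d] - b[d]) / scale[d]) ** 2 for d in dims))

    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical steady-state signatures (made-up numbers).
train = [
    ((2000, 8.7), "kettle"), ((1950, 8.5), "kettle"), ((2050, 8.9), "kettle"),
    ((120, 0.60), "fridge"), ((130, 0.65), "fridge"), ((110, 0.55), "fridge"),
    ((40, 0.17), "lamp"), ((45, 0.20), "lamp"), ((38, 0.16), "lamp"),
]
print(knn_predict(train, (125, 0.62)))
```

With well-separated signatures such as these, any reasonable classifier performs well; the differences between classifiers emerge when appliances have similar steady-state power.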

Following event detection, a critical step is classifying the extracted transient features. In this context, researchers commonly use clustering methods alongside classifiers. A recent systematic study [56] compared eight clustering algorithms, including K-means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Ordering Points To Identify the Clustering Structure (OPTICS), on the real-world BLUED dataset [57]. The authors found that the density-based OPTICS algorithm performed better in scenarios with multiple appliances and an unevenly distributed feature space, particularly when combined with two-feature inputs such as active power P paired with reactive power Q or current I. The study also noted that such algorithms can maintain reliable performance at sampling frequencies no lower than 1/30 Hz, suggesting their potential for low-frequency smart meter applications.
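To show how density-based clustering groups event features without labels, the sketch below implements a basic DBSCAN over (ΔP, ΔQ) pairs. DBSCAN is used here as a simpler relative of OPTICS, not as the method the cited study found best; the data points, `eps`, and `min_pts` are invented for illustration, and the features are left unscaled for brevity.

```python
import math


def dbscan(points, eps, min_pts):
    """Basic DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)  # None means not yet visited

    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nbrs)
    return labels


# Hypothetical (delta_P in W, delta_Q in var) features of detected events.
events = [(1000, 50), (1010, 55), (990, 48), (1005, 52),
          (150, 80), (155, 82), (148, 79), (152, 81),
          (600, 300)]
print(dbscan(events, eps=30, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated event at (600, 300) is labeled as noise rather than forced into a cluster, which is one reason density-based methods suit feature spaces with uneven appliance populations.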

The load disaggregation stage in event-based methods comprises two main streams: supervised algorithms (handcrafted features + classifier/regressor) and unsupervised algorithms (clustering). Their respective advantages and disadvantages are summarized in Table 4.

4.2. State-Based Methods

Figure 4 illustrates the flowchart of the state-based NILM method. The model is pre-trained, and its network structure automatically learns high-level, discriminative features from raw power sequences. The raw aggregate signal contains various types of noise. To ensure subsequent identification accuracy, preprocessing is required, typically involving data alignment and cleaning, normalization/standardization, and the construction of sample sequences.
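Two of the preprocessing steps listed above, standardization and the construction of sample sequences, can be sketched as follows. The pairing of each aggregate window with the appliance value at the window midpoint is one common convention and is an assumption here, as are the function names; alignment and cleaning are omitted.

```python
from statistics import mean, pstdev


def standardize(series):
    """Zero-mean, unit-variance scaling of a power series."""
    mu = mean(series)
    sigma = pstdev(series) or 1.0  # guard against a constant series
    return [(x - mu) / sigma for x in series]


def make_windows(aggregate, target, length, stride=1):
    """Build (input window, label) training samples.

    aggregate : mains power series
    target    : sub-metered power of one appliance, aligned with `aggregate`
    Each label is the target value at the midpoint of its window.
    """
    samples = []
    for start in range(0, len(aggregate) - length + 1, stride):
        window = aggregate[start:start + length]
        midpoint = target[start + length // 2]
        samples.append((window, midpoint))
    return samples


aggregate = list(range(10))
target = [x * 10 for x in range(10)]
for window, label in make_windows(aggregate, target, length=5, stride=2):
    print(window, label)
```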

4.2.1. Data Acquisition

The primary advantage of state-based NILM methods at the data acquisition stage lies in significantly reducing the dependence on data sampling frequency and real-time event detection accuracy, thereby enhancing the feasibility of NILM deployment in real residential environments [5]. Higher-frequency data generally provides richer information and greater model potential [38], but it also brings a larger data volume, increased noise, and higher costs, necessitating a careful trade-off.

4.2.2. Feature Extraction

State-based methods learn the feature extraction mapping $\Phi$ implicitly through stacked nonlinear transformations.

This step is the origin of the “black box.” Deep learning models possess the ability to automatically learn and extract features. They extract implicit, data-driven, and optimized feature representations, often at the expense of interpretability.

At the feature extraction stage, ref. [58] employed a scattering transform with time-shift invariance to enhance model robustness. The scattering transform, via cascading wavelet transforms and modulus operations, extracts feature representations from high-frequency current signals that are invariant to small time shifts and deformations. Compared to convolutional neural networks that require training many parameters, this method has analytically determined filter coefficients, allowing it to maintain a superior discriminative performance even in small-sample scenarios.

To address the issue that traditional methods using only time-domain information may lose critical features and struggle to identify multi-state processes of appliances, ref. [59] proposed a feature extraction method based on wavelet decomposition. Specifically, the authors first converted the 1D aggregate power time series into 2D images via sliding windows and a repeat vector layer. Then, multi-level wavelet decomposition was applied to decompose the images, extracting time–frequency features in horizontal, vertical, and diagonal directions to capture both low-frequency steady-state components and high-frequency transient components of the signal, providing richer input representations for subsequent models.
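
To make the image-based decomposition more concrete, the sketch below stacks sliding windows of a power series into a 2D array and applies one level of a 2D Haar wavelet transform. Both choices are assumptions made for illustration: ref. [59] builds its images with sliding windows and a repeat vector layer and does not necessarily use the Haar wavelet. The names `to_image` and `haar_2d` are hypothetical, and the detail sub-bands are named by what they compute (differences between rows, between columns, and along diagonals), because the horizontal/vertical naming convention varies between libraries.

```python
def to_image(series, width):
    """Stack consecutive sliding windows (stride 1) as rows of a 2D array."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]

def haar_2d(image):
    """One level of an orthonormal 2D Haar transform on 2x2 blocks.

    Returns (approx, row_detail, col_detail, diag_detail). A trailing odd
    row or column is ignored.
    """
    rows, cols = len(image), len(image[0])
    approx, row_detail, col_detail, diag_detail = [], [], [], []
    for r in range(0, rows - 1, 2):
        bands = ([], [], [], [])
        for c in range(0, cols - 1, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            bands[0].append((a + b + d + e) / 2)  # low-pass: local average
            bands[1].append((a + b - d - e) / 2)  # change between the two rows
            bands[2].append((a - b + d - e) / 2)  # change between the two columns
            bands[3].append((a - b - d + e) / 2)  # diagonal change
        for out, band in zip((approx, row_detail, col_detail, diag_detail), bands):
            out.append(band)
    return approx, row_detail, col_detail, diag_detail

power = [0, 0, 100, 100, 100, 2100, 2100, 100, 0]   # toy aggregate series in W
image = to_image(power, width=4)
approx, row_d, col_d, diag_d = haar_2d(image)
```

Applying `haar_2d` again to `approx` would give the next decomposition level. The approximation band retains the slowly varying component, while the detail bands respond to abrupt changes such as switching transients.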

To address the inherent low information content of low-frequency data, ref. [60] proposed a novel feature enhancement method that integrates weather data and temporal information into a sequence-to-sequence model, effectively expanding the feature space for low-sampling NILM tasks and improving the model’s representational capacity under limited data conditions.

The aforementioned methods propose three distinct approaches to NILM, each with unique contributions in the feature extraction stage. A comparison of their characteristics is provided in Table 5.

4.2.3. Load Disaggregation and Identification

This step can be categorized into three types: traditional probabilistic models, deep learning models, and hybrid models. Alternatively, based on training paradigms, they can be classified into supervised, semi-supervised, and unsupervised methods.

Traditional load disaggregation models, such as combinatorial optimization and Hidden Markov Models (HMMs), face issues like high computational complexity, susceptibility to local optima, and a limited ability to extract temporal features when dealing with complex, variable load data. To address this, ref. [61] proposed a load disaggregation model incorporating an attention mechanism. Specifically, the authors used a Bidirectional LSTM (BiLSTM) to capture temporal dependencies and introduced channel attention and spatial attention to enhance feature extraction capability, allowing the model to dynamically focus on key information and improve its expressive power through the attention mechanism.

Existing unsupervised NILM methods often suffer from limited disaggregation accuracy due to inaccurate state identification when handling complex appliances with multiple operating modes. To address this problem, ref. [62] proposed a hybrid HDBSCAN-FHMM framework. This framework employs the density-based Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) clustering algorithm to identify, in an unsupervised and adaptive manner, multiple fine-grained operating states of appliances, including low-frequency or transient states. Subsequently, these precisely identified states are used as prior knowledge to initialize an improved Factorial Hidden Markov Model (FHMM), thereby significantly enhancing the model’s ability to characterize appliance dynamics and concurrent operations. Experiments show that this method significantly outperforms several baseline models on the AMPds dataset [63].

Most current NILM research focuses on residential loads, while industrial loads, due to their complex equipment, large power fluctuations, and long operation times, are difficult to address directly with existing methods. Ref. [64] proposed an effective disaggregation method targeting industrial load characteristics. The authors applied a Factorial Hidden Markov Model (FHMM) to industrial load disaggregation, modeling multiple industrial devices with a multi-chain HMM and using the Viterbi algorithm for state estimation to separate individual device power from the aggregate load.
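
The idea of a multi-chain HMM decoded with the Viterbi algorithm can be illustrated with a small example. The sketch below is not the implementation of ref. [64]. It assumes two devices with two states each, a Gaussian observation model around the sum of the device powers, and exact Viterbi decoding over the joint state space, which is only tractable for a handful of chains. All numerical values and the name `fhmm_viterbi` are hypothetical.

```python
import itertools
import math

def fhmm_viterbi(aggregate, devices, sigma=20.0):
    """Decode per-device power from an aggregate series.

    devices: list of dicts with 'power' (W per state), 'start' (initial
    probabilities) and 'trans' (row-stochastic transition matrix).
    """
    joint = list(itertools.product(*[range(len(d["power"])) for d in devices]))

    def log_start(s):
        return sum(math.log(d["start"][k]) for d, k in zip(devices, s))

    def log_trans(s, t):
        # Chains evolve independently, so joint log-probabilities add up.
        return sum(math.log(d["trans"][i][j]) for d, i, j in zip(devices, s, t))

    def log_emit(s, y):
        mu = sum(d["power"][k] for d, k in zip(devices, s))
        return -0.5 * ((y - mu) / sigma) ** 2   # Gaussian, up to a constant

    score = {s: log_start(s) + log_emit(s, aggregate[0]) for s in joint}
    back = []
    for y in aggregate[1:]:
        prev, score, ptr = score, {}, {}
        for t in joint:
            best = max(joint, key=lambda s: prev[s] + log_trans(s, t))
            score[t] = prev[best] + log_trans(best, t) + log_emit(t, y)
            ptr[t] = best
        back.append(ptr)

    state = max(joint, key=score.get)
    path = [state]
    for ptr in reversed(back):                  # backtrack the best path
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [[d["power"][k] for d, k in zip(devices, s)] for s in path]

sticky = [[0.9, 0.1], [0.1, 0.9]]
devices = [
    {"power": [0, 100], "start": [0.5, 0.5], "trans": sticky},    # small motor
    {"power": [0, 2000], "start": [0.5, 0.5], "trans": sticky},   # heater
]
estimate = fhmm_viterbi([0, 100, 2100, 2000, 0], devices)
```

Each row of `estimate` holds the power attributed to each device at one time step. Because the joint state space grows exponentially with the number of devices, practical FHMM implementations usually rely on approximate inference.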

The aforementioned methods are all state-based load disaggregation approaches employing different technical routes, each with its own advantages and disadvantages, summarized in Table 6.

4.3. Technical Summary and Discussion

This section classifies NILM methods as event-based or state-based, according to whether the process includes an explicit state/event detection step, and compares the advantages and disadvantages of the two paradigms (Table 7). Future development trends are also discussed.

While the event-based and state-based dichotomy provides a useful organizing framework, recent advances have introduced methodologies that transcend this classification. Meta-learning approaches, such as those proposed by [15,65], enable NILM models to rapidly adapt to new appliances or households using only a few labeled samples. These methods learn a general similarity function across tasks, effectively “learning to learn” load patterns, which addresses the long-standing challenge of cross-household generalization without requiring extensive retraining.

Zero-shot load identification represents another frontier, where models are trained to recognize appliances not seen during training by leveraging semantic attributes or external knowledge bases [65]. This capability is particularly valuable for handling unknown appliances and evolving load profiles in real-world deployments.

From a mathematical perspective, feature extraction in modern NILM can be formalized as a mapping $\Phi : X \to F$, where $X$ is the space of raw aggregate signals (e.g., voltage, current, power) and $F$ is a feature space. In event-based methods, $\Phi$ is explicitly constructed through handcrafted features such as harmonic content, V-I trajectory shape descriptors, or fractal dimensions. In state-based deep learning models, $\Phi$ is implicitly learned via stacked nonlinear transformations: $\Phi(x) = \sigma_{L}(W_{L} \dots \sigma_{1}(W_{1} x + b_{1}) \dots + b_{L})$, where $\sigma_{i}$ are activation functions and $\{W_{i}, b_{i}\}$ are parameters optimized for the disaggregation objective. This formulation highlights the fundamental difference: event-based methods rely on human-engineered, interpretable features, whereas state-based methods learn features that are task-optimized but often less interpretable.
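
The learned mapping can be written out directly as a composition of affine maps and activation functions. The sketch below evaluates such a composition for fixed, hand-picked parameters. In a real model the weights and biases would be obtained by training, and the layer sizes, values, and function names here are purely illustrative.

```python
import math

def relu(v):
    return [max(0.0, z) for z in v]

def tanh(v):
    return [math.tanh(z) for z in v]

def affine(W, b, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def phi(x, layers):
    """Apply the layers in order: x -> sigma_1(W_1 x + b_1) -> ... -> sigma_L(...)."""
    h = x
    for W, b, sigma in layers:
        h = sigma(affine(W, b, h))
    return h

layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu),   # W_1, b_1, sigma_1
    ([[1.0, 1.0]], [0.0], tanh),                      # W_2, b_2, sigma_2
]
features = phi([3.0, 1.0], layers)
```

For the input `[3.0, 1.0]`, the first layer yields `[2.0, 1.0]` and the second layer yields a single feature equal to tanh(3).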

Hybrid architectures that combine explicit feature engineering with learned representations—such as the attention-enhanced BiLSTM in [61]—offer a promising path to balance interpretability and performance.

Based on the systematic comparison of the advantages and disadvantages of event-based and state-based methods, it is evident that current NILM technology still has room for improvement in both practicality and accuracy. Future development trends will primarily focus on two major directions: reducing data acquisition costs and integrating hybrid models.

Transition from high-frequency to low-frequency data acquisition. Several studies have achieved reliable event detection under low-sampling-rate conditions, for example with the low-complexity sliding window algorithm of ref. [49] and the statistical anomaly detection algorithm of ref. [50]. State-based methods reduce the dependence on data sampling frequency further, significantly enhancing the feasibility of NILM deployment in real residential environments [5].
Evolution from single architectures to hybrid architectures. For example, ref. [62] proposed an HDBSCAN-FHMM hybrid architecture that first identifies multiple appliance states in an unsupervised manner via density clustering and then uses them to initialize an improved FHMM, enhancing its ability to characterize appliance dynamics; ref. [61] combined a BiLSTM with attention mechanisms to strengthen temporal dependency modeling and the focus on key features. These hybrid approaches aim to leverage the strengths of multiple models to improve practicality and accuracy.

5. Application Scenarios and Practical Challenges

Before proceeding to the scenario-specific analyses, it is useful to summarize the primary application domains of non-intrusive load monitoring (NILM): residential, commercial, and industrial. Residential settings typically exhibit high load diversity, encompassing on/off, finite-state, and variable-power appliances. User engagement is moderate, with potential to change consumption behavior through energy feedback, and the main value comes from voluntary energy saving or time-of-use pricing. Commercial buildings feature more concentrated load types, dominated by HVAC and lighting systems. Energy efficiency decisions are driven by facility managers, resulting in lower user engagement, but regulatory drivers are stronger, requiring compliance with building energy codes and carbon reduction mandates. Industrial environments have even more concentrated load types, dominated by dedicated high-power equipment with complex operational patterns. User engagement is very low, as control is primarily automated, while regulatory requirements are stringent, including demand response obligations and power quality standards. The following three subsections examine representative scenarios that cut across these domains: real-time feedback (Section 5.1), energy efficiency optimization (Section 5.2), and demand response (Section 5.3). Section 5.4 then addresses the cross-cutting challenges common to all domains. See Figure 5 for more details.

5.1. Real-Time Feedback

Real-time feedback involves providing near-real-time electricity usage details to users (residents/enterprises) to promote behavior change.

5.1.1. High Real-Time Performance and High Accuracy

The value of feedback decays rapidly over time, ideally requiring event detection and identification to be completed within seconds/minutes. However, high-accuracy NILM algorithms, especially those based on deep learning, require a data window of a certain length for analysis and are computationally expensive, which makes extremely low latency requirements difficult to meet. Therefore, achieving a balance between lightweight fast algorithms and complex high-precision algorithms is the core challenge in this scenario.

To balance high real-time performance and high accuracy, recent research has shifted from pure algorithm optimization to system-level architecture design. For example, ref. [66] proposed an event-driven NILM system that performs appliance identification using transient active power responses sampled at 100 Hz. Their CNN classifier has only 54,000 parameters and enables real-time power estimation without latency, significantly reducing computational and storage overhead. A complementary direction is edge–cloud collaboration, which distributes the workload between local devices and cloud servers. Ref. [11] addressed the issues of high latency and high computational costs in traditional cloud-based NILM systems by proposing an innovative three-tier edge–cloud collaborative framework. The core idea of this framework is that “edge guarantees real-time, cloud guarantees accuracy.” Specifically, a lightweight eXtreme Gradient Boosting (XGBoost) model is deployed on the resource-constrained edge for real-time data preprocessing and preliminary disaggregation, responding to user requests with minimal latency. Simultaneously, a deep learning Seq2Point model integrating CNN and Transformer is deployed in the cloud, responsible for complex high-precision load disaggregation. Furthermore, their deployment solution integrating Gunicorn and NGINX effectively addressed load balancing issues under high concurrency, significantly improving system stability and throughput.

The proposed edge–cloud collaborative architecture is designed to address the economic and latency barriers associated with transmitting large volumes of raw data. In this architecture, edge devices perform initial preprocessing, feature extraction, and lightweight inference, transmitting only compressed features, detection results, or cases requiring further analysis to the cloud. For instance, ref. [30] demonstrated that the Sinefit algorithm can compress high-frequency voltage and current waveforms into a compact set of parameters, significantly reducing data volume. Similarly, ref. [11] reported that by deploying lightweight XGBoost models at the edge, only disaggregation results and low-frequency summaries are uploaded to the cloud, eliminating the need for continuous high-frequency data streaming. In this setup, the cloud serves as a complementary resource to assist the edge when complex appliances or ambiguous cases exceed the edge’s processing capacity, rather than as the primary processing unit. This layered approach balances on-device efficiency with cloud-assisted capability, avoiding prohibitive communication costs while maintaining high accuracy.
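
The division of labor described above can be sketched as a simple routing rule: the edge answers immediately when its lightweight model is confident and otherwise forwards only a compact summary to the cloud. Everything in this sketch is a hypothetical stand-in. The toy `edge_model`, the confidence values, the threshold, and the summary fields are assumptions for illustration and do not reproduce the frameworks of refs. [11] or [30].

```python
from statistics import mean

CONF_THRESHOLD = 0.8   # assumed escalation threshold, not taken from the cited works

def edge_model(window):
    """Toy stand-in for a lightweight edge classifier: returns (label, confidence)."""
    step = window[-1] - window[0]
    if abs(step) < 50:
        return "no_event", 0.95
    if 1800 <= abs(step) <= 2200:
        return "kettle", 0.9
    return "unknown", 0.4

def summarize(window):
    """Compact payload that is uploaded instead of the raw samples."""
    return {"mean": mean(window), "min": min(window),
            "max": max(window), "n": len(window)}

def route(window, cloud_queue):
    """Answer on the edge when confident, otherwise escalate a summary to the cloud."""
    label, confidence = edge_model(window)
    if confidence < CONF_THRESHOLD:
        cloud_queue.append(summarize(window))
        return label, "escalated"
    return label, "edge"

cloud_queue = []
first = route([100, 100, 2100], cloud_queue)    # confident edge decision
second = route([100, 700], cloud_queue)         # ambiguous step, sent to the cloud
```

Only the second window adds an entry to `cloud_queue`, so the uplink carries a few summary values rather than a continuous high-frequency stream.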

5.1.2. High-Frequency Data and Edge Computing

Real-time feedback often requires high-sampling-rate data [67,68], which imposes significant pressure and costs on smart meter hardware, data communication networks, and cloud storage. To reduce latency and communication burden, algorithms need to run on edge devices such as meters or home gateway nodes, but these are constrained by the computational power, memory, and power consumption of edge devices.

In practice, resolving the contradiction between high-frequency data requirements and limited edge computing resources often requires a systematic dimensionality reduction design from data acquisition to model deployment. The one-year embedded NILM field study by [69] provides an exemplary case. First, at the data source, they proactively reduced the sampling rate to 1/8 Hz, fundamentally reducing the data stream to be processed. Second, at the algorithm level, they adopted a lightweight 1D CNN suitable for extracting patterns from long-term low-frequency data and optimized only for a few major appliances to control computational complexity. Finally, in engineering implementation, they used the X-CUBE-AI toolchain to quantize the model to 8 bits and compile it into a dedicated C code library that runs efficiently on an Arm Cortex-M7 microcontroller (MCU). Ref. [70] proposed a feature space pruning and lightweight inference framework to address the resource bottleneck caused by high sampling rates in NILM. Through mean-reduced precision analysis, they filtered out key time-domain and frequency-domain features, reducing feature dimensions from 71 to five while maintaining classification accuracy (95.15%), achieving 80.56% storage compression and a 5.45-fold inference speedup, significantly alleviating resource pressure on edge MCUs.
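
To show what 8-bit quantization does to model weights, the sketch below applies symmetric per-tensor quantization to a short weight list. This is a generic textbook scheme used for illustration. It is not the procedure implemented by the X-CUBE-AI toolchain, and the function names and sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q,
    with integer q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floating-point weights from the integers."""
    return [qi * scale for qi in q]

weights = [0.4, -1.0, 0.26, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored in one byte instead of four (for 32-bit floats), and the reconstruction error is bounded by half the quantization step `scale`.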

The approach by Mari et al. reduces data complexity at the problem definition stage, fundamentally avoiding the high-frequency processing conflict, whereas the method by Tabanelli et al. accepts high-frequency data but extracts only the most critical feature subset through careful screening to achieve efficient representation. Ref. [71] does not avoid high-frequency data either; instead, it distills high-density information from complex features through multi-feature fusion and model co-design for use by lightweight models. Specifically, they proposed Fusion-ResNet, a lightweight multi-label classification model based on PCA-ICA feature fusion. By fusing features extracted from Principal Component Analysis (PCA) and Independent Component Analysis (ICA), this model maintains the discriminative power of high-frequency data while significantly reducing the number of parameters to only 65,000. Experiments show that even under high-load scenarios with up to 15 appliances running simultaneously, the model maintains high identification accuracy. Ref. [72] systematically proposed a lightweight U-Net disaggregation model incorporating multi-output structures, depthwise separable convolution compression, and channel attention mechanisms, significantly reducing model parameters and computational complexity while maintaining good disaggregation accuracy. Research indicates that combining structured pruning and quantization techniques can greatly optimize models for edge deployment. For example, experiments by [73] showed that applying OTOv3 pruning followed by 8-bit dynamic quantization to the ConvNeXt Small model reduced model size by 89.7% and parameters and multiply-accumulate operations (MACs) by 95%, while accuracy even improved by 3.8%. Ref. [74] proposed a performance-aware NILM model optimization framework that uses quantization and pruning techniques to achieve efficient inference on edge devices, reducing storage requirements by up to 75% and computational complexity by an average of 36.3%.

Edge AI (artificial intelligence) deployment optimization is a multi-level problem. At the algorithm level, researchers reduce computational demands by designing lightweight models, while at the architecture level, ref. [75] leveraged specialized hardware and intelligent compilers to maximize hardware execution efficiency. A Neural Processing Unit (NPU) co-designed with a compiler can transform given computational resources into higher actual throughput, providing a more powerful foundational execution platform for various upper-layer lightweight models. Deep optimization for specific hardware platforms is a key approach to improving edge AI efficiency. For example, the research by [76] demonstrated how to achieve hardware-aware model optimization through the Qualcomm QNN SDK, leveraging NPU parallelism, low-precision computation, and memory subsystem characteristics to achieve significant on-device inference acceleration for multimodal large models.

5.1.3. Effectiveness and Understandability of Feedback Information

Simply listing all appliance switching events has little meaning for users and may instead cause interference. Therefore, transforming load disaggregation results into intuitive, actionable advice and choosing effective delivery timing and methods are key factors influencing the effectiveness of user behavior change.

Traditional NILM models often present disaggregation results as event lists or power curves. This “data dump”-style feedback often makes it difficult for users to understand its actual meaning and even harder to translate into specific energy-saving behaviors. Ref. [13] proposed a multi-target NILM model based on a conditioned fully convolutional denoising autoencoder. This model not only improves load disaggregation accuracy but also significantly enhances model interpretability by introducing a feature-wise linear modulation mechanism and latent space visualization (UMAP). Users can interactively obtain corresponding load disaggregation results by specifying target appliances, while the model internally visualizes the clustering distribution of different appliances in the latent space via Uniform Manifold Approximation and Projection (UMAP), allowing users to intuitively understand the characteristic differences between different energy consumption patterns. Users can also observe smooth transitions between different appliance load patterns by adjusting conditional inputs, thereby understanding the internal logic of energy consumption changes.

Unsupervised NILM can decompose appliance operating states, but its output typically lacks intuitive physical appliance names, constituting the first barrier to user understanding. Directly presenting users with a list of “switching events for Device A, B, C” not only fails to generate effective cognition but may even cause information interference. Therefore, transforming anonymous load trajectories into named appliance events is the primary step in building an effective user feedback system. Addressing this challenge, recent research by [77,78] proposed an automatic appliance labeling framework based on hierarchical decision-making and rough set theory. This method can automatically assign physical names such as “refrigerator” or “air conditioner” to unknown appliances identified by unsupervised NILM by mining universal operational characteristics of appliances. Additionally, the temporal distribution characteristics and pattern correlation characteristics of appliances extracted by such methods also provide data support for optimizing the timing and content organization of feedback delivery.

5.2. Energy Efficiency Optimization

Based on NILM data, we can identify inefficient links and provide automated or suggested optimization strategies.

5.2.1. Load State Identification

Energy efficiency optimization requires not only knowing which appliance is running but also understanding its operating mode, load state, and working efficiency. This requires building more detailed power consumption models for each appliance, posing higher demands on algorithms and requiring large amounts of labeled data for training.

Accurate load state identification is the first step for implementing energy efficiency optimization. With the development of deep learning, models based on attention mechanisms have significantly improved identification accuracy. For example, ref. [79] proposed a parallel multiscale attention mechanism model that simultaneously captures local mutations and global periodic patterns in load power sequences, achieving an F1-score above 0.9 for identifying the operation of multi-state appliances like washing machines and refrigerators on public datasets. Simultaneously, ref. [38] proposed a sequence-to-sequence model integrating attention mechanisms and Transformer, capable of accurately identifying the real-time on/off states of common household appliances such as refrigerators, dishwashers, and washing machines, achieving an overall F1-score of 92.96% on the UK-DALE dataset [26].

To achieve deep energy efficiency optimization, models need to perceive the internal operating states and modes of appliances. The multi-agent NILM architecture proposed by [80] provides a feasible technical path. Their system can not only detect appliance switching events but also distinguish different operating levels of the same appliance and characterize its electrical characteristics during operation by analyzing shape features of V-I trajectories and power envelope types of startup transients.

5.2.2. Unknown Appliance Identification

NILM can only monitor appliances connected through the main circuit. However, some appliances, such as old or faulty ones, may not be modeled or identified, leading to incomplete optimization plans. Furthermore, when new appliances are introduced into a household, the system needs to detect “unknown” patterns and be capable of online learning or labeling through user interaction, which imposes extremely high requirements on algorithm robustness and adaptability.

To address the unknown load identification problem in NILM, ref. [81] proposed a method based on feature fusion and trainable Siamese networks. Their method determines whether a load is unknown by setting a similarity threshold and supports dynamic updates of the feature library and online fine-tuning of the model, enabling effective detection and learning of unmodeled appliances, thereby improving system robustness and adaptability. Ref. [65] proposed the RTNILM framework, which combines metric learning with model-agnostic meta-learning to simultaneously achieve the high-precision detection of unknown appliances, strong robustness to grid noise, and rapid cross-domain adaptation requiring only a few samples within a single model.
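
The thresholding logic behind unknown-load detection can be sketched independently of the network that produces the features. The example below assumes that each load is already represented by a feature vector and uses plain cosine similarity in place of a trained Siamese network. The library contents, the threshold of 0.95, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def identify(feature, library, threshold=0.95):
    """Return (label, similarity); label is None when the load is treated as unknown."""
    best_label, best_sim = None, -1.0
    for label, reference in library.items():
        sim = cosine(feature, reference)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return (best_label if best_sim >= threshold else None), best_sim

def register(library, label, feature):
    """Dynamic update of the feature library with a newly labeled load."""
    library[label] = list(feature)

library = {"kettle": [1.0, 0.0, 0.1], "fridge": [0.2, 0.9, 0.3]}
known, _ = identify([0.98, 0.02, 0.12], library)       # close to the kettle entry
unknown, _ = identify([0.1, 0.1, 1.0], library)        # matches nothing well
register(library, "new_load", [0.1, 0.1, 1.0])
relabeled, _ = identify([0.1, 0.1, 1.0], library)
```

After `register` is called, the same feature vector is recognized as `new_load`, which mirrors the library update step described above. Online fine-tuning of the embedding model is outside the scope of this sketch.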

Most existing methods require prior knowledge of the number of appliances and power characteristics or rely on appliance-level labeled data, making them difficult to apply in practice. Ref. [82] analyzed two scenarios in which a new appliance appears: in the first, the new appliance significantly changes the total power value while the shape of the power sequence remains similar; in the second, it significantly changes the shape of the total power sequence while the power values remain similar. The authors proposed a new appliance detection method based on power sequence similarity. Specifically, they used a sliding window to partition historical power data into a “template set” and a “reference set.” Dynamic Time Warping (DTW) was used to calculate power similarity and shape similarity, constructing a similarity matrix, and the presence of new appliances was determined by setting similarity thresholds. This method does not rely on the number of appliances, power characteristics, or appliance-level labels and is suitable for low-frequency sampling data common in smart meters.
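
The core of such a similarity test is the DTW distance between a new power window and stored templates. The sketch below implements the classic dynamic-programming form of DTW with an absolute-difference cost and flags a window as new when it is far from every template. It is a simplified illustration: the separate power and shape similarities and the similarity matrix of ref. [82] are not reproduced, and the templates and threshold are hypothetical.

```python
def dtw(a, b):
    """Dynamic Time Warping distance with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def is_new_pattern(window, templates, threshold):
    """Flag the window if its DTW distance to every template exceeds the threshold."""
    return min(dtw(window, t) for t in templates) > threshold

templates = [[0, 100, 100, 0], [0, 0, 2000, 2000, 0]]   # known patterns in W
stretched = [0, 100, 100, 100, 0]     # known pattern, only longer in time
unseen = [0, 600, 600, 0]             # power level not seen before
```

Here `dtw(templates[0], stretched)` is zero because DTW absorbs the difference in duration, whereas `unseen` stays at a distance of 1000 from the nearest template and is flagged for any threshold below that value.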

Researchers have introduced unsupervised algorithms to identify unknown appliances. However, unsupervised NILM systems often misclassify old, faulty, or new unmodeled appliances as known ones due to improper initial clustering threshold settings, leading to incomplete optimization plans. To improve system detection and adaptation capabilities for unknown patterns, ref. [78] proposed a post-processing framework based on density clustering analysis. This method does not rely on preset appliance models. Instead, it automatically identifies and separates unknown appliance samples that were mistakenly merged into known appliance clusters by evaluating density consistency and overlap analysis of initial unsupervised NILM clustering results. Furthermore, by selecting high-confidence samples as typical patterns and using label propagation to correct sample labels, the system can improve clustering purity and provide a reliable data foundation for subsequent online learning without manual labeling.

5.2.3. Optimization Strategies

This section focuses on how to transform identified energy efficiency problems into specific, safe, and user-acceptable automated control instructions or maintenance recommendations. However, changing the operating strategy of one appliance may affect other appliances or user comfort. Therefore, the system needs to possess certain “causal analysis” capabilities to avoid negative experiences from optimization measures.

To make the system better understand the human factors behind energy consumption and possess the ability to perform causal trade-offs among multiple objectives, ref. [83] first constructed a probabilistic model of user electricity consumption behavior based on non-intrusive load disaggregation data, quantitatively defining “habitual usage intervals” and “tolerance intervals.” Subsequently, they innovatively embedded the user satisfaction quantification model together with electricity economy and low-carbon objectives into the optimization engine of a home energy management system. Within this framework, any load scheduling instructions generated by the system are not driven solely by single-point cost optimization but are pre-evaluated for the “measure–impact” causal chain. This approach, which treats user preferences as core constraints for multi-objective collaborative optimization, represents a key direction evolving from “rigid control” to “flexible guidance” and from “single-point optimization” to “system-level causal balance.”

5.3. Demand Response

Demand response (DR) refers to the automatic or assisted adjustment of user-side loads based on grid signals (electricity prices, emergency events) for peak shaving and valley filling.

5.3.1. Flexible Load Identification

NILM must accurately identify which loads are flexible (e.g., air conditioners, water heaters) and assess their current and future adjustable capacity. Moreover, during a demand response (DR) event, it is necessary to accurately estimate what the user’s electricity consumption would have been without response measures to calculate the actual response effect. Historical patterns and real-time data from NILM are crucial for baseline prediction but are susceptible to various interfering factors.

Ref. [21] systematically pointed out in their review that flexible load identification based on NILM can be divided into two stages: first, using NILM to disaggregate the electricity consumption patterns of shiftable appliances, and then quantifying the load’s adjustment capability by mining characteristics such as earliest and latest start times.

Ref. [84] proposed an air conditioning load demand response potential assessment method based on non-intrusive load monitoring and Bayesian Physics-Informed Neural Networks (BPINN). This method extracts air conditioner operating states via NILM and combines them with an equivalent thermal parameter model and BPINN for parameter identification and uncertainty quantification, achieving accurate assessment of DR potential without relying on indoor temperature sensors.
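
The role of the thermal model can be illustrated with a forward simulation. The sketch below assumes a first-order equivalent thermal parameter model in cooling mode, C dT_in/dt = (T_out - T_in)/R - s Q, where s is the on/off state obtained from NILM. The first-order form, the explicit Euler discretization, and all parameter values are assumptions for illustration. The Bayesian identification of R, C, and Q in ref. [84] is not reproduced here.

```python
def simulate_etp(t_in0, t_out, on_states, R, C, Q, dt):
    """Simulate indoor temperature with a first-order ETP model (cooling mode).

    t_in0: initial indoor temperature (deg C)
    t_out: outdoor temperature per step (deg C)
    on_states: air conditioner state per step from NILM (1 = on, 0 = off)
    R: thermal resistance (deg C/kW), C: thermal capacity (kWh/deg C)
    Q: cooling power (kW), dt: step length (h)
    """
    temps = [t_in0]
    for s, outdoor in zip(on_states, t_out):
        t = temps[-1]
        # Heat gain through the envelope minus heat removed by the unit.
        temps.append(t + dt / C * ((outdoor - t) / R - s * Q))
    return temps

# Hypothetical parameters: two steps with the unit on, then two steps off.
temps = simulate_etp(26.0, [35.0] * 4, [1, 1, 0, 0],
                     R=4.0, C=2.0, Q=3.0, dt=0.25)
```

With these values the indoor temperature falls while the unit runs and rises again once it is switched off. Given identified parameters, the same recursion indicates how long the unit could be curtailed before a comfort limit is reached, which is one way to express DR potential.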

5.3.2. Control Reliability and Safety

After the system issues instructions such as “turn off the air conditioner” or “delay starting the washing machine,” it must reliably verify whether the instructions were executed. This requires NILM to have high-confidence real-time monitoring capabilities, forming a closed loop with the control module. Furthermore, automatic control must strictly adhere to the safety boundaries of appliances and users. This requires NILM to not only identify appliances but also perceive whether their current operating state is approaching boundaries.
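
A minimal form of this closed-loop check compares the power level shortly before and after a command. The sketch below works directly on the aggregate signal and accepts a turn-off command as executed when the observed drop reaches most of the expected appliance power. The window length, tolerance, and function name are hypothetical, and a deployed system would more likely rely on the disaggregated appliance state, since a coincident switch-off of another load can mimic the expected drop.

```python
from statistics import mean

def command_executed(aggregate, cmd_index, expected_drop, window=3, tolerance=0.3):
    """Return True if aggregate power fell by roughly expected_drop after the command.

    aggregate: power samples (W); cmd_index: sample index at which the
    turn-off command was issued; expected_drop: appliance power (W).
    """
    before = aggregate[max(0, cmd_index - window):cmd_index]
    after = aggregate[cmd_index + 1:cmd_index + 1 + window]
    if not before or not after:
        return False                      # not enough samples to decide
    drop = mean(before) - mean(after)
    return drop >= (1 - tolerance) * expected_drop

executed = command_executed([1500, 1520, 1510, 1500, 520, 500, 510],
                            cmd_index=3, expected_drop=1000)
ignored = command_executed([1500, 1520, 1510, 1500, 1490, 1505, 1500],
                           cmd_index=3, expected_drop=1000)
```

In the first series the aggregate drops by about 1000 W and the command is confirmed. In the second it stays near 1500 W, so the controller would need to retry or raise an alert.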

Defining the electricity safety boundaries on the user side is a prerequisite for implementing safety-constrained demand response. This requires understanding the normal and abnormal electrical characteristics of equipment under different operating conditions. Ref. [85] used NILM data to quantify the load factor, duty cycle, and event interval characteristics of key ship equipment under normal and fault conditions. These electrical signatures and state-correlated data extracted from real environments can be directly used to build more accurate equipment health models and safe operating boundaries.

5.3.3. User Acceptance and Personalization

Users need to comprehend the rationale and mechanisms behind the control actions. While feedback based on NILM can demonstrate the effects of control, overly aggressive or opaque control strategies are likely to trigger user aversion. Moreover, different users have different preferences for comfort, sensitivity to electricity prices, and at-home time patterns. Ideal demand response strategies should be personalized, requiring deep integration of NILM data with user behavior patterns to form user profiles.

Traditional demand response models overemphasize economy, neglecting the impact of changes in electricity usage habits on users’ living comfort. Existing research often fails to fully incorporate actual user electricity consumption data during modeling, resulting in scheduling outcomes disconnected from real user habits. Ref. [86] categorized appliances into three types: fixed (e.g., refrigerator), shiftable (e.g., washing machine), and adjustable (e.g., air conditioner), establishing a mathematical model for each. They combined user comfort and economic models to form a comprehensive satisfaction index. The authors formulated an optimization problem that maximizes this index subject to constraints such as power balance, usage duration, and power fluctuations, and then developed a chaotic map-based adaptive annealing particle swarm optimization algorithm to avoid the local optimum problem of traditional algorithms.

Based on the fine-grained load data obtained through NILM decomposition, ref. [87] used questionnaires to deconstruct user acceptance into four quantifiable dimensions: price sensitivity, policy sensitivity, interaction willingness, and green electricity awareness. Each user obtains a comprehensive “behavioral awareness” score and sub-scores for the four dimensions. Subsequently, a multivariate DBSCAN clustering algorithm automatically groups users with similar characteristics. The authors selected the three most representative labels as clustering dimensions (interaction willingness, load matching degree, and adjustable potential), which serve as the x, y, and z axes. Clustering generated eight user groups, labeled A (highest quality) to H (lowest quality). Group A represents high-willingness, high-potential, high-matching resources, while Group H consists of low-quality users who can be ignored.
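To make the clustering step concrete, the sketch below implements plain DBSCAN in pure Python and applies it to a few hypothetical users described by three scores in [0, 1]. The user scores, `eps`, and `min_pts` are invented for illustration, and this is not the exact multivariate variant or parameterization used in ref. [87].

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise, may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_pts:
                queue.extend(nearby)  # core point: keep growing the cluster
    return labels


# Hypothetical users: (interaction willingness, load matching, adjustable potential).
users = [
    (0.90, 0.90, 0.85), (0.85, 0.95, 0.90), (0.95, 0.88, 0.92),  # engaged users
    (0.10, 0.15, 0.10), (0.12, 0.10, 0.20), (0.20, 0.10, 0.15),  # disengaged users
    (0.50, 0.90, 0.10),                                          # isolated profile
]
labels = dbscan(users, eps=0.2, min_pts=3)
print(labels)
```

Because DBSCAN needs no preset number of clusters, the number of user groups follows from the data density and the chosen `eps` and `min_pts`.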

It is important to note that while explicit questionnaires can provide precise user preference data, they may introduce a form of informational intrusion that differs from the hardware-level non-intrusiveness of NILM. To mitigate this, user profiling can be implemented through implicit behavior inference using historical NILM data. For example, usage patterns—such as preferred appliance operation windows and tolerance for load shifting—can be extracted from long-term disaggregated load curves without requiring active user surveys [86]. The questionnaire-based approach described in [87] represents one extreme of explicit data collection; in practice, a hybrid approach that combines implicit inference with minimal explicit input (e.g., simple opt-in preferences) can achieve personalization while preserving user autonomy and minimizing perceived intrusion.
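A minimal sketch of such implicit inference, assuming that NILM has already produced a list of detected start hours for one appliance: the preferred operation windows are simply the most frequent start hours. The function name and the sample history are hypothetical.

```python
from collections import Counter


def preferred_start_hours(start_hours, top_k=2):
    """Return the top_k most frequent start hours with their share of all runs."""
    counts = Counter(start_hours)
    total = len(start_hours)
    return [(hour, n / total) for hour, n in counts.most_common(top_k)]


# Hypothetical washing machine start hours recovered from disaggregated data.
washer_starts = [19, 20, 19, 7, 19, 20, 19, 21]
windows = preferred_start_hours(washer_starts)
print(windows)
```

A wide spread of observed start hours could additionally serve as a rough proxy for tolerance to load shifting, although that interpretation would need validation against actual user feedback.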

5.4. Cross-Cutting Challenges

5.4.1. Data Quality and Privacy

Training and validating NILM algorithms require large amounts of data containing both total power and individual sub-circuit/appliance power. Collecting such data is costly and raises privacy concerns, and appliance combinations vary widely across households. Additionally, real-world grids contain noise and load coupling that degrade accuracy; models therefore need to be robust.

Traditional federated learning, while not sharing raw data, still exposes gradients that attackers could exploit. Addressing the privacy risk of gradient leakage in federated learning, ref. [14] employed secure multi-party computation for aggregation and used a distributed Markov switching topology to randomly select participating nodes, reducing communication complexity and enhancing system robustness against poisoning attacks.
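The principle behind aggregating updates without revealing any single one can be sketched with pairwise additive masking, a common building block of secure aggregation. This is a simplified stand-in and not the protocol of ref. [14]: updates are assumed to be quantized to integers, and one seeded random generator simulates the secrets that each pair of nodes would normally agree on privately.

```python
import random

MODULUS = 2 ** 32  # all arithmetic is done modulo this value


def mask_updates(updates, seed=0):
    """Add pairwise masks to integer update vectors so that the masks cancel in the sum."""
    rng = random.Random(seed)  # simulates pairwise shared secrets (assumption)
    n, dim = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masked[i][k] = (masked[i][k] + mask[k]) % MODULUS  # node i adds
                masked[j][k] = (masked[j][k] - mask[k]) % MODULUS  # node j subtracts
    return masked


def aggregate(masked):
    """Sum the masked vectors; every pairwise mask cancels out."""
    return [sum(column) % MODULUS for column in zip(*masked)]


# Hypothetical quantized gradient updates from three households.
updates = [[3, 10], [5, 20], [7, 30]]
total = aggregate(mask_updates(updates))
print(total)
```

The aggregator sees only masked vectors, yet recovers the exact sum needed for federated averaging.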

In federated learning, while gradient exchange avoids raw data leakage, there is a lack of mechanisms to protect data contributors’ rights, which discourages data holders from participating. Ref. [88] proposed a blockchain-based data rights confirmation mechanism for federated learning. Smart contracts record contributions, confirm rights, and automatically distribute benefits, enabling participants to receive economic incentives without privacy infringement, thereby promoting the sustainable utilization of private data in federated learning.

To address privacy concerns in cross-household NILM model training, ref. [89] proposed a peer-to-peer network-based federated learning framework. Specifically, the authors eliminated the central server, allowing nodes to communicate directly and exchange model updates, which reduces privacy leakage risks. Nodes only accept model updates from “trusted neighbors,” screening updates through a dual mechanism of similarity scoring and model accuracy checks to reject malicious or low-quality updates. The authors also used a blind RSA + Jaccard similarity protocol to calculate data similarity between nodes without exposing raw data and designed defense mechanisms against adversarial attacks, including model poisoning, membership inference attacks, and private set intersection (PSI) inversion attacks.
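The dual screening rule can be sketched as follows. For clarity, the Jaccard similarity is computed here on plain sets, whereas ref. [89] computes it under a blind RSA protocol so that the sets are never exposed. The choice of appliance label sets as the compared data and both thresholds are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def accept_update(local_items, neighbor_items, neighbor_accuracy,
                  min_similarity=0.5, min_accuracy=0.8):
    """Accept a neighbor's update only if both screening checks pass."""
    similar_enough = jaccard(local_items, neighbor_items) >= min_similarity
    accurate_enough = neighbor_accuracy >= min_accuracy
    return similar_enough and accurate_enough


# Hypothetical appliance sets of two households.
local = {"kettle", "fridge", "washer"}
neighbor = {"kettle", "fridge", "tv"}
print(jaccard(local, neighbor))
print(accept_update(local, neighbor, neighbor_accuracy=0.9))
print(accept_update(local, neighbor, neighbor_accuracy=0.6))
```

In this sketch, the neighbor's accuracy would be measured by the receiving node on its own validation data before the update is merged.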

5.4.2. Algorithm Transferability

A model that performs well in one household often suffers significant performance degradation when applied directly to another household with different appliance brands, models, and usage habits. Models therefore need to be transferable. Moreover, appliances age and households acquire new appliances, so algorithms need online learning capabilities to adapt to new load patterns without complete retraining.

Through systematic experiments, ref. [90] found that traditional “domain biases,” such as grid noise and frequency differences, have insignificant impacts on model transfer. Instead, performance degradation mainly stems from inherent feature differences of appliances themselves, e.g., the same type of appliance from different brands or models may have vastly different load profiles. Their research indicates that even within the same appliance category, if its power characteristic curve differs significantly from training data, the model still struggles to identify it accurately. Therefore, NILM systems need online learning capabilities to adapt to new load patterns.

Experiments by [91] showed that even within the same region and time period, model performance in unseen households is significantly lower than in seen households, especially for human-driven appliances like electric kettles. The authors further attempted to improve generalization by grouping training using household metadata, finding that grouping by the number of occupants improved disaggregation accuracy for kettles but had limited effect on stable appliances like refrigerators. Recently, researchers have introduced frameworks like meta-learning. For example, ref. [15] proposed a few-shot transfer model based on meta-learning and relation networks. During training, this model does not aim to memorize specific appliance patterns but learns a general similarity comparison function through massive few-shot tasks. When deployed to a new household, only a few samples of any appliance from that household are needed for the model to perform identification via internal sample similarity. This approach bypasses explicit modeling of behavioral habit differences across households, implicitly learning generalization strategies directly from data, thus providing a potential solution to the cross-household performance degradation problem highlighted by Srivastava et al., especially for behavior-related appliance identification challenges.
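The deployment step of such a similarity-based model can be sketched as matching a query sample against a handful of labeled support samples from the new household. In the sketch below, cosine similarity is a fixed stand-in for the learned relation function of ref. [15], and the two-dimensional features (active and reactive power) and their values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def identify(query, support):
    """Return the label whose support samples are most similar to the query.

    support maps each appliance label to a few labeled feature vectors.
    """
    def best_score(label):
        return max(cosine(query, sample) for sample in support[label])

    return max(support, key=best_score)


# A few labeled samples from the new household: (active power W, reactive power var).
support = {
    "kettle": [[2000.0, 50.0], [2100.0, 40.0]],
    "fridge": [[120.0, 90.0], [110.0, 100.0]],
}
predicted = identify([1950.0, 60.0], support)
print(predicted)
```

A learned relation module would replace `cosine` with a small network trained across many few-shot tasks, so that the comparison itself adapts to appliance signatures.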

Traditional deep learning-based NILM models often suffer significant performance degradation in cross-household and cross-appliance transfer tasks due to data distribution differences and feature specificity. To address this challenge, ref. [92] proposed an Adaptive Fusion Feature Transfer Learning (AFF-TL) method. Its core lies in introducing a convolutional block attention module for the cross-household challenge and an appliance transfer task for the cross-appliance challenge to enable knowledge reuse. The convolutional block attention module automatically enhances common features between source and target households while suppressing features unique to the source household. Experiments show that fine-tuning only the feature adaptation module and output layer with 20–80% of the target household’s labeled data allows the model to effectively adapt to the new environment. The appliance transfer task method assumes that load features of different appliances can be decomposed into common and individual parts. Only a small amount of new appliance data is needed to retrain the feature adaptation module and output module to complete model construction.

To enable load identification models trained on a source domain to quickly adapt to a target domain using very few labeled samples, overcoming domain shift, ref. [93] proposed a Weighted Transferable Random Forest method. Based on Random Forest, which inherently offers good load identification performance and low model complexity, they introduced a transfer learning mechanism to perform targeted updates on the pre-trained forest rather than retraining. Specifically, the method does not update all trees in the forest but randomly selects a subset to improve using the Improved Structure Extension and Reduction (ISER) algorithm, while keeping the rest unchanged. This balances computational efficiency and adaptability. During final prediction, higher weights are assigned to new data from the target domain, while lower weights are assigned to retained old data from the source domain, forming a weighted ensemble prediction to enhance discriminative ability in the new domain.
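The final weighted ensemble step can be sketched as a weighted vote. The sketch assumes one possible reading of the weighting scheme, namely that each tree carries a weight depending on whether it was updated with target-domain data; the weights and predictions are invented, and the ISER update itself is not shown.

```python
from collections import defaultdict


def weighted_vote(tree_predictions, tree_weights):
    """Return the label with the largest total weight across all trees."""
    scores = defaultdict(float)
    for label, weight in zip(tree_predictions, tree_weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Three unchanged source-domain trees (weight 0.5) and two updated trees (weight 1.0).
predictions = ["fridge", "fridge", "fridge", "kettle", "kettle"]
weights = [0.5, 0.5, 0.5, 1.0, 1.0]
decision = weighted_vote(predictions, weights)
print(decision)
```

An unweighted majority vote would return "fridge" here, which shows how the weighting lets the adapted trees dominate in the new domain.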

5.4.3. System Integration and Cost

NILM rarely operates independently; it needs to integrate with home energy management systems, smart meters, IoT platforms, and grid communication protocols, constituting a complex systems engineering challenge. Therefore, the final solution must strike a balance between hardware costs (sampling rate requirements), computational costs (cloud/edge), communication costs, algorithm development/maintenance costs, and the value they bring (energy savings, grid incentives, user satisfaction).

Ref. [94] built a low-cost, real-time, privacy-preserving NILM system. The measurement device was the relatively inexpensive PZEM-004T module, which measures key parameters such as voltage, current, and power. The controller was the cost-effective ESP32, which integrates Wi-Fi and Bluetooth and has a rich open-source ecosystem. Because the entire system is built on a widely used IoT development board, it needs no expensive custom chips or industrial-grade data acquisition cards. The learning model was an optimized lightweight Random Forest that the authors deployed directly on the resource-limited ESP32, which removes the need for external AI acceleration chips or paid cloud AI services. After data acquisition, inference and classification are performed immediately on the device, eliminating the network latency of uploading data to the cloud, waiting for cloud computation, and receiving results back. Pruning reduced the model size from 365.6 KB to 81.3 KB, and inference time decreased from 310 ms to 71 ms. Local deployment also keeps raw measurements on the device, which protects user privacy.
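The relative gains implied by the figures reported for ref. [94] can be worked out directly. The variable names below are illustrative; the four input values are the ones quoted above.

```python
size_before_kb, size_after_kb = 365.6, 81.3
latency_before_ms, latency_after_ms = 310, 71

# Fraction of the model size removed by pruning.
size_reduction = 1 - size_after_kb / size_before_kb
# Ratio of inference time before pruning to inference time after pruning.
speedup = latency_before_ms / latency_after_ms

print(f"size reduction: {size_reduction:.1%}")  # about 77.8%
print(f"inference speedup: {speedup:.2f}x")     # about 4.37x
```

In other words, the pruned model is less than a quarter of its original size and runs more than four times faster.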

5.5. Summary of Application Scenarios

In summary, the three application scenarios impose differentiated requirements on NILM, driving technological evolution along three main threads: architecture, accuracy, and deployment. Real-time feedback pursues response speed at the system level, energy efficiency optimization focuses on perception granularity at the model level, and demand response extends to user interaction at the control level. Meanwhile, privacy, transferability, and cost issues are fundamental challenges that permeate all scenarios. Table 8 presents a comparison of the core technical solutions for the different application scenarios.

6. Future Outlook

After decades of development, non-intrusive load monitoring (NILM) technology has achieved significant breakthroughs at the algorithmic level. However, to make the leap from laboratory research to large-scale commercial application, NILM systems must address challenges related to robustness, real-time performance, privacy security, and user experience in real-world scenarios, all while keeping costs under control. NILM technology is expected to mature and become practical by advancing along three main pathways: the reconstruction of algorithmic paradigms, the evolution of system architecture, and the expansion of sensing dimensions.

Algorithmic Paradigm Reconstruction: Event-based and state-based methods have long developed in parallel, often viewed as opposing approaches. Event-based methods offer clear physical interpretability but rely on accurate transient detection, making it difficult to handle scenarios with multiple devices operating concurrently. State-based methods excel at end-to-end learning but suffer from weak interpretability. For commercial applications, pursuing the ultimate performance of a single approach often entails high computational costs and deployment risks. Therefore, integrating the strengths of both methods to build a highly robust hybrid system is an effective path to balance accuracy and cost.
System Architecture Evolution: Much current NILM research relies on cloud-centric processing, which faces three major challenges in commercial promotion: first, the communication cost and bandwidth pressure from transmitting massive amounts of high-frequency data; second, security concerns arising from uploading private user data; and third, the difficulty of meeting the low-latency requirements of real-time feedback with cloud processing alone. Consequently, constructing a collaborative cloud–device–edge–end system (encompassing device-side collection, edge processing, and cloud optimization) and introducing privacy-preserving computation mechanisms is a viable pathway for the large-scale application of NILM.
Sensing Dimension Expansion: Commercial applications demand more from NILM than simply knowing which device is running. They require understanding the operational state of the device, predicting when maintenance is needed, and even inferring the user’s true intentions to provide personalized recommendations. Relying solely on voltage and current signals can be ambiguous in complex electricity usage scenarios, where similar power changes might correspond to completely different device states or user behaviors. To address this, future NILM systems will selectively integrate low-cost, non-intrusive auxiliary signals—such as vibration sensors or acoustic signatures—to resolve ambiguities and enhance functional capabilities in real-world deployments. It is important to clarify that such cross-modal perception is proposed as a scenario-specific enhancement rather than a universal requirement; in most settings, single-point electrical signals remain sufficient for core disaggregation tasks. This targeted expansion preserves the non-intrusive and cost-effective nature of NILM while extending its utility toward higher-level functions like predictive maintenance and user-centric optimization.

In summary, by pursuing hybrid intelligence in algorithms, cloud–device–edge–end synergy in architecture, and multimodal fusion in sensing, NILM technology is poised to overcome its current bottlenecks in accuracy, cost, privacy, and robustness. Once NILM can be integrated stably, safely, and affordably into smart home ecosystems and new-type power systems, its role as the “sensing nerve” of refined user-side energy management should open up broad commercial prospects.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/en19081883/s1.

Author Contributions

Conceptualization, H.X. and Y.Z.; methodology, H.X.; investigation, H.X.; writing—original draft preparation, H.X.; writing—review and editing, W.S. and Y.Z.; supervision, W.S.; project administration, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created in this study. Data supporting reported results can be found in the cited references.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
BiLSTM	Bidirectional Long Short-Term Memory
BPINN	Bayesian Physics-Informed Neural Network
CNN	Convolutional Neural Network
DL	Deep Learning
DR	Demand Response
DTW	Dynamic Time Warping
FHMM	Factorial Hidden Markov Model
HDBSCAN	Hierarchical Density-Based Spatial Clustering of Applications with Noise
HMM	Hidden Markov Model
ICA	Independent Component Analysis
IoT	Internet of Things
LSTM	Long Short-Term Memory
MCU	Microcontroller Unit
ML	Machine Learning
NILM	Non-Intrusive Load Monitoring
NPU	Neural Processing Unit
PCA	Principal Component Analysis
RF	Random Forest
RNN	Recurrent Neural Network
Seq2Point	Sequence-to-Point
Seq2Seq	Sequence-to-Sequence
SVM	Support Vector Machine
Transformer	Transformer Neural Network Architecture
UMAP	Uniform Manifold Approximation and Projection
V-I	Voltage–Current
XAI	Explainable Artificial Intelligence
XGBoost	eXtreme Gradient Boosting

References

Wu, L.; Chen, C.; Hu, J.; Wang, C.; Tong, Y. User Side Resource Application and Key Technologies for Flexibility Demand of Renewable Energy Power System. Power Syst. Technol. 2024, 48, 1435–1450.
Li, Y.; Dai, Q.; Yang, F.; Sun, D.; Zhang, X. PEST-SWOT Analysis of New Power System under “Dual Carbon” Goal. Sichuan Electr. Power Technol. 2024, 47, 35–42.
Huang, L.; Ren, Y.; Zhou, G.; Lu, X. Design and Application of the Carbon Generalized System of the Preferences for Energy Optimization for Power Customers. Power Syst. Clean Energy 2024, 40, 70–79. Available online: https://kns.cnki.net/kcms2/article/abstract?v=mysMdCyU6hLWQM2O2e-xiCckUN5lzHRtDWSKjO-llewDB1l7XDXDFcq8TB6ThO2AB9dZrcrYARHZ0_RJKHGp7gRkqHUvtrzBbDSSOY2sE-BsThxQ1wzsiO6KVIK1lXbBudVQRwwWkYXVwHfKQcKAh0ZeXooroGeycnCnRwmzYpZnc-Vh1HX-6Q==&uniplatform=NZKPT&language=CHS (accessed on 8 April 2026).
Al-Khadher, O.; Mukhtaruddin, A.; Ridzuan Hashim, F.; Azizan, M.M.; Mamat, H. An Implementation Framework Overview of Non-Intrusive Load Monitoring. J. Sustain. Dev. Energy Water Environ. Syst. 2023, 11, 1110471.
Rafiq, H.; Manandhar, P.; Rodriguez-Ubinas, E.; Ahmed Qureshi, O.; Palpanas, T. A Review of Current Methods and Challenges of Advanced Deep Learning-Based Non-Intrusive Load Monitoring (NILM) in Residential Context. Energy Build. 2024, 305, 113890.
Breyer, J.; Koerhuis, J.; Alizai, M.H.; Wehrle, K. Practical Insights from Implementing Event-Based NILM Systems. In Proceedings of the 16th ACM International Conference on Future and Sustainable Energy Systems, Rotterdam, The Netherlands, 17–20 June 2025; pp. 751–756.
Amorim, A.N. Data Augmentation for Non-Intrusive Load Monitoring: A Review and Empirical Study. Master’s Thesis, Universidade Federal de Campina Grande, Campina Grande, Brazil, 2025.
Gopinath, R.; Kumar, M. DeepEdge-NILM: A Case Study of Non-Intrusive Load Monitoring Edge Device in Commercial Building. Energy Build. 2023, 294, 113226.
Günter, J.; Fabri, L.; Wenninger, S.; Kaymakci, C. Developing a Smart Energy Service Canvas: Fostering Sustainable Manufacturing. Schmalenbach J. Bus. Res. 2025, 77, 95–125.
Yang, N.; Wang, Y.; Zhang, Y.; Yuan, D. Non-Intrusive Load Monitoring for Energy Management in Smart Grids Incorporating EVs. IEEE Trans. Consum. Electron. 2025, 71, 1696–1706.
Xue, J.; Zhang, Y.; Wang, X.; Wang, Y.; Tang, G. Towards Real-world Deployment of NILM Systems: Challenges and Practices. In Proceedings of the 2024 IEEE International Conference on Sustainable Computing and Communications (SustainCom), Kaifeng, China, 30 October–2 November 2024; pp. 16–23.
Shabbir, N.; Vassiljeva, K.; Nourollahi Hokmabad, H.; Husev, O.; Petlenkov, E.; Belikov, J. Comparative Analysis of Machine Learning Techniques for Non-Intrusive Load Monitoring. Electronics 2024, 13, 1420.
García, D.; Pérez, D.; Papapetrou, P.; Díaz, I.; Cuadrado, A.A.; Enguita, J.M.; Domínguez, M. Conditioned Fully Convolutional Denoising Autoencoder for Multi-Target NILM. Neural Comput. Appl. 2025, 37, 10491–10505.
Dong, Y.; Wang, Y.; Gama, M.; Mustafa, M.A.; Deconinck, G.; Huang, X. Privacy-Preserving Distributed Learning for Residential Short-Term Load Forecasting. arXiv 2024, arXiv:2402.01546.
Ding, D.; Li, J.; Wang, H.; Wang, K. Load Recognition With Few-Shot Transfer Learning Based on Meta-Learning and Relational Network in Non-Intrusive Load Monitoring. IEEE Trans. Smart Grid 2024, 15, 4861–4876.
Jaime, H.C.; De Souza, A.D.; Santos Machado, R.C.; Gomes, O.D.S.M. Machine Learning Models for Non-Intrusive Load Monitoring: A Systematic Review and Meta-Analysis. Inventions 2026, 11, 29.
Huzzat, A.; Khwaja, A.S.; Alnoman, A.A.; Adhikari, B.; Anpalagan, A.; Woungang, I. A Survey of Traditional and Emerging Deep Learning Techniques for Non-Intrusive Load Monitoring. AI 2025, 6, 213.
Ravi, D.L. Non-Intrusive Load Monitoring (NILM): A Comprehensive Review. Int. J. Innov. Res. Technol. (IJIRT) 2024, 11.
Cruz-Rangel, D.; Ocampo-Martinez, C.; Diaz-Rozo, J. Online Non-Intrusive Load Monitoring: A Review. Energy Nexus 2025, 17, 100348.
Bousbiat, H.; Himeur, Y.; Varlamis, I.; Bensaali, F.; Amira, A. Neural Load Disaggregation: Meta-Analysis, Federated Learning and Beyond. Energies 2023, 16, 991.
Liu, Y.; Wang, Y.; Ma, J. Non-Intrusive Load Monitoring in Smart Grids: A Comprehensive Review. arXiv 2024, arXiv:2403.06474.
Papageorgiou, P.G.; Christoforidis, G.C.; Bouhouras, A.S. NILM in High Frequency Domain: A Critical Review on Recent Trends and Practical Challenges. Renew. Sustain. Energy Rev. 2025, 213, 115497.
El Husseini, F.; Noura, H.N.; Salman, O.; Chahine, K. Machine Learning in Smart Buildings: A Review of Methods, Challenges, and Future Trends. Appl. Sci. 2025, 15, 7682.
Silva, M.D.; Liu, Q. A Review of NILM Applications with Machine Learning Approaches. CMC 2024, 79, 2971–2989.
Hart, G. Nonintrusive Appliance Load Monitoring. Proc. IEEE 1992, 80, 1870–1891.
Kelly, J.; Knottenbelt, W. The UK-DALE Dataset, Domestic Appliance-Level Electricity Demand and Whole-House Demand from Five UK Homes. Sci. Data 2015, 2, 150007.
Murray, D.; Stankovic, L.; Stankovic, V. An Electrical Load Measurements Dataset of United Kingdom Households from a Two-Year Longitudinal Study. Sci. Data 2017, 4, 160122.
Beckel, C.; Kleiminger, W.; Cicchetti, R.; Staake, T.; Santini, S. The ECO Data Set and the Performance of Non-Intrusive Load Monitoring Algorithms. In Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings, Memphis, TN, USA, 3–6 November 2014; pp. 80–89.
Kane, T.J. The NILM Dashboard: Watchstanding and Real-Time Fault Detection Using Non-Intrusive Load Monitoring. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2019.
Aboulian, A.; Green, D.H.; Switzer, J.F.; Kane, T.J.; Bredariol, G.V.; Lindahl, P.; Donnal, J.S.; Leeb, S.B. NILM Dashboard: A Power System Monitor for Electromechanical Equipment Diagnostics. IEEE Trans. Ind. Inform. 2019, 15, 1405–1414.
Xie, Z. Study of Event Detection Methods for Non-intrusive Load Monitoring. Master’s Thesis, North China Electric Power University, Beijing, China, 2019.
Reeg, C.E. Nonintrusive Load Monitoring for Verification and Diagnostics. Ph.D. Thesis, University of Illinois at Urbana-Champaign, Champaign, IL, USA, 2011.
Morello, A. Sviluppo di una Metodologia di “Smart Monitoring” di Tipo Non Intrusivo per Carichi Elettrici Tramite l’Utilizzo di Intelligenza Artificiale. Ph.D. Thesis, Università degli Studi di Padova, Padova, Italy, 2022.
Jacopo, A. Non-Intrusive Load Monitoring: Use of Low Resolution Steady State Features to Disaggregate Household Appliances. Master’s Thesis, University of Padova, Padua, Italy, 2020.
Held, P. Frequency Invariant Transformation of Periodic Signals for Non-Intrusive Load Monitoring. Ph.D. Thesis, Université de Haute Alsace-Mulhouse, Mulhouse, France, 2019.
Kong, W.; Dong, Z.Y.; Hill, D.J.; Ma, J.; Zhao, J.; Luo, F. A hierarchical hidden Markov model framework for home appliance modeling. IEEE Trans. Smart Grid 2016, 9, 3079–3090.
Leksono, E.; Mandhany, A.; Nashirul Haq, I.; Pradipta, J.; Handre Kertha Utama, P.; Fauzi Iskandar, R.; Mahesa Nanda, R. Development of Non-Intrusive Load Monitoring of Electricity Load Classification with Low-Frequency Sampling Based on Support Vector Machine. J. Eng. Technol. Sci. 2023, 55, 109–119.
Irani Azad, M.; Rajabi, R.; Estebsari, A. Nonintrusive Load Monitoring (NILM) Using a Deep Learning Model with a Transformer-Based Attention Mechanism and Temporal Pooling. Electronics 2024, 13, 407.
Wang, L.; Mao, S.; Nelms, R.M. Transformer for Nonintrusive Load Monitoring: Complexity Reduction and Transferability. IEEE Internet Things J. 2022, 9, 18987–18997.
Petralia, A.; Charpentier, P.; Kadhi, Y.; Palpanas, T. NILMFormer: Non-Intrusive Load Monitoring That Accounts for Non-Stationarity. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, Toronto, ON, Canada, 3–7 August 2025; pp. 4761–4772.
Zhang, Y.; Tang, G.; Huang, Q.; Wang, Y.; Wang, X.; Lou, J. FedNILM: Applying Federated Learning to NILM Applications at the Edge. arXiv 2021, arXiv:2106.07751.
Wang, H.; Si, C.; Liu, G.; Zhao, J.; Wen, F.; Xue, Y. Fed-NILM: A Federated Learning-based Non-intrusive Load Monitoring Method for Privacy-protection. Energy Convers. Econ. 2022, 3, 51–60.
Machlev, R.; Malka, A.; Perl, M.; Levron, Y.; Belikov, J. Explaining the Decisions of Deep Learning Models for Load Disaggregation (NILM) Based on XAI. In Proceedings of the 2022 IEEE Power & Energy Society General Meeting (PESGM), Denver, CO, USA, 17–21 July 2022; pp. 1–5.
Etezadifar, M. Non-Intrusive Load Monitoring: Event Detection Using Reinforcement Learning; Ecole Polytechnique: Montreal, QC, Canada, 2023.
Kulathilaka, M.J.S.; Saravanan, S.; Kumarasiri, H.D.H.P.; Logeeshan, V.; Kumarawadu, S.; Wanigasekara, C. NILM for Commercial Buildings: Deep Neural Networks Tackling Nonlinear and Multi-Phase Loads. Energies 2024, 17, 3802.
Çimen, H.; Bazmohammadi, N.; Lashab, A.; Terriche, Y.; Vasquez, J.C.; Guerrero, J.M. An Online Energy Management System for AC/DC Residential Microgrids Supported by Non-Intrusive Load Monitoring. Appl. Energy 2022, 307, 118136.
Revuelta Herrero, J.; Lozano Murciego, Á.; López Barriuso, A.; Hernández De La Iglesia, D.; Villarrubia González, G.; Corchado Rodríguez, J.M.; Carreira, R. Non Intrusive Load Monitoring (NILM): A State of the Art. In Trends in Cyber-Physical Multi-Agent Systems. The PAAMS Collection—15th International Conference, PAAMS 2017; De La Prieta, F., Vale, Z., Antunes, L., Pinto, T., Campbell, A.T., Julián, V., Neves, A.J.R., Moreno, M.N., Eds.; Springer International Publishing: Cham, Switzerland, 2018; Volume 619, pp. 125–138.
Li, Z.; Wang, G.; Zhang, T.; Zheng, Y.; Wang, J.; Xiao, Q. BIC-Based NILM Event Detection Algorithm. In Proceedings of the 2025 IEEE 3rd International Conference on Power Science and Technology (ICPST), Kunming, China, 16–18 May 2025; pp. 1079–1083.
Rehman, A.U.; Lie, T.T.; Valles, B.; Tito, S.R. Event-Detection Algorithms for Low Sampling Nonintrusive Load Monitoring Systems Based on Low Complexity Statistical Features. IEEE Trans. Instrum. Meas. 2020, 69, 751–759.
Azizi, E.; Beheshti, M.T.H.; Bolouki, S. Event Matching Classification Method for Non-Intrusive Load Monitoring. Sustainability 2021, 13, 693.
Liu, Z.H. Research on Multi-Scale Adaptive Load Event Detection Method for NILM. Master’s Thesis, Tianjin University, Tianjin, China, 2025.
Li, X.; Chen, Y.; Jia, X.; Shen, F.; Sun, B.; He, S.; Guo, J. AI-Enhanced Non-Intrusive Load Monitoring for Smart Home Energy Optimization and User-Centric Interaction. Informatics 2025, 12, 55.
Li, Y.; Wang, H.; Yang, Z.; Garcia Marquez, F.P.; Chen, Z.; Yang, J.; Li, Y. A Feature Engineering-Based NILM Framework for Appliance Recognition Considering Data Class Imbalance. IEEJ Trans. Electr. Electron. Eng. 2024, 19, 2012–2023.
Mughees, A.; Kamran, M.; Mughees, N.; Mughees, A.; Ejsmont, K. New Appliance Signatures for NILM Based on Mono-Fractal Features and Multi-Fractal Formalism. IEEE Access 2024, 12, 108986–109000.
Nuran, A.S.; Murti, M.A.; Suratman, F.Y. Non-Intrusive Load Monitoring Method for Appliance Identification Using Random Forest Algorithm. In Proceedings of the 2023 IEEE 13th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 8–11 March 2023; pp. 0754–0758.
Etezadifar, M.; Karimi, H.; Mahseredjian, J. Non-Intrusive Load Monitoring: Comparative Analysis of Transient State Clustering Methods. Electr. Power Syst. Res. 2023, 223, 109644.
Filip, A. Blued: A fully labeled public dataset for event-based nonintrusive load monitoring research. In Proceedings of the 2nd Workshop on Data Mining Applications in Sustainability (SustKDD), Toulouse, France, 29 August–2 September 2011; Volume 2012, p. 5.
Aguiar, E.L.; Lazzaretti, A.E.; Pipa, D.R. Features Extraction and Selection with the Scattering Transform for Electrical Load Classification. Learn. Nonlinear Model. 2023, 21, 19–35.
Liu, Z.; Zhang, L.; Zhang, L.; Ji, T. Wavelet decomposition-bi-directional long-short term memory neural network with prob-sparse attention for non-intrusive load monitoring. CSEE J. Power Energy Syst. 2023.
Li, X.; Wang, Y.; Gao, Y.; Jie, B. Federated Sequence-to-Sequence Learning for NILM with Heterogeneous Data. In Proceedings of the 2025 IEEE International Conference on Power Systems and Smart Grid Technologies (PSSGT), Chongqing, China, 11–13 April 2025; pp. 344–349.
Zhou, Q.Z.; Li, X.M.; Shen, H.Q.; Wu, H.N.; Li, Y.Y.; Rong, J.; Hu, C.H.; Liu, P.P.; Wang, C. Non-intrusive load decomposition of unbalanced data based on attention mechanism. J. Jilin Univ. (Eng. Technol. Ed.) 2024, 56, 239–246.
Tokam, L.W.; Apeke, S.K.; Ouro-Djobo, S.S. Hybrid HDBSCAN-FHMM Approach for Energy Disaggregation in Non-Intrusive Load Monitoring (NILM) Systems. IEEE Access 2025, 13, 89685–89703.
Makonin, S.; Ellert, B.; Bajić, I.V.; Popowich, F. Electricity, Water, and Natural Gas Consumption of a Residential House in Canada from 2012 to 2014. Sci. Data 2016, 3, 160037.
Yang, F.; Liu, B.; Luan, W.; Zhao, B.; Liu, Z.; Xiao, X.; Zhang, R. FHMM Based Industrial Load Disaggregation. In Proceedings of the 2021 6th Asia Conference on Power and Electrical Engineering (ACPEE), Chongqing, China, 8–11 April 2021; pp. 330–334.
Pan, X.; Ye, L.; Weng, D.; Chen, J.; Yin, J. RTNILM: A Deep Robust Transfer Neural Network for Practical Application of NILM. IEEE Trans. Ind. Inf. 2025, 21, 6968–6978.
Athanasiadis, C.; Doukas, D.; Papadopoulos, T.; Chrysopoulos, A. A Scalable Real-Time Non-Intrusive Load Monitoring System for the Estimation of Household Appliance Power Consumption. Energies 2021, 14, 767.
Sahrane, S.; Haddadi, M. Near Real-Time Low Frequency Load Disaggregation. ENP Eng. Sci. J. 2021, 1, 50–54.
Papageorgiou, P.; Mylona, D.; Stergiou, K.; Bouhouras, A.S. A Time-Driven Deep Learning NILM Framework Based on Novel Current Harmonic Distortion Images. Sustainability 2023, 15, 12957.
Mari, S.; Bucci, G.; Ciancetta, F.; Fiorucci, E.; Fioravanti, A. An Embedded Deep Learning NILM System: A Year-Long Field Study in Real Houses. IEEE Trans. Instrum. Meas. 2023, 72, 2531215.
Tabanelli, E.; Brunelli, D.; Acquaviva, A.; Benini, L. Trimming Feature Extraction and Inference for MCU-based Edge NILM: A Systematic Approach. IEEE Trans. Ind. Inform. 2022, 18, 943–952.
Hoosh, S.M.; Kamyshev, I.; Ouerdane, H. Fusion-ResNet: A Lightweight Multi-Label NILM Model Using PCA-ICA Feature Fusion. arXiv 2025, arXiv:2511.12139.
Chen, W.Q. Non-Intrusive Load Identification and Disaggregation based on Deep Learning. Master’s Thesis, South China University of Technology, Guangzhou, China, 2022.
Francy, S.; Singh, R. Edge AI: Evaluation of Model Compression Techniques for Convolutional Neural Networks. arXiv 2024, arXiv:2409.02134.
Sykiotis, S.; Athanasoulias, S.; Kaselimi, M.; Doulamis, A.; Doulamis, N.; Stankovic, L.; Stankovic, V. Performance-Aware NILM Model Optimization for Edge Deployment. IEEE Trans. Green Commun. Netw. 2023, 7, 1434–1446.
Bamberg, L.; Minnella, F.; Bosio, R.; Ottati, F.; Wang, Y.; Lee, J.; Lavagno, L.; Fuks, A. eIQ Neutron: Redefining Edge-AI Inference with Integrated NPU and Compiler Innovations. arXiv 2025, arXiv:2509.14388.
Zhu, Y.; Lu, H. Edge-Side NPU Inference Optimization: Adaptation Research of Multimodal Large Models on Qualcomm Platforms. Intell. Data Anal. Int. J. 2025, 30, 1088467X251342172.
Xiao, X.; Luan, W.P.; Liu, B.; Wang, Y.; Yang, J.N.; Liu, Z.S.; Wei, Z. Autonomous Labeling of Unsupervised NILM Results Based on Rough Classification of Appliances. Proc. CSEE 2022, 42, 2462–2474. [Google Scholar] [CrossRef]
Liu, B.; Liu, W.; Luan, W.; Yu, Y.; Zhao, B.; Wang, Y. Automatic Appliance Labeling for Unsupervised NILM Based on Hierarchical Decision-Making. IEEE Trans. Instrum. Meas. 2024, 73, 2515813. [Google Scholar] [CrossRef]
Yao, L.; Wang, J.; Zhao, C. Non-Intrusive Load Monitoring Based on Multiscale Attention Mechanisms. Energies 2024, 17, 1944. [Google Scholar] [CrossRef]
Lazzaretti, A.E.; Renaux, D.P.B.; Lima, C.R.E.; Mulinari, B.M.; Ancelmo, H.C.; Oroski, E.; Pöttker, F.; Linhares, R.R.; Nolasco, L.D.S.; Lima, L.T.; et al. A Multi-Agent NILM Architecture for Event Detection and Load Classification. Energies 2020, 13, 4396. [Google Scholar] [CrossRef]
Lu, L.X. Research on Non-Intrusive Load Monitoring Methods Based on Event Detection and Their Applications. Ph.D. Thesis, Zhejiang University, Hangzhou, China, 2025. [Google Scholar] [CrossRef]
Guo, X.; Wang, C.; Wu, T.; Li, R.; Zhu, H.; Zhang, H. Detecting the Novel Appliance in Non-Intrusive Load Monitoring. Appl. Energy 2023, 343, 121193. [Google Scholar] [CrossRef]
Li, Y.Z. Research on Home Electricity Consumption Behavior Mining and Energy Management Methods Based on NILM. Ph.D. Thesis, Shenyang University of Technology, Shenyang, China, 2025. [Google Scholar] [CrossRef]
Zhang, J.; Liu, B.; Luan, W.; Ren, Y.; Tian, M.; Ma, M. Demand Response Potential Evaluation for Air Conditioning Loads Based on NILM and BPINN. In Proceedings of the 2025 IEEE Industry Applications Society Annual Meeting (IAS), Taipei, Taiwan, 15–20 June 2025; pp. 1–5. [Google Scholar] [CrossRef]
Bredariol, L.G.; Green, D.; Aboulian, A.; Nation, J.C.; Lindahl, D.P.; Leeb, S.B. NILM: A smarter tactical decision aid. Technol. Syst. Ships Day 2017. [Google Scholar]
Kaiyuan, T.; Shifeng, Z.; Gang, W.; Yan, F. Demand Response Strategy Considering User Satisfaction Based on NILM Technology. In Proceedings of the 2023 IEEE International Conference on Power Science and Technology (ICPST), Kunming, China, 5–7 May 2023; pp. 644–651. [Google Scholar] [CrossRef]
Chen, S.; Gan, L.; Chen, C.; Yu, K.; Pi, H.; Qian, Z.; Dai, R. Demand-Response Oriented Multi-Dimension Refined Portrait of Adjustable Resources Based on Load and Survey Data Fusion. Front. Energy Res. 2022, 10, 968368. [Google Scholar] [CrossRef]
Guo, R.; Cheng, X. Research on Data Right Confirmation Mechanism of Federated Learning Based on Blockchain. arXiv 2025. [Google Scholar] [CrossRef]
Agarwal, V.; Ardakanian, O.; Pal, S. Robust Peer-to-Peer Federated Learning for Non-Intrusive Load Monitoring in Smart Homes. Energy Build. 2025, 329, 115209. [Google Scholar] [CrossRef]
Breyer, J.; Jauhari, S.; Glebke, R.; Alizai, M.H.; Stroot, M.; Wehrle, K. Investigating Domain Bias in NILM. In Proceedings of the 11th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, Hangzhou, China, 7–8 November 2024; pp. 333–336. [Google Scholar] [CrossRef]
Srivastava, R.S.; Watson, S.; Dimitriou, V. Metadata-Enhanced NILM for Appliance-Level Disaggregation across Diverse Residential Buildings. J. Phys. Conf. Ser. 2025, 3140, 042008. [Google Scholar] [CrossRef]
Li, K.; Feng, J.; Zhang, J.; Xiao, Q. Adaptive Fusion Feature Transfer Learning Method For NILM. IEEE Trans. Instrum. Meas. 2023, 72, 2511612. [Google Scholar] [CrossRef]
Yan, Z.; Hao, P.; Nardello, M.; Brunelli, D.; Wen, H. A Generalizable Load Recognition Method in NILM Based on Transferable Random Forest. IEEE Trans. Instrum. Meas. 2025, 74, 6505312. [Google Scholar] [CrossRef]
Quang, B.N.; Thanh, L.H.; Van, L.T.; Luong, D.C. A Practical Low-Cost NILM Device Based on Tiny Machine Learning. JSTIC-J. Sci. Technol. Inf. Commun. 2024, 1, 16–21. [Google Scholar]

Figure 1. Classification of existing NILM review papers by research perspective. The four categories are: algorithm/model-driven (e.g., [5,16,17,18]); technology/system architecture-driven (e.g., [4,19,20]); data/feature-driven (e.g., [21,22]); and application/scenario-driven (partially) (e.g., [23,24]).

Figure 2. Distribution of literature by research type: overall proportion (1992–2026) and yearly trend (2020–2026).

Figure 3. Flowchart of the event-based NILM method. The pipeline is divided into three layers: data acquisition, processing (event detection and feature extraction), and inference (load identification and disaggregation). Representative algorithms are annotated for each stage.

Figure 4. Flowchart of the state-based NILM method. Unlike event-based methods, state-based approaches skip explicit event detection and directly map preprocessed aggregate signals to appliance-level power via end-to-end deep learning models. The model automatically learns relevant features from the data.

Figure 5. Taxonomy of NILM application scenarios covered in this review. Scenario-specific technical challenges are discussed in Section 5.1, Section 5.2 and Section 5.3, while cross-cutting challenges common to all scenarios are addressed in Section 5.4.

Table 1. Classification of the 94 references.

Category	Number
Technical paradigm
Event-based methods	15
State-based (deep learning) methods	23
Hybrid/emerging paradigms	13
Feature/clustering/other	8
Reviews	14
Datasets/tools	6
System implementation/edge deployment	11
Non-technical background (cited in Introduction)
Background/policy	4
Total	94

Table 2. Comparison of event detection algorithms.

Method	Advantages	Disadvantages
Adaptive BIC-based detection [48]	Highly adaptable, suitable for various devices	Requires prior parameter adjustment; may lack robustness for non-steady-state loads
Sliding window variance-based method [49]	Low computational cost, suitable for low-sampling-rate data	Sensitive to noise; requires manual threshold setting
Statistical anomaly detection [50]	No threshold required, robust to noise and spikes; suitable for small datasets	High computational complexity; challenging for multi-mode device detection
Enhanced $\chi^{2}$ method with cepstrum analysis [31]	Improved robustness; cepstrum method insensitive to high-frequency noise	Parameter adjustment relies on empirical experience
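
To make the trade-off in the sliding-window variance row of Table 2 concrete, the following is a minimal sketch of a generic variance-threshold event detector. It is not the implementation from [49]; the function name `detect_events`, the window length, the threshold value, and the synthetic signal are all assumptions chosen for illustration.

```python
from statistics import pvariance
from typing import List


def detect_events(power: List[float], window: int = 5,
                  threshold: float = 400.0) -> List[int]:
    """Return sample indices at which a switching event is flagged.

    An event is flagged at the first sample where the variance of the
    trailing window exceeds `threshold` (in W^2). No further event is
    reported until the variance has fallen back to or below the threshold.
    Both parameters are hypothetical and must be tuned by hand.
    """
    events: List[int] = []
    in_event = False
    for end in range(window, len(power) + 1):
        var = pvariance(power[end - window:end])
        if var > threshold and not in_event:
            events.append(end - 1)  # index of the newest sample in the window
            in_event = True
        elif var <= threshold:
            in_event = False
    return events


# Synthetic aggregate signal (assumed): 100 W base load, with a 1000 W
# appliance switching on at sample 10 and off at sample 20.
signal = [100.0] * 10 + [1100.0] * 10 + [100.0] * 10
events = detect_events(signal)
print(events)  # [10, 20]
```

Each step needs only one variance over a short window, which is why this family of detectors suits low-sampling-rate data and constrained hardware. The same sketch also shows the listed drawbacks: `threshold` is set manually, and measurement noise with a variance above that value would raise false events.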

Table 3. Comparison of feature extraction methods in event-based NILM.

Method	Advantages	Disadvantages
Multimodal fusion with color-mapped V-I trajectory [52]	Multimodal feature fusion, color enhancement for better differentiation, attention mechanism optimization	Relies on synchronous sampling, complex preprocessing, high dataset construction cost
Multi-domain feature extraction [53]	Comprehensive feature coverage, addresses class imbalance effectively, systematic feature selection	High-frequency features require high sampling rate, computationally complex, relies on labeled data
Fractal and multifractal analysis [54]	Suitable for low-power devices, can distinguish complex signals from noise, achieves high accuracy	Computationally intensive, requires high sampling rate, does not consider concurrent loads

Table 4. Comparison of load identification algorithms in event-based NILM.

Algorithm	Advantages	Disadvantages
Support Vector Machine (SVM) [37]	Low computational cost, suitable for embedded systems and real-time applications	Complex parameter tuning, insensitive to minor variations in load characteristics, requires labeled training data
Random Forest [55]	Robust to high-dimensional features, strong generalization capability, resistant to overfitting, excellent multi-class recognition performance	Poor model interpretability (black box nature), long training time for large datasets, sensitive to noise in data
Clustering (OPTICS) [56]	No labeled data required, automatic discovery of load events and patterns, suitable for low-frequency sampling data	Clustering results require subsequent labeling for practical use, high computational complexity, sensitive to noise and parameter settings

Table 5. Comparison of feature extraction approaches in state-based NILM.

Method	Advantages	Disadvantages
Scattering Transform [58]	No training required, high computational efficiency; insensitive to time shifts, strong robustness; features are interpretable, suitable for visual analysis	Relies on high-frequency data; feature dimensions can be high, requiring further selection
Wavelet-based 2D Image Analysis [59]	Rich time–frequency features, suitable for multi-state appliances; attention mechanism improves feature selection; supports low-frequency data, highly practical	Complex structure, long training time; sensitive to image transformation and wavelet decomposition parameters
Multi-source Feature Augmentation [60]	Integrates multi-source data, enhances feature richness; supports privacy preservation, suitable for distributed scenarios; applicable to extremely low-frequency data (sampled hourly)	Relies on weather data, may not be suitable for all scenarios; high communication cost in federated learning, slow convergence

Table 6. Comparison of state-based load disaggregation methods.

Method	Advantages	Disadvantages
BiLSTM with Attention Mechanism [61]	Strong temporal modeling capability, suitable for capturing long-term dependencies; attention mechanism focuses on key features, improving disaggregation accuracy; applicable to complex, multi-state loads	High computational complexity, long training time; relies on large amounts of labeled data; sensitive to imbalanced data
HDBSCAN-FHMM Hybrid Framework [62]	Clustering before modeling identifies hard-to-distinguish device states; suitable for unlabeled or low-labeled data; performs well on high-complexity devices	Clustering stage is time-consuming; sensitive to noise; overall model complexity high, resulting in slow inference speed
Dual-Power FHMM for Industrial Loads [64]	Dual-power input enhances ability to distinguish similar load devices; suitable for industrial scenarios with large power fluctuations and many devices; FHMM suitable for modeling parallel states of multiple devices	Requires simultaneous collection of active and reactive power; sensitive to industrial device state changes; supervised learning relies on labels

Table 7. Comparison of event-based and state-based NILM approaches.

Approach	Advantages	Disadvantages
Event-based Method	Strong physical interpretability; fully utilizes high-frequency features; relatively low computational resource requirements; less dependent on large amounts of training data	Highly dependent on event detection accuracy; requires high data sampling rate; weak capability in handling concurrent events; complex feature engineering
State-based Method	More robust to data frequency; excels at handling concurrent and continuous states; automated feature learning; high overall performance potential	Poor interpretability; heavily reliant on labeled data; high computational complexity; may overlook physical significance

Table 8. Comparison of core technical solutions for different application scenarios.

Application Scenario	Core Objective	Key Technical Path	Representative Work
Real-time Feedback	Second-level response + High accuracy	Edge–cloud collaborative architecture	[11]
		Data pipeline + Model efficiency	[69]
		Explainable AI + Automated labeling	[13]
Energy Efficiency Optimization	Fine-grained state awareness	Complex models (V-I trajectory)	[79]
		Unsupervised clustering + Few-shot learning	[78]
		Multi-objective trade-off (comfort + efficiency)	[83]
Demand Response	Flexible load control	Adjustable potential assessment	[84]
		Monitoring–control–validation closed loop	[85]
		User profiling + Personalized strategies	[87]
Cross-cutting Challenges	Deployment feasibility	Federated learning + Privacy preservation	[14]
		Meta-learning + Transfer learning	[15]
		Low-cost edge hardware	[94]


