1. Introduction
Against the strategic backdrop of the global energy transition and the Dual Carbon Goals, fine-grained management and efficient utilization of energy on the user side have become a crucial component in building a new-type power system [1,2,3]. Non-intrusive load monitoring (NILM) technology can identify the operational states of individual electrical appliances and decompose their energy consumption merely by analyzing the voltage and current signals at the main power entry point.
The core task of an NILM system is to accurately disaggregate diverse electrical loads from the aggregated signal. To systematically describe and handle these loads, ref. [
4] classified loads into four types: on/off, finite-state, continuously variable, and always-on. On/off appliances have only two states, “on” and “off”; finite-state appliances operate in a limited number of discrete modes; continuously variable appliances exhibit power consumption that changes smoothly and continuously over a certain range; and always-on appliances theoretically run 24/7, with nearly constant or minimally fluctuating power.
Existing NILM approaches can be broadly divided into two categories: event-based methods and state-based methods. The former starts by detecting abrupt changes in the aggregated load and relies on feature engineering and conventional classification models; the latter is characterized by end-to-end learning, directly mapping continuous data to load states [
5,
6].
As these approaches have matured, the widespread deployment of smart meters, rapid advances in computing power, and the deep integration of artificial intelligence technologies have gradually moved NILM from early theoretical exploration and algorithm validation toward pilot projects and applications in real-world scenarios. NILM has demonstrated significant potential in diverse fields such as residential electricity analysis [
7], commercial building energy conservation [
8], industrial load monitoring [
9], and grid-interactive response [
10]. However, as application scenarios continue to expand and deepen, NILM systems still face numerous challenges in terms of real-time performance [
11], accuracy [
12], interpretability [
13], privacy protection [
14], and cross-scenario generalization [
15], which constrain their large-scale deployment and commercial promotion.
Currently, a large body of research has focused on fundamental NILM algorithms—including event detection, feature extraction, load identification, and disaggregation—and considerable progress has been made. Nevertheless, systematic reviews of NILM in practical application scenarios remain relatively scarce, especially regarding key technical requirements across different scenarios, the applicability of existing solutions, and common unresolved challenges. A clear research roadmap and practical guidance have yet to be established.
To highlight the distinction between this review and existing works,
Figure 1 classifies representative NILM surveys published in recent years into four categories and summarizes the core idea of each category.
Unlike the 11 reviews above, which center on algorithms, datasets, or generic system architectures, this paper adopts a scenario-driven analytical framework. It examines NILM technology through three typical application scenarios (real-time feedback, energy efficiency optimization, and demand response) and identifies the core technical tensions in each (e.g., the trade-off between real-time performance and accuracy, unknown load identification, and personalized user acceptance). On this basis, we construct a “scenario–challenge–solution” three-dimensional mapping and systematically organize the differentiated technical pathways, ranging from edge–cloud collaboration and model compression to user profiling. This review addresses the lack of scenario-oriented guidance for engineering deployment in the NILM field. To our knowledge, it is the first to incorporate a human-centric perspective (satisfaction quantification, implicit behavior inference) together with hardware deployment (TinyML, NPU) into a unified analytical framework, thereby providing an actionable technical roadmap for researchers and practitioners targeting different application goals.
Before delving into the technical workflows, it is necessary to clarify the inherent challenges that load identification faces in practical applications, as these are fundamental factors constraining NILM system performance. Specifically, these challenges include: (i) load diversity: appliances can be categorized into on/off, finite-state, continuously variable, and always-on types [4]; (ii) multiple operating modes and power fluctuations: the same appliance may operate in multiple states with varying power levels, and transient fluctuations can obscure event boundaries; (iii) concurrent appliance events: multiple devices switching simultaneously produce overlapping electrical signatures, increasing disaggregation difficulty; (iv) unknown or newly added appliances: deployed systems must handle device types not seen during training; (v) cross-household generalization: due to differences in appliance brands, usage habits, and installation environments, a model that performs well in one household often suffers significant performance degradation when transferred to another. These challenges are discussed in the relevant sections of this paper: multi-state appliances and low-power fluctuations are covered in
Section 4.1.3 and
Section 4.2.2 (feature extraction); the limitations of concurrent events are analyzed in
Section 4.3 (Technical Summary and Discussion); unknown appliance identification is specifically discussed in
Section 5.2.2; and cross-household transferability is addressed in
Section 5.4.2.
To provide a structured roadmap, the remainder of this paper is organized as follows.
Section 2 outlines the methodology used for literature retrieval and analysis.
Section 3 traces the historical evolution of NILM, from its conceptual origins to contemporary intelligent paradigms, establishing the context for subsequent technical discussions. Building on this foundation,
Section 4 systematically dissects the two dominant technical workflows—event-based and state-based methods—detailing their data acquisition, feature extraction, and load disaggregation processes, along with representative algorithms and a comparative assessment of their respective strengths and limitations. This technical groundwork then serves as a prism through which
Section 5 examines NILM’s deployment across three critical real-world scenarios: real-time feedback, energy efficiency optimization, and demand response. For each scenario, we pinpoint the specific technical bottlenecks (e.g., the real-time vs. accuracy trade-off, fine-grained state awareness, reliable closed-loop control) and evaluate how existing solutions address them, before synthesizing persistent cross-cutting challenges such as data privacy, algorithmic transferability, and system integration costs. Finally,
Section 6 synthesizes these insights to propose promising future research directions, advocating for hybrid algorithmic models, distributed edge–cloud architectures, and multimodal sensing as key pathways toward large-scale practical deployment.
2. Materials and Methods
This study aims to conduct a critical analysis and synthesis of the existing literature in the field of non-intrusive load monitoring (NILM) and its application scenarios. To ensure comprehensive and representative coverage, we adopted a structured literature identification and analysis framework.
2.1. Literature Search Strategy
The relevant literature was primarily obtained from mainstream academic databases, including Web of Science, IEEE Xplore, and CNKI. The search covered all years from the inception of the NILM concept up to the submission date to capture the full evolution of the field. Core search terms included “non-intrusive load monitoring”, “NILM”, “load monitoring”, “deep learning”, “machine learning”, “application scenarios”, “demand response”, “energy efficiency”, and “real-time feedback”, using various combinations. In addition, we performed backward citation tracing from key references to supplement potentially missed studies.
2.2. Inclusion and Exclusion Criteria
To ensure systematic and representative selection, we established the following criteria:
Research type: Journal articles, conference papers, dissertations, and technical reports were included. Conference abstracts and news items were excluded. Preprints were considered only if they contained original content not yet published elsewhere and were clearly marked as preprints; if a later formal publication existed, the formal version was cited instead.
Language: Only English and Chinese publications were included.
Publication status: Preference was given to formally published articles. Preprints were included only when they provided novel insights and had not been superseded by a published version.
Content relevance: The work must be directly related to NILM technology or its typical application scenarios (real-time feedback, energy efficiency optimization, demand response) and must provide clear algorithm descriptions, experimental designs, or system implementation details.
Methodological requirements: Algorithm-oriented papers were required to employ reproducible experimental setups and be validated on public datasets or real-world measurements. Application-oriented papers were required to present technical implementation paths and effectiveness evaluations in specific scenarios.
2.3. Literature Screening and Selection Process
After initial retrieval, a two-stage screening process was applied as follows:
Relevance screening: Based on titles and abstracts, papers not directly related to NILM or that merely mentioned the topic without substantive contribution were excluded.
Full-text review: The remaining papers were read in full, with priority given to those presenting concrete application cases, system architectures, or empirical findings. Purely conceptual discussions or redundant works were excluded.
A total of 95 core references were finally selected, covering theoretical methods, applications, system implementations, and reviews.
2.4. Thematic Classification Framework
To support a scenario-driven analysis, we developed a multi-dimensional classification framework. Each selected publication was categorized according to the following dimensions:
Research type: methodological studies (new algorithms or technical improvements), application studies (validation in specific scenarios), reviews/surveys, dataset/tool releases, and system implementations.
Technical paradigm: event-based methods (explicit event detection + feature engineering), state-based methods (end-to-end deep learning), and hybrid/emerging paradigms (meta-learning, few-shot learning, federated learning, zero-shot identification, etc.).
Application scenario: real-time feedback, energy efficiency optimization, demand response, or cross-cutting challenges (privacy, transferability, cost, system integration).
Data characteristics: high-frequency (>1 kHz), medium-frequency (1 Hz–1 kHz), low-frequency (<1 Hz), or multimodal (electrical + auxiliary sensing).
Maturity level: theoretical exploration (proof-of-concept), algorithm validation (evaluation on public datasets), prototype system (lab-scale demonstration), and large-scale deployment (field trials or commercial systems).
This framework guided both the literature selection and the subsequent thematic synthesis, enabling identification of gaps between algorithmic advances and practical deployment requirements.
2.5. Distribution of Selected Literature
Figure 2 shows the annual distribution of the 95 selected references by research type (methodology, application, review, system implementation, dataset/tool) for the period 2020–2026. Of the 95 references, 13 were published before 2020 (approximately 13.7%). These earlier works are deliberately retained because they not only laid the theoretical groundwork for NILM [
25] and established key benchmark datasets [
26,
27,
28], but also proposed early methods for event detection and data acquisition [
29,
30,
31]. Importantly, they documented valuable deployment experiences from early smart meter pilot projects and long-term field trials, including hardware reliability, communication stability, and maintenance challenges. These practical insights are essential for understanding the gap between laboratory performance and real-world durability, and inform the scenario-specific challenges analyzed in
Section 5.1,
Section 5.2,
Section 5.3 and
Section 5.4.
Based on the classification framework, the 95 references can be categorized as summarized in
Table 1. Event-based methods account for 15 papers, state-based deep learning methods for 23 papers, and hybrid/emerging paradigms for 13 papers. Reviews, datasets/tools, system implementation, and feature/clustering-based studies constitute the remaining methodological contributions. In addition, four background papers are cited in the Introduction to contextualize the broader energy policy and business landscape but are not included in the technical classification.
3. History of Non-Intrusive Load Monitoring Development
The concept of NILM can be traced back to the 1980s. In 1986, a research team at the Massachusetts Institute of Technology (MIT) in the United States applied for a patent on separating the energy consumption of individual appliances by analyzing the aggregate signal at the electrical service entry, laying the conceptual foundation for NILM [
32,
33].
The seminal 1992 paper “Nonintrusive Appliance Load Monitoring” by George W. Hart systematically elaborated on load models and disaggregation algorithms, formally establishing the NILM research methodology [
25]. At this stage, the technology mainly relied on manually extracted features such as steady-state power changes and harmonic components for identification. Although its feasibility was theoretically verified, its accuracy and generalization ability in practical deployment faced challenges due to algorithmic complexity and low-sampling-rate data [
34].
With the widespread adoption of smart meters and improvements in computing power, research entered the data-driven phase. The release of public datasets such as REFIT [
27], UK-DALE [
26], and ECO [
28] provided data support for algorithm training and fair comparison [
35]. Researchers began to widely adopt traditional machine learning methods such as Hidden Markov Models (HMMs) [
36] and Support Vector Machines (SVMs) [
37], automatically learning load features from data and significantly improving identification capabilities.
Inspired by the success of deep learning in computer vision and natural language processing, NILM research underwent a further transformation. Convolutional neural networks (CNNs), recurrent neural networks (RNNs), and their variants were introduced to process electrical sequence data directly, enabling end-to-end learning from raw data to load disaggregation results [
20]. Particularly, the Seq2Seq (Sequence-to-Sequence) architecture combined with attention mechanisms [
38] effectively addressed long-term dependency issues, achieving breakthrough improvements in accuracy under complex load overlapping scenarios.
In recent years, the research frontier has exhibited characteristics of multi-technology integration and practical deployment orientation:
Model Architecture Evolution: Advanced architectures such as Transformer have been explored to better model global dependencies [
39,
40].
Privacy Protection Requirements: To address stringent data privacy regulations, federated learning has been systematically introduced. Studies have demonstrated that by sharing model parameters instead of raw data, high-performance collaborative model training can be achieved while protecting user privacy [
41,
42].
Trustworthiness and Interpretability: Explainable Artificial Intelligence (XAI) techniques have been applied to explain the decision-making basis of deep models, enhancing the credibility and user acceptance of NILM results [
43].
Adaptive Learning Strategies: Reinforcement learning has begun to be used to optimize event detection and disaggregation strategies, enabling systems to adaptively adjust through interaction with the environment [
18,
44].
Application Scenario Expansion: NILM technology has expanded from residential energy consumption analysis to broader fields such as commercial building energy conservation [
45], grid demand response [
46], and fault warning [
29], providing key technical support for refined energy management and the achievement of carbon peaking and carbon neutrality goals.
The development of NILM technology has progressed from theoretical conception to data-driven and then to intelligent model-driven approaches, continuously evolving toward practicality, privacy protection, and trustworthiness. In the future, with the integration of edge computing, cross-domain transfer learning, and other technologies, NILM is expected to become an indispensable intelligent sensing-layer technology for building new power systems while ensuring user privacy and data security.
4. Key Technologies of Non-Intrusive Load Monitoring
Over time, two distinct approaches have emerged in the NILM process. The first is the event-based method, which incorporates an explicit event detection step and primarily employs traditional machine learning techniques. The second is the state-based method, which directly analyzes features within the aggregate signal, predominantly utilizing deep learning approaches.
4.1. Event-Based Methods
Figure 3 illustrates the flowchart of the event-based NILM method. Event detection is its core task, aiming to identify significant change points in the aggregate power sequence that typically correspond to the switching on/off or state transitions of appliances. The feature library is constructed by modeling the electrical characteristics of appliances within a specific usage scenario, serving to resolve load identification problems. If an unsupervised clustering approach is adopted, pre-building a feature library is unnecessary; however, this often results in reduced accuracy and poorer interpretability.
4.1.1. Data Acquisition
Event-based methods rely heavily on medium-to-high-frequency data, yet in practice this creates a fundamental trade-off between acquisition cost and accuracy. Ref. [
47] pointed out that while high-sampling-rate data acquisition can extract rich transient features, it relies on specialized hardware, incurs high costs, and is difficult to deploy widely in existing smart meters. Conversely, low-sampling-rate data, while easier to acquire, limits the granularity of feature extraction, affecting the accuracy of load disaggregation.
To address this trade-off, researchers are committed to developing efficient data acquisition and processing systems. For example, the NILM Dashboard system proposed in ref. [
30] constructs an efficient, real-time data acquisition and preprocessing pipeline through high-sampling-rate multi-sensor collection, the Sinefit algorithm for data compression, NilmDB for event storage, and the Joule streaming framework, significantly enhancing event detection sensitivity and system response speed. The authors successfully applied the system to ship power monitoring, validating its reliability in complex industrial scenarios.
Therefore, for event-based methods relying on transient features, the core contradiction in their data acquisition strategy lies in balancing feature richness, system cost, and deployment feasibility. The aforementioned work demonstrates that designing intelligent acquisition–compression–processing pipelines can partially alleviate the tension between high-frequency data demands and low-cost deployment.
4.1.2. Event Detection
Event detection is a crucial step for precisely locating appliance state transitions from the aggregate power signal. The methods can be categorized mainly into threshold-based, statistical, signal processing, and machine learning-based approaches. However, with the increasing complexity of electricity usage scenarios, traditional methods face severe challenges in generalization capability, adaptability to low sampling rates, and robustness to high baseload, driving detection algorithms towards more intelligent and adaptive development.
To address the insufficient generalization and low detection accuracy of traditional event detection methods caused by appliance diversification and complex usage scenarios, ref. [
48] proposed an adaptive NILM event detection algorithm based on the Bayesian Information Criterion (BIC). By setting different detection thresholds for different types of appliances, it enables the dynamic adjustment of algorithm parameters, enhancing adaptability to various appliance load characteristics.
While many methods rely on high-frequency data, data sampling rates are often low in practical applications. Ref. [
49] proposed a low-complexity sliding window event detection algorithm based on variance and mean absolute deviation, specifically targeting low-sampling-rate NILM systems, significantly reducing computational overhead while maintaining detection performance. Furthermore, under low sampling rates, traditional fixed-threshold event detection methods are susceptible to noise and transient fluctuations. To address this, ref. [
50] proposed a statistical anomaly detection method by calculating the min/max ratio of adjacent samples and identifying statistical outliers, effectively improving event detection accuracy in low-frequency data without requiring preset thresholds.
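The sliding-window idea described above can be illustrated with a minimal sketch. It is not the algorithm of ref. [49]: the function name `detect_events`, the window length, and both thresholds are hypothetical choices made only for this example, and the input is a synthetic 1 Hz aggregate power series.

```python
import numpy as np

def detect_events(power, window=5, var_threshold=100.0, mad_threshold=8.0):
    """Return sample indices where a sliding window looks like a switching event.

    A window is flagged when both its variance and its mean absolute
    deviation exceed the (hypothetical) thresholds. Consecutive flagged
    windows are merged into a single event.
    """
    power = np.asarray(power, dtype=float)
    events = []
    in_event = False
    for start in range(len(power) - window + 1):
        segment = power[start:start + window]
        variance = segment.var()
        mad = np.mean(np.abs(segment - segment.mean()))
        flagged = variance > var_threshold and mad > mad_threshold
        if flagged and not in_event:
            # Place the event at the largest step inside the first flagged window.
            step = int(np.argmax(np.abs(np.diff(segment))))
            events.append(start + step + 1)
        in_event = flagged
    return events

# Synthetic aggregate: 100 W baseline, a 1000 W appliance on at t=20 and off at t=40.
signal = np.full(60, 100.0)
signal[20:40] += 1000.0
print(detect_events(signal))  # [20, 40]
```

Both statistics need only one pass over a short window, which is why this family of detectors suits low-sampling-rate meters with limited computing resources.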
Traditional event detection methods, such as the standard chi-square goodness-of-fit test (χ² GOF), perform well when the baseload in household aggregate power signals is low, but their performance degrades significantly when the baseload is high, leading to missed or false detections. Ref. [
31] enhanced detection capability for small power changes in high-baseload environments by introducing a sliding window and voting mechanism. Additionally, by transforming the power signal to the frequency domain for analysis and using cepstrum smoothing for event detection, the method becomes insensitive to baseload variations, offering better robustness.
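The baseload sensitivity mentioned above can be seen in a small numerical sketch. It assumes the commonly used window-to-window form of the statistic, in which squared differences are divided by the reference window, and adds a simple voting step. The function names, window length, and threshold are hypothetical, and the code is not the method of ref. [31].

```python
import numpy as np

def gof_statistic(reference, candidate):
    """Chi-square style goodness-of-fit statistic between two equal-length windows.

    Assumes strictly positive power values in the reference window.
    """
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    return float(np.sum((candidate - reference) ** 2 / reference))

def gof_votes(power, window=4, threshold=1.0):
    """Count, per sample, how many sliding window pairs vote for an event there."""
    power = np.asarray(power, dtype=float)
    votes = np.zeros(len(power), dtype=int)
    for start in range(len(power) - 2 * window + 1):
        pre = power[start:start + window]
        post = power[start + window:start + 2 * window]
        if gof_statistic(pre, post) > threshold:
            # Vote for the largest step inside the combined window pair.
            pair = power[start:start + 2 * window]
            step = int(np.argmax(np.abs(np.diff(pair))))
            votes[start + step + 1] += 1
    return votes

# The same 60 W step on a 100 W and on a 2000 W baseload.
low_base = np.full(20, 100.0)
low_base[10:] += 60.0
high_base = np.full(20, 2000.0)
high_base[10:] += 60.0

print(gof_statistic(low_base[6:10], low_base[10:14]))    # 144.0
print(gof_statistic(high_base[6:10], high_base[10:14]))  # roughly 7.2
print(int(gof_votes(high_base).argmax()))                # 10
```

Because the statistic is normalized by the reference power, the same step produces a much smaller value on a high baseload. Accumulating votes from overlapping windows is one way to recover such weak events.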
Traditional evaluation of NILM event detection typically uses a “tolerance interval”-based matching method, representing an event with a single point in time and using metrics like precision and recall for assessment. This method cannot measure the algorithm’s ability to capture the complete transient process of an event. Ref. [
51] was the first to incorporate “detection completeness” as a core dimension into quantitative evaluation, promoting a shift in event detection research from pursuing the “detected moment” to pursuing the “reconstructed process.” This provides a more scientific evaluation tool that better aligns with the needs of downstream NILM tasks.
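For reference, the conventional tolerance-interval evaluation can be sketched as follows. The greedy one-to-one matching rule, the function name `match_events`, and the tolerance value are assumptions for illustration. The completeness-oriented metric of ref. [51] is not reproduced here.

```python
def match_events(detected, truth, tolerance=2):
    """Greedy one-to-one matching of detected to true event times.

    A detection counts as a true positive if an unmatched ground-truth
    event lies within `tolerance` samples of it.
    Returns (precision, recall, f1).
    """
    unmatched = sorted(truth)
    true_positives = 0
    for d in sorted(detected):
        candidates = [t for t in unmatched if abs(t - d) <= tolerance]
        if candidates:
            closest = min(candidates, key=lambda t: abs(t - d))
            unmatched.remove(closest)
            true_positives += 1
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1

# Four detections against four labelled events: three fall within the tolerance.
precision, recall, f1 = match_events([10, 21, 50, 75], [10, 20, 40, 76])
print(precision, recall)  # 0.75 0.75
```

Each event is reduced to a single time stamp, so a detector that fires at the right moment but captures only part of the transient scores the same as one that captures all of it. This is the limitation the paragraph above describes.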
Event detection in NILM has undergone a clear transition from a reliance on high-sampling-rate data to the pursuit of low-complexity, high-robustness algorithms. A summary is provided in
Table 2.
4.1.3. Feature Extraction
Feature extraction can be understood as a mapping $f: \mathcal{X} \rightarrow \mathbb{R}^{d}$, where $\mathcal{X}$ is the space of raw aggregate signals and $d$ is the feature dimension. In event-based NILM, this mapping is explicitly constructed using handcrafted features, which can be categorized into the following types: time-domain features, frequency-domain features, waveform features, sequence features, and derived features.
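A handcrafted mapping of this kind can be sketched in a few lines. The chosen features (RMS, peak, crest factor, and the first three harmonic amplitudes), the function name `extract_features`, and the 50 Hz mains frequency are assumptions for illustration, giving a feature dimension of 6 in this example.

```python
import numpy as np

def extract_features(current, sample_rate, mains_freq=50.0, n_harmonics=3):
    """Map one window of current samples to a small handcrafted feature vector."""
    current = np.asarray(current, dtype=float)
    rms = np.sqrt(np.mean(current ** 2))       # time-domain feature
    peak = np.max(np.abs(current))             # time-domain feature
    crest_factor = peak / rms                  # waveform-shape feature
    # Single-sided amplitude spectrum (valid for the non-DC bins used here).
    spectrum = np.abs(np.fft.rfft(current)) * 2.0 / len(current)
    freqs = np.fft.rfftfreq(len(current), d=1.0 / sample_rate)
    harmonics = [
        spectrum[np.argmin(np.abs(freqs - k * mains_freq))]
        for k in range(1, n_harmonics + 1)
    ]
    return np.array([rms, peak, crest_factor, *harmonics])

# Synthetic current: 2 A fundamental plus a 0.5 A third harmonic, 10 mains cycles at 1 kHz.
t = np.arange(200) / 1000.0
current = 2.0 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 150 * t)
features = extract_features(current, sample_rate=1000.0)
print(features.shape)  # (6,)
```

The window covers an integer number of mains cycles, so each harmonic falls exactly on a frequency bin. Real measurements would typically need windowing or resampling to limit spectral leakage.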
To address the issues of insufficient recognition accuracy and robustness in traditional NILM methods when facing similar load characteristics, multi-state appliances, and modern smart home environments, ref. [
52] proposed a deep learning method based on color-mapped Voltage–Current (V-I) trajectories fused with frequency-domain features. Specifically, this method embeds instantaneous power into V-I trajectories via color mapping and adaptively fuses multi-dimensional features using a channel attention mechanism, significantly improving load classification accuracy and stability.
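The idea of embedding power information in a V-I trajectory image can be sketched as follows. This is a simplified illustration, not the color-mapping scheme of ref. [52]: the two-channel layout (occupancy plus normalized instantaneous power), the 16 × 16 resolution, and the function name are assumptions.

```python
import numpy as np

def vi_trajectory_image(voltage, current, size=16):
    """Rasterise one cycle of a V-I trajectory into a 2 x size x size array.

    Channel 0 marks the pixels visited by the trajectory; channel 1 stores
    the largest normalised instantaneous power seen in each visited pixel.
    Assumes neither signal is constant over the cycle.
    """
    v = np.asarray(voltage, dtype=float)
    i = np.asarray(current, dtype=float)
    v_norm = (v - v.min()) / (v.max() - v.min())
    i_norm = (i - i.min()) / (i.max() - i.min())
    cols = np.minimum((v_norm * size).astype(int), size - 1)
    rows = np.minimum((i_norm * size).astype(int), size - 1)
    power = np.abs(v * i)
    p_norm = power / power.max()
    image = np.zeros((2, size, size))
    image[0, rows, cols] = 1.0
    np.maximum.at(image[1], (rows, cols), p_norm)
    return image

# A purely resistive load (hypothetical 50 ohm): the trajectory is a straight diagonal.
t = np.arange(200) / 200.0
voltage = 325.0 * np.sin(2 * np.pi * t)
current = voltage / 50.0
image = vi_trajectory_image(voltage, current)
print(image.shape)  # (2, 16, 16)
```

Reactive or nonlinear loads would trace loops or distorted shapes instead of a diagonal, which is what makes such images useful as classifier input.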
Due to the varying usage frequencies of different appliances (e.g., washing machines used infrequently, lights used frequently), NILM datasets often suffer from a severe imbalance between majority and minority class samples, causing traditional machine learning models to perform poorly on minority appliances. Ref. [
53] addressed the class imbalance problem in NILM by proposing a multi-domain feature extraction method. They constructed a comprehensive set of 39 features drawn from four domains (P-Q plane, current waveform, V-I trajectory, and harmonic currents) to enhance discriminative power for appliance identification. This research also emphasized the necessity of high sampling rates for V-I trajectory and harmonic features.
To address the challenge of identifying low-power appliances in NILM, ref. [
54] proposed a novel feature extraction method based on fractal and multifractal analysis. Traditional features have limited discriminative power in characterizing weak transient signals from low-power, nonlinear loads. Therefore, the authors extracted a hybrid feature set from appliance startup current transients, including monofractal features (e.g., fractal dimension, Hurst exponent, lacunarity) and multifractal features (e.g., singularity spectrum, Hölder exponent), capable of finely depicting signal complexity, self-similarity, and local singularity. Experiments showed that this feature set significantly improved classification performance for low-power appliances, achieving up to 98.3% recognition accuracy on an optimized deep neural network.
At the feature extraction stage, researchers have employed various strategies, such as fused steady-state and transient features, multi-domain features, and fractal features, each with its own advantages and disadvantages. A summary is provided in
Table 3.
4.1.4. Load Disaggregation and Identification
This stage primarily employs traditional machine learning methods, i.e., after extracting handcrafted features, models are used for classification or regression analysis. It can also be categorized into supervised and unsupervised methods based on the availability of labeled training data.
To investigate the effectiveness and practicality of traditional machine learning in low-computational-resource environments, ref. [
37] proposed a low-frequency sampling-based NILM method using active and reactive power as features and an SVM classifier for complex state identification. This system achieved real-time load disaggregation on an embedded platform with 91% accuracy.
The traditional machine learning disaggregation process involves detecting significant changes in aggregate power or current, extracting steady-state electrical features from the post-event period, and then feeding them into a pre-trained classification model for appliance identification. For instance, ref. [
55] systematically compared traditional classifiers like Random Forest (RF), Support Vector Machine, and K-Nearest Neighbors (KNN) for identifying five common household appliances using active power and current as core steady-state features. Their experiments showed that under this setting, the Random Forest algorithm achieved the best overall performance, with an F1-score exceeding 0.99, highlighting the advantage of ensemble learning models in handling such classification tasks.
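The classification step of this pipeline can be illustrated with a small K-Nearest Neighbors sketch on steady-state features. KNN is used here only because it fits in a few lines; it is not the best-performing model reported in ref. [55]. The training values, appliance labels, and function name are hypothetical.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Classify one (active power, current) feature vector by majority vote."""
    train_x = np.asarray(train_x, dtype=float)
    query = np.asarray(query, dtype=float)
    # Standardise so that watts and amperes contribute on a comparable scale.
    mean, std = train_x.mean(axis=0), train_x.std(axis=0)
    distances = np.linalg.norm((train_x - mean) / std - (query - mean) / std, axis=1)
    nearest = np.argsort(distances)[:k]
    labels, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Hypothetical post-event steady-state features: [active power in W, current in A].
train_x = [
    [2000, 8.7], [1950, 8.5], [2050, 8.9],   # kettle
    [120, 0.60], [110, 0.55], [130, 0.65],   # fridge
    [60, 0.30], [55, 0.28], [65, 0.31],      # lamp
]
train_y = ["kettle"] * 3 + ["fridge"] * 3 + ["lamp"] * 3

print(knn_predict(train_x, train_y, [1980, 8.6]))  # kettle
print(knn_predict(train_x, train_y, [58, 0.29]))   # lamp
```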
Following event detection, a critical step is classifying the extracted transient features. In this context, researchers also commonly use clustering methods alongside classifiers. A recent systematic study [
56] compared eight clustering algorithms, including K-means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Ordering Points To Identify the Clustering Structure (OPTICS), on the real-world BLUED dataset [
57]. The authors found that the density-based OPTICS algorithm showed more advantages in handling scenarios with multiple appliances and an uneven feature space distribution, particularly when combined with dual-stream feature inputs like active power P and reactive power Q or current I. The study also noted that such algorithms can maintain a reliable performance at sampling frequencies no lower than 1/30 Hz, suggesting their potential application in low-frequency smart meter domains.
The load disaggregation stage in event-based methods comprises two main streams: supervised algorithms (handcrafted features + classifier/regressor) and unsupervised algorithms (clustering). Their respective advantages and disadvantages are summarized in
Table 4.
4.2. State-Based Methods
Figure 4 illustrates the flowchart of the state-based NILM method. The model is pre-trained, and its network structure automatically learns high-level, discriminative features from raw power sequences. Because the raw aggregate signal contains various kinds of noise, preprocessing is required to ensure subsequent identification accuracy. It typically involves data alignment and cleaning, normalization/standardization, and the construction of sample sequences.
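The three preprocessing steps listed above can be sketched as a single function. Linear interpolation for gap filling, z-score standardization, and the window and stride lengths are assumptions chosen for brevity. Practical pipelines often add timestamp alignment and outlier handling.

```python
import numpy as np

def make_windows(aggregate, window=8, stride=4):
    """Clean, standardise, and slice an aggregate power series into samples.

    Returns the stacked windows together with the mean and standard
    deviation needed to undo the scaling. Assumes a non-constant series
    containing at least one valid (non-NaN) reading.
    """
    aggregate = np.asarray(aggregate, dtype=float)
    # Cleaning: fill missing readings by linear interpolation.
    idx = np.arange(len(aggregate))
    missing = np.isnan(aggregate)
    cleaned = aggregate.copy()
    cleaned[missing] = np.interp(idx[missing], idx[~missing], aggregate[~missing])
    # Standardisation: zero mean, unit variance.
    mean, std = cleaned.mean(), cleaned.std()
    normalised = (cleaned - mean) / std
    # Sample construction: overlapping fixed-length windows.
    starts = range(0, len(normalised) - window + 1, stride)
    windows = np.stack([normalised[s:s + window] for s in starts])
    return windows, mean, std

# A 20-sample ramp with one dropped reading.
aggregate = 100.0 + 10.0 * np.arange(20.0)
aggregate[5] = np.nan
windows, mean, std = make_windows(aggregate)
print(windows.shape)  # (4, 8)
```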
4.2.1. Data Acquisition
The primary contribution of state-based NILM methods to the data acquisition stage lies in significantly reducing the dependence on data sampling frequency and real-time event detection accuracy, thereby enhancing the feasibility of NILM deployment in real residential environments [
5]. However, higher frequency data generally provides richer information and greater model potential [
38]. At the same time, high-frequency data brings larger data volumes, more noise, and higher costs, so the sampling rate must be chosen as a careful trade-off.
4.2.2. Feature Extraction
State-based methods learn the feature extraction mapping implicitly through stacked nonlinear transformations.
This step is the origin of the “black box.” Deep learning models possess the ability to automatically learn and extract features. They extract implicit, data-driven, and optimized feature representations, often at the expense of interpretability.
At the feature extraction stage, ref. [
58] employed a scattering transform with time-shift invariance to enhance model robustness. The scattering transform, via cascading wavelet transforms and modulus operations, extracts feature representations from high-frequency current signals that are invariant to small time shifts and deformations. Compared to convolutional neural networks that require training many parameters, this method has analytically determined filter coefficients, allowing it to maintain a superior discriminative performance even in small-sample scenarios.
To address the issue that traditional methods using only time-domain information may lose critical features and struggle to identify multi-state processes of appliances, ref. [
59] proposed a feature extraction method based on wavelet decomposition. Specifically, the authors first converted the 1D aggregate power time series into 2D images via sliding windows and a repeat vector layer. Then, multi-level wavelet decomposition was applied to decompose the images, extracting time–frequency features in horizontal, vertical, and diagonal directions to capture both low-frequency steady-state components and high-frequency transient components of the signal, providing richer input representations for subsequent models.
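The image construction and decomposition steps can be sketched with a single-level 2D Haar transform. This is a simplification of the multi-level scheme described above: the Haar wavelet, the stacking of overlapping windows into image rows (instead of a repeat vector layer), and the function names are assumptions for illustration.

```python
import numpy as np

def series_to_image(power, width=8):
    """Stack overlapping windows of a 1D power series as rows of a 2D image."""
    power = np.asarray(power, dtype=float)
    rows = len(power) - width + 1
    return np.stack([power[r:r + width] for r in range(rows)])

def haar_dwt2(image):
    """Single-level orthonormal 2D Haar decomposition.

    Returns (approximation, horizontal, vertical, diagonal) sub-bands,
    each half the size of the input along both axes.
    """
    image = np.asarray(image, dtype=float)
    h, w = image.shape[0] // 2 * 2, image.shape[1] // 2 * 2
    image = image[:h, :w]  # trim to even dimensions
    a, b = image[0::2, 0::2], image[0::2, 1::2]
    c, d = image[1::2, 0::2], image[1::2, 1::2]
    approx = (a + b + c + d) / 2.0      # low-frequency, steady-state content
    horizontal = (a + b - c - d) / 2.0  # change between adjacent rows
    vertical = (a - b + c - d) / 2.0    # change between adjacent columns
    diagonal = (a - b - c + d) / 2.0    # change along the diagonal
    return approx, horizontal, vertical, diagonal

# A 1000 W step in a 15-sample series gives an 8 x 8 image and four 4 x 4 sub-bands.
power = np.concatenate([np.full(8, 100.0), np.full(7, 1100.0)])
image = series_to_image(power)
bands = haar_dwt2(image)
print(image.shape, [band.shape for band in bands])
```

Applying `haar_dwt2` again to the approximation sub-band would give the next decomposition level.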
To address the inherent low information content of low-frequency data, ref. [
60] proposed a novel feature enhancement method that integrates weather data and temporal information into a sequence-to-sequence model, effectively expanding the feature space for low-sampling NILM tasks and improving the model’s representational capacity under limited data conditions.
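One simple way to attach temporal and weather context to a low-frequency power series is sketched below. The cyclic sine/cosine encoding of the hour of day and the column layout are assumptions for illustration. Ref. [60] may encode these inputs differently.

```python
import numpy as np

def add_context_features(power, hours, temperature):
    """Stack power with a cyclic hour-of-day encoding and outdoor temperature.

    Returns an array of shape (n_samples, 4) with the columns
    [power, sin(hour), cos(hour), temperature].
    """
    power = np.asarray(power, dtype=float)
    hours = np.asarray(hours, dtype=float)
    temperature = np.asarray(temperature, dtype=float)
    # Cyclic encoding keeps 23:00 and 00:00 close together in feature space.
    angle = 2.0 * np.pi * hours / 24.0
    return np.column_stack([power, np.sin(angle), np.cos(angle), temperature])

# Four hypothetical readings taken at 00:00, 06:00, 12:00, and 18:00.
features = add_context_features(
    power=[100.0, 200.0, 150.0, 120.0],
    hours=[0, 6, 12, 18],
    temperature=[5.0, 7.0, 12.0, 9.0],
)
print(features.shape)  # (4, 4)
```

The enriched array can then be windowed and fed to a sequence model in place of the raw power series.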
The three methods above each make a distinct contribution at the feature extraction stage. A comparison of their characteristics is provided in Table 5.
4.2.3. Load Disaggregation and Identification
Methods for this step fall into three types: traditional probabilistic models, deep learning models, and hybrid models. Alternatively, they can be classified by training paradigm into supervised, semi-supervised, and unsupervised methods.
Traditional load disaggregation models, such as combinatorial optimization and Hidden Markov Models (HMMs), suffer from high computational complexity, susceptibility to local optima, and a limited ability to extract temporal features when dealing with complex, variable load data. To address this, ref. [61] proposed a load disaggregation model incorporating an attention mechanism. Specifically, the authors used a Bidirectional LSTM (BiLSTM) to capture temporal dependencies and introduced channel attention and spatial attention, allowing the model to dynamically focus on key information and thereby improving its feature extraction capability and expressive power.
Existing unsupervised NILM methods often suffer from limited disaggregation accuracy due to inaccurate state identification when handling complex appliances with multiple operating modes. To address this problem, ref. [62] proposed a hybrid HDBSCAN-FHMM framework. This framework employs the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithm to identify, in an unsupervised and adaptive manner, multiple fine-grained operating states of appliances, including low-frequency or transient states. These identified states are then used as prior knowledge to initialize an improved Factorial Hidden Markov Model (FHMM), enhancing the model’s ability to characterize appliance dynamics and concurrent operations. Experiments show that this method significantly outperforms several baseline models on the AMPds dataset [63].
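The idea of using clustered operating states as a prior for an HMM can be sketched as follows. This is an illustration only: a simple gap-based 1D grouping is used here as a stand-in for HDBSCAN, and the function names, the 50 W gap, and the minimum cluster size are assumptions, not values from ref. [62].

```python
def cluster_power_levels(samples, gap_w=50.0, min_size=3):
    """Group power readings into candidate operating states.

    Stand-in for density-based clustering: sorted readings are split
    wherever neighbors are more than gap_w apart, and groups smaller
    than min_size are discarded as noise.
    """
    ordered = sorted(samples)
    clusters, current = [], [ordered[0]]
    for value in ordered[1:]:
        if value - current[-1] > gap_w:
            clusters.append(current)
            current = [value]
        else:
            current.append(value)
    clusters.append(current)
    return [c for c in clusters if len(c) >= min_size]


def initial_emissions(clusters, min_variance=1.0):
    """Per-state (mean, variance) pairs to initialize Gaussian emissions."""
    params = []
    for cluster in clusters:
        mean = sum(cluster) / len(cluster)
        variance = sum((v - mean) ** 2 for v in cluster) / len(cluster)
        params.append((mean, max(variance, min_variance)))  # avoid zero variance
    return params
```

The number of surviving clusters fixes the number of hidden states for the appliance, and the (mean, variance) pairs give the model a data-driven starting point instead of a random one.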
Most current NILM research focuses on residential loads, while industrial loads, with their complex equipment, large power fluctuations, and long operation times, are difficult to address directly with existing methods. Ref. [64] proposed a disaggregation method targeting industrial load characteristics. The authors applied an FHMM to industrial load disaggregation, modeling multiple industrial devices with a multi-chain HMM and using the Viterbi algorithm for state estimation to separate individual device power from the aggregate load.
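The combination of a multi-chain HMM with Viterbi decoding can be illustrated with a small example. The sketch below is not the implementation of ref. [64]: it assumes Gaussian observation noise on the aggregate, enumerates the joint state space exactly (feasible only for a few devices), and uses a hypothetical dictionary layout for the device models.

```python
import itertools
import math


def fhmm_viterbi(aggregate, devices, noise_std=20.0):
    """Most likely joint state sequence for several independent HMM chains.

    devices: list of dicts with 'powers' (watts per state), 'trans'
    (row-stochastic transition matrix), and 'start' (initial probabilities).
    All probabilities must be strictly positive.
    """
    joint = list(itertools.product(*[range(len(d["powers"])) for d in devices]))

    def log_emit(states, y):
        # The aggregate is modeled as the sum of state powers plus Gaussian noise.
        mu = sum(d["powers"][s] for d, s in zip(devices, states))
        return -0.5 * ((y - mu) / noise_std) ** 2

    def log_trans(prev, cur):
        # Chains evolve independently, so joint log-probabilities add up.
        return sum(math.log(d["trans"][p][c]) for d, p, c in zip(devices, prev, cur))

    def log_start(states):
        return sum(math.log(d["start"][s]) for d, s in zip(devices, states))

    score = [log_start(s) + log_emit(s, aggregate[0]) for s in joint]
    back = []
    for y in aggregate[1:]:
        new_score, pointers = [], []
        for cur in joint:
            best = max(range(len(joint)), key=lambda i: score[i] + log_trans(joint[i], cur))
            new_score.append(score[best] + log_trans(joint[best], cur) + log_emit(cur, y))
            pointers.append(best)
        score = new_score
        back.append(pointers)

    # Backtrack from the best final joint state.
    idx = max(range(len(joint)), key=lambda i: score[i])
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [joint[i] for i in path]
```

Per-device power follows by looking up each decoded state in that device's 'powers' list. Because the joint state space grows exponentially with the number of devices, practical FHMM systems typically rely on approximate inference rather than this exact enumeration.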
The aforementioned methods are all state-based load disaggregation approaches employing different technical routes, each with its own advantages and disadvantages, summarized in Table 6.
4.3. Technical Summary and Discussion
This section has classified NILM methods as event-based or state-based according to whether the processing pipeline includes an explicit state/event detection step. Table 7 summarizes and compares the advantages and disadvantages of the two paradigms. Future development trends are also discussed.
While the event-based and state-based dichotomy provides a useful organizing framework, recent advances have introduced methodologies that transcend this classification. Meta-learning approaches, such as those proposed in refs. [15,65], enable NILM models to rapidly adapt to new appliances or households using only a few labeled samples. These methods learn a general similarity function across tasks, effectively “learning to learn” load patterns, which addresses the long-standing challenge of cross-household generalization without requiring extensive retraining.
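How a similarity function supports adaptation from a few labeled samples can be sketched with a nearest-prototype classifier. This is an illustrative sketch, not the method of refs. [15,65]: in a meta-learned system the feature vectors would come from an embedding trained across many tasks, whereas here the raw features and a squared Euclidean distance are assumed, and all names are hypothetical.

```python
def build_prototypes(support):
    """Average the few labeled feature vectors of each appliance class.

    support: dict mapping an appliance label to a list of feature vectors.
    """
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return prototypes


def classify(query, prototypes):
    """Assign the query to the class with the nearest prototype."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))
```

Adding a new appliance only requires computing one more prototype from its few labeled samples; no network weights need to be retrained.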
Zero-shot load identification represents another frontier, where models are trained to recognize appliances not seen during training by leveraging semantic attributes or external knowledge bases [65]. This capability is particularly valuable for handling unknown appliances and evolving load profiles in real-world deployments.
From a mathematical perspective, feature extraction in modern NILM can be formalized as a mapping $f: \mathcal{X} \rightarrow \mathcal{F}$, where $\mathcal{X}$ is the space of raw aggregate signals (e.g., voltage, current, power) and $\mathcal{F}$ is a feature space. In event-based methods, $f$ is explicitly constructed through handcrafted features such as harmonic content, V-I trajectory shape descriptors, or fractal dimensions. In state-based deep learning models, $f$ is implicitly learned via stacked nonlinear transformations: $f(x) = \sigma_L\left(W_L\,\sigma_{L-1}\left(\cdots\sigma_1\left(W_1 x + b_1\right)\cdots\right) + b_L\right)$, where $\sigma_l$ are activation functions and $\{W_l, b_l\}_{l=1}^{L}$ are parameters optimized for the disaggregation objective. This formulation highlights the fundamental difference: event-based methods rely on human-engineered, interpretable features, whereas state-based methods learn features that are task-optimized but often less interpretable.
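The contrast between an explicitly constructed and an implicitly learned feature mapping can be made concrete with two small functions. Both are sketches under assumptions: the explicit mapping uses harmonic amplitudes over exactly one mains cycle as an example of a handcrafted feature, and the learned mapping takes its weights as given (in practice they would be fitted to the disaggregation objective). The function names and the choice of ReLU are illustrative.

```python
import cmath
import math


def harmonic_features(current, harmonics=(1, 3, 5)):
    """Explicit mapping: amplitude of selected harmonics.

    'current' must contain exactly one mains cycle, so that DFT bin k
    corresponds to the k-th harmonic. Each output has a direct physical
    meaning.
    """
    n = len(current)
    features = []
    for k in harmonics:
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(current))
        features.append(2.0 * abs(coeff) / n)
    return features


def learned_features(x, layers):
    """Implicit mapping: stacked affine transformations with ReLU.

    layers: list of (weights, bias) pairs, one per layer. The outputs
    have no predefined physical meaning; they depend on the parameters.
    """
    for weights, bias in layers:
        x = [max(0.0, sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(weights, bias)]
    return x
```

The first function returns the same quantities regardless of the task, while the second returns whatever representation its parameters encode, which is the source of both its flexibility and its reduced interpretability.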
Hybrid architectures that combine explicit feature engineering with learned representations, such as the attention-enhanced BiLSTM in [61], offer a promising path to balance interpretability and performance.
The comparison of event-based and state-based methods shows that current NILM technology still has room for improvement in both practicality and accuracy. Future development is expected to focus on two major directions: reducing data acquisition costs and integrating hybrid models.
Transition from high-frequency to low-frequency data acquisition. Several studies have achieved reliable event detection under low-sampling-rate conditions; for example, ref. [49] developed a low-complexity sliding window algorithm, and ref. [50] developed a statistical anomaly detection algorithm. State-based methods further reduce the dependence on sampling frequency, significantly enhancing the feasibility of NILM deployment in real residential environments [5].
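A low-complexity sliding window detector of the general kind mentioned above can be sketched as follows. This is a generic illustration, not the algorithm of ref. [49]: it compares the mean power just before and just after each sample and reports the strongest change in each run of above-threshold differences. The function name, window length, and 50 W threshold are assumptions.

```python
def detect_events(power, window=3, threshold_w=50.0):
    """Detect step changes in a low-rate power series.

    Returns a list of (index, delta_w) pairs, where index is the first
    sample after the change and delta_w is the change in mean power.
    """
    deltas = {}
    for i in range(window, len(power) - window + 1):
        before = sum(power[i - window:i]) / window
        after = sum(power[i:i + window]) / window
        deltas[i] = after - before

    def close_run(run):
        # Keep only the position where the change is strongest.
        peak = max(run, key=lambda k: abs(deltas[k]))
        return peak, deltas[peak]

    events, run = [], []
    for i in sorted(deltas):
        if abs(deltas[i]) >= threshold_w:
            run.append(i)
        elif run:
            events.append(close_run(run))
            run = []
    if run:
        events.append(close_run(run))
    return events
```

The detector needs only two running means per sample, which is why approaches of this type suit meters with limited processing capacity.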
Evolution from single architectures to hybrid architectures. For example, ref. [62] proposed an HDBSCAN-FHMM hybrid architecture that first identifies multiple appliance states in an unsupervised manner via density-based clustering and then uses them to initialize an improved FHMM, enhancing its ability to characterize appliance dynamics; ref. [61] combined a BiLSTM with attention mechanisms to strengthen temporal dependency modeling and the focus on key features. These hybrid approaches aim to leverage the strengths of multiple models to improve practicality and accuracy.
6. Future Outlook
After decades of development, non-intrusive load monitoring (NILM) technology has achieved significant breakthroughs at the algorithmic level. However, to make the leap from laboratory research to large-scale commercial application, NILM systems must address challenges related to robustness, real-time performance, privacy and security, and user experience in real-world scenarios, all while keeping costs under control. NILM technology is expected to mature and become practical by advancing along three main pathways: the reconstruction of algorithmic paradigms, the evolution of system architecture, and the expansion of sensing dimensions.
Algorithmic Paradigm Reconstruction: Event-based and state-based methods have long developed in parallel, often viewed as opposing approaches. Event-based methods offer clear physical interpretability but rely on accurate transient detection, making it difficult to handle scenarios with multiple devices operating concurrently. State-based methods excel at end-to-end learning but suffer from weak interpretability. For commercial applications, pursuing the ultimate performance of a single approach often entails high computational costs and deployment risks. Therefore, integrating the strengths of both methods to build a highly robust hybrid system is an effective path to balance accuracy and cost.
System Architecture Evolution: Much current NILM research relies on cloud-centric processing, which faces three major challenges in commercial rollout: first, the communication cost and bandwidth pressure of transmitting massive amounts of high-frequency data; second, security concerns arising from uploading private user data; and third, the difficulty of meeting the low-latency requirements of real-time feedback with cloud processing alone. Consequently, constructing a collaborative cloud–edge–device system (device-side collection, edge processing, and cloud optimization) and introducing privacy-preserving computation mechanisms is a viable pathway for the large-scale application of NILM.
Sensing Dimension Expansion: Commercial applications demand more from NILM than simply knowing which device is running. They require understanding the operational state of the device, predicting when maintenance is needed, and even inferring the user’s intentions to provide personalized recommendations. Relying solely on voltage and current signals can be ambiguous in complex electricity usage scenarios, where similar power changes might correspond to completely different device states or user behaviors. To address this, future NILM systems may selectively integrate low-cost, non-intrusive auxiliary signals, such as vibration or acoustic signatures, to resolve ambiguities and enhance functional capabilities in real-world deployments. Such cross-modal perception is proposed as a scenario-specific enhancement rather than a universal requirement; in most settings, single-point electrical signals remain sufficient for core disaggregation tasks. This targeted expansion preserves the non-intrusive and cost-effective nature of NILM while extending its utility toward higher-level functions like predictive maintenance and user-centric optimization.
In summary, by pursuing hybrid intelligence in algorithms, cloud–edge–device synergy in architecture, and multimodal fusion in sensing, NILM technology is poised to overcome its current bottlenecks in accuracy, cost, privacy, and robustness. Once NILM can be integrated stably, safely, and affordably into smart home ecosystems and new-type power systems, its role as the “sensing nerve” of fine-grained user-side energy management should open up broad commercial prospects.