A Non-Invasive Load Identification Method Considering Feature Dimensionality Reduction and DB-LSTM

Abstract: As the demand for detailed load data in modern power systems continues to grow, the high computational complexity of load identification and the high hardware requirements it places on devices have significantly hindered progress. This paper therefore proposes a non-intrusive load identification method that combines a Densely-connected Bi-directional Long Short-Term Memory (DB-LSTM) network with Kernel Principal Component Analysis (KPCA). Firstly, a bilateral sliding window algorithm performs event detection on the data collected by load identification devices, checking for the switching on and off of electrical appliances. Secondly, once a switching event has been detected and features have been extracted, KPCA reduces the dimensionality of the feature set, which is otherwise too complex, and retains the more relevant characteristics. Finally, a densely connected bi-directional LSTM network is used for identification. Stacking LSTM units and combining them with dense skip connections enhances global and dynamic local features and provides additional channels for signal transmission, thereby strengthening feature propagation and reducing the number of parameters. This lowers computational complexity and improves the efficiency of load identification. The proposed model is compared experimentally with mainstream non-intrusive load identification models and shows higher load identification efficiency.


Introduction
In the continuously evolving landscape of new energy societies, electricity users increasingly demand detailed and refined management of their power consumption devices. To meet this challenge, Dr. Hart proposed a method called Non-Intrusive Load Monitoring (NILM) [1]. Unlike traditional intrusive load monitoring methods, NILM works from the total load information of electricity users. By inferring the power usage behavior of each appliance from that total, it avoids the cumbersome process of installing sensors on each device. This approach offers numerous advantages, such as reduced installation costs, minimal interference with user privacy, and ease of operation [2][3][4], making it the mainstream method for analyzing electricity consumption behavior. To understand and apply NILM more deeply, it is beneficial to make full use of Advanced Metering Infrastructure (AMI) data. AMI not only provides the total load of users but also offers more detailed, real-time electricity usage data, which improves the accuracy of appliance monitoring. Precise analysis of AMI data gives a more accurate picture of the energy consumption pattern and usage of each device and helps optimize energy management strategies. In addition, integrating NILM with demand response technology allows electricity demand to be adjusted flexibly during peak periods, enabling intelligent energy dispatching.

The various load identification methods discussed above consider different aspects of each stage and make corresponding improvements. However, some aspects are still not sufficiently addressed, such as the feature transfer phase. These load identification models all use effective features but do not specially handle redundant ones. The result is a bloated model structure in which many features are ignored and left unprocessed, which lowers computational efficiency and, in turn, the efficiency of load identification. This paper proposes a NILM method that incorporates feature dimensionality reduction and DB-LSTM to address these aspects. Because sensors and other data collection devices are widely deployed in load identification tasks, models that require high hardware specifications are not suitable. This paper therefore proposes the use of Dense Bidirectional LSTM networks, which enhance both global and dynamic local features. Stacking LSTM units and combining them with dense skip connections provides additional channels for signal transmission, thereby strengthening feature propagation, reducing the number of parameters, and lowering computational complexity. This decreases the computational demands on devices and aims to improve the efficiency of the model in load identification tasks. The specific contributions of this paper are as follows:

1. To address the difficulty of effectively monitoring load-switching events in load identification problems, a bilateral sliding window CUSUM algorithm is proposed. This algorithm dynamically monitors data such as load power, voltage, and current from the load monitoring devices to identify the operational status of electrical equipment in real time.

2. Existing methods extract a variety of features from multiple domains. However, because so many load features are extracted, features that are potentially beneficial for load identification may be overlooked, and computation time and the computational demands on devices increase. This study therefore uses Kernel Principal Component Analysis (KPCA) to reduce the dimensionality of the extracted load features and obtain feature data that are more relevant to load identification.

3. Existing methods already account for the temporal aspects of load operation data, but their models are overly complex and use only effective features without special handling of redundant ones, which leads to bloated model structures. Many features are ignored and left unprocessed, reducing computational efficiency and load identification efficiency. To improve the efficiency and accuracy of load identification, this paper proposes the use of DB-LSTM networks. Stacked representation learners enhance both global and dynamic local features. The interconnection of the modules within the LSTM network ensures that the extracted features are transmitted to each layer, which increases the reuse of effective features, reduces data redundancy, and decreases the number of parameters.

Bilateral CUSUM Event Detection Method Based on Median Filtering in Sliding Windows
In non-intrusive load identification tasks, it is often necessary to collect data from actual field operations. Relying on data sets from intrusive load identification is very time-consuming and labor-intensive, which makes it costly. It is therefore essential to use non-intrusive load identification methods for event detection, feature extraction, data processing, and load identification of electrical devices. A crucial issue is how to effectively detect load events in the load data obtained from electricity users. Existing load identification methods fall into two types according to how they identify changes in the state of electrical devices: event-based methods and non-event-based methods. Non-event-based methods, also known as blind source separation methods, mainly include hidden Markov models, stochastic finite state machines, and others. Event-based methods can be referred to as rule-based or change point detection-based load identification methods. These methods typically scan the load data in one direction only. As a result, they often fail to detect local data changes or sudden load-switching events, so some changes in device status are missed. Effectively monitoring such events is therefore a challenge. This paper proposes an event detection method based on bilateral CUSUM and median filter denoising in sliding windows to determine whether load-switching events of electrical devices occur in the input data of the non-intrusive load identification model.

Bilateral Cumulative Sum Event Detection Method Using Sliding Windows
In existing non-intrusive load identification models, some use cluster-based methods or hidden Markov models, which are non-event-based. These models are effective in identifying switching events of electrical devices from large amounts of load data. However, they often fail to monitor short-term load-switching events, especially when there is no historical data for devices newly connected to the power system, which degrades identification capability under short-term load changes. Event-based methods excel at real-time monitoring of load data fluctuations and at observing the load behavior of electrical devices, and can therefore swiftly detect the occurrence of switching events.
The Cumulative Sum (CUSUM) algorithm, derived from the likelihood ratio model, is a control chart model that continuously accumulates the difference between the data under test and the reference data. This process analyzes fluctuations in the data, amplifies them, and thus detects the occurrence of switching events in electrical devices. Taking active power consumption as an example, when a load-switching event occurs, the active power data collected by the relevant sensors may exhibit either abrupt or gradual changes, from which transient events can then be determined algorithmically. However, the basic algorithm still has difficulty detecting short-term changes and local anomalies.
To address these issues, scholars have further proposed the bilateral CUSUM event detection method using sliding windows [26]. This method is a statistical approach for real-time monitoring of abnormal changes or events in time series data. Unlike the traditional CUSUM method, it introduces a sliding window to adapt to short-term changes and local anomalies that may occur in the system or process. Instead of applying the CUSUM algorithm to the entire load data sequence, a fixed-size window is moved over the sequence and the CUSUM algorithm is applied only within this window. This is more beneficial for detecting switch-on and switch-off events of electrical devices. Its working principle is as follows. First, consider a time series $X = \{x(k),\ k = 1, 2, \ldots\}$. Assuming that a device switching event occurs at a certain moment $\delta$, two statistical functions $g_k^+$ and $g_k^-$ are defined by accumulating $s_k^+$ and $s_k^-$, the positive and negative offsets of the current detection point after removing the influence of noise values.
The offsets $s_k^+$ and $s_k^-$ are expressed in terms of $\mu_0$, the average data value under normal conditions, and $\beta$, which represents the various types of noise present during data collection; a change in active power whose amplitude is less than $\beta$ is ignored. Because there is a certain time delay between the occurrence and the detection of an event, this delay is denoted $\zeta$. The sliding window bilateral CUSUM event detection process is then as follows. Firstly, when the data contain no transient event of the electrical equipment, the statistical functions remain essentially at 0 and no event is reported. Secondly, if a transient event occurs, values are accumulated in $g_k^+$ and $g_k^-$ according to the rise and fall of the data until the threshold $\Omega$ is reached, at which point a transient event is declared. If the accumulated value has not reached the threshold, the delay is incremented, that is, $\zeta_{i+1} = \zeta_i + 1$, and accumulation continues until the value exceeds the threshold, which is then considered a transient event.
The sliding window bilateral CUSUM method can effectively detect local changes in load data, thereby detecting the occurrence of related load-switching events and achieving more accurate load identification.
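The detection procedure described above can be illustrated with a minimal Python sketch. The recursion used here is the commonly used two-sided CUSUM form, $s_k^+ = x(k) - (\mu_0 + \beta)$, $s_k^- = (\mu_0 - \beta) - x(k)$, $g_k^\pm = \max(0,\ g_{k-1}^\pm + s_k^\pm)$, which is an assumption and not necessarily the exact expression used in this paper. The function names, the split into a mean window and a detection window, and the restart rule after an event are likewise hypothetical, and the delay $\zeta$ is not modelled explicitly.

```python
def bilateral_cusum(segment, mu0, beta, omega):
    """Run a two-sided CUSUM over one window.

    Assumed recursion (standard form, not taken from the paper):
        g_plus  = max(0, g_plus  + x - (mu0 + beta))
        g_minus = max(0, g_minus + (mu0 - beta) - x)
    Returns (offset, direction) for the first sample at which either
    statistic exceeds the threshold omega, or None if neither does.
    """
    g_plus = 0.0
    g_minus = 0.0
    for offset, value in enumerate(segment):
        g_plus = max(0.0, g_plus + value - (mu0 + beta))
        g_minus = max(0.0, g_minus + (mu0 - beta) - value)
        if g_plus > omega:
            return offset, "on"
        if g_minus > omega:
            return offset, "off"
    return None


def detect_events(power, mean_len, detect_len, beta, omega):
    """Slide a (mean window, detection window) pair over the power series.

    mu0 is estimated as the mean of the mean window; the CUSUM is applied
    only inside the detection window that follows it.
    """
    events = []
    start = 0
    while start + mean_len + detect_len <= len(power):
        baseline = power[start:start + mean_len]
        mu0 = sum(baseline) / mean_len
        first = start + mean_len
        hit = bilateral_cusum(power[first:first + detect_len], mu0, beta, omega)
        if hit is None:
            start += 1  # no event: slide the windows by one sample
        else:
            offset, direction = hit
            index = first + offset
            events.append((index, direction))
            start = index + 1  # restart so the new level becomes the baseline
    return events


# Synthetic active power: 10 W, then 110 W from sample 20, back to 10 W at 40.
power = [10.0] * 20 + [110.0] * 20 + [10.0] * 20
print(detect_events(power, mean_len=5, detect_len=5, beta=5.0, omega=50.0))
```

On this synthetic trace the sketch reports one switch-on and one switch-off event; real load data would require tuning `beta`, `omega`, and the window lengths.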

Data Denoising with Median Filtering
In load identification tasks, it is crucial to collect effective and accurate load data such as current, voltage, power, and sensor measurements related to the load in the power system. However, these data often contain various interferences, such as Gaussian noise.
Moreover, during event detection, load data is highly susceptible to noise from measurement devices, communication interference, and other external factors, leading to unnecessary fluctuations and disturbances in the data. Additionally, the power system experiences various sudden events, such as equipment switching and fault recovery, which can cause dramatic changes in load data and thereby reduce detection accuracy.
Therefore, this paper uses median filtering for denoising [27]. Median filtering is a nonlinear signal processing technique based on the theory of order statistics, also known as a nonlinear filter or a statistical order filter, and it is an effective noise reduction technique for data collected on site. Its principle is to take each load data point of an electrical device as a center and replace it with the median of all data points within a neighborhood window around it [28].
For data requiring noise reduction, median filtering accomplishes this task while preserving the edges of the signal and maintaining the overall structure of the data, unlike linear filtering methods, which may blur the waveform by treating edge data as noise. Median filtering is also simple to compute and easy to implement in hardware, which makes it more suitable than other filtering methods for the denoising task in the scenario considered in this paper. The specific process is as follows. Let the load data sequence of the electrical equipment be $x_j$ ($-\infty < j < +\infty$). To apply median filtering to this time series, a window of length $l = 2N + 1$ is set, where $N$ is a positive integer. For a given sample $x(i)$, the window consists of the $2N + 1$ points $x(i - N), \ldots, x(i), \ldots, x(i + N)$, with $x(i)$ at its center. These points are sorted by magnitude and the middle value is taken as the output of the median filter for this window: $y(i) = \operatorname{median}\{x(i - N), \ldots, x(i), \ldots, x(i + N)\}$.
This paper proposes a bilateral CUSUM event detection method using sliding windows based on median filtering to detect the switching of electrical equipment effectively. Applying median filtering to the load data as a preliminary noise reduction step reduces the interference of noise with the subsequent detection of switching events.
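The window operation described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments; the function name is hypothetical, and truncating the window at the two ends of the sequence is an assumption, since the text defines the filter only for an infinite sequence.

```python
from statistics import median


def median_filter(x, n):
    """Median filter with window length 2*n + 1 centred on each sample.

    Assumption: near the ends of the sequence the window is truncated to
    the samples that exist, rather than padded.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    filtered = []
    for i in range(len(x)):
        lo = max(0, i - n)
        hi = min(len(x), i + n + 1)
        filtered.append(median(x[lo:hi]))
    return filtered


# A single 500 W spike is removed, while the step from 10 W to 100 W
# stays at the same position (edge preservation).
noisy = [10, 10, 500, 10, 10, 100, 100, 100]
print(median_filter(noisy, n=1))
```

A larger `n` suppresses wider disturbances but also removes genuine events shorter than about `n` samples, so the window length has to be chosen with the shortest appliance event of interest in mind.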

Non-Intrusive Load Monitoring Method Incorporating Feature Dimension Reduction and DB-LSTM
In existing load identification tasks, models process the raw steady-state or transient features extracted from the operation of electrical devices. This processing may involve time-frequency domain transformations or other feature extraction methods to obtain higher-dimensional features. Among these features, some may be highly beneficial for load identification, while others may have little or no use. Selecting relevant and beneficial features is therefore a noteworthy research problem.

Feature Dimension Reduction Method Based on Kernel Principal Component Analysis
In non-intrusive load monitoring tasks, the model performs a sequence of operations, namely event detection, feature extraction, data preprocessing, and load identification. The data input into the model is often voluminous and complex, containing both useful load feature information and redundant information that does not help load identification. Selecting relevant data features is therefore an important research issue. Some scholars have proposed using Principal Component Analysis (PCA) [29] to reduce the feature dimension of load data during device switching events in non-intrusive load identification. However, traditional PCA is a linear feature processing method and is less sensitive to changes in data that involve randomness and occasional events with nonlinear relationships, such as the switching of electrical devices. Its feasibility in non-intrusive load identification is therefore limited.
The more recently proposed KPCA method alleviates this problem. KPCA performs principal component analysis on nonlinear data. By mapping the data into a high-dimensional space, it captures the nonlinear correlations between input features. Consequently, it selects features that are more conducive to load identification, extracts more of the crucial information, achieves feature dimensionality reduction, and supports a more precise model. This in turn reduces computational complexity, improves computational efficiency, and lessens hardware requirements. KPCA can be used with several kernel functions, such as linear kernels and radial basis function (RBF) kernels. Among the RBF kernels, the Gaussian kernel is one of the most common and effective, and in its standard form is written as

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$$

where $x_i$ and $x_j$ represent the input load data of the electrical equipment and $\sigma$ is the parameter that sets the shape of the kernel function. Because the load data selected in this paper are nonlinear and complex, the RBF kernel is well suited among the kernel functions available in KPCA: it captures nonlinear relationships by mapping the data into a high-dimensional space in which additional nonlinear structure becomes available. Since the width of the RBF kernel can be adjusted through $\sigma$, the kernel can be adapted to different forms of data, which improves the effectiveness of the analysis.
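The first two steps of KPCA, building the Gaussian kernel matrix and centering it in feature space, can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the kernel uses the standard $2\sigma^2$ normalization, the function names and the toy feature vectors are hypothetical, and the subsequent eigendecomposition of the centered matrix (which yields the principal components) is omitted.

```python
import math


def gaussian_kernel_matrix(samples, sigma):
    """K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) for all sample pairs."""
    n = len(samples)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            kernel[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return kernel


def center_kernel(kernel):
    """Double-centre the kernel matrix: K - 1K - K1 + 1K1.

    Because K is symmetric, its column means equal its row means.
    """
    n = len(kernel)
    row_mean = [sum(row) / n for row in kernel]
    total_mean = sum(row_mean) / n
    return [
        [kernel[i][j] - row_mean[i] - row_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


# Three toy feature vectors (hypothetical values, e.g. two load features each).
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = gaussian_kernel_matrix(features, sigma=1.0)
Kc = center_kernel(K)
print(K[0][1], Kc[0])
```

The leading eigenvectors of the centered matrix give the projections onto the kernel principal components; keeping only the first few of them is what reduces the feature dimension.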

DB-LSTM Non-Intrusive Load Identification Model
Various deep-learning methods have already been applied to non-intrusive load identification tasks. However, issues such as model complexity and high computational demand remain. Additionally, some types of LSTM models tend to overlook the connections between load data of the same electrical device over different lengths of time [30,31]. Such connections are crucial for users who can collect or store long-term load data and should not be ignored. The DB-LSTM network addresses this by analyzing data across different time scales using densely connected LSTM modules, thus achieving more accurate load identification.
To overcome the shortcomings of existing methods, this paper introduces a long-term temporal model for non-intrusive load identification, namely the DB-LSTM network. This network uses a stack of representation learners to enhance both global and dynamic local features. By integrating the modules within the LSTM network and establishing interconnections between layers, the extracted features are propagated to every layer, which ensures effective reuse of valuable features, reduces data redundancy, and minimizes the number of parameters. Short-term and long-term analysis patterns are also established to facilitate learning of the temporal relationships in load data. Furthermore, forward and backward bidirectional learning is carried out around the current target point, capturing a broader range of load information and improving the efficiency of load identification.
Following the concept of densely connected networks, the LSTM network is expanded into a densely connected form. This involves using skip connections within the densely connected network blocks, as shown in Figure 1, where $y_t^l$ represents the densely connected module at time step $t$ in layer $l$, and the colored lines indicate the connections between different dense layers.
The DB-LSTM network models the temporal patterns of the switching actions of electrical devices through the structure described above. It takes full advantage of the dense connection modules and bidirectional temporal feature detection to reduce computational complexity and enhance event detection capabilities. The main feature of the DB-LSTM network is that it achieves not only inter-layer connectivity but also cross-level connections across multiple layers. Its structure is shown in Figure 2. This structure effectively reduces the gradient explosion problem commonly found in traditional LSTM networks. Additionally, by facilitating communication across multiple levels, it enhances effective feature reuse.
In this DB-LSTM model, $\overleftrightarrow{y}_t$ refers to the output of the $t$-th LSTM module, and is calculated as

$$\overleftrightarrow{y}_t = [\overrightarrow{y}_t,\ \overleftarrow{y}_t] \tag{6}$$

where $\overrightarrow{y}_t$ and $\overleftarrow{y}_t$ in Formula (6) are the outputs at the $t$-th step in the two directions of the module in that group, $[\cdot,\cdot]$ indicates that the two outputs are concatenated directly, and $\rightarrow$ and $\leftarrow$ denote the direction of the output.
The output $y_t^l$ of the $l$-th layer of the LSTM module at the $t$-th time step is computed from the set of features extracted by the modules that precede this layer.
$H_l(X)$ denotes the $l$-th layer in the DB-LSTM network, where $X$ is the input to each layer of the LSTM, obtained by merging the output of the previous LSTM layer with the input feature $x_t$ of the $t$-th time step.
To obtain a comprehensive overall temporal correlation feature of the load data, this paper represents the last output of the DB-LSTM network as $\varphi(S, F, W_S, W_L)$, where $S$ is the sampling stack, $F$ is the primary network, and $W_S$ and $W_L$ represent the weights of the SRL and DB-LSTM backbone networks, respectively. The cross-entropy function is used to calculate the loss. Finally, the scores generated by the network are merged in the fusion layer, and a multi-scale sliding window is used to fuse all outputs into scores. In the fusion formula, $s$ represents the starting time step of the sliding window, and $K$ represents the number of time steps.
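To make the effect of dense connectivity on layer sizes concrete, the following Python sketch computes the input width and parameter count of each layer in a densely connected bidirectional LSTM stack. It rests on assumptions that are not confirmed by the text: each layer is assumed to receive the raw input $x_t$ concatenated with the bidirectional outputs of all earlier layers (DenseNet-style), every layer uses the same hidden size per direction, and each LSTM gate has a single bias vector (some frameworks, such as PyTorch, keep two). The function names and the example sizes are hypothetical.

```python
def lstm_param_count(input_size, hidden_size):
    """Parameters of one unidirectional LSTM layer.

    Four gates, each with an input weight matrix, a recurrent weight
    matrix and one bias vector (assumption: a single bias per gate).
    """
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)


def dense_bilstm_layout(input_size, hidden_size, num_layers):
    """Return (input widths per layer, total parameter count).

    Assumed wiring: layer l sees x_t plus the concatenated forward and
    backward outputs (2 * hidden_size values) of every earlier layer.
    """
    widths = []
    total_params = 0
    for layer in range(num_layers):
        width = input_size + layer * 2 * hidden_size
        widths.append(width)
        # One forward and one backward LSTM per layer.
        total_params += 2 * lstm_param_count(width, hidden_size)
    return widths, total_params


widths, total = dense_bilstm_layout(input_size=8, hidden_size=16, num_layers=3)
print(widths, total)
```

Because every layer can reuse all earlier features through the skip connections, each layer can be kept narrow; the hidden size plays the role of a growth rate, and the sketch shows how the input width grows linearly with depth under this assumed wiring.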

Non-Intrusive Load Identification Model Incorporating Feature Dimension Reduction and DB-LSTM
The illustration in Figure 3 outlines the specific procedure of the KPCA-DBLSTM non-intrusive load identification model proposed in this paper.

Step 2: Perform denoising on the data using median filtering.
Step 3: Detect events in the data using the sliding window bilateral cumulative sum (CUSUM) method.
Step 4: Extract features from the detected data, focusing on multi-dimensional characteristics.
Step 5: Apply KPCA for dimensionality reduction, retaining the features that are more effective for load identification.
Step 6: Input the processed data into the DB-LSTM network for load identification.
Step 7: Obtain the load identification results and output the relevant information.

Case Analysis

Evaluation Metrics and Model Parameters
This paper uses the public REFIT Power Data dataset, which consists of electricity usage data from multiple UK households between 2013 and 2015 [32]. For the non-intrusive load disaggregation experiments, seven types of common household load devices from households 2, 3, 5, 6, and 15 were selected [33]. A portion of the load data from household 2 serves as the training set and the remainder of that household's data as the test set, while the data from households 3, 5, 6, and 15 are used as a validation set. Some devices exhibit variable states and operating modes in different households, including single-state and multi-state switching devices. Because these devices have similar active power values and variable operating modes, load disaggregation on this data is challenging, which makes it a demanding test of the load identification capability of the proposed method. Detailed information about the seven types of loads is shown in Table 1. This paper selects $A_{cc}$, MAE, and F1 as evaluation metrics for the load identification model [34]. $A_{cc}$ represents the overall accuracy of the load identification results. MAE reflects the real-time accuracy of the model, with a lower value indicating higher accuracy. F1 reflects the discrepancy between the actual load states and those identified by the model in real situations. Together, these three metrics provide a comprehensive evaluation of the load identification results [35]. Formulas 21-25 give the calculation of these metrics. In the formulas, $\hat{\beta}_t^{(i)}$ represents the power value of electrical appliance $i$ obtained by the load decomposition algorithm at time $t$, and $\beta_t^{(i)}$ represents the actual power value of appliance $i$ at time $t$. $A_{cc}$ denotes the accuracy of the decomposition over all appliances in the entire household. MAE is the mean absolute error of the decomposed power values over the period from $T_0$ to $T_1$. PRE represents precision, REC represents recall, and F1 is the harmonic mean of precision and recall. TP counts the time points at which an appliance is in a working state and is identified as working; FP counts the time points at which it is identified as working but is not actually in a working state; FN counts the time points at which it is in a working state but is not identified as working.
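The MAE, precision, recall, and F1 definitions above correspond to the following minimal Python sketch, which uses the standard textbook forms of these metrics; the function names and the sample values are hypothetical, and $A_{cc}$ is left out because its exact expression is not given in the text.

```python
def mean_absolute_error(estimated, actual):
    """Mean absolute error between estimated and actual power values."""
    if len(estimated) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


def precision_recall_f1(predicted_on, actual_on):
    """Precision, recall and F1 from per-time-point on/off states."""
    pairs = list(zip(predicted_on, actual_on))
    tp = sum(1 for p, a in pairs if p and a)        # working and identified as working
    fp = sum(1 for p, a in pairs if p and not a)    # identified as working but off
    fn = sum(1 for p, a in pairs if not p and a)    # working but not identified
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical states of one appliance over five time points (1 = working).
predicted = [1, 1, 0, 1, 0]
actual = [1, 0, 0, 1, 1]
print(precision_recall_f1(predicted, actual))
print(mean_absolute_error([100.0, 0.0, 50.0], [90.0, 10.0, 50.0]))
```

Returning 0.0 when a denominator is zero is a design choice of the sketch for appliances that never switch on in the evaluated period.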

Experimental Parameter Settings and Effect Analysis
Experiment 1: Investigates the filtering effect of the median filtering method proposed in this paper and its impact on the accuracy of the load identification results. The comparison uses waveforms before and after filtering, as well as the evaluation metrics $A_{cc}$, MAE, and F1.
Experiment 2: Validates the effectiveness of the feature dimensionality reduction method proposed in this paper and compares it with various other methods.
Experiment 3: Conducts comprehensive experiments comparing results between trained and untrained data.
The above three sets of experiments aim to thoroughly validate the effectiveness and generality of the previously proposed methods in the task of load identification.

Results
The effectiveness experiment of the median filtering method was conducted first. The filtering effect is shown in Figure 4, where (a) is the waveform of the load data without median filtering and (b) is the waveform of the load data after median filtering. After median filtering, the prominent spikes in the load data are reduced, making the overall waveform smoother without losing the general fluctuation pattern, so the overall information about load changes is retained. Table 2 shows that filtering does not degrade load identification; on the contrary, the accuracy of load identification improves slightly.
In addition, Table 2 compares the three metrics for the LSTM, CNN, and HMM models and the proposed KPCA-DBLSTM model under two conditions: with and without median filtering. According to Table 2, feeding median-filtered load data into the models does not decrease the overall recognition rate. Instead, all three evaluation metrics improve to different degrees. The $A_{cc}$ metric improves by 1.2% to 2.6%, the MAE metric improves slightly, and the F1 metric improves by 1.3% to 4%. In this experiment, the KPCA-DBLSTM model proposed in this paper achieves the best performance on all three evaluation metrics. Overall, the results in Table 2 show that the median filtering method employed in this paper can improve the load identification performance of multiple models, although the size of the improvement varies among models. This is because, although median filtering effectively reduces the impact of noise and outliers, it inevitably also filters out some of the distinctive feature information carried by extreme values. Consequently, the method improves the accuracy of the subsequent identification models, but the increase is not large.
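The smoothing behavior described above can be reproduced with a minimal median filter sketch. The window length, the edge padding, and the toy power values are assumptions chosen for illustration; the paper does not state its filter settings in this section.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(signal, window=5):
    """Sliding median filter with edge padding; window must be odd."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    signal = np.asarray(signal, dtype=float)
    half = window // 2
    # Repeat the edge values so the output has the same length as the input.
    padded = np.pad(signal, half, mode="edge")
    return np.median(sliding_window_view(padded, window), axis=1)


# Toy aggregate power trace (watts): a one-sample spike at index 3,
# followed by a genuine switch-on event at index 6.
power = np.array([60.0, 61.0, 60.0, 900.0, 59.0, 60.0,
                  2000.0, 2001.0, 2000.0, 1999.0, 2000.0])
smoothed = median_filter(power, window=3)
```

With a window of 3, the isolated 900 W spike is replaced by its neighbors' level, while the sustained step to about 2000 W is preserved. This is the property that lets filtering remove outliers without erasing switching events, and it also shows why very short genuine transients can be filtered out along with the noise.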
Next, experiments were conducted to validate the effectiveness of the feature dimensionality reduction method; the results for the three evaluation metrics are given in Table 3. To isolate the effect of dimensionality reduction, only the feature processing stage was changed: the three metrics were compared with and without feature dimensionality reduction applied after feature computation.
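As an illustration of the dimensionality reduction step being evaluated, the following is a minimal Kernel Principal Component Analysis sketch in NumPy. The RBF kernel, the `gamma` value, the number of retained components, and the random feature matrix are all assumptions for illustration; the paper's kernel choice and settings are not given in this section.

```python
import numpy as np


def rbf_kernel_pca(X, n_components=2, gamma=1.0):
    """Project the rows of X onto the leading kernel principal components.

    Uses an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) (assumed choice).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    K = np.exp(-gamma * np.maximum(sq_dists, 0.0))
    # Center the kernel matrix in feature space.
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns ascending eigenvalues; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Projection of the training samples: unit eigenvector * sqrt(eigenvalue).
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


# Hypothetical feature matrix: 50 detected events, 8 extracted features each.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 8))
reduced = rbf_kernel_pca(features, n_components=3, gamma=0.1)
```

The reduced matrix has one row per event and one column per retained component, and these columns would replace the original feature vector as the classifier input in the "with dimensionality reduction" condition.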
From the results in Table 3, it can be observed that the feature dimensionality reduction method improves load identification accuracy. The $A_{cc}$ metric improves noticeably, by 3.7% to 5.4%, and the MAE metric also improves slightly. The F1 metric shows a relatively stable improvement of 3.5% to 3.8%. This indicates that feature dimensionality reduction effectively eliminates load features that are ineffective or have minimal impact on the load identification task, so that the model's computation concentrates on the features most beneficial for load identification, which raises the evaluation metric values. However, the improvement differs among models. This is because different models emphasize different types of features, so the features retained after dimensionality reduction are not equally well suited to every model.
The final step is a comparative experiment to validate the model's generalization performance. The generalization capability of the proposed model is verified by comparing results on untrained data from HOUSE2 with results on data from HOUSE3, 5, 6, and 15.
Figure 5 illustrates the comparative results of the three metrics across multiple load identification models. The comparison in Figure 5a-c indicates that the proposed KPCA-DBLSTM model has a significant advantage across multiple scenarios and metrics. For the $A_{cc}$ metric, the KPCA-DBLSTM model shows a 7.6-15.1% increase compared to the other models in different scenarios. The MAE metric improves by 3.8% to 10.18%, and the F1 metric improves by 6.7-17.1%. These comparisons demonstrate the overall effectiveness of the non-intrusive load identification model proposed in this paper.
Furthermore, when validating the generalization performance of the KPCA-DBLSTM model on data from different households, the accuracy fluctuates to some extent. For instance, the model performs best on the HOUSE2 data, part of which was used for training, while the load identification results for the other households (HOUSE3, 5, 6, and 15) decrease to varying degrees. This is because, although different households may use similar electrical devices, their electricity consumption behaviors, power consumption, and other load characteristics differ from those of HOUSE2. Therefore, the model remains effective on these households, but with somewhat lower load identification accuracy.

Discussion
This paper proposes a non-intrusive load identification method based on DB-LSTM that incorporates feature dimensionality reduction. The experimental comparisons show that the model improves several stages of the load identification task, including event detection, data processing, and load identification. The median filtering method mitigates the impact of extreme data on the experiments, feature dimensionality reduction concentrates computational resources on the features beneficial to load identification, and the DB-LSTM network reduces model complexity and the hardware requirements of load identification devices through repeated reuse of features.
The proposed model simplifies load identification procedures in real-world scenarios and increases computational speed, which improves the efficiency of analyzing electricity consumption behavior. Load identification tasks can therefore be executed more conveniently and accurately, providing power users with more authentic and effective electricity consumption data and helping them optimize their electricity consumption habits.
However, the computational process of the proposed method remains relatively extensive and cumbersome, requiring a complete workflow for load identification tasks. This imposes higher computational demands on load identification devices, potentially limiting the widespread deployment of this technology, especially in cost-sensitive or resource-constrained environments. Therefore, further research is needed on ways to reduce the computational burden on load identification devices, such as simplifying algorithmic processes, reducing computational complexity, and lowering device computational requirements.
For instance, more efficient feature selection and dimensionality reduction techniques could be investigated, or lighter network structures could be developed. Model compression and quantization techniques could also be explored to reduce the size and computational demands of the model, making it more suitable for operation on resource-constrained devices. This could further facilitate interoperability between edge devices and cloud devices, leveraging the simplified network structure of DB-LSTM to reduce the hardware requirements of the edge detection devices installed in homes. This, in turn, would address the current challenge of limited installation of load identification devices due to cost, contribute to the further development of smart grids, and promote advances in intelligent electricity usage.

Figure 3. Structure of the non-intrusive load identification model incorporating feature dimension reduction and DB-LSTM.
Step 1: Collect aggregate data through sensors or other devices.
Step 2: Perform denoising on the data using median filtering.
Step 3: Detect events in the data using a sliding window's bilateral cumulative sum (CUSUM).
Step 4: Extract features from the detected data, focusing on multi-dimensional characteristics.
Step 5: Employ the KPCA technique for data dimension reduction, extracting features that are more effective for load identification tasks.
Step 6: Input the processed data into the DB-LSTM network for load identification.
Step 7: Obtain the results of the load identification and output relevant information.
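The event detection step (Step 3) can be sketched as a two-sided CUSUM over a sliding reference window. This is a simplified illustration, not the paper's exact algorithm: the function name, the window length, the `drift` allowance, the decision `threshold`, and the toy signal are all assumptions.

```python
import numpy as np


def bilateral_cusum_events(power, mean_window=5, drift=10.0, threshold=200.0):
    """Detect switch-on/off events with a two-sided (bilateral) CUSUM.

    The reference level is the mean of up to `mean_window` preceding samples
    taken from the current steady segment. All parameter values are assumed.
    """
    power = np.asarray(power, dtype=float)
    events = []
    g_pos = g_neg = 0.0   # cumulative sums for upward and downward changes
    start = 0             # first sample of the current steady segment
    for k in range(len(power)):
        ref = power[max(start, k - mean_window):k]
        if len(ref) == 0:
            continue      # no reference samples yet
        deviation = power[k] - ref.mean()
        g_pos = max(0.0, g_pos + deviation - drift)
        g_neg = max(0.0, g_neg - deviation - drift)
        if g_pos > threshold:
            events.append((k, "on"))
        elif g_neg > threshold:
            events.append((k, "off"))
        else:
            continue
        # Restart both sums and the reference window after an event.
        g_pos = g_neg = 0.0
        start = k
    return events


# Toy trace: 60 W baseline, a 2000 W appliance on for samples 10-19.
power = np.array([60.0] * 10 + [2060.0] * 10 + [60.0] * 10)
events = bilateral_cusum_events(power)
```

Tracking the upward and downward sums separately is what makes the detector bilateral: a rise in power accumulates in one sum and signals a switch-on, while a drop accumulates in the other and signals a switch-off. The drift term keeps small fluctuations from accumulating into false events.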

Figure 4. (a) Waveform of the load data without median filtering; (b) waveform of the data after denoising with median filtering.
Figure 5. Comparison of load identification result metrics across multiple models: (a) ACC metric comparison chart; (b) MAE metric comparison chart; (c) F1 metric comparison chart.

Electronics 2024, 13, x
tively reduces the gradient explosion problem commonly found in traditional LSTM networks. Additionally, by facilitating communication across multiple levels, it enhances effective feature reuse.

Table 1. Equipment Type Classification and Information Table.

Table 2. Comparison of recognition result metrics for load data with and without filtering.

Table 3. Comparison of dimensionality reduction effects.