Research of Non-Intrusive Load Decomposition Considering Rooftop PV Based on IDPC-SHMM

Liu, Xingqi; Liu, Xuan; Zheng, Angang; Dou, Jian; Du, Yina

doi:10.3390/en18184935


Xingqi Liu ¹,²,*, Xuan Liu ¹, Angang Zheng ¹, Jian Dou ¹ and Yina Du ¹

¹ China Electric Power Research Institute, Metrology Institute, Beijing 100192, China

² School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China

* Author to whom correspondence should be addressed.

Energies 2025, 18(18), 4935; https://doi.org/10.3390/en18184935

Submission received: 24 June 2025 / Revised: 22 July 2025 / Accepted: 31 July 2025 / Published: 17 September 2025

(This article belongs to the Section A2: Solar Energy and Photovoltaic Systems)


Abstract

For households with rooftop photovoltaic (PV) systems, electricity meters display only the net load power, in which the load and the PV generation are coupled, so both the PV output and the load demand are unknown. A non-intrusive load decomposition algorithm based on Improved Density Peak Clustering (IDPC) and the Simplified Hidden Markov Model (SHMM) is proposed to decompose PV generation power and load consumption power from net load power data, providing data support for power demand-side management. First, the Improved Density Peak Clustering algorithm is used to obtain load power templates adaptively. Then, historical power data from PV proxy sites are classified by weather type, and radiation proxies are used to estimate the historical PV power of the target users. These estimated PV power data are combined with historical load information to derive the parameters of the SHMM under different PV output conditions, from which the load decomposition objective function is constructed. Finally, the net load power data are used to achieve non-intrusive load decomposition and PV power extraction for households with PV systems. The effectiveness of the proposed algorithm is validated on the AMPds and Pecan Street datasets.

Keywords:

non-intrusive load decomposition; net load power; density peak clustering; simplified hidden Markov model; radiation proxy

1. Introduction

With the advancement of the national rooftop PV development pilot program [1], a large number of rural households have installed distributed rooftop PV systems. However, gateway meters display only the net load after the loads and the distributed PV are coupled, which poses a great challenge to analyzing demand-side resource responsiveness and assessing PV output [2,3]. To strengthen the role of demand response in power systems, it is therefore crucial to develop non-intrusive load decomposition algorithms that separate load consumption power from PV generation power in households with PV systems.

Although rooftop PV systems are equipped with dedicated metering devices, their low level of informatization and poor data collection quality limit them to uploading daily generation totals [4], which makes them unusable for non-intrusive load disaggregation. In recent years, the deployment of smart meter-based advanced metering systems has enabled utilities to acquire high-resolution net load data through gateway meters [5]. Such data provide critical support for Non-Intrusive Load Monitoring and Decomposition (NILMD) technology to achieve fine-grained PV output assessment and load decomposition. If PV disaggregation becomes sufficiently accurate, these systems could replace distributed PV grid-connection meters, potentially yielding significant savings in investment and maintenance costs.

The Non-Intrusive Load Monitoring (NILM) technique was first proposed by Hart [6] and has attracted much attention owing to its low cost, ease of deployment, and good privacy protection. Given the cost of high-frequency sampling devices and the data storage limitations of large-scale implementation, NILM based on low-frequency sampling has become a key research focus in recent years. In [7], the Hidden Markov Model (HMM) is improved, and a scalable NILM method based on segmented quadratic integer constrained programming is proposed. A sparse superstate HMM that exploits the sparsity of load power data is proposed in [8], which reduces the complexity of the algorithm while improving accuracy. However, HMM-based decomposition depends on the results at previous moments and is therefore prone to error accumulation. By employing two-stage decomposition to extract the low-frequency components of smart meter data, an EV charging event recognition method based on event detection and Dynamic Time Warping (DTW) is proposed in [9]. An NILM algorithm based on event detection and a Convolutional Neural Network (CNN) is proposed in [10], which uses Complementary Ensemble Empirical Mode Decomposition (CEEMD) to increase the diversity of and differences among feature maps and thereby improve recognition. However, none of the above methods takes rooftop PV into account, so they cannot handle the situation in which both the PV output and the load demand are unknown.

Some researchers have begun preliminary explorations of the load decomposition problem for households with rooftop PV. In [2], PV generation and total load demand are estimated by combining a PV physical model with a Mixed Hidden Markov (MHM) model. A PV identification method combining statistics with a Support Vector Machine (SVM) is proposed in [3]; however, it identifies only the switching state of the PV system (whether or not it is generating). Overall, existing studies focus on separating PV generation power from the total load demand [11,12]; the resolution of the separated data is relatively low, and the total load demand is not further decomposed into individual loads.

Therefore, to address the lack of research on load decomposition for households with rooftop PV systems, this paper proposes a method based on Improved Density Peak Clustering and the Simplified Hidden Markov Model. In the training stage, the load power templates are first obtained with the Improved Density Peak Clustering algorithm, and the historical power data of the PV proxy site are then classified into four categories by weather type. A radiation proxy is used to estimate the historical PV power of the user to be decomposed, which is combined with the historical load power to form the training set. The HMM is then simplified, its parameters are obtained for the four typical weather types, and the objective function is constructed. In the decomposition stage, the Euclidean distance is used to measure the similarity between the net load power of the day to be decomposed and the typical PV power curves, and the objective function for that day's weather type is selected to perform the load decomposition. Finally, the effectiveness of the proposed algorithm is verified on two public datasets.

2. Load Power Template Construction and Photovoltaic Data Processing

2.1. IDPC-Based Load Power Template Construction

Modeling household appliances as finite state machines and obtaining a representative power value for each state is an important prerequisite for load decomposition [13]. The K-means clustering algorithm requires the number of clusters to be set empirically in advance, while the Affinity Propagation (AP) algorithm is computationally more complex. Therefore, this paper employs Density Peak Clustering (DPC) [14] to determine the load state power information; DPC does not require a predetermined number of clusters and involves no iterative process. However, DPC still has drawbacks: the clustering results are sensitive to the truncation distance, and the number of clusters must be read from the distance–density decision graph. This paper therefore proposes an Improved Density Peak Clustering (IDPC) algorithm that incorporates load power characteristics.

Assume that the active power sequence of a load is $P = (p_1, p_2, \dots, p_m)$. Since some loads are off for the vast majority of the time, data smaller than 10 W are eliminated before clustering, and the processed power sequence is denoted as $\tilde{P} = (\tilde{p}_1, \tilde{p}_2, \dots, \tilde{p}_n)$, where n is the length of the sequence.

Two parameters are essential for clustering: the local density $ρ_i$ and the relative distance $δ_i$. The local density $ρ_i$ is defined in Equation (1):

ρ_{i} = \sum_{j = 1, j \neq i}^{n} χ(d_{ij} - d_{c})

(1)

where

χ(d_{ij} - d_{c}) = \begin{cases} 1, & d_{ij} - d_{c} < 0 \\ 0, & d_{ij} - d_{c} \geq 0 \end{cases}

(2)

Here, $d_{ij} = \sqrt{(\tilde{p}_i - \tilde{p}_j)^2} = |\tilde{p}_i - \tilde{p}_j|$ denotes the Euclidean distance between two power values, and $d_c$ is the truncation distance: point j contributes to the local density of point i only if the two power values are closer than $d_c$. To reduce the error caused by setting the truncation distance manually, density information entropy is used to determine it adaptively [15]:

H (d_{c}) = - \sum_{i = 1}^{n} \frac{ρ_{i}}{Ω} \log \frac{ρ_{i}}{Ω}

(3)

where $Ω = \sum_{i=1}^{n} ρ_i$. The smaller the entropy $H(d_c)$ in Equation (3), the more pronounced the differences in the density distribution of the sample points, and the easier it is to identify the cluster centers. Selecting the optimal $d_c$ therefore amounts to minimizing the density information entropy, and the optimal truncation distance is found adaptively from Equation (4).

d_{c} = \underset{d_{c}}{\arg \min} (H (d_{c}))

(4)
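The adaptive choice of $d_c$ in Equations (1)–(4) can be sketched in Python as a grid search. This is a minimal sketch assuming a 1-D NumPy array of power samples (already filtered at 10 W) and a user-supplied grid of positive candidate values; the function names and the candidate grid are our own illustrations, not taken from the paper.

```python
import numpy as np

def local_density(p, d_c):
    """Eq. (1): number of other points closer than d_c (d_c must be > 0)."""
    d = np.abs(p[:, None] - p[None, :])      # d_ij = |p_i - p_j| (1-D Euclidean distance)
    return (d < d_c).sum(axis=1) - 1         # subtract the j == i term (d_ii = 0)

def density_entropy(rho):
    """Eq. (3): H = -sum(rho_i / Omega * log(rho_i / Omega)); zero densities add nothing."""
    q = rho / rho.sum()
    q = q[q > 0]
    return float(-(q * np.log(q)).sum())

def select_cutoff(p, candidates):
    """Eq. (4): grid search for the d_c that minimises the density entropy."""
    best_dc, best_h = None, np.inf
    for d_c in candidates:
        rho = local_density(p, d_c)
        if rho.sum() == 0:                   # no neighbours at all: entropy undefined
            continue
        h = density_entropy(rho)
        if h < best_h:
            best_dc, best_h = float(d_c), h
    return best_dc, best_h

# Hypothetical power samples (W) of a two-state appliance, already filtered at 10 W
p = np.array([100.0, 102.0, 98.0, 1000.0, 1003.0, 997.0, 1001.0])
print(local_density(p, 10.0).tolist())       # [2, 2, 2, 3, 3, 3, 3]
print(select_cutoff(p, np.linspace(5.0, 50.0, 10)))
```

With the hard cut-off kernel of Equation (2), very small candidate values can give degenerate density sequences, so the bounds of the grid matter on real meter data.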

The local density sequence $Θ = (ρ_1, ρ_2, \dots, ρ_n)$ corresponding to $\tilde{P}$ is obtained from Equations (1)–(4).

The relative distance $δ_i$ is calculated from Equation (5):

δ_{i} = \begin{cases} \max_{j \neq i} d_{ij}, & \text{if } ρ_{i} = \max\{Θ\} \\ \min_{j \neq i,\ ρ_{j} > ρ_{i}} d_{ij}, & \text{otherwise} \end{cases}, \quad j \in \{1, 2, \dots, n\}

(5)

To select appropriate cluster centers, an auxiliary parameter $ω_i$ combining $ρ_i$ and $δ_i$ is defined:

ω_{i} = ρ_{i} \cdot δ_{i}

(6)

Because cluster centers have much larger local densities and relative distances than the other points, the ω values show a clear break when sorted in descending order. This paper computes the slope between each pair of adjacent points in the sorted sequence; the point where the sequence drops most steeply, together with the points ranked ahead of it, is taken as the set of cluster centers.

To prevent power fluctuations from splitting samples that belong to the same cluster into several clusters, a relative distance threshold (Equation (7)) is imposed, and candidate centers with a smaller relative distance are excluded:

δ_{i} > 10\% \cdot \max(\tilde{P})

(7)

In summary, the representative power value of each load state, i.e., the power template, is obtained from Equations (1)–(7).

Rooftop PV output, however, is highly random and volatile and cannot be discretized into a finite set of states; PV is therefore treated as a background load.
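The center selection of Equations (5)–(7) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `idpc_templates` is hypothetical; $d_c$ is passed in (for instance from an entropy search as in Equation (4)); ties in ρ are broken by sample order; and the "fastest-decreasing slope" is measured on a logarithmic scale of the sorted ω values. The log scale is our substitution, because Equation (5) assigns the top-density point the maximum distance, which can otherwise dominate plain linear differences and cut off a genuine center.

```python
import numpy as np

def idpc_templates(power, d_c, min_power=10.0, delta_ratio=0.10):
    """Sketch of IDPC center selection (Eqs. (1), (5)-(7)); returns sorted template powers (W)."""
    p = np.asarray(power, dtype=float)
    p = p[p >= min_power]                              # drop near-zero 'off' samples
    d = np.abs(p[:, None] - p[None, :])                # pairwise distances d_ij
    rho = (d < d_c).sum(axis=1) - 1                    # Eq. (1), j == i excluded (d_c > 0)

    # Eq. (5): distance to the nearest point of higher density; ties in rho are
    # broken by sample order (assumption, a common DPC convention).
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros_like(p)
    delta[order[0]] = d[order[0]].max()                # top-density point: maximum distance
    for rank in range(1, len(order)):
        i = order[rank]
        delta[i] = d[i, order[:rank]].min()

    omega = rho * delta                                # Eq. (6)

    # Cut at the steepest drop of the sorted omega values, measured on a log
    # scale (our assumption); points with omega == 0 cannot be centers.
    cand = np.argsort(-omega)
    cand = cand[omega[cand] > 0]
    if len(cand) > 1:
        log_w = np.log(omega[cand])
        cut = int(np.argmax(log_w[:-1] - log_w[1:])) + 1
    else:
        cut = len(cand)
    centres = cand[:cut]

    # Eq. (7): a center must lie farther than 10 % of the maximum power from denser points
    centres = centres[delta[centres] > delta_ratio * p.max()]
    return np.sort(p[centres])

# Toy data: a three-state appliance (about 100, 500 and 1500 W) plus 'off' samples
offsets = np.array([-2, -1, -1, 0, 0, 0, 1, 1, 2])
power = np.concatenate([c + offsets for c in (100, 500, 1500)] + [np.zeros(20)])
print(idpc_templates(power, d_c=3.0).tolist())        # [100.0, 500.0, 1500.0]
```

On this toy input the three power levels are recovered as the template. On real meter data the 10 W filter, $d_c$ and the threshold of Equation (7) interact, so the selected centers are worth checking against the decision graph.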

2.2. Load Superstate Coding

Assume that the net load power measured by the smart meter at time t is $P_{total}(t)$, which can be expressed as the sum of the individual device powers, as shown in Equation (8).

P_{total}(t) = \sum_{i=1}^{N} p_{i}(t) + p_{PV}^{family}(t) + P_{noise}(t)

(8)

Here, $p_i(t)$ is the power of load i at time t; $p_{PV}^{family}(t)$ is the power 'consumed' by the PV system at time t, which is negative while the system is generating; and $P_{noise}(t)$ is the measurement error and noise.

As shown in Section 2.1, ordinary household appliances can be modeled as finite state machines, and binary coding is used to represent their operating states: an appliance with $m_i$ operating states is represented by an $m_i$-bit binary vector. Assuming that the numbers of states of the N loads are $(m_1, m_2, \dots, m_N)$, the states of all loads at each moment are jointly encoded into the superstate vector $S_H$ [8] to preserve the correlation between loads:

S_{H} = [y_{1}, \dots, y_{m_{1}}, y_{m_{1}+1}, \dots, y_{m_{1}+m_{2}}, \dots, y_{ϒ}], \quad y \in \{0, 1\}

(9)

where $[y_{m_1+1}, \dots, y_{m_1+m_2}]$ is the state vector of load 2. Since each appliance is in exactly one state at any given time, exactly one bit of each load's state vector is 1 and all other bits are 0; $ϒ = \sum_{i=1}^{N} m_i$ is the number of bits in the superstate vector. For example, the superstate composed of a light bulb and a washing machine is as follows:

S_{H^{'}} = [y_{1}, \dots, y_{m_{1}}, y_{m_{1} + 1}, \dots, y_{m_{1} + m_{2}}]

(10)

The light has only two states (off and on) and the washing machine has four (off, filling, washing, and spinning), so $S_{H'}$ can be represented by a 6-bit binary vector. The first two bits form the state vector of the light, with the first bit indicating that the light is off and the second that it is on; bits 3–6 indicate that the washing machine is off, filling, washing, or spinning, respectively. For example, the superstate vector [0, 1, 1, 0, 0, 0] means that the light is on and the washing machine is off. Combining these encoding rules with the load power templates obtained in Section 2.1 yields the combined power template $P_{template}$, and Equation (8) can be rewritten as follows:

P (t) = S (t) \cdot P_{template}^{⊤} + P_{noise} (t) + P_{PV} (t)

(11)

where $S(t)$ is the superstate at time t and $P_{PV}(t)$ is the rooftop PV power at time t. The non-intrusive load decomposition problem for households with rooftop PV is thus transformed into finding the load state at each moment in the presence of PV disturbances of high stochasticity and volatility.
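The superstate encoding of Section 2.2 can be illustrated with the light and washing machine example. This is a small Python sketch with NumPy; the template powers are made-up illustrative values, not taken from the paper, and `encode`/`decode` are hypothetical helper names.

```python
import numpy as np

# Hypothetical two-load example from Section 2.2 (template powers are made-up values)
N_STATES = [2, 4]                                  # m_1 = light, m_2 = washing machine
TEMPLATE = np.array([0, 60,                        # light: off, on (W)
                     0, 120, 500, 300])            # washer: off, filling, washing, spinning (W)

def encode(states):
    """Eq. (9): concatenate one one-hot block per load into the superstate vector S_H."""
    blocks = []
    for m, s in zip(N_STATES, states):
        block = np.zeros(m, dtype=int)
        block[s] = 1                               # exactly one bit per load is set
        blocks.append(block)
    return np.concatenate(blocks)

def decode(s_h):
    """Inverse of encode: the index of the set bit within each load's block."""
    states, start = [], 0
    for m in N_STATES:
        states.append(int(np.argmax(s_h[start:start + m])))
        start += m
    return states

s = encode([1, 0])                                 # light on, washing machine off
print(s.tolist())                                  # [0, 1, 1, 0, 0, 0]
print(decode(s))                                   # [1, 0]
print(int(s @ TEMPLATE))                           # 60, the S(t) . P_template term of Eq. (11)
```

The dot product of the superstate with the combined template gives the total load power implied by that superstate, which is the quantity compared against the net load later on.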

2.3. Radiation Proxy-Based Photovoltaic Data Acquisition

Since the user may be unable to provide sufficient historical PV power data or may not have a dedicated PV data acquisition device, this paper uses a radiation proxy [16] to obtain the historical PV power of the user to be disaggregated for model training. Because fine-grained solar radiation data are difficult to obtain, and PV systems differ in panel materials and installation angles, this paper does not compute PV power directly from irradiance; instead, the power of a neighboring PV site serves as the radiation proxy from which the historical PV power of the user to be disaggregated is estimated.

Let the power of the PV proxy site at time t be $p_{PV}^{agent}(t)$. If the user's PV system has a configuration similar to that of the proxy site, the PV power of the household to be disaggregated, $p_{PV}^{family}(t)$, is approximately proportional to that of the proxy site:

p_{PV}^{family} (t) \approx ξ \cdot p_{PV}^{agent} (t)

(12)

where ξ is the capacity factor of the household PV to be disaggregated relative to the PV proxy site, which can be derived from the rated capacities of the two PV systems.

To validate Equation (12), this paper uses PV generation data from the Pecan Street dataset [17] and employs the Pearson correlation coefficient in Equation (13) to measure the similarity of power output among PV systems of different capacities in the same region.

R (X, Y) = \frac{cov (X, Y)}{σ (X) \cdot σ (Y)}

(13)

where X and Y are the daily power data of two PV systems, $cov(\cdot)$ is their covariance, and $σ(\cdot)$ is the standard deviation.

As shown by the red data in Figure 1, the PV power correlation coefficients between different users all exceed 0.75, indicating that the historical power of the PV proxy site, scaled by the capacity coefficient, can be used to estimate the historical PV power of the users to be disaggregated.
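Equation (13) translates directly into NumPy. This minimal sketch uses population moments so that the covariance and the standard deviations share the same normalisation; the two daily profiles in the example are hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Eq. (13): R(X, Y) = cov(X, Y) / (sigma(X) * sigma(Y))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))   # population covariance
    return cov / (x.std() * y.std())                 # np.std uses ddof=0 by default

# Hypothetical daily PV profiles (kW) of two systems with different capacities
pv_a = np.array([0.0, 0.4, 1.6, 2.5, 2.2, 1.1, 0.2, 0.0])
pv_b = np.array([0.0, 0.7, 2.3, 3.9, 3.1, 1.9, 0.3, 0.0])
print(round(pearson(pv_a, pv_b), 3))
print(np.isclose(pearson(pv_a, pv_b), np.corrcoef(pv_a, pv_b)[0, 1]))  # True
```

Because the coefficient is scale-invariant, a difference in installed capacity alone does not lower R; only a difference in the shape of the daily curves does, which is why it suits the check on Equation (12).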

As Figure 1 also shows, the output of a PV array is highly random and volatile and can easily mask other loads. To reduce the difficulty of load identification, the PV power data are classified by weather. According to cloud coverage, the weather is divided into four categories: clear, less cloudy, mostly cloudy, and rainy [18]. Since the number of clusters is known, the K-means clustering algorithm is used to cluster the PV data from neighboring sites; the PV power curves under the four typical weather types are shown in Figure 2.

After clustering the historical power of the PV proxy site, the historical PV data of the user to be disaggregated are estimated using Equation (12), which constitutes the training dataset together with the historical load data.
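Clustering the proxy-site days into four weather types could look like the sketch below: plain Lloyd's K-means on daily PV curves, one row per day. Two choices here are our assumptions rather than the paper's: a deterministic farthest-point initialisation (instead of random seeding), and naming the clusters by daily energy, with the highest-energy cluster called clear and the lowest called rainy.

```python
import numpy as np

WEATHER = ["clear", "less cloudy", "mostly cloudy", "rainy"]

def farthest_point_init(X, k):
    """Deterministic seeding: start from the largest-norm day, then add the farthest day."""
    idx = [int(np.argmax(np.linalg.norm(X, axis=1)))]
    for _ in range(1, k):
        d = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2).min(axis=1)
        idx.append(int(np.argmax(d)))
    return X[idx].copy()

def kmeans(X, k=4, n_iter=100):
    """Lloyd's K-means on rows of X (one row = one day's PV power curve)."""
    X = np.asarray(X, dtype=float)
    centres = farthest_point_init(X, k)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)                  # nearest center per day
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centres[c]
                        for c in range(k)])
        if np.allclose(new, centres):                 # converged
            break
        centres = new
    return labels, centres

def name_clusters(centres):
    """Hypothetical labelling: rank clusters by daily energy (highest = clear)."""
    order = np.argsort(-centres.sum(axis=1))
    return {int(c): WEATHER[r] for r, c in enumerate(order)}

# Toy data: 20 days, 4 groups of 5, each a scaled bell-shaped PV curve
shape = np.sin(np.linspace(0.0, np.pi, 24))
days = np.array([(s + 0.01 * j) * shape for s in (1.0, 0.7, 0.4, 0.1) for j in range(5)])
labels, centres = kmeans(days)
names = name_clusters(centres)
print([names[int(lab)] for lab in labels[::5]])   # ['clear', 'less cloudy', 'mostly cloudy', 'rainy']
```

The deterministic seeding makes results reproducible; with random seeding, several restarts would normally be used to avoid a poor local optimum.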

3. SHMM-Based Non-Intrusive Load Decomposition Considering Photovoltaics

3.1. Simplified HMM

The HMM is a statistical model for signal processing that is often used to represent the relationship between an observation sequence and a hidden state sequence. The structure of the HMM is shown in Figure 3. The model can be represented by the quintuple $λ = \{HS, O, π, A, B\}$, where HS is the set of hidden states, i.e., the load states to be solved; O is the set of observations corresponding to the hidden states, i.e., the total power; π is the initial probability distribution, giving the probability of starting in each hidden state; A is the state transition probability matrix, whose element $a_{ij}$ is the probability of moving from hidden state i to hidden state j; and B is the observation probability matrix, whose element $b_{oj}$ is the probability of observing value o when the hidden state is j.

The HMM satisfies the homogeneous Markov assumption, i.e., the transition probabilities are constant over the whole process. In practice, however, load state transitions vary over the course of the day, so a constant transition probability cannot accurately describe users' changing energy use behavior. Moreover, the identification result at time t depends on the state at time t − 1; if the state at t − 1 is estimated incorrectly, the result at the current moment is strongly affected. With the interference of rooftop PV in particular, such errors easily propagate to subsequent decomposition results.

To avoid the influence of the previous moment's decomposition result on the current moment and to improve computational efficiency, two simplifications are made. (1) The SHMM is obtained by omitting the state transition matrix A. (2) To reflect the time-varying characteristics of PV and load, the day is divided into several time periods, as shown in Figure 4. From 01:00 to 05:00, the loads are essentially either off or in constant operation. From 05:00 to 19:00, the PV system is generating; to limit PV fluctuation within each period, this interval is divided into 2 h segments. The remaining interval from 19:00 to 01:00 of the next day covers the evening peak of electricity consumption and is also divided into 2 h segments, giving 11 time periods in total.
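The slot boundaries can be written as a small lookup. This Python sketch assumes that the evening segment runs from 19:00 to 01:00, the reading under which the day splits into the 11 periods used in Section 3.2 (one night slot, seven 2 h PV slots for 05:00–19:00, and three 2 h evening slots); `time_slot` and `is_pv_slot` are hypothetical names.

```python
def time_slot(hour):
    """Map an hour of day (0-23) to the 11 time slots of Section 3.1 (1-based)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if 1 <= hour < 5:
        return 1                                   # 01:00-05:00: loads off or steady
    if 5 <= hour < 19:
        return 2 + (hour - 5) // 2                 # 05:00-19:00: PV slots 2-8
    return 9 + ((hour - 19) % 24) // 2             # 19:00-01:00: evening slots 9-11

def is_pv_slot(slot):
    """Observation matrices depend on the weather type only in slots 2-8 (Section 3.2)."""
    return 2 <= slot <= 8

print([time_slot(h) for h in range(24)])
# [11, 1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11]
```

The modulo in the last branch lets the 00:00–01:00 hour fall into the final evening slot together with 23:00.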

3.2. Acquisition of Model Parameters for Different PV Outputs

After the labeled data are obtained with the coding method in Section 2.2, the daily historical data under each weather condition are divided by time slot into 11 subsets according to the segmentation in Section 3.1, and the model parameters π and B are trained.

The initial probability distribution π of the superstates in each time period is obtained as follows:

π_{i} = (\frac{C^{S_{1}^{i}}}{\sum_{j = 1}^{Γ_{i}} C^{S_{j}^{i}}}, \frac{C^{S_{2}^{i}}}{\sum_{j = 1}^{Γ_{i}} C^{S_{j}^{i}}}, \dots, \frac{C^{S_{Γ_{i}}^{i}}}{\sum_{j = 1}^{Γ_{i}} C^{S_{j}^{i}}})

(14)

where $π_i$ is the initial probability distribution of time period i, $C^{S_j^i}$ is the frequency of superstate j in time period i, and $Γ_i$ is the number of superstates that occur in time period i. As Equation (14) shows, the initial probability distribution does not depend on the PV output.

In this paper, a separate observation probability matrix is established for each PV power type. Taking a clear day as an example, the historical PV power of the household to be disaggregated under clear weather is estimated using Equation (12) and added to the historical load power to obtain the total residential power $P^{clear}$, from which the observation probability matrix is obtained:

{B_{i}^{clear}}_{Γ_{i} \times O_{i}^{clear}}

(15)

where

O_{i}^{clear} = \max (P_{i}^{clear}) - \min (P_{i}^{clear}) + 1

(16)

The elements of the observation probability matrix $B_i^{clear}$ are obtained from Equation (17):

b_{S_{j o}} = \frac{C_{S_{j o}}}{\sum_{k = 1}^{O_{i}^{clear}} C_{S_{j k}}}

(17)

where $b_{S_{jo}}$ is the probability of observing power o when the load superstate is j, and $C_{S_{jk}}$ is the frequency of observing power k when the load superstate is j. Since PV generates power only during the daytime, observation probability matrices by weather type are needed only for time slots 2 to 8.

In summary, the initial probability distribution π and the observation probability matrix B can be obtained for each time slot under the four typical weather types.
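A count-based sketch of Equations (14), (16), and (17) for one time slot and one weather type is given below. It assumes that the training samples are already labelled with superstates and that the total power is rounded to integer watts, so that Equation (16) counts columns in 1 W steps; `train_slot` is a hypothetical name. Columns are 0-based here, so column `o - min(P)` plays the role of index `o - min(P) + 1` in the paper.

```python
import numpy as np

def train_slot(superstates, total_power):
    """Eqs. (14), (16), (17) for one time slot and one weather type.

    superstates: one hashable, sortable superstate label per sample
    total_power: total power per sample in integer watts (load + estimated PV)
    """
    power = np.asarray(total_power, dtype=int)
    labels = sorted(set(superstates))             # the Gamma_i superstates seen in this slot
    row = {s: j for j, s in enumerate(labels)}
    p_min = int(power.min())
    n_obs = int(power.max()) - p_min + 1          # Eq. (16): number of observation columns
    counts = np.zeros((len(labels), n_obs))
    for s, p in zip(superstates, power):
        counts[row[s], p - p_min] += 1            # 0-based column for power value p
    pi = counts.sum(axis=1) / counts.sum()        # Eq. (14): relative superstate frequencies
    B = counts / counts.sum(axis=1, keepdims=True)  # Eq. (17): row-normalised counts
    return labels, pi, B, p_min

labels, pi, B, p_min = train_slot(["A", "A", "B", "B", "B"], [100, 101, 300, 300, 302])
print(labels, pi.tolist(), B.shape, p_min)        # ['A', 'B'] [0.4, 0.6] (2, 203) 100
```

The rows of B are raw relative frequencies, so a power value never observed together with superstate j receives probability 0; no smoothing is applied in this sketch.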

3.3. NILM Algorithm Implementation Flow

In summary, the non-intrusive load decomposition steps based on IDPC-SHMM are shown in Figure A1.

(1) Input the net load power $P(t)$ of the user to be decomposed and the timestamp t.

(2) Use the Euclidean distance to measure the similarity between the net load power to be decomposed and the PV power under the four typical weather types; select the initial probability distribution $π_i$ and the observation probability matrix $B_i^{weather}$ corresponding to the weather type of the day; and construct the objective function in Equation (18) to solve for the load superstate.

S'(t) = \underset{j \in \{1, 2, \dots, Γ_{i}\}}{\arg\max} \{π_{i}(j) \cdot B_{i}^{weather}(j, P(t) - \min(P_{i}^{weather}) + 1) \cdot θ({P'}_{PV}^{family}(t))\}

(18)

In Equation (18), weather denotes the weather type of the day, and $θ({P'}_{PV}^{family}(t))$ is the PV output penalty factor:

θ({P'}_{PV}^{family}(t)) = \begin{cases} 1, & P(t) - S_{i}^{j} \cdot (P_{i}^{template})^{\top} < 0 \\ 0, & P(t) - S_{i}^{j} \cdot (P_{i}^{template})^{\top} \geq 0 \end{cases}

(19)

It mainly prevents decomposition results in which, during the day, the PV system contributes no power or even consumes positive power.

(3) From the superstate obtained in step (2), the state of each load at time t is obtained by decoding the superstate vector (Section 2.2) together with the power template information.

(4) Post-process the load decomposition. Although the model established in Section 3.2 accounts for the impact of different PV power levels, it is solved point by point, which can produce anomalous sequences such as 'state a–state a–state b–state a–state a–state a–state a'. Under normal circumstances, however, a load remains in a given state for some time. This paper therefore applies a state correction algorithm to the decomposition results: if the state of a load at time t is $S_b$ and the states immediately before and after it are both $S_a$, the state at time t is corrected to $S_a$.

(5) Compute the power of each load at each moment from its state and the power template.

(6) Derive the PV power from the difference between the net load power and the total load power.
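Steps (2), (4), and (6) can be sketched in Python as below. This is an illustration under assumptions, not the authors' code: the function names are hypothetical; `load_power[j]` stands for the load power implied by superstate j (its product with the power template); PV is signed as in Equation (8), i.e., negative while generating; and the penalty of Equation (19) is applied only in daytime PV slots.

```python
import numpy as np

def select_weather(net_day, typical_pv):
    """Step (2): weather type whose typical PV curve is closest (Euclidean) to the net load."""
    net = np.asarray(net_day, dtype=float)
    return min(typical_pv, key=lambda w: np.linalg.norm(net - np.asarray(typical_pv[w], dtype=float)))

def decode_point(p_t, pi, B, p_min, load_power, pv_slot=True):
    """Eqs. (18)-(19) for one sample: index j of the most probable superstate, or None."""
    col = int(round(p_t)) - p_min                  # 0-based form of P(t) - min(P) + 1
    if not 0 <= col < B.shape[1]:
        return None                                # power outside the training range
    score = pi * B[:, col]
    if pv_slot:
        # Eq. (19): keep superstate j only if P(t) - load_j < 0, i.e. PV is generating
        theta = (p_t - np.asarray(load_power, dtype=float) < 0).astype(float)
        score = score * theta
    return int(np.argmax(score))                   # note: returns 0 if every score is 0

def correct_states(states):
    """Step (4): replace an isolated single-sample state S_b between two equal states S_a."""
    fixed = list(states)
    for t in range(1, len(states) - 1):
        if states[t - 1] == states[t + 1] != states[t]:
            fixed[t] = states[t - 1]
    return fixed

def pv_power(net_load, load_total):
    """Step (6): PV power = net load - reconstructed load (negative while generating)."""
    return np.asarray(net_load, dtype=float) - np.asarray(load_total, dtype=float)

typical = {"clear": [0.0, -3000.0, -500.0], "rainy": [0.0, -400.0, -100.0]}
print(select_weather([200.0, -2500.0, -300.0], typical))          # clear
pi = np.array([0.5, 0.5])
B = np.full((2, 401), 1 / 401)                     # uniform toy observation matrix, p_min = -100
print(decode_point(200, pi, B, -100, load_power=[0.0, 500.0]))    # 1
print(correct_states(["a", "a", "b", "a", "a", "a", "a"]))        # ['a', 'a', 'a', 'a', 'a', 'a', 'a']
print(pv_power([100.0, -200.0], [300.0, 300.0]).tolist())         # [-200.0, -500.0]
```

If every score in `decode_point` is zero, `np.argmax` silently returns 0; a full implementation would need an explicit fallback for that case.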

4. Results

4.1. Introduction to the Dataset

In this paper, the AMPds dataset [19] and the Pecan Street dataset [17] are used to validate the effectiveness of the proposed algorithm. AMPds is a public dataset commonly used in NILM research; it contains minute-by-minute water, electricity, and gas consumption data for a household in the Vancouver area of Canada from 2012 to 2014, but it includes only conventional loads and no PV data. The Pecan Street dataset, maintained, managed, and owned by Pecan Street Inc., provides conventional load and PV power data for a group of users at a one-minute sampling interval.

4.2. Algorithmic Evaluation Metrics

NILM has a variety of evaluation metrics. This paper selects the state identification accuracy (acc), the F1 score [20], and the power decomposition accuracy [21] as evaluation metrics, where the power decomposition accuracy of an individual load is denoted ea and that of all loads combined (excluding PV) is denoted ea_total. Since PV has no obvious switching states, the algorithm's ability to decompose PV is described only by the power decomposition accuracy ea_PV.

(1) Accuracy of load state identification:

acc_{i} = \frac{\sum_{t=1}^{T} μ(s_{i}^{t} = {s'}_{i}^{t})}{T}

(20)

where $acc_i$ denotes the individual load state identification accuracy (and, taken over all loads, the overall identification accuracy); $s_i^t$ is the true state of load i at moment t; ${s'}_i^t$ is the identified state of load i at moment t; $μ(s_i^t = {s'}_i^t) = 1$ when $s_i^t = {s'}_i^t$ and 0 otherwise; T is the number of time points; and N is the number of loads.

(2) F1 score:

F1 = 2 \cdot \frac{p \cdot r}{p + r}

(21)

p = \frac{tp}{tp + fp}

(22)

r = \frac{tp}{tp + fn}

(23)

where p and r are the precision and recall, and tp, fp, and fn are the numbers of true positives, false positives, and false negatives, respectively.

(3) Load power decomposition accuracy:

ea_{i} = 1 - \frac{\sum_{t=1}^{T} |p_{i}(t) - p'_{i}(t)|}{2 \sum_{t=1}^{T} p_{i}(t)}

(24)

ea_{total} = 1 - \frac{\sum_{i=1}^{N} \sum_{t=1}^{T} |p_{i}(t) - p'_{i}(t)|}{2 \sum_{i=1}^{N} \sum_{t=1}^{T} p_{i}(t)}

(25)

where $ea_i$ and $ea_{total}$ are the power decomposition accuracy of individual loads and the total power decomposition accuracy of all loads, respectively; $p_i(t)$ and $p'_i(t)$ denote the true and decomposed power of load i at time t; and N is the number of loads.
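The metrics in Equations (20)–(25) translate directly into NumPy. In this sketch the F1 score is computed on a binary on/off labelling, and for ea_PV the PV power should be passed as positive generation values; both are our assumptions, since the paper does not spell them out. The function names are hypothetical.

```python
import numpy as np

def state_accuracy(true_states, est_states):
    """Eq. (20): share of time points at which the identified state equals the true state."""
    return float(np.mean(np.asarray(true_states) == np.asarray(est_states)))

def f1_score(true_on, est_on):
    """Eqs. (21)-(23) on a binary on/off labelling (the on/off reduction is an assumption)."""
    t = np.asarray(true_on, dtype=bool)
    e = np.asarray(est_on, dtype=bool)
    tp = int(np.sum(t & e))
    fp = int(np.sum(~t & e))
    fn = int(np.sum(t & ~e))
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def power_accuracy(true_power, est_power):
    """Eqs. (24)/(25): 1 - sum|p - p'| / (2 sum p); pass a loads-by-time array for ea_total."""
    t = np.asarray(true_power, dtype=float)
    e = np.asarray(est_power, dtype=float)
    return float(1 - np.abs(t - e).sum() / (2 * t.sum()))

print(state_accuracy([0, 1, 1, 2], [0, 1, 2, 2]))       # 0.75
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))             # 0.5
print(power_accuracy([[100, 100], [50, 0]], [[100, 50], [50, 0]]))  # 0.9
```

Because Equation (25) sums errors and powers over all loads before dividing, passing a 2-D array to `power_accuracy` gives ea_total rather than an average of per-load ea values.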

4.3. Experimental Results and Analyses

The model is built in MATLAB R2023b, and all computations are run on a laptop with an Intel Core™ i7-14650HX CPU.

To reduce the dependence of the user to be disaggregated on the PV proxy site, the proposed algorithm uses Equation (12) to estimate the user's historical PV power only in the training stage, where it serves to train the model parameters. In the disaggregation stage, separating PV generation power from load consumption power requires only the net load data and time information. Since the Pecan Street dataset does not provide the installed PV capacity, the approximate relative capacity coefficients of different PV systems are derived by comparing their PV power around noon on clear days.
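The capacity coefficient and Equation (12) can be sketched as follows. The sketch assumes that PV power of both the proxy site and the household is available for some clear-day noon samples, as in the Pecan Street experiment, and takes ξ as the ratio of their mean values; the ratio-of-means estimator and the function names are our assumptions.

```python
import numpy as np

def capacity_factor(family_pv, agent_pv, clear_noon_mask):
    """xi estimated as mean household PV / mean proxy-site PV over clear-day noon samples."""
    mask = np.asarray(clear_noon_mask, dtype=bool)
    fam = np.asarray(family_pv, dtype=float)[mask]
    agt = np.asarray(agent_pv, dtype=float)[mask]
    return float(fam.mean() / agt.mean())

def estimate_family_pv(agent_pv, xi):
    """Eq. (12): p_PV^family(t) ~= xi * p_PV^agent(t)."""
    return xi * np.asarray(agent_pv, dtype=float)

agent = np.array([0.0, 1200.0, 3000.0, 2800.0, 900.0])      # hypothetical proxy-site PV (W)
family = np.array([0.0, 610.0, 1500.0, 1400.0, 460.0])      # hypothetical household PV (W)
noon = np.array([False, False, True, True, False])          # clear-day noon samples
xi = capacity_factor(family, agent, noon)
print(xi)                                                    # 0.5
print(estimate_family_pv([0.0, 2000.0], xi).tolist())        # [0.0, 1000.0]
```

Restricting the ratio to clear-day noon samples keeps it close to the ratio of peak outputs, which is what a rated-capacity ratio would express.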

(1) Testing on the Pecan Street dataset

The load composition of the two customers selected in this paper is shown in Table 1. Five weeks of data are used for training and one week for testing. Because the PV data are divided into four categories in the training phase, each category contains less than five weeks of PV power data; to make the PV training data as long as the load data, the PV data of each category are simply spliced together.

The Sparse HMM algorithm of [8] is compared with the proposed algorithm. In addition, since LSTM performs well in time-series processing, an LSTM network is built in MATLAB [22] for comparison. The results are shown in Table 2, where ea_PV is calculated using Equation (24).

Table 2 shows that the overall load power decomposition accuracy of the proposed algorithm reaches up to 82% and the PV decomposition accuracy reaches 90%, significantly higher than that of the Sparse HMM algorithm and slightly higher than that of the LSTM algorithm. In addition, the proposed algorithm takes only 0.03 s for decomposition, which is significantly faster than the other two algorithms (since the training phase can be completed offline, only the load decomposition time in the testing phase is compared).

To test the performance of the proposed algorithm in recognizing individual loads, houses 1 and 2 are compared with the above two algorithms; the results are shown in Table 3. For house 1, the power decomposition accuracy for the electric vehicle reaches 91.66%, indicating that the proposed algorithm can decompose loads not only in households with PV but also in households with electric vehicles. For the furnace, all three algorithms reach an F1 score of 1, mainly because the furnace was operating throughout the test period; in terms of state identification accuracy and power decomposition accuracy, however, the proposed algorithm outperforms the other two. Table 3 also shows that the individual-load power decomposition for house 1 is poorer than for house 2, mainly because the two air conditioners and the 'kitchen' load in house 1 run very infrequently during the test period, so even a small number of identification errors has a large impact on their results. Nevertheless, Table 2 shows that identification errors on loads with a very low operating frequency have little impact on the overall decomposition.

Tables 2 and 3 show that the proposed algorithm outperforms the Sparse HMM and LSTM algorithms in PV power decomposition accuracy, load identification performance, and computation time.

(2) Testing on semi-synthetic datasets

In the Pecan Street dataset, most users' sub-meters record the aggregate power of several loads, and few existing NILM algorithms have been evaluated on this dataset. For a better comparison with existing algorithms, the minute-by-minute conventional load data of AMPds are combined with the minute-by-minute household PV data of the Pecan Street dataset on a daily timescale to obtain semi-synthetic data. Since this paper does not yet consider residential customers' participation in demand response, the addition of PV is assumed not to affect residential electricity consumption behavior. The two scenarios designed in this paper are shown in Table 4. Again, 5 weeks of data are used for training and 1 week for testing.

The load decomposition results of Scenario 1 are shown in Table 5, where BME, HPE, TVE, WOE, DWE, CDE, FRE, and FGE denote Basement Plugs and Lights, Heat Pump, Entertainment (TV/PVR/AMP), Wall Oven, Dishwasher, Clothes Dryer, HVAC/Furnace, and Kitchen Fridge, respectively. Table 5 shows that the PV power decomposition accuracy reaches 94.15%, and the three-day excerpt of the decomposition results in Figure 5 shows that the PV output obtained by the proposed algorithm remains consistent with the actual PV output even when the PV output fluctuates strongly. Under PV interference, the state identification accuracy of every load except FGE is at least 95%, and the overall load power decomposition accuracy is at least 80%, so the proposed algorithm outperforms the Sparse HMM and LSTM algorithms on most evaluation indexes. The decomposition results also differ between loads of different power levels, mainly because frequent fluctuations in PV power interfere with the decomposition of small loads such as FGE.

For comparison with the mixed-integer nonlinear programming (MINLP) algorithm proposed in [21], the same six loads were selected, as listed for Scenario 2 in Table 4. Following [21], the model is built in AMPL and solved with the Gurobi solver. The comparison results are shown in Table 6: with a household PV installed capacity of 6 kW, the proposed algorithm reaches an overall load power decomposition accuracy of 84.39% and a PV power decomposition accuracy of 96.70%, relative improvements of 10.4% and 3.56% over the 76.44% and 93.38% obtained by [21]. Although the proposed algorithm identifies the clothes dryer (CDE), a load with a low operating frequency, less effectively than the algorithm in [21], its computational cost is far lower (0.03 s versus 59 s in Table 6). Because the proposed algorithm does not use the state transition probability matrix A, the solution time depends only on the number of superstates in the time period to be decomposed; Equation (16) shows that its computational complexity is only $O(\Gamma_{i})$, governed by the superstates of the $i$-th time period. Moreover, because the electricity usage scenarios of a real household are limited, most superstates never occur in a specific time period. In contrast, the method in [21] must handle 10 constraints in the decomposition stage and requires global optimization, so its computational complexity is high.
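Why dropping the transition matrix A makes the cost scale with the number of candidate superstates can be illustrated with a simplified decoder. This is a hypothetical sketch, not Equation (16): it assumes each superstate of a time period carries a total load power and an occurrence probability learned from training data, and that PV output at each sample is Gaussian with a known mean (for example per weather type) and standard deviation. Each sample is then scored independently over the candidates, so decoding costs O(number of candidates) per sample, with no dependence on the previous sample. `Superstate`, `decode_period`, and the appliance powers are illustrative names and values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Superstate:
    states: tuple   # state index of each appliance in this combination
    power_w: float  # total load power of the combination
    prior: float    # occurrence probability in the current time period (must be > 0)


def gaussian_logpdf(x, mean, std):
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)


def decode_period(net_w, pv_mean_w, pv_std_w, candidates):
    """Decode each sample independently over the candidate superstates of one period.

    Hypothetical score: log prior + Gaussian log-likelihood of the net power,
    with PV ~ N(pv_mean, pv_std^2), so net = load - PV ~ N(P_s - pv_mean, pv_std^2).
    No transition matrix is used, so each sample costs O(len(candidates)).
    """
    results = []
    for y, mu in zip(net_w, pv_mean_w):
        best = max(
            candidates,
            key=lambda s: math.log(s.prior) + gaussian_logpdf(y, s.power_w - mu, pv_std_w),
        )
        pv_est = max(best.power_w - y, 0.0)  # implied PV output, clipped at zero
        results.append((best.states, pv_est))
    return results


# Two hypothetical appliances (1 kW and 2 kW); superstates never seen in this period are omitted.
candidates = [
    Superstate((0, 0), 0.0, 0.5),
    Superstate((1, 0), 1000.0, 0.3),
    Superstate((0, 1), 2000.0, 0.2),
]
print(decode_period([-500.0, 1000.0], [1500.0, 0.0], 200.0, candidates))
# [((1, 0), 1500.0), ((1, 0), 0.0)]
```

Omitting superstates that never occur in a period shrinks the candidate list directly, which mirrors the point above that most superstates are absent in any specific time period.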

To test whether the choice of PV proxy site affects the load decomposition results, five different PV systems in the same region are selected as proxy sites, and the Scenario 1 household is used as the user to be decomposed; the test results are shown in Table 7.

Table 7 shows that choosing PV systems of different capacities in the same area as the proxy site does not significantly affect the load decomposition accuracy. When the approximate installed capacity of the PV proxy site is 2 kW, the overall load decomposition accuracy is slightly lower, but the PV decomposition accuracy still reaches 93%.
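The sensitivity check can be made concrete with the Table 7 values. The sketch assumes, as an interpretation not confirmed in this section, that the "capacity ratio" is the factor that scales a proxy site's measured output to the target household; `scale_proxy_pv` is a hypothetical helper and the proxy curve is placeholder data. The spread computation uses only the reported accuracies.

```python
import numpy as np

# Values copied from Table 7.
CAPACITY_RATIO = {"PV_1": 1.4284, "PV_2": 0.4542, "PV_3": 0.3098, "PV_4": 0.2468, "PV_5": 0.2107}
EA_TOTAL = {"PV_1": 0.7992, "PV_2": 0.8170, "PV_3": 0.8021, "PV_4": 0.8146, "PV_5": 0.8064}
EA_PV = {"PV_1": 0.9356, "PV_2": 0.9418, "PV_3": 0.9415, "PV_4": 0.9422, "PV_5": 0.9427}


def scale_proxy_pv(proxy_pv_w, capacity_ratio):
    """Hypothetical helper: estimate the target PV curve by scaling the proxy site's curve."""
    return capacity_ratio * np.asarray(proxy_pv_w, dtype=float)


def spread(metric_by_site):
    """Maximum minus minimum of a metric across the proxy sites."""
    values = list(metric_by_site.values())
    return max(values) - min(values)


proxy_curve_w = np.array([0.0, 400.0, 1200.0, 1600.0, 900.0, 0.0])  # placeholder proxy output
print(scale_proxy_pv(proxy_curve_w, CAPACITY_RATIO["PV_2"]))
print(round(spread(EA_TOTAL), 4), round(spread(EA_PV), 4))  # 0.0178 0.0071
```

Across the five proxy sites, ea_total varies by about 1.8 percentage points and ea_PV by about 0.7 percentage points, which quantifies the claim that the choice of proxy site has no significant effect.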

The above analysis shows that the proposed algorithm achieves at least 90% power decomposition accuracy for rooftop PV and an overall load power decomposition accuracy of about 80% under large fluctuations in PV power, with lower computational complexity than the comparison algorithms. Under PV power fluctuations, however, small loads such as FGE, the kitchenette, and TVE are decomposed with relatively low accuracy because of their low power level.

Overall, the proposed algorithm not only separates PV power from load consumption power but also maintains high decomposition accuracy for heat pumps, electric vehicles, and other high-power loads under frequent PV power fluctuations, giving it a clear advantage over the comparison algorithms in load identification accuracy and computational complexity.

5. Conclusions

This paper proposes a non-intrusive load decomposition algorithm based on IDPC-SHMM for residents with rooftop PV, addressing the problem that load demand and PV output are unknown because the gateway meter only records the net load power. Simulations are carried out on two publicly available datasets, and comparing the proposed algorithm with other algorithms leads to the following conclusions:

(1): The SHMM-based load decomposition method avoids the error accumulation that the traditional HMM tends to suffer under PV power fluctuations, improving the accuracy of load identification.
(2): Across the four electricity scenarios, the proposed algorithm achieves an average decomposition accuracy of 93.18% for PV power and 81.94% for overall load power, which is 12.1% and 14.1% higher than the Sparse HMM algorithm and 4.7% and 6.8% higher than the LSTM algorithm. Compared with the mixed-integer nonlinear programming (MINLP) algorithm of [21], it is 10.4% higher for overall load power and 3.56% higher for PV power. The decomposition time is also shorter than that of all the comparison algorithms.
(3): During periods of intense PV fluctuation, small-power devices such as refrigerators and kitchen appliances (<200 W) have a markedly lower power decomposition accuracy (approximately 65%) than high-power loads (>90%). The main reason is that random fluctuations in PV power tend to mask the characteristic signals of small loads, causing state identification errors.
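The improvements over the MINLP algorithm of [21] are relative improvements, not percentage-point differences. Recomputing them from the Scenario 2 values in Table 6 makes the distinction explicit; `relative_improvement` is a small illustrative helper.

```python
def relative_improvement(new_pct, baseline_pct):
    """Improvement of new over baseline, relative to the baseline, in percent."""
    return (new_pct - baseline_pct) / baseline_pct * 100.0


# Scenario 2 results from Table 6: (this paper, algorithm from [21]).
results = {"overall load": (84.39, 76.44), "PV": (96.70, 93.38)}
for name, (ours, ref) in results.items():
    print(f"{name}: +{ours - ref:.2f} points, +{relative_improvement(ours, ref):.2f}% relative")
# overall load: +7.95 points, +10.40% relative
# PV: +3.32 points, +3.56% relative
```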

To further improve the decomposition accuracy of small-power intermittent loads such as refrigerators under PV interference, future work will consider drawing on quadratic integer programming and adding constraints such as operating duration and daily energy consumption, so that the algorithm performs better on small-power loads.

Author Contributions

Conceptualization, X.L. (Xingqi Liu); Validation, A.Z.; Investigation, X.L. (Xuan Liu); Data curation, A.Z.; Writing—original draft, J.D. and Y.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Science and Technology Project of SGCC, grant number [5400-202472207A-1-1-ZN].

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Figure A1. Load decomposition flowchart.

References

1. National Energy Administration. Notice of the Comprehensive Department of the National Energy Administration on the Submission of Pilot Programmes for the Development of Rooftop Distributed Photovoltaics in Whole Counties (Municipalities and Districts) [EB/OL]. Available online: http://zfxxgk.nea.gov.cn/2021-09/08/c_1310186582.htm (accessed on 20 June 2024).
2. Kabir, F.; Yu, N.; Yao, W.; Yang, R.; Zhang, Y. Joint estimation of behind-the-meter solar generation in a community. IEEE Trans. Sustain. Energy 2020, 12, 682–694.
3. Jaramillo, A.F.M.; Laverty, D.M.; Del Rincón, J.M.; Brogan, P.; Morrow, D.J. Non-intrusive load monitoring algorithm for PV identification in the residential sector. In Proceedings of the 2020 31st Irish Signals and Systems Conference (ISSC), Online, 11–12 June 2020; pp. 1–6.
4. Wang, X.; Yu, M.; Huo, Z.; Yang, D. Short-term power forecasting of distributed photovoltaic station clusters based on affinity propagation clustering and long short-term time-series network. Autom. Electr. Power Syst. 2023, 47, 133–141.
5. Ge, L.; Fan, Y.; Lai, J.; Sun, Y.; Zhang, Y. Artificial Intelligence Enabled Microgrid Optimization Technology for Low Carbon Economy. High Volt. Eng. 2023, 49, 2219–2238.
6. Hart, G.W. Nonintrusive appliance load monitoring. Proc. IEEE 1992, 80, 1870–1891.
7. Kong, W.; Dong, Z.Y.; Ma, J.; Hill, D.J.; Zhao, J.; Luo, F. An extensible approach for non-intrusive load disaggregation with smart meter data. IEEE Trans. Smart Grid 2016, 9, 3362–3372.
8. Makonin, S.; Popowich, F.; Bajić, I.V.; Gill, B.; Bartram, L. Exploiting HMM sparsity to perform online real-time nonintrusive load monitoring. IEEE Trans. Smart Grid 2016, 7, 2575–2585.
9. Run, Z.; Yue, X.; Yang, W.; Shiwei, X.; Youbo, L.; Junyong, L. Non-intrusive identification and load forecasting of household electric vehicle charging behavior based on smart meter data. Power Syst. Technol. 2022, 46, 1897–1908.
10. Feng, C.; Liu, P.; Wang, J.; Wen, F.; Zhang, Y. Non-intrusive load monitoring algorithm of residential users using limited low-frequency information. Electr. Power Autom. Equip. 2023, 43, 181–187.
11. Dinesh, C.; Welikala, S.; Liyanage, Y.; Ekanayake, M.P.B.; Godaliyadda, R.I.; Ekanayake, J. Non-intrusive load monitoring under residential solar power influx. Appl. Energy 2017, 5, 1068–1080.
12. Pang, K. Research on Photovoltaic Disaggregation and Incentive-Based Demand Response Baseline Load of Behind-The-Meter System. Ph.D. Thesis, Guangdong University of Technology, Guangzhou, China, 2022.
13. Schirmer, P.A.; Mporas, I. Non-intrusive load monitoring: A review. IEEE Trans. Smart Grid 2023, 14, 769–784.
14. Rodriguez, A.; Laio, A. Clustering by fast search and find of density peaks. Science 2014, 344, 1492–1496.
15. Gan, W.; Li, D. Hierarchical Clustering based on Kernel Density Estimation. J. Syst. Simul. 2004, 16, 302–305+309.
16. Khodayar, M.; Liu, G.; Wang, J.; Kaynak, O.; Khodayar, M.E. Spatiotemporal behind-the-meter load and PV power forecasting via deep graph dictionary learning. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4713–4727.
17. Pecan Street Inc. Dataport Load Data [DB/OL]. Available online: https://www.pecanstreet.org/dataport/ (accessed on 6 February 2025).
18. Torquato, R.; Shi, Q.; Xu, W.; Freitas, W. A Monte Carlo simulation platform for studying low voltage residential networks. IEEE Trans. Smart Grid 2014, 5, 2766–2776.
19. Makonin, S.; Popowich, F.; Bartram, L.; Gill, B.; Bajić, I.V. AMPds: A public dataset for load disaggregation and eco-feedback research. In Proceedings of the 2013 IEEE Electrical Power & Energy Conference, Halifax, NS, Canada, 21–23 August 2013; pp. 1–6.
20. Bao, G.; Huang, Y. Non-intrusive load monitoring based on ResNeXt network and transfer learning. Autom. Electr. Power Syst. 2023, 47, 110–120.
21. Balletti, M.; Piccialli, V.; Sudoso, A.M. Mixed-integer nonlinear programming for state-based non-intrusive load monitoring. IEEE Trans. Smart Grid 2022, 13, 3301–3314.
22. Liu, H.; Liu, Y.; Deng, S. A power load identification method based on LSTM model. Electr. Meas. Instrum. 2019, 56, 62–69.

Figure 1. Comparison of the output of PV systems of different capacities in the same area.

Figure 2. Rooftop PV power curves for four typical weather types.

Figure 3. HMM structure diagram.

Figure 4. Segmentation of customers’ electricity consumption periods.

Figure 5. Load power decomposition results for a 3-day period.

Table 1. Four electricity scenarios.

House	Photovoltaic Approximate Capacity	Load Composition
House 1	3 kW	Air conditioner 1, Air conditioner 2, Electric car, Freezer, Kitchenette, Water heater
House 2	6 kW	Air conditioner, Fireplace 1, Fireplace 2, Pump, Freezer, Water heater

Table 2. Comparison of photovoltaic and load power decomposition.

Algorithm	Evaluation Indicator	House 1	House 2
Sparse HMM	ea_total/%	65.53	81.83
	ea_PV/%	77.83	79.74
	T/s	0.07	0.10
LSTM	ea_total/%	81.04	82.57
	ea_PV/%	87.43	85.28
	T/s	0.36	0.53
This paper	ea_total/%	82.19	83.35
	ea_PV/%	95.27	90.14
	T/s	0.03	0.03

Table 3. Individual load recognition accuracy comparison.

House	Load	Sparse HMM acc/%	Sparse HMM F1/%	Sparse HMM ea/%	LSTM acc/%	LSTM F1/%	LSTM ea/%	This Paper acc/%	This Paper F1/%	This Paper ea/%
House 1	Air conditioner 1	94.90	-	-	92.78	-	-	99.85	-	-
	Air conditioner 2	96.67	57.63	21.61	98.88	58.63	56.45	99.38	57.92	55.92
	Electric car	95.00	74.05	80.33	83.84	16.11	91.12	97.99	91.60	91.66
	Freezer	67.23	-	64.24	51.40	30.18	68.06	72.15	56.58	68.50
	Kitchenette	99.41	56.11	52.33	99.49	-	36.35	99.04	7.14	33.72
	Water heater	92.31	69.69	67.45	83.60	34.27	66.17	93.07	70.08	70.88
House 2	Air conditioner	91.98	61.36	51.32	63.83	57.56	63.83	98.07	66.44	69.70
	Fireplace 1	92.59	100	94.16	92.02	100	92.02	98.13	100	97.68
	Fireplace 2	68.55	100	95.15	98.52	100	93.87	99.98	100	96.12
	Pump	83.19	7.38	27.32	46.68	-	46.69	70.79	0.71	-
	Freezer	66.49	61.64	62.17	58.85	45.75	88.60	68.33	62.86	83.14
	Water heater	82.46	46.17	85.46	64.51	66.83	84.51	90.41	86.56	87.64

Table 4. Two electricity scenarios.

Scenario	Photovoltaic Approximate Capacity	Load Composition
Electricity Scenario 1	3 kW	BME, HPE, TVE, WOE, DWE, CDE, FRE, FGE
Electricity Scenario 2	6 kW	HPE, TVE, DWE, CDE, FRE, FGE

Table 5. Scenario 1 load identification results.

Load	Sparse HMM acc/%	Sparse HMM F1/%	Sparse HMM ea/%	Sparse HMM T/s	LSTM acc/%	LSTM F1/%	LSTM ea/%	LSTM T/s	This Paper acc/%	This Paper F1/%	This Paper ea/%	This Paper T/s
BME	91.92	16.14	46.89	0.43	68.21	54.98	63.46	0.49	96.98	81.02	67.97	0.03
CDE	99.38	11.54	51.75		97.55	18.26	57.63		99.41	48.06	51.73
DWE	98.50	-	47.56		95.53	3.88	49.76		98.29	39.87	19.24
FGE	41.83	54.78	20.16		75.28	61.20	64.68		75.71	62.84	64.63
FRE	95.50	100	98.62		96.89	100	95.30		100	100	98.75
HPE	91.96	98.03	67.82		94.91	96.33	66.97		98.08	98.20	84.07
TVE	91.67	100	83.86		96.16	100	89.80		95.34	100	87.96
WOE	100	-	-		99.58	-	48.57		100	-	50.00
Total	-	-	68.73		-	-	70.69		-	-	80.29
PV	-	-	82.42		-	-	87.94		-	-	94.15

Table 6. Comparison of load recognition accuracy of electricity scenario 2.

Load	Ref. [21] acc/%	Ref. [21] F1/%	Ref. [21] ea/%	Ref. [21] T/s	This Paper acc/%	This Paper F1/%	This Paper ea/%	This Paper T/s
CDE	99.88	94.51	97.58	59	99.39	60	89.49	0.03
DWE	92.66	9.58	-		98.46	51.13	31.83
FRE	100	100	99.07		100	100	98.75
TVE	80.79	100	58.62		94.98	100	88.15
FGE	67.28	56.75	54.99		75.75	60.16	64.74
HPE	76.44	86.89	88.62		98.10	98.20	85.17
Total	-	-	76.44		-	-	84.39
PV	-	-	93.38		-	-	96.70

Table 7. Impact of different PV agent sites on load decomposition accuracy.

	PV_1	PV_2	PV_3	PV_4	PV_5
Approximate capacity	2 kW	6 kW	8 kW	10 kW	12 kW
Capacity ratio	1.4284	0.4542	0.3098	0.2468	0.2107
ea_total	0.7992	0.8170	0.8021	0.8146	0.8064
ea_PV	0.9356	0.9418	0.9415	0.9422	0.9427

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
