Photovoltaic Output Power Estimation and Baseline Prediction Approach for a Residential Distribution Network with Behind-the-Meter Systems

Abstract: Because most photovoltaic (PV) systems are behind-the-meter (BTM), it is challenging to implement effective demand response projects and to predict the customer baseline (CBL) precisely. To solve this problem, this paper proposes a data-driven PV output power estimation approach that uses only net load data, temperature data, and solar irradiation data. We first match days with similar solar irradiation, take the difference of their net loads to obtain the delta actual load, and fit the relationship between delta actual load and delta temperature with the least squares method. Then we match days with similar electricity consumption behavior and take the difference of their net loads to establish the relationship between delta PV output power and delta irradiation. Finally, we obtain the PV output power and implement PV-load decoupling by correcting the relationship between delta PV and delta irradiation. The case studies verify the effectiveness of the approach, which provides an important reference for PV-load decoupling and CBL prediction in a residential distribution network with BTM PV systems.


Background and Motivation
As the disadvantages of fossil fuel power generation become increasingly prominent, renewable energy generation is developing rapidly, especially photovoltaic (PV) generation [1][2][3]. The installed capacity of renewable energy generation grew by more than 200 GW in 2019, most of which was PV generation [4,5]. However, because PV is intermittent and uncertain, a high penetration of PV could bring great challenges to the power grid, such as power distribution system planning and operation [6][7][8][9], load demand forecasting [10][11][12], hybrid energy system configuration [13,14], and PV power forecasting [15,16].
Demand response (DR) [17][18][19] is an effective method to improve the reliability and flexibility of the power distribution system [20][21][22]. To incentivize customers to participate in a DR program, DR aggregators need to make decisions based on the customer baseline (CBL). Therefore, it is vital to predict the CBL accurately [23]. However, in a residential distribution network, the distributed PV owned by customers is generally located behind-the-meter (BTM) [24], so the meter records only the net load, that is, the actual electricity load minus the PV power generation. Hence, distributed PV generation, such as rooftop PV, makes it difficult to predict the CBL from net load [25] because both PV and load are volatile.
Forecasting 2020, 2, 471
To solve the above issues, it is necessary to decouple the PV generation and the electricity consumption load from the net load data. The well-known method is to install additional meters that monitor each PV system for operators; however, this is infeasible because of the high extra cost and privacy issues. Consequently, this paper develops an approach to decouple PV-load and predict the CBL from net load data for a residential distribution network with BTM PV systems.

Literature Review
In recent years, there has been a large amount of literature on BTM PV, covering BTM PV detection and capacity estimation [24,26] and BTM PV output power prediction [27,28], among other topics. Meanwhile, PV-load decoupling and CBL forecasting have become active research topics. One common approach is the PV physical model [29,30], established to simulate and disaggregate PV generation power; however, it requires detailed PV panel parameters (such as the size, material, azimuth, and tilt) and meteorological information including temperature and solar irradiation. The authors of [31] realized PV-load decoupling and PV parameter estimation by combining an iterative method with the PV physical model and a load estimation model. However, its performance depends strongly on the accuracy of the physical model. Fortunately, with the development of information technology, a variety of advanced measurement facilities have appeared in the power grid, e.g., supervisory control and data acquisition (SCADA) and smart meters. This has given rise to a second, data-driven approach, which some researchers have investigated to decouple PV output power from net load data.
In [15], the net load measured by smart meters was used to estimate the capacity and generation power of individual distributed PV systems through a support vector regression model. Based on a hybrid data dimension reduction method and a mapping function, the authors of [32] provided a data-driven method to predict BTM PV generation power. However, this method requires metering the output power of a small number of BTM PV systems, which are assumed to have characteristics similar to those of the estimated BTM PV. In [33], an unsupervised method for disaggregating PV was proposed, assuming that customers with and without PV have similar electricity consumption behavior. The authors of [34] further developed a disaggregation model to decouple PV-load considering a battery energy storage system (BESS). Nevertheless, the authors of [35] showed that users with PV and users without PV consume electricity differently. To avoid these issues, a method that considers both the PV output power and the harmonic current injected by the PV inverter was proposed in [36] to calculate the PV generation power. However, this method may not be appropriate when PV systems installed without permission are present in the power distribution system.
In addition, baseline (BL) prediction has been studied in several works. In [37], an improved support vector regression model that takes the ambient temperature of the two hours before a DR event as input was implemented to predict the BL of an office building. The authors of [38] studied the impact of the CBL on the profit of the company and its customers by obtaining the CBL prediction of residential customers and building cost and profit functions. To address the non-synchronous matching issues in the previous study, the authors of [39] proposed a clustering-based method that builds a prediction model only on past DR daily load data instead of non-DR daily load data. The authors of [40] proposed a two-stage adaptive BL prediction method that combines the self-organizing map and k-means clustering methods to identify days similar to the tested day under DR events. Considering that deterministic BL prediction cannot reflect users' complex electricity consumption behavior, the authors of [41] proposed a probabilistic BL prediction model based on Gaussian process regression. In [42], customers are selected randomly to form a control group to predict the CBL.
Reviewing the above articles, most CBL predictions were for users who had not installed a distributed PV system. However, considering the impact of BTM PV, all separated PV output power information in the residential distribution network is unknown except for the net load data and meteorology data, which will greatly affect the final prediction accuracy.
To address the above issues, this paper proposes a PV-load decoupling and CBL prediction approach for a residential power distribution system with BTM PV, based on the nearest neighbor algorithm and an artificial neural network method. The main contributions of this paper are as follows:
1. The net load is decoupled precisely into PV output power and actual load;
2. To correct the deviation of the matched net load data, the relationship between PV output power and solar irradiance, and the relationship between actual load and temperature, are discovered and further formulated;
3. The CBL is predicted based on the PV-load decoupling.
This paper is organized as follows. Section 2 introduces the problem formulation. The methods are provided in Section 3, including the data set division, the electricity consumption sensitivity analysis, PV output power sensitivity analysis, and baseline prediction. In Section 4, the case studies are shown. Finally, the conclusions are given in Section 5.

Problem Formulation
In a residential distribution network with BTM PV, both PV output power and actual load are concealed in the net load. This poses a major challenge for operators making decisions such as optimal power dispatching and BL prediction.

PV Output Decoupling
Like the net load of a single customer, the aggregated net load is composed of the aggregated actual load and the aggregated PV output power. Let $D = \{d \mid d = 1, 2, \ldots, D_D\}$ be the set of day records and $T = \{t \mid t = 1, 2, \ldots, T_T\}$ be the set of timestamps. The aggregated net load at time $t$ on day $d$ can then be formulated as follows:
$$P_{nl}(d, t) = P_{ac}(d, t) - P_{PV}(d, t) \tag{1}$$
where $P_{nl}(d, t)$, $P_{ac}(d, t)$, and $P_{PV}(d, t)$ are the aggregated net load, aggregated actual load, and aggregated PV output power, respectively. Figure 1 gives an example of the composition of net load.
Because rooftop solar PV systems are usually invisible to the aggregator, and small-scale distributed PV systems (less than 10 kW) are in most cases not fitted with a separate PV output power meter [24], it is a big challenge to obtain the pure PV output power without the relevant monitoring information. Therefore, the purpose of this study is to recover the decomposed PV output power from historical load data, solar irradiation data, and temperature data.
For residential load, unlike industrial load, temperature is the main factor influencing consumption behavior [43]. As for the magnitude of PV output, the solar irradiation received by the PV panel plays a decisive role [44]. Based on these characteristics of user electricity consumption and PV output power, this paper uses the sensitivity of actual load to temperature to correct the temperature-induced difference in actual load between the net loads of different days, and then obtains the relationship between solar irradiation and PV power.

CBL Prediction
For users who have not installed distributed PV equipment, relatively few factors can affect the CBL prediction. The error of the CBL prediction mainly comes from the diversity of the electricity consumption behavior of aggregated users under different environmental conditions [38]. However, for the users with PV systems, the uncertainty of PV output power should also be taken into account. To reduce the error of forecasting the CBL when only net load is available, in this paper, the CBL of actual load is predicted initially based on the PV decomposition technology, and then combined with the PV output power to improve the CBL prediction results of aggregated users equipped with distributed PV equipment.

Data Set Division
Large seasonal differences in solar irradiation and temperature lead to different electricity consumption patterns and changes in PV output power. The data set is therefore divided into four categories according to local seasonal conditions. The detailed division of the four categories is shown in Table 1. Since PV equipment does not generate electricity before sunrise or after sunset, a period $\tau = \{t_{rise}, t_{rise} + 1, \ldots, t_{set}\}$, $\tau \subseteq T$, is set to indicate the time with solar irradiation, where $t_{rise}$ and $t_{set}$ represent the times of sunrise and sunset, respectively. The net load data of each season is further divided into the part that contains PV power, named $P_{nl1}(s, d, t)$, and the part that does not contain PV power, named $P_{nl2}(s, d, t)$, where $s \in \{1, 2, 3, 4\}$ is the season label.
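As a minimal sketch of this division, assuming the net load of one season is stored as a NumPy array with one row per day and one column per timestamp, the split into the daylight part and the night part could look as follows. The function name `split_by_daylight` and the index-based representation of $t_{rise}$ and $t_{set}$ are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def split_by_daylight(p_nl, t_rise, t_set):
    """Split a (days, timestamps) net-load array into the part inside tau
    (columns t_rise..t_set, inclusive) and the part outside tau."""
    p_nl = np.asarray(p_nl, dtype=float)
    daylight = np.zeros(p_nl.shape[1], dtype=bool)
    daylight[t_rise:t_set + 1] = True
    p_nl1 = p_nl[:, daylight]    # contains PV power
    p_nl2 = p_nl[:, ~daylight]   # no PV power, equals the actual load
    return p_nl1, p_nl2

# Toy example (hypothetical values): 2 days, 6 timestamps, daylight at indices 2..4.
p_nl = np.arange(12, dtype=float).reshape(2, 6)
p_nl1, p_nl2 = split_by_daylight(p_nl, t_rise=2, t_set=4)
```

The seasonal split itself depends on Table 1 and is therefore left out of the sketch.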

Correlation between Electricity Consumption and Temperature
Aggregated consumers have different electricity consumption habits in different seasons, and the difference in electricity consumption behavior is largely driven by temperature. To quantify the degree of correlation between temperature and consumer electricity consumption, Pearson's correlation coefficient [41] is used to measure the strength of the relationship between the two variables, as shown in Equation (2).
$$r = \frac{\sum_{d} \left(T_{m,av}(d) - \bar{T}_{m,av}\right)\left(P_{ac,av}(d) - \bar{P}_{ac,av}\right)}{\sqrt{\sum_{d} \left(T_{m,av}(d) - \bar{T}_{m,av}\right)^{2}} \sqrt{\sum_{d} \left(P_{ac,av}(d) - \bar{P}_{ac,av}\right)^{2}}} \tag{2}$$
where $T_{m,av}(d)$ and $P_{ac,av}(d)$ represent the average temperature and average actual load of day $d$, and $\bar{T}_{m,av}$ and $\bar{P}_{ac,av}$ represent the averages of these daily values over all days considered. The absolute value of Pearson's correlation coefficient is less than or equal to 1. The closer $r$ is to 1 or −1, the stronger the positive or negative correlation between the two variables. Conversely, the closer $r$ is to 0, the weaker the correlation between the two variables.
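The coefficient can be computed directly from the daily averages. The following sketch assumes two equally long sequences of daily average temperature and daily average actual load; the sample values are hypothetical and serve only to illustrate the calculation.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Hypothetical daily averages for four summer days.
temp = [30.0, 32.0, 35.0, 28.0]
load = [410.0, 430.0, 470.0, 395.0]
r = pearson_r(temp, load)
```

The same value is returned by `np.corrcoef(temp, load)[0, 1]`, which may be preferable in practice.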

Electricity Consumption Sensitivity Model
Although temperature is the main factor that affects actual load, the numerical relationship between temperature and actual load cannot be obtained directly, because actual load is invisible in the considered scenario. What is certain is that the PV output power is similar when the solar irradiation is the same or very similar [26,27] at the same time in the same season. Thus, if we find pairs of dates with similar or identical solar irradiation and take the difference of the corresponding net load data, we can eliminate the impact of PV output as much as possible and obtain the actual load difference, as formulated by the following equation:
$$P_{nl1}(d_i, t) - P_{nl1}(d_j, t) \approx P_{ac1}(d_i, t) - P_{ac1}(d_j, t), \quad \text{if } R(d_i, t) \approx R(d_j, t) \tag{3}$$
where $R$ denotes the solar irradiation, and $P_{ac1}$ denotes the actual load during the time period with solar irradiation. Since it is almost impossible to find two days whose daily PV curves are completely consistent, we search for a match at each time $t \in \tau$. For a single point in time, more than one date usually has similar irradiation; to ensure enough matching samples, for each time $t$ we use the absolute difference between two points as the similarity criterion and select the $h$ dates with the most similar solar irradiation. Here $\Delta R^{h}_{rad}(d_i, t)$ denotes the set of solar irradiance differences between day $d_i$ and the days with similar solar irradiance, $\Delta R(d^{i,h}_{rad}, t)$ is the absolute difference between the solar irradiation of day $i$ and that of its $h$th closest day, and $I^{h}_{rad}(d_i, t)$ is the corresponding set of matched dates.
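Their relationship is given as follows:
$$\Delta R^{h}_{rad}(d_i, t) = \left\{\Delta R(d^{i,1}_{rad}, t), \Delta R(d^{i,2}_{rad}, t), \ldots, \Delta R(d^{i,h}_{rad}, t)\right\} \tag{4}$$
$$I^{h}_{rad}(d_i, t) = \left\{d^{i,1}_{rad}, d^{i,2}_{rad}, \ldots, d^{i,h}_{rad}\right\} \tag{5}$$
The delta temperature set $\Delta T^{h}_{m,rad}(d_i, t)$ and the approximate delta actual load set $\Delta P^{h*}_{ac,rad}(d_i, t)$ can be calculated by Equations (6) and (7), respectively, according to the date match in Equation (5).
$$\Delta T^{h}_{m,rad}(d_i, t) = \left\{\Delta T_m(d^{i,1}_{rad}, t), \Delta T_m(d^{i,2}_{rad}, t), \ldots, \Delta T_m(d^{i,h}_{rad}, t)\right\} \tag{6}$$
$$\Delta P^{h*}_{ac,rad}(d_i, t) = \left\{\Delta P^{*}_{ac}(d^{i,1}_{rad}, t), \Delta P^{*}_{ac}(d^{i,2}_{rad}, t), \ldots, \Delta P^{*}_{ac}(d^{i,h}_{rad}, t)\right\} \tag{7}$$
where $\Delta T_m(d^{i,h}_{rad}, t)$ represents the temperature difference between $T_m(d^{i}_{rad}, t)$ and $T_m(d^{h}_{rad}, t)$, and $\Delta P^{*}_{ac}(d^{i,h}_{rad}, t)$ is the approximate actual load difference, obtained as the difference between $P_{nl}(d^{i}_{rad}, t)$ and $P_{nl}(d^{h}_{rad}, t)$. Given the delta temperature set and the approximate delta actual load set, the aim of this part is to fit the relationship between these two sets of variables. From Section 3.2.1 we know that temperature and actual load have a strong linear correlation, especially in summer and winter, which show strong positive and negative correlations, respectively. Thus, we use linear regression for curve fitting [45]. We further assume that at the same temperature the actual load of the users is also the same; that is, when $\Delta T_m(d^{i,h}_{rad}, t)$ equals 0, $\Delta P^{*}_{ac}(d^{i,h}_{rad}, t)$ also equals 0. Therefore, a proportional function is selected as the fitting function and the least squares method is used to solve for the slope $k_{temp,ac}(t)$, which can be expressed as Equation (8).
$$k_{temp,ac}(t) = \arg\min_{k} \sum_{i=1}^{n_s} \sum_{j=1}^{h} \left(\Delta P^{*}_{ac}(d^{i,j}_{rad}, t) - k \, \Delta T_m(d^{i,j}_{rad}, t)\right)^{2} \tag{8}$$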
where $n_s$ represents the total number of days in season $s$. Since each time $t \in \tau$ has its own slope $k_{temp,ac}(t)$, there are $(t_{set} - t_{rise} + 1)$ slopes in total. Figure 2 shows example scatter plots of delta temperature and delta actual load at 17:00 in winter and summer, together with the corresponding fitted lines.
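The matching and fitting steps of this subsection can be sketched for a single timestamp $t$ as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the inputs are one value per day of the season, and the slope uses the closed-form solution of a least squares fit through the origin, $k = \sum \Delta T \, \Delta P / \sum \Delta T^{2}$.

```python
import numpy as np

def match_similar_irradiance(rad_t, h):
    """For each day, return the indices of the h other days whose irradiance
    at this timestamp is closest in absolute difference."""
    rad_t = np.asarray(rad_t, dtype=float)
    diff = np.abs(rad_t[:, None] - rad_t[None, :])
    np.fill_diagonal(diff, np.inf)  # a day must not be matched with itself
    return np.argsort(diff, axis=1, kind="stable")[:, :h]

def temperature_slope(p_nl_t, temp_t, rad_t, h=5):
    """Slope of the proportional fit between delta temperature and the
    approximate delta actual load at one timestamp."""
    p_nl_t = np.asarray(p_nl_t, dtype=float)
    temp_t = np.asarray(temp_t, dtype=float)
    matches = match_similar_irradiance(rad_t, h)
    d_temp = (temp_t[:, None] - temp_t[matches]).ravel()
    # With similar irradiance the PV terms cancel, so the net-load
    # difference approximates the actual-load difference.
    d_load = (p_nl_t[:, None] - p_nl_t[matches]).ravel()
    denom = np.sum(d_temp ** 2)
    if denom == 0.0:
        raise ValueError("all matched days have identical temperature")
    return float(np.sum(d_temp * d_load) / denom)

# Synthetic check: actual load = 100 + 3*T, PV = 0.5*R, net load = load - PV.
rad = np.array([500.0, 500.0, 800.0, 800.0])
temp = np.array([20.0, 25.0, 22.0, 30.0])
p_nl = (100.0 + 3.0 * temp) - 0.5 * rad
k = temperature_slope(p_nl, temp, rad, h=1)
```

In the synthetic check the matched days have identical irradiance, so the PV contribution cancels exactly and the recovered slope equals the temperature coefficient used to generate the data. Repeating the call for every $t \in \tau$ yields one slope per timestamp.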

PV Output Power Sensitivity Model
Similar to the method in Section 3.2.2, if two days with similar actual load consumption behavior are found, the difference of their net loads gives the approximate delta PV output power. Even though the total actual load cannot be observed, part of it can still be determined. In the period without sunlight, the PV output equals zero, so the net load is given as:
$$P_{nl2}(d, t) = P_{ac}(d, t), \quad t \in \bar{\tau} \tag{9}$$
where $\bar{\tau}$ is the complement of $\tau$. Generally speaking, users' electricity consumption behavior is continuous. Therefore, if the daily electricity consumption behavior of two days in the period $t \in \bar{\tau}$ is as similar as possible, their behavior in the period $t \in \tau$ is also similar to a large extent. The k nearest neighbor (KNN) algorithm [46] is used to find days with similar $P_{nl2}$.
K nearest neighbor is one of the simplest and most effective data mining algorithms. It is based on the idea that similar sample points lie close to each other in space. The principle of the KNN classification algorithm is that if most of the k samples most similar to a given sample (i.e., its nearest neighbors in the feature space) belong to a certain category, the sample also belongs to this category. Therefore, when applying KNN to search for similar electricity consumption behavior curves, the similarity measure is computed on the feature space of each day $d_i$, and the k most similar samples are selected as the k neighbors of $d_i$. The feature space of $d_i$ is $P_{nl2}(d_i, t)$, $t \in \bar{\tau}$, and the similarity of the feature spaces of $d_i$ and $d_j$ is assessed via the Euclidean distance $\rho(d_{i,j})$ using Equation (10).
$$\rho(d_{i,j}) = \sqrt{\sum_{t \in \bar{\tau}} \left(P_{nl2}(d_i, t) - P_{nl2}(d_j, t)\right)^{2}} \tag{10}$$
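The smaller the value of $\rho(d_{i,j})$, the more similar the electricity consumption behavior on the two days (i.e., days $i$ and $j$). Hence, we can select the k dates with the most similar electricity consumption behavior to match. The set of recorded dates is as follows:
$$I^{k}_{beh}(d_i) = \left\{d^{i,1}_{beh}, d^{i,2}_{beh}, \ldots, d^{i,k}_{beh}\right\} \tag{11}$$
where $d^{i,k}_{beh}$ is the date of the kth nearest neighbor in terms of electricity consumption behavior. We then make the assumption, stated as Equation (12), that the actual load of day $d_i$ and that of its matched days are approximately the same during the period $\tau$. The delta solar irradiation set $\Delta R^{k}_{beh}(d_i, t)$ and the approximate delta PV output power set $\Delta P^{k*}_{PV,beh}(d_i, t)$ can be calculated by Equations (13) and (14), according to the dates recorded in Equations (11) and (12).
$$\Delta R^{k}_{beh}(d_i, t) = \left\{\Delta R(d^{i,1}_{beh}, t), \Delta R(d^{i,2}_{beh}, t), \ldots, \Delta R(d^{i,k}_{beh}, t)\right\} \tag{13}$$
$$\Delta P^{k*}_{PV,beh}(d_i, t) = \left\{\Delta P^{*}_{PV}(d^{i,1}_{beh}, t), \Delta P^{*}_{PV}(d^{i,2}_{beh}, t), \ldots, \Delta P^{*}_{PV}(d^{i,k}_{beh}, t)\right\} \tag{14}$$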
where $\Delta R(d^{i,k}_{beh}, t)$ represents the solar irradiation difference between $R(d^{i}_{beh}, t)$ and $R(d^{k}_{beh}, t)$, and $\Delta P^{*}_{PV}(d^{i,k}_{beh}, t)$ is the approximate PV output power difference, obtained from the difference between $P_{nl}(d^{i}_{beh}, t)$ and $P_{nl}(d^{k}_{beh}, t)$. Given the two sequences $\Delta R^{k}_{beh}(d_i, t)$ and $\Delta P^{k*}_{PV,beh}(d_i, t)$, the next step is to obtain the relationship between them. It is worth noting that, unlike in the electricity consumption sensitivity model of Section 3.2.2, where the electricity consumption is not necessarily 0 when the temperature is equal to 0, here the PV output is 0 when the solar irradiation is equal to 0. Because of this characteristic, once we find the relationship between delta solar irradiation and approximate delta PV output power, we also find the relationship between solar irradiation and PV output power. The derivation process is formulated in Equation (15), where $Rad_{amp}(t)$ represents the equivalent solar irradiation amplitude of $\Delta R^{k}_{beh}(d_i, t)$, $PV_{amp}(t)$ represents the equivalent PV output power amplitude of $\Delta P^{*}_{PV}(d^{i,k}_{beh}, t)$, and $\sim$ represents the symbol of derivation.
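The behavior matching of this subsection can be sketched as a brute-force nearest neighbor search on the night-time net load, followed by the construction of the delta sets for the daylight period. The function names and array layout are illustrative assumptions. The sign of the delta PV output is also an assumption: it follows from writing the net load as actual load minus PV output, so that a net-load difference between two days with equal actual load is the negative of their PV difference.

```python
import numpy as np

def nearest_behavior_days(p_nl2, k):
    """For every day, return the indices of its k nearest days and the
    corresponding Euclidean distances, computed on the net load outside tau.

    p_nl2: array of shape (days, night_timestamps).
    """
    p_nl2 = np.asarray(p_nl2, dtype=float)
    diff = p_nl2[:, None, :] - p_nl2[None, :, :]
    rho = np.sqrt(np.sum(diff ** 2, axis=2))   # Euclidean distance, Equation (10)
    np.fill_diagonal(rho, np.inf)              # exclude the day itself
    neighbors = np.argsort(rho, axis=1, kind="stable")[:, :k]
    return neighbors, np.take_along_axis(rho, neighbors, axis=1)

def delta_sets(p_nl1, rad, neighbors):
    """Delta irradiation and approximate delta PV output between each day and
    its matched days; both results have shape (days, k, daylight_timestamps)."""
    p_nl1 = np.asarray(p_nl1, dtype=float)
    rad = np.asarray(rad, dtype=float)
    d_rad = rad[:, None, :] - rad[neighbors]
    # Assumed sign convention: net load = actual load - PV output.
    d_pv = -(p_nl1[:, None, :] - p_nl1[neighbors])
    return d_rad, d_pv

# Synthetic check: constant actual load of 10, PV = 0.5 * irradiation.
p_nl2 = np.array([[1.0, 1.0], [1.0, 2.0], [5.0, 5.0]])
rad = np.array([[100.0], [300.0], [200.0]])
p_nl1 = 10.0 - 0.5 * rad
neighbors, dist = nearest_behavior_days(p_nl2, k=1)
d_rad, d_pv = delta_sets(p_nl1, rad, neighbors)
```

The pairwise distance matrix grows quadratically with the number of days, which is acceptable for a few thousand days; for larger data sets a tree-based neighbor search would be a reasonable substitute.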

PV Output Power Sensitivity Model Based on Electricity Consumption Sensitivity Correction
The accuracy of decoupling the PV output power curve using the method in Section 3.3.1 depends largely on whether we can find power consumption curves that are similar enough to match. However, due to practical limitations (insufficient historical data, or similar electricity consumption behaviors that have not appeared in the historical data), it is nearly impossible to find two completely consistent electricity consumption curves. Therefore, in most cases, the PV output curves estimated by the method in Section 3.3.1 contain large errors.
Our idea is to choose two suitable dates so that the actual consumption load $P_{ac}$ part of the net load $P_{nl}$ cancels as much as possible. However, the similarity of the matched curves in the $\bar{\tau}$ period can only guarantee the similarity of the actual load of the matched pair in the $\tau$ period to a certain extent. Therefore, we correct the two curves so that they have more similar electricity consumption behavior during the $\tau$ period. Although we cannot directly observe the delta load $\Delta P_{ac}$ of the two matched days in the $\tau$ period, we do know the temperature difference $\Delta T$ between the two days in the $\tau$ period. In other words, through the relationship of Equation (8) obtained in Section 3.2.2, we can approximate the correction of the actual load difference between the two matched curves, $P_{cor}$, as in Equation (16).
$$P_{cor}(d^{i,k}_{beh}, t) = k_{temp,ac}(t) \, \Delta T_m(d^{i,k}_{beh}, t), \quad t \in \tau \tag{16}$$
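Then, Equation (15) can be rewritten in terms of the corrected difference, where $\Delta \hat{P}^{*}_{PV}(d^{i,k}_{beh}, t)$ represents the corrected value of $\Delta P^{*}_{PV}(d^{i,k}_{beh}, t)$, and $\widehat{PV}_{amp}(t)$ represents the equivalent PV output amplitude of $\Delta \hat{P}^{*}_{PV}(d^{i,k}_{beh}, t)$.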
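One way to apply the correction is sketched below. It assumes that the temperature-induced load difference, estimated as the slope of Equation (8) times the temperature difference, is subtracted from the net-load difference of the matched pair before the sign is flipped to obtain the delta PV output. The function name and this exact form of the correction are assumptions made for illustration.

```python
import numpy as np

def corrected_pv_delta(d_nl, d_temp, k_temp_ac):
    """Approximate delta PV output after removing the temperature-induced
    actual-load difference from the net-load difference.

    d_nl, d_temp and k_temp_ac are arrays over the timestamps in tau.
    """
    d_nl = np.asarray(d_nl, dtype=float)
    d_temp = np.asarray(d_temp, dtype=float)
    k_temp_ac = np.asarray(k_temp_ac, dtype=float)
    p_cor = k_temp_ac * d_temp   # estimated actual-load difference
    # Assumed sign convention: net load = actual load - PV output.
    return -(d_nl - p_cor)

# Synthetic check: load = 100 + 3*T, PV = 0.5*R.
# Day i: T = 30, R = 800 -> net load -210; day k: T = 25, R = 600 -> net load -125.
d_pv = corrected_pv_delta(d_nl=[-85.0], d_temp=[5.0], k_temp_ac=[3.0])
```

In the synthetic check the corrected delta PV output is 100, which equals 0.5 times the irradiation difference of 200, whereas the uncorrected value would have been 85.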

Evaluation Index of the PV Output Power Estimation
Mean absolute error (MAE) is used as the first evaluation indicator of the accuracy of the decoupling results. It is defined as follows:
$$MAE = \frac{1}{N} \sum_{t \in \tau} \left| P_{est}(t) - P_{real}(t) \right|$$
where $P_{est}(t)$ and $P_{real}(t)$ represent the estimated value and real value of the PV output power at time $t$, and $N$ represents the total number of timestamps during period $\tau$.
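A minimal sketch of the MAE calculation, assuming the estimated and real PV output over the period $\tau$ are given as equally long sequences (the sample values are hypothetical):

```python
import numpy as np

def mean_absolute_error(p_est, p_real):
    """Mean absolute error between estimated and real PV output over tau."""
    p_est = np.asarray(p_est, dtype=float)
    p_real = np.asarray(p_real, dtype=float)
    if p_est.shape != p_real.shape:
        raise ValueError("p_est and p_real must have the same shape")
    return float(np.mean(np.abs(p_est - p_real)))

mae = mean_absolute_error([1.0, 2.0, 3.0], [1.0, 3.0, 5.0])
```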
Because the PV output power is extremely small or zero around sunrise and sunset and under certain extreme weather conditions, the mean absolute percentage error (MAPE) is not a suitable indicator. However, to show the decomposition effect more intuitively, a daily relative absolute error (RAE), denoted $RAE_{PV,daily}$, is defined to reflect the relative error of the estimated daily PV output power.
The decomposition process of PV output power from the net load is shown in Figure 3.

CBL Prediction Model Construction
In this section, the decomposition technology of PV output power is applied to forecast the CBL of user groups equipped with distributed PV systems to improve the accuracy of the prediction results. Since CBL prediction during the period $\bar{\tau}$ is no different from the situation without a distributed PV system, the CBL prediction in this paper mainly focuses on the period $\tau$. There are two methods for CBL prediction.

Direct Prediction Method
When the PV output is behind-the-meter, the CBL can only be predicted from the historical net load of the non-DR event days. The most commonly used and effective CBL forecasting techniques are averaging algorithms and regression algorithms. The averaging algorithms are mainly of the following three types, as given in Table 2.
Table 2. Summary of the averaging methods.

| Baseline Estimation Model | Definition |
| --- | --- |
| High X of Y | The average load of the X highest consumption days within those Y non-DR days preceding the DR event days |
| Low X of Y | The average load of the X lowest consumption days within those Y non-DR days preceding the DR event days |
| Mid X of Y | The average load of the X middle consumption days within those Y non-DR days preceding the DR event days |

In this paper, we also use multilayer perceptron (MLP) [47] and recurrent neural network (RNN) [48] as representatives of regression algorithms, and use historical net load data to forecast the CBL.
From the definition of each average model and the characteristics of the regression algorithm, CBL of DR days can only be obtained through historical net load in the direct estimation method. This method may increase the forecast error due to the inability to consider the uncertainty of PV.
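The three averaging models can be sketched with one function. The sketch assumes that days are ranked by their total daily consumption and that, for an uneven split, the "mid" window is placed as centrally as integer division allows; both choices, as well as the function name, are assumptions for illustration.

```python
import numpy as np

def x_of_y_baseline(history, x, mode="high"):
    """Average load profile of X days chosen from Y preceding non-DR days.

    history: array of shape (Y, timestamps), one row per non-DR day.
    mode: "high", "low" or "mid" (X highest, lowest or middle consumption days).
    """
    history = np.asarray(history, dtype=float)
    y = history.shape[0]
    if not 1 <= x <= y:
        raise ValueError("x must satisfy 1 <= x <= Y")
    # Rank days by total daily consumption, ascending.
    order = np.argsort(history.sum(axis=1), kind="stable")
    if mode == "high":
        chosen = order[y - x:]
    elif mode == "low":
        chosen = order[:x]
    elif mode == "mid":
        start = (y - x) // 2
        chosen = order[start:start + x]
    else:
        raise ValueError("mode must be 'high', 'low' or 'mid'")
    return history[chosen].mean(axis=0)

# Hypothetical history: 5 non-DR days with 2 timestamps each.
history = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [5.0, 5.0]])
high = x_of_y_baseline(history, 2, "high")
low = x_of_y_baseline(history, 2, "low")
mid = x_of_y_baseline(history, 3, "mid")
```

When `history` holds net load, this corresponds to the direct method; the same function can be applied to a decoupled actual load history.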

Prediction Method Based on PV Output Separation
Since the PV output power can be separated using temperature and solar irradiation data, the actual load and the PV output power on DR days can be predicted separately. The CBL prediction of the actual load part $P_{ac,BL}$ can be treated as the CBL prediction for users that have not installed a distributed PV system. Then, we obtain the final CBL prediction $P_{nl,BL}(t)$ by subtracting the separated PV output power $P_{PV,BL}(t)$; the equation is given as follows:
$$P_{nl,BL}(t) = P_{ac,BL}(t) - P_{PV,BL}(t)$$
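As a small illustration of this composition, assuming the two component predictions are available as equally long profiles over the period $\tau$ (the function name and sample values are hypothetical):

```python
import numpy as np

def net_load_baseline(p_ac_bl, p_pv_bl):
    """Net-load baseline as predicted actual load minus estimated PV output."""
    p_ac_bl = np.asarray(p_ac_bl, dtype=float)
    p_pv_bl = np.asarray(p_pv_bl, dtype=float)
    if p_ac_bl.shape != p_pv_bl.shape:
        raise ValueError("both profiles must cover the same timestamps")
    return p_ac_bl - p_pv_bl

p_nl_bl = net_load_baseline([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
```

The result can be negative at timestamps where the estimated PV output exceeds the predicted actual load, which corresponds to power being exported to the grid.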

Evaluation Index of CBL Prediction
We evaluate the effect of CBL prediction methods from three perspectives (accuracy, bias, and variability). MAE is used for the evaluation of accuracy, which represents the absolute value of the difference between the true CBL and the predicted CBL. Bias is measured using the mean of the average error between the predicted CBL and the true CBL. Relative error ratio (RER) is used for the evaluation of variability, which represents a fraction of average load during the period τ [39].

Case Study
This section shows the effectiveness of the proposed PV output power curve decomposition and its application to CBL prediction.

Experimental Data Set and Platform Description
We used the load data set sourced from 300 randomly selected solar customers in Ausgrid's electricity network area as the experimental data. The customers chosen had a full set of actual load data and PV output power data for the period from 1 July 2010 to 30 June 2013. In other words, we had separate measurements of PV output power with which to assess the accuracy of the proposed decomposition method. The temperature and solar irradiation data of the relevant area were from the website [49]. The resolution of the data sets is 48 points per day.
The experiments were run on an Intel(R) Core(TM) i5-6500 CPU @ 3.20 GHz with 8 GB of RAM, using Python 3.7.7. The aggregated data was obtained by summing the data of the 300 customers. The period from 7:00 to 17:30 was set as the $\tau$ period. For the PV decomposition part of the experiment, the data of 1096 days in 3 years were used to derive the relationship between the solar irradiation and the PV output power. The total processing time of the method was 43.8 s, which meets the requirements of real-time PV output power estimation.
According to the season division rules in this paper, there are 98 days in spring, 460 days in summer, 169 days in autumn, and 369 days in winter. The clusters of actual load curves of four seasons are shown in Figure 4.
It can be seen from the figure that the actual load is relatively large in summer and winter due to the influence of air conditioning and heating equipment, respectively. In spring and autumn, the load curves have a similar shape, because the temperature in Australia is similar in these two seasons and the electricity consumption behavior of users is relatively consistent. Some individual curves in the seasonal clusters do not follow the group trend; one possible reason is extreme weather (cold waves and heat waves) in Australia. On the whole, the division based on seasons still effectively groups the actual load curves with the same electricity consumption pattern, which shows that clustering the net load by season is a good foundation for analyzing the temperature sensitivity of load consumption in the next step.

PV Output Power Curve Separation
The Pearson correlation coefficients between average daily temperature and average daily load are −0.48 (spring), 0.67 (summer), −0.36 (autumn), and −0.74 (winter). In other words, temperature and actual consumption load are clearly correlated in every season, most strongly in winter and summer. Therefore, within each season, we used the absolute difference to find time points with similar solar irradiation and differenced the net load of the matched days so that the PV output approximately cancels. We then used the least squares method to fit, for each time stamp, the relationship between the temperature difference and the approximate actual load difference. Each solar irradiation point was matched with its five most similar solar irradiation points. The slope of this linear fit at each time point is shown in Figure 5. The slopes vary across times of day and across seasons, and their signs reflect the positive correlation between temperature and actual load in summer and the negative correlation in spring, autumn, and winter.
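The matching and fitting step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fit_temperature_slopes`, the `(n_days, n_steps)` array layout, and the use of a degree-1 fit with an intercept are all assumptions made for the example.

```python
import numpy as np

def fit_temperature_slopes(net_load, irradiation, temperature, n_matches=5):
    """Per-time-stamp least-squares slope of delta load vs. delta temperature.

    All inputs are arrays of shape (n_days, n_steps) for a single season
    (hypothetical layout chosen for this sketch).
    """
    net_load = np.asarray(net_load, dtype=float)
    irradiation = np.asarray(irradiation, dtype=float)
    temperature = np.asarray(temperature, dtype=float)
    n_days, n_steps = net_load.shape
    slopes = np.empty(n_steps)
    for t in range(n_steps):
        d_temp, d_load = [], []
        for d in range(n_days):
            # Absolute irradiation difference to every day at this time stamp.
            gap = np.abs(irradiation[:, t] - irradiation[d, t])
            gap[d] = np.inf  # never match a day with itself
            for m in np.argsort(gap)[:n_matches]:
                # Similar irradiation implies similar PV output, so the
                # net-load difference approximates the actual-load difference.
                d_temp.append(temperature[d, t] - temperature[m, t])
                d_load.append(net_load[d, t] - net_load[m, t])
        # Degree-1 least-squares fit; keep only the slope.
        slopes[t] = np.polyfit(d_temp, d_load, 1)[0]
    return slopes
```

With real data the matched irradiation values are only approximately equal, so the fitted slopes absorb some residual PV difference; the sketch does not attempt to model that effect.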

After obtaining the temperature sensitivity of the actual load, the loads of two days with similar electricity consumption behavior can be corrected to the same temperature level according to the corresponding temperature difference (i.e., simulating a situation in which the electricity consumption behavior is as consistent as possible) during the analysis of the PV output power sensitivity.
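As a rough sketch of this correction, the snippet below shifts the net load of one matched day to the temperature of the other using per-time-stamp slopes, then differences the two days so that the actual load part cancels. The function names and the linear form of the correction are assumptions for illustration; net load is taken as actual load minus PV output, as defined in the paper.

```python
import numpy as np

def correct_to_reference_temperature(net_b, temp_a, temp_b, slopes):
    """Shift day B's net load to day A's temperature level."""
    net_b = np.asarray(net_b, dtype=float)
    delta_temp = np.asarray(temp_a, dtype=float) - np.asarray(temp_b, dtype=float)
    # slopes[t] is the fitted load change per degree at time stamp t.
    return net_b + np.asarray(slopes, dtype=float) * delta_temp

def approx_delta_pv(net_a, net_b, temp_a, temp_b, slopes):
    """Approximate PV(A) - PV(B) for two days with similar consumption."""
    corrected_b = correct_to_reference_temperature(net_b, temp_a, temp_b, slopes)
    # After correction both days carry (approximately) the same actual load,
    # so the remaining net-load difference is the PV difference.
    return corrected_b - np.asarray(net_a, dtype=float)
```

If the loads of the two days differ for reasons other than temperature, that residual remains in the result, which is why the paper first matches days with similar consumption behavior.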
When analyzing the sensitivity of PV output power, the k value of the k-nearest neighbor (KNN) algorithm was set to 15 to match load days with similar electricity consumption behavior. We used an MLP to model the relationship between delta solar irradiation and approximate delta PV output power. The MLP has one hidden layer with 20 hidden nodes; the rectified linear unit (ReLU) is used as the activation function between the input and hidden layers, and the sigmoid function between the hidden and output layers. The Adam optimizer was used for training. To enrich the input features, the solar irradiation of each of the two days before differencing was fed to the network in addition to the delta solar irradiation. Table 3 compares the separation accuracy of the PV output curve before and after the correction. After the temperature-based correction, the separation accuracy of the PV output power curve is greatly improved: the RAE in spring, summer, autumn, and winter decreases by 31.60%, 44.08%, 42.40%, and 60.51%, respectively. To illustrate the effectiveness of the proposed PV output decomposition method more intuitively, Figure 6 shows the decomposition results of PV output power for one week in each season. Without temperature-based correction, the estimated PV output power can become negative, which is physically impossible. The main reason is that, although we used KNN to find dates with temperatures as similar as possible so that the electricity consumption behavior of users on the two days is consistent, in reality it is almost impossible to find a pair of dates with exactly the same temperature profile. Moreover, around sunrise and sunset the sun is low in the sky (the solar zenith angle is close to 90°), which makes the PV output highly sensitive.
Neural-network-based PV output power estimation models are extremely susceptible to differences in electricity consumption behavior caused by temperature differences. As a result, the estimated PV output power can show a negative correlation with solar irradiation. After the temperature-based correction, however, the negative values are eliminated and the shape of the curve also improves.
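To make the network structure concrete, here is a minimal NumPy sketch of the forward pass of an MLP with one 20-node hidden layer, ReLU on the hidden layer, and a sigmoid output. It is an illustration only: the weights are random, training with Adam is omitted, and the feature layout (delta irradiation plus the irradiation of each of the two days), the division by 1000 W/m² for scaling, and the helper names are assumptions. Because a sigmoid output lies in (0, 1), the sketch also assumes that the target was scaled to that range beforehand.

```python
import numpy as np

def build_features(irr_a, irr_b, scale=1000.0):
    """Stack [delta irradiation, irradiation A, irradiation B] per time stamp."""
    irr_a = np.asarray(irr_a, dtype=float) / scale  # assumed scaling
    irr_b = np.asarray(irr_b, dtype=float) / scale
    return np.column_stack([irr_a - irr_b, irr_a, irr_b])

def init_mlp(n_inputs=3, n_hidden=20, seed=0):
    """Random (untrained) parameters for a one-hidden-layer MLP."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }

def mlp_forward(params, x):
    """Forward pass: ReLU hidden layer followed by a sigmoid output."""
    hidden = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU
    logits = hidden @ params["W2"] + params["b2"]
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid

# Example: three time stamps from two matched days (made-up values).
features = build_features([800.0, 200.0, 0.0], [600.0, 250.0, 0.0])
scaled_delta_pv = mlp_forward(init_mlp(), features)
```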

Results and Analysis of CBL Prediction
Since none of the customers in the data set participated in a DR project, we chose the 50 days with the highest electricity consumption in the data set as the DR days, because DR events generally occur when electricity consumption is high; the true value of the CBL can then be taken as the historical net load of the corresponding date. Because we could not determine the specific time at which a DR event would occur, we assumed that the DR event occurs at a time when there is PV output power, in order to demonstrate the effectiveness of the CBL prediction algorithm based on PV output separation.
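The selection of DR days can be sketched as below. The ranking criterion used here, the total daily load, is an assumption made for the example (the text only says "top 50 electricity consumption days"), and `select_dr_days` is a hypothetical helper name.

```python
import numpy as np

def select_dr_days(daily_load, n_days=50):
    """Return indices of the n_days days with the largest total load.

    daily_load has shape (n_total_days, n_steps); the result is ordered
    from the highest-consumption day downwards.
    """
    totals = np.asarray(daily_load, dtype=float).sum(axis=1)
    # argsort is ascending, so take the last n_days entries and reverse them.
    return np.argsort(totals)[-n_days:][::-1]
```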
The number of hidden units of the MLP and the RNN was set to 20, and the number of layers was set to 2. We performed 500 iterations for each network training. Historical load and temperature data from the 30 days before each DR day were used to train these two networks.
The forecasting results of the five CBL methods (High 5 of 10, Low 5 of 10, Mid 5 of 10, MLP, and RNN), based on direct prediction and on PV curve decomposition, are compared in Figure 7, using a randomly selected DR event day as an example. Figure 7 shows that, after applying the proposed PV output power curve decomposition algorithm, the accuracy of all five CBL prediction methods improves. Table 4 shows the average evaluation indexes of CBL prediction over all DR days under the different methods.
From the MAE and RER in Table 4, it can be seen that after applying the PV output power curve decomposition algorithm, the errors of the three CBL prediction methods are reduced and the proposed method is more stable. This is because, through the PV output power curve decomposition, the PV output part can be forecast relatively accurately. Regardless of whether the PV output power curve is decomposed, the bias in the table is negative, indicating that the predicted CBL is generally smaller than the actual value. One explanation is that the selected DR days have larger loads than the historical days used for prediction, so the five CBL prediction methods in this case study tend to underestimate the actual CBL. For the same reason, High 5 of 10, which builds its baseline from the higher-load days among the preceding ten days, achieves the highest accuracy.
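For readers unfamiliar with the averaging baselines, the sketch below shows one common reading of High 5 of 10 and Low 5 of 10, together with MAE and bias. It is an assumption-laden illustration rather than the paper's exact procedure: days are ranked by total daily load, no day-type filtering or adjustment is applied, bias is taken as the mean of prediction minus actual, and Mid 5 of 10 and RER are left out because their exact definitions are not given in this section.

```python
import numpy as np

def x_of_y_baseline(history, x=5, y=10, mode="high"):
    """Average the x highest- or lowest-load days among the last y days.

    history has shape (n_days, n_steps), oldest day first, and contains
    only days before the DR day.
    """
    if mode not in ("high", "low"):
        raise ValueError("mode must be 'high' or 'low'")
    recent = np.asarray(history, dtype=float)[-y:]
    order = np.argsort(recent.sum(axis=1))  # ascending by daily total
    chosen = order[-x:] if mode == "high" else order[:x]
    return recent[chosen].mean(axis=0)

def mae(pred, actual):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(actual))))

def bias(pred, actual):
    """Mean signed error; negative values mean under-prediction."""
    return float(np.mean(np.asarray(pred) - np.asarray(actual)))
```

Under this sign convention, a negative bias corresponds to a predicted CBL that is smaller than the actual value, which matches the interpretation given in the text.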

Conclusions
This paper proposes a decomposition method for the behind-the-meter PV output power curve. The advantage of this method is that it can estimate the PV output power curve of aggregated customers when only the historical net load data, historical temperature data, and historical solar irradiation data are known, without requiring equipment information for the individual distributed PV systems. The framework first searches for similar time points in the historical solar irradiation data, differences the corresponding net load, and obtains the temperature sensitivity of the actual load with the least squares method. It then matches dates with similar electricity consumption behavior, uses the temperature sensitivity of the actual load to correct the net load of the matched days to the same temperature level, and differences them to cancel the actual load part. Finally, an MLP is used to fit the relationship between delta solar irradiation and approximate delta PV output power, from which the PV output power curve estimation model is derived. To illustrate the effectiveness of the proposed PV output decomposition framework, data sets of 300 real customers from Ausgrid, which include PV output power, were used in the PV output decomposition experiments. In addition, the results of the two types of prediction algorithms (average and regression) demonstrated that CBL prediction based on PV output decomposition performs better in terms of MAE, bias, and RER.
In summary, this is a novel framework for BTM PV output estimation. Its most notable advantage is that the algorithm does not depend on any data other than historical net load data and weather data. In future work, the performance of the model will be analyzed more comprehensively, including the impact of the resolution of the input data on model performance and the stability of the model under extreme weather conditions.

Author Contributions: Conceptualization, K.P., C.X. and C.S.L.; formal analysis, K.P., C.X. and C.S.L.; resources, C.S.L. and L.L.L.; data curation, K.P. and D.W.; writing-original draft preparation, K.P. and C.X.; writing-review and editing, C.S.L., L.L.L. and D.W. All authors have read and agreed to the published version of the manuscript.