1. Introduction
The global energy transition is accelerating the evolution of power systems towards a high penetration of renewable energy. However, the intermittency and volatility of wind and solar power pose tremendous challenges to the supply–demand balance of the system [1,2,3,4]. Against the backdrop of the paradigm shift from “source following load” to “source-load interaction,” residential load has emerged as the largest and most widely distributed flexible resource [5,6,7]. According to statistics from the National Energy Administration, the domestic electricity consumption of urban and rural residents in China exceeded 1.588 trillion kWh in 2025. Although it accounts for only about 15% of the total electricity consumption across society, residential consumption behaviors exhibit high spatiotemporal concentration. Particularly during the evening peak, residential loads are highly prone to superimposing on the system’s peak load, imposing immense peak-shaving pressure on the power grid. Fully unlocking the flexible regulation potential of these massive resources will significantly enhance the operational efficiency of the novel power system, facilitate the efficient accommodation of renewable energy, reduce carbon emissions from fossil fuel generation, and provide crucial support for the green and low-carbon development of the power industry, thereby enhancing its overall sustainability.
Accurately predicting residential adjustable capacity is crucial for both market and operational decision-making. Load aggregators and virtual power plants (VPPs) require reliable flexibility estimations prior to submitting bids to energy or ancillary service markets [8,9,10]. Overestimation may lead to energy imbalance penalties and delivery failures, whereas underestimation can leave valuable demand-side resources underutilized. At the distribution system level, dispatchable residential flexibility can be used effectively only once its magnitude, its timing, and its profile-dependent uncertainty are well understood.
Methods for assessing demand response (DR) potential can be broadly categorized into physics-based optimization approaches and data-driven forecasting methods. Physics-driven models construct device-level load models based on the physical characteristics of appliances and energy conservation principles. Despite their clear physical interpretability, they suffer from inherent drawbacks such as severe parameter dependency and difficulties in large-scale applications. They require detailed parameters and behavioral habits for all appliances within each household, leading to prohibitively high data acquisition costs [11,12,13,14]. Furthermore, once fixed, model parameters are difficult to update, failing to capture the dynamic evolution of user response behaviors driven by time, electricity prices, and policies. Conversely, data-driven models, including transfer learning and deep learning [15,16,17,18,19], can learn nonlinear mapping relationships from historical data, price signals, and response outcomes. However, these models typically rely on high-quality response labels, which are exceedingly scarce in the residential DR domain. Although adopting optimization-generated labels as a “cold-start” strategy can effectively break this data barrier, the inherent gap between theoretical optimization and uncertain actual human behaviors inevitably introduces errors into practical forecasting. Moreover, lacking explicit modeling of the physical boundaries of DR, these models cannot perceive the rigid constraints of user consumption and the operational limits of appliances, potentially yielding physically unrealistic predictions. In addition, many data-driven approaches fail to adequately incorporate user behavioral heterogeneity into the model inputs or the response sample generation process.
User profiling offers a practical approach for characterizing user heterogeneity from smart meter data. Clustering-based profiling technologies have been widely applied in smart meter data analysis and residential load pattern recognition [20,21,22,23]. Currently, however, user profiles are predominantly utilized for descriptive analysis, and their integration with flexible constraint sample generation and time-series forecasting technologies remains underexplored. To bridge this research gap, this paper proposes a profile-aware framework for residential adjustable capacity forecasting. The main contributions are summarized as follows:
(1) Breaking the traditional limitation of utilizing user profiles merely for descriptive classification, this study deeply embeds user heterogeneity information into the entire pipeline of constraint extraction, sample generation, and adjustable capacity forecasting, thereby transforming clustered profiles from isolated preliminary results into core inputs for all stages of demand response modeling.
(2) A profile-constrained demand response sample pool generation method is established. By combining user profile constraints with a mathematical optimization model to generate large-scale and high-quality training samples, it provides a reliable data foundation featuring both heterogeneity and rationality for data-driven models.
(3) A profile-aware LSTM model is constructed to predict response load curves and adjustable capacity, utilizing baseline load, price signals, and profile labels as inputs.
2. Residential User Profiling
2.1. Load Feature Extraction
Load features directly reflect the overall electricity consumption scale and basic electricity demand of users [24]. This paper constructs a load feature set comprising seven physically meaningful indicators, which are summarized in Table 1.
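Let $P_{i,d,t}$ denote the power consumption of household $i$ on day $d$ and time interval $t$, where $t = 1, \dots, T$. For the half-hourly data used in this study, $T = 48$. Let $\mathcal{D}_i$ be the set of valid days for household $i$, $\mathcal{T}_{\mathrm{peak}}$ be the evening peak time interval set, $\mathcal{D}_{\mathrm{wd}}$ be the weekday set, and $\mathcal{D}_{\mathrm{we}}$ be the weekend set.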
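As a rough illustration of how indicators of this kind can be computed from half-hourly readings, the sketch below derives four example features for a single household. It is not the paper's Table 1: the function name `load_features`, the evening-peak window (18:00 to 21:00) and the weekend-to-weekday ratio are assumptions introduced here for illustration only.

```python
from statistics import mean

# Hypothetical evening-peak window: 18:00-21:00, i.e. half-hour slots 36..41
# (0-based). The window actually used in the study is an assumption here.
EVENING_PEAK_SLOTS = range(36, 42)

def load_features(days, is_weekend, peak_slots=EVENING_PEAK_SLOTS):
    """Illustrative load features for one household.

    days       : list of valid daily profiles, each a list of 48 readings
    is_weekend : list of bools, True where the matching day is a weekend day
    Assumes at least one weekday, one weekend day, and non-zero daily totals.
    """
    daily_means = [mean(day) for day in days]
    weekday_means = [m for m, w in zip(daily_means, is_weekend) if not w]
    weekend_means = [m for m, w in zip(daily_means, is_weekend) if w]
    return {
        # average load over all valid days
        "daily_mean": mean(daily_means),
        # average daily peak-to-valley difference
        "peak_valley_diff": mean(max(day) - min(day) for day in days),
        # average share of daily consumption falling in the evening peak
        "evening_peak_share": mean(
            sum(day[t] for t in peak_slots) / sum(day) for day in days
        ),
        # hypothetical indicator built from the weekday and weekend sets
        "weekend_weekday_ratio": mean(weekend_means) / mean(weekday_means),
    }
```

Each household then contributes one feature vector, and the vectors are stacked into the matrix that is later standardized and clustered.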
2.2. K-Means++ Clustering
To extract typical residential electricity consumption behavior patterns, K-means++ is adopted for clustering analysis [24,25,26]. This algorithm uses a distance-based probability sampling strategy so that the initial clustering centers are as dispersed as possible in space, which reduces the risk of converging to a poor local optimum and improves the convergence speed and result stability of load clustering.
Let the historical load dataset be denoted as $X = \{x_1, x_2, \dots, x_N\}$, where $x_i$ represents the load curve of the $i$-th user, and let the set number of target typical load patterns be $k$. The mathematical modeling and specific steps of the K-means++ algorithm for initializing clustering centers are as follows:
Step 1: Randomly select a load curve from the load dataset $X$ following a uniform distribution as the first clustering center, and add it to the selected center set $C$.
Step 2: For each sample point $x_i$ in the dataset, calculate the Euclidean distance between it and the nearest clustering center in the current selected center set $C$, denoted as $D(x_i)$. Its mathematical expression is as follows:
$$D(x_i) = \min_{c \in C} \lVert x_i - c \rVert_2$$
Step 3: Calculate the probability that each load sample in the dataset is selected as the next clustering center. This probability is proportional to the square of its distance to the nearest center. The probability distribution formula is defined as follows:
$$P(x_i) = \frac{D(x_i)^2}{\sum_{j=1}^{N} D(x_j)^2}$$
Step 4: Repeat Step 2 and Step 3 until the set $C$ contains $k$ clustering centers.
Step 5: Take these $k$ highly representative load curves as the determined initial clustering centers, and then start the K-means alternating optimization process until the clustering results converge.
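Steps 1 to 4 can be sketched in a few lines of plain Python. This is a minimal illustration of the seeding procedure only, not the implementation used in the study; the function names and the `seed` parameter are introduced here for the example, and the subsequent K-means iterations of Step 5 are omitted.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two load curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(curves, k, seed=0):
    """Select k initial centers from `curves` by K-means++ seeding."""
    rng = random.Random(seed)
    centers = [rng.choice(curves)]            # Step 1: uniform first center
    while len(centers) < k:                   # Step 4: repeat until k centers
        # Step 2: squared distance of every curve to its nearest center
        d2 = [min(sq_dist(x, c) for c in centers) for x in curves]
        # Step 3: sample the next center with probability proportional to d2.
        # Curves already chosen have weight 0 and cannot be drawn again.
        # (random.choices raises ValueError if every weight is 0, which
        # only happens when fewer than k distinct curves exist.)
        centers.append(rng.choices(curves, weights=d2, k=1)[0])
    return centers
```

Because the sampling weight is the squared distance, curves far from every existing center are strongly favored, which is what spreads the initial centers across the data.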
2.3. Construction Process of User Profile System
To accurately capture the heterogeneity of residential users’ electricity consumption behaviors, this paper constructs a user profile system driven by historical load data. Through multi-dimensional feature extraction and clustering analysis, the system aims to transform a large number of disordered load curves into group labels with typical response characteristics. The specific construction process is as follows:
Since the dimensions and value ranges of the seven load features differ significantly, clustering on the raw values would let the features with large numerical ranges dominate the clustering results. Therefore, this paper uses Z-score standardization to preprocess all features so that each has a mean of 0 and a standard deviation of 1:
$$z = \frac{x - \mu}{\sigma}$$
where $x$ is the original feature value, $\mu$ is the mean of the feature in the control training set, $\sigma$ is the standard deviation of the feature in the control training set, and $z$ is the standardized feature value.
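The point that the mean and standard deviation are estimated on a reference set and then reused can be made concrete with a short sketch. The helper names are hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption of this example.

```python
from statistics import mean, pstdev

def fit_zscore(train_rows):
    """Estimate per-feature mean and standard deviation on a reference set."""
    columns = list(zip(*train_rows))
    mu = [mean(col) for col in columns]
    # Guard against constant features, whose standard deviation is 0.
    sigma = [pstdev(col) or 1.0 for col in columns]
    return mu, sigma

def apply_zscore(rows, mu, sigma):
    """Standardize rows with previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, mu, sigma)] for row in rows]
```

Applying `apply_zscore` with statistics fitted elsewhere keeps the feature scaling identical across groups, so that distances in the clustering step remain comparable.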
(2) Determination of the optimal number of clusters
To avoid the inherent limitations of a single clustering validity indicator and to make the selection of the number of clusters objective and reliable, this paper evaluates the candidate numbers of clusters with four clustering validity indicators: the elbow method, the silhouette coefficient, the Calinski–Harabasz (CH) index, and the Davies–Bouldin (DB) index.
(3) Clustering model training and profile label assignment
Based on the standardized full feature matrix and the optimal number of clusters, the K-means++ algorithm is used to train the clustering model with a maximum of 1000 iterations and a convergence threshold of $10^{-6}$ on the maximum change in clustering centers. After training, each cluster is assigned a user profile label with clear physical meaning, based on the feature differences among the clustering centers and the typical electricity consumption behavior of residents.
4. Experimental Results and Analysis
This paper employs a residential load dataset from a region in Northwest China, comprising half-hourly electricity consumption data of 2207 households from January 2024 to June 2025. Specifically, from 1 January to 31 May 2024, all users were subject to a uniform flat-rate tariff. From 1 June 2024, to 1 June 2025, the test groups (Groups A, B, and C) were subject to Time-of-Use (ToU) tariffs, while the control group (Group D) continued under the flat-rate tariff. The ToU pricing schemes in this dataset reflect actual implemented pricing mechanisms within the Northwest China regional power grid, including residential ToU tariffs, critical peak pricing, and flat-rate discounted tariffs. These schemes comprehensively cover the mainstream price incentives of current demand response programs in China. The specific time period divisions and corresponding prices are detailed in
Table 2.
4.1. Construction of User Profiles
To capture the authentic electricity consumption behaviors of the users, the actual historical load profiles of 518 households from Control Group E, which was unaffected by price interventions, were selected as the baseline.
To verify the independence and information dimension coverage of the seven constructed load feature indicators, this paper calculates the Pearson correlation coefficient between each pair of features, and the results are shown in Figure 2.
It can be seen from Figure 2 that there is a strong positive correlation between the load characteristic indicators representing electricity consumption scale, which conforms to the physical law that the larger the basic electricity consumption scale, the higher the extreme value. Overall, this feature set effectively decouples the scale attribute and behavioral elasticity of users, and can characterize the electricity consumption heterogeneity of groups comprehensively and with limited redundancy.
(2) Selection of the number of clusters
Based on the four clustering validity indicators mentioned above, the clustering effects of $k$ = 3, 4, 5, 6, 7, 8 are quantitatively evaluated, where the weight of the silhouette coefficient is 0.4, and the weights of the CH index and DB index are both 0.3. The calculation results of each indicator and the comprehensive score are shown in Table 3.
The experimental results show that when $k$ = 6, the within-cluster sum of squares (WCSS) curve has an inflection point, and the silhouette coefficient, CH index and DB index all reach their optimal values, giving the highest weighted comprehensive score. Therefore, this paper determines the optimal number of clusters as 6.
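One plausible way to combine the three indices into a weighted comprehensive score is sketched below. How the indices are brought onto a common scale is an assumption of this example: min-max normalization across the candidate values of the number of clusters is used, with the DB index inverted because lower values are better. The input numbers in the usage note are invented for illustration and are not the values of Table 3.

```python
def minmax(values, higher_is_better=True):
    """Scale values to [0, 1] across candidates; invert if lower is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]

def composite_scores(silhouette, ch, db, weights=(0.4, 0.3, 0.3)):
    """Weighted score per candidate (weights: silhouette, CH, DB)."""
    s = minmax(silhouette)
    c = minmax(ch)
    d = minmax(db, higher_is_better=False)  # a lower DB index is better
    w_s, w_c, w_d = weights
    return [w_s * a + w_c * b + w_d * e for a, b, e in zip(s, c, d)]
```

For example, with made-up inputs `silhouette=[0.2, 0.4, 0.3]`, `ch=[100, 200, 150]` and `db=[1.5, 1.0, 1.2]` for three candidates, the second candidate scores 1.0 and is selected. The elbow criterion is not part of the weighted sum and is checked separately on the WCSS curve.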
(3) User profile labels
Based on the optimal number of clusters, the 6 clustering centers obtained from training and the core load characteristics of users in each cluster are statistically analyzed, and the results are shown in
Table 4. By comparing the load characteristics of different clusters and combining with residential electricity consumption behaviors, user profile labels are assigned to each cluster.
It can be seen from Table 4 that the electricity consumption behaviors of the six types of user groups have significant heterogeneity, and the six profile labels can accurately distinguish the differences in residential users’ electricity consumption behaviors.
(4) Visual verification of clustering results
To further intuitively show the discrimination of clustering results, principal component analysis is used to reduce the 7-dimensional load features to a 3-dimensional feature space, and the clustering results are drawn as shown in
Figure 3.
It can be clearly seen from Figure 3 that the six profile groups show significant internal aggregation and clear inter-cluster boundaries in space, indicating that the user profile system constructed in this paper can effectively distinguish user groups with different electricity consumption behaviors.
To more intuitively show the differences in electricity consumption patterns of each group, the average daily load curves of users with different profiles are drawn as shown in
Figure 4.
It can be clearly seen from Figure 4 that the six types of users present distinctly different electricity consumption characteristics over the 24 h cycle. The summary is shown in Table 5.
The above group differences indicate that if a unified model is used to predict the adjustable capacity of all users, the systematic deviation caused by profile heterogeneity will be ignored. Therefore, it is necessary to introduce profile label features into the model to improve prediction accuracy and physical rationality.
4.2. Robustness Validation of the User Profiling System
To ensure the proposed user profiling system possesses reliable classification capabilities and to mitigate the impacts of algorithmic randomness and temporal volatility on clustering outcomes, this study conducts systematic validation experiments from two dimensions: random seeds and temporal variations. These experiments quantify the consistency and stability of the clustering results under varying conditions.
Although the K-means++ algorithm optimizes the initial center selection through probabilistic sampling, minor random fluctuations may still exist. To verify the reproducibility of the clustering results, 100 different random initial seeds are configured to repeatedly execute the K-means++ clustering process on the complete dataset. The Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI) are employed to measure the consistency between every pair of clustering outcomes. NMI lies in [0, 1], while ARI is bounded above by 1 and is close to 0 (and can be slightly negative) for chance-level agreement; for both metrics, a value closer to 1 indicates a higher degree of clustering consistency. The results are presented in Table 6.
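The pairwise consistency check can be illustrated with a self-contained ARI implementation based on the standard pair-counting formula. This is a sketch for illustration (the function names are introduced here, and NMI is omitted for brevity); in practice a library routine would normally be used instead.

```python
from collections import Counter
from itertools import combinations
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """ARI between two labelings of the same samples (needs >= 2 samples)."""
    n = len(labels_a)
    # Pairs of samples grouped together in both / in each labeling
    sum_ij = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    sum_b = sum(comb(v, 2) for v in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)     # agreement expected by chance
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:                 # degenerate trivial partitions
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

def pairwise_ari(runs):
    """ARI for every pair of clustering runs (e.g. one run per seed)."""
    return [adjusted_rand_index(a, b) for a, b in combinations(runs, 2)]
```

ARI is invariant to how clusters are numbered: `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` returns 1.0, whereas the maximally conflicting labelings `[0, 0, 1, 1]` and `[0, 1, 0, 1]` give -0.5. The minimum and mean of `pairwise_ari` over all runs then summarize seed sensitivity.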
The experimental results demonstrate that the mean values of ARI and NMI over the 100 repeated experiments both approach the theoretical maximum of 1.0. Even in the worst-case scenario, the minimum ARI still reaches 0.9888, and the minimum NMI reaches 0.9833. This indicates that the probabilistic sampling initial center strategy of the K-means++ algorithm performs exceptionally well in this study. The clustering results are almost unaffected by the selection of initial random seeds, demonstrating the extremely strong robustness of the algorithm.
(2) Validation of Clustering Stability Across Different Seasons
Residential electricity consumption behavior is significantly influenced by seasonal factors such as temperature, sunshine duration, and living habits. To verify the applicability of the user profile system across different seasons, the complete dataset is divided into four independent subsets according to natural seasons: Spring (March to May), Summer (June to August), Autumn (September to November), and Winter (December to February). The ARI and NMI metrics between the clustering results of each season and the annual baseline clustering results are then calculated. The results are presented in
Table 7.
The experimental results exhibit a pattern of “higher in autumn and winter, lower in spring and summer,” which highly aligns with the climatic characteristics of the inland northwest region of China. Influenced by seasonal factors, the shape of users’ daily load curves undergoes fundamental changes. However, the ARI and NMI metrics for all seasons remain higher than random levels, indicating that the core electricity consumption patterns of users maintain a strong internal consistency despite seasonal variations.
(3) Validation of Clustering Stability Across Different Time Periods
There are significant differences in residents’ daily routines between weekdays and weekends. To verify the stability of the user profile system across different time periods within a week, the dataset is further divided into two independent subsets: weekdays and weekends. The consistency metrics between these subsets and the annual baseline clustering results are then calculated. The results are presented in
Table 8.
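Both the seasonal and the weekday/weekend checks start by partitioning the observation days into subsets. A minimal sketch of that partitioning is given below; the helper names are hypothetical, the season boundaries follow the month ranges stated in the text, and public holidays are ignored, which is an assumption of this example.

```python
from datetime import date

# Month-to-season mapping following the natural seasons described in the text
SEASON_BY_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}

def season_of(day):
    """Season label of a calendar day."""
    return SEASON_BY_MONTH[day.month]

def day_type(day):
    """'weekend' for Saturday/Sunday, otherwise 'weekday' (holidays ignored)."""
    return "weekend" if day.weekday() >= 5 else "weekday"

def partition(days, key):
    """Group days into subsets according to a labeling function."""
    subsets = {}
    for day in days:
        subsets.setdefault(key(day), []).append(day)
    return subsets
```

Features are then recomputed and clustered on each subset, and the resulting labels are compared with the annual baseline labels of the same households using ARI and NMI.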
The experimental results indicate that the consistency between the weekday clustering results and the annual baseline is higher than that of the weekends, which is in line with residents’ more regular weekday routines. Users have more flexible schedules on weekends, leading to a slight decrease in clustering consistency; nonetheless, core characteristics such as their basic electricity consumption scale and load volatility features remain stable.
4.3. Construction of Sample Pool
The historical load curves and electricity price signals are input into the optimization model, and the optimal response curves of each group are solved in batches with the goal of minimizing the user’s daily electricity cost; these curves are used as the response labels (training targets) for deep learning. The optimization is solved with the Gurobi 13.0.1 solver in a Python 3.9 environment on a machine with an Intel Core i7-9750H processor and 16 GB of RAM. The results are shown in Figure 5.
4.4. Training of Deep Learning Model
After constructing the demand response sample pool, the generated sample set is divided into 70% training set and 30% test set. The input feature matrix of the model includes the user’s historical baseline load, time-of-use electricity price signal and user profile label, and the output is the ideal response load curve generated by the Gurobi solver.
To verify the capability of different network architectures in capturing the implicit mapping relationships between load and price, three mainstream deep learning models—namely, the One-Dimensional Convolutional Neural Network (1D-CNN), the Gated Recurrent Unit (GRU), and the proposed LSTM—are selected for benchmark comparison. Furthermore, three evaluation metrics—Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE)—are employed to assess the predictive performance of the adjustable capacity forecasting model.
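For reference, the three evaluation metrics can be written out in plain Python as below. The treatment of zero actual values in MAPE is an assumption of this sketch (such points are skipped, since the percentage error is undefined there); how the study handles them is not stated.

```python
from math import sqrt

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent, skipping zero actuals."""
    terms = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(terms) / len(terms)
```

MAE and RMSE share the unit of the load, with RMSE penalizing large deviations more heavily, while MAPE is scale-free and therefore sensitive to intervals in which the actual load is close to zero.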
To ensure a fair comparison, uniform training hyperparameter configurations are applied across all deep learning models. Specifically, the batch size is set to 128, the Adam optimizer is employed for network weight updates, the initial learning rate is configured at 0.001, and the maximum number of training epochs is capped at 500. Furthermore, an early stopping mechanism utilizing the validation loss as the monitoring metric is incorporated. Regarding the specific network architecture design, the 1D-CNN model is constructed with three one-dimensional convolutional layers, with kernel sizes of 5, 5, and 3, and output channels of 64, 128, and 128, respectively. A Batch Normalization layer and a Dropout regularization operation with a dropout rate of 0.2 are cascaded after each convolutional layer, and the results are ultimately output through two fully connected layers with 128 and 64 nodes, respectively, along with the ReLU activation function. For both the LSTM and GRU recurrent neural networks, a 2-layer stacked recurrent structure is adopted, where the input feature dimension is 2 and the hidden state dimension is uniformly set to 128. Ultimately, a single fully connected layer is utilized in both models to map the high-dimensional time-series features into continuous load predicted values. The global prediction and accuracy results of the respective models on the identical test set are illustrated in
Figure 6 and
Figure 7.
It can be seen from Figure 6 and Figure 7 that the LSTM model achieves better prediction performance on the test set than the other two benchmark models. Specifically, the RMSE of the LSTM predictions is 8.83% and 18.37% lower than that of the 1D-CNN and GRU models, respectively; the MAE is 7.2% and 55.2% lower, respectively; and the MAPE is 20.89% and 60.52% lower, respectively.
To further demonstrate the superiority of the proposed heterogeneity-handling method based on user profiling, a benchmark comparison is conducted against commonly used heterogeneity-aware approaches, namely K-means + SVM and K-means + XGBoost. Similarly, three evaluation metrics—MAE, RMSE, and MAPE—are employed to assess the predictive performance of each model. The experimental results are tabulated in
Table 9.
As demonstrated in Table 9, the proposed method outperforms both the K-means + SVM and K-means + XGBoost approaches across all three evaluation metrics, supporting its advantage in handling residential load time-series data.
4.5. Prediction and Analysis of Adjustable Capacity
After completing the training of the LSTM deep prediction model, to quantitatively analyze the physical heterogeneity of different user profile groups in adjustable capacity, this paper selects 6 users belonging to different profile groups on a typical summer day from the test set. The historical baseline load of the day, the profile category and three time-of-use electricity pricing schemes with different peak-valley difference gradients are input into the trained LSTM model. The specific prediction situation is shown in
Figure 8.
It can be seen from Figure 8 that the evening peak load shows obvious load reduction and shifting as the electricity price gradient increases. The predicted adjustable capacity results in Table 10 show that the High-base continuous-consumption profile has the largest elastic space, with an adjustable capacity of 1.6018 kW under the Price C scheme, while the Low-base rigid-consumption profile has an extremely low reduction of only 0.2004 kW.
(2) Interval Prediction and Uncertainty Analysis
To address the differentiated decision-making requirements across three major engineering scenarios—power grid dispatch, load aggregator bidding, and demand response (DR) potential assessment—this study constructs a Quantile Regression LSTM (QR-LSTM) interval prediction model based on the original LSTM point prediction framework. By selecting three quantiles (0.05, 0.5, and 0.95), the model synchronously outputs the upper and lower bounds of the 95% confidence interval (CI) alongside the median baseline predicted value. The model is trained utilizing the Pinball loss function. The uncertainty evaluation metrics of the model on the test set are presented in
Table 11.
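The Pinball loss and two common interval metrics (empirical coverage and mean interval width) can be sketched as follows. The functions operate on plain lists and their names are introduced here for illustration; in the actual model the quantile predictions would come from the network outputs, and the exact metric definitions behind the reported table are assumptions of this sketch.

```python
def pinball_loss(actual, quantile_pred, tau):
    """Average pinball (quantile) loss for quantile level tau in (0, 1)."""
    total = 0.0
    for a, q in zip(actual, quantile_pred):
        diff = a - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau)
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(actual)

def interval_metrics(actual, lower, upper):
    """Empirical coverage rate and mean width of a prediction interval."""
    covered = sum(lo <= a <= hi for a, lo, hi in zip(actual, lower, upper))
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(actual)
    return covered / len(actual), mean_width
```

Training sums `pinball_loss` over the three quantile levels so that each output learns its own quantile. For reference, the 0.05 and 0.95 quantiles nominally bracket a central band holding 90% of the probability mass, so the empirical coverage is best compared against the nominal level implied by the quantile pair actually used.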
As shown in
Table 11, the actual coverage rate of the 95% confidence interval (CI) closely approaches the preset 95% confidence level, encompassing the vast majority of the actual adjustable capacity values. The mean interval width is only 0.0813 kW, avoiding excessively wide prediction intervals. The point prediction RMSEs across the three key quantiles remain at low levels, indicating the model’s capability to meet the accuracy demands of varying engineering scenarios.
Based on these quantitative findings, specific strategies are tailored: for the power grid dispatch scenario prioritizing safety and reliability, the conservative adjustable capacity at the 0.05 quantile is adopted as the baseline to avert over-dispatching risks from high point predictions. For the load aggregator bidding scenario aiming to ensure profitability while boosting bidding success rates, the neutral capacity at the 0.5 quantile serves as the reference. For the demand response potential assessment scenario seeking to fully grasp the maximum regional capacity, the optimistic capacity at the 0.95 quantile is set as the upper bound.
4.6. Validation of the Necessity of Considering User Profile Heterogeneity and Analysis of Generalization Limitations
To quantitatively demonstrate the value of user profile heterogeneity in adjustable capacity forecasting, the baseline period load data of the test groups was utilized to conduct short-term rolling forecasting of the testing period loads. Subsequently, a comparative error analysis was performed against the actual loads during the testing period. Using an LSTM model without user profile labels as the baseline for comparison, the forecasting accuracy comparison is presented in
Table 12.
As shown in
Table 12, the forecasting accuracy of the LSTM model drops significantly after the removal of user profile labels. Compared with the baseline model, the proposed model achieves improvements of 16.29%, 24.52%, and 20.21% across the three primary accuracy metrics, respectively. This substantiates that user profile labels can significantly enhance the feature extraction capability of time-series models for heterogeneous response behaviors.
(2) Limitations of the Synthetic Data-Based Model and Its Impact on Generalization Capability
This study utilizes synthetic samples generated via Gurobi optimization as the model training data, which effectively addresses the challenge of scarce real-world demand response labels. When applied to the real-world measurement dataset, the corresponding error metrics of the proposed model increased by 9.41%, 12.67%, and 15.12%, respectively, compared to its performance on the synthetic optimization test set.
The fundamental reason is that the synthetic samples strictly adhere to the ideal physical constraints of electricity cost minimization. Consequently, they cannot replicate the various non-ideal factors present in real-world grid scenarios, including the fluctuation of users’ subjective response willingness, random electricity consumption behaviors, and the influence of environmental factors. However, once a sufficient volume of real-world demand response measurement samples is accumulated in the future, the model can undergo seamless incremental retraining using these real samples without altering the underlying model architecture. With the increasing number of real samples and the continuous enrichment of user response behavior data, the model will progressively learn the non-ideal factors inherent in real-world scenarios, thereby achieving sustained improvements in both forecasting accuracy and generalization capability.
5. Conclusions
To address the neglect of group heterogeneity, the subjective setting of constraint parameters, the shortage of real labeled response samples, and the significant prediction deviation of the traditional fixed reduction coefficient method in residential adjustable capacity forecasting, this paper integrates K-means++ clustering with an LSTM time-series prediction model and constructs a forecasting method for residential adjustable capacity that accounts for user profile heterogeneity. The main research conclusions of this paper are as follows:
(1) Based on 7-dimensional load features, including the daily average load, peak-to-valley difference, and the proportion of evening peak load, the K-means++ algorithm is employed to divide residential users into 6 typical electricity consumption groups. The constructed user profile system can effectively characterize the heterogeneity of consumption behaviors, providing a foundation for differentiated forecasting.
(2) The proposed profile-specific constrained sample pool construction method generates high-quality response samples in batches with the objective of electricity cost minimization. This resolves the challenges faced by traditional models regarding the reliance on empirical settings for constraint parameters and the shortage of response samples.
(3) The LSTM forecasting model integrated with user profile labels exhibits outstanding performance, and its prediction accuracy outperforms commonly used heterogeneity-aware methods, namely K-means + SVM and K-means + XGBoost. After introducing the profile labels, the three core accuracy metrics of the model are further improved by 16.29%, 24.52%, and 20.21%, respectively, validating the critical role of profile heterogeneity in enhancing prediction accuracy.
Ultimately, in real-world grid dispatch and market bidding, the practically realizable capacity is inherently constrained by practical engineering factors. Future research will focus on extending this fundamental framework toward practical implementation across three key dimensions: (1) designing privacy-preserving forecasting architectures based on federated learning to guarantee the security of smart meter data; (2) optimizing lightweight, distributed edge-computing algorithms to satisfy the real-time efficiency demands of massive residential scenarios; and (3) integrating behavioral economics models to account for dynamic user participation willingness, thereby bridging the gap between theoretically predicted capacities and actual dispatchable resources.