This section presents the experimental validation of the proposed framework through comprehensive case studies. The dataset construction and simulation settings are first introduced, followed by the experimental design under different source–target domain configurations. Parameter configurations, comparative results, and computational cost analyses are then provided to evaluate forecasting accuracy, robustness under distribution shifts, and practical deployability.
4.1. Data Set
The dataset used in this study is constructed based on real-world residential electricity consumption data from the Pecan Street project in Austin, TX, USA. Pecan Street provides high-resolution smart meter measurements and publicly accessible residential energy datasets [23], and has been widely adopted in studies related to residential demand response (DR), load aggregation, and energy behavior modeling. Its long-term and fine-grained data characteristics make it particularly suitable for investigating cross-domain learning problems under distributional shifts, which is the main focus of this paper. The original dataset contains 1-min resolution electricity consumption records for approximately 500 residential households, including both whole-house and appliance-level measurements. To ensure data completeness and consistency, households with missing or discontinuous records over the study period are removed. The remaining data span a continuous two-year period from 1 January 2015 to 31 December 2016, enabling the analysis of inter-annual variations in residential load patterns caused by differences in weather conditions, customer behavior, and household composition. The raw data are aggregated to a 15-min resolution, which is consistent with the temporal granularity typically used in incentive-based DR programs and day-ahead electricity market operations.
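The aggregation from 1-min to 15-min resolution can be illustrated with a minimal sketch. It assumes that the readings are average power values in kW and that each 15-min value is the mean of 15 consecutive 1-min readings; the function name and the synthetic data are illustrative only and are not taken from the Pecan Street tooling.

```python
from statistics import mean

def aggregate_to_15min(power_1min_kw):
    """Average consecutive blocks of 15 one-minute power readings (kW).

    Assumption: readings are evenly spaced with no gaps; a trailing
    partial block is dropped rather than averaged.
    """
    block = 15
    usable = len(power_1min_kw) - len(power_1min_kw) % block
    return [mean(power_1min_kw[i:i + block]) for i in range(0, usable, block)]

# One hour of synthetic readings: 1 kW for 30 min, then 3 kW for 30 min.
readings = [1.0] * 30 + [3.0] * 30
print(aggregate_to_15min(readings))  # [1.0, 1.0, 3.0, 3.0]
```

If the raw records were energy per minute rather than average power, the block values would be summed instead of averaged.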
Due to the lack of publicly available datasets containing actual residential load reductions during incentive-based demand response (IBDR) events, this study adopts a home energy management system (HEMS)-based simulation framework to generate reference demand response data. The HEMS model employed in this work follows the formulation proposed in [20], which has been widely used in the literature for aggregated DR capacity modeling and forecasting.
The HEMS model simulates customers’ response behavior under IBDR programs by optimizing household appliance operation with the objective of minimizing total electricity cost, while explicitly considering monetary rewards for load reduction and penalty mechanisms for unmet DR commitments [24,25]. Residential loads are categorized into three main types: air-conditioning systems, shiftable appliances, and inelastic loads. Air-conditioning systems are modeled using simplified thermal dynamic equations with ON/OFF control, temperature setpoints, and comfort deadbands. Shiftable appliances (e.g., washing machines and dishwashers) are assumed to have flexible operating time windows and are scheduled to respond to DR signals without significantly affecting user comfort. Inelastic loads are treated as fixed and non-responsive throughout the DR event. The key parameters of the HEMS model include electricity prices, reward rates, penalty rates, DR event start times and durations, appliance rated power, and thermal characteristics of residential buildings. These parameters are summarized in Table 2 and are consistent with typical settings adopted in residential DR studies.
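To make the air-conditioning component more concrete, the following sketch simulates a first-order thermal model under ON/OFF control with a comfort deadband. It is an illustrative assumption, not the HEMS formulation of [20]: the discretization, the function name, and all parameter values (thermal resistance, thermal capacity, cooling capacity) are hypothetical and do not correspond to the settings in Table 2.

```python
def simulate_ac(outdoor_temps_c, t_init_c=24.0, setpoint_c=24.0, deadband_c=1.0,
                r_c_per_kw=2.0, c_kwh_per_c=2.0, q_cool_kw=8.0, dt_h=0.25):
    """Simulate indoor temperature with hysteresis (ON/OFF) cooling control.

    All default parameters are hypothetical illustration values.
    Model: C * dT/dt = (T_out - T_in) / R - Q_cool * state
    """
    temps, states = [], []
    t_in, on = t_init_c, False
    for t_out in outdoor_temps_c:
        # Hysteresis: switch only when the temperature leaves the deadband.
        if t_in > setpoint_c + deadband_c / 2:
            on = True
        elif t_in < setpoint_c - deadband_c / 2:
            on = False
        heat_flow_kw = (t_out - t_in) / r_c_per_kw - (q_cool_kw if on else 0.0)
        t_in += dt_h / c_kwh_per_c * heat_flow_kw
        temps.append(t_in)
        states.append(on)
    return temps, states

# Six hours at a constant 35 degC outdoor temperature, 15-min steps.
temps, states = simulate_ac([35.0] * 24)
print([round(t, 2) for t in temps[:6]], states[:6])
```

Raising the setpoint or forcing the unit OFF during a DR event would reduce the cooling load at the cost of a higher indoor temperature, which is the flexibility that the HEMS optimization trades off against rewards and penalties.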
It should be emphasized that the HEMS model is not newly proposed in this paper, nor is it tuned to favor the proposed forecasting method. Instead, it serves solely as an offline data generation mechanism to produce realistic and internally consistent DR samples based on authentic residential consumption data from the Pecan Street dataset. Similar HEMS-based frameworks have been widely adopted and validated in prior studies to approximate aggregated residential customer behavior under incentive-based DR programs [13,19,20]. Although the simulated DR data cannot fully replicate real-world customer behavior in the absence of actual incentive-based DR event data, this limitation applies uniformly to all benchmark models and experimental cases considered in this study. Moreover, the same HEMS-generated DR potential values are consistently used across both the source and target domains. As a result, any modeling bias introduced by the underlying HEMS assumptions affects all methods in a consistent manner and does not compromise the fairness of the comparative evaluation or the validity of the relative performance conclusions.
From a robustness and sensitivity perspective, variations in HEMS assumptions or parameter settings primarily manifest as changes in the statistical distribution of the aggregated DR potential, rather than altering the fundamental relationship between aggregated baseline consumption, exogenous conditions, incentive signals, and response capability. Since the proposed framework does not incorporate appliance-level control logic or HEMS-specific parameters, but instead relies on domain-adaptive feature learning and marginal distribution alignment, it is inherently less sensitive to specific HEMS configurations. Consequently, moderate variations in HEMS parameters are expected to be effectively accommodated by the proposed domain-adaptive transfer learning mechanism.
4.2. Case Settings
To comprehensively evaluate the effectiveness, stability, and data-efficiency of the proposed domain-adaptive transfer learning framework, eight case studies are designed with different source–target domain configurations and source-domain data proportions.
Specifically, 55 DR event days from the summer of 2015 are used to construct Dataset_1, while 65 DR event instances from the summer of 2016 form Dataset_2, as summarized in Table 2. Owing to inter-annual differences in weather conditions, customer composition, and consumption patterns, these two datasets naturally exhibit noticeable distributional discrepancies, making them well suited for evaluating cross-domain transfer performance.

It should be noted that the DR event days in Dataset_1 and Dataset_2 are not randomly sampled from the full two-year dataset. Instead, they are selected from predefined summer periods (June–September) in 2015 and 2016 and correspond to days with simulated incentive-based DR events generated by the HEMS model. The selected DR event days are temporally distributed within each summer period but not strictly consecutive, as they are constrained by data availability and predefined DR event settings. Focusing on summer periods may introduce a certain degree of seasonal sampling bias by over-representing weather-sensitive load behaviors, such as air-conditioning usage. However, this bias is applied consistently to both the source and target domains. Despite being drawn from the same season, Dataset_1 and Dataset_2 exhibit noticeable inter-annual distribution shifts due to differences in weather conditions, customer composition, and baseline consumption patterns, leading to discrepancies in both input feature distributions and aggregated DR potential. This setting naturally forms a cross-domain learning scenario and constitutes a key motivation for adopting a transfer learning framework in this study. Accordingly, the results should be interpreted as representative of seasonal DR scenarios, and caution should be exercised when extrapolating them to other seasons.
In practical demand response applications, load aggregators often face severe data scarcity when incorporating new customer groups or expanding into new operational regions. Historical DR data are typically available only for a small subset of customers or events, while collecting additional labeled data requires costly and time-consuming field experiments. To realistically reflect this constraint, the initial case studies (Cases 1–4) deliberately restrict the source-domain training data to 10% of the available samples. This setting represents an extreme yet practically relevant scenario and is intended to assess the lower bound of data availability under which effective transfer learning can still be achieved. Under this design, the target domain is assumed to be completely unlabeled, and no target-domain information is used during model training or hyperparameter tuning. This strict separation ensures that the evaluation faithfully reflects a real-world deployment scenario for unlabeled target domains.
To further examine the robustness and stability of the proposed framework with respect to the amount of source-domain data, four additional cases (Cases 5–8) are introduced by increasing the source-domain proportion to 20% and 30%, respectively. These cases enable a systematic sensitivity analysis that evaluates whether the model’s performance trends remain consistent as more source-domain information becomes available.
Across all cases, the remaining samples in the corresponding dataset are used exclusively for testing. For intra-year scenarios (Cases 1 and 2), the source and target domains are drawn from the same dataset, while for cross-year scenarios (Cases 3–8), the source and target domains are drawn from different years to introduce more pronounced distribution shifts. The complete experimental configuration is summarized in Table 3. This multi-case experimental setup provides a rigorous and transparent basis for evaluating the practical applicability of the proposed framework in real-world demand response environments characterized by limited labeled data and evolving customer behavior.
From an operational perspective, the case settings in this study are designed to reflect realistic deployment conditions faced by load aggregators. In practice, forecasting models for aggregated demand response (DR) potential are typically trained offline using historical data and updated periodically as new data become available, rather than being retrained in real time. Accordingly, all models in this study are trained under an offline learning setting, and their computational cost is evaluated to assess practical feasibility.
As will be shown in Section 4.4, the proposed DA-RVFL framework exhibits training and inference times that are comparable to conventional machine learning models such as RF and SVR, and substantially lower than deep learning-based LSTM models. This is mainly due to the closed-form least-squares optimization and the absence of iterative backpropagation. All experiments are conducted in a CPU-based environment, indicating that the proposed approach can be readily implemented by load aggregators without requiring specialized hardware or cloud-scale computing resources. These characteristics make the proposed framework suitable for routine operational use in day-ahead bidding and planning scenarios.
Although the forecasting accuracy in this study is evaluated using statistical metrics such as RMSE and MAPE, these improvements have direct operational and economic implications for load aggregators. More accurate prediction of aggregated DR potential reduces the likelihood of overestimating available flexibility, which in turn lowers the risk of failing to meet committed DR bids and incurring penalty costs. At the same time, improved accuracy mitigates underestimation of DR capability, enabling aggregators to submit more competitive bids and fully utilize available demand-side flexibility. Therefore, the observed reductions in RMSE and MAPE translate into lower supply risk, more reliable DR participation, and improved economic efficiency in market operations. While a detailed market-level economic analysis is beyond the scope of this paper, the consistent accuracy improvements demonstrated across multiple case studies indicate tangible practical benefits for real-world aggregator decision-making.
4.3. Parameter Configuration
To simultaneously preserve predictive accuracy on the source domain and promote domain-invariant feature learning, the final loss function $\mathcal{L}$ is constructed. It explicitly combines the DA-RVFL training loss $\mathcal{L}_{\mathrm{RVFL}}$, defined as the root mean square error (RMSE) on the source-domain data, with a Maximum Mean Discrepancy (MMD) loss term applied to the outputs of the DL layer for both the source and target domains:

$$\mathcal{L} = \mathcal{L}_{\mathrm{RVFL}} + \lambda\,\mathcal{L}_{\mathrm{MMD}}$$

where $\mathcal{L}_{\mathrm{MMD}}$ quantifies the distributional divergence between the source and target datasets, and $\lambda$ is a hyperparameter that balances the contribution of the primary DA-RVFL training loss against the domain adaptation loss. Once this objective function is formulated, the model parameters can be optimized to automatically minimize the distribution discrepancy between the source and target domains during training. In this study, the hyperparameter $\lambda$ is selected through empirical tuning using a validation subset of the source-domain data. Values of $\lambda$ in the range [0.01, 1] are evaluated, and the model performance is observed to be relatively stable within this interval. Based on this analysis, $\lambda$ is fixed at 0.1 for all experiments, as it provides a favorable balance between forecasting accuracy and domain adaptation effectiveness.
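The weighted objective described above can be sketched as follows. This is a minimal illustration under stated assumptions: the MMD term uses a linear kernel (the distance between the mean feature vectors of the two domains), which may differ from the kernel used in the paper, and the function names and synthetic features are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between two 1-D arrays."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

def linear_mmd(feat_src, feat_tgt):
    """MMD with a linear kernel: norm of the gap between domain feature means.

    Assumption: the kernel choice is illustrative, not taken from the paper.
    """
    gap = np.mean(feat_src, axis=0) - np.mean(feat_tgt, axis=0)
    return float(np.linalg.norm(gap))

def total_loss(y_src, y_src_pred, feat_src, feat_tgt, lam=0.1):
    """Source-domain RMSE plus lam times the MMD between domain features."""
    return rmse(y_src, y_src_pred) + lam * linear_mmd(feat_src, feat_tgt)

# Synthetic example: target features are shifted by +1 in every dimension.
rng = np.random.default_rng(0)
feat_src = rng.normal(0.0, 1.0, size=(200, 5))
feat_tgt = rng.normal(1.0, 1.0, size=(200, 5))
y_src = rng.normal(size=200)
y_fit = y_src + 0.1  # predictions with a constant 0.1 error
print(total_loss(y_src, y_fit, feat_src, feat_tgt, lam=0.1))
```

Note that only source-domain labels enter the loss; the target domain contributes unlabeled features through the MMD term, which matches the unlabeled-target assumption of the case studies.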
The methodological setup of the ensemble model involves several key hyperparameters designed to balance predictive performance and computational complexity. To encourage diverse feature representations, the number of hidden-layer nodes is constrained to the range of 400–600. In addition, an ensemble consisting of 200 independent DA-RVFL models is constructed, representing a practical compromise between improved predictive accuracy and computational cost. The sigmoid function is adopted as the activation function throughout the network.
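The ensemble configuration can be illustrated with a plain RVFL sketch trained by closed-form ridge least squares. Several points are assumptions rather than details from the paper: the MMD-based adaptation term is omitted, the number of hidden nodes of each member is drawn uniformly from the given range, the random weights are drawn from [-1, 1], the ridge parameter is arbitrary, and member predictions are combined by simple averaging.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_rvfl(x, y, n_hidden, ridge, rng):
    """Fit one RVFL: random hidden layer, direct links, closed-form output weights."""
    w = rng.uniform(-1.0, 1.0, size=(x.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    d = np.hstack([x, sigmoid(x @ w + b)])  # direct links + hidden features
    beta = np.linalg.solve(d.T @ d + ridge * np.eye(d.shape[1]), d.T @ y)
    return w, b, beta

def predict_rvfl(model, x):
    w, b, beta = model
    return np.hstack([x, sigmoid(x @ w + b)]) @ beta

def fit_ensemble(x, y, n_models=200, hidden_range=(400, 600), ridge=1e-2, seed=0):
    """Train independent RVFL members with randomly chosen hidden-layer sizes."""
    rng = np.random.default_rng(seed)
    low, high = hidden_range
    return [fit_rvfl(x, y, int(rng.integers(low, high + 1)), ridge, rng)
            for _ in range(n_models)]

def predict_ensemble(models, x):
    """Average the member predictions (assumed combination rule)."""
    return np.mean([predict_rvfl(m, x) for m in models], axis=0)

# Small synthetic run; sizes are reduced so the sketch executes quickly.
data_rng = np.random.default_rng(42)
x = data_rng.uniform(-1.0, 1.0, size=(120, 3))
y = np.sin(x[:, 0]) + 0.5 * x[:, 1]
models = fit_ensemble(x, y, n_models=5, hidden_range=(40, 60))
train_rmse = float(np.sqrt(np.mean((predict_ensemble(models, x) - y) ** 2)))
print(train_rmse)
```

Because the output weights are obtained from a single linear solve per member, training cost grows with the number of members and hidden nodes but involves no iterative backpropagation.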
4.4. Results and Analysis
Figures 6–13 present the temporal prediction results of the aggregated demand response (DR) potential for all eight case studies. Cases 1–4 correspond to the extreme low-data scenario where only 10% of the source-domain samples are available, while Cases 5–8 progressively increase the source-domain proportion to 20% and 30%.
As shown in Figures 6–9, the predicted DR potential closely follows the actual temporal dynamics in both intra-year (Cases 1 and 2) and cross-year (Cases 3 and 4) transfer scenarios. Despite the pronounced distribution mismatch introduced by inter-annual variations in weather conditions and customer behavior, the proposed method maintains stable tracking performance and avoids the large deviations observed in conventional machine learning models. Quantitatively, Table 4 shows that the proposed method consistently achieves the lowest RMSE and MAPE values and the highest coefficient of determination (R²) across all four cases. In particular, for the most challenging cross-year transfer scenario (Case 3), the proposed framework improves RMSE by more than 40% compared with LSTM and by an even larger margin compared with RF and SVR. The relatively high R² values (above 0.5 in all cases) further indicate that the proposed model captures the dominant variation patterns of aggregated DR potential, whereas several benchmark models exhibit near-zero or even negative R² values, reflecting their inability to generalize under severe data scarcity. These results confirm that the proposed framework is not merely fitting the source-domain data but is able to extract domain-invariant representations that remain effective when transferred to unlabeled and distribution-shifted target domains.

Cases 5–8 extend the analysis by increasing the proportion of source-domain data to 20% and 30%. This design explicitly evaluates the sensitivity of the proposed method to the size of the source-domain training set and assesses whether the performance gains observed at 10% remain consistent or improve systematically as more data become available. As illustrated in Figures 10–13 and summarized in Table 4, increasing the source-domain proportion leads to a monotonic reduction in RMSE and MAPE for the proposed method in both transfer directions (Dataset_1 → Dataset_2 and Dataset_2 → Dataset_1). For example, in the Dataset_1 → Dataset_2 transfer setting, RMSE decreases from 16.45 (Case 3, 10%) to 13.19 (Case 5, 20%) and further to 10.58 (Case 7, 30%). MAPE follows a similar downward trend, while R² increases steadily and exceeds 0.79 in Case 7.
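As a worked check of the reported trend, the relative RMSE reductions in the Dataset_1 → Dataset_2 setting can be computed directly from the three values quoted above. The helper name is illustrative; only the three RMSE values come from the text.

```python
# RMSE values quoted in the text for the Dataset_1 -> Dataset_2 setting.
rmse_d1_to_d2 = {"Case 3 (10%)": 16.45, "Case 5 (20%)": 13.19, "Case 7 (30%)": 10.58}

def relative_reduction(baseline, value):
    """Percentage reduction of `value` relative to `baseline`."""
    return 100.0 * (baseline - value) / baseline

baseline = rmse_d1_to_d2["Case 3 (10%)"]
reductions = {case: round(relative_reduction(baseline, v), 1)
              for case, v in rmse_d1_to_d2.items()}
print(reductions)
```

Relative to the 10% setting, the RMSE is therefore about 19.8% lower with 20% of the source data and about 35.7% lower with 30%.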
To further investigate the mechanism underlying the performance improvements observed in cross-year transfer scenarios, the latent feature distributions before and after domain adaptation are visualized using principal component analysis (PCA) for Cases 3–6, as shown in Figures 14–19. These cases represent challenging settings with pronounced inter-annual distribution shifts and limited source-domain data availability. In the absence of domain adaptation, the latent representations learned by the standard RVFL model exhibit clear separation between the source and target domains across all cases, indicating substantial distribution mismatch caused by differences in weather conditions, customer composition, and baseline consumption patterns between years. This separation is particularly evident in Cases 3 and 4, where only 10% of the source-domain data are available, highlighting the difficulty of cross-domain generalization under severe data scarcity.
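The kind of PCA comparison described above can be sketched on synthetic latent features. The example assumes that PCA is fitted jointly on the stacked source and target features and uses the distance between the projected domain centroids as a simple separation indicator; both choices, as well as the synthetic data, are illustrative and not taken from the paper.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project features onto their leading principal components via SVD."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def centroid_gap(latent_src, latent_tgt):
    """Distance between domain centroids in the joint 2-D PCA projection."""
    proj = pca_project(np.vstack([latent_src, latent_tgt]))
    n_src = len(latent_src)
    return float(np.linalg.norm(proj[:n_src].mean(axis=0) - proj[n_src:].mean(axis=0)))

rng = np.random.default_rng(1)
latent_src = rng.normal(0.0, 1.0, size=(150, 6))
latent_tgt_shifted = rng.normal(2.0, 1.0, size=(150, 6))  # mimics no adaptation
latent_tgt_aligned = rng.normal(0.0, 1.0, size=(150, 6))  # mimics aligned features

gap_before = centroid_gap(latent_src, latent_tgt_shifted)
gap_after = centroid_gap(latent_src, latent_tgt_aligned)
print(gap_before, gap_after)
```

A large centroid gap corresponds to visibly separated clusters in the scatter plot, whereas a small gap corresponds to overlapping source and target point clouds.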
To ensure a fair and meaningful comparison, all benchmark methods considered in this study were carefully optimized and tuned using only source-domain data prior to performance evaluation. A consistent validation procedure was applied across all models to determine appropriate hyperparameter settings, with the objective of achieving their best possible predictive performance under each case study. This unified tuning strategy avoids bias introduced by uneven optimization efforts and ensures that the observed performance differences genuinely reflect the intrinsic modeling capability of each approach. The detailed hyperparameter configurations of the compared methods are summarized as follows.
The support vector regression (SVR) model employed a radial basis function (RBF) kernel. The regularization parameter C, kernel width parameter γ, and insensitive loss parameter ε were selected using a grid search combined with five-fold cross-validation, and the final configuration was the one that achieved the lowest validation RMSE.
The random forest (RF) hyperparameters, including the number of trees, maximum tree depth, minimum number of samples required to split an internal node, and minimum number of samples per leaf, were tuned using cross-validation. The final configuration—200 trees, a maximum depth of 10, a minimum split size of 5, and a minimum leaf size of 2—was selected based on its superior validation performance while maintaining model robustness.
The hyperparameters of the long short-term memory (LSTM) model were determined using a combination of grid search and validation-based early stopping. The number of hidden units {30, 50, 80}, dropout rate {0.1, 0.2, 0.3}, and learning rate {0.0005, 0.001} were evaluated. The final architecture consisted of 50 LSTM units with a dropout rate of 0.2, trained using a learning rate of 0.001 and a batch size of 30. Training was terminated when the validation loss converged to prevent overfitting.
After the above hyperparameter configuration and validation procedure, all benchmark models (SVR, RF, and LSTM) were re-trained using the selected optimal settings on the corresponding source-domain training set and then evaluated on the target-domain test set in each case study. This ensures that performance differences among methods originate from their generalization capability under data scarcity and domain shift, rather than from suboptimal tuning. The comparative results are reported quantitatively in
Table 4 and further illustrated in Figures 20–23, which collectively provide a clear assessment of prediction accuracy (RMSE and MAPE) and goodness-of-fit (R²) across different transfer scenarios and source-data proportions.
After applying the proposed DA-RVFL framework, the latent feature distributions of the source and target domains become substantially more aligned in all cases. Notably, effective alignment is achieved even when the source domain contains only a small number of DR event days (Cases 3 and 4), demonstrating the strong data efficiency and robustness of the proposed domain adaptation mechanism. As the proportion of source-domain data increases to 20% (Cases 5 and 6), the alignment becomes more stable and compact, which is consistent with the monotonic improvements observed in the quantitative performance metrics.
Overall, these latent-space visualizations provide intuitive evidence that the proposed DA-RVFL framework effectively mitigates inter-annual distribution shifts at the representation level, enabling reliable knowledge transfer from a limited source domain to an unlabeled target domain. This explains the superior forecasting performance achieved by the proposed method in cross-year demand response potential prediction under realistic data scarcity conditions.
More specifically, for each case study, the source-domain data used for training were further partitioned into training and validation subsets using a five-fold cross-validation strategy. The hyperparameters of all baseline models were determined exclusively based on the source-domain data, without incorporating any target-domain information, in order to strictly adhere to the unlabeled target-domain assumption of the transfer learning setting. This experimental design ensures a fair and unbiased comparison among different algorithms under identical data availability and domain-shift conditions.
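The source-only tuning protocol can be illustrated with a minimal five-fold grid search. A ridge regressor stands in for the benchmark models purely to keep the sketch short, the folds are shuffled randomly (a blocked split may be preferable for time-ordered data), and the grid values and synthetic data are hypothetical.

```python
import numpy as np

def kfold_indices(n_samples, n_folds=5, seed=0):
    """Split shuffled sample indices into n_folds disjoint validation folds."""
    order = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(order, n_folds)

def cv_rmse(x, y, ridge, folds):
    """Mean validation RMSE of ridge regression over the given folds."""
    errors = []
    for k, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        xt, yt = x[train_idx], y[train_idx]
        beta = np.linalg.solve(xt.T @ xt + ridge * np.eye(x.shape[1]), xt.T @ yt)
        errors.append(np.sqrt(np.mean((x[val_idx] @ beta - y[val_idx]) ** 2)))
    return float(np.mean(errors))

def select_ridge(x_src, y_src, grid=(1e-3, 1e-1, 1e1, 1e3)):
    """Pick the grid value with the lowest cross-validated RMSE on source data only."""
    folds = kfold_indices(len(y_src))
    scores = {r: cv_rmse(x_src, y_src, r, folds) for r in grid}
    return min(scores, key=scores.get), scores

# Synthetic source-domain data; no target-domain samples are touched.
rng = np.random.default_rng(3)
x_src = rng.normal(size=(60, 4))
y_src = x_src @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=60)
best_ridge, cv_scores = select_ridge(x_src, y_src)
print(best_ridge, cv_scores)
```

The same loop applies to any model with a fit and predict step; the essential point is that both the training and the validation folds are drawn from the source domain.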
The support vector regression (SVR) model employed a radial basis function (RBF) kernel. Although SVR demonstrates reasonable performance in several intra-year scenarios, its prediction accuracy degrades noticeably in cross-year transfer cases. As shown in Table 4, SVR yields relatively high RMSE and MAPE values in Cases 3 and 4, accompanied by negative or low R² values. This behavior indicates that SVR is sensitive to distribution shifts and struggles to generalize when the statistical characteristics of the target domain differ substantially from those of the source domain.

The random forest (RF) model exhibits the weakest overall performance among the benchmark methods. Despite its robustness to noise and its ability to capture nonlinear relationships, RF relies heavily on sufficient and representative training data. Under severe data scarcity and domain mismatch conditions, RF fails to construct reliable decision boundaries, resulting in large prediction errors and highly unstable R² values. In several cross-domain cases, negative R² values are observed, suggesting that RF predictions are inferior to naive mean-based estimates. This limitation is particularly evident in Cases 1 and 3, where the combination of limited source-domain data and strong distribution shifts significantly undermines model generalization.

The long short-term memory (LSTM) model generally outperforms RF and SVR in scenarios where temporal dependencies can be effectively learned. In intra-year cases with relatively mild distribution differences, LSTM achieves moderate prediction accuracy. However, its performance deteriorates substantially in cross-year transfer scenarios. As indicated in Table 4, LSTM yields unstable R² values and elevated RMSE in Cases 3, 6, and 8. This behavior can be attributed to the strong data dependency of deep learning models: when training samples are limited and the target-domain distribution deviates from the source domain, LSTM tends to overfit the source-domain temporal patterns and fails to generalize to unseen conditions.
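The three evaluation metrics, and the reason a negative R² indicates a forecast that is worse than predicting the mean, can be illustrated with a short sketch. The standard textbook definitions are assumed, and the numbers are synthetic rather than taken from Table 4.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean square error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

actual = [10.0, 12.0, 14.0, 16.0]
mean_forecast = [13.0] * 4                  # always predicts the mean of `actual`
reversed_forecast = [16.0, 14.0, 12.0, 10.0]  # trend in the wrong direction

print(r2(actual, mean_forecast))      # 0.0
print(r2(actual, reversed_forecast))  # -3.0
print(rmse(actual, reversed_forecast), mape(actual, reversed_forecast))
```

The mean forecast gives R² = 0 by construction, so any model with a negative R² has a larger squared error than this naive baseline.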
In contrast, the proposed DA-RVFL framework consistently achieves superior performance across all evaluation metrics and case studies. As summarized in
Table 4 and illustrated in Figures 20–23, the proposed method yields the lowest RMSE and MAPE values and the highest R² values in all eight cases. Notably, its performance advantage is particularly pronounced in cross-year transfer scenarios, where traditional machine learning models experience significant degradation. This indicates that the proposed framework effectively mitigates the adverse effects of distribution mismatch through explicit domain adaptation.
Figures 20 and 21 provide an aggregated comparison of RMSE, MAPE, and R² across representative cases, clearly illustrating the consistent performance gap between the proposed method and the benchmark algorithms. Furthermore, the temporal prediction results shown in Figures 22 and 23 demonstrate that the proposed framework is able to closely track the actual DR potential dynamics, while benchmark methods exhibit larger fluctuations and systematic biases, especially during periods of rapid load variation.
Table 5 reports the computational cost of different forecasting methods under all eight case studies, together with their corresponding training strategies. The results provide a quantitative assessment of the practical deployability of the proposed framework and directly complement the prediction accuracy analysis presented in
Table 4. All experiments were implemented in MATLAB R2025a. The test system was equipped with an Intel Core i5-7500U CPU operating at 3.40 GHz and 8.00 GB of installed RAM. All computational cost evaluations were conducted in a CPU-based environment. First, for all methods, the training time increases monotonically from Case 1 to Case 8, which is consistent with the gradual increase in the amount of source-domain training data from 10% to 30%.
This trend confirms that the reported computational costs follow expected scalability behavior rather than being dominated by implementation artifacts. Second, traditional machine learning models, including RF, SVR, and the proposed DA-RVFL, exhibit comparable computational costs across all cases. Although the proposed method introduces additional regularization and an MMD-based domain adaptation term into the objective function, its training time remains in the same order of magnitude as RF and SVR. This is mainly because the DA-RVFL framework relies on closed-form least-squares optimization and avoids iterative backpropagation, thereby preventing excessive computational overhead. Third, compared with deep learning-based LSTM models, the proposed approach demonstrates a clear advantage in computational efficiency. While the training time of LSTM increases substantially with sample size due to its iterative sequence modeling and gradient-based optimization, the DA-RVFL framework maintains a moderate and predictable growth rate. This property is particularly important for load aggregators, who often need to retrain forecasting models repeatedly under changing customer compositions and operating conditions. Finally, the testing time of RF, SVR, and DA-RVFL remains nearly constant across all cases, indicating that the inference complexity of the proposed method is not sensitive to training data size. Although DA-RVFL incurs a slightly higher testing cost than RF and SVR due to the ensemble structure, the absolute inference time remains within a few seconds, which is well suited for day-ahead and near-real-time demand response applications. Overall, the results in
Table 5 demonstrate that the proposed DA-RVFL framework achieves improved transfer learning performance (as shown in Table 4) without sacrificing computational efficiency.