Article

Probabilistic Forecasting of Household Energy Self-Sufficiency Rate Using Pre-Trained Time-Series Foundation Models with Monte Carlo Simulation

by Hiroki Yamasaki, Libei Wu and Masaaki Nagahara *
Graduate School of Advanced Science and Engineering, Hiroshima University, Higashi Hiroshima City 739-8527, Japan
* Author to whom correspondence should be addressed.
Energies 2026, 19(2), 362; https://doi.org/10.3390/en19020362
Submission received: 15 December 2025 / Revised: 5 January 2026 / Accepted: 9 January 2026 / Published: 12 January 2026

Abstract

Evaluating energy self-sufficiency in the residential sector is crucial for decarbonization. However, the discrepancy between design-stage estimates and actual measurements (the performance gap) poses a significant challenge. While the primary cause of this gap lies in uncertainties stemming from occupant behavior and weather conditions, no medium-term probabilistic forecasting framework for the energy self-sufficiency rate (ESSR) incorporating these factors has been established. To address this issue, this study proposes a probabilistic forecasting framework that integrates a pre-trained time-series foundation model called Chronos with Monte Carlo simulation. Validation using real data from 39 households demonstrates that the proposed method can achieve prediction accuracy superior to baseline models. Furthermore, the derived probability distributions of ESSR quantify fluctuation risks across households and seasons, highlighting the limitations of conventional uniform evaluation models.

1. Introduction

Decarbonization of the building sector is widely regarded as an international policy challenge [1,2]. In Japan, achieving energy self-sufficiency in residential houses is required to achieve carbon neutrality by 2050 [3,4]. The key policy measure for this, called ZEH (net Zero Energy House) [5], aims to achieve a near-zero annual energy balance through improved thermal insulation, high-efficiency equipment, and the introduction of renewable energy. However, as pointed out by Yamaura et al. [3], the adoption of ZEH has not always proceeded smoothly and has been hindered by concerns over high initial costs. While public subsidies and tax incentives are indispensable for widespread adoption, insufficient policy-based funding has been noted [6]. Therefore, to justify public incentives and gain trust from investors and consumers, a mechanism for accurately assessing how much energy self-sufficiency is achieved as a result of policy support is urgently needed. Such a mechanism is also essential for policy sustainability. This study focuses on the evaluation of the energy self-sufficiency rate (ESSR), the ratio of generated energy to consumed energy; see Equation (1) for the definition.
Performance evaluation of ZEH, as well as energy-saving buildings, has been conventionally based on design-stage simulations and standardized household models. However, multiple studies have pointed out a significant discrepancy, known as the performance gap, between these estimated values and the measured values after occupancy. For instance, cases have been reported where actual consumption reaches several times the prediction, or where the measured self-sufficiency rate falls below the estimated rate [7,8,9]. The primary cause of this discrepancy is that simulations oversimplify uncertain factors such as actual occupant behaviors and weather conditions. Therefore, accurate assessment of energy self-sufficiency requires evaluation based on measured data, rather than design-stage estimates. Nevertheless, even in evaluations using measured data, what homeowners and policymakers need for decision-making is not past averages or standard values, but a future outlook of the household’s ESSR [10]. Most existing studies are limited to analyzing past data or deterministic short-term forecasting, and a framework for probabilistically forecasting ESSR over the medium to long term, considering the aforementioned uncertainties, has not been established.
Thus, to address this challenge, this study utilizes the recently emerged time-series foundation models, which enable medium-to-long-term future forecasting that accounts for temporal variability and uncertainty. Among these models, Chronos [11] has been reported to have the capability to provide future forecasts as probability distributions with high accuracy, even from relatively small amounts of data, due to its extensive pre-training on a wide variety of time-series datasets. This capability for probabilistic forecasting directly addresses the quantification of future uncertainty identified as a key challenge in prior research.
The objective of this study is to utilize the probabilistic forecasting capability of Chronos to quantify the future uncertainty of household ESSR. Specifically, we first fine-tune Chronos using monthly data (solar power generation, electricity consumption, and gas consumption) for each household to forecast 12 months into the future, and verify its prediction accuracy against baseline models. Then, we probabilistically evaluate the annual ESSR, including gas consumption, using the Monte Carlo method based on Chronos’s quantile predictions, clarifying the future self-sufficiency distribution that includes seasonal fluctuations and prediction uncertainty.
The contributions of this study are twofold. First, we demonstrate the application of Chronos to household energy data (photovoltaic generation, electricity consumption, and gas consumption) and confirm that its prediction accuracy surpasses that of baseline models in medium-term forecasting at monthly resolution over a 12-month horizon. Second, we propose a probabilistic forecasting framework for annual ESSR, integrating electricity and gas, by combining Chronos’s quantile forecasting capability with Monte Carlo simulation. This framework provides confidence intervals and variability that are unattainable with conventional deterministic estimation, thereby supporting risk-based decision-making regarding household ESSR.
The remainder of this paper is organized as follows: Section 2 reviews related research. Section 3 explains research methodology and predictive models. Section 4 presents the results obtained. Section 5 discusses the obtained results. Section 6 presents the overall conclusions of this study.

2. Related Work

2.1. ZEH Evaluation and the Performance Gap

The Japanese definition of ZEH (net-Zero Energy House) aims for an annual net-zero balance of primary energy consumption [5]. A practical indicator used to measure household energy self-sufficiency is the Energy Self-Sufficiency Rate (ESSR) [4,12,13]. ESSR, which shows the extent to which on-site generation, such as photovoltaic (PV) systems, can cover demand, is employed as a crucial metric in ZEH performance evaluation and smart house demonstration studies.
However, when the estimated values from the design phase are compared with the measured values after operation using such indicators, a significant divergence, known as the performance gap, is widely recognized as a common academic challenge across both residential and non-residential buildings. Zou et al. [8] reported cases in which actual energy consumption reached two to five times the design prediction, and a review by Van Dronkelaar et al. [14] shows that measured consumption exceeded regulatory benchmarks by an average of 34%.
The primary cause of this performance gap lies in the inability of design-stage simulation models to appropriately handle uncertainty. Specifically, occupant behavior is widely acknowledged as the main source of uncertainty in building energy performance [15,16], and the discrepancy between standardized occupancy schedules and actual behavior has been demonstrated to create the gap [17]. Kampelis et al. [18,19] reported that the performance gap is particularly prominent in residential buildings compared to industrial or research buildings, concluding that the reason is the difficulty in predicting occupant behavior. Furthermore, the uncertainty in weather data is also a critical fluctuating factor in forecasting future energy supply and demand [20].
For ex-ante assessments, such as investment decisions for ZEH adoption or policy planning for subsidy design, a prospective outlook of ESSR that accounts for future uncertainties is essential [21,22]. Therefore, there is a strong need for a framework that can quantify the uncertainties stemming from the root causes of the performance gap, namely occupant behavior and weather, and probabilistically forecast the future ESSR [23,24].

2.2. Time-Series Forecasting Research for Energy Supply and Demand

To address the challenge of uncertainty in ZEH evaluation discussed in the previous section, future forecasting of the energy self-sufficiency rate is essential. Conventional household energy forecasting research has used statistical methods (e.g., SARIMA) or machine learning (e.g., LSTM) [25,26,27]. However, the results often remained point forecasts and failed to present the range of uncertainty—the root cause of the performance gap—as a probability distribution [26,27]. Although the importance of probabilistic forecasting to quantify uncertainty has been pointed out in the energy field [28,29], its application with conventional models has been limited.
To overcome this challenge of point forecasting, this study utilizes recently introduced time-series foundation models [11]. These models, pre-trained on diverse data, have been demonstrated to achieve high predictive accuracy and generalization performance with only fine-tuning, even on small amounts of household energy data [30,31]. Their most important feature is the ability to output future uncertainty by default as probabilistic forecasts (quantile forecasts) [11,31]. Therefore, time-series foundation models surpass the limitations of conventional point forecasting, enabling the probabilistic handling of uncertainty. They are ideal inputs for the probabilistic evaluation of the energy self-sufficiency rate discussed in the next section, resolving the issues presented in the previous section.

2.3. Probabilistic Evaluation Approaches and Monte Carlo Method

This section reviews related research on evaluation methods that use the probabilistic forecasts obtained from time-series foundation models as input to quantify the final uncertainty of the energy self-sufficiency rate.
To address the performance gap problem (Section 2.1), research has advanced on probabilistically handling uncertainties in energy supply and demand. Sun et al. [32,33] pointed out that using probability distributions instead of conventional deterministic forecasts (point forecasts) is essential for closing the gap and improving the reliability of decision-making. Virote et al. [34] and Baetens et al. [16] emphasized the importance of methods for probabilistically modeling occupant behavior, a primary cause of the performance gap. Furthermore, Im et al. [22] and Hu et al. [17] have used Monte Carlo simulation (MC method) to quantify the impact of uncertainties, such as occupant behavior, on energy savings and demand flexibility, demonstrating the effectiveness of the MC method for analyzing energy supply and demand variations [35].
This study takes these approaches to performance gap analysis (uncertainty in supply and demand) one step further. Much of the conventional probabilistic evaluation [17,22] focused on individually modeling uncertain factors, such as occupant behavior. In contrast, the time-series foundation model discussed in Section 2.2 makes it possible to directly predict future uncertainty as a probability distribution from the energy supply and demand time-series data itself, which implicitly includes the factors of the performance gap (occupant behavior, weather). In this study, we use this probability distribution of supply and demand as the primary input for MC simulation. This allows us to construct a framework for probabilistically evaluating the energy self-sufficiency rate over a medium-to-long-term (annual) span. This approach, which combines probabilistic forecasting of supply and demand by time-series foundation models with the MC method to quantify the energy self-sufficiency rate distribution at the household level, is expected to advance conventional methods [22,32].

3. Forecasting Models

The objective of this study is to probabilistically forecast the three household energy variables and to present the resulting energy self-sufficiency rate (ESSR) distribution. To this end, this section describes the experimental design and analytical methods. The analysis flow of this study consists of two experiments: (1) validation of the forecasting model, and (2) simulation of the ESSR.

3.1. Dataset

This study uses household energy data collected via home energy management systems (HEMS) in Kitakyushu City, Japan. The collection period spans 3 years and 3 months, from 1 January 2021, to 31 March 2024, with a temporal resolution of one month. After the preprocessing steps described in Section 3.2, a dataset of 39 households, cleared of missing values and anomalies, is used for the analysis.
To align with the primary energy evaluation of ZEH, this study unified energy quantities with different units into megajoules (MJ). For electricity, we use the definitional conversion of 1 kWh = 3.6 MJ, consistent with the energy statistics conventions set by the International Energy Agency (IEA) [36]. For gas, we perform the conversion based on the terms and conditions of the gas utility in the data collection area [37], setting 1 m³ = 45 MJ.
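The unit unification described above can be sketched in a few lines of Python. This is only an illustration: the function names and the monthly readings are hypothetical, and only the two conversion factors come from the text.

```python
KWH_TO_MJ = 3.6       # definitional: 1 kWh = 3.6 MJ
GAS_M3_TO_MJ = 45.0   # value stated in the text for the local gas utility: 1 m^3 = 45 MJ


def electricity_to_mj(kwh: float) -> float:
    """Convert an electricity quantity from kWh to MJ."""
    return kwh * KWH_TO_MJ


def gas_to_mj(cubic_metres: float) -> float:
    """Convert a gas volume from cubic metres to MJ."""
    return cubic_metres * GAS_M3_TO_MJ


# Hypothetical monthly meter readings for one household (illustrative only).
monthly_kwh = 350.0
monthly_gas_m3 = 20.0

total_demand_mj = electricity_to_mj(monthly_kwh) + gas_to_mj(monthly_gas_m3)
print(f"total demand: {total_demand_mj:.1f} MJ")
```

Putting both carriers on the same MJ scale is what allows electricity and gas consumption to be summed in the denominator of the ESSR.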
The forecast targets in this study are three monthly time-series data (unit: MJ): power consumption, gas consumption, and photovoltaic (PV) generation.
Furthermore, we use the following as covariates for the forecasting models: monthly power purchase, monthly power sale, and fuel cell (FC) generation (unit: MJ) for each household. Additionally, monthly average temperature (°C), monthly average wind speed (m/s), monthly average relative humidity (%), monthly total precipitation (mm), and monthly total global horizontal radiation (MJ/m²), obtained from nearby weather observation data, are also used as covariates. The main specifications of the dataset are summarized in Table 1.

3.2. Experiment 1: Experimental Setup for Forecasting Model Validation

The objective here is to demonstrate that the time-series foundation model Chronos, which is fundamental to this study’s analysis, achieves higher accuracy than a standard baseline model (Seasonal Naive) and performs better than the standard deviation of the data.

3.2.1. Forecasting Model and Baseline

The Chronos model used in this study is a pre-trained time-series forecasting model based on a language model architecture, developed by Amazon AI [11]. Its main feature is treating time-series data as a language. Input time-series data are converted into a sequence of tokens with a fixed vocabulary through processes called scaling and quantization. The T5-architecture-based model is trained on this token sequence using a cross-entropy loss [11]. Chronos is inherently a generative model; given a past history (context), it makes predictions by generating multiple possible future trajectories through sampling. In this study, we obtain the probabilistic forecasts required in Section 3.3 and beyond by calculating quantiles from these sample trajectories. Since Chronos is pre-trained, training from scratch is unnecessary; we use the training data from Section 3.2.2 only for fine-tuning the model. The three variables are set as forecast targets, and we adopted a Pooled Fine-tuning strategy, in which all the time-series data from the 39 households are input into a single model. This is to draw out high generalization performance by referencing data from other households, even if the data volume for an individual household is small (27 months).
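As a rough illustration of the final step described above, deriving quantile forecasts from sampled future trajectories, the following sketch uses synthetic Gaussian paths as a stand-in for model output. It does not call Chronos; the function names, the linear-interpolation quantile rule, and the synthetic numbers are all assumptions made for the example.

```python
import random


def quantile(sorted_values, p):
    """Linear-interpolation quantile of pre-sorted values, for 0 <= p <= 1."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def quantile_forecast(trajectories, levels=(0.1, 0.5, 0.9)):
    """Summarise sampled paths (each of length `horizon`) as per-step quantiles."""
    horizon = len(trajectories[0])
    result = {p: [] for p in levels}
    for h in range(horizon):
        column = sorted(path[h] for path in trajectories)
        for p in levels:
            result[p].append(quantile(column, p))
    return result


# Synthetic stand-in for 200 sampled 12-month trajectories (MJ); not model output.
rng = random.Random(0)
trajectories = [[rng.gauss(1000.0, 100.0) for _ in range(12)] for _ in range(200)]

q = quantile_forecast(trajectories)
print(len(q[0.5]), "monthly medians computed")
```

With real sample paths in place of the synthetic ones, `q[0.1]` and `q[0.9]` would give the bounds of an 80% interval for each forecast month.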
Regarding the baseline, we select the Seasonal Naive (SN) method as a simple yet strong standard for time-series forecasting. Although deep learning models such as Long Short-Term Memory (LSTM) are commonly used, they typically require large-scale datasets to ensure stable training and avoid overfitting. In this study, the training data is limited to 27 monthly observations per household, which raised concerns that training complex architectures like LSTM from scratch would lead to unstable performance. Consequently, we chose to compare Chronos against SN, which serves as a robust benchmark that captures the clear annual seasonality confirmed in our exploratory analysis. This setup highlights the capability of time-series foundation models to provide reliable forecasts even in data-constrained scenarios where traditional deep learning might struggle.
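For reference, a seasonal naive forecast simply repeats the value observed one season earlier. The sketch below assumes a 12-month season and a hypothetical function name; it illustrates the general technique rather than the authors' code.

```python
def seasonal_naive(history, horizon, season=12):
    """Forecast each future step with the value observed one season earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[h % season] for h in range(horizon)]


# Hypothetical 27-month training series (values are just month indices).
history = list(range(27))
forecast = seasonal_naive(history, horizon=12)
print(forecast)
```

With 27 months of history and a 12-month horizon, each forecast month is copied from the same calendar month of the final training year.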

3.2.2. Experimental Setup and Evaluation Metrics

To forecast and evaluate the energy self-sufficiency rate for the next 12 months (one year), we split the total 39 months of data chronologically. The first 27 months (January 2021–March 2023) are used as training data for fine-tuning the model, and the remaining 12 months (April 2023–March 2024) are used as test data to evaluate the model’s forecasting performance.
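The chronological split can be checked with a short sketch. The helper name is hypothetical; the start month, series length, and split point are taken from the text.

```python
def month_labels(start_year, start_month, n_months):
    """Return consecutive 'YYYY-MM' labels starting from the given month."""
    labels = []
    for k in range(n_months):
        total = (start_month - 1) + k
        labels.append(f"{start_year + total // 12}-{total % 12 + 1:02d}")
    return labels


months = month_labels(2021, 1, 39)           # January 2021 to March 2024
train_months, test_months = months[:27], months[27:]

print(train_months[0], train_months[-1])     # 2021-01 2023-03
print(test_months[0], test_months[-1])       # 2023-04 2024-03
```

Splitting by position rather than at random keeps every test month strictly after every training month, which avoids leaking future information into fine-tuning.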
Forecast accuracy is compared against the baseline and the data’s standard deviation (SD) using the metrics of MAE (mean absolute error), RMSE (root mean square error), and MASE (mean absolute scaled error).
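The three metrics can be written compactly as below. The MASE scaling shown here, the in-sample mean absolute error of a 12-month seasonal naive forecast, is an assumption: the text does not state which scaling is used, although it is consistent with the later remark that a MASE below 1.0 indicates higher accuracy than SN.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mase(actual, forecast, train, season=12):
    """MAE scaled by the in-sample seasonal naive MAE (assumed scaling)."""
    diffs = [abs(train[t] - train[t - season]) for t in range(season, len(train))]
    scale = sum(diffs) / len(diffs)
    return mae(actual, forecast) / scale


# Hypothetical linear series: 27 training months followed by 12 test months.
train = [float(t) for t in range(27)]
actual = [float(t) for t in range(27, 39)]
sn_forecast = train[-12:]                    # seasonal naive forecast

print(mae(actual, sn_forecast), rmse(actual, sn_forecast), mase(actual, sn_forecast, train))
```

On this toy series the seasonal naive forecast scores a MASE of exactly 1, which is the reference level against which another model would be judged under this scaling.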

3.3. Experiment 2: Methodology for Probabilistic ESSR Simulation

The objective of this experiment is to probabilistically present the future distribution of the Energy Self-Sufficiency Rate (ESSR) at both monthly and annual levels. In this study, ESSR is defined as the ratio of generated renewable energy to the total energy demand (electricity and gas consumption) as follows:
$$\mathrm{ESSR} = \frac{\text{PV Generation}}{\text{Electricity Consumption} + \text{Gas Consumption}}, \tag{1}$$
where PV Generation, Electricity Consumption, and Gas Consumption are all expressed in Megajoules (MJ).
Unlike the initial deterministic approach, this analysis targets all 39 households and constructs a future ESSR distribution using Monte Carlo Simulation (MCS) that accounts for the correlations between the three energy variables. The procedure is described in the following steps:
Step 1 (Modeling Marginal Distributions and Correlations): First, the monthly quantile forecasts ($q_{10}$ to $q_{90}$) for the future 12 months obtained in Experiment 1 are used as marginal distributions. For each variable $v \in \{\mathrm{PV}, \mathrm{Elec}, \mathrm{Gas}\}$ and each month $m \in \{1, \ldots, 12\}$, we construct a continuous inverse cumulative distribution function (Inverse CDF), $F_{m,v}^{-1}(u)$, where $u \in [0, 1]$ represents the cumulative probability. The 80% confidence interval (q10–q90) is adopted here as a standard range in energy risk management to capture significant fluctuation risks without including extreme outliers that may reduce practical precision.
To address the interdependence confirmed in Section 3.3, we modeled the correlation structure using a Gaussian copula. We estimated the Spearman rank correlation matrix Σ from the training data of the three variables. This approach avoids the physically inconsistent scenarios (e.g., simultaneous peak generation and peak demand) that arise from an independence assumption.
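A minimal sketch of Step 1 follows, under several assumptions that the text does not specify: the inverse CDF is taken to be piecewise linear through the quantile knots, the tails beyond the outermost knots are extrapolated along the outermost segments, and the rank correlation ignores ties. All names and numbers are hypothetical.

```python
from bisect import bisect_right


def make_inverse_cdf(levels, values):
    """Piecewise-linear inverse CDF through (level, value) knots.

    Outside the outermost levels the nearest segment is extended linearly
    (an assumption; other tail treatments are possible).
    """
    def inverse_cdf(u):
        i = min(max(bisect_right(levels, u) - 1, 0), len(levels) - 2)
        slope = (values[i + 1] - values[i]) / (levels[i + 1] - levels[i])
        return values[i] + slope * (u - levels[i])
    return inverse_cdf


def ranks(xs):
    """Zero-based ranks; ties are not averaged (simplification)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    for rank, index in enumerate(order):
        result[index] = float(rank)
    return result


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman rank correlation as the Pearson correlation of ranks."""
    return pearson(ranks(a), ranks(b))


# Hypothetical quantile forecast for one variable in one month (MJ).
inv_cdf = make_inverse_cdf([0.1, 0.5, 0.9], [100.0, 200.0, 400.0])
print(inv_cdf(0.3), inv_cdf(0.95))

# Hypothetical training series for two variables.
pv = [900.0, 1200.0, 1500.0, 1800.0, 1400.0, 800.0]
gas = [1100.0, 700.0, 400.0, 300.0, 500.0, 1200.0]
print(spearman(pv, gas))
```

Applying `spearman` to each pair of the three variables would fill the off-diagonal entries of the matrix used in the copula step.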
Step 2 (MCS with Correlated Random Numbers): We execute MCS to generate $S = 5000$ future scenarios, where $S$ denotes the total number of simulated scenarios. To maintain the correlation structure during sampling, the following process is performed for each scenario $s \in \{1, \ldots, S\}$:
  • Generate a correlated random vector $z^{(s)} = (z_{\mathrm{PV}}, z_{\mathrm{Elec}}, z_{\mathrm{Gas}})^{T}$ from a multivariate normal distribution $N(0, \Sigma)$.
  • Transform $z^{(s)}$ into a correlated uniform random vector $u^{(s)} = (\Phi(z_{\mathrm{PV}}), \Phi(z_{\mathrm{Elec}}), \Phi(z_{\mathrm{Gas}}))^{T}$ using the standard normal CDF $\Phi$.
  • Substitute $u_v^{(s)}$ into the inverse CDF to obtain the sample value $X_{m,v}^{(s)} = F_{m,v}^{-1}(u_v^{(s)})$.
To ensure the physical plausibility of the results and respond to concerns regarding unrealistic data, a clipping process is applied such that $X_{m,v}^{(s)} = \max(0, X_{m,v}^{(s)})$, ensuring all sampled energy values are non-negative.
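The sampling steps above can be sketched for a single month using only the Python standard library. The correlation matrix, the quantile values, and the straight-line inverse CDF are hypothetical placeholders; the sketch illustrates the Gaussian-copula technique and is not the authors' implementation.

```python
import math
import random
from statistics import NormalDist


def cholesky(sigma):
    """Lower-triangular factor L with L L^T = sigma (sigma positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def correlated_uniforms(sigma, n_scenarios, seed=0):
    """Draw z ~ N(0, sigma), then map each component through the normal CDF."""
    rng = random.Random(seed)
    L = cholesky(sigma)
    phi = NormalDist().cdf
    draws = []
    for _ in range(n_scenarios):
        eps = [rng.gauss(0.0, 1.0) for _ in sigma]
        z = [sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(len(sigma))]
        draws.append([phi(value) for value in z])
    return draws


def linear_inverse_cdf(q10, q90):
    """Hypothetical stand-in for the inverse CDF: a line through q10 and q90."""
    return lambda u: q10 + (q90 - q10) * (u - 0.1) / 0.8


# Hypothetical correlation matrix, ordered (PV, Elec, Gas).
SIGMA = [[1.0, -0.3, -0.5],
         [-0.3, 1.0, 0.4],
         [-0.5, 0.4, 1.0]]
# Hypothetical (q10, q90) forecasts in MJ for one month.
inverse_cdfs = [linear_inverse_cdf(100.0, 2500.0),   # PV
                linear_inverse_cdf(900.0, 1500.0),   # Elec
                linear_inverse_cdf(50.0, 1200.0)]    # Gas

uniforms = correlated_uniforms(SIGMA, n_scenarios=5000)
# Inverse-CDF transform followed by clipping at zero.
samples = [[max(0.0, f(u)) for f, u in zip(inverse_cdfs, row)] for row in uniforms]
print(len(samples), "scenarios;", samples[0])
```

Read literally, the procedure reuses the same uniform vector for every month of a scenario; drawing a fresh vector per month would be an alternative design, and the text does not say which variant was implemented.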
Step 3 (Calculating ESSR Distributions): Using the $S \times 12 \times 3$ samples, we calculate the monthly and annual ESSR distributions. The monthly ESSR for scenario $s$ in month $m$, denoted as $\mathrm{ESSR}_m^{(s)}$, is calculated as:
$$\mathrm{ESSR}_m^{(s)} = \frac{X_{m,\mathrm{PV}}^{(s)}}{X_{m,\mathrm{Elec}}^{(s)} + X_{m,\mathrm{Gas}}^{(s)}}.$$
For the annual evaluation, we first calculate the annual total for each variable $v$ in scenario $s$, denoted as $\mathrm{Annual}_v^{(s)}$, as follows:
$$\mathrm{Annual}_v^{(s)} = \sum_{m=1}^{12} X_{m,v}^{(s)}.$$
The annual ESSR for scenario $s$, $\mathrm{Annual\_ESSR}^{(s)}$, is then derived by:
$$\mathrm{Annual\_ESSR}^{(s)} = \frac{\mathrm{Annual}_{\mathrm{PV}}^{(s)}}{\mathrm{Annual}_{\mathrm{Elec}}^{(s)} + \mathrm{Annual}_{\mathrm{Gas}}^{(s)}}.$$
This yields a probability distribution of 5000 possible future annual ESSR values for each household.
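The Step 3 formulas translate directly into code. In the sketch below a scenario is a list of 12 monthly (PV, Elec, Gas) triples in MJ; the data layout, function names, and numbers are hypothetical, and strictly positive demand is assumed so that no division by zero occurs.

```python
PV, ELEC, GAS = 0, 1, 2


def monthly_essr(scenario):
    """Monthly ESSR: PV divided by electricity plus gas, month by month."""
    return [month[PV] / (month[ELEC] + month[GAS]) for month in scenario]


def annual_essr(scenario):
    """Annual ESSR: ratio of annual totals, not the mean of monthly ratios."""
    totals = [sum(month[v] for month in scenario) for v in (PV, ELEC, GAS)]
    return totals[PV] / (totals[ELEC] + totals[GAS])


# Hypothetical scenario: six low-generation months, six high-generation months.
scenario = [(100.0, 200.0, 200.0)] * 6 + [(300.0, 100.0, 100.0)] * 6

monthly = monthly_essr(scenario)
annual = annual_essr(scenario)
print(monthly[0], monthly[-1], annual)
```

In this toy case the annual ESSR (2400 / 3600) differs from the average of the monthly ratios, which is why the annual totals are formed before taking the ratio.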

4. Results

4.1. Experiment 1: Results of Forecasting Model Performance Validation

This section presents the results of comparing the forecasting accuracy of the Chronos model against the baseline and the data’s standard deviation (SD), based on the experimental setup defined in Section 3.2.

4.1.1. Overall Forecasting Accuracy (Quantitative Evaluation)

Table 2 shows the average forecast accuracy across the 39 households. First, to confirm whether the models are making meaningful predictions beyond the simple data variability (Test_SD), we compare RMSE and Test_SD. For all variables, the average RMSE of both Chronos and SN models is substantially lower than the Test_SD, suggesting that they have learned patterns such as the seasonality.
Next, comparing the performance of Chronos and the baseline, Chronos’s average MAE, RMSE, and MASE clearly outperform SN for all variables.
We confirm that the superiority of Chronos applies not only to the average values but also to the entire distribution of households. Figure 1 shows the box plots for MASE. A MASE below 1.0 (dashed line) signifies higher accuracy than the baseline SN.
Figure 1 indicates that the median MASE for Chronos is clearly below 1.0 for all three variables, showing higher predictive accuracy compared to SN. Particularly for PV Generation, the entire distribution falls below the 1.0 dashed line, meaning it surpasses SN’s accuracy for all 39 households analyzed. For Electricity Consumption, the majority of the distribution is well below 1.0, but for some households, the accuracy is equivalent to or slightly worse than SN. For Gas Consumption, although the median is below 1.0, the upper part of the distribution (75th percentile and above) exceeds 1.0, suggesting that for some households where prediction is difficult, it fails to surpass SN’s accuracy. Nonetheless, it is clear that Chronos generally outperforms the baseline model in terms of accuracy.

4.1.2. Calibration Evaluation of Probabilistic Forecasts

The reliability of the probabilistic forecast widths was quantitatively verified using the coverage rate of the nominal 80% confidence interval (q10–q90). Table 3 presents the actual coverage rates compared to the nominal 80% rate.
The results show that the calibration performance varies by variable. Electricity Consumption achieved an actual coverage of 82.69%, which is closely aligned with the nominal rate. For PV Generation, the actual coverage was 70.94%, indicating that the predicted intervals were too narrow (overconfident) and did not fully capture the uncertainties arising from extreme weather fluctuations. However, even with this 71% coverage, the model provides a quantifiable uncertainty range that is far more informative for risk-based decision-making than deterministic point forecasts. Homeowners and policymakers can interpret the lower bound ($q_{10}$) as a conservative risk threshold for energy self-sufficiency planning, allowing for a more resilient outlook against potential supply shortages.
Regarding Gas Consumption, the actual coverage reached 90.60%, meaning that the intervals were wider than the nominal 80% level requires. While this breadth might initially suggest a lack of precision, it reflects the high seasonal volatility and inherent unpredictability of gas demand for heating and hot water. It should be noted that, although the monthly intervals are wide, the positive and negative forecast errors effectively balance out during the 12-month aggregation process. As demonstrated in Section 4.2, this enables the model to maintain high practical utility for annual cost estimation and energy self-sufficiency assessment.
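The coverage rate used in this calibration check is the share of actual values that fall inside the forecast interval. A minimal sketch with hypothetical numbers follows; treating the interval bounds as inclusive is an assumption.

```python
def coverage_rate(actual, lower, upper):
    """Fraction of actual values lying within [lower, upper], bounds inclusive."""
    hits = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return hits / len(actual)


# Hypothetical monthly values and q10/q90 bounds (MJ).
actual = [950.0, 1100.0, 1300.0, 800.0, 1020.0]
q10 = [900.0, 1000.0, 1000.0, 850.0, 1000.0]
q90 = [1000.0, 1200.0, 1250.0, 950.0, 1100.0]

print(coverage_rate(actual, q10, q90))
```

A well-calibrated 80% interval should return a value near 0.8; values clearly below that point to intervals that are too narrow, and values clearly above to intervals that are too wide.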

4.1.3. Forecasting Results for the Representative Household

Next, to complement the quantitative evaluation, we visually compare the 12-month forecast results for the representative household. Figure 2 plots the actual values, the Chronos forecast, and the baseline forecast for each of the three target variables (PV Generation, Electricity Consumption, and Gas Consumption).
The median forecast from Chronos is shown to follow the seasonal fluctuation patterns of the actual values well for all three variables. Particularly during the peak periods for Electricity Consumption (Figure 2b) and Gas Consumption (Figure 2c), Chronos significantly reduces the error compared to the SN baseline.
Furthermore, to verify that high aggregate accuracy is not merely a result of error cancellation between households, we evaluated the specific performance for the representative household. The actual annual ESSR for this household was 28.02%, while the predicted mean ESSR from the simulation was 27.71%, an absolute error of only 0.31 percentage points. This level of precision at the individual household level provides initial evidence that the framework maintains high reliability without relying on aggregate-level error cancellation.
Regarding the probabilistic forecasts (the blue shaded area of q10–q90), the trends observed in Figure 2 support the calibration analysis shown in Section 4.1.2. The confidence interval for Electricity Consumption (Figure 2b) effectively covers the actual values, whereas the interval for Gas Consumption (Figure 2c) reflects its high volatility through wider intervals. Conversely, for PV Generation (Figure 2a), the intervals tend to be more sensitive to sudden weather-driven shifts.
In conclusion, both quantitative and qualitative evaluations demonstrate that Chronos possesses significantly superior accuracy over the baseline for mid-term forecasting. While calibration performance varies by variable, these results confirm the appropriateness of using these probabilistic forecasts as inputs for the Monte Carlo simulation in Section 3.3.

4.2. Experiment 2: Results of Probabilistic ESSR Simulation

This section presents the probabilistic distribution of the household ESSR, calculated using the Chronos quantile forecasts validated in Experiment 1, based on the methodology defined in Section 3.3. ESSR is defined by Equation (1) and represents the ratio of PV generation (energy creation) to total energy consumption.

4.2.1. Annual ESSR Distribution for All Households

First, to grasp the variability of self-sufficiency rates among households, the cumulative distribution function (CDF) of the annual ESSR for all 39 households is shown in Figure 3. This graph compares the actual values for each household with the predicted mean values obtained through the proposed framework.
The Chronos-based simulation closely captures the shape of the actual CDF. To evaluate the statistical similarity between the simulated and real-world distributions, we conducted a Kolmogorov-Smirnov (K-S) test and calculated the Wasserstein distance. The K-S statistic was 0.0769 with a p-value of 0.9999, so the test provides no evidence of a difference between the two distributions. Furthermore, the Wasserstein distance (Earth Mover’s Distance) was 0.0110, indicating that the geometric divergence between the distributions is very small. These metrics support the conclusion that the framework reproduces the actual variability of ESSR across households.
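Both distance measures are straightforward to compute for two one-dimensional samples. The sketch below covers only the two statistics, not the K-S p-value, and the Wasserstein helper assumes equally sized samples; the function names and data are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    largest = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest = max(largest, abs(cdf_a - cdf_b))
    return largest


def wasserstein_equal_size(a, b):
    """1-D Wasserstein-1 distance for equally sized samples (assumption)."""
    if len(a) != len(b):
        raise ValueError("this simplified version needs equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


# Hypothetical annual ESSR values for four households.
actual = [0.20, 0.25, 0.30, 0.40]
predicted = [0.21, 0.24, 0.31, 0.38]

print(ks_statistic(actual, predicted), wasserstein_equal_size(actual, predicted))
```

For two samples of 39 households each, the K-S statistic can only take multiples of 1/39, and the reported 0.0769 is consistent with 3/39; this is an inference from the numbers rather than something the text states.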
Furthermore, to ensure that this high aggregate accuracy was not merely a result of error cancellation between households, we analyzed the individual-level prediction performance, as shown in Figure 4. The scatter plot (left) demonstrates a strong linear correlation between actual and predicted values along the y = x line, while the box plot (right) shows that the prediction errors are tightly clustered around zero. This confirms that the framework provides consistent accuracy for each individual household rather than relying on the cancellation of positive and negative errors across the dataset.

4.2.2. Monthly ESSR Distribution for the Representative Household

Next, to evaluate the temporal uncertainty within a household, the probabilistic forecast of the monthly ESSR for the representative household is shown in Figure 5. This plots the quantiles (median q50, and the 80% confidence interval q10–q90) of the monthly ESSR samples calculated in Step 3 of Section 3.3.
Figure 5 illustrates the seasonal patterns of the ESSR. In August, the ESSR exceeds 50%. In contrast, from December to February, the ESSR falls below 20%. Furthermore, forecast uncertainty varies by month; notably, there is a tendency for uncertainty to increase during the summer season, where fluctuations in PV generation are significant. When compared with actual values, the predicted median closely tracks the fluctuations of the actual data, confirming high accuracy as a point forecast. Moreover, the majority of actual values fall within the predicted 80% confidence interval (q10–q90), demonstrating the effectiveness of the ESSR forecast in capturing uncertainty even with seasonal fluctuations.

4.2.3. Annual ESSR Distribution for the Representative Household

Finally, to evaluate the annual ESSR uncertainty for the representative household, the probability distribution of the annual ESSR obtained by MCS ($S = 5000$ iterations) is shown as a histogram in Figure 6.
Figure 6 demonstrates the main contribution of this study’s probabilistic forecasting framework. With conventional deterministic point forecasting, the annual ESSR for this household can only be presented as a single value: the average, which is approximately 27.7%. However, according to this methodology, it is clear that the predicted ESSR forms a distinct probability distribution centered around the expected value.
This result quantifies the risk of the future ESSR falling below (or exceeding) the expected value, providing far more information than a single point forecast for evaluating household energy autonomy. The width of this distribution reflects the inherent uncertainties in weather and occupant behavior. While the results are subject to the calibration errors of input variables identified in Section 4.1.2, the current framework’s integration of the Gaussian copula allows for a more rigorous quantification of these risks by accounting for the dependency structure between PV generation and energy demand, which was a limitation in simpler independent models.
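Once a set of simulated annual ESSR values is available, the risk of falling below a given level reduces to counting scenarios. The sketch below uses synthetic samples centered near the 27.7% mean mentioned above; the spread, the threshold, and the function names are hypothetical and chosen only for illustration.

```python
import random


def prob_below(samples, threshold):
    """Share of simulated scenarios with an ESSR strictly below the threshold."""
    return sum(1 for x in samples if x < threshold) / len(samples)


def empirical_quantile(samples, p):
    """Simple order-statistic quantile (no interpolation)."""
    ordered = sorted(samples)
    index = min(int(p * len(ordered)), len(ordered) - 1)
    return ordered[index]


# Synthetic stand-in for 5000 simulated annual ESSR values; not study results.
rng = random.Random(0)
annual_essr_samples = [rng.gauss(0.277, 0.02) for _ in range(5000)]

risk = prob_below(annual_essr_samples, 0.25)
q10 = empirical_quantile(annual_essr_samples, 0.10)
q90 = empirical_quantile(annual_essr_samples, 0.90)
print(f"P(ESSR < 25%) = {risk:.3f}; q10 = {q10:.3f}; q90 = {q90:.3f}")
```

A point forecast cannot answer a question such as "how likely is the ESSR to fall below 25%?", whereas the simulated distribution answers it with a single pass over the scenarios.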

5. Discussion

5.1. Data Efficiency and Foundation Models

A significant challenge in residential energy forecasting is the limited duration of available high-resolution data. In this study, the training period was restricted to 27 months per household, which is generally insufficient for traditional deep learning models like LSTM to achieve stable generalization without overfitting. However, as demonstrated in Section 4.1, the pre-trained Chronos model provided accurate forecasts by leveraging knowledge from diverse time-series datasets during its pre-training phase. This suggests that the foundation model approach is particularly effective for small-scale residential datasets where long-term historical records are unavailable. While 27 months may not capture long-term inter-annual climate variability, the model’s ability to learn seasonal patterns from limited local data underscores its practical utility for newly occupied ZEHs.

5.2. Reliability of Probabilistic Risks and Copula-Based Integration

The calibration analysis in Section 4.1.2 revealed that the coverage of the 80% interval for PV generation (70.94%) fell about nine percentage points short of the nominal level, indicating that the intervals are too narrow and tend to underestimate downside risks during extreme weather. Conversely, the interval for gas consumption was wider than necessary (90.60% coverage; Table 3). Independently of this marginal calibration, the Gaussian copula introduced in Section 3.3 addresses the critical limitation of assuming independence between variables. By explicitly modeling the dependencies between PV generation and energy demand, the framework captures compound risks, such as the simultaneous occurrence of low solar radiation and high heating demand, that deterministic or independent stochastic models might overlook. Although monthly deviations, particularly in gas consumption, were observed, the aggregate annual ESSR remained highly accurate. This indicates that, while short-term management requires caution, the proposed framework is robust for its primary objective: providing a reliable long-term outlook for energy self-sufficiency.
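The coverage figures discussed above can be computed in a few lines. The sketch below is one plausible way to do it, assuming coverage is the share of actual observations that fall inside the q10–q90 interval; the helper name `interval_coverage` and the toy numbers are hypothetical and are not taken from the study data.

```python
import numpy as np


def interval_coverage(actual, lower, upper):
    """Percentage of actual values lying inside [lower, upper]."""
    actual, lower, upper = (np.asarray(a, dtype=float) for a in (actual, lower, upper))
    inside = (actual >= lower) & (actual <= upper)
    return 100.0 * inside.mean()


# Toy monthly values (hypothetical, in MJ) with their q10 and q90 forecasts.
actual = [10, 12, 9, 17, 11]
q10 = [8, 11, 10, 12, 9]
q90 = [12, 14, 13, 16, 12]

nominal = 80.0
coverage = interval_coverage(actual, q10, q90)
gap = coverage - nominal  # negative: intervals too narrow; positive: too wide
print(f"coverage={coverage:.2f}%  difference={gap:+.2f}")
```

A negative difference, as reported for PV generation, means the intervals are too narrow; a positive one, as reported for gas consumption, means they are wider than the nominal level requires.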

5.3. Value of Probabilistic Information for Decision-Making

Conventional ZEH evaluations often rely on deterministic point estimates, which can provide a false sense of security by omitting potential performance gaps and downside risks. The probability distribution of ESSR presented in this study offers a more transparent evaluation for homeowners and policymakers. For instance, the fan chart (Figure 5) and the probability distribution of annual ESSR (Figure 6) visualize not only the expected performance but also the tail risks in which the self-sufficiency rate might drop significantly. This supports risk-averse decision-making, such as determining the optimal capacity of battery storage or evaluating the resilience of a ZEH investment against worst-case weather scenarios. Such insights are unattainable through standard average-based models and are crucial for fostering trust in energy-saving policies.
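To make the notion of tail risk concrete, the following sketch summarizes a set of simulated annual ESSR values with a few risk measures. The function name `tail_risk_summary`, the 25% threshold, and the synthetic samples (a normal distribution around the 27.7% mean with an assumed spread) are illustrative assumptions, not results from the paper.

```python
import numpy as np


def tail_risk_summary(essr_samples, threshold, alpha=0.10):
    """Summarize downside risk of simulated ESSR values (in %)."""
    s = np.asarray(essr_samples, dtype=float)
    q_alpha = np.quantile(s, alpha)
    return {
        "mean": s.mean(),
        # Share of simulations in which ESSR falls below the chosen threshold.
        "prob_below_threshold": (s < threshold).mean(),
        # Lower alpha-quantile, e.g. q10 of the annual ESSR.
        "quantile_alpha": q_alpha,
        # Average ESSR within the worst alpha share of simulations.
        "expected_shortfall": s[s <= q_alpha].mean(),
    }


rng = np.random.default_rng(1)
samples = rng.normal(27.7, 2.0, size=5_000)  # synthetic stand-in for MCS output
summary = tail_risk_summary(samples, threshold=25.0)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A homeowner sizing battery storage could, for example, plan against the lower quantile or the expected shortfall rather than against the mean.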

5.4. Generalization and Future Outlook

This study serves as a proof-of-concept focused on 39 households in a single city in Japan. While foundation models inherently possess high generalization capabilities, the specific energy patterns and climate conditions of this dataset limit the immediate extrapolation of these results to different geographic regions. Future work should involve validating the framework across diverse climate zones and larger datasets. Furthermore, refining the dependence structure with more complex models, such as Vine Copulas, could further improve the accuracy of tail risk assessment. Despite these limitations, the integration of pre-trained foundation models and Monte Carlo simulation with copulas establishes a new benchmark for risk-aware residential energy evaluation.

6. Conclusions

This study aimed to quantify the uncertainty arising from occupant behavior and weather conditions—primary factors contributing to the performance gap in evaluating household energy self-sufficiency. To address this, we proposed a medium-term probabilistic forecasting framework that integrates a time-series foundation model, Chronos, with Monte Carlo simulation utilizing Gaussian copulas.
Validation using actual data from 39 households demonstrated that the proposed method achieved higher prediction accuracy than the conventional baseline model across all target variables: PV generation, electricity consumption, and gas consumption. Furthermore, by incorporating the dependence structure between these variables through copula functions, the framework quantified compound risks that are often overlooked by independent stochastic models. Statistical evaluations, including the Kolmogorov–Smirnov test and the Wasserstein distance, indicated that the simulated distributions were statistically consistent with the real-world measured data. This enables homeowners to grasp a concrete future outlook that includes not only the expected performance but also the downside risks.
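For readers who wish to run a comparable check, the sketch below applies the two-sample Kolmogorov–Smirnov test and the Wasserstein distance from SciPy. It assumes the comparison is between the 39 actual annual ESSR values and the 39 forecasted values, as in Figure 3; the arrays here are synthetic placeholders, not the study data. Note that a large p-value only means the test fails to detect a difference; it does not prove that the two distributions are identical.

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance

rng = np.random.default_rng(0)

# Synthetic placeholders for 39 households (annual ESSR in %).
actual_essr = rng.normal(28.0, 6.0, size=39)
simulated_essr = rng.normal(28.5, 6.0, size=39)

# Two-sample KS test: the null hypothesis is that both samples
# come from the same continuous distribution.
ks_result = ks_2samp(actual_essr, simulated_essr)

# Wasserstein (earth mover's) distance, in the same unit as the data.
w_dist = wasserstein_distance(actual_essr, simulated_essr)

print(f"KS statistic={ks_result.statistic:.3f}, p-value={ks_result.pvalue:.3f}")
print(f"Wasserstein distance={w_dist:.3f} percentage points")
```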
Despite this performance, the study has limitations. Significant uncertainty remains in the monthly forecasting of gas consumption, and the high estimation accuracy of the annual ESSR partially relies on the cancellation of monthly errors during temporal aggregation. Moreover, as this study focused on a single city, the generalization of the pre-trained foundation model to diverse climatic regions remains a subject for further validation.
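The error cancellation effect mentioned above can be illustrated with a small simulation. The sketch assumes monthly forecast errors that are unbiased and mutually independent, with hypothetical magnitudes; under these assumptions the relative error of the 12-month total shrinks by a factor of about 1/sqrt(12) compared with the monthly relative error. A systematic bias would not cancel in this way.

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials, n_months = 20_000, 12

monthly_actual = np.full(n_months, 1_000.0)  # hypothetical flat monthly use (MJ)
sigma = 300.0                                # hypothetical monthly error SD (MJ)

# Independent, zero-mean monthly forecast errors (assumption).
errors = rng.normal(0.0, sigma, size=(n_trials, n_months))

# Mean absolute error relative to the monthly and to the annual quantity.
monthly_rel_err = np.abs(errors).mean() / monthly_actual.mean()
annual_rel_err = np.abs(errors.sum(axis=1)).mean() / monthly_actual.sum()

ratio = annual_rel_err / monthly_rel_err
print(f"monthly relative error: {monthly_rel_err:.3f}")
print(f"annual relative error:  {annual_rel_err:.3f}")
print(f"ratio: {ratio:.3f} (theory under these assumptions: {1 / np.sqrt(n_months):.3f})")
```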
Future work will involve refining the dependency modeling using more complex structures, such as Vine Copulas, and extending the framework to larger datasets across different geographic areas to enhance the robustness of risk-aware residential energy evaluations.

Author Contributions

Conceptualization, H.Y. and M.N.; methodology, H.Y.; software, H.Y.; validation, H.Y. and L.W.; formal analysis, H.Y.; investigation, H.Y.; resources, M.N.; data curation, H.Y.; writing—original draft preparation, H.Y.; writing—review and editing, H.Y., L.W. and M.N.; visualization, H.Y.; supervision, M.N.; project administration, M.N.; funding acquisition, M.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partly supported by JSPS KAKENHI Grant Nos. 23K26130, 22H00512, and 22KK0155, and by the Ministry of the Environment, Japan.

Data Availability Statement

The raw residential energy data used in this study are unavailable due to privacy and ethical restrictions. However, the Python 3.10 code used for the Chronos model fine-tuning and Monte Carlo simulation is available in a publicly accessible repository at https://github.com/MTBT-27/probabilistic-essr-forecasting (accessed on 8 January 2026). This repository includes mock data to demonstrate the functionality of the code.

Acknowledgments

We would like to express our gratitude to Takanori Ida, Yoshiaki Ushifusa and Takuya Fukushima for their invaluable support. During the preparation of this manuscript, the authors used Google Gemini (specifically, Gemini 2.5 Pro) for the purposes of improving the readability and grammatical accuracy of the manuscript, translating portions of the text from Japanese to English, and generating and debugging Python/LaTeX code snippets. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ESSR: Energy Self-Sufficiency Rate
ZEH: Net-Zero Energy House
PV: Photovoltaic
HEMS: Home Energy Management System
MCS: Monte Carlo Simulation
MAE: Mean Absolute Error
RMSE: Root Mean Square Error
MASE: Mean Absolute Scaled Error
SN: Seasonal Naive
CDF: Cumulative Distribution Function
CRediT: Contributor Roles Taxonomy
MJ: Megajoules

References

  1. Ürge-Vorsatz, D.; Khosla, R.; Bernhardt, R.; Chan, Y.C.; Vérez, D.; Hu, S.; Cabeza, L.F. Advances toward a net-zero global building sector. Annu. Rev. Environ. Resour. 2020, 45, 227–269. [Google Scholar] [CrossRef]
  2. Camarasa, C.; Mata, É.; Navarro, J.P.J.; Reyna, J.; Bezerra, P.; Angelkorte, G.B.; Feng, W.; Filippidou, F.; Forthuber, S.; Harris, C.; et al. A global comparison of building decarbonization scenarios by 2050 towards 1.5–2 C targets. Nat. Commun. 2022, 13, 3077. [Google Scholar] [CrossRef] [PubMed]
  3. Yamaura, K.; Xu, S.; Sugiyama, M.; Ju, Y. Public perceptions on net zero energy houses in Japan. Sustain. Sci. 2025, 20, 373–384. [Google Scholar] [CrossRef]
  4. Oota, M.; Iwafune, Y.; Ooka, R. Estimation of self-sufficiency rate in detached houses using home energy management system data. Energies 2021, 14, 975. [Google Scholar] [CrossRef]
  5. Nguyen, T.H.; Take, K.; Take, K. Reviewing of the net-zero energy buildings and housings in Japan. IOP Conf. Ser. Earth Environ. Sci. 2024, 1402, 012004. [Google Scholar] [CrossRef]
  6. Wilberforce, T.; Olabi, A.; Sayed, E.T.; Elsaid, K.; Maghrabie, H.M.; Abdelkareem, M.A. A review on zero energy buildings–Pros and cons. Energy Built Environ. 2023, 4, 25–38. [Google Scholar] [CrossRef]
  7. Menezes, A.C.; Cripps, A.; Bouchlaghem, D.; Buswell, R. Predicted vs. actual energy performance of non-domestic buildings: Using post-occupancy evaluation data to reduce the performance gap. Appl. Energy 2012, 97, 355–364. [Google Scholar] [CrossRef]
  8. Zou, P.X.; Wagle, D.; Alam, M. Strategies for minimizing building energy performance gaps between the design intend and the reality. Energy Build. 2019, 191, 31–41. [Google Scholar] [CrossRef]
  9. Kim, D.; Jang, Y.; Choi, Y. Comparative analysis of estimated and actual power self-sufficiency rates in energy-sharing communities with solar power systems. Energies 2023, 16, 7941. [Google Scholar] [CrossRef]
  10. Belaïd, F.; Ranjbar, Z.; Massié, C. Exploring the cost-effectiveness of energy efficiency implementation measures in the residential sector. Energy Policy 2021, 150, 112122. [Google Scholar] [CrossRef]
  11. Ansari, A.F.; Shchur, O.; Küken, J.; Auer, A.; Han, B.; Mercado, P.; Rangapuram, S.S.; Shen, H.; Stella, L.; Zhang, X.; et al. Chronos-2: From Univariate to Universal Forecasting. arXiv 2025. [Google Scholar] [CrossRef]
  12. Singh, N.K.; Nagahara, M. LightGBM-, SHAP-, and Correlation-Matrix-Heatmap-Based Approaches for Analyzing Household Energy Data: Towards Electricity Self-Sufficient Houses. Energies 2024, 17, 4518. [Google Scholar] [CrossRef]
  13. Wang, Z.; Luther, M.B.; Horan, P.; Matthews, J.; Liu, C. On-site solar PV generation and use: Self-consumption and self-sufficiency. Build. Simul. 2023, 16, 1835–1849. [Google Scholar] [CrossRef]
  14. Van Dronkelaar, C.; Dowson, M.; Burman, E.; Spataru, C.; Mumovic, D. A review of the energy performance gap and its underlying causes in non-domestic buildings. Front. Mech. Eng. 2016, 1, 17. [Google Scholar] [CrossRef]
  15. Yan, D.; O’Brien, W.; Hong, T.; Feng, X.; Gunay, H.B.; Tahmasebi, F.; Mahdavi, A. Occupant behavior modeling for building performance simulation: Current state and future challenges. Energy Build. 2015, 107, 264–278. [Google Scholar] [CrossRef]
  16. Baetens, R.; Saelens, D. Modelling uncertainty in district energy simulations by stochastic residential occupant behaviour. J. Build. Perform. Simul. 2016, 9, 431–447. [Google Scholar] [CrossRef]
  17. Hu, M.; Xiao, F. Quantifying uncertainty in the aggregate energy flexibility of high-rise residential building clusters considering stochastic occupancy and occupant behavior. Energy 2020, 194, 116838. [Google Scholar] [CrossRef]
  18. Kampelis, N.; Gobakis, K.; Vagias, V.; Kolokotsa, D.; Standardi, L.; Isidori, D.; Cristalli, C.; Montagnino, F.; Paredes, F.; Muratore, P.; et al. Evaluation of the performance gap in industrial, residential & tertiary near-Zero energy buildings. Energy Build. 2017, 148, 58–73. [Google Scholar] [CrossRef]
  19. Zhang, X.; Schildbach, G.; Sturzenegger, D.; Morari, M. Scenario-based MPC for energy-efficient building climate control under weather and occupancy uncertainty. In Proceedings of the 2013 European Control Conference (ECC), Zurich, Switzerland, 17–19 July 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 1029–1034. [Google Scholar]
  20. Wang, L.; Mathew, P.; Pang, X. Uncertainties in energy consumption introduced by building operations and weather for a medium-size office building. Energy Build. 2012, 53, 152–158. [Google Scholar] [CrossRef]
  21. Verhaeghe, C.; Audenaert, A.; Verbeke, S. Assessing performance regret of residential energy-flexibility measures under uncertainty: An ex-ante analysis of techno-economic implications. Energy Build. 2025, 305, 115857. [Google Scholar] [CrossRef]
  22. Im, P.; Jackson, R.; Bae, Y.; Dong, J.; Cui, B. Probabilistic reliability assessment and case studies for predicted energy savings in residential buildings. Energy Build. 2020, 209, 109658. [Google Scholar] [CrossRef]
  23. Xu, L.; Wang, S.; Tang, R. Probabilistic load forecasting for buildings considering weather forecasting uncertainty and uncertain peak load. Appl. Energy 2019, 237, 180–195. [Google Scholar] [CrossRef]
  24. Chong, A.; Lam, K.P.; Pozzi, M.; Yang, J. Bayesian calibration of building energy models with large datasets. Energy Build. 2017, 154, 343–355. [Google Scholar] [CrossRef]
  25. Chaturvedi, S.; Rajasekar, E.; Natarajan, S.; McCullen, N. A comparative assessment of SARIMA, LSTM RNN and Fb Prophet models to forecast total and peak monthly energy demand for India. Energy Policy 2022, 168, 113097. [Google Scholar] [CrossRef]
  26. Olu-Ajayi, R.; Alaka, H.; Sulaimon, I.; Sunmola, F.; Ajayi, S. Building energy consumption prediction for residential buildings using deep learning and other machine learning techniques. J. Build. Eng. 2022, 45, 103406. [Google Scholar] [CrossRef]
  27. Cai, M.; Pipattanasomporn, M.; Rahman, S. Day-ahead building-level load forecasts using deep learning vs. traditional time-series techniques. Appl. Energy 2019, 236, 1078–1088. [Google Scholar] [CrossRef]
  28. Taieb, S.B.; Huser, R.; Hyndman, R.J.; Genton, M.G. Forecasting uncertainty in electricity smart meter data by boosting additive quantile regression. IEEE Trans. Smart Grid 2016, 7, 2448–2455. [Google Scholar] [CrossRef]
  29. Wan, C.; Lin, J.; Wang, J.; Song, Y.; Dong, Z.Y. Direct quantile regression for nonparametric probabilistic forecasting of wind power generation. IEEE Trans. Power Syst. 2016, 32, 2767–2778. [Google Scholar] [CrossRef]
  30. Meyer, M.; Zapata, D.; Kaltenpoth, S.; Müller, O. Benchmarking time series foundation models for short-term household electricity load forecasting. arXiv 2024. [Google Scholar] [CrossRef]
  31. Park, Y.J.; Germain, F.; Liu, J.; Wang, Y.; Koike-Akino, T.; Wichern, G.; Azizan, N.; Laughman, C.; Chakrabarty, A. Probabilistic Forecasting for Building Energy Systems using Time-Series Foundation Models. Energy Build. 2025, 348, 116446. [Google Scholar] [CrossRef]
  32. Sun, S.; Kensek, K.; Noble, D.; Schiler, M. A method of probabilistic risk assessment for energy performance and cost using building energy simulation. Energy Build. 2016, 110, 1–12. [Google Scholar] [CrossRef]
  33. Sun, Y. Closing the Building Energy Performance Gap by Improving Our Predictions. Ph.D. Thesis, Georgia Institute of Technology, Atlanta, GA, USA, 2014. [Google Scholar]
  34. Virote, J.; Neves-Silva, R. Stochastic models for building energy prediction based on occupant behavior assessment. Energy Build. 2012, 53, 183–193. [Google Scholar] [CrossRef]
  35. Zhang, S.; Sun, Y.; Cheng, Y.; Huang, P.; Oladokun, M.O.; Lin, Z. Response-surface-model-based system sizing for Nearly/Net zero energy buildings under uncertainty. Appl. Energy 2018, 228, 1020–1031. [Google Scholar] [CrossRef]
  36. International Energy Agency (IEA) and Organisation for Economic Co-operation and Development (OECD). Energy Statistics Manual; IEA/OECD: Paris, France, 2005; Available online: https://iea.blob.core.windows.net/assets/67fb0049-ec99-470d-8412-1ed9201e576f/EnergyStatisticsManual.pdf (accessed on 10 November 2025).
  37. Saibu Gas Co., Ltd. General Gas Supply Conditions. Available online: https://www.saibugas.co.jp/home/rates/provision/pdf/ippan_fk_202308.pdf (accessed on 10 November 2025). (In Japanese).
Figure 1. MASE (Mean Absolute Scaled Error) box plots (distribution across 39 households). MASE < 1.0 (dashed line) indicates higher accuracy than the baseline.
Figure 2. Comparison of 12-month forecasts for the representative household. Actual values (black) are compared against the Chronos forecast (blue, median with q10–q90 confidence interval) and the SN forecast (gray).
Figure 3. Cumulative Distribution Function (CDF) of annual ESSR for all 39 households (Forecasted Mean vs. Actual).
Figure 4. Individual household accuracy analysis: (Left) Scatter plot of actual vs. predicted annual ESSR, where the red solid line and shaded area represent the linear regression and its 95% confidence interval, respectively; and (Right) Distribution of individual prediction errors, where the red horizontal line indicates the zero-error reference. In both plots, overlapping points represent the density of individual household data across the 39 households.
Figure 5. Probabilistic forecast of monthly ESSR for the representative household. Shows the median (q50) and the 80% confidence interval (q10–q90), compared against actual values.
Figure 6. Probability distribution of annual ESSR for the representative household. Results from S = 5000 MCS iterations.
Table 1. Dataset Specifications.

| Item | Description |
|---|---|
| Data Source | Kitakyushu, Japan |
| Collection Period | 1 January 2021–31 March 2024 |
| Temporal Resolution | 1 Month |
| Number of Households | 39 (after preprocessing) |
| Analysis Unit | Megajoules (MJ) |
| Target Variables | Power Consumption (MJ), Gas Consumption (MJ), PV Generation (MJ) |
| Covariates | Power Purchase (MJ), Power Sale (MJ), FC Generation (MJ), Average Temperature (°C), Average Wind Speed (m/s), Average Relative Humidity (%), Total Precipitation (mm), Total Global Radiation (MJ/m²) |

Table 2. Model performance metrics (MAE, RMSE, MASE) averaged over 39 households, compared to the Standard Deviation (SD) of the test data. MAE, RMSE, and SD are in MJ.

| Target Variable | Model | MAE | RMSE | MASE | Test_SD |
|---|---|---|---|---|---|
| PV Gen. (MJ) | Chronos | 105.8 | 136.0 | 0.61 | 451.7 |
| PV Gen. (MJ) | SN | 173.9 | 234.9 | 1.00 | 451.7 |
| Elec. Cons. (MJ) | Chronos | 169.1 | 212.0 | 0.68 | 371.3 |
| Elec. Cons. (MJ) | SN | 261.0 | 343.7 | 1.00 | 371.3 |
| Gas Cons. (MJ) | Chronos | 401.4 | 475.5 | 0.91 | 1345.8 |
| Gas Cons. (MJ) | SN | 484.6 | 623.8 | 1.00 | 1345.8 |

Table 3. Actual coverage rate (%) of the Chronos 80% confidence interval (q10–q90), compared to the nominal rate of 80%.

| Target Variable | Actual (%) | Difference (%) |
|---|---|---|
| PV Generation | 70.94 | −9.06 |
| Electricity Consumption | 82.69 | +2.69 |
| Gas Consumption | 90.60 | +10.60 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yamasaki, H.; Wu, L.; Nagahara, M. Probabilistic Forecasting of Household Energy Self-Sufficiency Rate Using Pre-Trained Time-Series Foundation Models with Monte Carlo Simulation. Energies 2026, 19, 362. https://doi.org/10.3390/en19020362
