Comparative Evaluation of Deep-Learning and SARIMA Models for Short-Term Residential PV Power Forecasting

Bano, Kalsoom; Suresh, Vishnu; Montana, Francesco; Janik, Przemyslaw

doi:10.3390/en19081991

Open Access Article

by Kalsoom Bano ¹,*, Vishnu Suresh ¹, Francesco Montana ² and Przemyslaw Janik ¹

¹ Faculty of Electrical Engineering, Wroclaw University of Science and Technology, 50-370 Wroclaw, Poland

² Department of Engineering, University of Palermo, 90128 Palermo, Italy

* Author to whom correspondence should be addressed.

Energies 2026, 19(8), 1991; https://doi.org/10.3390/en19081991

Submission received: 5 February 2026 / Revised: 7 April 2026 / Accepted: 10 April 2026 / Published: 20 April 2026

Abstract

Accurate photovoltaic (PV) power forecasting is essential for the efficient operation of residential energy systems and microgrids, as reliable short-term predictions enable improved energy scheduling, demand management, and operational planning in distributed energy environments. In this study, one-hour-ahead forecasting of residential PV power generation is investigated using real-world data collected from multiple households within an Irish energy community. Several deep-learning architectures, including long short-term memory (LSTM), gated recurrent unit (GRU), convolutional neural network (CNN), hybrid CNN–LSTM, and attention-based LSTM models, are evaluated and compared with a seasonal autoregressive integrated moving average (SARIMA) statistical model. A sliding-window approach is employed to transform the PV time series into a supervised learning problem. To ensure statistical robustness, the deep-learning models are evaluated using a multi-run framework, and results are reported as mean ± standard deviation of the MAE, RMSE, sMAPE, and R² metrics across multiple households. The results indicate that deep-learning models achieve consistently strong forecasting performance, with GRU frequently providing the most reliable predictions across several households. For instance, in House 5, GRU achieved an RMSE of 142.02 ± 1.87 W and an R² of 0.694 ± 0.008, while in Houses 11 and 13 it attained R² values of 0.837 ± 0.002 and 0.835 ± 0.08, respectively. However, performance varied across households, reflecting the influence of data variability and generation patterns on model effectiveness. The SARIMA model demonstrated competitive performance and, in certain cases, outperformed the deep-learning models; for example, in House 4 it achieved the lowest RMSE (90.68 W) and the highest R² (0.709).
Overall, these findings highlight that while deep-learning models offer greater adaptability and stability, statistical models remain effective for more regular PV generation patterns. Consequently, the study emphasizes the importance of evaluating forecasting models under realistic household-level conditions and demonstrates that both deep-learning and statistical approaches can provide effective short-term PV forecasts.

Keywords:

PV power forecasting; residential energy communities; deep learning; SARIMA; time series prediction

1. Introduction

1.1. Context and Motivation

The adoption of residential solar photovoltaic (PV) systems has increased significantly across Europe due to rising electricity prices and growing emphasis on renewable energy sources. Residential PV installations enable decentralized energy production, allowing households to actively participate in energy generation while enhancing energy independence and sustainability within local energy communities. However, solar power generation is inherently intermittent and highly dependent on environmental conditions. Therefore, accurate forecasting techniques are essential for maintaining grid stability and enabling efficient energy management in residential energy systems [1,2]. Despite the advantages of residential PV systems, accurate short-term forecasting remains challenging. Solar PV generation is strongly influenced by meteorological factors, including solar irradiance, temperature, and cloud cover. These factors introduce substantial variability in PV output, resulting in fluctuations in power generation that complicate operational planning and energy scheduling [3]. As the penetration of residential PV systems increases, these fluctuations can introduce uncertainties into energy management strategies and potentially affect the stability of the electricity grid [4]. Accurate short-term PV forecasting is therefore essential for optimizing residential energy management systems, enabling better scheduling of energy storage, improving demand response strategies, and supporting the integration of distributed renewable resources within energy communities.

1.2. Previous Work

Solar power forecasting has been extensively studied in the literature. Many existing studies focus on large-scale solar farms or aggregated datasets collected from multiple PV installations [5]. Traditional time-series forecasting approaches, such as the seasonal autoregressive integrated moving average (SARIMA) model, have been widely applied due to their ability to capture linear temporal dependencies and seasonal patterns in power generation data [6]. However, statistical models often struggle to capture complex nonlinear patterns observed in residential PV systems. Household-level PV generation is influenced by several localized factors, including rooftop orientation, shading conditions, installation capacity, and microclimatic variations. These characteristics introduce nonlinear patterns that conventional statistical models cannot adequately represent [7]. To address these limitations, recent studies have explored deep-learning approaches for solar power forecasting. Architectures such as Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRU), and Convolutional Neural Networks (CNNs) have demonstrated promising performance in modeling nonlinear temporal relationships in energy time series. These models can learn complex patterns in PV generation data and have shown improved accuracy in short-term forecasting applications [8,9,10].

1.3. Research Gap

Although deep-learning techniques have shown great promise for solar forecasting, many existing studies primarily focus on aggregated data or multi-site installations. Consequently, limited attention has been given to forecasting PV generation at the household level within residential communities [11,12]. In practice, households within the same energy community can exhibit significantly different PV generation patterns due to variations in installation configuration, shading effects, rooftop orientation, and local environmental conditions. Capturing this intra-community variability is essential for designing accurate forecasting models for residential energy management systems [13,14]. Furthermore, many existing forecasting frameworks rely on multivariate models that require extensive meteorological data. However, such data may not always be readily available for residential installations, making univariate forecasting approaches particularly relevant for household-level applications [15].

1.4. Contributions

To address these challenges, this study investigates short-term residential PV power forecasting at the household level using real-world data from an energy community. The main contributions of this work are as follows:

Household-level PV forecasting:
We develop forecasting models for individual residential PV systems using multiple deep-learning architectures, including LSTM, GRU, CNN, CNN–LSTM, and attention-based LSTM networks.
Comprehensive model comparison:
A systematic comparison is performed between deep-learning models and the SARIMA statistical model to evaluate their effectiveness in short-term residential PV power forecasting.
Analysis of intra-community variability:
The study analyzes PV generation patterns across multiple households within the same residential community, highlighting differences in generation behavior caused by local installation characteristics.
Performance evaluation across households:
The experimental results demonstrate that deep-learning models can effectively capture nonlinear temporal patterns in residential PV generation, while statistical models such as SARIMA remain competitive for certain households.

2. Methodology

2.1. Data Collection

Data collection refers to the process of acquiring and preparing data for analysis from various sources [16,17]. In this study, time-series data were obtained from a residential solar photovoltaic (PV) community in Ireland [18]. The dataset contains PV power generation measurements from multiple households equipped with rooftop PV systems. The data preprocessing was conducted using Python 3.12.2. Although the original dataset includes several variables, such as weather parameters and energy consumption profiles, this study focuses on univariate time-series forecasting, using only PV power output as the input feature. This approach reflects practical residential scenarios where detailed meteorological data is not always available. The PV power measurements were originally recorded at a 1-min temporal resolution throughout 2020. To align the data with the forecasting objective, the measurements were resampled to hourly resolution. The resampling process reduces high-frequency noise while preserving the essential temporal structure required for short-term forecasting. As a result, the dataset consists of hourly PV power generation values for each household within the residential community. The overall workflow adopted in this study is illustrated in Figure 1.
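The hourly resampling step can be sketched in NumPy as follows. The text does not state how the 1-min values were aggregated, so this sketch assumes a simple hourly mean (for power in W, the hourly mean equals the energy in Wh for that hour) and a gap-free series that starts on an hour boundary; the function name is hypothetical.

```python
import numpy as np


def resample_minutely_to_hourly(power_w):
    """Average 1-min PV power samples (W) into hourly mean power (W).

    Assumes the series starts on an hour boundary with one sample per minute.
    NaNs inside an hour are ignored; an all-NaN hour stays NaN (NumPy warns)
    and can be filled later by the imputation step.
    """
    power_w = np.asarray(power_w, dtype=float)
    n_hours = power_w.size // 60
    # Drop any trailing partial hour, then view the data as (hours, 60 minutes).
    blocks = power_w[: n_hours * 60].reshape(n_hours, 60)
    return np.nanmean(blocks, axis=1)
```

For a full year of 2020 data (a leap year), 366 × 24 × 60 = 527,040 one-minute samples become 8,784 hourly values.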

2.2. Data Preprocessing

Raw time-series data requires preprocessing before being used for machine-learning models. Several preprocessing steps were applied to ensure data quality, consistency, and reliable model training. First, missing values in the PV power series were handled using mean imputation, where null values were replaced with the mean of the corresponding feature [16]. This approach preserves continuity in the time series while minimizing distortion of the underlying temporal patterns. Next, the timestamp column was converted into a standardized datetime format to allow the extraction of temporal characteristics, such as daily cycles and seasonal variations. The processed dataset was then divided into training and test sets, with 80% of the data allocated for training and 20% for testing. Additionally, 10% of the training data was used as a validation set during model optimization to facilitate early stopping and hyperparameter tuning.
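A minimal sketch of the imputation and chronological split, assuming a NumPy array of hourly values. Two details are assumptions rather than statements from the text: the imputation mean is computed over the whole series (restricting it to the training portion would keep test-period information out of the training data), and the validation subset is taken as the most recent 10% of the training period, which mirrors how the Keras `validation_split` argument selects the last samples. The function name is hypothetical.

```python
import numpy as np


def impute_and_split(series, train_frac=0.8, val_frac=0.1):
    """Mean-impute missing values, then split chronologically into train/val/test."""
    x = np.asarray(series, dtype=float).copy()
    # Mean imputation: replace NaNs with the mean of the observed values.
    x[np.isnan(x)] = np.nanmean(x)

    # Chronological 80/20 split: no shuffling, so the test set lies in the future.
    n_train_full = int(len(x) * train_frac)
    train_full, test = x[:n_train_full], x[n_train_full:]

    # Hold out the most recent part of the training period for validation.
    n_val = int(len(train_full) * val_frac)
    n_fit = len(train_full) - n_val
    return train_full[:n_fit], train_full[n_fit:], test
```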

To ensure numerical stability during training, the PV power values were normalized using the Min–Max scaling technique, which transforms the data into the range [0,1] [17]. The normalization process is expressed in Equation (1):

x_{\text{scaled}} = \frac{x_{\text{feature}} - x_{\text{feature,min}}}{x_{\text{feature,max}} - x_{\text{feature,min}}}

(1)

where x_{\text{feature}} represents the original feature value, x_{\text{feature,min}} and x_{\text{feature,max}} denote the minimum and maximum values within the training set, respectively, and x_{\text{scaled}} is the normalized value used as input to the forecasting models. The scaling parameters were computed only from the training data and then applied to the validation and test sets to prevent data leakage. Because photovoltaic power generation is naturally zero at night, nighttime periods were retained in the dataset to preserve realistic temporal patterns. In addition, negative forecast values were clipped to zero during post-processing, because negative PV generation is physically meaningless.
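The scaling and post-processing rules above can be sketched as a small NumPy class that stores the training minimum and maximum and reuses them for the validation and test sets. The class and function names are hypothetical; the scikit-learn `MinMaxScaler`, fitted on the training split only, would serve the same purpose.

```python
import numpy as np


class TrainOnlyMinMaxScaler:
    """Min-Max scaling (Eq. 1) with parameters taken from the training data only."""

    def fit(self, train):
        self.x_min = float(np.min(train))
        self.x_max = float(np.max(train))
        return self

    def transform(self, x):
        # Validation/test values outside the training range may fall outside [0, 1].
        return (np.asarray(x, dtype=float) - self.x_min) / (self.x_max - self.x_min)

    def inverse_transform(self, x_scaled):
        return np.asarray(x_scaled, dtype=float) * (self.x_max - self.x_min) + self.x_min


def to_watts_clipped(pred_scaled, scaler):
    """Undo the scaling and clip negative forecasts to zero (no negative PV output)."""
    return np.clip(scaler.inverse_transform(pred_scaled), 0.0, None)
```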

2.3. Experimental Setup

To ensure a fair and consistent comparison among the evaluated models, all forecasting experiments were conducted under the same setup. The dataset for each household was divided chronologically into training and test sets, with 80% of the data used for training and the remaining 20% for testing. For the deep-learning models, as shown in Table 1, 10% of the training data was further used as a validation subset during training. A sliding-window approach was used to convert the univariate PV time series into a supervised learning problem. Specifically, the previous 24 hourly observations were used as input features to predict the PV power output for the next hour, which corresponds to one-hour-ahead forecasting. The deep-learning models were implemented using the TensorFlow/Keras framework on a desktop PC with an Intel® Core™ i7-10700 CPU @ 2.90 GHz and 16 GB of RAM, running Windows 10 (64-bit). All neural network models were trained using the Adam optimizer with a learning rate of 0.001, a batch size of 32, and a maximum of 100 epochs. To reduce overfitting, a dropout rate of 0.2 was applied, and early stopping with a patience of 10 epochs was used based on the validation loss.
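The 24-h sliding window can be illustrated with the NumPy sketch below, which turns a scaled hourly series into input windows of shape (samples, 24, 1), the (batch, timesteps, features) layout that Keras recurrent and Conv1D layers expect, together with one-hour-ahead targets. The function name is hypothetical; applying it to each split separately, so that no window straddles the train/test boundary, is one reasonable choice that the text does not specify.

```python
import numpy as np


def make_windows(series, window=24, horizon=1):
    """Build (X, y) pairs: X holds `window` past values, y the value `horizon` steps later."""
    x = np.asarray(series, dtype=float)
    n_samples = len(x) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is shorter than window + horizon")
    X = np.stack([x[i : i + window] for i in range(n_samples)])
    y = x[window + horizon - 1 : window + horizon - 1 + n_samples]
    # Add a trailing feature axis: one feature (PV power) per time step.
    return X[..., np.newaxis], y
```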

To identify the optimal SARIMA model, a grid search procedure was employed to determine the best combination of non-seasonal and seasonal parameters based on the Akaike Information Criterion (AIC). The AIC is commonly used in time-series model selection to assess both the goodness-of-fit and model complexity [16]. In comparison, the Bayesian Information Criterion (BIC) imposes a stronger penalty on model complexity; however, AIC is generally preferred in forecasting-oriented applications, where predictive accuracy is the primary objective [17]. Although several candidate SARIMA models exhibit similar structural orders, they differ in their estimated parameters and corresponding AIC values. Therefore, the model with the lowest AIC was selected as the final configuration for each household. Furthermore, a rolling-window forecasting strategy was adopted to enhance predictive performance. In this approach, the SARIMA model is updated using the most recent 30 days of data at each step to generate one-step-ahead forecasts. This procedure enables the model to adapt to recent temporal dynamics while reducing the influence of outdated observations. Table 2 presents the optimal SARIMA configurations selected for each household based on the AIC, along with their corresponding BIC values for validation of model parsimony.
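The rolling-window strategy can be sketched as a loop that, before each one-step-ahead forecast, refits a model on only the most recent 30 days (720 hourly values). To keep the sketch self-contained, the forecaster is a pluggable function and a seasonal-naive stand-in (repeat the value from 24 h earlier) is used in place of SARIMA; this substitution is ours, not the paper's. The commented `sarima_forecaster` shows how a statsmodels `SARIMAX` fit could be plugged in; its orders are placeholders, not the per-household orders of Table 2, and all function names are hypothetical.

```python
import numpy as np

HOURS_PER_DAY = 24
WINDOW_HOURS = 30 * HOURS_PER_DAY  # most recent 30 days of hourly data


def seasonal_naive(history, season=HOURS_PER_DAY):
    """Stand-in forecaster: predict the value observed one season (24 h) earlier."""
    return float(history[-season])


def rolling_one_step(series, n_test, forecaster=seasonal_naive, window=WINDOW_HOURS):
    """Refit on the latest `window` observations before each one-step-ahead forecast."""
    series = np.asarray(series, dtype=float)
    preds = []
    for t in range(len(series) - n_test, len(series)):
        history = series[max(0, t - window) : t]  # only data observed before time t
        preds.append(forecaster(history))
    return np.array(preds)


# Hypothetical SARIMA plug-in (requires statsmodels; orders are placeholders):
# from statsmodels.tsa.statespace.sarimax import SARIMAX
# def sarima_forecaster(history):
#     res = SARIMAX(history, order=(1, 0, 1), seasonal_order=(1, 1, 1, 24)).fit(disp=False)
#     return float(res.forecast(steps=1)[0])
```

For the AIC-based grid search, each fitted statsmodels results object exposes `aic` and `bic` attributes, so the candidate with the smallest `aic` would be kept.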

2.4. Forecasting Models

2.4.1. Long Short-Term Memory

To construct the input sequences, a fixed-length sliding-window approach is adopted. A sequence of past time steps is used as input to predict the PV power output at the subsequent time step, enabling the model to learn temporal dependencies present in the historical data. This sequence-to-one configuration is well-suited for short-term forecasting tasks, as illustrated in Figure 2.

The proposed LSTM architecture consists of four main components: an input layer, an LSTM layer, a fully connected dense layer, and an output layer. The input layer defines the shape of the data, where each sample comprises a sequence of time steps with a single feature corresponding to PV power output. The LSTM layer contains 64 memory units and employs a hyperbolic tangent (tanh) activation function, allowing the network to capture long-term temporal dependencies and nonlinear patterns in the data. Following the LSTM layer, a dense layer with 32 neurons and a Rectified Linear Unit (ReLU) activation function is used to further process the extracted temporal features. The ReLU activation enhances the model’s ability to learn complex nonlinear relationships while mitigating the vanishing gradient problem [15]. Finally, the output layer consists of a single neuron that generates the predicted solar PV power value for the one-hour-ahead forecasting horizon. The architecture and functional description of the LSTM model are provided in Table 3.

The internal operations of the LSTM network are governed by a set of gating mechanisms that regulate the flow of information through the memory cell. At each time step t, the LSTM determines which information to retain, update, and output based on the current input vector x_{t} and the hidden state from the previous time step h_{t - 1}. The forget gate f_{t}, defined in Equation (2), controls the extent to which information from the previous cell state C_{t - 1} is retained or discarded. This is achieved by applying a sigmoid activation function σ(·), which outputs values in the range [0, 1]. The parameters W_{f} and U_{f} are the weight matrices associated with the input and the hidden state, respectively, while b_{f} is the bias vector of the forget gate [19]. The input gate, given by Equation (3), determines which new information is incorporated into the cell state. Simultaneously, the cell candidate, expressed in Equation (4), generates a vector of potential new content using a hyperbolic tangent activation function. Together, these components determine the new information that contributes to updating the memory cell. The updated cell state, shown in Equation (5), combines the retained information from the previous cell state with the newly selected candidate information, where ⊙ denotes element-wise multiplication. This enables the LSTM to preserve long-term dependencies while integrating relevant new patterns from the input sequence [20]. The output gate, defined in Equation (6), regulates the information passed from the updated cell state to the hidden state. Finally, the hidden state h_{t}, computed using Equation (7), is obtained by applying a hyperbolic tangent activation function to the updated cell state and scaling it element-wise by the output gate. The hidden state serves as both the output of the LSTM at time step t and the input to the next time step [21].

Forget gate f_{t} = σ (W_{f} x_{t} + U_{f} h_{t - 1} + b_{f})

(2)

Input gate i_{t} = σ (W_{i} x_{t} + U_{i} h_{t - 1} + b_{i})

(3)

Cell candidate \tilde{C}_{t} = \tanh(W_{c} x_{t} + U_{c} h_{t - 1} + b_{c})

(4)

Cell state update C_{t} = f_{t} ⊙ C_{t - 1} + i_{t} ⊙ \tilde{C}_{t}

(5)

Output gate o_{t} = σ(W_{o} x_{t} + U_{o} h_{t - 1} + b_{o})

(6)

Hidden state h_{t} = o_{t} ⊙ \tanh(C_{t})

(7)
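To make Equations (2)–(7) concrete, the following NumPy sketch runs the LSTM recurrence over one 24-h input window and then applies the 32-neuron ReLU dense layer and the single linear output neuron described for the architecture. The weights are random placeholders (a trained Keras model would supply real ones), dropout is omitted, and all function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_lstm(n_features, units, rng):
    """Random W (input) and U (recurrent) weights and zero biases for gates f, i, c, o."""
    p = {}
    for g in "fico":
        p["W" + g] = rng.normal(scale=0.1, size=(units, n_features))
        p["U" + g] = rng.normal(scale=0.1, size=(units, units))
        p["b" + g] = np.zeros(units)
    return p


def lstm_last_hidden(x_seq, p):
    """Apply Eqs. (2)-(7) step by step; x_seq has shape (T, n_features)."""
    units = p["bf"].size
    h, c = np.zeros(units), np.zeros(units)
    for x_t in x_seq:
        f = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h + p["bf"])        # (2) forget gate
        i = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h + p["bi"])        # (3) input gate
        c_tilde = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h + p["bc"])  # (4) cell candidate
        c = f * c + i * c_tilde                                   # (5) cell state update
        o = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h + p["bo"])        # (6) output gate
        h = o * np.tanh(c)                                        # (7) hidden state
    return h


def lstm_model_forecast(x_seq, p, W1, b1, w2, b2):
    """LSTM(64) -> Dense(32, ReLU) -> Dense(1, linear), forward pass only."""
    h = lstm_last_hidden(x_seq, p)
    z = np.maximum(0.0, W1 @ h + b1)
    return float(w2 @ z + b2)
```

Because o_{t} lies in (0, 1) and tanh in (−1, 1), every component of the returned hidden state is strictly between −1 and 1.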

2.4.2. Gated Recurrent Unit

The Gated Recurrent Unit (GRU) is a simplified variant of the Long Short-Term Memory (LSTM) network that is designed to reduce model complexity while preserving the ability to capture temporal dependencies in sequential data [22]. Unlike LSTM, the GRU architecture does not maintain a separate cell state; instead, it relies solely on the hidden state to store and propagate temporal information across time steps [23]. This structural simplification results in fewer parameters and often faster training while maintaining competitive forecasting performance. The GRU employs two gating mechanisms: the update gate (z_{t}) and the reset gate (r_{t}), as illustrated in Figure 3.

The update gate controls how much of the previous hidden state is carried forward, effectively combining the roles of the forget and input gates in an LSTM. The reset gate determines the extent to which past information is ignored when computing the candidate hidden state, enabling the model to focus on recent inputs when necessary. At each time step, the reset gate regulates the influence of the previous hidden state on the candidate activation, while the update gate determines the balance between the previous hidden state and the newly computed candidate state in forming the current hidden state. The model architecture with its functional description is shown in Table 3.

At each time step t, the GRU computes these gates based on the current input x_{t} and the hidden state from the previous time step h_{t - 1}. The reset gate, defined in Equation (8), controls the extent to which past information is considered when computing the candidate hidden state.

The update gate, given by Equation (9), determines the balance between retaining information from the previous hidden state and incorporating newly computed information. This gate effectively combines the roles of the forget and input gates used in the LSTM architecture. The candidate hidden state, expressed in Equation (10), is computed using a hyperbolic tangent activation function applied to the current input and the gated previous hidden state.

The final hidden state, shown in Equation (11), is obtained by linearly interpolating between the previous hidden state and the candidate hidden state, as controlled by the update gate. Through this mechanism, the GRU adaptively captures both short-term and long-term temporal dependencies without maintaining a separate memory cell.

Reset gate r_{t} = σ (W_{r} x_{t} + U_{r} h_{t - 1} + b_{r})

(8)

Update gate z_{t} = σ (W_{z} x_{t} + U_{z} h_{t - 1} + b_{z})

(9)

Candidate hidden state \tilde{h}_{t} = \tanh(W_{h} x_{t} + U_{h} (r_{t} ⊙ h_{t - 1}) + b_{h})

(10)

Hidden state update h_{t} = (1 - z_{t}) ⊙ h_{t - 1} + z_{t} ⊙ \tilde{h}_{t}

(11)
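Equations (8)–(11) translate directly into the NumPy sketch below (function name hypothetical, parameters passed as a dictionary of W, U and b arrays). Some libraries, the Keras GRU cell among them, write the update as h_{t} = z_{t} ⊙ h_{t−1} + (1 − z_{t}) ⊙ \tilde{h}_{t}; this mirrors Equation (11) with z_{t} relabeled as 1 − z_{t} and does not change what the layer can represent.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_last_hidden(x_seq, p):
    """Apply Eqs. (8)-(11) step by step; x_seq has shape (T, n_features)."""
    h = np.zeros(p["br"].size)
    for x_t in x_seq:
        r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h + p["br"])              # (8) reset gate
        z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h + p["bz"])              # (9) update gate
        h_tilde = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h) + p["bh"])  # (10) candidate
        h = (1.0 - z) * h + z * h_tilde                                 # (11) interpolation
    return h
```

With all weights and the update-gate bias set to zero, z_{t} = 0.5 and the candidate is constant, so the hidden state moves halfway toward it at every step. Counting parameters for the equations as written, a 64-unit GRU with one input feature has 3 × (64 + 64·64 + 64) = 12,672 weights and biases, compared with 4 × 4,224 = 16,896 for the LSTM of Equations (2)–(7).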

2.4.3. Convolutional Neural Network

A one-dimensional CNN is employed to extract local temporal patterns from residential solar PV power time-series data. Although CNNs are widely used in image processing, they can also be effectively adapted for time-series forecasting by treating the temporal dimension as a one-dimensional sequence [24,25]. In this study, each input sample consists of a sequence of past PV power values, where each time step contains a single feature. The CNN architecture begins with a one-dimensional convolutional (Conv1D) layer comprising 32 filters with a kernel size of 3. This layer performs convolution by sliding each filter across the input sequence to capture local temporal dependencies. The model architecture is shown in Figure 4.

The convolution operation for the k-th filter is expressed in Equation (12), where x_{i + j} denotes the input value at time step i + j, w_{j, k} represents the filter weights, b_{k} is the bias term, F is the kernel size, g is the activation function, and y_{i, k} is the resulting feature-map value at position i for filter k. The Rectified Linear Unit (ReLU) activation function is used to introduce nonlinearity [26] and is defined in Equation (13). Following the convolutional layer, a one-dimensional max-pooling layer with a pool size of 2, given in Equation (14), is applied to reduce the dimensionality of the feature maps while retaining the most salient features. This operation helps to reduce computational complexity and mitigate overfitting. The resulting feature maps are then flattened into a one-dimensional vector and passed to a fully connected dense layer with 32 neurons using the ReLU activation function, as given in Equation (15). The output layer consists of a single neuron without an activation function, which produces the final one-hour-ahead solar PV power forecast. The CNN model is trained using the Adam optimization algorithm, which adapts the step size of each parameter during training to support stable convergence.

Convolution feature extraction y_{i, k} = g (\sum_{j = 0}^{F - 1} w_{j, k} x_{i + j} + b_{k})

(12)

Activation function (ReLU) g(z) = \max(0, z)

(13)

Max-Pooling operation y_{i, k}^{\text{pool}} = \max_{j \in \{0, 1\}} y_{2i + j, k}

(14)

Dense layer mapping y = g (W z + b)

(15)
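The NumPy sketch below traces Equations (12)–(15) for one 24-h input window: a valid (unpadded) convolution with 32 filters of size 3 yields 22 × 32 feature values, pooling with size 2 halves this to 11 × 32, and flattening gives a 352-element vector for the 32-neuron dense layer. The weights are random placeholders, no padding and a pooling stride equal to the pool size are assumed (these match the Keras defaults for `Conv1D` and `MaxPooling1D`), and the function names are hypothetical.

```python
import numpy as np


def conv1d_relu(x, w, b):
    """Eqs. (12)-(13): x (T,), w (F, K), b (K,) -> ReLU feature maps (T - F + 1, K)."""
    F = w.shape[0]
    T_out = len(x) - F + 1
    # Row i holds sum_j w[j, k] * x[i + j] + b[k] for every filter k.
    y = np.stack([x[i : i + F] @ w + b for i in range(T_out)])
    return np.maximum(0.0, y)


def max_pool1d(y, pool=2):
    """Eq. (14): non-overlapping max over `pool` consecutive time steps per filter."""
    T = (y.shape[0] // pool) * pool
    return y[:T].reshape(-1, pool, y.shape[1]).max(axis=1)


def dense_relu(v, W, b):
    """Eq. (15) with g = ReLU."""
    return np.maximum(0.0, W @ v + b)
```

With a single all-ones kernel of size 3, the convolution of the ramp 0, 1, ..., 23 gives 3, 6, 9, ..., and pooling keeps the larger value of each pair (6, 12, 18, ...).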

2.4.4. The CNN–LSTM Model

The hybrid CNN–LSTM approach has been widely adopted in sequential forecasting problems. In CNN–LSTM models, the convolutional layers act as automatic feature extractors that operate directly on the input sequence [20]. For time-series data, the convolution operation captures local temporal patterns by sliding filters across consecutive time steps, thereby identifying short-term fluctuations and repeating structures in the signal [25]. The extracted feature maps provide a compact and informative representation of the original data, reducing noise and redundancy before temporal modeling [26]. The outputs of the convolutional and pooling layers are then reshaped and passed to the LSTM layer, which is responsible for learning long-term temporal dependencies [27]. The architecture of the CNN–LSTM model is illustrated in Figure 5. Its overall structure closely resembles that of the CNN model shown in Figure 4; however, the fully connected layer is replaced by an LSTM layer to enhance temporal learning capability.

By integrating convolutional feature extraction with recurrent sequence modeling, the CNN–LSTM model leverages the strengths of both approaches, which can improve forecasting accuracy and robustness for short-term solar PV prediction [26]. The architecture of the CNN–LSTM model, along with its functional description, is shown in Table 3.

Feature Vector Formation (Flattening) f_{t} = \mathrm{flatten}(y^{\text{pool}})

(16)

LSTM Input from CNN x_{t}^{\text{LSTM}} = f_{t}

(17)

CNN–LSTM x_{t}^{\text{LSTM}} = \mathrm{flatten}(y_{i, k}^{\text{pool}})

(18)

2.4.5. Attention-Based LSTM

The attention-based LSTM (ATT–LSTM) model combines Long Short-Term Memory (LSTM) networks with an attention mechanism to improve sequence modeling performance [27]. While an LSTM network captures temporal dependencies in sequential data, the attention mechanism enables the model to selectively focus on the most relevant parts of the input sequence when generating predictions. As illustrated in Figure 6, the attention mechanism computes relevance scores between the hidden representations and a query vector; these scores are then normalized using a SoftMax function to produce attention weights. The weights determine the relative importance of each element of the sequence when constructing the final context representation [28]. The architecture of the attention-based LSTM model is shown in Table 3. In the computation, an attention score is first obtained for each input element, as shown in Equation (19), where s_{i} represents the attention score for the i-th input element, x_{i} denotes the input feature vector at position i, and q represents the query vector derived from the decoder or hidden representation. Next, in Equation (20), the scores are normalized using the SoftMax function to obtain attention weights, where α_{i} represents the attention weight assigned to the i-th input and N denotes the total number of elements in the input sequence. In Equation (21), each input is multiplied by its corresponding attention weight to produce the weighted inputs. Finally, the context vector is computed as the weighted sum of all input vectors, as shown in Equation (22), where c_{t} represents the context vector at time step t, which captures the most relevant information from the input sequence based on the attention weights. This mechanism allows the model to emphasize important temporal patterns in the input sequence, thereby improving forecasting performance for time-series applications.

Attention Score s_{i} = s (x_{i}, q)

(19)

SoftMax for Attention Weights α_{i} = \frac{\exp(s_{i})}{\sum_{j = 1}^{N} \exp(s_{j})}

(20)

Weighted Inputs α_{i} x_{i}

(21)

Context Vector c_{t} = \sum_{i = 1}^{N} α_{i} x_{i}

(22)
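The text leaves the score function s(x_{i}, q) in Equation (19) unspecified; the NumPy sketch below assumes a dot-product score s(x_{i}, q) = x_{i} · q and computes Equations (19)–(22) for a sequence of N vectors. In the ATT–LSTM, the rows of X would typically be the LSTM hidden states of the 24-step window; the function name is hypothetical.

```python
import numpy as np


def attention_context(X, q):
    """X: (N, d) input vectors, q: (d,) query -> (context vector c_t, weights alpha)."""
    scores = X @ q                                 # (19), assumed dot-product score
    scores = scores - scores.max()                 # stability shift; softmax is unchanged
    alpha = np.exp(scores) / np.exp(scores).sum()  # (20) SoftMax attention weights
    weighted = alpha[:, np.newaxis] * X            # (21) weighted inputs
    return weighted.sum(axis=0), alpha             # (22) context vector
```

Subtracting the maximum score before exponentiating avoids overflow and leaves the weights unchanged, because the common factor cancels in the ratio of Equation (20).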

2.4.6. Seasonal Autoregressive Integrated Moving Average (SARIMA) Model

To establish a robust forecasting baseline, the Seasonal Autoregressive Integrated Moving Average (SARIMA) model was employed. SARIMA is well-suited for time-series data exhibiting seasonality and non-stationarity, as it combines non-seasonal and seasonal autoregressive and moving average components with regular and seasonal differencing [29,30]. The general SARIMA model is denoted as SARIMA(p, d, q)(P, D, Q)_{s}, where p, d, and q represent the orders of the non-seasonal autoregressive, differencing, and moving average terms, respectively, while P, D, and Q denote their seasonal counterparts, and s is the seasonal period. The model can be expressed mathematically as in Equation (23):

Φ_{p} (B) Φ_{P} (B^{s}) {(1 - B)}^{d} {(1 - B^{s})}^{D} y_{t} = Θ_{q} (B) Θ_{Q} (B^{s}) ε_{t}

(23)

where y_{t} is the observed value at time t, B is the backshift operator (B y_{t} = y_{t - 1}), Φ(·) and Θ(·) represent the autoregressive and moving average polynomials, respectively, and ε_{t} is a white noise error term. The architecture of the SARIMA model, along with its functional description, is shown in Table 3.
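The differencing operators on the left-hand side of Equation (23) can be checked numerically. The NumPy sketch below (function name hypothetical) applies (1 − B^{s})^{D} and then (1 − B)^{d} with s = 24 for hourly data; the order does not matter because the two operators commute. A series built from a repeating daily profile plus a linear trend becomes constant after one seasonal difference and zero after an additional regular difference.

```python
import numpy as np


def sarima_difference(y, d=0, D=0, s=24):
    """Apply (1 - B)^d (1 - B^s)^D to y; each difference shortens the series."""
    y = np.asarray(y, dtype=float)
    for _ in range(D):
        y = y[s:] - y[:-s]  # (1 - B^s) y_t = y_t - y_{t-s}
    for _ in range(d):
        y = y[1:] - y[:-1]  # (1 - B) y_t = y_t - y_{t-1}
    return y
```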

3. Dataset Description

This article uses data from a residential community on the Dingle Peninsula, Ireland [18]. The case study is based on 20 residential households in the Dingle community and is part of the ESB Networks Dingle Electrification strategy. This initiative supports the Dingle 2030 goal, which aims to enable an active and sustainable energy community, as shown in Figure 7. The dataset comprises high-resolution electrical measurements for each household, including power (W) and energy (Wh), along with corresponding local meteorological parameters. All variables were recorded at a 1-min temporal resolution over the full year 2020, allowing for a detailed analysis of short-term variability and seasonal trends in residential energy behavior and PV generation. Among the twenty monitored households, ten residences are equipped with rooftop solar PV systems, with installed capacities ranging from 2.0 to 2.2 kWp. Instead of displaying the full yearly time series, which would result in dense, difficult-to-interpret plots, the hourly PV power values were aggregated across the entire year to obtain a representative daily generation pattern. This approach highlights the typical intraday behavior of residential PV systems while retaining the key characteristics of the generation profile. Representative results for two households are shown in Figure 8 and Figure 9. For clarity and conciseness, only these representative households are presented in the main manuscript, while detailed household-level hourly PV power generation profiles for all ten PV-equipped houses are provided in the Supplementary Material (Figures S1–S8).
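The representative daily profile can be sketched as an hour-of-day average of the hourly series. The text says only that the values were "aggregated", so a mean is assumed here, as is a series that starts at midnight with no missing hours; for 2020, a leap year, that is 366 × 24 = 8,784 hourly values. The function name is hypothetical.

```python
import numpy as np


def mean_daily_profile(hourly, hours_per_day=24):
    """Average each hour of the day across all complete days -> array of 24 values."""
    hourly = np.asarray(hourly, dtype=float)
    n_days = len(hourly) // hours_per_day
    days = hourly[: n_days * hours_per_day].reshape(n_days, hours_per_day)
    return days.mean(axis=0)
```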

4. Results

This section evaluates the performance of six models for one-hour-ahead solar PV power forecasting: SARIMA, LSTM, GRU, CNN, CNN–LSTM, and ATT–LSTM. The assessment uses data from 10 residential households (H1, H2, H3, H4, H5, H7, H10, H11, H13, and H17) within a solar-powered residential community. These households exhibit diverse PV generation characteristics, including stable output, moderate variability, and high intermittency. This evaluation framework is designed to assess not only predictive accuracy but also the generalization capability and robustness of the forecasting models under realistic operating conditions.

4.1. Performance Evaluation Metrics

The performance of the proposed forecasting models is evaluated using four widely adopted statistical metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), symmetric Mean Absolute Percentage Error (sMAPE), and the coefficient of determination (R²). These metrics jointly quantify absolute prediction error, sensitivity to large deviations, relative percentage error, and explanatory power, respectively.

It is important to note that the relatively high sMAPE values arise primarily from nighttime periods: when the actual PV output is zero or close to zero, the denominator in the sMAPE formula becomes very small, which can inflate the percentage error despite small absolute deviations. The mathematical formulas of MAE, RMSE, sMAPE, and R² are presented in Equations (24)–(27).

Mean Absolute Error \mathrm{MAE} = \frac{1}{n} \sum_{t = 1}^{n} | y_{t} - \hat{y}_{t} |

(24)

Root Mean Squared Error \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t = 1}^{n} (y_{t} - \hat{y}_{t})^{2}}

(25)

Symmetric Mean Absolute Percentage Error \mathrm{sMAPE} = \frac{100}{n} \sum_{t = 1}^{n} \frac{| y_{t} - \hat{y}_{t} |}{(| y_{t} | + | \hat{y}_{t} |) / 2}

(26)

Coefficient of Determinant R^{2} R^{2} = 1 - \frac{\sum_{t = 1}^{n} {(y_{t} - \hat{y_{t}})}^{2}}{\sum_{t = 1}^{n} {(y_{t} - \hat{y})}^{2}}

(27)

In these equations,

y_{t}

represents the actual PV Power output,

\hat{y t}

represents the predicted value,

ŷ

is the mean value of the actual values, and n represents the number of test samples.
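Equations (24)–(27) translate directly into NumPy, as in the sketch below. One detail the formulas leave open is the 0/0 case in Eq. (26) when the actual and forecast values are both exactly 0 W at night; here such terms are counted as zero error, which is an assumption (excluding those samples from the average is another common choice, and it changes the reported sMAPE).

```python
import numpy as np


def _as_arrays(y, y_hat):
    return np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)


def mae(y, y_hat):
    """Eq. (24): mean absolute error, in the units of y (W)."""
    y, y_hat = _as_arrays(y, y_hat)
    return float(np.mean(np.abs(y - y_hat)))


def rmse(y, y_hat):
    """Eq. (25): root mean squared error (W)."""
    y, y_hat = _as_arrays(y, y_hat)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def smape(y, y_hat):
    """Eq. (26): symmetric MAPE in percent."""
    y, y_hat = _as_arrays(y, y_hat)
    denom = (np.abs(y) + np.abs(y_hat)) / 2.0
    # Assumption: terms with a zero denominator (both values 0 W) count as 0 error.
    ratio = np.divide(np.abs(y - y_hat), denom, out=np.zeros_like(denom), where=denom > 0)
    return float(100.0 * np.mean(ratio))


def r2(y, y_hat):
    """Eq. (27): coefficient of determination, using the mean of the actual values."""
    y, y_hat = _as_arrays(y, y_hat)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


y_true = [0.0, 100.0, 200.0, 300.0]
y_pred = [0.0, 110.0, 190.0, 330.0]
print(mae(y_true, y_pred))  # prints 12.5
```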

4.2. Quantitative Performance Comparison Across Households

The forecast performance of all evaluated models across individual households is summarized in Table 4, which reports the MAE, RMSE, sMAPE, and R² metrics. Deep-learning results are presented as mean ± standard deviation over multiple runs, whereas SARIMA results correspond to a single deterministic model. Overall, both deep-learning and statistical models demonstrate competitive performance for short-term residential PV forecasting; however, accuracy varies across households due to differences in PV generation patterns and data variability.

For House 1, LSTM achieves the lowest MAE (65.61 ± 6.89 W), while CNN provides the lowest RMSE (133.40 ± 1.60 W). SARIMA yields comparable performance (RMSE = 134.82 W, R² = 0.827), indicating similar predictive capability among several models. In House 2, GRU performs best among the deep-learning models, achieving the lowest RMSE (155.52 ± 1.13 W) and highest R² (0.742 ± 0.004), although SARIMA shows comparable results (RMSE = 156.18 W, R² = 0.740). For House 3, GRU again achieves the highest explanatory power (R² = 0.808 ± 0.003), while CNN–LSTM achieves a competitive RMSE (145.51 ± 1.83 W), indicating that both recurrent and hybrid architectures effectively capture temporal dependencies. In House 4, SARIMA clearly outperforms all deep-learning models, achieving the lowest MAE (36.05 W), lowest RMSE (90.68 W), and highest R² (0.709). This suggests that the PV generation pattern in this household exhibits stronger linear and seasonal characteristics. For House 5, GRU achieves the best deep-learning performance (RMSE = 142.02 ± 1.87 W, R² = 0.694 ± 0.008), while SARIMA produces nearly identical performance (RMSE = 142.53 W, R² = 0.692), indicating strong competitiveness between statistical and learning-based approaches. In House 7, deep-learning models show consistently strong performance, with LSTM achieving the highest R² (0.859 ± 0.004), while SARIMA provides comparable results (R² = 0.851). For House 10, LSTM slightly outperforms the other models (RMSE = 128.06 ± 3.73 W, R² = 0.843 ± 0.009), although all models exhibit similar performance levels. In House 11, GRU achieves the best deep-learning performance (RMSE = 118.66 ± 0.89 W, R² = 0.837 ± 0.002), while SARIMA shows nearly identical accuracy (RMSE = 118.48 W, R² = 0.837). In House 13, GRU again provides the highest R² (0.835 ± 0.008), while CNN and LSTM achieve comparable RMSE values, indicating consistent behavior across architectures. Finally, in House 17, CNN achieves the lowest RMSE (138.03 ± 0.78 W) and the highest R² (0.833 ± 0.002), whereas GRU provides the lowest MAE (66.92 ± 3.23 W), highlighting metric-dependent performance differences.

Overall, the results demonstrate that deep-learning models generally provide robust and consistent forecasting performance across households, owing to their ability to capture nonlinear temporal dependencies. However, SARIMA remains a strong baseline and, in some cases, outperforms deep-learning models, particularly when PV generation exhibits regular temporal patterns. These findings emphasize the importance of household-specific model evaluation, as forecasting performance is strongly influenced by the variability and structure of PV generation data.
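The "mean ± standard deviation" entries in Table 4 summarize a metric over repeated training runs. A minimal sketch of that aggregation follows; the five-run count matches Section 4.8, but the choice of the sample standard deviation (`ddof=1`) and the RMSE values are assumptions for illustration, not numbers from Table 4.

```python
import numpy as np


def summarize_runs(scores, ddof=1):
    """Return (mean, standard deviation) of a metric over independent runs."""
    scores = np.asarray(scores, dtype=float)
    return float(scores.mean()), float(scores.std(ddof=ddof))


def format_mean_std(scores, decimals=2):
    """Format a run summary in the 'mean ± std' style used in the tables."""
    mean, std = summarize_runs(scores)
    return f"{mean:.{decimals}f} ± {std:.{decimals}f}"


# Hypothetical RMSE values (W) from five runs with different random seeds.
rmse_runs = [140.0, 142.0, 144.0, 141.0, 143.0]
print(format_mean_std(rmse_runs))  # prints 142.00 ± 1.58
```

With only five runs, `ddof=1` versus `ddof=0` changes the reported spread by a factor of about 1.12 (the square root of 5/4), so stating which convention is used makes the ± values comparable across studies.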

4.3. Distributional Error Analysis of Evaluated Models

The distribution of RMSE values obtained by the evaluated models across all households is illustrated in Figure 10.

4.4. House-Wise Comparative Analysis of R²

A detailed comparison of forecast performance across individual households is presented in Figure 11, which illustrates the house-wise R² values for all evaluated models. A consistent trend is observed: models achieving lower RMSE values generally also achieve higher R² scores, as expected, because on a given test set R² decreases monotonically as RMSE increases. Across the evaluated households, most deep-learning models achieve R² values of approximately 0.80–0.86, particularly for Houses 1, 3, 7, 11, and 13. Among these, recurrent architectures such as GRU and LSTM frequently demonstrate competitive or superior performance. For example, GRU achieves the highest R² in House 11 (0.837 ± 0.002, matched by SARIMA) and in House 13 (0.835 ± 0.008), reflecting its strong capability to capture temporal dependencies in PV generation data. Similarly, CNN performs strongly in certain households, such as House 4 and House 17, where it achieves the highest R² among the deep-learning models (0.678 ± 0.016 and 0.833 ± 0.002, respectively), indicating its effectiveness in capturing localized patterns. However, the performance of forecasting models varies across households, which can be attributed to differences in PV generation profiles and inherent data variability. In households with more irregular or nonlinear patterns, deep-learning models tend to provide better predictive performance by capturing complex temporal relationships. In contrast, the SARIMA model performs competitively in several cases, particularly for households with more regular temporal behavior. For instance, in House 4, SARIMA achieves the highest R² (0.709) and the lowest RMSE, outperforming all deep-learning models. This suggests that statistical models remain effective when the underlying time series exhibits strong seasonality and linear structure. Overall, these results indicate that while deep-learning models generally provide high predictive accuracy and robustness across diverse household scenarios, the SARIMA model remains a strong baseline for short-term residential PV forecasting, particularly when the data exhibit clear temporal regularity.
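The RMSE–R² relationship can be written out explicitly from Eqs. (25) and (27). Writing σ_y² for the population variance of the actual values on the test set (a symbol introduced here for illustration), the residual and total sums of squares are n·RMSE² and n·σ_y². As a rough worked example, back-calculating σ_y from the rounded SARIMA results quoted in Section 4.2 for House 4 (RMSE = 90.68 W, R² = 0.709) and House 11 (RMSE = 118.48 W, R² = 0.837) gives the approximate values below.

```latex
\begin{aligned}
\sum_{t=1}^{n} \left( y_{t} - \hat{y}_{t} \right)^{2} &= n\,\mathrm{RMSE}^{2},
\qquad
\sum_{t=1}^{n} \left( y_{t} - \bar{y} \right)^{2} = n\,\sigma_{y}^{2}
\;\;\Longrightarrow\;\;
R^{2} = 1 - \frac{\mathrm{RMSE}^{2}}{\sigma_{y}^{2}},
\qquad
\sigma_{y} = \frac{\mathrm{RMSE}}{\sqrt{1 - R^{2}}} \\
\text{House 4: } \sigma_{y} &\approx \frac{90.68}{\sqrt{1 - 0.709}} \approx 168\ \mathrm{W},
\qquad
\text{House 11: } \sigma_{y} \approx \frac{118.48}{\sqrt{1 - 0.837}} \approx 293\ \mathrm{W}
\end{aligned}
```

Under this reading, House 4 combines a low RMSE with a comparatively low R² because its test-set output varies less, which is why RMSE values are best compared within, rather than across, households.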

4.5. SARIMA Model Analysis

The estimated model parameters for each household are presented in Table 5. All fitted models follow a consistent SARIMA(1,1,1)(1,1,1,24) structure, capturing both short-term dynamics and daily seasonality in photovoltaic (PV) power generation. The autoregressive coefficient AR(1) is consistently high across households (typically between approximately 0.75 and 0.83), indicating strong temporal dependence in PV generation. This suggests that current power output is strongly influenced by recent past values, as expected given the gradual variation in solar irradiance and system inertia. The moving-average terms MA(1) are smaller in magnitude but remain statistically significant (p < 0.05), indicating that short-term forecast errors are effectively incorporated into the model. This behavior reflects the ability of SARIMA to correct recent prediction deviations, thereby improving short-horizon forecasting accuracy. The seasonal autoregressive component SAR(24) has positive coefficients for all households, confirming the presence of strong daily periodicity in PV generation patterns. Conversely, the seasonal moving-average coefficient SMA(24) is consistently negative and close to −1, indicating a strong seasonal error-correction mechanism: deviations from the daily cycle are systematically corrected, which enhances model stability across repeated daily cycles. Furthermore, all estimated parameters are statistically significant, with p-values close to zero, supporting the reliability of the fitted SARIMA models. The consistency of parameter values across households indicates that, despite variations in installation characteristics, residential PV systems exhibit similar underlying temporal structures dominated by daily seasonality.
Overall, these results demonstrate that the SARIMA model effectively captures both short-term dependencies and seasonal patterns in PV power generation, which explains its competitive performance compared to deep-learning models in several households.
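For reference, the SARIMA(1,1,1)(1,1,1,24) structure can be written in backshift notation, where B is the backshift operator (B yₜ = yₜ₋₁), εₜ is white noise, φ₁ and θ₁ are the AR(1) and MA(1) coefficients, and Φ₁ and Θ₁ are the SAR(24) and SMA(24) coefficients. The sign convention below, in which moving-average terms enter with a plus sign, is an assumption (it is the convention used by statsmodels). Under it, an SMA(24) coefficient close to −1 makes the factor (1 + Θ₁B²⁴) nearly cancel the seasonal difference (1 − B²⁴), which is consistent with a strongly repeating daily cycle.

```latex
(1 - \phi_{1} B)\,(1 - \Phi_{1} B^{24})\,(1 - B)\,(1 - B^{24})\, y_{t}
  = (1 + \theta_{1} B)\,(1 + \Theta_{1} B^{24})\, \varepsilon_{t}
```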

It should be noted that the p-values associated with the SARIMA model parameters are not obtained from a pairwise comparison with the ground truth data. Instead, these p-values are derived from the statistical estimation of the SARIMA model coefficients during the fitting process. They indicate the significance of each parameter in explaining the temporal and seasonal structure of the PV power time series. As all estimated parameters were found to be statistically significant (p < 0.05), the p-values have been omitted from Table 5 for clarity.

4.6. One-Hour-Ahead Forecasting for Representative Households

The one-hour-ahead PV power forecasting results for Households H4 and H5 obtained with the different models are shown in Figure 12 and Figure 13. Visually, the CNN–LSTM and ATT–LSTM forecasts follow the actual PV output most closely throughout the day, capturing both smooth and rapid variations in PV generation, particularly during the morning and evening hours when fluctuations are more pronounced. The CNN model shows slightly larger deviations than the hybrid models but still provides a reasonable approximation of the overall generation profile. The LSTM and GRU models also capture the general trend in power generation but exhibit some delay during periods of rapid change, particularly in the early morning and late afternoon. The SARIMA model provides a good baseline, particularly for Household H4, where it closely matches the actual power generation; however, while it captures the overall seasonal trend, it responds with a slight lag to rapid fluctuations in power output, especially during peak generation periods. It therefore remains a strong contender mainly for households with less variability. Forecasting results for the remaining households are provided in the Supplementary Material (Figures S9–S16).

For clarity, one-hour-ahead forecasts over a single, randomly selected day are also presented for the representative households H4 and H5. Figure 14a shows that, for House 4, all models closely follow the actual power profile, with the SARIMA and CNN–LSTM models showing the smallest deviations from the actual power curve. Similarly, Figure 14b shows that, for House 5, the deep-learning models (LSTM, GRU, CNN, CNN–LSTM, ATT–LSTM) and SARIMA all perform well with minimal discrepancies, particularly in capturing the time of daily peak power generation, although SARIMA tracks the curve slightly less closely than the hybrid deep-learning models. Across both households, the deep-learning models, particularly CNN–LSTM and ATT–LSTM, show relatively small errors over the plotted periods, demonstrating their ability to capture nonlinear fluctuations in PV power generation, while SARIMA lags slightly behind them. Overall, these plots provide a clearer visualization of model behavior over a typical day and highlight the strengths and limitations of each approach: in these examples, hybrid deep-learning models such as CNN–LSTM and ATT–LSTM track short-term fluctuations of residential PV power most closely, whereas SARIMA remains a valuable baseline despite its limitations during periods of rapid fluctuation. These visual impressions should be read alongside Table 4, in which SARIMA achieves the best scores for House 4 and GRU the best RMSE and R² for House 5.
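The lag described above for SARIMA, LSTM, and GRU during rapid changes can be quantified by finding the time shift that best aligns a forecast with the observations. The sketch below is one simple way to do this, using a hypothetical helper that is not part of the evaluated framework: it compares the forecast shifted back by 0 to `max_lag` hours with the actual series and returns the shift with the lowest mean absolute error. A result of 1 for a one-hour-ahead model suggests that the forecast behaves largely like a persistence forecast (the last observed value).

```python
import numpy as np


def estimate_forecast_lag(actual, forecast, max_lag=3):
    """Return the shift k (time steps) minimizing MAE between forecast[t + k] and actual[t]."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    errors = []
    for k in range(max_lag + 1):
        # Compare actual[0 : n-k] with forecast[k : n]; a lagging forecast aligns at k > 0.
        errors.append(np.mean(np.abs(actual[: len(actual) - k] - forecast[k:])))
    return int(np.argmin(errors))


# Two synthetic days of hourly PV power (W) and a persistence forecast
# that simply repeats the previous hour's value.
t = np.arange(48)
actual = np.clip(np.sin(2 * np.pi * (t % 24 - 6) / 24), 0.0, None) * 1500.0
persistence = np.concatenate(([actual[0]], actual[:-1]))
print(estimate_forecast_lag(actual, persistence))  # prints 1
```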

4.7. Household-Level Variability Analysis

The forecasting results reveal noticeable variability across households within the same residential energy community. Although overall PV generation patterns are influenced by shared meteorological conditions, the quantitative evaluation metrics presented in Table 4 indicate distinct forecasting behaviors for individual households.

These differences highlight that residential photovoltaic systems within the same community do not produce identical generation profiles. For instance, House 4 consistently exhibits lower error values, with SARIMA achieving the lowest MAE (36.05 W) and RMSE (90.68 W), indicating relatively stable and predictable PV generation. In contrast, households such as House 2 and House 5 exhibit higher errors (best RMSEs of 155.52 W and 142.02 W, respectively) and lower R² values (0.742 and 0.694), reflecting more complex and variable generation patterns. The variability can be attributed to several factors, including rooftop orientation, shading conditions, installation configuration, and localized environmental influences. Consequently, forecasting models must adapt to household-specific characteristics rather than rely on a uniform modeling approach. Despite these variations, deep-learning models demonstrate consistent adaptability across households, with architectures such as GRU and LSTM maintaining stable performance in several of them, which suggests that they extract temporal features effectively. The SARIMA model, in turn, performs particularly well in households exhibiting strong linearity and seasonality, as observed in House 4, but becomes less competitive in households with higher variability, where nonlinear dynamics dominate. Overall, these results confirm the presence of intra-community heterogeneity in residential PV generation, so forecasting performance is strongly dependent on household-specific characteristics. The findings emphasize that deep-learning models are better suited for capturing nonlinear and complex temporal dependencies, whereas statistical models remain effective for more regular and structured generation patterns. A flexible and adaptive modeling framework is therefore essential for reliable short-term PV forecasting in residential energy systems.

4.8. Statistical Significance Analysis of Model Performance

To assess whether the observed differences in forecasting accuracy are statistically meaningful, pairwise comparisons were conducted using both a paired t-test and a Wilcoxon signed-rank test. The analysis was performed on RMSE values obtained from five independent runs of each deep-learning model, thereby accounting for stochastic variability in model training. The results for representative households are summarized in Table 6. For House 5, the paired t-test indicates a statistically significant difference between the LSTM and GRU models (p = 0.0144), with GRU providing the better predictive accuracy for this household. A statistically significant difference is also observed between CNN–LSTM and ATT–LSTM (p = 0.0451), suggesting that the choice of hybrid architecture can affect accuracy under certain conditions. However, the corresponding Wilcoxon tests yield slightly higher p-values, indicating that these improvements are moderate and should be interpreted with caution. This is partly a consequence of the small sample: with only five paired runs, the smallest two-sided p-value attainable by the exact Wilcoxon signed-rank test is 2/2⁵ = 0.0625, so the exact test cannot reach significance at the 0.05 level. In contrast, for House 4, no statistically significant differences are observed among the evaluated models (p > 0.05 for all comparisons), indicating that the performance variations are not substantial. This suggests that for households with more regular and stable photovoltaic generation patterns, different deep-learning architectures tend to perform similarly. Overall, the statistical analysis confirms that while certain models demonstrate improved forecasting accuracy, these improvements are not consistently significant across all households. Model performance should therefore be interpreted in the context of household-specific generation characteristics. The inclusion of both parametric and non-parametric tests strengthens the robustness of the comparative analysis and provides a more reliable basis for evaluating forecasting models.
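Both tests can be reproduced with SciPy as sketched below. This is an illustration under assumptions: the RMSE values are hypothetical, run i of one model is assumed to be paired with run i of the other (for example, the same random seed and data split), and SciPy's `ttest_rel` and `wilcoxon` are called with their default settings.

```python
import numpy as np
from scipy import stats


def compare_models(rmse_a, rmse_b):
    """Paired t-test and Wilcoxon signed-rank test on per-run RMSE values.

    Returns (t-test p-value, Wilcoxon p-value), both two-sided.
    """
    rmse_a = np.asarray(rmse_a, dtype=float)
    rmse_b = np.asarray(rmse_b, dtype=float)
    t_res = stats.ttest_rel(rmse_a, rmse_b)
    # With 5 pairs, no zero differences and no ties, SciPy uses the exact
    # distribution, whose smallest two-sided p-value is 2 / 2**5 = 0.0625.
    w_res = stats.wilcoxon(rmse_a, rmse_b)
    return float(t_res.pvalue), float(w_res.pvalue)


# Hypothetical RMSE values (W) from five paired runs of two models.
lstm_rmse = [145.0, 146.5, 144.0, 147.5, 146.0]
gru_rmse = [142.0, 143.0, 141.5, 143.5, 142.8]
p_t, p_w = compare_models(lstm_rmse, gru_rmse)
```

In this example GRU is better in every run, so the t-test p-value is far below 0.05, while the Wilcoxon p-value sits at its floor of 0.0625. This shows how the two tests can disagree purely because of the small number of runs.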

5. Discussion

The results demonstrate that deep-learning models frequently achieve strong performance in short-term residential PV power forecasting. However, their superiority is not consistent across all households. Instead, forecasting accuracy depends strongly on the underlying characteristics of PV generation patterns, including variability, intermittency, and seasonal structure. Across the evaluated households, deep-learning models such as LSTM, GRU, CNN, CNN–LSTM, and ATT–LSTM achieve comparable performance, with only moderate differences in RMSE and R². In several cases, GRU and CNN-based architectures exhibit slightly better performance, particularly in households with higher variability. For instance, in House 5, GRU achieves the lowest RMSE and highest R², indicating its ability to better capture short-term fluctuations in PV output. This observation is further supported by the statistical analysis, which shows that GRU significantly outperforms LSTM according to a paired t-test (p < 0.05). However, the corresponding Wilcoxon test indicates that this improvement is moderate, suggesting that performance differences among deep-learning models are not always pronounced. In contrast, for households with more regular and stable PV generation patterns, such as House 4, the differences between models are minimal. Statistical tests show no significant differences among the deep-learning models for this household (p > 0.05), indicating that model choice has a limited impact when the underlying time series is less complex. In such scenarios, simpler models can achieve performance comparable to that of more complex architectures.

The SARIMA model remains a competitive baseline across several households. Its performance is particularly strong where PV generation exhibits clear daily seasonality and relatively stable behavior. For example, in House 4 and House 11, SARIMA achieved R² values of 0.709 and 0.837, respectively, which exceed or match those of the deep-learning models. The estimated SARIMA parameters further support this observation, as consistently large autoregressive and seasonal coefficients indicate strong temporal dependence and pronounced daily periodicity in PV generation. Hybrid deep-learning models, including CNN–LSTM and ATT–LSTM, demonstrate stable and reliable performance across most households, with RMSE values generally within a narrow range. However, their improvements over simpler architectures are not consistently statistically significant, indicating that increased model complexity does not always translate into meaningful performance gains. Overall, these findings highlight that forecasting performance is highly household-specific. While deep-learning models are effective at capturing nonlinear and highly variable generation patterns, SARIMA remains a robust and interpretable alternative for systems with strong seasonal structure. Therefore, model selection should be guided by the characteristics of the PV generation data rather than by assuming the universal superiority of a single modeling approach.

6. Conclusions

This study evaluated one-hour-ahead residential PV power forecasting using deep-learning models and a SARIMA baseline across multiple households. The results show that both approaches provide competitive performance, with accuracy depending on household-specific generation characteristics. Deep-learning models demonstrate stable performance, particularly under more variable conditions, although statistical analysis indicates that their improvements are not consistently significant across all households. In contrast, SARIMA remains a strong and interpretable baseline, achieving comparable or better performance in households with clear seasonal patterns, such as House 4 and House 11. Overall, no single model consistently outperforms the others, highlighting the importance of selecting forecasting methods based on the characteristics of PV generation. The proposed framework, incorporating multi-run evaluation and statistical testing, provides a reliable basis for model comparison in residential forecasting.

7. Limitations and Future Work

Despite the promising performance of deep-learning models, several limitations remain. First, the current forecasting framework is based solely on univariate PV power data, using only historical power measurements for one-hour-ahead forecasting. While univariate forecasting simplifies the model and reduces data requirements, it prevents the model from capturing other factors that influence PV generation, such as weather conditions and residential consumption patterns. Future work will address this limitation by adopting multivariate forecasting techniques that incorporate additional variables, such as solar irradiance, ambient temperature, and residential energy consumption. This will enable a more comprehensive analysis of the factors influencing PV generation and is expected to improve overall forecasting accuracy. Additionally, the forecasting framework will be tested across different geographical locations to evaluate its robustness and generalization under varying climatic conditions. Another promising direction for future research is the integration of transformer-based and decomposition-based forecasting architectures. Recently developed models, such as the Transformer, Informer, Autoformer, and FEDformer, have demonstrated strong capabilities for capturing long-range temporal dependencies and complex seasonal patterns in time series data. In particular, decomposition-based frameworks that explicitly separate trend and seasonal components may improve the modeling of photovoltaic power generation, which exhibits strong diurnal and seasonal characteristics. Incorporating these architectures could further enhance forecasting accuracy and robustness for residential PV systems. Therefore, future work will investigate the application and comparative evaluation of transformer-based and decomposition-based models for household-level PV power forecasting.
Furthermore, the proposed framework will be embedded into optimization-based energy management systems, enabling real-time forecasting, improved demand-side management, storage optimization, and the integration of renewable energy into smart grids.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/en19081991/s1, Figures S1–S16: Average daily PV generation profiles and one-hour-ahead PV power forecasting results for additional households.

Author Contributions

Conceptualization, K.B. and P.J.; Methodology, K.B.; Data Curation, V.S. and F.M.; Formal Analysis, K.B.; Writing—Review and Editing, K.B., F.M. and V.S.; Visualization, K.B.; Supervision, P.J. and V.S.; Funding Acquisition, F.M. and V.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used in this study are publicly available from Trivedi et al. (2024) [15], "Comprehensive Dataset on Electrical Load Profiles for Energy Community in Ireland", Scientific Data, 11, 621. Additional weather data were obtained from the Irish Meteorological Service at https://www.met.ie/climate/available-data/historical-data (accessed on 9 April 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AIC	Akaike Information Criterion
ARIMA	Autoregressive Integrated Moving Average
ATT–LSTM	Attention–LSTM
BIC	Bayesian Information Criterion
CNN	Convolutional Neural Network
EJRC	European Joint Research Centre
GRU	Gated Recurrent Unit
LSTM	Long Short-Term Memory
MAE	Mean Absolute Error
MRMI	Modified Relief–Mutual Information
PV	Photovoltaic
RMSE	Root Mean Squared Error
RPPFF	Renewable Power Production Forecasting Framework
SARIMA	Seasonal Autoregressive Integrated Moving Average
sMAPE	Symmetric Mean Absolute Percentage Error

References

1. Caramizaru, A.; Uihlein, A. Energy Communities: An Overview of Energy and Social Innovation; Publications Office of the European Union: Luxembourg, 2020.
2. dos Santos, S.A.B.; Coutinho, L.R.R.; Tofoli, F.L.; Barroso, G.C. Community energy management system for residential energy communities integrating demand response, distributed generation, and energy storage systems. J. Energy Storage 2025, 132, 117832.
3. Kampman, B.; Blommerde, J.; Afman, M. The Potential of Energy Citizens in the European Union. 2016. Available online: www.cedelft.eu (accessed on 9 April 2026).
4. Massidda, L.; Bettio, F.; Marrocu, M. Probabilistic day-ahead prediction of PV generation. A comparative analysis of forecasting methodologies and of the factors influencing accuracy. Sol. Energy 2024, 271, 112422.
5. Obi, M.; Bass, R. Trends and challenges of grid-connected photovoltaic systems—A review. Renew. Sustain. Energy Rev. 2016, 58, 1082–1094.
6. Zielińska-Sitkiewicz, M.; Chrzanowska, M.; Furmańczyk, K.; Paczutkowski, K. Analysis of electricity consumption in Poland using prediction models and neural networks. Energies 2021, 14, 6619.
7. Asghar, R.; Fulginei, F.R.; Quercio, M.; Mahrouch, A. Artificial Neural Networks for Photovoltaic Power Forecasting: A Review of Five Promising Models. IEEE Access 2024, 12, 90461–90485.
8. Mishra, M.; Singh, J.G. A comprehensive review on deep learning techniques in power system protection: Trends, challenges, applications and future directions. Results Eng. 2025, 25, 103884.
9. Pookpunt, S. Very-Short-Term Forecasting of Solar Radiation using ARIMA. AIP Conf. Proc. 2024, 3239, 020007.
10. Fara, L.; Diaconu, A.; Craciunescu, D.; Fara, S. Forecasting of Energy Production for Photovoltaic Systems Based on ARIMA and ANN Advanced Models. Int. J. Photoenergy 2021, 2021, 6777488.
11. Memarzadeh, G.; Keynia, F.; Amirteimoury, F.; Memarzadeh, R.; Noori, H. A New Hybrid Intelligent Method for Accurate Short-Term Electric Power Production Forecasting from Uncertain Renewable Energy Resources. Int. J. Ind. Electron. Control. Optim. 2025, 8, 45.
12. Riedel, P.; Belkilani, K.; Reichert, M.; Heilscher, G.; von Schwerin, R. Enhancing PV feed-in power forecasting through federated learning with differential privacy using LSTM and GRU. Energy AI 2024, 18, 100452.
13. Thota, T.; Kurumthottam, A.B.; Kumar, S.R.; Mishra, J. An Enhanced Time Series based Solar Power Forecast for Microgrid System. IEEE Access 2025, 13, 144785–144797.
14. Aksan, F.; Suresh, V.; Janik, P.; Sikorski, T. Load Forecasting for the Laser Metal Processing Industry Using VMD and Hybrid Deep Learning Models. Energies 2023, 16, 5381.
15. Trivedi, R.; Bahloul, M.; Saif, A.; Patra, S.; Khadem, S. Comprehensive Dataset on Electrical Load Profiles for Energy Community in Ireland. Sci. Data 2024, 11, 621.
16. Abdel-Nasser, M.; Mahmoud, K. Accurate photovoltaic power forecasting models using deep LSTM-RNN. Neural Comput. Appl. 2019, 31, 2727–2740.
17. Xue, H.; Ma, J.; Zhang, J.; Jin, P.; Wu, J.; Du, F. Power Forecasting for Photovoltaic Microgrid Based on MultiScale CNN-LSTM Network Models. Energies 2024, 17, 3877.
18. Klein-Seetharaman, R.; Zhu, X.; Mather, B. Transfer Learning Trained LSTM Models for Household Load Profile Forecasting. In Proceedings of the 2025 IEEE PES Grid Edge Technologies Conference and Exposition, Grid Edge 2025, San Diego, CA, USA, 21–23 January 2025; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2025.
19. Nguyen, T.-A.; Pham, M.-H.; Phap, V.M.; Do, Q.-H.; Nguyen, N.-T.; Nguyen, D.-T.; Nguyen, T.N. Forecasting of solar power generation in Vietnam deploying a simple GRU model. In Proceedings of the 2023 Asia Meeting on Environment and Electrical Engineering, EEE-AM, Hanoi, Vietnam, 13–15 November 2023; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2023.
20. Bano, K.; Suresh, V.; Janik, P. Comparative analysis of Deep learning approaches for short-term solar PV power prediction. Prz. Elektrotech. 2025, 261–269.
21. Wang, H.; Lei, Z.; Zhang, X.; Zhou, B.; Peng, J. A review of deep learning for renewable energy forecasting. Energy Convers. Manag. 2019, 198, 111799.
22. Sua, L.S.; Wang, H.; Huang, J. Deep learning in renewable energy forecasting: A cross-dataset evaluation of temporal and spatial models. Energy Environ. 2025, ahead of print.
23. Suresh, V.; Janik, P.; Rezmer, J.; Leonowicz, Z. Forecasting solar PV output using convolutional neural networks with a sliding window algorithm. Energies 2020, 13, 723.
24. Liu, X.; Xiao, C.; Huang, M.; Zhang, H.; Liu, W.; Li, J. Enhancing LSTM Algorithms for Photovoltaic Power Forecasting; Institute of Electrical and Electronics Engineers (IEEE): Piscataway, NJ, USA, 2025; pp. 2818–2823.
25. Aslam, M.; Lee, S.J.; Khang, S.H.; Hong, S. Two-Stage Attention over LSTM with Bayesian Optimization for Day-Ahead Solar Power Forecasting. IEEE Access 2021, 9, 107387–107398.
26. Singh, C.; Garg, A.R. Enhancing Solar Power Output Predictions: Analyzing ARIMA and S-ARIMA Models for Short-Term Forecasting. In Proceedings of the IEEE Power India International Conference, PIICON, Jaipur, India, 10–12 December 2024; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2024.
27. Sapundzhi, F.; Chikalov, A.; Georgiev, S.; Georgiev, I. Predictive Modeling of Photovoltaic Energy Yield Using an ARIMA Approach. Appl. Sci. 2024, 14, 11192.
28. Rai, A.; Shrivastava, A.; Jana, K.C. Differential attention net: Multi-directed differential attention-based hybrid deep learning model for solar power forecasting. Energy 2023, 263, 125746.
29. Patil, Y.; Shruti, T. Time-Series Forecasting Using ARIMA and SARIMA Models for Solar NASA POWER Data. In Proceedings of the 2025 3rd International Conference on Intelligent Systems, Advanced Computing and Communication (ISACC), Silchar, India, 27–28 February 2025; pp. 946–952.
30. Hossain, M.L.; Shams, S.N.; Ullah, S.M. Time-series and deep learning approaches for renewable energy forecasting in Dhaka: A comparative study of ARIMA, SARIMA, and LSTM models. Discov. Sustain. 2025, 6, 775.

Figure 1. Workflow of the proposed methodology. The arrows indicate the progression of data from one step to the next. The gears represent the preprocessing phase, while the blue circles represent model development steps. The vertical bars highlight key stages in the workflow, such as data acquisition, processing, and model evaluation.

Figure 2. Long Short-Term Memory model. Arrows indicate the flow of data between different components. “X” represents the values at each gate (forget, input, etc.), “S” represents the gate types (Forget Gate, Input Gate, etc.), and “T” indicates the interactions leading to output and new state updates.

Figure 3. Gated Recurrent Unit (GRU) model. The arrows represent the flow of information between different gates, with the dashed boxes indicating areas where information is retained or updated. The ‘−1’ symbol indicates the previous hidden state, while the reset and update gates are represented by rₜ and zₜ, respectively. The updated hidden state is denoted by h′ₜ.

Figure 4. Architecture of the Convolutional Neural Network (CNN) model illustrating the flow from the input data through the convolutional layers, pooling layer, flattening layer, and dense layer, culminating in the output layer.
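
The convolution, pooling, and flattening stages in Figure 4 can be traced with a small NumPy sketch using the Table 3 settings (32 filters, kernel size 3, ReLU, pool size 2). The 24-step input window is a hypothetical choice, since the window length is not restated here, and the filters are random rather than trained.

```python
import numpy as np


def conv1d_valid(x, kernels, bias):
    """'valid' 1-D convolution, stride 1. x: (L, C_in); kernels: (K, C_in, F); bias: (F,)."""
    K = kernels.shape[0]
    L_out = x.shape[0] - K + 1
    out = np.empty((L_out, kernels.shape[2]))
    for t in range(L_out):
        # Cross-correlation, as computed by deep-learning Conv1D layers
        out[t] = np.tensordot(x[t:t + K], kernels, axes=([0, 1], [0, 1])) + bias
    return out


def max_pool1d(x, pool=2):
    """Non-overlapping max pooling along time; a trailing remainder is dropped."""
    L_out = x.shape[0] // pool
    return x[:L_out * pool].reshape(L_out, pool, x.shape[1]).max(axis=1)


def cnn_features(window, kernels, bias):
    h = np.maximum(conv1d_valid(window, kernels, bias), 0.0)  # Conv1D + ReLU
    h = max_pool1d(h, pool=2)                                 # MaxPooling1D(pool_size=2)
    return h.reshape(-1)                                      # Flatten


rng = np.random.default_rng(2)
window = rng.random((24, 1))                      # hypothetical 24-step window, 1 feature
kernels = rng.normal(scale=0.1, size=(3, 1, 32))  # 32 filters, kernel size 3
features = cnn_features(window, kernels, np.zeros(32))
print(features.shape)  # (352,) = 11 pooled steps x 32 filters
```

The shape arithmetic (24 − 3 + 1 = 22 convolution outputs, pooled to 11 steps, times 32 filters) shows how the window length determines the size of the vector passed to the dense layers.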

Figure 5. Architecture of the CNN–LSTM model, combining convolutional layers for feature extraction with long short-term memory (LSTM) for sequential learning, including the forget gate, input gate, cell update, and output gate processes.

Figure 6. Attention–LSTM model. The figure shows the attention mechanism, in which each element of the input sequence, $s(x_1, q), s(x_2, q), \ldots, s(x_n, q)$, is assigned a weight $(\alpha_1, \alpha_2, \ldots, \alpha_n)$ by the attention layer. The arrows represent the flow of information; the weighted inputs are combined (plus sign) to compute the context vector $c_t$, which is then passed through the SoftMax function to produce the final output.
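
The weighting step in Figure 6 can be sketched as attention pooling over the LSTM outputs: scores are normalized with a softmax into weights $\alpha_i$ that sum to one, and the context vector is the weighted sum of the hidden states. The dot-product scoring function and the query vector `q` are assumptions chosen for illustration.

```python
import numpy as np


def softmax(scores):
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum()


def attention_context(H, q):
    """Attention pooling over LSTM outputs.

    H: (n, d) hidden states for time steps 1..n; q: (d,) query vector.
    Scores s(x_i, q) are taken as dot products H[i] @ q (an assumed scoring function).
    Returns the weights alpha (n,) and the context vector c (d,).
    """
    scores = H @ q
    alpha = softmax(scores)
    context = alpha @ H   # weighted sum of the hidden states
    return alpha, context


rng = np.random.default_rng(3)
H = rng.normal(size=(24, 8))    # e.g. 24 time steps of 8-dimensional LSTM outputs
q = rng.normal(size=8)          # hypothetical learned query
alpha, c_t = attention_context(H, q)
```

With a zero query, all scores are equal and the context reduces to the plain average of the hidden states. Table 3 instead lists a self-attention layer followed by GlobalAveragePooling1D for ATT–LSTM; the single-query pooling here follows the Figure 6 description and is only one of several common attention variants.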

Figure 7. Residential energy community on the Dingle Peninsula, Ireland. The figure shows the layout of the energy system, including solar power production, storage in batteries, and data transmission through the gateway router. Arrows indicate the flow of electricity from various sources (grid, solar) and the network structure for data exchange. Symbols represent different components such as household appliances, the cloud-based server for data storage, and the visualization process for data collection. The upward arrow represents the flow of energy from solar production to the grid and household.

Figure 8. Average daily solar PV power generation profile for House 4.

Figure 9. Average daily solar PV power generation profile for House 5.

Figure 10. Distribution of RMSE values for the evaluated forecasting models, together with the average RMSE obtained across all households. The deep-learning models (LSTM, GRU, CNN, CNN–LSTM, and ATT–LSTM) achieve comparable RMSE values and, depending on the characteristics of each household, match or outperform the SARIMA model. Among them, GRU and LSTM perform consistently well with relatively low variability, as reflected by their small standard deviations across repeated runs. CNN is also competitive, particularly in households where local temporal patterns are more prominent. The hybrid CNN–LSTM and ATT–LSTM models show somewhat higher run-to-run variability in some cases (e.g., ATT–LSTM in House 11, with an RMSE of 126.70 ± 10.29 W), suggesting sensitivity to data characteristics and model complexity. SARIMA provides a reliable statistical baseline and achieves competitive RMSE values in several households, particularly those with regular and stable PV generation patterns. In House 4, for example, SARIMA attains a noticeably lower RMSE (90.68 W) than every deep-learning model (the best being CNN at 95.45 ± 2.33 W), highlighting its ability to capture linear and seasonal structure. For households with higher variability and more nonlinear behavior, the deep-learning models generally provide better predictive accuracy. Overall, the deep-learning models perform consistently across diverse households, while SARIMA remains effective where temporal regularity is strong. These results indicate that model performance depends strongly on the characteristics of PV generation and that no single model outperforms the others across all households.

Figure 11. House-wise comparison of R² for different forecasting models.

Figure 12. One-hour-ahead PV power forecasting results for the representative House 4.

Figure 13. One-hour-ahead PV power forecasting results for the representative House 5.

Figure 14. One-hour-ahead PV forecasting results for the representative households. (a) House 4, (b) House 5.

Table 1. Hyperparameters used for deep-learning models.

Model	Optimizer	Learning Rate	Batch Size	Epochs	Dropout	Early Stopping
LSTM	Adam	0.001	32	100	0.2	Yes
GRU	Adam	0.001	32	100	0.2	Yes
CNN	Adam	0.001	32	100	0.2	Yes
CNN–LSTM	Adam	0.001	32	100	0.2	Yes
ATT–LSTM	Adam	0.001	32	100	0.2	Yes
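
Table 1 reports that early stopping was used but not its patience or monitored quantity. The plain-Python sketch below records the Table 1 settings in a config dictionary and shows how patience-based early stopping picks the epoch whose weights are kept; `PATIENCE`, `MIN_DELTA`, and the function name are hypothetical. In Keras, similar behavior is provided by `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)`.

```python
# Shared training settings from Table 1.
CONFIG = {
    "optimizer": "adam",
    "learning_rate": 1e-3,
    "batch_size": 32,
    "epochs": 100,
    "dropout": 0.2,
}
PATIENCE = 10      # hypothetical: not reported in Table 1
MIN_DELTA = 0.0    # hypothetical: minimum improvement that resets the counter


def early_stopping(val_losses, patience=PATIENCE, min_delta=MIN_DELTA, max_epochs=CONFIG["epochs"]):
    """Return (best_epoch, last_epoch), both 0-based, for per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    history = val_losses[:max_epochs]
    for epoch, loss in enumerate(history):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0   # improvement: remember this epoch
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch                   # stop; restore weights from best_epoch
    return best_epoch, len(history) - 1


print(early_stopping([1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.74], patience=3))  # (2, 5)
```

Here training halts at epoch 5 after three epochs without improvement, and the weights from epoch 2 (lowest validation loss) would be restored, so the nominal 100 epochs act only as an upper bound.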

Table 2. SARIMA Model Selection and Identification.

House Number	Order (p,d,q)	Seasonal Order (P,D,Q,s)	AIC	BIC
H1	(1,0,1)	(1,1,1,24)	90,460.124	90,494.362
H2	(1,0,1)	(1,1,1,24)	91,974.080	92,008.318
H3	(1,0,1)	(1,1,1,24)	91,150.348	91,184.586
H4	(1,0,1)	(1,1,1,24)	89,693.597	89,727.835
H5	(1,0,1)	(1,1,1,24)	90,962.430	90,996.669
H7	(1,0,1)	(1,1,1,24)	88,422.821	88,434.623
H10	(1,0,1)	(1,1,1,24)	89,507.192	89,541.431
H11	(1,0,1)	(1,1,1,24)	88,150.663	88,162.465
H13	(1,0,1)	(1,1,1,24)	88,612.009	88,623.900
H17	(1,0,1)	(1,1,1,24)	90,064.959	90,099.760
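
Because AIC and BIC share the same log-likelihood term, their difference depends only on the number of estimated parameters k and the number of observations n. The sketch below uses the textbook definitions; software packages may count k differently (for example, whether the innovation variance is included), so it illustrates the relationship rather than reproducing the reported values.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_lik


def bic_minus_aic(k, n):
    """The log-likelihood cancels, leaving k * (ln n - 2)."""
    return k * (math.log(n) - 2)


# Gap between the reported BIC and AIC for House 1 (Table 2)
h1_gap = 90_494.362 - 90_460.124
print(f"{h1_gap:.3f}")  # 34.238
```

For rows fitted with the same orders on the same number of observations, this gap is therefore identical; when comparing candidate orders within one house, BIC penalizes each additional parameter by ln n instead of 2.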

Table 3. Model structure and functional description.

Model	Architecture	Description
LSTM	Input, LSTM (64 units), Dense (32 units, ReLU), Dense (1 unit)	Captures long-term temporal dependencies in solar PV power time series
GRU	Input, GRU (64 units), Dense (32 units, ReLU), Dense (1 unit)	Models temporal dependencies using a gated recurrent mechanism with reduced computational complexity
CNN	Input, Conv1D (32 filters, kernel size 3, ReLU), MaxPooling1D (pool size 2), Flatten, Dense (32 units, ReLU), Dense (1 unit)	Extracts local temporal patterns from solar PV power sequences
CNN–LSTM	Input, Conv1D (32 filters, kernel size 3, ReLU), MaxPooling1D (pool size 2), LSTM (32 units), Dense (16 units, ReLU), Dense (1 unit)	Combines convolutional feature extraction with LSTM-based sequential learning
ATT–LSTM	Input, LSTM (64 units, return sequences), Self-attention layer, GlobalAveragePooling1D, Dense (32 units, ReLU), Dense (1 unit)	Enhances LSTM performance by focusing on the most informative time steps using an attention mechanism
SARIMA	Seasonal Autoregressive Integrated Moving Average (1,0,1) × (1,1,1,24)	Models linear temporal dependencies using autoregressive and moving average components with seasonal differencing; captures daily periodicity (24 h)
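
Each deep-learning model in Table 3 starts from an Input layer fed with lagged PV values; the study uses a sliding-window approach to turn the series into a supervised learning problem. The sketch below shows one way to build those (X, y) pairs for one-hour-ahead forecasting; the function name and the default 24-step window are hypothetical, since the window length is not restated in this section.

```python
import numpy as np


def make_supervised(series, window=24, horizon=1):
    """Turn a 1-D series into (X, y) pairs for direct h-step-ahead forecasting.

    X[i] = series[i : i + window] and y[i] = series[i + window + horizon - 1].
    X is returned as (n_samples, window, 1), i.e. (timesteps, features) per sample,
    which is the input layout Keras-style Conv1D, LSTM, and GRU layers expect.
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested window and horizon")
    X = np.stack([series[i:i + window] for i in range(n_samples)])
    y = series[window + horizon - 1: window + horizon - 1 + n_samples]
    return X[..., np.newaxis], y


X, y = make_supervised(np.arange(10.0), window=3, horizon=1)
print(X.shape, y.shape)  # (7, 3, 1) (7,)
```

With hourly data and `horizon=1`, each target is the value one hour after the end of its input window, which matches the one-hour-ahead setting of the study.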

Table 4. Forecasting performance across households (mean ± standard deviation).

House No	Model Name	MAE [W]	RMSE [W]	sMAPE (%)	R²
House 1	LSTM	65.61 ± 6.89	132.64 ± 2.48	149.14 ± 1.24	0.833 ± 0.006
	GRU	67.20 ± 5.96	134.51 ± 2.34	148.97 ± 0.94	0.828 ± 0.006
	CNN	66.76 ± 2.93	133.40 ± 1.60	149.64 ± 0.20	0.831 ± 0.004
	CNN–LSTM	71.68 ± 2.23	138.41 ± 2.31	150.17 ± 0.43	0.818 ± 0.006
	ATT–LSTM	69.97 ± 8.45	137.18 ± 2.60	150.81 ± 0.87	0.821 ± 0.007
	SARIMA	64.10	134.82	108.04	0.827
House 2	LSTM	82.26 ± 3.82	160.01 ± 2.88	147.59 ± 1.34	0.727 ± 0.010
	GRU	78.60 ± 4.86	155.52 ± 1.13	147.71 ± 1.10	0.742 ± 0.004
	CNN	84.67 ± 4.57	161.12 ± 3.98	149.06 ± 0.94	0.723 ± 0.014
	CNN–LSTM	92.94 ± 10.30	165.43 ± 8.05	150.11 ± 1.63	0.708 ± 0.029
	ATT–LSTM	92.70 ± 14.20	164.59 ± 7.58	150.34 ± 2.16	0.711 ± 0.027
	SARIMA	73.65	156.18	99.65	0.740
House 3	LSTM	71.22 ± 6.72	143.80 ± 4.70	143.53 ± 1.52	0.808 ± 0.013
	GRU	70.97 ± 4.06	143.78 ± 1.21	142.44 ± 1.34	0.808 ± 0.003
	CNN	74.74 ± 4.44	147.70 ± 2.74	145.93 ± 0.86	0.797 ± 0.008
	CNN–LSTM	75.27 ± 4.27	145.51 ± 1.83	145.24 ± 1.20	0.803 ± 0.005
	ATT–LSTM	76.29 ± 12.58	150.79 ± 6.65	145.42 ± 2.95	0.788 ± 0.019
	SARIMA	69.64	146.40	107.07	0.801
House 4	LSTM	50.13 ± 5.57	100.35 ± 3.66	153.93 ± 0.97	0.644 ± 0.026
	GRU	47.61 ± 4.47	100.71 ± 5.45	153.78 ± 0.53	0.640 ± 0.039
	CNN	44.46 ± 3.65	95.45 ± 2.33	154.93 ± 1.57	0.678 ± 0.016
	CNN–LSTM	51.75 ± 5.67	99.60 ± 2.64	155.81 ± 0.74	0.649 ± 0.019
	ATT–LSTM	49.98 ± 9.33	100.55 ± 6.44	156.15 ± 1.72	0.641 ± 0.047
	SARIMA	36.05	90.68	52.47	0.709
House 5	LSTM	74.41 ± 2.51	147.85 ± 3.44	125.64 ± 5.78	0.668 ± 0.015
	GRU	69.84 ± 6.65	142.02 ± 1.87	124.88 ± 7.29	0.694 ± 0.008
	CNN	72.08 ± 4.81	144.89 ± 3.30	122.69 ± 3.38	0.682 ± 0.014
	CNN–LSTM	76.88 ± 4.20	147.56 ± 1.38	125.77 ± 2.98	0.670 ± 0.006
	ATT–LSTM	70.84 ± 6.42	144.48 ± 1.93	124.51 ± 8.46	0.683 ± 0.008
	SARIMA	64.95	142.53	104.60	0.692
House 7	LSTM	52.58 ± 3.41	106.60 ± 1.67	143.23 ± 0.64	0.859 ± 0.004
	GRU	54.02 ± 5.19	108.19 ± 1.65	144.10 ± 0.65	0.855 ± 0.004
	CNN	57.38 ± 7.88	109.75 ± 1.07	144.63 ± 0.26	0.850 ± 0.011
	CNN–LSTM	56.83 ± 3.96	109.75 ± 1.07	144.63 ± 0.26	0.851 ± 0.003
	ATT–LSTM	54.92 ± 3.98	111.54 ± 1.92	145.15 ± 0.93	0.846 ± 0.005
	SARIMA	51.10	109.80	79.58	0.851
House 10	LSTM	62.46 ± 5.68	128.06 ± 3.73	144.03 ± 0.82	0.843 ± 0.009
	GRU	61.88 ± 3.60	128.45 ± 2.39	144.26 ± 0.36	0.842 ± 0.006
	CNN	63.12 ± 5.83	128.57 ± 2.53	145.10 ± 0.47	0.842 ± 0.006
	CNN–LSTM	69.02 ± 3.52	132.43 ± 2.75	144.94 ± 0.64	0.832 ± 0.007
	ATT–LSTM	62.57 ± 1.69	134.27 ± 2.19	145.97 ± 0.45	0.828 ± 0.006
	SARIMA	61.56	131.98	81.15	0.834
House 11	LSTM	59.02 ± 2.62	119.56 ± 0.34	144.49 ± 0.49	0.834 ± 0.001
	GRU	57.03 ± 1.99	118.66 ± 0.89	143.85 ± 0.41	0.837 ± 0.002
	CNN	59.89 ± 4.09	120.00 ± 3.10	145.20 ± 0.58	0.833 ± 0.009
	CNN–LSTM	70.23 ± 3.94	130.13 ± 3.41	146.01 ± 0.47	0.803 ± 0.010
	ATT–LSTM	66.54 ± 12.57	126.70 ± 10.29	146.12 ± 2.43	0.812 ± 0.031
	SARIMA	54.51	118.48	80.67	0.837
House 13	LSTM	61.38 ± 5.33	126.71 ± 0.95	143.55 ± 0.52	0.834 ± 0.002
	GRU	59.07 ± 4.90	126.11 ± 3.14	143.36 ± 0.56	0.835 ± 0.008
	CNN	60.49 ± 4.08	126.60 ± 1.51	144.79 ± 0.32	0.834 ± 0.004
	CNN–LSTM	67.60 ± 7.59	129.78 ± 3.99	144.78 ± 0.59	0.825 ± 0.011
	ATT–LSTM	60.20 ± 5.26	127.22 ± 5.15	144.72 ± 1.35	0.832 ± 0.014
	SARIMA	54.26	125.82	80.32	0.836
House 17	LSTM	69.21 ± 5.79	139.56 ± 5.03	142.37 ± 0.92	0.829 ± 0.012
	GRU	66.92 ± 3.23	138.46 ± 2.02	142.07 ± 0.60	0.832 ± 0.005
	CNN	68.39 ± 3.36	138.03 ± 0.78	143.53 ± 0.65	0.833 ± 0.002
	CNN–LSTM	73.42 ± 5.29	140.80 ± 3.29	143.18 ± 0.43	0.826 ± 0.008
	ATT–LSTM	69.36 ± 3.69	142.15 ± 3.29	143.21 ± 0.59	0.823 ± 0.008
	SARIMA	65.72	140.29	83.38	0.827

Table 5. Estimated SARIMA model parameters for each household.

House No	Parameter	Coefficient	Std. Error
House 1	AR (1)	0.8240	0.006
	MA (1)	0.0951	0.008
	SAR (24)	0.0451	0.007
	SMA (24)	−0.9758	0.002
House 2	AR (1)	0.7560	0.006
	MA (1)	0.0691	0.008
	SAR (24)	0.0501	0.007
	SMA (24)	−1.0235	0.002
House 3	AR (1)	0.7936	0.006
	MA (1)	0.0740	0.008
	SAR (24)	0.0472	0.007
	SMA (24)	−1.0170	0.002
House 4	AR (1)	0.7858	0.005
	MA (1)	0.0793	0.008
	SAR (24)	0.0571	0.007
	SMA (24)	−1.0454	0.003
House 5	AR (1)	0.7614	0.006
	MA (1)	0.0724	0.008
	SAR (24)	0.0370	0.007
	SMA (24)	−0.9811	0.002
House 7	AR (1)	0.8234	0.006
	MA (1)	0.1386	0.008
	SAR (24)	0.0604	0.007
	SMA (24)	−1.0255	0.002
House 10	AR (1)	0.8185	0.006
	MA (1)	0.0984	0.008
	SAR (24)	0.0400	0.007
	SMA (24)	−0.9776	0.002
House 11	AR (1)	0.8221	0.006
	MA (1)	0.0977	0.008
	SAR (24)	0.0381	0.007
	SMA (24)	−0.9790	0.002
House 13	AR (1)	0.8248	0.004
	MA (1)	0.1986	0.007
	SAR (24)	0.0679	0.007
	SMA (24)	−1.0252	0.002
House 17	AR (1)	0.8283	0.006
	MA (1)	0.0914	0.007
	SAR (24)	0.0604	0.007
	SMA (24)	−1.0224	0.002
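
To show how the coefficients in Table 5 enter a one-hour-ahead forecast, the sketch below implements the SARIMA(1,0,1) × (1,1,1,24) recursion directly. It assumes the statsmodels-style sign convention, (1 − φB)(1 − ΦB²⁴)(1 − B²⁴)y_t = (1 + θB)(1 + ΘB²⁴)ε_t, and a conditional start with zero pre-sample residuals, so it approximates rather than reproduces the exact-likelihood forecasts; the function name is hypothetical.

```python
import numpy as np


def sarima_one_step(y, phi, theta, Phi, Theta, s=24):
    """One-step-ahead in-sample forecasts for SARIMA(1,0,1)x(1,1,1)_s.

    Model (statsmodels-style signs, an assumption):
        (1 - phi B)(1 - Phi B^s) w_t = (1 + theta B)(1 + Theta B^s) e_t,  w_t = y_t - y_{t-s}
    Residuals before the start are set to zero (conditional approach).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    w = np.full(n, np.nan)
    w[s:] = y[s:] - y[:-s]           # seasonal difference
    e = np.zeros(n)                  # one-step residuals
    y_hat = np.full(n, np.nan)
    start = 2 * s + 1                # need w[t - s - 1], i.e. t - s - 1 >= s
    for t in range(start, n):
        w_hat = (phi * w[t - 1] + Phi * w[t - s] - phi * Phi * w[t - s - 1]
                 + theta * e[t - 1] + Theta * e[t - s] + theta * Theta * e[t - s - 1])
        y_hat[t] = y[t - s] + w_hat  # undo the seasonal difference
        e[t] = y[t] - y_hat[t]
    return y_hat


# House 1 coefficients from Table 5, applied to a synthetic, perfectly periodic daily profile
hours = np.arange(24 * 10)
profile = np.clip(np.sin((hours % 24 - 6) / 12 * np.pi), 0, None) * 1000.0
y_hat = sarima_one_step(profile, phi=0.8240, theta=0.0951, Phi=0.0451, Theta=-0.9758)
```

For a perfectly repeating day the seasonal difference is zero and the forecast equals the observation; with all coefficients set to zero, the recursion reduces to the seasonal-naive forecast y_{t−24}. For real fitting, `statsmodels.tsa.statespace.sarimax.SARIMAX(y, order=(1, 0, 1), seasonal_order=(1, 1, 1, 24))` estimates the coefficients by exact likelihood.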

Table 6. Statistical Significance Analysis of Deep-Learning Model Performance.

House	Model Pair	p-Value (t-Test)	p-Value (Wilcoxon)	Interpretation
H4	LSTM vs. GRU	0.7770	0.8125	Not significant
H5	LSTM vs. GRU	0.01444	0.0625	Significant (t-test)
	CNN–LSTM vs. ATT–LSTM	0.0451	0.1250	Significant (t-test)
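
Table 6 reports paired t-test and Wilcoxon signed-rank p-values side by side. With only a few paired runs, the exact Wilcoxon test has a coarse null distribution: for n nonzero, untied differences, the smallest attainable two-sided p-value is 2/2ⁿ. The enumeration sketch below illustrates this (in practice, `scipy.stats.wilcoxon` and `scipy.stats.ttest_rel` are the usual library routines). For n = 5 the floor is 2/32 = 0.0625; if five runs per model were used (an assumption, since the run count is not restated here), no Wilcoxon comparison could fall below 0.05, which would be one reason why the H5 comparisons are significant only under the t-test.

```python
import itertools


def wilcoxon_exact_two_sided(diffs):
    """Exact two-sided p-value of the Wilcoxon signed-rank test.

    Assumes no zero differences and no ties among |diffs|, so ranks are 1..n.
    """
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)  # sum of ranks of positive differences

    # Null distribution of T+: each rank is independently positive or negative with probability 1/2
    sums = [sum(r for r, keep in zip(range(1, n + 1), signs) if keep)
            for signs in itertools.product((False, True), repeat=n)]
    p_lower = sum(s <= t_plus for s in sums) / len(sums)
    p_upper = sum(s >= t_plus for s in sums) / len(sums)
    return min(1.0, 2.0 * min(p_lower, p_upper))


# Five paired RMSE differences that all share the same sign (hypothetical values)
print(wilcoxon_exact_two_sided([1.2, 0.5, 2.0, 0.8, 1.5]))  # 0.0625
```

Even when one model wins in every run, five pairs cannot push the exact Wilcoxon p-value below 0.0625, whereas the t-test also uses the magnitude of the differences and can reach smaller p-values with the same data.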

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Bano, K.; Suresh, V.; Montana, F.; Janik, P. Comparative Evaluation of Deep-Learning and SARIMA Models for Short-Term Residential PV Power Forecasting. Energies 2026, 19, 1991. https://doi.org/10.3390/en19081991

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.

4.4. House-Wise Comparative Analysis of R²