1. Introduction
Solar energy has become one of the central components of the transition toward sustainable and low-carbon energy systems. Among renewable technologies, photovoltaic (PV) generation has expanded particularly rapidly due to declining equipment costs, modular deployment, and the increasing economic attractiveness of distributed and rooftop installations. At the same time, the growing penetration of solar power in modern electricity systems has intensified the need for accurate forecasting tools that can support operational management, grid integration, and longer-term sustainable energy planning [1,2,3,4]. Solar PV recorded the largest absolute increase among renewable technologies in 2023, and official IEA analyses continue to identify PV as a leading driver of renewable electricity growth in the coming years.
Recent progress in photovoltaic forecasting reflects the broader role of numerical, statistical, and machine learning models in modern energy systems. In renewable-energy applications, accurate PV forecasts are required because solar generation is inherently variable and weather-dependent, while higher PV penetration increases the need for reliable planning, dispatch, self-consumption optimization, and grid-balancing tools [1,5,6,7,8]. At the same time, data-driven numerical models are increasingly used across the wider energy sector, including fossil fuel and reservoir engineering applications. Recent examples include pressure-transient testing for evaluating hydraulic fracturing effectiveness, analytic hierarchy and reliability analysis models for shale-drilling drag reduction, and machine learning frameworks for enhanced oil recovery screening [7,8,9]. These studies illustrate that model-based decision support is becoming important across both clean and conventional energy systems. In the present work, however, the focus is specifically on solar PV forecasting, where improved predictions directly support sustainable energy planning and the integration of distributed renewable generation.
Forecasting photovoltaic production remains a challenging task because PV output depends on seasonality, weather variability, local operating conditions, and temporal persistence in the historical generation series. These features often induce nonlinear behavior and changing dependence structures, which may limit the performance of purely linear forecasting models, especially when the available datasets are relatively short or heterogeneous. For this reason, recent literature has increasingly emphasized machine learning and deep learning methods as flexible alternatives capable of capturing more complex temporal patterns and improving predictive accuracy [2,3,4,5,10,11]. Recent review studies have also stressed the importance of selecting models according to forecasting horizon, data characteristics, and application context, particularly when renewable-energy forecasting is intended to support planning and control decisions.
Recent studies further confirm the rapid development of photovoltaic forecasting from isolated single-site models toward broader multi-site, pooled, and uncertainty-aware frameworks. Review papers have emphasized that the field is moving toward more systematic comparisons of machine learning and deep learning methods, with growing attention to forecasting horizon, model robustness, and transferability across sites and datasets [6,12,13,14]. At the same time, methodological contributions have shown that multi-site or global formulations can be beneficial when historical records are limited or uneven across installations, while probabilistic studies have highlighted the importance of reliable uncertainty quantification for operational use in energy management and grid integration [15,16,17,18]. Accurate PV forecasting is also increasingly required for grid integration, dispatch, self-consumption optimization, and planning [1,5,6,19]. These developments support the motivation of the present study, which combines a pooled multi-system setting with a comparative evaluation of feed-forward and recurrent neural architectures for monthly photovoltaic forecasting.
Contemporary PV forecasting literature can be grouped into statistical models, classical machine learning methods, deep neural architectures, hybrid models, and probabilistic or uncertainty-aware methods [5,6]. Recent reviews emphasize that deep learning has become increasingly important for PV time-series forecasting, but they also note that model performance depends strongly on forecast horizon, available input variables, preprocessing, benchmark design, and hyperparameter selection [5,6]. Commonly compared architectures include multilayer perceptrons (MLPs), recurrent neural networks, LSTM and GRU models, convolutional neural networks, graph neural networks, and Transformer-based models [5,6,20,21]. These studies usually evaluate forecasting quality by error and goodness-of-fit indicators such as MAE, RMSE, MAPE, sMAPE, and $R^2$, while recent comparative works also consider validation/test separation, residual diagnostics, and computational efficiency [6,19,20]. For example, recent solar power studies have compared several deep architectures under common evaluation protocols and have reported RMSE, MAE, MAPE, and $R^2$ as standard regression metrics; other work has emphasized that GRU-type recurrent models may provide competitive accuracy with reduced training time, which is relevant for operational forecasting environments [19,20]. Recent reviews also stress that there is still no universally accepted benchmark for PV forecasting, making transparent reporting of datasets, horizons, metrics, and validation protocols particularly important [6].
Several recent works have moved beyond isolated single-model forecasting toward multi-architecture, multi-site, or data-scarce PV forecasting frameworks. Kim et al. developed Transformer and recurrent-network variants for multi-step day-ahead PV forecasting using power, weather, and solar geometry inputs from two PV plants [21]. Jang et al. proposed a common deep learning model applicable to multiple solar generation sites and explicitly addressed the use of shared information across locations [22]. Depoortere et al. introduced SolNet, an open-source deep learning framework for PV forecasting across many sites, emphasizing that high-quality long observational histories are often unavailable in practice and that transfer or pooled learning can be valuable in data-scarce settings [23]. These studies support the motivation for the present pooled formulation. However, most recent multi-site studies rely on high-frequency data, meteorological variables, or large collections of PV systems. In contrast, the present work addresses a more restricted but practically common setting: monthly yield forecasting for several related rooftop PV installations with unequal record lengths and without rich exogenous meteorological predictors. This motivates the comparison of a pooled MLP and a pooled GRU using only lagged production values, cyclical calendar encodings, and plant-specific embeddings.
In addition to point forecasting, recent PV forecasting literature increasingly recognizes the importance of uncertainty quantification. Prediction intervals are useful because energy planning decisions depend not only on the central forecast but also on the range of plausible production outcomes [24,25]. Recent conformal-prediction studies for PV power forecasting use calibration or validation residuals to transform point forecasts into prediction intervals with improved reliability, and they show that such uncertainty-aware forecasts can support electricity-market and operational decision-making [24,25]. Although the present study does not implement a full conformal-prediction framework, its validation residual prediction intervals follow the same practical rationale: residuals from a chronologically subsequent validation block are used to estimate empirical uncertainty bounds around the final point forecasts.
Our earlier study [26] addressed monthly photovoltaic energy yield forecasting for the Chikalov PV installation using ARIMA-type models. That work showed that even compact monthly datasets can provide practically useful forecasts for both total yield and specific yield, thereby demonstrating the relevance of time-series methods for PV performance analysis. The Chikalov installations are rooftop photovoltaic systems in southwestern Bulgaria, monitored through Sunny Portal, and the earlier study highlighted the practical role of forecasting for energy management, renewable resource optimization, and grid-related planning [26]. However, the underlying ARIMA framework is inherently linear and treats the forecasting problem from the perspective of a single installation.
In the present study, the forecasting setting is extended from a single-series formulation to a pooled multi-system framework based on the Chikalov family of photovoltaic datasets. The attached data include monthly production records for Chikalov 1, Chikalov 3, Chikalov 4, Chikalov 5, and Chikalov 6, associated with Simitli, Cherniche, and Poleto, and the series lengths are unequal across plants. This structure naturally motivates a global pooled model, since a shared learning framework can simultaneously exploit temporal dependence within each PV system and structural similarities across systems, thereby increasing the effective information available for model estimation.
Against this background, the present paper investigates forecasting solar energy production through a comparative pooled neural network framework using the Chikalov PV data. Two complementary architectures are considered. The first is a multilayer perceptron (MLP), which serves as a nonlinear feed-forward benchmark based on lagged observations and seasonal descriptors. The second is a gated recurrent unit (GRU), which is specifically designed for sequence modeling and can retain informative temporal structure through recurrent hidden-state updates [27,28,29,30]. This comparison is methodologically meaningful for monthly PV forecasting. On the one hand, an MLP may already capture substantial seasonal and nonlinear structure when appropriate lagged inputs are supplied. On the other hand, a GRU may provide additional gains by explicitly modeling chronological dependence and persistence effects in the monthly production sequence [5,10,27,28,29,30].
The practical importance of this work lies in its focus on small and medium-sized distributed photovoltaic systems, for which long, homogeneous, and meteorologically enriched datasets are often unavailable. In such cases, accurate forecasting must rely mainly on historical production records, seasonal information, and robust modeling strategies capable of extracting shared structure across related installations. By developing a pooled neural forecasting framework for several Chikalov PV systems, the present study provides a data-driven tool that can support monthly production planning, performance monitoring, maintenance scheduling, and uncertainty-aware decision-making in distributed solar energy management.
The main objective of this study is therefore to assess whether pooled neural modeling can provide an effective and practically relevant framework for forecasting monthly photovoltaic production in support of sustainable energy planning. More specifically, the paper compares MLP and GRU models trained on the same pooled Chikalov dataset and evaluates their predictive behavior through standard forecasting criteria. In this way, the study contributes in three directions: first, by extending the Chikalov forecasting setting from a single-system linear approach to a pooled multi-system learning framework; second, by providing a direct comparison between feed-forward and recurrent neural architectures for monthly PV production forecasting; and third, by emphasizing the role of accurate PV forecasts in planning, monitoring, and optimizing distributed solar energy systems [1,2,3,4,5,10,11,26,27,28,29,30].
The remainder of the paper is structured as follows: Section 2 presents the Chikalov photovoltaic systems and dataset, and then describes the pooled MLP and GRU forecasting architectures together with the experimental design, hyperparameter tuning procedure, evaluation criteria, validation residual prediction intervals, computational workflow and replication details. Section 3 reports the numerical results of the comparative study. Section 4 discusses the main findings, with emphasis on the relative behavior of the feed-forward and recurrent models, the role of the 12-month input horizon, and the implications of pooled learning for short and unbalanced monthly PV series. Finally, Section 5 concludes the paper and outlines directions for further research.
2. Data and Methods
2.1. Chikalov PV Systems and Dataset
The analysis is based on monthly energy production records from the Chikalov family of photovoltaic systems. They are located in southwestern Bulgaria, where a 30 kW rooftop photovoltaic system is mounted on a residential building and monitored through the Sunny Portal platform [26]. The systems employ three-phase Sunny Tripower 5000TL inverters by SMA Solar Technology AG, Niestetal, Germany and BYD P6-30 Series-3BB photovoltaic modules manufactured by BYD Company Ltd., Shanghai, China.
Since the installations are located in southwestern Bulgaria, in the northern hemisphere and in a region influenced by both temperate continental and Mediterranean climatic conditions with a pronounced annual solar radiation cycle, the monthly energy yield is expected to exhibit clear seasonality, with higher production during late spring and summer and lower production during winter.
The data employed here contain monthly production records for five systems labeled Chikalov 1, Chikalov 3, Chikalov 4, Chikalov 5, and Chikalov 6. According to the dataset in Table 1, Table 2, Table 3, Table 4 and Table 5, these systems are associated with the locations Simitli, Cherniche, and Poleto, and the records are given as monthly energy values. The available histories are unbalanced: Chikalov 1 has the longest record, beginning in 2012 (Table 1), whereas the remaining systems begin later, mainly between 2020 and 2021, and all series continue into 2024 with partial final-year observations (Table 2, Table 3, Table 4 and Table 5). The dataset, therefore, has the structure of a short monthly panel rather than a collection of equally long individual time series.
Let $y_{i,t}$ denote the monthly total energy yield of plant $i$ in month $t$. The forecasting target is the one-step-ahead value $y_{i,t+1}$, predicted from historical observations of the same plant together with shared information learned across all plants. Because the monthly frequency implies a strong annual cycle, seasonality is represented through cyclical encodings of the calendar month, namely
$$s_t = \sin\!\left(\frac{2\pi m_t}{12}\right), \qquad c_t = \cos\!\left(\frac{2\pi m_t}{12}\right),$$
where $m_t \in \{1, \ldots, 12\}$ is the month index. This representation preserves the circular nature of the calendar and avoids the artificial discontinuity between December and January. The monthly yield values are standardized using parameters estimated from the training data only, and the observations are divided chronologically into training, validation, and test subsets in order to preserve the forecasting interpretation of the experiment. The pooled formulation is adopted because it allows the models to exploit both temporal dependence within each plant and structural similarity across plants, which is particularly important when several series are short.
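As a minimal illustration of the two preprocessing steps described above, the following Python sketch computes the sine and cosine month encodings and standardizes a series with statistics taken from the training block only. The function names and the yield values are hypothetical and are not taken from Tables 1–5; the sketch uses only the standard library and is not the code used in the experiments.

```python
import math
from statistics import mean, pstdev


def month_encoding(month):
    """Map a calendar month (1..12) to its (sine, cosine) pair on the unit circle."""
    angle = 2.0 * math.pi * month / 12.0
    return math.sin(angle), math.cos(angle)


def encoding_distance(month_a, month_b):
    """Euclidean distance between the encodings of two calendar months."""
    sin_a, cos_a = month_encoding(month_a)
    sin_b, cos_b = month_encoding(month_b)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


def fit_standardizer(train_values):
    """Estimate the mean and standard deviation from the training block only."""
    return mean(train_values), pstdev(train_values)


def standardize(values, mu, sigma):
    """Apply a previously fitted transformation to any block of values."""
    return [(value - mu) / sigma for value in values]


# Hypothetical monthly yields in kWh; these are not values from Tables 1-5.
yields = [900.0, 1500.0, 3200.0, 4800.0, 5600.0, 6000.0,
          5900.0, 5200.0, 3900.0, 2400.0, 1200.0, 800.0]
train_block, later_block = yields[:8], yields[8:]

mu, sigma = fit_standardizer(train_block)
train_z = standardize(train_block, mu, sigma)
later_z = standardize(later_block, mu, sigma)  # reuses the training statistics

# December-January is as close as any other pair of adjacent months.
print(round(encoding_distance(12, 1), 4), round(encoding_distance(6, 7), 4))
```

In this encoding every pair of adjacent months, including December and January, is separated by the same distance, which is the property that a plain integer month index lacks.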
Table 1, Table 2, Table 3, Table 4 and Table 5 report the complete raw monthly total energy yield values, measured in kWh, for the five considered 30 kW Chikalov photovoltaic systems. The long-format panel used for model construction is obtained directly by stacking the monthly records by plant and calendar month. In each table, rows correspond to calendar years, and columns correspond to months. Blank cells at the beginning or end of a table indicate that the corresponding month lies outside the available observation period for the respective system, while explicitly marked “n. a.” entries denote missing records within the covered observation period. The raw tables, therefore, document not only the magnitude of the monthly yields but also the unequal availability of data across the five systems.
Table 1, Table 2, Table 3, Table 4 and Table 5 show that Chikalov 1 has the longest history, whereas Chikalov 3, Chikalov 4, Chikalov 5, and Chikalov 6 have shorter records starting later in the observation window.
The descriptive statistics in Table 6 summarize this structure. The pooled dataset contains 296 observed monthly yield values over 300 potential months within the covered plant-specific observation periods. Only four values are missing, all belonging to Chikalov 1, and they correspond to the unrecorded values in November and December 2015 and 2016. The mean monthly yield ranges from 3061.89 kWh for Chikalov 6 to 3550.19 kWh for Chikalov 1, while the standard deviations are relatively large, ranging from 1331.07 kWh to 1432.57 kWh. This variability is expected for monthly photovoltaic production because the installations are located in a region with a pronounced annual solar radiation cycle and therefore produce substantially higher yields in late spring and summer than in winter. The minimum values, ranging from 590.00 kWh to 989.00 kWh, occur in low-production months, whereas the maximum values, ranging from 5208.50 kWh to 6273.00 kWh, correspond to high-production months. Overall, the table confirms that the datasets are comparable in scale but strongly unbalanced in length, which motivates the use of a pooled forecasting framework capable of borrowing information across related PV systems.
2.2. Multilayer Perceptron (MLP) Architecture
The first forecasting model is a pooled multilayer perceptron (MLP), which serves as a nonlinear feed-forward benchmark. In contrast to classical linear autoregressive models, the MLP can learn nonlinear interactions among lagged production values and seasonal descriptors. MLP-type models are widely used in PV forecasting and are recognized as capable of learning nonlinear relationships between historical production values, meteorological or calendar descriptors, and future output [5,20]. In the present formulation, the input vector for plant $i$ and month $t$ is constructed from the previous $L$ monthly yields together with the seasonal encodings and a plant-specific identifier. After standardization, the lag window is flattened into a single feature vector of the form
$$\mathbf{x}^{\mathrm{MLP}}_{i,t} = \left[\tilde{y}_{i,t-L+1},\, s_{t-L+1},\, c_{t-L+1},\, \ldots,\, \tilde{y}_{i,t},\, s_{t},\, c_{t},\, \mathbf{e}_{i}\right],$$
where $\tilde{y}_{i,t}$ denotes the standardized yield, $s_t$ and $c_t$ are the sine and cosine encodings of the calendar month, and $\mathbf{e}_i$ denotes the numerical representation of plant $i$. In the present implementation, $\mathbf{e}_i$ is not a one-hot vector but a learned plant embedding. Each photovoltaic system is first assigned an integer identifier, and this identifier is mapped by an embedding layer to a four-dimensional trainable vector. The embedding dimension was fixed at 4 and was not included in the hyperparameter search. For the MLP, this learned embedding is concatenated to the flattened lagged input vector, so that the network receives both the temporal production history and a compact trainable representation of the plant identity. The embedding parameters are estimated jointly with the remaining MLP weights during training by backpropagation. Feed-forward neural networks trained by backpropagation form one of the standard foundations of modern nonlinear prediction, and recent PV studies confirm that MLP-type models remain competitive baselines for solar power forecasting when suitable lagged and seasonal features are provided [27,28].
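The construction of the flattened MLP input can be sketched as a sliding window over one plant's standardized series. The sketch below assumes that each lagged month contributes its standardized yield together with its two seasonal encodings, that the months are consecutive with no gaps, and that the learned embedding is available as a plain four-element list; the function name `make_mlp_samples` and the toy series are hypothetical.

```python
import math


def month_encoding(month):
    """Sine and cosine encoding of a calendar month in 1..12."""
    angle = 2.0 * math.pi * month / 12.0
    return math.sin(angle), math.cos(angle)


def make_mlp_samples(yields_z, months, plant_vector, lag):
    """Build (feature vector, target) pairs from one plant's standardized series.

    Assumption: each lagged month contributes (yield, sin, cos), the triples are
    flattened in chronological order, and the plant vector is appended at the end.
    """
    samples = []
    for target_idx in range(lag, len(yields_z)):
        features = []
        for k in range(target_idx - lag, target_idx):
            sin_m, cos_m = month_encoding(months[k])
            features.extend([yields_z[k], sin_m, cos_m])
        features.extend(plant_vector)
        samples.append((features, yields_z[target_idx]))
    return samples


# Hypothetical standardized series of 15 consecutive months starting in January.
yields_z = [0.1 * k for k in range(15)]
months = [(k % 12) + 1 for k in range(15)]
plant_vector = [0.0, 0.0, 0.0, 0.0]  # stand-in for a learned 4-dimensional embedding

samples = make_mlp_samples(yields_z, months, plant_vector, lag=12)
print(len(samples), len(samples[0][0]))
```

With a 12-month window, each feature vector in this sketch has 12 × 3 + 4 = 40 entries, and a series of 15 months yields three training pairs.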
The proposed pooled MLP consists of an input layer followed by two fully connected hidden layers with nonlinear activation functions and a final scalar output layer. Denoting the input vector by $\mathbf{x}$, the hidden transformations can be written schematically as
$$\mathbf{h}_1 = \phi\left(\mathbf{W}_1 \mathbf{x} + \mathbf{b}_1\right), \qquad \mathbf{h}_2 = \phi\left(\mathbf{W}_2 \mathbf{h}_1 + \mathbf{b}_2\right),$$
and the one-step-ahead forecast is then obtained as
$$\hat{y} = \mathbf{w}_o^{\top} \mathbf{h}_2 + b_o.$$
Here, $\mathbf{W}_1$, $\mathbf{W}_2$, $\mathbf{w}_o$ and $\mathbf{b}_1$, $\mathbf{b}_2$, $b_o$ are trainable weights and biases, and $\phi$ is chosen as a rectified linear unit (ReLU), since it provides a simple and effective nonlinear transformation while keeping the network computationally light. Dropout regularization is included after the hidden layers to reduce overfitting, which is particularly relevant for short monthly PV datasets and neural models trained with limited observations [22,23,24]. The principal role of the MLP in the present comparison is to provide a strong nonlinear baseline that uses the same pooled data structure as the GRU but does not explicitly model sequential recurrence. Its predictive ability therefore depends on the informativeness of the lagged feature vector rather than on an internal hidden state updated over time.
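The forward pass of such a network can be illustrated with a short dependency-free sketch: two ReLU hidden layers followed by a scalar linear output, evaluated in inference mode where dropout is inactive. The layer sizes, the random initialization, and all function names are illustrative assumptions; a practical implementation would normally rely on a deep learning library rather than on plain Python lists.

```python
import random


def relu(vector):
    """Rectified linear unit applied component-wise."""
    return [max(0.0, value) for value in vector]


def affine(weights, bias, vector):
    """Compute W v + b for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Random weights and zero biases (a hypothetical initialization)."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_forward(x, params):
    """Two ReLU hidden layers and a scalar linear output (dropout is off at inference)."""
    (w1, b1), (w2, b2), (w_out, b_out) = params
    h1 = relu(affine(w1, b1, x))
    h2 = relu(affine(w2, b2, h1))
    return affine(w_out, b_out, h2)[0]


rng = random.Random(0)
input_dim, hidden_dim = 40, 16  # hypothetical sizes, e.g. 12 lags x 3 features + 4
params = [init_layer(hidden_dim, input_dim, rng),
          init_layer(hidden_dim, hidden_dim, rng),
          init_layer(1, hidden_dim, rng)]
x = [rng.gauss(0.0, 1.0) for _ in range(input_dim)]
forecast_z = mlp_forward(x, params)  # forecast on the standardized scale
print(forecast_z)
```

The returned value lives on the standardized scale and would have to be mapped back to kWh with the training mean and standard deviation before any error metric is computed.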
In recent PV forecasting comparisons, feed-forward neural networks are commonly retained as baseline models against recurrent, convolutional, and Transformer-based architectures because they provide a simple reference for evaluating the additional value of explicit sequence modeling [5,19,20]. This architecture is well suited as a benchmark for three reasons. First, it is simple, transparent, and easy to optimize on relatively small datasets. Second, it allows a direct assessment of how much predictive information is already contained in a fixed lagged representation of monthly PV yield. Third, because the same lag horizon, seasonal variables, and plant identifiers can be used in both the MLP and GRU settings, the comparison between the two models becomes methodologically cleaner. In other words, differences in performance can be attributed primarily to the architectural treatment of temporal dependence rather than to differences in the underlying information set.
2.3. Gated Recurrent Unit (GRU) Architecture
The second forecasting model is a pooled gated recurrent unit (GRU), which extends the feed-forward baseline by explicitly modeling sequential dependence. GRU networks were introduced as a recurrent neural architecture capable of learning temporal structure through gating mechanisms that regulate the flow of past information. In contrast to a standard recurrent neural network, the GRU uses an update gate and a reset gate to control how much of the previous hidden state is retained and how strongly past information contributes to the candidate state. This design improves the ability of the model to capture medium- and long-range dependencies while remaining more compact than more elaborate recurrent alternatives. For monthly photovoltaic data, such a mechanism is attractive because the target variable exhibits persistence, seasonality, and possible nonlinear carry-over effects from one month to the next [29].
In the present pooled implementation, the input for month $t$ is not a flattened lag vector but an ordered sequence of the previous $L$ monthly observations. Each time step contains three numerical quantities: the standardized yield, the sine seasonal encoding, and the cosine seasonal encoding. As in the MLP model, plant identity is represented by a learned embedding rather than by one-hot encoding. Each plant is assigned an integer identifier, which is mapped to a four-dimensional trainable embedding vector. The embedding dimension was fixed at 4 and was not tuned. For the GRU, this plant embedding is repeated across the $L$ time steps of the input window and concatenated to the time-step features $(\tilde{y}_{i,\tau}, s_{\tau}, c_{\tau})$. Thus, at each time step, the GRU receives the standardized yield, the cyclical seasonal descriptors, and the same plant-specific latent vector. This allows the recurrent model to distinguish among the PV systems while still learning shared temporal dynamics from the pooled dataset. The effective input at time step $\tau$ can be written as
$$\mathbf{x}_{i,\tau} = \left[\tilde{y}_{i,\tau},\, s_{\tau},\, c_{\tau},\, \mathbf{e}_{i}\right].$$
For a given target month $t+1$, the GRU processes the sequence $\mathbf{x}_{i,t-L+1}, \ldots, \mathbf{x}_{i,t}$ and produces a hidden state summarizing the relevant temporal information in the lag window. The final hidden representation is then passed to a dense output layer that returns the one-step-ahead forecast $\hat{y}_{i,t+1}$.
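At the cell level, the recurrent transitions are governed by the update gate $\mathbf{z}_{\tau}$, the reset gate $\mathbf{r}_{\tau}$, the candidate state $\tilde{\mathbf{h}}_{\tau}$, and the hidden state $\mathbf{h}_{\tau}$. In standard notation,
$$\begin{aligned}
\mathbf{z}_{\tau} &= \sigma\left(\mathbf{W}_z \mathbf{x}_{\tau} + \mathbf{U}_z \mathbf{h}_{\tau-1} + \mathbf{b}_z\right),\\
\mathbf{r}_{\tau} &= \sigma\left(\mathbf{W}_r \mathbf{x}_{\tau} + \mathbf{U}_r \mathbf{h}_{\tau-1} + \mathbf{b}_r\right),\\
\tilde{\mathbf{h}}_{\tau} &= \tanh\left(\mathbf{W}_h \mathbf{x}_{\tau} + \mathbf{U}_h \left(\mathbf{r}_{\tau} \odot \mathbf{h}_{\tau-1}\right) + \mathbf{b}_h\right),\\
\mathbf{h}_{\tau} &= \mathbf{z}_{\tau} \odot \mathbf{h}_{\tau-1} + \left(1 - \mathbf{z}_{\tau}\right) \odot \tilde{\mathbf{h}}_{\tau},
\end{aligned}$$
where $\mathbf{x}_{\tau}$ is the input at time step $\tau$, $\sigma$ is the logistic sigmoid, and $\odot$ denotes the element-wise product.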
The reset gate determines how strongly the previous hidden state contributes to the candidate representation, whereas the update gate controls the balance between newly computed information and previously stored memory. This allows the GRU to preserve useful temporal structure while suppressing irrelevant or outdated components of the sequence. In the monthly PV context, such behavior is especially relevant because the model must extract information from seasonal repetition, recent production persistence, and cross-plant regularities without becoming excessively parameterized.
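The gating mechanism can be made concrete with a small dependency-free sketch of a single GRU step and of the encoding of a lag window. The sketch assumes the common convention in which an update gate close to one keeps the previous hidden state (some references use the opposite sign convention), and the parameter names, dimensions, and random initialization are hypothetical.

```python
import math
import random


def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0.0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gate(w, u, b, x, h, activation):
    """activation(W x + U h + b), evaluated component-wise."""
    return [activation(wx + uh + bias)
            for wx, uh, bias in zip(matvec(w, x), matvec(u, h), b)]


def gru_step(x, h_prev, p):
    z = gate(p["Wz"], p["Uz"], p["bz"], x, h_prev, sigmoid)          # update gate
    r = gate(p["Wr"], p["Ur"], p["br"], x, h_prev, sigmoid)          # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
    h_cand = gate(p["Wh"], p["Uh"], p["bh"], x, reset_h, math.tanh)  # candidate state
    # Convention assumed here: z close to 1 keeps the previous hidden state.
    return [zi * hp + (1.0 - zi) * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def gru_encode(sequence, p, hidden_dim):
    """Run the cell over an ordered window, starting from a zero hidden state."""
    h = [0.0] * hidden_dim
    for x in sequence:
        h = gru_step(x, h, p)
    return h


def init_params(input_dim, hidden_dim, rng, scale=0.1):
    """Small random weights and zero biases (hypothetical initialization)."""
    def mat(rows, cols):
        return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]
    p = {}
    for name in ("z", "r", "h"):
        p["W" + name] = mat(hidden_dim, input_dim)
        p["U" + name] = mat(hidden_dim, hidden_dim)
        p["b" + name] = [0.0] * hidden_dim
    return p


rng = random.Random(0)
params = init_params(input_dim=7, hidden_dim=3, rng=rng)  # 3 features + 4 embedding entries
window = [[rng.gauss(0.0, 1.0) for _ in range(7)] for _ in range(12)]
h_final = gru_encode(window, params, hidden_dim=3)
print(h_final)
```

In a forecasting model the final hidden state would be passed to a dense output layer; that layer and the training loop are omitted here.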
The pooled GRU is expected to offer several advantages in the present application. First, recurrent architectures process the lagged observations as an ordered sequence and can therefore preserve temporal dependence more naturally than a feed-forward model based on a flattened static feature vector [5,19,20,21]. Second, because monthly PV production is strongly influenced by seasonal solar radiation patterns, a one-year or near-one-year sequence can provide the model with a complete annual production cycle; recent PV studies have similarly emphasized the importance of seasonal decomposition, temporal context, and seasonal patterns in improving forecast accuracy [23,31]. Third, the pooled formulation is consistent with recent multi-site and data-scarce PV forecasting studies, where shared models are used to exploit common structure across locations or systems while retaining site-specific information [22,23]. For these reasons, the GRU constitutes the main sequence model in the comparative study, while the MLP provides the reference nonlinear feed-forward alternative.
2.4. Experimental Design and Evaluation Metrics
To ensure a fair comparison between the multilayer perceptron (MLP) and the gated recurrent unit (GRU), both models are trained and evaluated under the same pooled-data setting. The dataset is organized as a monthly panel of Chikalov photovoltaic systems, and all observations are sorted chronologically within each plant. The forecasting task is one-step-ahead prediction of monthly total energy yield. Thus, for a target month $t+1$, the models use only information available up to month $t$, which preserves the genuine forecasting interpretation of the experiment. The same training, validation, and test partition is applied to both architectures, with earlier observations used for model fitting, a subsequent block reserved for model selection, and the most recent block retained for final out-of-sample evaluation. This chronological design avoids information leakage and is more appropriate than random shuffling for time-series forecasting problems [26].
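A chronological partition of this kind can be sketched as follows for a single plant. The block sizes, the record layout, and the function name `chronological_split` are hypothetical, and the sketch assumes that the split is made by counting the most recent months; the exact partition boundaries of the study are not reproduced here.

```python
def chronological_split(records, n_val, n_test):
    """Split one plant's records into train/validation/test blocks by time order."""
    ordered = sorted(records, key=lambda rec: (rec["year"], rec["month"]))
    n_train = len(ordered) - n_val - n_test
    if n_train <= 0:
        raise ValueError("series too short for the requested validation and test blocks")
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, val, test


# Hypothetical plant with 36 monthly records; the yield values are placeholders.
records = [{"year": 2021 + k // 12, "month": k % 12 + 1, "yield_kwh": 1000.0 + 10.0 * k}
           for k in range(36)]
train, val, test = chronological_split(records, n_val=6, n_test=6)
print(len(train), len(val), len(test))
```

Because the records are never shuffled, every validation month follows every training month, and every test month follows every validation month.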
All numerical experiments were conducted in Python 3.9.7 using standard scientific computing and machine learning packages. The main libraries used in the implementation were NumPy 1.20.3 and Pandas 1.3.4 for data handling, scikit-learn 1.0.1 for preprocessing and evaluation metrics, PyTorch 1.10.0 for neural network implementation and training, Matplotlib 3.4.3 for visualization, and openpyxl 3.0.9 for exporting numerical results. The computations were performed on a workstation with an Intel® Core™ i9-12900 processor at 2.4 GHz, 128 GB RAM operating at 4000 MT/s, and Intel® UHD Graphics 770. The same software environment, chronological data split, and evaluation protocol were used for both the MLP and GRU models.
In both models, the input information is built from lagged monthly yields and seasonal variables. The month of the year is represented through sine and cosine encodings in order to capture the annual cycle while respecting its circular structure. The yield values are standardized using statistics computed from the training subset only, and the same transformation is then applied to the validation and test subsets. The pooled formulation is identical for both architectures: all Chikalov systems are used jointly during training so that the models can learn not only the temporal dynamics of each installation but also the common structure shared across the PV systems. This is particularly important because the available monthly series are unbalanced in length, and some plants contain substantially fewer observations than others.
The MLP and GRU differ only in the way they process the same predictive information. For the MLP, the previous $L$ monthly observations are flattened into a fixed feature vector, together with the seasonal encodings and plant-specific identifier. For the GRU, the same lagged observations are preserved in their natural chronological order and fed as a sequence to the recurrent layer. This design makes the comparison methodologically transparent: both models use the same forecasting target, the same pooled data, the same seasonal information, and the same chronological validation protocol while differing only in their internal architectural treatment of temporal dependence. The MLP therefore serves as a nonlinear feed-forward benchmark, whereas the GRU serves as the recurrent sequence model [27,28,29]. The plant embedding dimension was fixed at 4 in both architectures. The hyperparameter search therefore covered only the lag or sequence length, hidden dimension, dropout rate, and learning rate; the plant-embedding dimension was kept constant in order to maintain a symmetric comparison between the MLP and GRU and to limit the search space for the short monthly panel dataset.
Hyperparameter selection is carried out separately for the two architectures by chronological grid search. For the GRU, the tuning parameters include the input sequence length, hidden dimension, dropout rate, and learning rate. Following the exploratory setup already developed for the pooled recurrent model, the tested sequence lengths are , the hidden dimensions are chosen from , dropout is varied in , and the learning rate is selected from . The GRU is trained with the Adam optimizer and a smooth loss, while early stopping is applied on the validation subset. The final GRU specification is selected by minimizing validation RMSE and is then refitted on the combined training and validation data before final testing. This recurrent setup is appropriate for monthly PV forecasting because it allows the model to process one full annual cycle and to retain informative hidden-state dynamics over the lag window.
A parallel hyperparameter-tuning strategy is adopted for the MLP in order to maintain symmetry between the two models. The principal tuning parameters for the MLP are the lag length , the number and width of hidden layers, the dropout rate, and the learning rate. A practical and balanced search space for the present dataset is to use , one or two hidden layers, hidden dimensions in , dropout in , and learning rate in . As with the GRU, the MLP is trained with Adam and a smooth loss, and early stopping is controlled by the validation subset. The final MLP is likewise selected according to validation RMSE and subsequently refitted on the combined training and validation data. This design ensures that the comparison between MLP and GRU is not biased by unequal optimization effort.
The use of validation-based model selection is especially important in the present study because the dataset is short and the number of candidate architectures is nontrivial. The validation subset is therefore used exclusively for hyperparameter tuning, early stopping, and model calibration, while the test subset remains untouched until the final evaluation stage. This separation is essential for preserving the objectivity of the reported test results. In practical terms, the validation phase determines which configuration is retained, whereas the test phase quantifies the true out-of-sample performance of the selected model. The final refitting step, in which the selected architecture is retrained on the union of the training and validation observations, is justified by the limited size of the dataset and by the need to exploit as much historical information as possible before generating the final test forecasts.
Forecast quality is assessed by several complementary evaluation criteria. The first is the root mean square error (RMSE), which emphasizes larger forecast errors and therefore provides a sensitive measure of overall predictive accuracy. The second is the mean absolute error (MAE), which offers a more direct average measure of absolute deviation. The coefficient of determination $R^2$ is also reported to quantify the proportion of variance explained by the model. In addition, two percentage-based criteria are used: the mean absolute percentage error (MAPE) and the symmetric mean absolute percentage error (sMAPE). MAPE is widely used in forecasting applications because of its interpretability in relative terms, while sMAPE is included because it is typically more stable when the target variable assumes smaller values. These metrics are standard in recent PV forecasting studies, where RMSE and MAE are typically used to quantify absolute forecast error, MAPE and sMAPE to assess relative error, and $R^2$ to measure explained variance [16, 19, 20]. Taken together, these five indicators provide a balanced assessment of absolute, relative, and variance-based predictive performance, and they allow a detailed comparison between the MLP and the GRU.
Formally, if $y_{i,t}$ denotes the observed monthly yield and $\hat{y}_{i,t}$ its forecast, then RMSE is defined as the square root of the mean squared prediction error, MAE as the mean absolute prediction error, and MAPE as the average of the absolute percentage deviations. The sMAPE criterion replaces the conventional denominator by the average magnitude of observed and predicted values, thereby reducing sensitivity to scale effects. In the present study, all metrics are computed on the original scale of the monthly yield after inverting the standardization transform, so that the reported results remain directly interpretable in physical units and in percentage terms.
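The five criteria can be illustrated with a minimal Python sketch that follows the verbal definitions above. The function name `forecast_metrics` and its dictionary output are illustrative assumptions, not the implementation used in the study; inputs are assumed to be on the original kWh scale with nonzero observed values.

```python
import math

def forecast_metrics(y_true, y_pred):
    """Return RMSE, MAE, R2, MAPE (%) and sMAPE (%) for paired observations and forecasts."""
    n = len(y_true)
    errors = [y - p for y, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)     # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1.0 - ss_res / ss_tot
    # MAPE divides each absolute error by the observed value (assumed nonzero).
    mape = 100.0 * sum(abs(e) / abs(y) for e, y in zip(errors, y_true)) / n
    # sMAPE divides by the average magnitude of observation and forecast.
    smape = 100.0 * sum(
        abs(e) / ((abs(y) + abs(p)) / 2.0)
        for e, y, p in zip(errors, y_true, y_pred)
    ) / n
    return {"rmse": rmse, "mae": mae, "r2": r2, "mape": mape, "smape": smape}
```

In practice, the function would be applied to the test-period observations and the back-transformed forecasts of each model.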
Besides point forecasting, the study also considers uncertainty quantification through validation residual prediction intervals. This step is implemented separately for each model after hyperparameter selection. Let $e_{i,t} = y_{i,t} - \hat{y}_{i,t}$ denote the residuals on the validation subset of the selected model. The empirical distribution of these residuals is used to estimate lower and upper quantiles, denoted by $q_{\alpha/2}$ and $q_{1-\alpha/2}$, for a given confidence level $1-\alpha$. Then, for a point forecast $\hat{y}_{i,t}$, the corresponding prediction interval is constructed as
$$\left[\hat{y}_{i,t} + q_{\alpha/2},\; \hat{y}_{i,t} + q_{1-\alpha/2}\right].$$
For instance, with $\alpha = 0.05$, the interval provides an empirical 95% uncertainty band around the forecast. This residual-based approach is especially suitable in the present setting because it does not impose a strong parametric assumption on forecast errors and remains easy to interpret and implement for both neural architectures. Residual-based interval calibration is related to recent uncertainty-aware and conformal-prediction approaches in PV forecasting, where residuals or nonconformity scores computed on a calibration set are used to construct prediction intervals around point forecasts [24, 25].
The use of validation residuals for interval calibration has two methodological advantages. First, it aligns naturally with the chronological model-selection framework because the interval width is estimated from a data block that is temporally subsequent to training but still prior to the final test period. Second, it allows direct comparison between MLP and GRU not only in terms of point accuracy but also in terms of practical uncertainty quantification. In the final experimental comparison, each model therefore produces two outputs: a point forecast and an accompanying empirical prediction interval. This is valuable from an applied perspective since sustainable energy planning requires not only accurate central predictions but also a realistic representation of forecast uncertainty. The use of a validation block for interval calibration is consistent with recent probabilistic PV forecasting studies, where calibration data are kept separate from the final test period in order to estimate uncertainty bounds without contaminating the out-of-sample evaluation [24, 25].
Finally, the evaluation procedure is supplemented by residual diagnostics. After the best hyperparameter configuration is selected, the residual autocorrelation and partial autocorrelation functions may be examined in order to detect any remaining temporal structure not captured by the model. Such diagnostics are especially useful in monthly photovoltaic forecasting, where unmodeled seasonality or persistence may remain visible even when the overall error indicators are satisfactory. In this way, the experimental design combines three complementary layers of assessment: validation-based hyperparameter tuning, test-based out-of-sample accuracy evaluation, and residual-based diagnostic and uncertainty analysis. Together, these elements provide a coherent framework for comparing the MLP and GRU models on equal methodological grounds.
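As an illustration of the residual diagnostic described above, the sample autocorrelation function can be computed in a few lines of Python. The function names `sample_acf` and `white_noise_band` are hypothetical, and the band $\pm 1.96/\sqrt{n}$ is the usual large-sample approximation for white noise; neither is taken from the study itself.

```python
import math

def sample_acf(residuals, max_lag):
    """Sample autocorrelation of a residual series for lags 0..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    acf = []
    for k in range(max_lag + 1):
        # Lag-k autocovariance, normalized by the lag-0 sum of squares.
        num = sum(centered[t] * centered[t - k] for t in range(k, n))
        acf.append(num / denom)
    return acf

def white_noise_band(n):
    """Approximate 95% band for the autocorrelations of white noise."""
    return 1.96 / math.sqrt(n)
```

For monthly residuals, a value at lag 12 that lies clearly outside the band would point to seasonality that the model has not absorbed.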
2.5. Computational Workflow and Replication Details
For reproducibility, the complete forecasting workflow was organized into five consecutive stages: data preprocessing, supervised sample construction, hyperparameter tuning, final model refitting, and out-of-sample evaluation. The raw monthly production records were first transformed into a long-format panel dataset containing the plant identifier, calendar month, and monthly total energy yield. Missing observations were kept as missing values and were not interpreted as zero production. For each plant, the monthly records were ordered chronologically, and supervised samples were constructed only when the full lag window and the corresponding target value were available.
Let $y_{i,t}$ denote the monthly total energy yield of plant $i$ in month $t$. For a selected lag length $L$, the forecasting problem is formulated as a one-step-ahead prediction:
$$\hat{y}_{i,t} = f_{\theta}\left(\mathbf{x}_{i,t}\right),$$
where $f_{\theta}$ denotes either the MLP or GRU model with trainable parameters $\theta$, and $\mathbf{x}_{i,t}$ is the input containing the information available from months $t-L, \ldots, t-1$. The target value $y_{i,t}$ is not included among the predictors.
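The sample-construction rule can be sketched as follows, under the assumption that each plant's series is a chronologically ordered Python list in which missing months are stored as `None`. The function names and the dictionary-based panel layout are illustrative only, and the seasonal encodings are omitted for brevity.

```python
def build_samples(series, lag):
    """Build (window, target) pairs for one plant from an ordered monthly series.

    A sample is kept only when all `lag` predictor months and the target
    month are observed; missing values (None) are never treated as zero.
    """
    samples = []
    for t in range(lag, len(series)):
        window = series[t - lag:t]   # months t-lag, ..., t-1 (target excluded)
        target = series[t]
        if target is None or any(v is None for v in window):
            continue
        samples.append((window, target))
    return samples

def build_panel_samples(panel, lag):
    """Pool the samples of all plants; `panel` maps plant id -> ordered series."""
    pooled = []
    for plant_id, series in panel.items():
        for window, target in build_samples(series, lag):
            pooled.append((plant_id, window, target))
    return pooled
```

A single missing month therefore removes every window that overlaps it, which is one reason why long lag windows reduce the effective sample size in short series.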
The monthly yield values were standardized using the mean and standard deviation estimated from the training subset only:
$$z_{i,t} = \frac{y_{i,t} - \mu_{\text{train}}}{\sigma_{\text{train}}},$$
where $\mu_{\text{train}}$ and $\sigma_{\text{train}}$ are the training-set mean and standard deviation, respectively. The same scaling parameters were then applied to the validation and test subsets. Forecasts were transformed back to the original kWh scale before computing all reported evaluation metrics.
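A minimal sketch of this training-only scaling is given below. The helper names are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not state which estimator was used.

```python
import statistics

def fit_scaler(train_values):
    """Estimate the scaling parameters from the training subset only."""
    mu = statistics.fmean(train_values)
    sigma = statistics.pstdev(train_values)  # assumption: population estimator
    return mu, sigma

def standardize(values, mu, sigma):
    """Apply the training-set parameters to any subset (train, validation, test)."""
    return [(v - mu) / sigma for v in values]

def invert(z_values, mu, sigma):
    """Map standardized values or forecasts back to the original kWh scale."""
    return [z * sigma + mu for z in z_values]
```

Reusing the training parameters for the later blocks keeps information from the validation and test periods out of the preprocessing step.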
For the MLP, the lagged observations were flattened into a fixed input vector containing the standardized yields and seasonal encodings from the previous $L$ months, together with a plant-specific embedding. For the GRU, the same information was retained as an ordered sequence so that the recurrent layer could process the chronological structure of the monthly observations. In both cases, the models were trained by minimizing the smooth $L_1$ loss on the training subset:
$$\mathcal{L}(\theta) = \frac{1}{N_{\text{train}}} \sum_{(i,t) \in \mathcal{D}_{\text{train}}} \operatorname{SmoothL1}\left(z_{i,t} - \hat{z}_{i,t}\right),$$
where $\mathcal{D}_{\text{train}}$ denotes the training subset, $N_{\text{train}}$ is the number of training samples, $z_{i,t}$ is the standardized observed yield, and $\hat{z}_{i,t}$ is the corresponding standardized prediction. The Adam optimizer was used for parameter estimation, and early stopping was controlled by the validation error in order to reduce overfitting.
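For readers unfamiliar with this type of objective, the sketch below assumes that the smooth loss is the standard smooth $L_1$ (Huber-type) penalty, which is quadratic for small residuals and linear for large ones. The threshold `beta = 1.0` is an assumption made here for illustration, since the text does not report it.

```python
def smooth_l1(residual, beta=1.0):
    """Smooth L1 penalty: quadratic below `beta`, linear above it."""
    a = abs(residual)
    if a < beta:
        return 0.5 * a * a / beta
    return a - 0.5 * beta

def mean_smooth_l1(z_true, z_pred, beta=1.0):
    """Average smooth L1 loss over paired standardized targets and predictions."""
    total = sum(smooth_l1(z - p, beta) for z, p in zip(z_true, z_pred))
    return total / len(z_true)
```

Compared with a squared-error objective, the linear branch limits the influence of occasional large residuals, which can matter in short monthly series.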
The hyperparameter search was performed chronologically and independently for the MLP and GRU architectures. For the MLP, the tuned parameters were the lag length, hidden-layer width, dropout rate, and learning rate. For the GRU, the tuned parameters were the input sequence length, hidden-state dimension, dropout rate, and learning rate. Each candidate configuration was trained on the training subset and evaluated on the validation subset. The configuration with the lowest validation RMSE was selected:
$$\lambda^{*} = \arg\min_{\lambda \in \Lambda} \operatorname{RMSE}_{\text{val}}(\lambda),$$
where $\lambda$ denotes a candidate hyperparameter configuration and $\Lambda$ is the corresponding hyperparameter grid.
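The selection rule can be sketched as an exhaustive loop over the grid. In the sketch below, `ILLUSTRATIVE_GRID` contains example values only and is not the grid used in the study, and `toy_validation_rmse` is a hypothetical stand-in for the real step of training a candidate on the training subset and scoring it on the validation subset.

```python
import itertools

# Example values only; not the search space reported in the study.
ILLUSTRATIVE_GRID = {
    "lag": [12, 18, 24],
    "hidden": [64, 128],
    "dropout": [0.0, 0.2],
}

def toy_validation_rmse(config):
    """Hypothetical scoring surface standing in for train-then-validate."""
    return (abs(config["lag"] - 12)
            + abs(config["hidden"] - 128) / 64
            + abs(config["dropout"] - 0.2))

def grid_search(grid, fit_and_score):
    """Return the configuration with the lowest validation score."""
    best_config, best_score = None, float("inf")
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        score = fit_and_score(config)   # validation RMSE of this candidate
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Calling `grid_search(ILLUSTRATIVE_GRID, toy_validation_rmse)` returns the minimizer of the toy surface. In a real run the scoring function would never touch the test subset.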
After model selection, the selected architecture was refitted on the combined training and validation data and evaluated once on the held-out test subset. This protocol ensured that the test data were not used for hyperparameter tuning or early stopping. The final test evaluation therefore provides an out-of-sample assessment of the selected model.
Prediction intervals were obtained by validation residual calibration. For the selected model, validation residuals were computed as
$$e_{i,t} = y_{i,t} - \hat{y}_{i,t}, \qquad (i,t) \in \mathcal{D}_{\text{val}},$$
where $\mathcal{D}_{\text{val}}$ denotes the validation subset.
The empirical lower and upper residual quantiles, $q_{\alpha/2}$ and $q_{1-\alpha/2}$, were then added to each point forecast to obtain the $(1-\alpha)$-level prediction interval:
$$\left[\hat{y}_{i,t} + q_{\alpha/2},\; \hat{y}_{i,t} + q_{1-\alpha/2}\right].$$
In this study, $\alpha = 0.05$, corresponding to empirical 95% prediction intervals. This procedure was applied separately to the MLP and GRU models, allowing direct comparison of both point forecasts and uncertainty bounds.
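The calibration step can be sketched in plain Python as follows. Residuals are taken as observed minus predicted, so that adding their quantiles to a forecast yields the band. The function names are hypothetical, and the linear-interpolation quantile rule is an assumption, since the text does not specify which empirical quantile estimator was used.

```python
def empirical_quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)                          # floor, since pos >= 0
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

def residual_intervals(y_val, yhat_val, yhat_new, alpha=0.05):
    """Shift each new point forecast by the validation residual quantiles."""
    residuals = [y - p for y, p in zip(y_val, yhat_val)]
    q_lo = empirical_quantile(residuals, alpha / 2)
    q_hi = empirical_quantile(residuals, 1 - alpha / 2)
    return [(p + q_lo, p + q_hi) for p in yhat_new]
```

Because the two quantiles are estimated separately, the resulting band need not be symmetric around the point forecast.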
Pseudocode for data preprocessing and supervised sample construction, MLP and GRU model selection and forecasting, and validation residual prediction intervals is given in Algorithms A1–A4 (Appendix A).
3. Results
The hyperparameter search produced clear patterns for both neural architectures. For the MLP, the best validation model was achieved for a lag length of 12 months, hidden size 128, dropout 0.2, learning rate 0.001, and 12 training epochs. Its validation performance was RMSE = 421.60, MAE = 290.17, $R^2$ = 0.8924, MAPE = 11.96%, and sMAPE = 10.59% (Figures A1–A5, Appendix B). For the GRU, the best validation model was obtained for a sequence length of 12 months, hidden size 48, dropout 0, learning rate 0.0005, and 31 training epochs. Its validation performance was RMSE = 457.53, MAE = 308.96, $R^2$ = 0.8733, MAPE = 12.73%, and sMAPE = 11.24% (Figures A6–A10, Appendix B).
Thus, on the validation subset, the MLP outperformed the GRU across all reported criteria. In particular, the MLP reduced RMSE by approximately 35.9 kWh and MAE by about 18.8 kWh relative to the GRU, while also improving the coefficient of determination and both percentage-based error measures. These results indicate that, at the model-selection stage, a feed-forward nonlinear architecture based on lagged inputs and seasonal descriptors was able to fit the available monthly validation block more effectively than the recurrent alternative.
The search tables also show that the leading configurations for both models are concentrated around a 12-month input horizon. For the GRU, the top-ranked validation models are dominated by sequence length 12, with a few competitive 18-month alternatives appearing slightly below the optimum. The 24-month recurrent specifications generally produce weaker validation scores. For the MLP, the same pattern is even more pronounced: the best validation results are also obtained mainly with 12-month lag windows, especially when combined with hidden size 64 or 128 and mild to moderate dropout. This indicates that one full annual cycle contains the most informative memory for one-step-ahead monthly photovoltaic forecasting in the Chikalov dataset.
To further clarify the influence of the input horizon, we additionally summarized the best validation RMSE obtained for each tested lag length after minimizing over the remaining hyperparameters. The results confirm that the 12-month horizon provides the most favorable validation performance for both architectures. For the MLP, the best validation RMSE was 421.60 kWh for L = 12, compared with 460.87 kWh for L = 18 and 465.86 kWh for L = 24. Thus, extending the input window beyond one annual cycle did not improve the validation accuracy of the feed-forward model. A similar pattern was observed for the GRU. The best validation RMSE was 457.53 kWh for L = 12, compared with 462.45 kWh for L = 18 and 500.31 kWh for L = 24. These results show that longer windows introduce additional estimation burden without improving validation performance.
After hyperparameter selection, both models were refitted on the combined training and validation data and then evaluated on the held-out test period. At this final stage, the ranking reversed. The GRU achieved RMSE = 296.38, MAE = 213.16, $R^2$ = 0.9231, MAPE = 7.52%, and sMAPE = 7.49%, whereas the refitted MLP achieved RMSE = 332.01, MAE = 242.06, $R^2$ = 0.9035, MAPE = 7.57%, and sMAPE = 7.92%. Therefore, although the MLP was superior on validation, the GRU produced clearly better out-of-sample performance on the final test subset.
The magnitude of the GRU improvement on the test set is practically meaningful. Relative to the MLP, the GRU reduced RMSE by about 35.6 kWh and MAE by about 28.9 kWh, while increasing $R^2$ by roughly 0.0196. The MAPE values of the two models are very close, differing by only around 0.05 percentage points, but the GRU retains the advantage in sMAPE as well. These results suggest that the recurrent architecture generalizes better to unseen monthly observations, even though the feed-forward model appeared more favorable during validation.
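Expressed in relative terms, and using only the rounded test values reported above, the reductions achieved by the GRU with respect to the MLP are approximately

$$\frac{332.01 - 296.38}{332.01} \approx 10.7\% \ \text{(RMSE)}, \qquad \frac{242.06 - 213.16}{242.06} \approx 11.9\% \ \text{(MAE)}.$$

Because these ratios are computed from rounded figures, they should be read as approximate.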
Another useful observation concerns the training dynamics. The selected MLP converged very quickly, stopping after only 12 epochs, while the selected GRU required 31 epochs. This behavior is consistent with the simpler optimization structure of the feed-forward network. However, faster convergence did not translate into better final generalization. On the contrary, the recurrent model, despite a slightly weaker validation fit, proved more robust once refitted and tested on the most recent block of observations.
To place these values in context, it should be noted that direct numerical comparison with the literature is not straightforward because PV forecasting studies differ substantially in installed capacity, temporal aggregation, forecast horizon, input variables, and train–test protocols. In particular, RMSE and MAE are scale-dependent and therefore cannot be directly compared between a 30 kW monthly yield dataset and studies based on hourly or daily power measurements from larger PV plants. For this reason, the relative indicators MAPE, sMAPE, and $R^2$ are more informative for cross-study interpretation. On the final test subset, the proposed pooled GRU achieved $R^2$ = 0.9231, MAPE = 7.52%, and sMAPE = 7.49%, while the pooled MLP achieved $R^2$ = 0.9035, MAPE = 7.57%, and sMAPE = 7.92%. These values indicate a high level of explained variance and a moderate relative forecasting error for a monthly dataset constructed from short and unbalanced PV series.
Recent studies report comparable or slightly better relative accuracy when richer input information is available. For example, recent reviews of PV forecasting show that MLP, recurrent, convolutional, graph-based, and hybrid neural models are widely used, and that RMSE, MAE, MAPE, sMAPE, and $R^2$ are standard evaluation criteria in this field [5, 32]. Transformer and recurrent neural models have also been used for day-ahead or multi-step PV forecasting with historical power, weather observations, weather forecasts, and solar-geometry variables; in such settings, hybrid Transformer-LSTM variants may substantially reduce MAE relative to simple recurrent baselines [22]. Other recent comparative studies of short- and medium-term PV forecasting evaluate LSTM, CNN, and GRU models using MAE, RMSE, MAPE, and $R^2$, confirming that recurrent architectures often provide strong performance when temporal dependence is important [29]. In this context, the present results are competitive, especially because the proposed models use only lagged monthly yield, cyclical calendar encodings, and plant identifiers, without direct irradiance, temperature, cloud-cover, or numerical-weather-prediction inputs.
4. Discussion
The comparison between the two architectures leads to an interesting and substantively relevant conclusion. The MLP appears to be highly competitive as a pooled nonlinear baseline and is capable of extracting substantial predictive information from lagged monthly yields and seasonal encodings. Its strong validation performance indicates that a large part of the forecastable variation in the Chikalov data can indeed be represented through a fixed lagged feature vector. This is an important finding because it shows that monthly photovoltaic forecasting does not necessarily require a complex recurrent architecture in order to achieve good predictive accuracy.
At the same time, the final test results show that the GRU is ultimately more effective when the goal is robust out-of-sample forecasting. This can be interpreted in several ways. First, the recurrent hidden state allows the GRU to preserve chronological information more naturally than the MLP, which only sees the lagged inputs as a flattened vector. Second, the GRU may be better able to encode persistence effects and subtle transitions across neighboring months, especially when the annual cycle is strong but not perfectly regular. Third, the pooled recurrent representation may offer a more stable way of borrowing information across systems of unequal length, which is important in the present monthly panel setting [15]. These considerations are consistent with the rationale of the GRU design itself, namely, the use of update and reset gates to manage temporal memory and filter irrelevant sequence components.
The hyperparameter search further reinforces the importance of the 12-month input horizon. For both models, the best or near-best configurations are concentrated around one-year windows. This supports the interpretation that a full annual seasonal cycle is the most informative lag structure for monthly PV energy yield. Six-month memory is insufficient to capture the complete seasonal cycle. On the other hand, extending the horizon to 18 months occasionally produces competitive models, particularly in the GRU case, but does not improve the best validation score. Extending the window to 24 months generally worsens the validation results of both architectures. For a short and unbalanced monthly panel, this additional history appears to be partly redundant and may increase estimation uncertainty. This likely reflects the trade-off between richer historical context and the loss of effective sample size in short monthly datasets. When the time window becomes too long, the models appear to inherit additional estimation burden without receiving sufficiently informative new structure in return.
The selected hyperparameters also reveal architectural differences. The best GRU is comparatively compact, with hidden size 48 and no dropout, whereas the best MLP requires a wider hidden representation of 128 units together with dropout 0.2. This suggests that the MLP needs greater feed-forward capacity and stronger regularization in order to compete effectively, while the GRU attains its best performance with a more parsimonious recurrent structure. From a modeling perspective, this is reasonable: recurrence itself provides an inductive bias toward temporal organization, so the GRU can rely more on architecture and less on width.
One further point deserves attention. The selected validation winner is not necessarily the model with the best preliminary test-at-selection score. For example, several GRU configurations with 18-month windows achieve very strong test-set values during the search stage, and some MLP variants also perform competitively on test-at-selection. However, model selection must remain validation-based in order to preserve the integrity of the final test comparison. From this perspective, the fact that the GRU outperforms the MLP on the fully held-out test block is particularly informative: it indicates that recurrent modeling provides greater robustness beyond the tuning phase.
Overall, the experimental evidence suggests a balanced conclusion. The pooled MLP is a strong and computationally efficient benchmark that performs very well on the validation block and should not be dismissed as a simplistic baseline. Nevertheless, the pooled GRU delivers the best final forecasting accuracy and the strongest generalization on the unseen test period. For the present Chikalov monthly photovoltaic dataset, this makes the GRU the preferred model when predictive robustness is the primary objective, while the MLP remains an attractive alternative when simplicity, speed, and ease of implementation are emphasized.
The comparison with recent literature should be interpreted carefully. Many recent PV forecasting studies report lower MAPE or sMAPE values than those obtained here, but they often use high-frequency data, meteorological variables, weather forecasts, solar geometry descriptors, or much larger training datasets. For instance, recent Transformer and recurrent network studies for day-ahead PV forecasting use historical power together with weather observations, weather forecasts, and solar geometry inputs, which provide substantially richer information than the monthly production-only setting considered here [22]. Similarly, recent short-term PV forecasting studies evaluate many machine learning and deep learning models on high-resolution datasets and often benefit from dense temporal information and exogenous predictors [19, 33]. By contrast, the present study deliberately focuses on a constrained but practically common case: monthly energy-yield forecasting for several related rooftop PV systems with unequal record lengths and no direct meteorological covariates.
From this perspective, the obtained GRU test performance is encouraging. The final $R^2$ = 0.9231 indicates that the model explains more than 92% of the variance in the held-out monthly observations, while the MAPE of 7.52% and sMAPE of 7.49% indicate a practically acceptable relative error for monthly energy planning purposes. The MLP also performs competitively, with $R^2$ = 0.9035, MAPE = 7.57%, and sMAPE = 7.92%, confirming that much of the forecastable structure is already contained in the lagged annual pattern. However, the GRU provides lower RMSE and MAE on the final test period, which is consistent with the broader literature showing that recurrent architectures are well suited to forecasting tasks where ordered temporal dependence and seasonal persistence are important [5, 26, 34, 35].