Short-Term Solar Irradiance Forecasting Using Random Forest-Based Models with a Focus on Mountain Locations

Velimirovici, Lucas; Paulescu, Eugenia; Paulescu, Marius

doi:10.3390/en19030769


Lucas Velimirovici, Eugenia Paulescu* and Marius Paulescu

Faculty of Physics and Mathematics, West University of Timisoara, V. Parvan 4, 300223 Timisoara, Romania

* Author to whom correspondence should be addressed.

Energies 2026, 19(3), 769; https://doi.org/10.3390/en19030769

Submission received: 31 December 2025 / Revised: 23 January 2026 / Accepted: 29 January 2026 / Published: 2 February 2026

(This article belongs to the Special Issue The Future of Renewable Energy: 2nd Edition)


Abstract

Photovoltaic (PV) power forecasting has become a key tool for the intelligent management of electrical grids. The largest source of error in PV power forecasting is uncertainty in solar irradiance prediction, so improving the accuracy of solar irradiance forecasts has become an active research topic. This study evaluates multiple random tree-based model versions on a challenging dataset collected at globally distributed stations. The stations span elevations from sea level to nearly 4000 m and cover a wide range of climate classes. The originality of the study lies in the synergistic contribution of two elements: the innovative inclusion of diffuse irradiance among the predictors and a comparative analysis of forecast quality across lowland and mountainous locations. In mountainous environments, accurate solar resource forecasting is particularly important for the intelligent management of stand-alone PV systems deployed at high altitudes and in remote, off-grid areas. Overall, the results identify the two-stage Extremely Randomized Trees model (XTRc), which uses forecasted diffuse irradiance as a predictor, as the best-performing model. XTRc achieves Skill Scores ranging from 0.087 to 0.298 across individual stations. The model accuracy remains high even at mountain stations, provided that sky-condition variability is low.

Keywords:

solar resources; forecasting; Random Forest; Extremely Randomized Trees; mountain locations

1. Introduction

The large-scale integration of photovoltaic (PV) power plants into existing power grids introduces significant operational challenges. Consequently, research on PV power forecasting has grown considerably in recent years. Despite the effort devoted to this direction, progress has been relatively slow, and substantial room remains for further research aimed at improving forecast accuracy [1].

PV power forecasting can be performed directly by applying a model to historical measurements of a PV plant's power output. In this direct approach, the plant-specific characteristics are captured in the measured PV power output time series processed by the forecasting algorithm. This option therefore ties the model to a specific PV plant, considerably reducing its universality.

A more flexible alternative is a two-step forecasting approach. The solar resource is first forecasted and then used as input to a model that estimates PV power. Nearly all of the uncertainty in this two-step approach originates from the prediction of the solar resource. Because the solar resource forecast does not depend on plant-specific characteristics, this approach is applicable across wide geographical areas and is not restricted by the operational characteristics of individual PV plants.

Within this broader context, the present study focuses on very-short-term solar irradiance forecasting, specifically a 15 min horizon. This horizon was not chosen arbitrarily: 15 min forecasts are commonly used in practice [2], and the increase in forecasting uncertainty with longer horizons is well documented [3]. When 15-min-ahead solar energy forecasts are obtained by aggregating the irradiance predicted for each minute within the interval, 15 min is the longest irradiance lead time required.

From another perspective, improving the performance of solar irradiance forecasting is a research topic in its own right. The 15 min horizon may therefore also serve as a representative short-term case for illustrating methods that improve forecast skill. For intra-hour forecasting horizons, statistical models based on solar irradiance time series provide reasonable performance [3]. However, to further improve forecast accuracy, purely statistical approaches are increasingly being complemented or replaced by hybrid physics-based models. These models incorporate exogenous inputs such as predicted sky conditions or cloud dynamics derived from all-sky imagery. Artificial intelligence-based algorithms represent a further alternative in this context.
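The minute-to-interval aggregation described above can be illustrated with a short sketch. It assumes that each minute-level forecast represents the mean irradiance (W/m²) over that minute. The function name `energy_15min_wh_per_m2` is hypothetical and not taken from the study.

```python
def energy_15min_wh_per_m2(minute_forecasts_w_m2):
    """Aggregate 15 minute-level irradiance forecasts (W/m^2) into energy (Wh/m^2)."""
    if len(minute_forecasts_w_m2) != 15:
        raise ValueError("expected one forecast per minute of the 15-min interval")
    # Assumed: each value is the mean irradiance over one minute, so it contributes
    # G [W/m^2] * 60 s = G / 60 Wh/m^2 to the interval energy.
    return sum(g / 60.0 for g in minute_forecasts_w_m2)


# Lead times 1..15 min are needed; the last minute sets the longest lead time.
lead_times_min = list(range(1, 16))
print(max(lead_times_min))                    # 15
print(energy_15min_wh_per_m2([600.0] * 15))   # 150.0
```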

Machine learning models are now widely used for short-term solar irradiance forecasting. For intra-hour forecasting, Random Forest (RF) and artificial neural network (ANN)/deep learning (DL) models tend to be preferable under different data regimes.

RF is often a strong choice for tabular predictors with engineered lags. It delivers competitive accuracy with minimal tuning on moderate-sized datasets. For instance, a direct comparison on univariate solar irradiance time series showed that RF and LSTM achieve comparable accuracy, both clearly outperforming a simple autoregressive baseline. This result indicates that RF can remain highly competitive, even against recurrent deep learning models, in such settings [4].

In contrast, deep learning models, such as CNN–LSTM/GRU and other spatiotemporal architectures, are typically advantageous when forecasting depends on high-frequency temporal dynamics. This is especially true when rich exogenous information is available, such as sky images or other high-dimensional inputs that capture cloud evolution. Hybrid image–sequence models using sky imager information have demonstrated clear gains over persistence for a 15 min forecasting horizon [5]. Multimodal DL approaches that fuse sky imagery with historical numerical data report improved skill over traditional ML baselines at very short forecasting horizons [6].

Recent surveys and reviews further support this pattern [7,8]. DL tends to dominate in image-based and hybrid nowcasting settings. “Classical” ML methods (including RF) remain strong, simpler baselines in purely time series/tabular scenarios. Their performance depends on location, climate, and data availability.

This study was based on the hypothesis that, in most practical cases of very-short-term (intra-hour) solar irradiance forecasting, only tabular time series data are available. Consequently, this study focuses on forecasting models belonging to the RF class. A brief survey of Random Forest-based forecasting models is provided below.

Trull et al. [9] conducted a comprehensive comparison of statistical and machine learning techniques for solar irradiance forecasting using a benchmark dataset [10]. Random tree-based models consistently achieved competitive accuracy and demonstrated strong robustness across a wide range of forecasting horizons and atmospheric conditions.

Ensemble learning approaches, including Random Forest-based models, have been widely investigated for very-short-term solar irradiance forecasting. They have demonstrated reliable predictive performance at horizons ranging from a few minutes to one hour [11]. Some studies have combined random tree-based feature selection with weather regime classification, which improved stability and reduced forecasting errors under rapidly changing sky conditions [12]. Reference [13] reports an ensemble-based framework for intra-hour solar irradiance forecasting that combines multiple machine learning models. The study showed reduced forecasting errors across different state-of-the-sky regimes, highlighting the benefits of model diversity for very-short-term prediction horizons.

The impact of input stationarization on very-short-term global horizontal irradiance forecasting was recently investigated in [14], using machine learning models that included Random Forests. The study reported measurable gains in forecast accuracy and model stability under rapidly changing sky conditions. Sky-view images have also been integrated with Random Forest regression models for very-short-term solar irradiance forecasting, confirming the suitability of tree-based ensembles for minute-scale prediction tasks [15]. Reference [16] compares implicit and explicit regime identification strategies within machine learning models for solar irradiance prediction. That analysis showed that Random Forest models are well suited to capturing rapid transitions between atmospheric regimes, which improves short-term forecasting performance. Finally, [17] introduces a spatiotemporal downscaling framework for solar irradiance forecasting that combines nearest-neighbor Random Forest models with Gaussian Processes. The method significantly enhances the representation of local variability and improves predictive performance relative to conventional downscaling approaches.

This survey highlights the abundance of studies focused on the use of random tree-based models for short- and very-short-term solar irradiance forecasting. Nevertheless, substantial room for improvement remains across multiple dimensions, including the choice of random tree models suited to local atmospheric conditions, predictor selection guided by the principle of parsimony, and the achievement of reliable forecast quality.

This study evaluates multiple random tree-based model versions and identifies the one with the widest spatial applicability for global solar irradiance forecasting. The originality of this study arises from the synergistic contribution of two facets: the innovative inclusion of diffuse irradiance among the predictors and the comparative analysis of forecast quality across lowland and mountainous locations. In such settings, the accurate forecasting of solar resources is particularly relevant for the intelligent management of stand-alone PV systems deployed at high altitudes and located far from the electrical grid.

The remainder of this paper is organized as follows. Section 2 introduces the tested models and research methodology and describes the relevant data. The obtained results are presented and discussed in Section 3. The main conclusions are summarized in Section 4.

2. Methodology

2.1. Models

This study builds on the results of testing different RF models for intra-hour solar irradiance forecasting. The RF model [18] belongs to the class of tree-based ensemble learning methods and has been widely adopted in solar irradiance forecasting [19,20]. Its popularity is mainly due to its ability to model nonlinear relationships and interactions among atmospheric variables. By aggregating multiple randomized decision trees, RF-based models provide robust predictions and limit overfitting. This property is particularly important for short-term and intra-hour irradiance forecasting under variable sky conditions. Within this broader framework, four types of tree-based models were evaluated for solar irradiance prediction:

Random Forest (RF): RF relies on bootstrap aggregation and random feature selection to capture complex dependencies between solar irradiance and atmospheric predictors;
Extremely Randomized Trees (XTR): XTR introduces additional split-level randomness, which often improves generalization and reduces computational cost;
Light Gradient Boosting (LGB): LGB is a gradient boosting approach designed for efficient learning from large datasets and enables accurate irradiance forecasting at high temporal resolution;
Extreme Gradient Boosting (XGB): XGB is a regularized boosting framework that incrementally builds decision trees and performs well in capturing rapid irradiance fluctuations associated with cloud dynamics.
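The difference between the RF and XTR split rules can be sketched for a single predictor. This is a simplified, hypothetical illustration, not the study's code and not scikit-learn's internals:

- The RF-style rule scans every candidate threshold and keeps the one with the largest variance reduction.
- The Extra-Trees-style rule draws one cut-point at random between the minimum and maximum of the predictor.

In a full Extra-Trees model, one random cut-point is drawn per candidate feature and the best of these is kept. With a single feature, this reduces to one draw.

```python
import numpy as np


def variance_reduction(x, y, threshold):
    """Drop in target variance when the node is split at x <= threshold."""
    left, right = y[x <= threshold], y[x > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    child = (len(left) * left.var() + len(right) * right.var()) / len(y)
    return float(y.var() - child)


def rf_style_split(x, y):
    """RF-style rule: evaluate every midpoint between sorted unique values."""
    values = np.unique(x)
    if len(values) < 2:
        return None, 0.0
    candidates = (values[:-1] + values[1:]) / 2.0
    gains = [variance_reduction(x, y, t) for t in candidates]
    best = int(np.argmax(gains))
    return float(candidates[best]), gains[best]


def xtr_style_split(x, y, rng):
    """Extra-Trees-style rule: one cut-point drawn uniformly in [min(x), max(x)]."""
    t = float(rng.uniform(x.min(), x.max()))
    return t, variance_reduction(x, y, t)


# Toy data: an irradiance-like target driven by a solar elevation angle (degrees).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 90.0, 200)
y = 1000.0 * np.sin(np.radians(x)) + rng.normal(0.0, 50.0, 200)

t_rf, gain_rf = rf_style_split(x, y)
t_xtr, gain_xtr = xtr_style_split(x, y, rng)
print(gain_rf >= gain_xtr)  # True: the exhaustive scan cannot do worse
```

The random rule gives up some split quality per node in exchange for cheaper and more diverse trees. This is the generalization and computational-cost trade-off mentioned for XTR above.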

The first group, the RF models, has the broadest representation, comprising five models. The first model, denoted S-RF1, uses as predictors only the global solar irradiance at the time the forecast is issued and its lagged values. The second model, S-RF2, adds a deterministic predictor, namely the solar elevation angle. The third model, S-RF3, further extends the predictor set by including the measured diffuse irradiance at the time the forecast is issued.

The predictors were selected to specifically account for the unique characteristics of mountainous terrain, including rapid irradiance fluctuations caused by clouds, altitude-dependent solar geometry effects, and terrain-induced shading. Lagged irradiance values and solar geometry variables were included to capture these short-term and location-specific variations, while redundant or weakly informative variables were excluded to improve model robustness across high-altitude stations.
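A minimal NumPy sketch of how lagged and trailing rolling-mean predictors might be built from a one-minute global irradiance series, together with the 15-min-ahead target. Several details are assumptions made for illustration:

- the function names;
- the choice of a single 15 min lag;
- the use of trailing (causal) windows.

The other predictors used in the study (diffuse and direct-normal irradiance, standard deviation, extraterrestrial irradiance, solar elevation) would be appended as further columns.

```python
import numpy as np

HORIZON = 15  # minutes ahead; one sample per minute is assumed


def lagged(x, lag):
    """Value observed `lag` samples earlier (NaN where unavailable)."""
    out = np.full(len(x), np.nan)
    out[lag:] = x[:-lag]
    return out


def trailing_mean(x, window):
    """Mean of the `window` samples ending at each index (NaN until filled)."""
    out = np.full(len(x), np.nan)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


def build_features(ghi):
    """Hypothetical predictor matrix and G(t + 15 min) target from 1-min GHI."""
    ghi = np.asarray(ghi, dtype=float)
    X = np.column_stack([
        ghi,                      # G at the issue time t
        lagged(ghi, 15),          # G(t - 15 min)
        trailing_mean(ghi, 5),    # 5-min rolling average
        trailing_mean(ghi, 15),   # 15-min rolling average
        trailing_mean(ghi, 60),   # 60-min rolling average
    ])
    y = np.full(len(ghi), np.nan)
    y[:-HORIZON] = ghi[HORIZON:]  # target: G(t + 15 min)
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)  # drop incomplete rows
    return X[keep], y[keep]


X, y = build_features(np.arange(120))
print(X.shape)  # (46, 5)
```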

The next six models employed the same core set of predictors:

- global, diffuse, and direct-normal solar irradiances;
- the standard deviation of solar irradiance;
- global solar irradiance rolling averages over 5, 15, and 60 min;
- extraterrestrial horizontal solar irradiance;
- the solar elevation angle.

The fourth model, denoted RF, extends the predictor set to this full core set. The fifth model, RFc, is implemented in two stages: it adds to the core set the diffuse solar irradiance forecasted in the first stage. The diffuse component itself was predicted from the extraterrestrial horizontal solar irradiance, the solar elevation angle, the current and 15 min lagged diffuse solar irradiance, and the diffuse solar irradiance standard deviation.

The XTR group comprises two model versions, denoted XTR and XTRc. XTR is a single-stage model that issues forecasts using the same predictors as RF. XTRc, like RFc, is implemented in two stages. In the first stage, diffuse irradiance is forecasted. In the second stage, this forecast is used as one of the predictors for global irradiance.

LGB and XGB are implemented in their standard configurations. Figure 1 presents a workflow diagram illustrating the comparative model evaluation procedure and highlighting differences in the input sets.
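The two-stage structure of XTRc (and, analogously, RFc) can be sketched with scikit-learn's `ExtraTreesRegressor`. This is a hypothetical illustration, not the authors' implementation. The hyperparameters are placeholders and the data are synthetic.

Stage 2 is trained on out-of-fold stage-1 forecasts obtained with `cross_val_predict`. This design choice is assumed here so that stage 2 is not fed overly optimistic in-sample diffuse predictions. Plain K-fold is used because `cross_val_predict` requires folds that partition the data.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import cross_val_predict


def fit_two_stage(X_diff, y_diff_next, X_core, y_glob_next, seed=0):
    """Stage 1 forecasts diffuse irradiance; stage 2 forecasts global irradiance
    from the core predictors plus the stage-1 diffuse forecast."""
    stage1 = ExtraTreesRegressor(n_estimators=100, random_state=seed)
    # Out-of-fold diffuse forecasts used as the stage-2 training column (assumption).
    d_hat = cross_val_predict(stage1, X_diff, y_diff_next, cv=5)
    stage1.fit(X_diff, y_diff_next)
    stage2 = ExtraTreesRegressor(n_estimators=100, random_state=seed)
    stage2.fit(np.column_stack([X_core, d_hat]), y_glob_next)
    return stage1, stage2


def predict_two_stage(stage1, stage2, X_diff, X_core):
    """Chain the stages: the predicted diffuse becomes an extra predictor column."""
    d_hat = stage1.predict(X_diff)
    return stage2.predict(np.column_stack([X_core, d_hat]))


# Synthetic data standing in for the real predictors (hypothetical).
rng = np.random.default_rng(1)
n = 500
X_diff = rng.uniform(0.0, 1.0, (n, 5))  # e.g. G0, h, D(t), D(t-15), std(D)
y_diff = X_diff[:, 2] + 0.1 * rng.normal(size=n)
X_core = rng.uniform(0.0, 1.0, (n, 4))
y_glob = X_core[:, 0] + y_diff + 0.1 * rng.normal(size=n)

s1, s2 = fit_two_stage(X_diff[:400], y_diff[:400], X_core[:400], y_glob[:400])
pred = predict_two_stage(s1, s2, X_diff[400:], X_core[400:])
print(pred.shape)  # (100,)
```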

Therefore, nine forecasting models are evaluated: five RF model versions and two XTR model versions, as well as LGB and XGB. Together, these models cover a broad range of tree-based ensemble strategies suited to short-term solar irradiance forecasting, where nonlinear effects and rapid temporal variability dominate. In addition, a simple persistence model is tested and used as a reference for the evaluation.

Regarding algorithmic configuration, all RF-based models shared identical hyperparameters. These hyperparameters were not selected arbitrarily: a grid search was conducted to identify the parameter combinations yielding the highest predictive performance. The search produced four best-performing hyperparameter sets. Notably, for each hyperparameter, at least three of the four optimal configurations converged on the same value. In contrast, the LGB and XGB models were included primarily as baselines and were therefore not tuned; both were trained with their default hyperparameters.

Additional experiments assessed the impact of two candidate predictor groups: the direct-normal and diffuse components, and the global solar irradiance maximum and minimum measurements. Including the global solar irradiance max/min values yielded no visible improvement. Including the direct-normal and diffuse components improved performance for all models except XGB. All data processing and analyses were performed using Python 3.14.
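A grid search of the kind described above could look like the following sketch using scikit-learn's `GridSearchCV`. The study does not report its grid, so the following are all assumptions:

- the grid values;
- the RMSE scoring;
- the chronological `TimeSeriesSplit` folds;
- the synthetic data.

Listing the four best-ranked configurations mirrors the check of whether the top configurations agree on each hyperparameter.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in for a predictor table and a 15-min-ahead target.
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, (300, 4))
y = X[:, 0] + 0.5 * X[:, 1] + 0.05 * rng.normal(size=300)

# Illustrative grid only; not the grid used in the study.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    ExtraTreesRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=TimeSeriesSplit(n_splits=3),  # folds respect temporal order (assumption)
)
search.fit(X, y)

# Four best-ranked configurations, to check agreement per hyperparameter.
ranks = search.cv_results_["rank_test_score"]
top4 = [search.cv_results_["params"][i] for i in np.argsort(ranks)[:4]]
for params in top4:
    print(params)
```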

2.2. Data

The present study was conducted using radiometric data collected from eight stations of the Baseline Surface Radiation Network (BSRN) [21]. BSRN is a global network of high-quality radiometric stations designed to provide accurate and continuous measurements of surface solar radiation. The network follows strict measurement protocols and quality control procedures, making its data a reliable reference for solar energy and climate studies. The radiometric stations selected for this study are listed in Table 1 and ordered according to local altitude. The stations are globally distributed, ranging from the USA to Taiwan, and are located at elevations spanning from sea level (Cabauw, the Netherlands) to 3858 m (Yushan, Taiwan). Together, they cover three of the five Köppen–Geiger climate classes [22] (B—arid; C—temperate; E—polar and alpine climates). These elements indicate diversity in the dataset, which a priori represents a challenge for any solar irradiance forecasting model.

The database includes three stations located in transitional areas between plains and mountainous regions, within hilly landscapes characterized by complex terrain and distinct climatic conditions: CNR (Spain), DRA (Nevada, USA), and BOS (Colorado, USA). CNR, situated near Pamplona in northern Spain, lies in a gently hilly landscape at the transition between the Ebro basin and the Cantabrian Mountains; DRA is located within the Basin and Range Province of southern Nevada, a region marked by alternating basins and low mountain ranges; and BOS is positioned in the Boulder Valley at the foothills of the Rocky Mountains.

The primary dataset comprises global, diffuse, and direct-normal solar irradiance, as well as the standard deviation of global solar irradiance. BSRN provides one-minute mean values of solar irradiance derived from measurements sampled at one-second resolution. The standard deviation is computed from the one-second samples within each minute. It therefore provides only a coarse quantification of the variability of the solar radiative regime. In addition to these quantities provided directly by BSRN, two deterministic variables were calculated and included in the dataset: the solar irradiance at the top of the atmosphere (extraterrestrial horizontal irradiance) and the solar elevation angle.
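The two deterministic variables can be approximated with textbook formulas. The sketch below makes several simplifying assumptions; the algorithm used in the study is not stated:

- Cooper's declination formula;
- a solar constant of 1361 W/m²;
- a first-order Earth–Sun distance correction;
- local apparent solar time as input, with no equation of time and no refraction.

A more accurate algorithm, such as NREL's Solar Position Algorithm, would normally be preferred for minute-resolution work.

```python
import math

SOLAR_CONSTANT = 1361.0  # W/m^2 (assumed value)


def declination_deg(day_of_year):
    """Cooper's approximation of the solar declination, in degrees."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def solar_elevation_deg(lat_deg, day_of_year, solar_time_h):
    """Solar elevation h from latitude, day of year and local apparent solar time."""
    phi = math.radians(lat_deg)
    delta = math.radians(declination_deg(day_of_year))
    omega = math.radians(15.0 * (solar_time_h - 12.0))  # hour angle
    sin_h = (math.sin(phi) * math.sin(delta)
             + math.cos(phi) * math.cos(delta) * math.cos(omega))
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_h))))


def extraterrestrial_horizontal(lat_deg, day_of_year, solar_time_h):
    """Top-of-atmosphere irradiance on a horizontal plane (0 when the Sun is down)."""
    h = solar_elevation_deg(lat_deg, day_of_year, solar_time_h)
    distance_factor = 1.0 + 0.033 * math.cos(math.radians(360.0 * day_of_year / 365.0))
    return max(0.0, SOLAR_CONSTANT * distance_factor * math.sin(math.radians(h)))


# Equinox-like day (day 81), solar noon at 45 deg N: elevation close to 45 deg.
print(round(solar_elevation_deg(45.0, 81, 12.0), 3))
```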

Sky conditions are characterized using two indicators, namely the sunshine number (SSN) and the sunshine stability number (SSSN). These indicators are introduced in the following. SSN is defined as a time-dependent random binary variable [23]:

$$\mathrm{SSN}_t = \begin{cases} 1 & \text{if the Sun shines at time } t \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

Series of SSN values were derived from measurements on the basis of the World Meteorological Organization sunshine criterion [24]: the Sun is shining at time $t$ if the direct-normal solar irradiance $G_{dn}$ exceeds 120 W/m², i.e.,

$$\mathrm{SSN}_t = \begin{cases} 1 & \text{if } G_{dn} > 120\ \mathrm{W/m^2} \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

By construction, the average value of SSN over a given period $\Delta t$ equals the relative sunshine over $\Delta t$.
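Equation (2) and the relative-sunshine property can be checked with a few lines of NumPy. The direct-normal values below are made up for illustration, and the function name is hypothetical.

```python
import numpy as np

WMO_THRESHOLD = 120.0  # W/m^2, WMO sunshine criterion on direct-normal irradiance


def sunshine_number(dni):
    """Eq. (2): SSN_t = 1 if G_dn > 120 W/m^2, else 0."""
    return (np.asarray(dni, dtype=float) > WMO_THRESHOLD).astype(int)


# Hypothetical 1-min direct-normal irradiance values (W/m^2).
dni = np.array([0.0, 80.0, 150.0, 600.0, 119.9, 120.0, 800.0, 300.0])
ssn = sunshine_number(dni)
print(ssn)         # [0 0 1 1 0 0 1 1]
print(ssn.mean())  # 0.5 -> relative sunshine over this 8-min window
```

Note that 120.0 W/m² itself yields SSN = 0 because the criterion is a strict inequality.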

On the basis of SSN, a straightforward quantifier for the variability in the state of the sky can be defined [25]:

$$\mathrm{SSSN}_t = \begin{cases} 1 & \text{if } \mathrm{SSN}_t > \mathrm{SSN}_{t-1} \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

Equation (3) defines SSSN in a simpler and more intuitive form than in Ref. [25]. According to this definition, SSSN identifies the time instants in the SSN time series at which the Sun appears from behind clouds, i.e., the transitions of SSN from 0 to 1. The average value of SSSN during $\Delta t$, denoted $\overline{\mathrm{SSSN}}$, therefore measures the frequency of SSN changes during $\Delta t$ (counting only the 0-to-1 transitions). Consequently, $\overline{\mathrm{SSSN}}$ is a natural quantifier for the variability in the state of the sky and in the solar irradiance time series.
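Equation (3) and the mean SSSN can be computed directly from an SSN series, as in the sketch below. Setting SSSN to 0 at the first sample, which has no predecessor, is an assumption. The SSN series is illustrative.

```python
import numpy as np


def sunshine_stability_number(ssn):
    """Eq. (3): SSSN_t = 1 if SSN_t > SSN_{t-1} (a 0 -> 1 transition), else 0."""
    ssn = np.asarray(ssn, dtype=int)
    sssn = np.zeros_like(ssn)
    sssn[1:] = (ssn[1:] > ssn[:-1]).astype(int)
    # The first sample has no predecessor; it is set to 0 here (assumption).
    return sssn


ssn = np.array([0, 0, 1, 1, 0, 0, 1, 1])
sssn = sunshine_stability_number(ssn)
print(sssn)         # [0 0 1 0 0 0 1 0]
print(sssn.mean())  # 0.25 -> mean SSSN over this 8-min window
```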

Figure 2 presents the temporal variation in global solar irradiance $G$, the sunshine number (SSN), and the sunshine stability number (SSSN) at Sonnblick over four days in 2025, selected to illustrate distinctly different sky conditions:

- 26 May exhibits pronounced instability;
- 27 May is characterized by cloudy conditions in the morning followed by increased variability in the afternoon;
- 30 May also starts with cloudy conditions, but, following a variable period, near-clear-sky conditions prevail;
- 31 May represents an almost clear-sky day, with the exception of an episode of variability in the afternoon.

The graphical representations of SSN and SSSN resemble barcode-like patterns. This analogy can be extended further, as these patterns provide a compact encoding of the prevailing sky conditions.

Figure 3 presents basic statistical summaries, in the form of boxplots, of the global solar irradiance (the forecasted variable), the daily mean SSN, and the daily mean SSSN across the stations. High-altitude stations exhibit a much wider range of variability in solar irradiance than low-altitude stations. At low-altitude stations, although SSN and SSSN vary from one site to another, sky-condition behavior remains largely similar. In contrast, mountainous stations show pronounced differences in sky conditions. IZA is characterized by predominantly sunny and highly stable conditions. SON exhibits predominantly cloudy conditions with enhanced variability, whereas YUS records the highest sky-condition variability.

For stations located in transitional areas, the surrounding orography contributes to complex local atmospheric dynamics. However, the variability in solar irradiance is strongly conditioned by climate. High variability is observed at CNR, which has a temperate climate, whereas lower variability is observed at DRA and BOS, both located in arid climatic regions. Overall, Figure 3 reveals substantial diversity in the dataset, which constitutes a challenge for any solar irradiance forecasting model.

At each station, the data for each month were split chronologically into two subsets: measurements from the first part of the month (70%) were used for model training, while data from the second part of the month (30%) were used for model testing.
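The per-month chronological split can be sketched as follows. The sketch assumes the 70/30 split is made on the time-ordered samples of each calendar month; splitting on whole days would be an equally plausible reading. The function name is hypothetical, and daily timestamps are used only to keep the example small.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def monthly_chronological_split(timestamps, train_fraction=0.7):
    """Return (train, test) index lists: the first 70% of each month trains."""
    by_month = defaultdict(list)
    for i, ts in enumerate(timestamps):
        by_month[(ts.year, ts.month)].append(i)
    train, test = [], []
    for key in sorted(by_month):
        idx = sorted(by_month[key], key=lambda j: timestamps[j])  # time order
        cut = round(train_fraction * len(idx))
        train.extend(idx[:cut])   # earlier part of the month
        test.extend(idx[cut:])    # later part of the month
    return train, test


# Toy example: daily timestamps for May and June 2025 (61 samples).
start = datetime(2025, 5, 1)
stamps = [start + timedelta(days=d) for d in range(61)]
train_idx, test_idx = monthly_chronological_split(stamps)
print(len(train_idx), len(test_idx))  # 43 18
```

Splitting within each month, rather than once over the whole record, keeps both subsets seasonally comparable. It also prevents test data from preceding training data within a month.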

3. Results

3.1. Comparative Accuracy of Tree-Based Models

All nine tree-based models, together with the persistence model, were applied to all BSRN stations listed in Table 1, independently for each month. The results are summarized in Figure 4 as a radar chart of nRMSE values. Each testing station corresponds to one radial axis of the chart, enabling a comparative assessment of model performance across sites.

Several features are evident in Figure 4. In general, all tested models outperform the persistence model at every station. The performance of the tree-based models varies significantly from one station to another. However, at any given station, the differences among the models themselves are small. The largest differentiation among the models is observed at the high-altitude stations SON and YUS. For both the RF and XTR families, the compounded models (RFc and XTRc) and S-RF3 consistently outperform their simpler counterparts. These models incorporate the diffuse forecast or, in the case of S-RF3, the measured diffuse irradiance.

The overall conclusion drawn from the radar chart is that the XTRc model exhibits the best performance; therefore, it was selected for the analyses presented in the following sections.

To provide a more robust comparison, a ranking-based evaluation was performed. For each station and month, models were ranked according to their nRMSE values, and scores from 16 (best-performing model) to 1 (worst-performing model) were assigned. The scores were then aggregated across all months to obtain an overall performance ranking (between 22 and 132 for XTRc). Based on this cumulative ranking, the XTRc model achieved the highest total score and was therefore selected for subsequent analysis.
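The rank-based scoring can be sketched as follows. In each station-month case, the model with the lowest nRMSE receives n points and the worst receives 1, where n is the number of models ranked. The points are then summed over cases. The function name and the nRMSE values below are hypothetical, not results from the study. Ties are not treated specially (the sort simply keeps input order).

```python
def rank_scores(nrmse_by_case):
    """Sum rank points over cases: n points for the lowest nRMSE, 1 for the highest."""
    totals = {}
    for case in nrmse_by_case.values():
        ordered = sorted(case, key=case.get)  # ascending nRMSE = best first
        n = len(ordered)
        for position, model in enumerate(ordered):
            totals[model] = totals.get(model, 0) + (n - position)
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical nRMSE values keyed by (station, month); not the paper's results.
example = {
    ("SON", 1): {"XTRc": 0.30, "RF": 0.32, "PERS": 0.40},
    ("SON", 2): {"XTRc": 0.28, "RF": 0.27, "PERS": 0.35},
}
print(rank_scores(example))  # {'XTRc': 5, 'RF': 5, 'PERS': 2}
```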

3.2. Model Performance: Accuracy

Table 2 reports the statistical indicators nRMSE, nMBE, and Skill Score for the XTRc model, computed for the eight locations both separately for the two months and aggregated over the entire two-month period.
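The formulas are not given in this section, so the sketch below uses common definitions as assumptions:

- nRMSE and nMBE normalized by the mean measured irradiance;
- Skill Score = 1 - RMSE_model / RMSE_persistence, with naive persistence G(t + 15 min) = G(t) on a one-minute series.

All function names are hypothetical.

```python
import numpy as np


def rmse(pred, obs):
    """Root mean square error."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))


def nrmse(pred, obs):
    """RMSE normalized by the mean measured value (assumed normalization)."""
    return rmse(pred, obs) / float(np.mean(obs))


def nmbe(pred, obs):
    """Mean bias normalized by the mean measured value; > 0 means overestimation."""
    return float(np.mean(np.asarray(pred) - np.asarray(obs))) / float(np.mean(obs))


def skill_score(pred, ref, obs):
    """1 - RMSE(model) / RMSE(reference); > 0 when the model beats the reference."""
    return 1.0 - rmse(pred, obs) / rmse(ref, obs)


def persistence(g, horizon=15):
    """Naive persistence on a 1-min series: forecast G(t + horizon) = G(t).
    Returns (forecasts, matching observations)."""
    g = np.asarray(g, dtype=float)
    return g[:-horizon], g[horizon:]


# Toy values (W/m^2), not data from the study.
obs = np.array([100.0, 200.0, 300.0, 400.0])
ref = np.array([50.0, 100.0, 200.0, 300.0])
model = np.array([110.0, 190.0, 310.0, 390.0])
print(nrmse(model, obs))                         # 0.04
print(nmbe(model, obs))                          # 0.0
print(round(skill_score(model, ref, obs), 3))    # 0.889
```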

The nRMSE values range from 4.9% to 52%. The lowest nRMSE, both for the individual months and for the aggregated two-month period, was obtained at the Izaña station (IZA), Tenerife, Spain. This site combines a relatively low geographic latitude (28.309°) with a high altitude of 2372 m. Three other stations are located at relatively high altitudes: Boulder, Colorado, USA (1689 m), Sonnblick, Austria (3108 m), and Yushan, Taiwan (3858 m). All three exhibit an aggregated nRMSE of approximately 35%, whereas the XTRc model achieves an nRMSE of only 5.4% at Izaña. The good performance observed at Izaña may be attributed to its low latitude and subtropical Mediterranean climate, which is characterized by low variability.

In contrast, the model performs poorly at the Yushan (YUS) station, with an aggregated nRMSE of 34.9%, despite its even lower latitude of 23.487°. This station is located at the highest altitude among the analyzed sites (3858 m) and has an alpine climate. The second-best performance of the XTRc model (16.1% nRMSE) was observed at the Desert Rock station (DRA), Nevada, USA, located at a latitude of 36.626° and an altitude of 1007 m. This suggests that the model tends to perform better at sites with relatively low latitude and altitude.

The only station where this pattern does not hold is Cener, Spain (CNR). The poorest model performance was observed at this site, with an aggregated nRMSE of 38.1%, despite its relatively low altitude of 471 m and moderate latitude of 42.816°. Clearly, latitude and altitude are not the only factors influencing the model's performance. Another decisive factor could be the variability in the state of the sky. This variability, quantified through the SSN and SSSN, is analyzed in Section 3.4.

The second statistical indicator, nMBE, exhibits relatively low values. Only three of the monthly values indicate an overestimation exceeding 5%, and these occur at stations with altitudes higher than 1689 m.

All Skill Score values are positive, ranging from 0.087 to 0.298, indicating that the XTRc model outperforms the persistence reference at all stations. The highest values, both for the individual months and the aggregated period, were observed at the Izaña station, Tenerife.

The solar elevation angle, $h$, is an important predictor for solar irradiance. Figure 5 illustrates the influence of the solar elevation angle on the performance of the XTRc model. For this analysis, the test-day results were grouped into three categories based on the parameter $h/h_{\max}$, obtained by dividing the solar elevation angle by its maximum value reached on the respective day:

- LOW: $h/h_{\max} < 0.25$;
- MID: $0.25 \leq h/h_{\max} < 0.75$;
- HIGH: $h/h_{\max} \geq 0.75$.
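The LOW/MID/HIGH grouping can be expressed as a small helper. The function names are hypothetical. Applying the helper only to the daytime samples (h > 0) of a single day is an assumption.

```python
def elevation_category(h, h_max):
    """Classify one sample by h / h_max: LOW (< 0.25), MID (< 0.75), else HIGH."""
    ratio = h / h_max
    if ratio < 0.25:
        return "LOW"
    if ratio < 0.75:
        return "MID"
    return "HIGH"


def categorize_day(elevations_deg):
    """Categorize the daytime samples of one day relative to that day's maximum h."""
    daytime = [h for h in elevations_deg if h > 0]
    h_max = max(daytime)
    return [elevation_category(h, h_max) for h in daytime]


print(categorize_day([-5.0, 5.0, 20.0, 40.0, 60.0, 40.0, 20.0, 5.0]))
# ['LOW', 'MID', 'MID', 'HIGH', 'MID', 'MID', 'LOW']
```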

In Figure 5a, it can be observed that, with a single exception at the BUD station, the nRMSE decreases as the solar elevation angle increases. According to Figure 5b, the model produces significant overestimations only at low solar elevation angles. The Skill Score shown in Figure 5c indicates that the model outperforms the persistence reference most strongly at the high-accuracy stations IZA and DRA, particularly at low solar elevation angles.

At IZA, XTRc obtains a negative Skill Score at high solar elevation angles. This can be explained as follows. The sky at IZA is predominantly clear, with only a few cloudy periods. Under clear-sky conditions, solar irradiance changes only slightly over 15 min intervals. Consequently, the persistence model is expected to perform very well, failing only during rare transitions in the state of the sky. In contrast, the XTRc model relies on a large number of input predictors, and its accuracy at IZA is likely affected by overfitting: the model learns the training data too closely, including its outliers.

3.3. Model Performance: Precision

The forecast precision is measured by the percentage $P$ [%] of forecasts that fall within a given tolerance interval $T$ [%] centered on the measurements. Figure 6 displays the results, i.e., $P$ as a function of $T$, for all eight stations. The ranking of stations according to the performance of the XTRc model is preserved in the analysis of prediction precision. The IZA, DRA, and BUD stations consistently stand out for all values of the tolerance interval $T$. For the stations characterized by high altitude and/or high variability (YUS, BOS, SON, and CNR), the prediction precision is lower. The $P(T)$ curves of these stations are tightly clustered, indicating similar precision levels across tolerance intervals. For $T > 50\%$, the model prediction precision tends to become similar across all eight stations.
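The precision curve P(T) can be sketched as below. This sketch interprets "within a tolerance interval T centered on the measurements" as |forecast - measurement| ≤ T% of the measurement. That reading is an assumption: a total interval width of T, i.e. ±T/2, would be the alternative. Measurements are assumed positive (daytime), and the values are made up.

```python
import numpy as np


def precision_curve(pred, obs, tolerances_pct):
    """Percentage P of forecasts with |pred - obs| <= T% of obs, for each T."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)  # assumed > 0 (daytime samples)
    rel_err_pct = 100.0 * np.abs(pred - obs) / obs
    return [float(100.0 * np.mean(rel_err_pct <= t)) for t in tolerances_pct]


obs = np.array([100.0, 200.0, 400.0, 500.0])
pred = np.array([105.0, 260.0, 380.0, 800.0])  # relative errors: 5%, 30%, 5%, 60%
print(precision_curve(pred, obs, [5, 10, 50, 100]))  # [50.0, 50.0, 75.0, 100.0]
```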

To provide a more intuitive assessment of the XTRc model performance, time series plots comparing the predicted and actual global irradiance values were generated for two representative stations: DRA and SON. Four days with diverse weather conditions were selected for each station to capture different sky scenarios. Figure 7 and Figure 8 present the actual and predicted irradiance values, illustrating the model's ability to follow the temporal variability of solar irradiance. These plots offer location-specific insights and complement the quantitative analysis presented earlier, highlighting both the strengths and limitations of the model under varying conditions.

Forecasting solar irradiance in mountainous environments remains a challenging task due to complex cloud dynamics, strong topographic effects, and rapid temporal variability. This difficulty has been recently highlighted by Kosmopoulos et al. [26], who applied a state-of-the-art three-dimensional Cloud Motion Vector (3D-CMV) forecasting technique combined with a fast radiative transfer model across multiple European and African sites. Their results indicate that the largest forecast errors were observed at the Sonnblick station, which has the highest altitude among the analyzed locations.

The time series comparisons presented in Figure 8 for Sonnblick further illustrate these challenges, showing rapid irradiance fluctuations under diverse meteorological conditions. Despite these complexities, the proposed XTRc model captures the main temporal patterns of global irradiance, indicating its robustness and its potential for high-altitude solar forecasting.

3.4. Model Performance Under State-of-the-Sky Variability

The sky condition was characterized using two indicators, namely the sunshine number (SSN) and the sunshine stability number (SSSN). Daily means of both indicators were computed for each test day at each of the eight stations. The nRMSE was also calculated for each of these days. Figure 9 shows the daily nRMSE as a contour plot against the daily mean SSN and daily mean SSSN.

Regions of the plot with lighter colors, approaching yellow, correspond to good model performance, whereas darker areas indicate poor performance. The lower-right region corresponds to daily mean SSN > 0.7 (high sunshine) and daily mean SSSN < 0.02 (low variability). It is depicted entirely in shades of yellow, indicating good model performance.

For the IZA test days, the lowest daily mean SSN is 0.9176 and the highest daily mean SSSN is 0.0114. All of them therefore fall within this rectangle and are characterized by high sunshine and low variability. For the DRA station, where the second-best performance was obtained, the lowest daily mean SSN is 0.6643 and the highest daily mean SSSN is 0.052.

At the CNR station, the performance is, somewhat surprisingly, the poorest, despite an altitude of only 471 m. Its test days show an average daily mean SSN of 0.498 and an average daily mean SSSN of 0.034, the highest average SSSN among all eight stations. Moreover, the daily mean SSSN at CNR reaches a maximum of 0.0689, the highest value observed across all test days from all stations. High sky-condition variability is therefore the likely cause of the poor model performance at this station.

3.5. Comparison with Other Models

To further assess the robustness of the proposed model, an additional comparison was conducted against benchmark models from different methodological classes, namely a Support Vector Machine (SVM) and a Multi-Layer Perceptron (MLP) (Figure 10). The evaluation was performed on the same dataset, with a specific focus on the high-altitude stations located above 1000 m a.s.l. (DRA, BOS, IZA, SON, and YUS). These stations typically exhibit higher sky-condition variability and therefore represent more challenging forecasting conditions. A ranking-based approach was applied exclusively to these high-altitude stations. For each station and test month, the three models (XTRc, SVM, and MLP) were ranked by nRMSE, and the best, second-best, and worst models were assigned 3, 2, and 1 points, respectively. The aggregated scores were 21 points for XTRc, 20 for MLP, and 19 for SVM. Although the differences are small, the results indicate that XTRc retains a slight edge under demanding mountainous conditions.
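The point-based ranking described above can be sketched as follows. The nRMSE values in the example are hypothetical and are not taken from the paper. Tie handling is not described in the text, so the sketch simply keeps the input order when models tie.

```python
from collections import Counter


def rank_scores(nrmse_by_case):
    """Sum rank points per model over all test cases.

    In each case the model with the lowest nRMSE gets as many points as there
    are models (3 for XTRc, SVM, MLP), the next one gets one point less, and
    so on. Ties keep the model order of the input dict (an assumption).
    """
    totals = Counter()
    for case in nrmse_by_case:
        ordered = sorted(case, key=case.get)  # ascending nRMSE: best first
        for position, model in enumerate(ordered):
            totals[model] += len(ordered) - position
    return dict(totals)


# Hypothetical nRMSE values for two station-month cases (not from the paper)
cases = [
    {"XTRc": 0.16, "SVM": 0.18, "MLP": 0.17},
    {"XTRc": 0.36, "SVM": 0.35, "MLP": 0.37},
]
scores = rank_scores(cases)
print(scores)
```

With three models, each case distributes 3 + 2 + 1 = 6 points. The reported totals (21 + 20 + 19 = 60) therefore correspond to ten ranked cases, which matches five stations above 1000 m, each tested in two months.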

4. Conclusions

This study investigated very-short-term global solar irradiance forecasting using multiple random tree-based model versions evaluated on a challenging dataset collected at globally distributed stations, spanning elevations from sea level to nearly 4000 m and covering diverse climate classes. The analysis demonstrates that incorporating diffuse irradiance as a predictor, combined with a comparative assessment across lowland and mountainous locations, provides valuable insight into model robustness under heterogeneous atmospheric conditions.

Nine tree-based models were analyzed and compared. The best-performing one was then compared, using data from the high-altitude locations, with two additional models, SVM and MLP, which represent the kernel-based and artificial neural network (ANN) classes, respectively.

Among the tested approaches, Extremely Randomized Trees (XTRc) consistently exhibited the best overall performance. Across individual stations and test months, XTRc achieved Skill Scores ranging from 0.087 to 0.298. Because the Skill Score was positive in every case, XTRc improved on the reference forecast at very short horizons. Importantly, the ranking of model performance was largely preserved across stations, although prediction accuracy varied substantially with local site characteristics.
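The reference forecast behind the Skill Score is defined elsewhere in the paper. The sketch below assumes the common form SS = 1 − RMSE_model / RMSE_reference, with plain persistence as the reference. This may differ from the authors' exact choice, for example a clear-sky-index persistence. All series values are hypothetical.

```python
import numpy as np


def rmse(forecast, observed):
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((forecast - observed) ** 2)))


def persistence(series, horizon=1):
    """Naive persistence: forecast each value with the one `horizon` steps earlier.

    Returns (reference forecast, matching observations).
    """
    series = np.asarray(series, dtype=float)
    return series[:-horizon], series[horizon:]


def skill_score(forecast, observed, reference):
    """SS = 1 - RMSE(forecast) / RMSE(reference); SS > 0 means the model beats the reference."""
    return 1.0 - rmse(forecast, observed) / rmse(reference, observed)


# Hypothetical global irradiance series G (W/m^2) and model forecasts for g[1:]
g = [500, 520, 480, 510, 530]
reference, observed = persistence(g, horizon=1)
model = [515, 490, 500, 525]
ss = skill_score(model, observed, reference)
print(f"Skill Score = {ss:.3f}")
```

Under this convention, a Skill Score of 0.087 means the model's RMSE is about 8.7% lower than that of the reference, and a value of 0 means no improvement.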

The XTRc model achieves its best performance at relatively low-latitude sites. The lowest nRMSE, 5.4%, was observed at the Izaña station (28.3° N, 2372 m), compared with aggregated nRMSE values of around 35% at the other high-altitude stations, Boulder, Sonnblick, and Yushan. These results indicate that, beyond altitude and latitude, sky-condition variability plays a decisive role in determining model performance. This is illustrated by the poor performance at the low-latitude Yushan station (34.9%) and at the low-altitude Cener station (38.1%).

The results further indicate that model accuracy remains high even at mountain stations, provided that sky-condition variability is low. Conversely, stations combining a high altitude with pronounced cloud variability posed greater challenges for all forecasting models, highlighting the intrinsic difficulty of solar irradiance prediction under highly unstable atmospheric regimes. From a general modeling perspective, the performance of data-driven solar irradiance forecasting models is intrinsically limited by persistence: because they learn from recent measurements, they tend to extrapolate the current sky state into the future. Incorporating physical information derived from direct sky observations into the XTRc model may reduce this tendency. We plan to investigate this approach in detail in future work.

Overall, the findings confirm that tree-based ensemble methods, and XTRc in particular, offer robust performance across a wide range of geographic and climatic conditions. These results underline the relevance of accurate very-short-term solar irradiance forecasting for the intelligent management of photovoltaic systems, especially in high-altitude and remote, off-grid environments.

Future work will aim to improve the performance of the XTRc model at mountainous locations, where the high altitude and pronounced sky-condition variability pose additional challenges for very-short-term solar irradiance forecasting.

Author Contributions

Conceptualization, E.P., M.P. and L.V.; methodology, E.P., M.P. and L.V.; software, L.V. and E.P.; validation, L.V. and E.P.; formal analysis, E.P., M.P. and L.V.; data curation, L.V.; writing—original draft preparation, M.P.; writing—review and editing, E.P., M.P. and L.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in the study are openly available from the Baseline Surface Radiation Network (BSRN), https://bsrn.awi.de/.

Acknowledgments

This research was carried out within the framework of the UNITA Starting Grant projects.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Sabadus, A.; Blaga, R.; Hategan, S.M.; Calinoiu, D.; Paulescu, E.; Mares, O.; Boata, R.; Stefu, N.; Paulescu, M.; Badescu, V. A cross-sectional survey of deterministic PV power forecasting: Progress and limitations in current approaches. Renew. Energy 2024, 226, 120385.
2. Moreno, G.; Santos, C.; Martín, P.; Rodríguez, F.J.; Peña, R.; Vuksanovic, B. Intra-Day Solar Power Forecasting Strategy for Managing Virtual Power Plants. Sensors 2021, 21, 5648.
3. Blaga, R.; Sabadus, A.; Stefu, N.; Dughir, C.; Paulescu, M.; Badescu, V. A current perspective on the accuracy of incoming solar energy forecasting. Prog. Energy Combust. Sci. 2019, 70, 119–144.
4. Díaz-Bedoya, D.; González-Rodríguez, M.; Clairand, J.-M.; Serrano-Guerrero, X.; Escrivá, G. Forecasting Univariate Solar Irradiance Using Machine Learning Models: A Case Study of Two Andean Cities. Energy Convers. Manag. 2023, 296, 117618.
5. Ansong, M.; Huang, G.; Nyang’onda, T.N.; Musembi, R.J.; Richards, B.S. Very Short-Term Solar Irradiance Forecasting Based on Open-Source Low-Cost Sky Imager and Hybrid Deep-Learning Techniques. Sol. Energy 2025, 294, 113516.
6. Jonathan, A.L.; Bamisile, O.; Cai, D.; Ejiyi, C.J.; Nkou Nkou, J.J.; Victor, K.; Ukwuoma, C.C.; Wei, L.; Huang, Q. A Multimodal Deep Learning Approach for Very Short-Term Solar Forecasts Using Sky Images and Historical Numerical Data. Renew. Energy 2025, 255, 123774.
7. Ajith, M.; Martinez-Ramon, M. Deep Learning Algorithms for Very Short Term Solar Irradiance Forecasting: A Survey. Renew. Sustain. Energy Rev. 2023, 182, 113362.
8. Chu, Y.; Wang, Y.; Yang, D.; Chen, S.; Li, M. A Review of Distributed Solar Forecasting with Remote Sensing and Deep Learning. Renew. Sustain. Energy Rev. 2024, 198, 114391.
9. Trull, O.; García-Díaz, J.C.; Peiró-Signes, A. A Comparative Study of Statistical and Machine Learning Methods for Solar Irradiance Forecasting Using the Folsom PLC Dataset. Energies 2025, 18, 4122.
10. Pedro, H.T.C.; Larson, D.P.; Coimbra, C.F.M. A Comprehensive Dataset for the Accelerated Development and Benchmarking of Solar Forecasting Methods. J. Renew. Sustain. Energy 2019, 11, 036102.
11. Lee, J.; Wang, W.; Harrou, F.; Sun, Y. Reliable solar irradiance prediction using ensemble learning-based models: A comparative study. Energy Convers. Manag. 2020, 208, 112582.
12. Ahmed, U.; Khan, A.R.; Mahmood, A.; Rafiq, I.; Ghannam, R.; Zoha, A. Short-term global horizontal irradiance forecasting using weather-classified categorical boosting. Appl. Soft Comput. 2024, 155, 111441.
13. Hategan, S.-M.; Stefu, N.; Paulescu, M. An Ensemble Approach for Intra-Hour Forecasting of Solar Resource. Energies 2023, 16, 6608.
14. e Silva, A.; Cesar, B.; Callejo, M.; Cira, C.-I. Impact of Stationarizing Solar Inputs on Very-Short-Term Spatio-Temporal Global Horizontal Irradiance (GHI) Forecasting. Energies 2024, 17, 3527.
15. Coathup, T.J.; Rodgers, M. Solar irradiance forecasting with visible spectrum sky-view images and random forest regression. In Proceedings of the IEEE Photovoltaic Specialist Conference (PVSC), Philadelphia, PA, USA, 9–14 June 2024.
16. McCandless, T.; Dettling, S.; Haupt, S.E. Comparison of Implicit vs. Explicit Regime Identification in Machine Learning Methods for Solar Irradiance Prediction. Energies 2020, 13, 689.
17. Asiedu, S.T.; Suvedi, A.; Wang, Z.; Rekabdarkolaee, H.M.; Hansen, T.M. Spatiotemporal Downscaling Model for Solar Irradiance Forecast Using Nearest-Neighbor Random Forest and Gaussian Process. Energies 2025, 18, 2447.
18. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
19. Inman, R.H.; Pedro, H.T.C.; Coimbra, C.F.M. Solar forecasting methods for renewable energy integration. Prog. Energy Combust. Sci. 2013, 39, 535–576.
20. Voyant, C.; Notton, G.; Kalogirou, S.; Nivet, M.-L.; Paoli, C.; Motte, F.; Fouilloy, A. Machine learning methods for solar radiation forecasting: A review. Renew. Energy 2017, 105, 569–582.
21. Baseline Surface Radiation Network. BSRN. 2025. Available online: https://bsrn.awi.de/ (accessed on 1 December 2025).
22. Kottek, M.; Grieser, J.; Beck, C.; Rudolf, B.; Rubel, F. World Map of the Köppen-Geiger climate classification updated. Meteorol. Z. 2006, 15, 259–263.
23. Badescu, V.; Paulescu, M. Statistical properties of the sunshine number illustrated with measurements from Timisoara (Romania). Atmos. Res. 2011, 101, 194–204.
24. World Meteorological Organization (WMO). Guide to Meteorological Instruments and Methods of Observation, 7th ed.; WMO-No. 8; WMO: Geneva, Switzerland, 2008.
25. Paulescu, M.; Badescu, V. New approach to measure the stability of the solar radiative regime. Theor. Appl. Climatol. 2011, 103, 459–470.
26. Kosmopoulos, P.; Dhake, H.; Melita, N.; Tagarakis, K.; Georgakis, A.; Stefas, A.; Vaggelis, O.; Korre, V.; Kashyap, Y. Multi-Layer Cloud Motion Vector Forecasting for Solar Energy Applications. Appl. Energy 2024, 353, 122144.

Figure 1. Workflow diagram illustrating the comparative model evaluation procedure.

Figure 2. Temporal variation in global solar irradiance G, sunshine number SSN, and sunshine stability number SSSN at Sonnblick over four days in 2025: (a) 26 May, (b) 27 May, (c) 30 May and (d) 31 May. SSN and SSSN switch between 0 and 1.

Figure 3. Boxplot representation of (a) global solar irradiance G, (b) daily mean of sunshine number SSN, and (c) daily mean of sunshine stability number SSSN. Stations are identified by the BSRN indices listed in Table 1.

Figure 4. Radar chart showing the normalized root mean square error registered by the models at each station for each month. In the plot, the stations are arranged radially and labeled as CODmm, where COD denotes the BSRN station index and mm indicates the month in which the testing was conducted (see Table 1).

Figure 5. (a) Normalized root mean square error nRMSE, (b) normalized mean bias error nMBE, and (c) Skill Score of the XTRc model at the BSRN stations listed in Table 1. For each station, the statistical metrics are shown for three classes of solar elevation angle h: LOW, MID, and HIGH.

Figure 6. Percentage of forecasts P accurate to within a tolerance interval T centered on the measured value.

Figure 7. Temporal variation in global solar irradiance G, actual and predicted, at Desert Rock over four days in 2020: (a) 23 August, (b) 25 August, (c) 28 August and (d) 30 August.

Figure 8. Temporal variation in global solar irradiance G, actual and predicted, at Sonnblick over four days in 2025: (a) 26 May, (b) 27 May, (c) 30 May and (d) 31 May.

Figure 9. Contour plot of the nRMSE as a function of daily mean SSN and daily mean SSSN, based on measurements from the eight locations listed in Table 1.

Figure 10. Radar chart showing the normalized root mean square error registered by the models XTRc, SVM and MLP at each station for each month.

Table 1. BSRN radiometric stations providing data used in this study. The climate at each station is indicated according to the Köppen–Geiger scheme [22]. (B—arid; C—temperate; E—polar and alpine climates).

Station/Location	BSRN Index	Latitude [deg]	Longitude [deg]	Altitude [m]	Climate	Period (Year)	Period (Months)
Cabauw, Netherlands	CAB	51.9680	4.9280	0	C	2025	04, 08
Budapest, Hungary	BUD	47.429	19.182	139	C	2024	08, 09
Cener, Spain	CNR	42.816	−1.601	471	C	2025	03, 07
Desert Rock, Nevada, USA	DRA	36.626	−116.018	1007	B	2020	02, 08
Boulder, Colorado, USA	BOS	40.125	−105.237	1689	B	2019	03, 07
Izaña, Tenerife, Spain	IZA	28.309	−16.499	2372	C	2025	02, 07
Sonnblick, Austria	SON	47.054	12.957	3108	E	2025	05, 07
Yushan, Taiwan	YUS	23.487	120.959	3858	E	2022	06, 11

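Table 2. Statistical indicators of accuracy for the XTRc model tested at the stations listed in Table 1. Month 1 and Month 2 denote the two test months listed for each station in Table 1, and All denotes both months combined.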

Station	nRMSE (Month 1)	nMBE (Month 1)	Skill Score (Month 1)	nRMSE (Month 2)	nMBE (Month 2)	Skill Score (Month 2)	nRMSE (All)	nMBE (All)	Skill Score (All)
CAB	0.187	−0.004	0.134	0.407	0.024	0.212	0.300	0.008	0.195
BUD	0.175	0.007	0.175	0.271	0.016	0.157	0.210	0.010	0.166
CNR	0.385	−0.001	0.138	0.370	0.035	0.181	0.381	0.021	0.168
DRA	0.173	−0.003	0.221	0.154	−0.014	0.152	0.161	−0.010	0.182
BOS	0.287	−0.020	0.087	0.394	0.052	0.133	0.354	0.020	0.121
IZA	0.062	−0.009	0.222	0.049	0.004	0.298	0.054	−0.000	0.265
SON	0.273	−0.005	0.149	0.520	0.073	0.166	0.354	0.020	0.157
YUS	0.387	−0.004	0.136	0.283	0.054	0.125	0.349	0.022	0.133
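Table 2 reports nRMSE and nMBE for each test month and for both months together. The normalization is defined elsewhere in the paper, so the sketch below makes two assumptions. First, both errors are normalized by the mean measured irradiance. Second, the All column is computed on the pooled samples of both months rather than by averaging the monthly scores. The data are hypothetical.

```python
import numpy as np


def nrmse(forecast, observed):
    """RMSE normalized by the mean measured value (assumed normalization)."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((forecast - observed) ** 2)) / np.mean(observed))


def nmbe(forecast, observed):
    """Mean bias error normalized by the mean measured value (assumed normalization)."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean(forecast - observed) / np.mean(observed))


def pooled(metric, months):
    """Evaluate a metric on the samples of all test months together (assumed 'All')."""
    forecast = np.concatenate([np.asarray(f, dtype=float) for f, _ in months])
    observed = np.concatenate([np.asarray(o, dtype=float) for _, o in months])
    return metric(forecast, observed)


# Hypothetical (forecast, observed) pairs in W/m^2 for two test months
months = [([110, 190], [100, 200]), ([330, 290], [300, 300])]
for i, (f, o) in enumerate(months, start=1):
    print(f"Month {i}: nRMSE = {nrmse(f, o):.3f}, nMBE = {nmbe(f, o):.3f}")
print(f"All: nRMSE = {pooled(nrmse, months):.3f}, nMBE = {pooled(nmbe, months):.3f}")
```

Under this assumption, the pooled value is generally not the average of the two monthly values, because both the squared errors and the normalizing mean are recomputed over all samples.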

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Velimirovici, L.; Paulescu, E.; Paulescu, M. Short-Term Solar Irradiance Forecasting Using Random Forest-Based Models with a Focus on Mountain Locations. Energies 2026, 19, 769. https://doi.org/10.3390/en19030769


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
