Estimation of Solar Diffuse Radiation in Chongqing Based on Random Forest

Wan, Peihan; He, Yongjian; Zheng, Chaoyu; Wen, Jiaxiong; Gu, Zhuting

doi:10.3390/en18040836

by Peihan Wan ¹, Yongjian He ¹,*, Chaoyu Zheng ², Jiaxiong Wen ¹ and Zhuting Gu ¹

¹ School of Geographical Sciences, Nanjing University of Information Science and Technology, Nanjing 210044, China

² Fujian Provincial Climate Center, Fujian Provincial Meteorological Bureau, Fuzhou 350001, China

* Author to whom correspondence should be addressed.

Energies 2025, 18(4), 836; https://doi.org/10.3390/en18040836

Submission received: 21 January 2025 / Revised: 8 February 2025 / Accepted: 10 February 2025 / Published: 11 February 2025

(This article belongs to the Section A2: Solar Energy and Photovoltaic Systems)

Abstract

Solar diffuse radiation (DIFRA) is an important component of solar radiation, but research into its estimation remains relatively limited. Based on remote sensing data, topographic data, meteorological reanalysis data, and measurements from radiation observation stations in Chongqing, this study combined key factors such as the solar elevation angle, water vapor, aerosols, and cloud cover. A high-precision DIFRA estimation model was developed using the random forest algorithm, and a distributed simulation of DIFRA in Chongqing was achieved. The model was validated using 8179 measured data points, demonstrating good predictive capability with a coefficient of determination (R²) of 0.72, a mean absolute error (MAE) of 35.99 W/m², and a root mean square error (RMSE) of 50.46 W/m². Further validation was conducted at 14 radiation observation stations, with the model demonstrating high stability and applicability across different stations and weather conditions. In particular, the model fit best under cloudy conditions, with R² = 0.70, MAE = 32.20 W/m², and RMSE = 47.51 W/m². The results indicate that the model can be effectively adapted to all-weather calculations, providing a scientific basis for assessing and exploiting solar energy resources in complex terrains.

Keywords:

diffuse radiation; random forest; spatiotemporal distribution; all weather

1. Introduction

Solar diffuse radiation (DIFRA) is a key component of solar radiation reaching the Earth’s surface, and its accurate estimation is crucial for climate change research, solar energy resource assessment, ecological environment protection, and agricultural management [1,2,3]. DIFRA is generated by the scattering of solar radiation by gases, aerosols, and cloud layers in the atmosphere [4]. This process is influenced by various factors, including the solar elevation angle (SEA), atmospheric composition, and surface characteristics [5,6]. The complexity of these factors presents significant challenges in estimating DIFRA. Traditional solar radiation estimation methods can be categorized into physical models and empirical models [7].

Physical models mainly include two types: parameterized models and atmospheric radiation transfer-based models. Parameterized models calculate solar shortwave radiation under various weather conditions using fixed meteorological parameters through numerous physical formulas and mathematical computations [8]. Examples include the Bird model [9,10], the Iqbal model [11], and the REST2 model [12], among others. The other type of physical model is based on atmospheric radiation transfer theory, with models such as RSTAR [13], MRM [14], MODTRAN [15], SMARTS2 [16,17], and SBDART [18], which have been widely used in radiation-related studies by scholars both domestically and internationally. Although physical models account for complex interactions among various factors, the difficulty of acquiring their input parameters and their complex structure lead to poor model portability and low general applicability.

Empirical models estimate DIFRA by establishing the empirical relationships among conventional meteorological factors, such as sunshine hours, radiation, temperature, water vapor, cloud cover, and surface observed radiation [5,19]. The two most commonly used models are those based on the DIFRA ratio (Kd) and the clear-sky index (Kt). The relationship between Kd and Kt was first established by Liu and Jordan [20] in 1960. Spencer et al. [21] proposed a modified model based on latitude and Kd, using observations from five stations in Australia. Based on total solar radiation data, Jamil and Akhtar [22,23] developed empirical models to estimate the monthly mean DIFRA for the subtropical climate of India. Vernet and Fabregat [24] evaluated 23 existing empirical models for estimating daily solar radiation on the Northeast Coast of the Iberian Peninsula and concluded that models based on cloud cover performed better. Mubiru and Banda [25] compared various empirical models and concluded that models based on Kd and relative sunshine hours provided better estimation results. Wang et al. [26] constructed 97 multivariable DIFRA estimation models using over 20 meteorological and geographic parameters, including the clear-sky index, atmospheric quality, relative humidity, and daily temperature. Empirical models are simple to calculate, but they rely heavily on ground-based meteorological station data, and in complex mountainous areas, where station density is relatively sparse, this limits their ability to accurately capture the spatial distribution characteristics of radiation [27].

With the rapid development of remote sensing and computer technology, machine learning has demonstrated strong capabilities in dealing with nonlinear problems, providing more flexible and efficient solutions than traditional methods [28]. Machine learning models can predict DIFRA by capturing the nonlinear relationships among multiple meteorological factors, geographical factors, and DIFRA [29].

Demircan et al. [30] improved the Angstrom–Prescott model using the Artificial Bee Colony algorithm, significantly reducing the error in radiation estimation. Barancsuk et al. [31] used sky camera images combined with convolutional neural networks (CNNs) to estimate minute-level DIFRA, with an average R² of 0.87. Wei and Yang [32] developed models combining supervised and unsupervised learning, based on k-means and fuzzy C-means clustering, and established a real-time forecasting system capable of predicting solar radiation for the next 12 h. Mustafa et al. [33] used both random forest and k-nearest neighbor models to estimate DIFRA in different climatic regions of India, achieving R² values above 0.95. Lou et al. [34] applied boosted regression trees (BRTs) to estimate DIFRA in Hong Kong and performed sensitivity analysis on meteorological variables, finding that solar elevation, temperature, and cloud cover were crucial to the estimation results. Feng et al. [35] employed an extreme learning machine (ELM), a genetic algorithm-optimized backpropagation neural network (GANN), random forest (RF), and a generalized regression neural network (GRNN) to estimate daily DIFRA in the North China Plain. All four models outperformed the Iqbal physical model. Ramírez-Rivera et al. [36] investigated the application of ensemble learning algorithms (ELAs) for solar radiation prediction in Santo Domingo, developing a high-resolution, minute-level solar radiation model. Lu et al. [37] leveraged a generative adversarial network (GAN) model to estimate the radiation distribution in the Boston metropolitan area, generating a 2 m resolution spatial solar radiation distribution map with enhanced accuracy. Gao et al. [38] accounted for both temporal and spatial dependencies in solar radiation prediction by unfolding sequences and applying a Transformer model, demonstrating substantial performance gains over conventional neural networks. He et al. [39] proposed a hybrid BiLSTM-Transformer model, which effectively extracted deep features and captured long-term dependencies, significantly improving the accuracy of long-term solar radiation forecasts compared to traditional models.

Due to their high accuracy, flexible input variable combinations, and wide applicability, machine learning models have gradually become a widely used and efficient method for estimating solar radiation components. However, most radiation studies focus primarily on the estimation and analysis of total and direct solar shortwave radiation, with relatively few studies specifically modeling DIFRA components. Most of these studies concentrate on simulating monthly and daily DIFRA, while studies with high temporal resolution remain scarce. Moreover, most studies primarily focus on exploring meteorological factors, with insufficient consideration given to topographic factors, and many of the variables used are difficult to obtain. Therefore, this study comprehensively considers input features from three aspects—astronomical elements, atmospheric composition, and topographic factors—to extract DIFRA-related variables from readily available datasets and develop an hourly DIFRA model suitable for complex mountainous regions.

In this study, we used remote sensing, topographic, and meteorological reanalysis data to extract meteorological and topographic feature variables. We proposed a grid-based SEA calculation method, assigning accurate SEA values to each grid to extract high-precision astronomical factors. We then combined this with measured data from radiation observation stations in Chongqing to construct a prediction sample set. A high-temporal-precision (hourly) DIFRA estimation model was developed using the random forest algorithm to address gaps in the existing research. Using this model, the spatiotemporal distribution of surface solar DIFRA in Chongqing was simulated and validated against observation data from ground-based meteorological stations. Further analysis was conducted on the influence of key factors, such as SEA, water vapor, aerosols, and cloud cover, on model errors. Additionally, the spatiotemporal distribution patterns of DIFRA were investigated in relation to topographic factors. Compared with traditional empirical models and other machine learning methods for estimating DIFRA, this study demonstrates some advantages: First, the hourly high temporal resolution model used here can accurately capture instantaneous variations in DIFRA in complex terrains, whereas traditional models, typically based on daily or monthly data, fail to reflect such fine-scale dynamic changes. Second, this study uses easily accessible open data with broad spatial coverage, extracting features in three dimensions (astronomical, meteorological, and topographic). This enhances both the usability of the data and the future portability of the model, allowing for more detailed spatiotemporal distribution simulations. The results provide scientific evidence and technical support for solar energy resource assessment, climate change research, and environmental protection in areas with complex terrain.

2. Materials and Methods

2.1. Study Area

This study focuses on Chongqing as the research area, which is located between 105°17′ E and 110°11′ E, and 28°10′ N and 32°13′ N, and covers an area of approximately 82,400 square kilometers, as illustrated in Figure 1. The topography of Chongqing is characterized by a predominance of mountains and hills, with higher elevations concentrated in the southeast and lower elevations situated in the northwest. This topography exhibits substantial elevation variations, ranging from a minimum of 16 m to a maximum of 2778 m. The region’s topography is marked by the presence of deep valleys and ravines, which exhibit notable topographic undulations. These undulations significantly influence the meteorological conditions across different altitudes and terrain units, thereby intensifying the complexity of the spatial distribution of DIFRA. Consequently, the unique topographic and meteorological characteristics of Chongqing make it an optimal region for studying the distribution patterns of solar DIFRA, providing a solid foundation for DIFRA estimation research.

Figure 1 shows a digital elevation model (DEM) of Chongqing, illustrating the elevation changes and topographic features of the area. The black pentagrams in the figure mark the radiation observation stations, which are distributed across different elevations and terrain conditions and are used to collect solar radiation observation data.

2.2. Data and Preprocessing

The data for this study are listed in Table 1 and include the following:

(1) Ground Observation Data: These data included observations from 14 radiation observation stations in Chongqing from June to December 2016, providing hourly instantaneous values of DIFRA. These observations provided important field references for the verification and calibration of the solar radiation estimation model.

(2) DEM Data: The elevation data were sourced from the SRTM data measured jointly by NASA and the U.S. National Geospatial-Intelligence Agency, with a spatial resolution of 30 m. This dataset was selected because SRTM data provide high-precision topographic information that effectively reflects the impact of complex terrain on solar radiation, making them particularly suitable for radiation simulation in mountainous and other complex terrains.

(3) ERA5 Dataset: ERA5-Land hourly data from 1950 to the present (ERA5 Land Data) and ERA5 hourly data on single levels from 1940 to the present (ERA5 single data) were obtained. The ERA5 Land Data provided key meteorological parameters, such as the 2 m dewpoint temperature (DTEM), 2 m temperature (TEM), surface temperature, and surface pressure (SP), while the ERA5 single data provided high cloud cover (HCC), medium cloud cover (MCC), and low cloud cover (LCC), which are important for simulating and analyzing solar radiation. The ERA5 dataset was selected because it provides spatially continuous global meteorological data at high temporal resolution, offering detailed meteorological parameters for simulating DIFRA.

(4) Ozone Data: This dataset was provided by the National Cryosphere Desert Data Center (http://www.ncdc.ac.cn (accessed on 14 August 2024)) and includes the “China Surface Hourly Ozone (HrSOD) Dataset (2005–2020)”, published by Zhang Wenxiu et al. [40]. This dataset was selected because ozone has a significant effect on the absorption and scattering of solar radiation, and the high spatiotemporal resolution of this China-wide dataset allows the impact of ozone variations on DIFRA to be reflected more accurately.

(5) Aerosol Data: This study used the high-resolution aerosol optical depth (AOD) and PM2.5 concentration dataset (LGHAP v2) [41] published by Bai et al. and provided through the Zenodo platform (https://zenodo.org/communities/ecnu_lghap (accessed on 14 August 2024)). This dataset was selected for its global coverage, absence of data gaps, and spatial resolution of up to 0.01°. It supplied the AOD data for this study, which matter because AOD directly affects the scattering and absorption of solar radiation.

In this study, the datasets used to extract the topographic factors and meteorological parameters are publicly available and have extensive spatial coverage, providing a solid foundation for future model transferability.

Before the experiment began, the ground observation data were subjected to quality control, and outliers were removed. Considering that the other datasets had different temporal and spatial resolutions, the mismatch in resolutions may have led to scale effects during model construction, thereby affecting the accurate simulation of the spatiotemporal distribution of DIFRA. Therefore, in the data preprocessing stage, resampling and interpolation methods were used to unify all data to a spatial resolution of 1 km and a temporal resolution of 1 h, and the required variable factors were computed. The relative humidity (RH) and vapor pressure difference (VP) were calculated from the DTEM, TEM, and SP, while the latitude, altitude, slope, and aspect were extracted from the DEM. The topographic factors obtained by resampling the 30 m DEM data to 1 km can still preserve some terrain detail, whereas the coarser spatial resolution of the ERA5 data may not fully capture the subtle effects of terrain on meteorological conditions. In addition, the temporal resolution of the data directly influences the model’s ability to capture the dynamic characteristics of radiation changes. Although this unification process may have resulted in the loss of some high-resolution details, it helped to reduce the errors introduced by resolution mismatches, ensuring consistency in the spatial scale of the model inputs and improving the accuracy and robustness of the model predictions.
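The humidity variables mentioned above can be illustrated with a short sketch. The source does not give the formulas used, so the following assumes the widely used Magnus approximation for saturation vapor pressure and interprets VP as the difference between saturation and actual vapor pressure; both choices, and all function names, are assumptions for illustration. The sketch also does not use SP, although the study lists it as an input to this calculation. ERA5 temperatures are supplied in Kelvin.

```python
import math


def saturation_vapor_pressure_hpa(temp_c: float) -> float:
    """Magnus approximation over liquid water (assumed formula); input in deg C, output in hPa."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def humidity_from_era5(tem_k: float, dtem_k: float) -> tuple[float, float]:
    """Return (RH in percent, vapor pressure difference in hPa).

    tem_k  : 2 m temperature (TEM) in Kelvin
    dtem_k : 2 m dewpoint temperature (DTEM) in Kelvin
    """
    tem_c = tem_k - 273.15
    dtem_c = dtem_k - 273.15
    e_sat = saturation_vapor_pressure_hpa(tem_c)   # saturation vapor pressure at TEM
    e_act = saturation_vapor_pressure_hpa(dtem_c)  # actual vapor pressure (saturation at DTEM)
    rh = 100.0 * e_act / e_sat
    vp_difference = e_sat - e_act
    return rh, vp_difference


# Example: 20 deg C air temperature with a 10 deg C dewpoint
rh, vpd = humidity_from_era5(293.15, 283.15)
```

Applied per grid cell and per hour, a function of this kind yields RH and VP fields on the same 1 km, 1 h grid as the other predictors.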

2.3. Methods

2.3.1. Grid-Based Solar Elevation Angle Calculation

SEA is a key factor influencing radiation. To accurately calculate SEA under complex topographic conditions, this study computed SEA for each grid point as follows. First, the sunrise and sunset times on the horizontal plane were determined. For any point P in the grid, the latitude of the point was obtained from the data. The sunset hour angle for any day of the year at that latitude was then calculated using Equations (1)–(3) [32].

ω_{0} = \arccos (- \tan φ \tan δ)

(1)

δ = 0.006894 - 0.399512 \cos τ + 0.072075 \sin τ - 0.006799 \cos 2τ + 0.000896 \sin 2τ - 0.002689 \cos 3τ + 0.001516 \sin 3τ

(2)

τ = \frac{2 π (D_{n} - 1)}{365}

(3)

In Equation (1), ω_{0} is the sunset hour angle on the horizontal plane, measured from true solar noon (positive toward the west and negative toward the east); φ is the latitude (in radians); and δ is the solar declination [11] (in radians), positive in the northern celestial hemisphere and negative in the southern celestial hemisphere. In Equation (3), τ is the day angle (in radians), determined by the day number D_{n}, which runs from 1 (1 January) to 365 (31 December).

Based on the above calculations, the illuminable time on the horizontal plane corresponds to an hour angle span of 2ω_{0}, which represents the astronomical illuminable time. To obtain SEA at any point P at any time, the steps below can be followed:

(1) Given the time step $ΔT$ in minutes (in this study, $ΔT = 60$ min, i.e., 1 h), calculate the corresponding hour angle step, as shown in Equation (4):

$Δω = \frac{2π}{24 \times 60} \cdot ΔT$

(4)

(2) Over the interval $[-ω_{0}, ω_{0}]$, divide the period between sunrise and sunset on the horizontal plane into n + 1 moments, using $Δω$ as the step size:

$[-ω_{0}, -ω_{0} + Δω, \dots, -ω_{0} + iΔω, \dots, -ω_{0} + (n - 1)Δω, ω_{0}], \quad n = \operatorname{int}\left(\frac{2ω_{0}}{Δω}\right) + 1$

(5)

(3) Calculate the solar hour angle $ω_{i}$ at each moment using Equation (6):

$ω_{i} = -ω_{0} + iΔω, \quad i = 0, 1, 2, \dots, n - 1$

(6)

(4) Determine the corresponding ${SEA}_{i}$ at each moment using Equation (7) [11]:

$\sin {SEA}_{i} = \sin φ \sin δ + \cos φ \cos δ \cos ω_{i}$

(7)
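The procedure in Equations (1)–(7) can be sketched for a single grid cell as follows. This is a minimal illustration rather than the implementation used in the study: the function names are hypothetical, the time step is taken in minutes so that the 24 × 60 denominator in Equation (4) gives 15° of hour angle per hour, and the clamping of the arccos and arcsin arguments is an added safeguard that is not part of the source equations.

```python
import math


def declination(day_number: int) -> float:
    """Solar declination in radians, Equations (2) and (3)."""
    tau = 2.0 * math.pi * (day_number - 1) / 365.0
    return (0.006894 - 0.399512 * math.cos(tau) + 0.072075 * math.sin(tau)
            - 0.006799 * math.cos(2 * tau) + 0.000896 * math.sin(2 * tau)
            - 0.002689 * math.cos(3 * tau) + 0.001516 * math.sin(3 * tau))


def sunset_hour_angle(lat_rad: float, delta: float) -> float:
    """Sunset hour angle on the horizontal plane, Equation (1)."""
    x = -math.tan(lat_rad) * math.tan(delta)
    x = max(-1.0, min(1.0, x))  # added safeguard for polar day/night
    return math.acos(x)


def sea_series(lat_deg: float, day_number: int, step_minutes: float = 60.0):
    """Return a list of (hour angle, SEA) pairs in radians from sunrise to sunset."""
    phi = math.radians(lat_deg)
    delta = declination(day_number)
    omega0 = sunset_hour_angle(phi, delta)
    d_omega = 2.0 * math.pi / (24 * 60) * step_minutes   # Equation (4)
    n = int(2.0 * omega0 / d_omega) + 1                  # Equation (5)
    omegas = [-omega0 + i * d_omega for i in range(n)]   # Equation (6)
    omegas.append(omega0)                                # final moment: sunset
    series = []
    for omega in omegas:
        sin_sea = (math.sin(phi) * math.sin(delta)
                   + math.cos(phi) * math.cos(delta) * math.cos(omega))  # Equation (7)
        sin_sea = max(-1.0, min(1.0, sin_sea))
        series.append((omega, math.asin(sin_sea)))
    return series


# Example: a cell at 29.5 deg N (roughly central Chongqing) on day 172
series = sea_series(29.5, 172)
```

In a gridded setting, the same function would be evaluated with the latitude of each cell, and the sine of the resulting SEA would serve as the astronomical predictor.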

2.3.2. Random Forest Algorithm

The random forest (RF) regression model predicts the target variable DIFRA by integrating multiple decision trees. Each decision tree is a feature-based nonlinear function, and the final prediction of the model is the average of the predictions from all decision trees [42]. Assume that X = (x_{1}, x_{2}, \dots, x_{m}) is the input feature vector, where each x_{i} corresponds to a specific feature (ozone, VP, TEM, slope, AOD, etc.). The final prediction f(X) is represented by Equation (8):

f (X) = \frac{1}{N} \sum_{k = 1}^{N} T_{k} (X)

(8)

where f(X) is the output of the RF regression model and represents the predicted DIFRA value, N is the number of decision trees, and T_{k}(X) is the prediction of the kth decision tree on the input X. The prediction of each tree depends on the splitting and weighting of the features. By integrating the predictions from multiple trees, the random forest model is able to capture the complex relationships among features and improve prediction accuracy.
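Equation (8) amounts to a plain average over the individual tree outputs. The toy sketch below makes this concrete with three hand-written single-split "trees"; the thresholds, leaf values, and feature ordering are invented for illustration and are not taken from the trained model.

```python
from typing import Callable, List, Sequence

Tree = Callable[[Sequence[float]], float]


def make_stump(feature_index: int, threshold: float, left: float, right: float) -> Tree:
    """Build a one-split regression tree: return `left` if the feature is <= threshold, else `right`."""
    def tree(x: Sequence[float]) -> float:
        return left if x[feature_index] <= threshold else right
    return tree


def forest_predict(trees: List[Tree], x: Sequence[float]) -> float:
    """Equation (8): f(X) = (1/N) * sum of T_k(X) over all N trees."""
    return sum(tree(x) for tree in trees) / len(trees)


# Hypothetical forest; features are [SEA(sin), MCC] and leaf values are in W/m^2
trees = [
    make_stump(0, 0.5, 80.0, 220.0),
    make_stump(1, 0.3, 150.0, 90.0),
    make_stump(0, 0.7, 100.0, 250.0),
]
prediction = forest_predict(trees, [0.6, 0.2])  # averages 220.0, 150.0, and 100.0
```

Real regression trees have many splits and are fitted on bootstrap samples, but the aggregation step is the same average.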

2.3.3. Model Development

The estimation of DIFRA is influenced by various factors, including astronomical, atmospheric, and topographic factors. The dataset was structured to incorporate the following variables: (1) astronomical factors: the sine of SEA, denoted SEA(sin); (2) atmospheric components: cloud cover (HCC, MCC, LCC), water vapor components (VP, RH), temperature (TEM), surface pressure (SP), ozone (Ozone), and aerosol optical depth (AOD); (3) topographic factors: altitude, aspect, slope, and latitude. Figure 2 illustrates the correlations between these factors and DIFRA. Variables with a correlation coefficient greater than 0.1 were retained, while those with a correlation below 0.1 (HCC, altitude, aspect, slope, and latitude) were grouped for further screening. Various combinations of these low-correlation factors were analyzed using the Gini index in a random forest model to determine feature importance. Removing irrelevant variables helps reduce model complexity and data noise, thereby enhancing training efficiency. As a result, HCC and latitude were excluded, leaving 12 variables for model training.

To construct the DIFRA estimation model, the sample data were randomly allocated, with 70% used as the training set for model training and 30% as the validation set for model accuracy validation. Figure 3 shows the main technical flowchart of this study. When building the model, the key random forest parameters selected were as follows: n_estimators, max_depth, min_samples_leaf, and max_features. The model was built within the threshold range, as shown in Table 2, using a grid search, and the optimal parameters were determined based on model accuracy. The final optimal parameters were as follows: n_estimators: 800; max_depth: 30; min_samples_leaf: 1; max_features: 3.
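The sampling and tuning setup described above can be outlined with standard-library Python. The candidate values in the grid below are placeholders, since the actual threshold ranges are those in Table 2 and are not reproduced here; the helper names are likewise hypothetical. Only the 70/30 split ratio, the four parameter names, and the reported optimum come from the text.

```python
import itertools
import random


def train_validation_split(samples, train_fraction=0.7, seed=0):
    """Randomly allocate samples to a training set and a validation set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Placeholder search grid (assumed values; see Table 2 for the ranges actually used)
PARAM_GRID = {
    "n_estimators": [200, 500, 800],
    "max_depth": [10, 20, 30],
    "min_samples_leaf": [1, 2, 4],
    "max_features": [2, 3, 4],
}


def iter_param_combinations(grid):
    """Yield every parameter combination in the grid as a dict."""
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


train, validation = train_validation_split(range(100))
combinations = list(iter_param_combinations(PARAM_GRID))
reported_optimum = {"n_estimators": 800, "max_depth": 30,
                    "min_samples_leaf": 1, "max_features": 3}
```

In a full grid search, each combination would be used to fit a random forest on the training set and scored on held-out data, keeping the combination with the best accuracy. The four parameter names match those of scikit-learn's RandomForestRegressor, although the text does not state which library was used.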

2.3.4. Model Validation Method

To ensure the accuracy of the DIFRA estimation, this study used the coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE) as the evaluation metrics for the model. The mathematical expressions are as follows:

(1) Coefficient of Determination (R²):

R^{2} = 1 - \frac{\sum_{i = 1}^{n} ({DIFRA}_{i} - \widehat{DIFRA}_{i})^{2}}{\sum_{i = 1}^{n} ({DIFRA}_{i} - \overline{DIFRA})^{2}}

(9)

where {DIFRA}_{i} is the observed value, \widehat{DIFRA}_{i} is the predicted value, \overline{DIFRA} is the mean of the observed values, and n is the sample size. The closer the R^{2} value is to 1, the stronger the model’s explanatory power.

(2) Mean Absolute Error (MAE):

MAE = \frac{1}{n} \sum_{i = 1}^{n} | {DIFRA}_{i} - \widehat{DIFRA}_{i} |

(10)

where MAE is the mean absolute error between the predicted values and the actual values. The smaller the MAE value, the higher the accuracy of the model.

(3) Root Mean Square Error (RMSE):

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} ({DIFRA}_{i} - \widehat{DIFRA}_{i})^{2}}

(11)

where RMSE measures the square root of the mean squared differences between the predicted values and the actual values. The smaller the RMSE value, the smaller the model’s error.
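The three metrics in Equations (9)–(11) can be computed directly from paired observed and predicted values. The sketch below uses plain Python, and the sample numbers are invented purely to show the arithmetic.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, Equation (9)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(observed, predicted):
    """Mean absolute error, Equation (10)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_square_error(observed, predicted):
    """Root mean square error, Equation (11)."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)


# Illustrative values in W/m^2 (not study data)
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]
r2 = r_squared(observed, predicted)                 # 1 - 300/20000 = 0.985
mae = mean_absolute_error(observed, predicted)      # 10.0
rmse = root_mean_square_error(observed, predicted)  # 10.0
```

Because RMSE squares the residuals before averaging, it is never smaller than MAE, and the gap widens when a few samples have large errors.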

3. Results and Discussion

3.1. Model Accuracy Validation

3.1.1. Overall Accuracy Validation of Diffuse Radiation Model

As shown in Figure 4, there is a strong correlation between the predicted DIFRA values and the observed DIFRA values. Specifically, the R² value reached 0.72, indicating that the model has strong predictive capability. The MAE was 35.99 W/m², and the RMSE was 50.46 W/m²; the gap between the two suggests that a subset of data points with large deviations contributed disproportionately to the overall error. The fitted line shows that the predicted values were systematically underestimated in the higher radiation range, indicating that the model performed less well at high radiation values. In this range, the data points are more scattered, and the model’s error is larger. Overall, the estimation results reflect the spatial variation trend of DIFRA in Chongqing well and meet the requirements for spatiotemporal distribution characteristic analysis in this study.

3.1.2. Accuracy Analysis by Station

To further understand the model’s performance at each station, the model accuracy at various sites was analyzed in detail based on Figure 5. The stations shown in Figure 5a–d performed excellently, with R² values of 0.78, 0.74, and 0.76, respectively. Both MAE and RMSE are lower than the overall average, with the MAE and RMSE for Changshou station (Figure 5a) being 32.27 W/m² and 45.33 W/m², respectively, significantly lower than the overall level. The model fitting was good, indicating stable performance and strong predictive capability at these stations. For the stations shown in Figure 5e–i, the R² values were all above 0.7, indicating stable predictive capability with controllable errors. In Figure 5j–n, the R² values ranged from 0.67 to 0.69, with the MAE and RMSE showing larger fluctuations, indicating some limitations. Although there were some prediction deviations at individual stations, such as Fulin (Figure 5m) and Dazu (Figure 5n), these deviations mainly occurred in high-radiation-value regions and had limited impact on the overall trend. Overall, the model demonstrates good generalization ability and stability in Chongqing, meeting the requirements for spatiotemporal distribution characteristic analysis and providing reliable support for regional radiation characteristic studies.

3.1.3. Hourly Accuracy Analysis

Figure 6 displays the daily variation characteristics of DIFRA, showing a typical “bell-shaped” distribution: radiation levels are lower in the early morning and evening, reaching a peak at noon. The radiation is lowest during the 6:00 period, with the median below 50 W/m². As time progresses, radiation gradually increases, with the medians for 7:00 and 8:00 periods reaching approximately 100 W/m² and 150 W/m², respectively. From 10:00 to 14:00, radiation reaches its peak, with the median around 200 W/m² and the maximum approaching 400 W/m², reflecting the high variability of radiation around noon, which is significantly influenced by solar intensity and weather conditions. After 15:00, radiation levels begin to decline, but variability remains notable. By 17:00 and 18:00, radiation intensity rapidly decreases, with the medians dropping to 100 W/m² and 50 W/m², respectively, and the distribution range gradually narrows. The box plot shows that the model performed relatively consistently in the early morning (6:00–8:00) and evening (17:00–18:00), with a smaller median and narrower distribution range. However, during the peak period (10:00–14:00), when radiation variability increases, the model’s prediction error became relatively larger, likely influenced by local weather conditions and the complex terrain. Additionally, some outliers in specific periods (such as 8:00, 15:00, and 17:00) are associated with rapid local weather changes, which increase the difficulty of model prediction.

Overall, the model was able to capture the daily variation characteristics of DIFRA well, but its prediction stability during high-radiation periods requires further optimization.

3.1.4. Accuracy Analysis Under Clear-Sky and Cloudy-Sky Conditions

The model was constructed to provide an overall framework to cover all radiation periods under all weather conditions. To further explore the model’s stability under different weather conditions, cloud cover of 0.3 was used as the dividing threshold, where cloud cover below 0.3 was defined as clear-sky conditions, and cloud cover above 0.3 was defined as cloudy-sky conditions. This classification allowed for an in-depth evaluation of the model’s accuracy under different weather types. As observed in Figure 7, under clear-sky conditions (Figure 7a), the prediction interval was relatively wide, particularly in the high-radiation range, where the data points are more dispersed. Although the model achieved a coefficient of determination of R² = 0.69, the prediction error remained relatively large. This indicates that the model has greater uncertainty in predicting high-radiation levels under clear-sky conditions. In contrast, under cloudy conditions (Figure 7b), the prediction interval was significantly narrower, and the model’s fit improved to R² = 0.70. The data points are more concentrated, suggesting that the model’s radiation predictions were more stable, with a notable reduction in errors (MAE = 32.20 W/m², RMSE = 47.51 W/m²). This demonstrates that the model performs with higher accuracy under cloudy conditions, with the narrower prediction interval reflecting reduced uncertainty.

Overall, the model exhibited higher stability, lower uncertainty, and smaller errors under cloudy conditions, which may be attributed to the regulatory effect of clouds on the radiative transfer process. Clouds induce scattering, absorption, and reflection, making the DIFRA reaching the surface more uniform and stable, thereby reducing the uncertainty caused by fluctuations in direct radiation. In contrast, under clear-sky conditions, direct radiation dominates, and factors such as atmospheric water vapor and aerosols introduce significant local heterogeneity and instability in the radiative transfer process, resulting in increased prediction difficulty and errors.
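The weather-type evaluation in this subsection can be reproduced in outline by partitioning the validation samples at the 0.3 cloud cover threshold and scoring each group separately. In the sketch below, the field names and sample values are hypothetical, the text does not specify which cloud cover variable was thresholded, and assigning a cloud cover of exactly 0.3 to the cloudy group is an assumption.

```python
def split_by_cloud_cover(samples, threshold=0.3):
    """Partition samples into (clear-sky, cloudy-sky) lists by cloud cover fraction."""
    clear, cloudy = [], []
    for sample in samples:
        # Assumption: a cloud cover exactly equal to the threshold counts as cloudy
        if sample["cloud_cover"] < threshold:
            clear.append(sample)
        else:
            cloudy.append(sample)
    return clear, cloudy


def group_mae(samples):
    """Mean absolute error of predicted versus observed DIFRA within one group."""
    return sum(abs(s["observed"] - s["predicted"]) for s in samples) / len(samples)


# Illustrative samples in W/m^2 (not study data)
samples = [
    {"cloud_cover": 0.1, "observed": 180.0, "predicted": 150.0},
    {"cloud_cover": 0.2, "observed": 90.0, "predicted": 100.0},
    {"cloud_cover": 0.3, "observed": 210.0, "predicted": 200.0},
    {"cloud_cover": 0.9, "observed": 120.0, "predicted": 125.0},
]
clear, cloudy = split_by_cloud_cover(samples)
clear_mae = group_mae(clear)    # (30 + 10) / 2 = 20.0
cloudy_mae = group_mae(cloudy)  # (10 + 5) / 2 = 7.5
```

The same partition can be passed to R² and RMSE functions to obtain the full set of per-condition scores.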

3.1.5. Analysis of the Impact of Model Factors

As shown in Figure 8, the feature importance of each variable is provided, which was calculated using the Gini index method in the random forest model. Based on this analysis, SEA(sin) achieved the highest importance value of 0.319, making it the dominant factor under all radiation conditions. This indicates that SEA(sin) contributes most significantly to the model’s predictive power and plays a crucial role in accurately estimating DIFRA under different radiation conditions. The second most important factor was MCC, with an importance of 0.106, exceeding 0.1, suggesting that MCC plays a significant role in predicting DIFRA. Other factors, including TEM, LCC, ozone, RH, AOD, and VP, had relatively similar importance, indicating that their influence on model performance was balanced. Topographic factors contributed the least to the model. It is worth noting that although the contributions of altitude and slope were relatively low, removing these variables during the model construction process decreased prediction accuracy. This indicates that the inclusion of topographic factors enhances the detailed estimation of DIFRA. Based on feature importance, the top eight ranked factors were selected to further analyze their relationship with the radiation errors (Figure 9).

Figure 9 shows the variation trends of different factors with respect to the observed radiation values, along with the corresponding model errors in each radiation value region. In the low radiation region (<100 W/m²), the MAE of the model remains at a low level and the overall performance is stable, indicating that the model can effectively capture the DIFRA characteristics under low radiation conditions. At this stage, RH (Figure 9f) and VP (Figure 9g) have a weak driving effect on the errors, while MCC (Figure 9b) and LCC (Figure 9d) show some correlation, and SEA(sin) (Figure 9a) has little influence on the errors. In the medium-radiation region (100–200 W/m²), the model error increases slightly but remains stable overall. The driving effects of RH and VP become more significant and become key factors, while the influence of MCC and LCC gradually weakens. The change in AOD (Figure 9h) becomes steady and has a minimal impact on the errors. SEA(sin) increases at a faster rate, gradually becoming an important driving factor for the errors. In the high-radiation region (>200 W/m²), the model error increases significantly, and the variability also intensifies, with SEA(sin) becoming the dominant driving factor and having the most significant impact on the errors. At the same time, the influence of RH and VP further increases, while the correlation of MCC and LCC significantly weakens, suggesting that the model has some robustness against changes in cloud cover. Moreover, the volatility of AOD increases, exhibiting complex nonlinear effects. The continued increases in TEM (Figure 9c) and the ozone (Figure 9e) also contribute to the growth of the errors.

Overall, SEA(sin) is the core driving factor of errors in the high-radiation region, while RH and VP exhibit significant influence in the medium- to high-radiation regions. In contrast, the roles of MCC and LCC gradually weaken under high-radiation conditions, and the model is stable under low- and medium-radiation conditions. Future optimization should focus on the combined effects of SEA(sin), RH, and VP, while also exploring the nonlinear effects of AOD and ozone, to further improve the prediction accuracy and stability of the model in high-radiation regions.

3.2. Spatiotemporal Distribution Characteristics of Diffuse Radiation

By integrating Figure 10 and Figure 11, we systematically analyzed the spatiotemporal variations of DIFRA in Chongqing and its relationship with slope aspect, interpreting these patterns from the perspective of radiative energy transfer and atmospheric scattering mechanisms to enhance the physical credibility of the results.

On a temporal scale, the distribution of diffuse radiation is regulated by SEA, atmospheric scattering effects, and total solar radiation intensity, exhibiting distinct diurnal and seasonal variations. In summer, a higher SEA increases total solar radiation, while higher atmospheric water vapor and aerosol concentrations enhance Mie scattering, which increases DIFRA. At 14:00, DIFRA reaches its peak (Figure 10b,e), with some regions exceeding 150 W/m². In winter, as SEA decreases, total solar radiation is significantly reduced. Although Rayleigh scattering becomes more pronounced at shorter wavelengths, the overall energy input for diffuse radiation diminishes, leading to lower DIFRA intensity (Figure 10h,k). In the morning and evening, when the sun is near the horizon, the increased atmospheric path length enhances scattering effects. However, due to the overall reduction in total solar radiation, the actual amount of diffuse radiation received remains low, resulting in a significant drop in DIFRA levels at 08:00 and 18:00 (Figure 10c,j).
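The two scattering arguments above can be made concrete with a small sketch. Neither formula comes from this study: the relative optical air mass uses the Kasten and Young (1989) approximation, and the wavelength dependence uses the standard inverse-fourth-power law for Rayleigh scattering. Both are brought in here only to illustrate how path length and wavelength enter the reasoning.

```python
import math


def relative_air_mass(elevation_deg):
    """Relative optical air mass (Kasten-Young approximation), elevation in degrees."""
    if elevation_deg <= 0:
        raise ValueError("the sun must be above the horizon")
    return 1.0 / (
        math.sin(math.radians(elevation_deg))
        + 0.50572 * (elevation_deg + 6.07995) ** -1.6364
    )


def rayleigh_ratio(short_nm, long_nm):
    """How much more strongly the shorter wavelength is Rayleigh-scattered."""
    return (long_nm / short_nm) ** 4


m_zenith = relative_air_mass(90.0)   # sun overhead: about one atmosphere of path
m_low = relative_air_mass(10.0)      # sun near the horizon: several atmospheres
blue_vs_red = rayleigh_ratio(450.0, 650.0)
print(m_zenith, m_low, blue_vs_red)
```

At a 10° elevation the path is roughly five to six times longer than at zenith, so scattering per unit of incoming beam is stronger. The horizontal irradiance available to be scattered is much smaller, however, which is why DIFRA is still low at 08:00 and 18:00.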

On a spatial scale, the distribution of diffuse radiation is influenced by topography and slope aspect, with varying characteristics across different seasons and times of the day. Figure 11a illustrates the impact of slope aspect on DIFRA at different times of the day (08:00, 12:00, 14:00, and 18:00), revealing that DIFRA peaks shift across slopes as the solar azimuth angle changes. Figure 11b further reveals the seasonal impact of slope aspect on diffuse radiation: In summer, due to the high SEA, southeast (SE) and west (W)-facing slopes generally receive more diffuse radiation, whereas north-facing (N) slopes exhibit lower DIFRA values. This may be attributed to local terrain reflectivity effects, leading to spatial variations in DIFRA reception across different slopes. In winter, lower total solar radiation weakens the overall DIFRA levels, while enhanced Rayleigh scattering results in a more uniform angular distribution of sky radiation, thereby reducing the differences in DIFRA across different slope aspects (Figure 11b).
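The rose diagrams summarize DIFRA by slope aspect. A minimal sketch of that aggregation is given below, assuming aspect is expressed in degrees clockwise from north and grouped into eight 45° sectors. The sector scheme, function names, and sample values are assumptions for illustration, not the authors' processing chain.

```python
SECTORS = ("N", "NE", "E", "SE", "S", "SW", "W", "NW")


def aspect_sector(aspect_deg):
    """Map an aspect in degrees clockwise from north to one of eight compass sectors."""
    # Shifting by 22.5° centers each 45° sector on its compass direction.
    return SECTORS[int(((aspect_deg % 360.0) + 22.5) // 45.0) % 8]


def mean_by_sector(aspects_deg, difra):
    """Average DIFRA (W/m²) of the grid cells that fall in each aspect sector."""
    totals, counts = {}, {}
    for aspect, value in zip(aspects_deg, difra):
        sector = aspect_sector(aspect)
        totals[sector] = totals.get(sector, 0.0) + value
        counts[sector] = counts.get(sector, 0) + 1
    return {sector: totals[sector] / counts[sector] for sector in totals}


# Made-up grid cells: aspect in degrees and DIFRA in W/m².
aspects = [0.0, 350.0, 135.0, 140.0, 270.0]
values = [100.0, 110.0, 150.0, 160.0, 155.0]
sector_means = mean_by_sector(aspects, values)
print(sector_means)
```

Repeating the aggregation for each hour or season gives one set of sector means per panel of a rose diagram.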

The spatiotemporal distribution of diffuse radiation in Chongqing is jointly influenced by the solar elevation angle, slope aspect, atmospheric scattering mechanisms, and total solar radiation intensity. In summer, DIFRA levels are higher, with greater variability across slope aspects. In winter, the overall DIFRA levels decrease, and differences among slope aspects become less pronounced. These physical mechanisms collectively shape the spatiotemporal characteristics of diffuse radiation, providing essential scientific insights for solar energy resource assessment and photovoltaic site selection in complex terrain regions.

3.3. Analysis of Extreme Weather Changes

To evaluate the model’s predictive capability under rapidly changing atmospheric conditions, this study selected three days with drastic cloud cover variations as extreme weather cases and analyzed the model’s performance. Figure 12 shows the observed DIFRA, model predictions, and cloud cover variations over time.

In low to moderate DIFRA ranges, the model effectively follows the trend in the observed values, demonstrating strong fitting ability. However, during periods of rapid cloud cover transitions, the predictions exhibit a noticeable lag, indicating that the model may struggle to respond instantaneously to abrupt atmospheric changes. In high DIFRA ranges, the prediction errors increase significantly, with a general tendency to underestimate the actual values. This underestimation is particularly evident after a sudden drop in cloud cover, when the observed DIFRA rises rapidly, yet the model fails to capture this sharp transition, further exacerbating prediction deviations. Additionally, prediction errors are most pronounced during periods of drastic cloud cover changes, suggesting that the model’s adaptability remains limited when dealing with rapid atmospheric fluctuations.

The model errors observed under extreme weather conditions can be further explained using radiative transfer theory. DIFRA is primarily influenced by scattering, absorption, and transmission processes in the atmosphere. Under stable atmospheric conditions, radiative transfer remains relatively uniform, allowing the model to effectively capture DIFRA variations. However, when cloud cover changes rapidly, the redistribution of radiative flux significantly alters the ratio of direct to diffuse radiation, leading to sudden variations in atmospheric optical properties, such as AOD and RH.

During cloud dissipation, the abrupt reduction in cloud cover leads to an increase in direct radiation, which enhances surface DIFRA through multiple scattering mechanisms. However, because the model is trained on averaged meteorological conditions, it struggles to capture instantaneous changes caused by cloud cover transitions. This explains the prediction lag and the systematic underestimation of high DIFRA values, resulting in increased forecast uncertainty.
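The prediction lag discussed in this section could be quantified with a lagged correlation, as in the sketch below. This is one possible diagnostic under stated assumptions, not a method reported in the study: the series are made up, and the lag is measured in time steps of the data (hours for an hourly series).

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def best_lag(observed, predicted, max_lag=3):
    """Find the delay (in steps) at which predictions best match earlier observations."""
    scores = {}
    for lag in range(max_lag + 1):
        obs = observed[: len(observed) - lag]  # observations at time t
        pred = predicted[lag:]                 # predictions at time t + lag
        scores[lag] = pearson(obs, pred)
    return max(scores, key=scores.get), scores


# Made-up series (W/m²): the prediction repeats the observation one step late.
observed = [10.0, 20.0, 80.0, 150.0, 90.0, 40.0, 20.0, 10.0]
predicted = [10.0, 10.0, 20.0, 80.0, 150.0, 90.0, 40.0, 20.0]
lag, scores = best_lag(observed, predicted)
print(lag, scores)
```

A best lag of zero would indicate that the errors come from amplitude (for example, underestimated peaks) rather than timing, so the two effects described above can be separated.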

3.4. Comparison and Analysis of the Model Performance

In this study, we compared the performance of different models for hourly DIFRA prediction using the same dataset to ensure a fair evaluation. The results in Table 3 indicate that the RF model performed best, achieving the highest R² (0.7212) and the lowest prediction errors (RMSE = 50.4582 W/m², MAE = 35.9867 W/m²). This suggests that RF effectively captures the nonlinear characteristics of the data. In contrast, the Liu and Jordan model, a traditional empirical model, performed the worst, with R² = 0.2829 and considerably higher errors (RMSE = 81.5813 W/m², MAE = 62.8531 W/m²). Its errors exceed those of every machine learning model tested and are far above those of RF, indicating that empirical formulas are difficult to apply under dynamically changing meteorological conditions.

With the same dataset, the ANN and CNN improved on the Liu and Jordan model but did not surpass RF. They achieved R² values of 0.5772 and 0.5751 and RMSE values of 62.1447 W/m² and 62.2959 W/m², respectively, outperforming the GRNN but remaining clearly inferior to RF. The GRNN showed the weakest fit among the machine learning models, with R² = 0.3127 and RMSE = 79.23 W/m², only marginally better than the empirical model, indicating poor adaptability.

In summary, under the same dataset, RF remained the best-performing model, outperforming the ANN, CNN, and GRNN, while the empirical model exhibited the highest prediction errors and the lowest applicability.
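The three scores used in this comparison can be computed as in the sketch below. It assumes R² is the coefficient of determination (one minus the ratio of residual to total sum of squares); the exact definitions used in the study are given in its validation section, which is not reproduced here. The sample values are made up.

```python
import math


def regression_metrics(observed, predicted):
    """Return R², RMSE, and MAE for paired observed and predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    return {"R2": 1.0 - ss_res / ss_tot, "RMSE": rmse, "MAE": mae}


# Made-up values in W/m², for illustration only.
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Evaluating every model on the same held-out samples with one function of this kind is what makes the scores in Table 3 directly comparable.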

RF has relatively low computational efficiency in the modeling phase, because it requires constructing multiple decision trees and performing extensive splitting, which leads to higher computational costs, particularly for large datasets. However, the spatialization speed of RF is not slow, and its superior prediction accuracy makes it a viable option for real-time DIFRA forecasting. Additionally, leveraging cloud computing or GPU acceleration can further enhance RF’s training efficiency, allowing it to maintain high accuracy while improving computational performance and making it more suitable for large-scale meteorological data prediction. Future research can explore integrating ensemble learning or time-series models (e.g., LSTM or Transformer) to enhance both computational efficiency and predictive accuracy, meeting real application needs.

3.5. Limitations and Further Research

Although this study has made some progress in DIFRA modeling, limitations remain in data precision, model applicability, and generalization.

First, there are limitations in data precision. The ground observation data for DIFRA only cover June to December 2016 and therefore do not capture the full annual cycle of meteorological variation. As a result, direct observational support is lacking for spring and early winter, which may lead to insufficient simulation of seasonal changes in DIFRA. The temporal and spatial resolutions of the atmospheric parameters used as model inputs also limit the model’s precision and its response to extreme weather. Specifically, the relatively low spatial resolution of the ERA5 cloud data makes it difficult to capture local cloud variations, especially in mountainous areas with complex topography, which limits the ability of the model to respond to short-term variations in DIFRA. Moreover, the aerosol data are available only on a daily scale; although their spatial resolution is 0.01°, they cannot fully capture rapid changes in aerosol concentration, which has a significant impact on DIFRA.

Second, the DIFRA model was primarily trained on the mountainous terrain of the Chongqing area. While it demonstrated good stability and accuracy in that region, its applicability and generalization capability in other geographical environments, such as plains, deserts, or coastlands, still need further validation. The local atmospheric circulation effects induced by complex terrain, such as valley and slope winds, can affect the spatial distribution of aerosols and water vapor, thereby altering the observed DIFRA. These micrometeorological factors have not been sufficiently explored in the current study.

To address these issues, future research will focus on improving data precision and enhancing the model’s generalization capability. On the one hand, the study will extend the temporal coverage of ground observation data to obtain a complete annual record of DIFRA, including spring and early winter, and incorporate satellite data with higher spatiotemporal resolution (such as Himawari-8, CALIPSO, etc.) along with data assimilation techniques, to improve the reliability of the input variables, thereby enhancing the model’s response to short-term and local environmental changes. On the other hand, to further validate and enhance the generalization capability of the RF model across different terrain and climatic conditions, the future research plan will implement a more detailed roadmap for cross-regional testing and adaptation. Specifically, this roadmap includes the following:

(1) Data Collection and Quality Control: Comprehensive, high-quality datasets will be gathered from representative regions, such as plains, deserts, coastal areas, and high-latitude zones. These datasets will include not only ground observation data (covering variables like temperature, humidity, wind speed, pressure, etc.) but also high-resolution satellite data (such as remote sensing images, cloud images, and radiation data), thereby ensuring that the rich and diverse environmental conditions of each region are well captured. Strict data standardization and quality control procedures will be enforced to guarantee the reliability, consistency, and sufficient spatiotemporal resolution required for model training and validation.
(2) Performance Evaluation: Once the data are collected, the RF model’s performance will be systematically evaluated in each region. This will be accomplished by comparing the model’s predictions with local observational data and utilizing statistical metrics, such as the root mean square errors and correlation coefficients, to quantify the model’s accuracy and stability. This step is essential for identifying and quantifying any region-specific biases or limitations, thus providing a scientific basis for subsequent model adaptations.
(3) Model Optimization and Hybrid Framework Development: Based on the specific climatic and topographic characteristics of different regions, advanced techniques, such as transfer learning, domain adaptation, and parameter tuning, will be employed to optimize the RF model. Regional micrometeorological factors (e.g., local atmospheric circulation and terrain-induced shading effects) will be integrated as auxiliary variables to better capture the local environmental influences. Moreover, the optimized RF model will be combined with other data-driven approaches (such as deep learning and ensemble learning) and traditional physical models to construct a hybrid prediction framework capable of accurately capturing the complexities of nonlinear atmospheric processes and regional micrometeorological effects.
(4) Enhanced Temporal Resolution: High-resolution radiation observation data at minute-level or hour-level intervals will be integrated with regional meteorological models (for example, WRF-Chem) to further refine the simulation of diurnal variations and localized atmospheric dynamics. Advanced data assimilation techniques will be used to closely couple real-time observations with model predictions, ensuring the model can effectively capture short-term dynamic changes in the local environment.

Through this phased and targeted approach, the adaptability and robustness of the RF model in simulating DIFRA across varied geographical settings will be significantly improved, thereby strengthening its broader applicability in solar energy resource assessment, photovoltaic power optimization, and climate change analysis.

4. Conclusions

This study, based on remote sensing data, topographic data, meteorological reanalysis materials, and ground-based data from radiation observation stations in Chongqing, developed an RF model for estimating DIFRA and simulated the spatial distribution of DIFRA in Chongqing. The model performance was analyzed in depth, and the following three key conclusions were drawn:

(1) The model demonstrates high accuracy in estimating DIFRA, with an overall R² of 0.72, MAE of 35.99 W/m², and RMSE of 50.46 W/m². In low- and medium-radiation regions, the model performs stably with relatively small prediction errors. However, in high-radiation regions, the model tends to underestimate radiation levels, exhibiting systematic bias.
(2) The model shows significant differences in performance under different weather conditions. Under clear-sky conditions (cloud cover < 0.3), the model’s R² is 0.69, but errors are larger in high-radiation areas. Under cloudy-sky conditions (cloud cover ≥ 0.3), the model performs better, with R² increasing to 0.72, and errors are significantly reduced, showing higher stability and reliability.
(3) In the Chongqing region, the spatiotemporal distribution of DIFRA is jointly regulated by SEA, topographic factors such as aspect, and atmospheric scattering mechanisms. In summer, the higher SEA and the Mie scattering effect enhance DIFRA on flat areas and sun-facing slopes, while in winter, the lower SEA leads to an overall reduction in DIFRA levels and diminished differences among slopes and regions. Future research can further explore the effects of topographic shading and reflection in complex terrain areas on radiation distribution, thus improving the understanding of the micro-distribution patterns of DIFRA.

Overall, this study provides an effective model for estimating DIFRA in Chongqing and offers important guidance for future research directions. By further optimizing the model, integrating more high-resolution data, expanding regional applications, and exploring the effects of terrain, it is expected that this model will demonstrate greater potential under different climate and topographic conditions, advancing the development of solar energy resource assessment.

Author Contributions

Conceptualization, P.W. and Y.H.; methodology, P.W. and Y.H.; software, P.W. and J.W.; validation, P.W.; data curation, P.W. and Z.G.; writing—original draft preparation, P.W.; writing—review and editing, Y.H.; visualization, P.W. and Y.H.; supervision, Y.H. and C.Z.; project administration, Y.H. and C.Z.; funding acquisition, Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 41971298.

Data Availability Statement

DEM Data: the SRTM data were obtained from NASA (https://www.earthdata.nasa.gov/ (accessed on 14 August 2024)). ERA5 Dataset: the ERA5-Land hourly data from 1950 to the present and the ERA5 hourly data on single levels from 1940 to the present were obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF) (https://www.ecmwf.int/en/forecasts/datasets (accessed on 14 August 2024)). Ozone Data: the China Surface Hourly Ozone Dataset (HrSOD) (2005–2020) was obtained from the National Cryosphere Desert Data Center (http://www.ncdc.ac.cn (accessed on 14 August 2024)). Aerosol Data: the high-resolution aerosol optical depth (AOD) and PM2.5 concentration data (LGHAP v2) were obtained from Zenodo (https://zenodo.org/communities/ecnu_lghap (accessed on 14 August 2024)).

Acknowledgments

The authors would like to thank the respective platforms for their support in providing the data: NASA Earthdata (https://www.earthdata.nasa.gov/ (accessed on 14 August 2024)), ECMWF (https://www.ecmwf.int/en/forecasts/datasets (accessed on 14 August 2024)), the National Cryosphere Desert Data Center (http://www.ncdc.ac.cn (accessed on 14 August 2024)), and Zenodo (https://zenodo.org/communities/ecnu_lghap (accessed on 14 August 2024)).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Jahani, B.; Dinpashoh, Y.; Nafchi, A.R. Evaluation and development of empirical models for estimating daily solar radiation. Renew. Sustain. Energy Rev. 2017, 73, 878–891.
Liu, P.; Tong, X.; Zhang, J.; Meng, P.; Li, J.; Zhang, J. Estimation of half-hourly diffuse solar radiation over a mixed plantation in north China. Renew. Energy 2020, 149, 1360–1369.
Bailek, N.; Bouchouicha, K.; Al-Mostafa, Z.; El-Shimy, M.; Aoun, N.; Slimani, A.; Al-Shehri, S. A new empirical model for forecasting the diffuse solar radiation over Sahara in the Algerian Big South. Renew. Energy 2018, 117, 530–537.
Feng, Y.; Li, Y. Estimated spatiotemporal variability of total, direct and diffuse solar radiation across China during 1958–2016. Int. J. Climatol. 2018, 38, 4395–4404.
Khorasanizadeh, H.; Mohammadi, K. Diffuse solar radiation on a horizontal surface: Reviewing and categorizing the empirical models. Renew. Sustain. Energy Rev. 2016, 53, 338–362.
Gürel, A.E.; Ağbulut, Ü.; Bakır, H.; Ergün, A.; Yıldız, G. A state of art review on estimation of solar radiation with various models. Heliyon 2023, 9, e13167.
Paulescu, E.; Blaga, R. Regression models for hourly diffuse solar radiation. Sol. Energy 2016, 125, 111–124.
Boland, J.; Ridley, B.; Brown, B. Models of diffuse solar radiation. Renew. Energy 2008, 33, 575–584.
Bird, R.E.; Hulstrom, R.L. Direct Insolation Models; Solar Energy Research Institute: Golden, CO, USA, 1980.
Bird, R.E. A simple, solar spectral model for direct-normal and diffuse horizontal irradiance. Sol. Energy 1984, 32, 461–471.
Iqbal, M. An introduction to solar radiation. Space Sci. Rev. 1983, 39, 387–390.
Gueymard, C.A. REST2: High-performance solar radiation model for cloudless-sky irradiance, illuminance, and photosynthetically active radiation–Validation with a benchmark dataset. Sol. Energy 2008, 82, 272–285.
Ningombam, S.S.; Bagare, S.P.; Khatri, P.; Sohn, B.J.; Song, H.J. Estimation of aerosol radiative forcing over an aged-background aerosol feature during advection and non-advection events using a ground-based data obtained from a Prede Skyradiometer observation. Atmos. Res. 2015, 164–165, 76–83.
Kambezidis, H.D.; Psiloglou, B.E.; Karagiannis, D.; Dumka, U.C.; Kaskaoutis, D.G. Meteorological Radiation Model (MRM v6.1): Improvements in diffuse radiation estimates and a new approach for implementation of cloud products. Renew. Sustain. Energy Rev. 2017, 74, 616–637.
Berk, A.; Bernstein, L.S.; Anderson, G.P.; Acharya, P.K.; Robertson, D.C.; Chetwynd, J.H.; Adler-Golden, S.M. MODTRAN Cloud and Multiple Scattering Upgrades with Application to AVIRIS. Remote Sens. Environ. 1998, 65, 367–375.
Gueymard, C.A. Parameterized transmittance model for direct beam and circumsolar spectral irradiance. Sol. Energy 2001, 71, 325–346.
Carrasco-Hernandez, R.; Smedley, A.R.D.; Webb, A.R. Fast calculations of the spectral diffuse-to-global ratios for approximating spectral irradiance at the street canyon level. Theor. Appl. Climatol. 2016, 124, 1065–1077.
Valenzuela, A.; Arola, A.; Antón, M.; Quirantes, A.; Alados-Arboledas, L. Black carbon radiative forcing derived from AERONET measurements and models over an urban location in the southeastern Iberian Peninsula. Atmos. Res. 2017, 191, 44–56.
Tolabi, H.B.; Moradi, M.H.; Ayob, S.B.M. A review on classification and comparison of different models in solar radiation estimation. Int. J. Energy Res. 2014, 38, 689–701.
Liu, B.Y.H.; Jordan, R.C. The interrelationship and characteristic distribution of direct, diffuse and total solar radiation. Sol. Energy 1960, 4, 1–19.
Spencer, J.W. A comparison of methods for estimating hourly diffuse solar radiation from global solar radiation. Sol. Energy 1982, 29, 19–32.
Jamil, B.; Akhtar, N. Comparative analysis of diffuse solar radiation models based on sky-clearness index and sunshine period for humid-subtropical climatic region of India: A case study. Renew. Sustain. Energy Rev. 2017, 78, 329–355.
Jamil, B.; Akhtar, N. Comparison of empirical models to estimate monthly mean diffuse solar radiation from measured data: Case study for humid-subtropical climatic region of India. Renew. Sustain. Energy Rev. 2017, 77, 1326–1342.
Vernet, A.; Fabregat, A. Evaluation of Empirical Daily Solar Radiation Models for the Northeast Coast of the Iberian Peninsula. Energies 2023, 16, 2560.
Mubiru, J.; Banda, E.J.K.B. Performance of empirical correlations for predicting monthly mean daily diffuse solar radiation values at Kampala, Uganda. Theor. Appl. Climatol. 2007, 88, 127–131.
Wang, H.; Sun, F.; Wang, T.; Liu, W. Estimation of daily and monthly diffuse radiation from measurements of global solar radiation a case study across China. Renew. Energy 2018, 126, 226–241.
Voyant, C.; Notton, G.; Kalogirou, S.; Nivet, M.; Paoli, C.; Motte, F.; Fouilloy, A. Machine learning methods for solar radiation forecasting: A review. Renew. Energy 2017, 105, 569–582.
Ali, M.A.; Elsayed, A.; Elkabani, I.; Akrami, M.; Youssef, M.E.; Hassan, G.E. Artificial Intelligence-Based Improvement of Empirical Methods for Accurate Global Solar Radiation Forecast: Development and Comparative Analysis. Energies 2024, 17, 4302.
Berrizbeitia, S.E.; Gago, E.J.; Muneer, T. Empirical Models for the Estimation of Solar Sky-Diffuse Radiation. A Review and Experimental Analysis. Energies 2020, 13, 701.
Demircan, C.; Bayrakçı, H.C.; Keçebaş, A. Machine learning-based improvement of empiric models for an accurate estimating process of global solar radiation. Sustain. Energy Technol. Assess. 2020, 37, 100574.
Barancsuk, L.; Groma, V.; Günter, D.; Osán, J.; Hartmann, B. Estimation of Solar Irradiance Using a Neural Network Based on the Combination of Sky Camera Images and Meteorological Data. Energies 2024, 17, 438.
Wei, C.; Yang, Y. A Global Solar Radiation Forecasting System Using Combined Supervised and Unsupervised Learning Models. Energies 2023, 16, 7693.
Mustafa, J.; Husain, S.; Alqaed, S.; Khan, U.A.; Jamil, B. Performance of Two Variable Machine Learning Models to Forecast Monthly Mean Diffuse Solar Radiation across India under Various Climate Zones. Energies 2022, 15, 7851.
Lou, S.; Li, D.H.W.; Lam, J.C.; Chan, W.W.H. Prediction of diffuse solar irradiance using machine learning and multivariable regression. Appl. Energy 2016, 181, 367–374.
Feng, Y.; Cui, N.; Zhang, Q.; Zhao, L.; Gong, D. Comparison of artificial intelligence and empirical models for estimation of daily diffuse solar radiation in North China Plain. Int. J. Hydrog. Energy 2017, 42, 14418–14428.
Ramírez-Rivera, F.A.; Guerrero-Rodríguez, N.F. Ensemble Learning Algorithms for Solar Radiation Prediction in Santo Domingo: Measurements and Evaluation. Sustainability 2024, 16, 8015.
Lu, Y.; Li, X.; Wu, S.; Wang, Y.; Qiu, W.; Chen, D.; Li, Y. SolarGAN for Meso-Level Solar Radiation Prediction at the Urban Scale: A Case Study in Boston. Remote Sens. 2024, 16, 4524.
Gao, Y.; Miyata, S.; Matsunami, Y.; Akashi, Y. Spatio-temporal interpretable neural network for solar irradiation prediction using transformer. Energy Build. 2023, 297, 113461.
He, Z.; Zhang, X.; Li, M.; Wang, S.; Xiao, G. A novel solar radiation forecasting model based on time series imaging and bidirectional long short-term memory network. Energy Sci. Eng. 2024, 12, 4876–4893.
Zhang, W.; Liu, D.; Tian, H.; Pan, N.; Yang, R.; Tang, W.; Yang, J.; Lu, F.; Dayananda, B.; Mei, H.; et al. Recurrent mapping of Hourly Surface Ozone Data (HrSOD) across China during 2005–2020 for ecosystem and human health risk assessment. Earth Syst. Sci. Data Discuss. 2022, 2022, 1–36.
Bai, K.; Li, K.; Shao, L.; Li, X.; Liu, C.; Li, Z.; Ma, M.; Han, D.; Sun, Y.; Zheng, Z.; et al. LGHAP v2: A global gap-free aerosol optical depth and PM2.5 concentration dataset since 2000 derived via big Earth data analytics. Earth Syst. Sci. Data 2024, 16, 2425–2448.
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.

Figure 1. Study area map.

Figure 2. Variable heatmap.

Figure 3. Technical flowchart.

Figure 4. Overall scatter plot of the DIFRA model. (The red-blue gradient used in the scatter plot represents density).

Figure 5. Scatter plots of DIFRA model by station: (a) Changshou station, (b) Qijiang station, (c) Wanzhou station, (d) Qianjiang station, (e) Jiangjin station, (f) Fengjie station, (g) Shapingba station, (h) Fengdu station, (i) Nanchuan station, (j) Hechuan station, (k) Youyang station, (l) Liangping station, (m) Fulin station, (n) Dazu station. (The red-blue gradient used in the scatter plot represents density).

Figure 6. Hourly DIFRA box plot.

Figure 7. Scatter plot of DIFRA model under different weather conditions: (a) clear sky and (b) cloudy sky. (The red-blue gradient used in the scatter plot represents density).

Figure 8. Feature importance chart.

Figure 9. Correlations among factors and DIFRA with MAE chart: (a) SEA(sin), (b) MCC, (c) TEM, (d) LCC, (e) ozone, (f) RH, (g) VP, and (h) AOD.

Figure 10. Spatial distribution map of DIFRA: (a) 2016.06.20 8:00, (b) 2016.06.20 14:00, (c) 2016.06.20 18:00, (d) 2016.08.20 8:00, (e) 2016.08.20 14:00, (f) 2016.08.20 18:00, (g) 2016.11.20 8:00, (h) 2016.11.20 14:00, (i) 2016.11.20 18:00, (j) 2016.12.20 8:00, (k) 2016.12.20 14:00, (l) 2016.12.20 18:00. (Each panel shows the distribution of diffuse radiation at a different time point.)

Figure 11. Rose diagram of aspect distribution of diffuse radiation (DIFRA): (a) diurnal variation; (b) seasonal variation.

Figure 12. Observed and predicted DIFRA variations under rapid cloud cover changes.

Table 1. Data table.

Data Product	Variable	Spatial Resolution	Temporal Resolution
ERA5-Land hourly data from 1950 to present	2 m dewpoint temperature	0.1° × 0.1°	Hourly
ERA5-Land hourly data from 1950 to present	2 m temperature	0.1° × 0.1°	Hourly
ERA5-Land hourly data from 1950 to present	Surface pressure	0.1° × 0.1°	Hourly
ERA5 hourly data on single levels from 1940 to present	Medium cloud cover	0.25° × 0.25°	Hourly
ERA5 hourly data on single levels from 1940 to present	Low cloud cover	0.25° × 0.25°	Hourly
Hourly Surface Ozone (HrSOD) Dataset in China (2005–2020)	Ozone	0.1° × 0.1°	Hourly
A long-term, gap-free, high-resolution air pollutant concentration dataset	AOD	0.01° × 0.01°	Daily
SRTM DEM	DEM	30 m	-

Table 2. Grid search parameters.

Parameter	Search Range	Step Size
n_estimators	0–1500	50
max_depth	0–50	5
min_samples_leaf	0–5	1
max_features	0–10	1
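The grid search summarized in Table 2 can be sketched as follows. The parameter names match scikit-learn's `RandomForestRegressor`, so the sketch assumes that library, although the text does not name it. Zero is not a valid value for any of these four parameters in scikit-learn, so the grids below start at the first positive step, which is an assumption about how the ranges were applied. The data are synthetic, and the cross-validation setup and scoring choice are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 4))   # four synthetic predictors
y = 200.0 * X[:, 0] + 40.0 * X[:, 1] + rng.normal(0.0, 5.0, 200)

# Grids implied by the ranges and step sizes in Table 2, with zero excluded (assumption).
full_grid = {
    "n_estimators": list(range(50, 1501, 50)),
    "max_depth": list(range(5, 51, 5)),
    "min_samples_leaf": list(range(1, 6)),
    "max_features": list(range(1, 11)),   # needs at least 10 predictors
}

# Reduced grid so that the sketch runs in seconds on the toy data.
small_grid = {
    "n_estimators": [50, 100],
    "max_depth": [5, 10],
    "min_samples_leaf": [1, 5],
    "max_features": [1, 2],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    small_grid,
    scoring="neg_root_mean_squared_error",  # pick the combination with the lowest RMSE
    cv=3,
)
search.fit(X, y)
best_params = search.best_params_
print(best_params, -search.best_score_)
```

The full grid contains 30 × 10 × 5 × 10 = 15,000 combinations, so an exhaustive search multiplies that by the number of cross-validation folds. This is one source of the modeling-phase cost discussed in Section 3.4.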

Table 3. Model accuracy errors.

Model	R²	RMSE (W/m²)	MAE (W/m²)
RF	0.7212	50.4582	35.9867
Liu and Jordan	0.2829	81.5813	62.8531
ANN	0.5772	62.1447	45.6849
GRNN	0.3127	79.23	60.1999
CNN	0.57515	62.2959	44.6636

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
