1. Introduction
Solar diffuse radiation (DIFRA) is a key component of the solar radiation reaching the Earth’s surface, and its accurate estimation is crucial for climate change research, solar energy resource assessment, ecological environment protection, and agricultural management [1,2,3]. DIFRA is generated by the scattering of solar radiation by gases, aerosols, and cloud layers in the atmosphere [4]. This process is influenced by various factors, including the solar elevation angle (SEA), atmospheric composition, and surface characteristics [5,6]. The complexity of these factors presents significant challenges in estimating DIFRA. Traditional solar radiation estimation methods can be categorized into physical models and empirical models [7].
Physical models mainly include two types: parameterized models and atmospheric radiation transfer-based models. Parameterized models calculate solar shortwave radiation under various weather conditions from fixed meteorological parameters, using numerous physical formulas and mathematical computations [8]. Examples include the Bird model [9,10], the Iqbal model [11], and the REST2 model [12], among others. The other type of physical model is based on atmospheric radiation transfer theory, with models such as RSTAR [13], MRM [14], MODTRAN [15], SMARTS2 [16,17], and SBDART [18], which have been widely used in radiation-related studies by scholars both domestically and internationally. Although physical models account for complex interactions among various factors, their parameters are difficult to acquire and their structure is complex, which leads to poor model portability and low general applicability.
Empirical models estimate DIFRA by establishing empirical relationships among conventional meteorological factors, such as sunshine hours, radiation, temperature, water vapor, cloud cover, and surface observed radiation [5,19]. The two most commonly used quantities are the DIFRA ratio (Kd) and the clearness index (Kt). The relationship between Kd and Kt was first established by Liu and Jordan [20] in 1960. Spencer et al. [21] proposed a modified model based on latitude and Kd, using observations from five stations in Australia. Based on total solar radiation data, Jamil and Akhtar [22,23] developed empirical models to estimate the monthly mean DIFRA for the subtropical climate of India. Vernet and Fabregat [24] evaluated 23 existing empirical models for estimating daily solar radiation on the Northeast Coast of the Iberian Peninsula and concluded that models based on cloud cover performed better. Mubiru and Banda [25] compared various empirical models and concluded that models based on Kd and relative sunshine hours provided better estimation results. Wang et al. [26] constructed 97 multivariable DIFRA estimation models using over 20 meteorological and geographic parameters, including the clear-sky index, atmospheric quality, relative humidity, and daily temperature. Empirical models are simple to calculate, but they rely heavily on ground-based meteorological station data. In complex mountainous areas, where stations are sparse, this reliance limits their ability to capture the spatial distribution of radiation accurately [27].
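To make the two indices concrete, the following minimal sketch computes them from their usual definitions: Kd as the ratio of diffuse to global horizontal irradiance, and Kt as the ratio of global horizontal irradiance to the extraterrestrial irradiance on a horizontal surface. The function names and sample values are illustrative assumptions, not data or code from this study.

```python
def diffuse_fraction(diffuse, global_horizontal):
    """Kd: share of global horizontal irradiance that arrives as diffuse."""
    if global_horizontal <= 0:
        raise ValueError("global horizontal irradiance must be positive")
    return diffuse / global_horizontal


def clearness_index(global_horizontal, extraterrestrial_horizontal):
    """Kt: global horizontal irradiance relative to the top-of-atmosphere value."""
    if extraterrestrial_horizontal <= 0:
        raise ValueError("extraterrestrial irradiance must be positive")
    return global_horizontal / extraterrestrial_horizontal


# Hypothetical hourly values in W/m^2 (not measurements from this study).
kd = diffuse_fraction(diffuse=180.0, global_horizontal=450.0)
kt = clearness_index(global_horizontal=450.0, extraterrestrial_horizontal=900.0)
print(f"Kd = {kd:.2f}, Kt = {kt:.2f}")
```

An empirical model of the Liu and Jordan type then fits Kd as a function of Kt, so that DIFRA can be recovered as Kd multiplied by the measured global irradiance.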
With the rapid development of remote sensing and computer technology, machine learning has demonstrated strong capabilities in dealing with nonlinear problems, providing more flexible and efficient solutions than traditional methods [28]. Machine learning models can predict DIFRA by capturing the nonlinear relationships among multiple meteorological factors, geographical factors, and DIFRA [29].
Demircan et al. [30] improved the Angstrom–Prescott model using the Artificial Bee Colony algorithm, significantly reducing the error in radiation estimation. Barancsuk et al. [31] used sky camera images combined with convolutional neural networks (CNNs) to estimate minute-level DIFRA, with an average R² of 0.87. Wei and Yang [32] developed a model combining supervised and unsupervised learning, based on k-means and fuzzy C-means clustering, and established a real-time forecasting system capable of predicting solar radiation for the next 12 h. Mustafa et al. [33] used both random forest and k-nearest neighbor models to estimate DIFRA in different climatic regions of India, achieving R² values above 0.95. Lou et al. [34] applied boosted regression trees (BRTs) to estimate DIFRA in Hong Kong and performed sensitivity analysis on meteorological variables, finding that solar elevation, temperature, and cloud cover were crucial to the estimation results. Feng et al. [35] employed an extreme learning machine (ELM), a genetic algorithm-optimized backpropagation neural network (GANN), random forest (RF), and a generalized regression neural network (GRNN) to estimate daily DIFRA in the North China Plain. All four models outperformed the Iqbal physical model. Ramírez-Rivera et al. [36] investigated the application of ensemble learning algorithms (ELAs) for solar radiation prediction in Santo Domingo, developing a high-resolution, minute-level solar radiation model. Lu et al. [37] leveraged a generative adversarial network (GAN) model to estimate the radiation distribution in the Boston metropolitan area, generating a 2 m resolution spatial solar radiation distribution map with enhanced accuracy. Gao et al. [38] accounted for both temporal and spatial dependencies in solar radiation prediction by unfolding sequences and applying a Transformer model, demonstrating substantial performance gains over conventional neural networks. He et al. [39] proposed a hybrid BiLSTM-Transformer model, which effectively extracted deep features and captured long-term dependencies, significantly improving the accuracy of long-term solar radiation forecasts compared to traditional models.
Owing to their high accuracy, flexible combinations of input variables, and wide applicability, machine learning models have become a widely used and efficient method for estimating solar radiation components. However, radiation studies have focused primarily on total and direct solar shortwave radiation, and relatively few model DIFRA specifically. Those that do mostly simulate monthly or daily DIFRA, while studies with high temporal resolution remain scarce. Moreover, existing studies emphasize meteorological factors and give insufficient consideration to topographic factors, and many of the variables they use are difficult to obtain. Therefore, this study considers input features from three aspects (astronomical elements, atmospheric composition, and topographic factors), extracts DIFRA-related variables from readily available datasets, and develops an hourly DIFRA model suitable for complex mountainous regions.
In this study, we used remote sensing, topographic, and meteorological reanalysis data to extract meteorological and topographic feature variables. We proposed a grid-based SEA calculation method that assigns an SEA value to each grid cell, yielding high-precision astronomical factors. We then combined these variables with measured data from radiation observation stations in Chongqing to construct a prediction sample set. An hourly (high-temporal-resolution) DIFRA estimation model was developed using the random forest algorithm to address gaps in the existing research. Using this model, the spatiotemporal distribution of surface solar DIFRA in Chongqing was simulated and validated against observation data from ground-based meteorological stations. Further analysis was conducted on the influence of key factors, such as SEA, water vapor, aerosols, and cloud cover, on model errors. Additionally, the spatiotemporal distribution patterns of DIFRA were investigated in relation to topographic factors. Compared with traditional empirical models and other machine learning methods for estimating DIFRA, this study offers two advantages. First, the hourly model used here can capture short-term variations in DIFRA over complex terrain, whereas traditional models, typically based on daily or monthly data, fail to reflect such fine-scale dynamic changes. Second, this study uses easily accessible open data with broad spatial coverage, extracting features in three dimensions (astronomical, meteorological, and topographic). This enhances both the usability of the data and the future portability of the model, allowing for more detailed spatiotemporal distribution simulations. The results provide scientific evidence and technical support for solar energy resource assessment, climate change research, and environmental protection in areas with complex terrain.
3. Results and Discussion
3.1. Model Accuracy Validation
3.1.1. Overall Accuracy Validation of Diffuse Radiation Model
As shown in Figure 4, there is a strong correlation between the predicted and observed DIFRA values. Specifically, the R² value reached 0.72, indicating that the model has strong predictive capability. The MAE was 35.99 W/m² and the RMSE was 50.46 W/m². Because the RMSE weights large deviations more heavily than the MAE, the gap between the two suggests that some data points carry relatively large errors, and these points have a greater impact on overall model performance. The fitted line shows that the predicted values are systematically underestimated in the higher radiation range, where the data points are more scattered and the model’s error is larger. Overall, the estimation results reflect the spatial variation trend of DIFRA in Chongqing well and meet the requirements for spatiotemporal distribution characteristic analysis in this study.
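The three accuracy metrics quoted above can be computed as in the following minimal sketch. It assumes R² is the coefficient of determination (one minus the ratio of residual to total sum of squares); the study may instead use the squared correlation coefficient. The sample values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return (R2, MAE, RMSE) for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    # ss_tot is zero when all observations are identical; R2 is undefined then.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


# Hypothetical values in W/m^2 with growing underestimation at high radiation.
obs = [100.0, 200.0, 300.0, 400.0]
pred = [110.0, 190.0, 280.0, 360.0]
r2, mae, rmse = regression_metrics(obs, pred)
print(f"R2 = {r2:.3f}, MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

In this toy sample the single large miss at the highest value pushes the RMSE above the MAE, which is the same pattern the text attributes to the high-radiation range.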
3.1.2. Accuracy Analysis by Station
To further understand the model’s performance at each station, the model accuracy at various sites was analyzed in detail based on Figure 5. The stations shown in Figure 5a–d performed excellently, with R² values of 0.78, 0.74, and 0.76, respectively. Both MAE and RMSE are lower than the overall average, with the MAE and RMSE for Changshou station (Figure 5a) being 32.27 W/m² and 45.33 W/m², respectively, significantly lower than the overall level. The model fitting was good, indicating stable performance and strong predictive capability at these stations. For the stations shown in Figure 5e–i, the R² values were all above 0.7, indicating stable predictive capability with controllable errors. In Figure 5j–n, the R² values ranged from 0.67 to 0.69, with the MAE and RMSE showing larger fluctuations, indicating some limitations. Although there were some prediction deviations at individual stations, such as Fulin (Figure 5m) and Dazu (Figure 5n), these deviations mainly occurred in high-radiation-value regions and had limited impact on the overall trend. Overall, the model demonstrates good generalization ability and stability in Chongqing, meeting the requirements for spatiotemporal distribution characteristic analysis and providing reliable support for regional radiation characteristic studies.
3.1.3. Hourly Accuracy Analysis
Figure 6 displays the diurnal variation characteristics of DIFRA, showing a typical “bell-shaped” distribution: radiation levels are lower in the early morning and evening and peak around noon. The radiation is lowest during the 6:00 period, with the median below 50 W/m². As time progresses, radiation gradually increases, with the medians for the 7:00 and 8:00 periods reaching approximately 100 W/m² and 150 W/m², respectively. From 10:00 to 14:00, radiation reaches its peak, with the median around 200 W/m² and the maximum approaching 400 W/m², reflecting the high variability of radiation around noon, which is significantly influenced by solar intensity and weather conditions. After 15:00, radiation levels begin to decline, but variability remains notable. By 17:00 and 18:00, radiation intensity rapidly decreases, with the medians dropping to 100 W/m² and 50 W/m², respectively, and the distribution range gradually narrows. The box plot shows that the model performed relatively consistently in the early morning (6:00–8:00) and evening (17:00–18:00), with a smaller median and narrower distribution range. However, during the peak period (10:00–14:00), when radiation variability increases, the model’s prediction error became relatively larger, likely influenced by local weather conditions and the complex terrain. Additionally, some outliers in specific periods (such as 8:00, 15:00, and 17:00) are associated with rapid local weather changes, which increase the difficulty of model prediction.
Overall, the model was able to capture the daily variation characteristics of DIFRA well, but its prediction stability during high-radiation periods requires further optimization.
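An hourly breakdown of this kind can be sketched as follows. The example groups paired observations and predictions by hour and reports the median observed value and the mean absolute error per hour; the record layout and the sample values are assumptions for illustration, not the study's data.

```python
from collections import defaultdict
from statistics import median


def hourly_summary(records):
    """Group (hour, observed, predicted) records by hour.

    Returns {hour: (median observed value, mean absolute error)}.
    """
    by_hour = defaultdict(list)
    for hour, observed, predicted in records:
        by_hour[hour].append((observed, predicted))
    summary = {}
    for hour, pairs in sorted(by_hour.items()):
        med = median(o for o, _ in pairs)
        mae = sum(abs(o - p) for o, p in pairs) / len(pairs)
        summary[hour] = (med, mae)
    return summary


# Hypothetical samples in W/m^2: small errors at 6:00, larger errors at noon.
records = [
    (6, 30.0, 35.0), (6, 40.0, 38.0),
    (12, 180.0, 150.0), (12, 220.0, 200.0), (12, 260.0, 210.0),
]
summary = hourly_summary(records)
for hour, (med, mae) in summary.items():
    print(f"{hour:02d}:00 median = {med:.1f}, MAE = {mae:.1f}")
```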
3.1.4. Accuracy Analysis Under Clear-Sky and Cloudy-Sky Conditions
The model was constructed to provide an overall framework to cover all radiation periods under all weather conditions. To further explore the model’s stability under different weather conditions, cloud cover of 0.3 was used as the dividing threshold, where cloud cover below 0.3 was defined as clear-sky conditions, and cloud cover above 0.3 was defined as cloudy-sky conditions. This classification allowed for an in-depth evaluation of the model’s accuracy under different weather types. As observed in Figure 7, under clear-sky conditions (Figure 7a), the prediction interval was relatively wide, particularly in the high-radiation range, where the data points are more dispersed. Although the model achieved a coefficient of determination of R² = 0.69, the prediction error remained relatively large. This indicates that the model has greater uncertainty in predicting high-radiation levels under clear-sky conditions. In contrast, under cloudy conditions (Figure 7b), the prediction interval was significantly narrower, and the model’s fit improved to R² = 0.70. The data points are more concentrated, suggesting that the model’s radiation predictions were more stable, with a notable reduction in errors (MAE = 32.20 W/m², RMSE = 47.51 W/m²). This demonstrates that the model performs with higher accuracy under cloudy conditions, with the narrower prediction interval reflecting reduced uncertainty.
Overall, the model exhibited higher stability, lower uncertainty, and smaller errors under cloudy conditions, which may be attributed to the regulatory effect of clouds on the radiative transfer process. Clouds induce scattering, absorption, and reflection, making the DIFRA reaching the surface more uniform and stable, thereby reducing the uncertainty caused by fluctuations in direct radiation. In contrast, under clear-sky conditions, direct radiation dominates, and factors such as atmospheric water vapor and aerosols introduce significant local heterogeneity and instability in the radiative transfer process, resulting in increased prediction difficulty and errors.
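The clear-sky and cloudy-sky split described above can be sketched as follows. The 0.3 threshold comes from the text; the handling of samples exactly at the threshold, the record layout, and the sample values are assumptions made for illustration.

```python
CLOUD_THRESHOLD = 0.3  # dividing value stated in the text


def split_by_cloud_cover(samples, threshold=CLOUD_THRESHOLD):
    """Split (cloud_cover, observed, predicted) samples into two groups.

    Assumption: cloud cover exactly equal to the threshold is counted as
    cloudy; the text does not specify this boundary case.
    """
    clear, cloudy = [], []
    for cloud_cover, observed, predicted in samples:
        group = clear if cloud_cover < threshold else cloudy
        group.append((observed, predicted))
    return clear, cloudy


def mean_absolute_error(pairs):
    """MAE over a list of (observed, predicted) pairs."""
    return sum(abs(o - p) for o, p in pairs) / len(pairs)


# Hypothetical samples: (cloud cover fraction, observed, predicted) in W/m^2.
samples = [
    (0.10, 250.0, 200.0), (0.25, 120.0, 140.0),
    (0.30, 150.0, 145.0), (0.80, 90.0, 95.0), (0.95, 60.0, 50.0),
]
clear, cloudy = split_by_cloud_cover(samples)
print("clear-sky MAE:", mean_absolute_error(clear))
print("cloudy-sky MAE:", mean_absolute_error(cloudy))
```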
3.1.5. Analysis of the Impact of Model Factors
Figure 8 shows the feature importance of each variable, calculated using the Gini index method in the random forest model. SEA(sin) achieved the highest importance value of 0.319, making it the dominant factor under all radiation conditions. This indicates that SEA(sin) contributes most to the model’s predictive power and plays a crucial role in accurately estimating DIFRA under different radiation conditions. The second most important factor was MCC, with an importance of 0.106, suggesting that MCC also plays a significant role in predicting DIFRA. Other factors, including TEM, LCC, ozone, RH, AOD, and VP, had relatively similar importance, indicating that their influence on model performance was balanced. Topographic factors contributed the least to the model. It is worth noting that although the contributions of altitude and slope were relatively low, removing these variables during model construction decreased prediction accuracy. This indicates that the inclusion of topographic factors enhances the detailed estimation of DIFRA. Based on feature importance, the top eight ranked factors were selected to further analyze their relationship with the radiation errors (Figure 9).
Figure 9 shows the variation trends of different factors with respect to the observed radiation values, along with the corresponding model errors in each radiation value region. In the low-radiation region (<100 W/m²), the MAE of the model remains at a low level and the overall performance is stable, indicating that the model can effectively capture the DIFRA characteristics under low-radiation conditions. At this stage, RH (Figure 9f) and VP (Figure 9g) have a weak driving effect on the errors, while MCC (Figure 9b) and LCC (Figure 9d) show some correlation, and SEA(sin) (Figure 9a) has little influence on the errors. In the medium-radiation region (100–200 W/m²), the model error increases slightly but remains stable overall. The driving effects of RH and VP become more significant and become key factors, while the influence of MCC and LCC gradually weakens. The change in AOD (Figure 9h) becomes steady and has a minimal impact on the errors. SEA(sin) increases at a faster rate, gradually becoming an important driving factor for the errors. In the high-radiation region (>200 W/m²), the model error increases significantly, and the variability also intensifies, with SEA(sin) becoming the dominant driving factor and having the most significant impact on the errors. At the same time, the influence of RH and VP further increases, while the correlation of MCC and LCC significantly weakens, suggesting that the model has some robustness against changes in cloud cover. Moreover, the volatility of AOD increases, exhibiting complex nonlinear effects. The continued increases in TEM (Figure 9c) and ozone (Figure 9e) also contribute to the growth of the errors.
Overall, SEA(sin) is the core driving factor of errors in the high-radiation region, while RH and VP exhibit significant influence in the medium- to high-radiation regions. In contrast, the roles of MCC and LCC gradually weaken under high-radiation conditions, with the model showing good stability in low- and medium-radiation conditions. Future optimization should focus on the combined effects of SEA(sin), RH, and VP, while also exploring the nonlinear effects of AOD and the ozone to further improve the prediction accuracy and stability of the model in high-radiation regions.
3.2. Spatiotemporal Distribution Characteristics of Diffuse Radiation
By integrating Figure 10 and Figure 11, we systematically analyzed the spatiotemporal variations of DIFRA in Chongqing and its relationship with slope aspect, interpreting these patterns from the perspective of radiative energy transfer and atmospheric scattering mechanisms to enhance the physical credibility of the results.
On a temporal scale, the distribution of diffuse radiation is regulated by SEA, atmospheric scattering effects, and total solar radiation intensity, exhibiting distinct diurnal and seasonal variations. In summer, a higher SEA increases total solar radiation, while higher atmospheric water vapor and aerosol concentrations enhance Mie scattering, which increases DIFRA. At 14:00, DIFRA reaches its peak (Figure 10b,e), with some regions exceeding 150 W/m². In winter, as SEA decreases, total solar radiation is significantly reduced. Although Rayleigh scattering becomes more pronounced at shorter wavelengths, the overall energy input for diffuse radiation diminishes, leading to lower DIFRA intensity (Figure 10h,k). In the morning and evening, when the sun is near the horizon, the increased atmospheric path length enhances scattering effects. However, due to the overall reduction in total solar radiation, the actual amount of diffuse radiation received remains low, resulting in a significant drop in DIFRA levels at 08:00 and 18:00 (Figure 10c,j).
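The interplay between SEA and atmospheric path length described above can be illustrated with textbook approximations. The sketch below uses Cooper's formula for solar declination, the standard expression for sin(SEA) in terms of latitude, declination, and hour angle, and the plane-parallel air mass 1/sin(SEA). It is not the grid-based SEA method of this study: it works in local solar time, ignores refraction and terrain, and the latitude is an assumed approximate value for central Chongqing.

```python
import math


def solar_declination_deg(day_of_year):
    """Cooper's approximation of solar declination, in degrees."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def sin_solar_elevation(latitude_deg, day_of_year, solar_hour):
    """sin(SEA) from latitude, day of year, and local solar time in hours."""
    lat = math.radians(latitude_deg)
    dec = math.radians(solar_declination_deg(day_of_year))
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))
    return (math.sin(lat) * math.sin(dec)
            + math.cos(lat) * math.cos(dec) * math.cos(hour_angle))


def relative_air_mass(sin_sea):
    """Plane-parallel estimate 1/sin(SEA); None if the sun is not above the horizon."""
    return 1.0 / sin_sea if sin_sea > 0 else None


LATITUDE = 29.56  # assumed approximate latitude of central Chongqing, degrees north
for label, day in (("summer solstice", 172), ("winter solstice", 355)):
    for hour in (8, 12, 18):
        s = sin_solar_elevation(LATITUDE, day, hour)
        print(label, f"{hour:02d}:00", round(s, 3), relative_air_mass(s))
```

A low sin(SEA) means a long path through the atmosphere but also little energy arriving on a horizontal surface, which is consistent with the low DIFRA reported for early morning, evening, and winter.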
On a spatial scale, the distribution of diffuse radiation is influenced by topography and slope aspect, with varying characteristics across different seasons and times of the day. Figure 11a illustrates the impact of slope aspect on DIFRA at different times of the day (08:00, 12:00, 14:00, and 18:00), revealing that DIFRA peaks shift across slopes as the solar azimuth angle changes. Figure 11b further reveals the seasonal impact of slope aspect on diffuse radiation: In summer, due to the high SEA, southeast (SE) and west (W)-facing slopes generally receive more diffuse radiation, whereas north-facing (N) slopes exhibit lower DIFRA values. This may be attributed to local terrain reflectivity effects, leading to spatial variations in DIFRA reception across different slopes. In winter, lower total solar radiation weakens the overall DIFRA levels, while enhanced Rayleigh scattering results in a more uniform angular distribution of sky radiation, thereby reducing the differences in DIFRA across different slope aspects (Figure 11b).
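A comparison of DIFRA across slope aspects can be sketched by binning grid cells into compass sectors and averaging within each sector. The eight-sector scheme, the function names, and the sample cells below are assumptions for illustration; the study does not state how aspect classes were defined.

```python
SECTORS = ("N", "NE", "E", "SE", "S", "SW", "W", "NW")


def aspect_sector(aspect_deg):
    """Map an aspect in degrees clockwise from north to one of eight 45-degree sectors.

    Flat cells (often coded as -1 in GIS products) should be filtered out first.
    """
    index = int(((aspect_deg % 360.0) + 22.5) // 45.0) % 8
    return SECTORS[index]


def mean_by_sector(cells):
    """Average DIFRA per sector for (aspect_deg, difra) grid cells."""
    totals = {}
    for aspect_deg, difra in cells:
        sector = aspect_sector(aspect_deg)
        total, count = totals.get(sector, (0.0, 0))
        totals[sector] = (total + difra, count + 1)
    return {sector: total / count for sector, (total, count) in totals.items()}


# Hypothetical grid cells: (aspect in degrees, DIFRA in W/m^2).
cells = [(350.0, 100.0), (10.0, 110.0), (135.0, 140.0), (270.0, 130.0)]
means = mean_by_sector(cells)
print(means)
```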
The spatiotemporal distribution of diffuse radiation in Chongqing is jointly influenced by the solar elevation angle, slope aspect, atmospheric scattering mechanisms, and total solar radiation intensity. In summer, DIFRA levels are higher, with greater variability across slope aspects. In winter, the overall DIFRA levels decrease, and differences among slope aspects become less pronounced. These physical mechanisms collectively shape the spatiotemporal characteristics of diffuse radiation, providing essential scientific insights for solar energy resource assessment and photovoltaic site selection in complex terrain regions.
3.3. Analysis of Extreme Weather Changes
To evaluate the model’s predictive capability under rapidly changing atmospheric conditions, this study selected 3 days with drastic cloud cover variations as extreme weather cases and analyzed the model’s performance. Figure 12 shows the observed DIFRA, model predictions, and cloud cover variations over time.
In low to moderate DIFRA ranges, the model effectively follows the trend in the observed values, demonstrating strong fitting ability. However, during periods of rapid cloud cover transitions, the predictions exhibit a noticeable lag, indicating that the model may struggle to respond instantaneously to abrupt atmospheric changes. In high DIFRA ranges, the prediction errors increase significantly, with a general tendency to underestimate the actual values. This underestimation is particularly evident after a sudden drop in cloud cover, when the observed DIFRA rises rapidly, yet the model fails to capture this sharp transition, further exacerbating prediction deviations. Additionally, prediction errors are most pronounced during periods of drastic cloud cover changes, suggesting that the model’s adaptability remains limited when dealing with rapid atmospheric fluctuations.
The model errors observed under extreme weather conditions can be further explained using radiative transfer theory. DIFRA is primarily influenced by scattering, absorption, and transmission processes in the atmosphere. Under stable atmospheric conditions, radiative transfer remains relatively uniform, allowing the model to effectively capture DIFRA variations. However, when cloud cover changes rapidly, the redistribution of radiative flux significantly alters the ratio of direct to diffuse radiation, leading to sudden variations in atmospheric optical properties, such as AOD and RH.
During cloud dissipation, the abrupt reduction in cloud cover leads to an increase in direct radiation, which enhances surface DIFRA through multiple scattering mechanisms. However, because the model is trained on averaged meteorological conditions, it struggles to capture instantaneous changes caused by cloud cover transitions. This explains the prediction lag and the systematic underestimation of high DIFRA values, resulting in increased forecast uncertainty.
3.4. Comparison and Analysis of the Model Performance
In this study, we compared the performance of different models for hourly DIFRA prediction using the same dataset to ensure a fair evaluation. The results, as shown in Table 3, indicate that the RF model performed best, achieving the highest R² = 0.7212 and the lowest prediction errors (RMSE = 50.4582 W/m², MAE = 35.9867 W/m²). This suggests that RF effectively captures the nonlinear characteristics of the data, significantly improving prediction accuracy. In contrast, the Liu and Jordan model, as a traditional empirical model, performed the worst, with R² = 0.2829 and considerably higher errors (RMSE = 81.5813 W/m², MAE = 62.8531 W/m²). Its prediction errors are much higher than those of machine learning models, indicating that empirical formulas are difficult to apply under dynamically changing meteorological conditions.
With the same dataset, the ANN and CNN improved on the Liu and Jordan model but did not surpass RF. The ANN and CNN achieved R² = 0.5772 and R² = 0.5751, with RMSE values of 62.1447 W/m² and 62.2959 W/m², respectively. They outperformed the GRNN but remained clearly inferior to RF. The GRNN showed the weakest fit among the machine learning models, with R² = 0.3127 and RMSE = 79.23 W/m², indicating poor adaptability.
In summary, under the same dataset, RF remained the best-performing model, outperforming the ANN, CNN, and GRNN, while the empirical model exhibited the highest prediction errors and the lowest applicability.
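The ranking summarized above can be reproduced directly from the R² and RMSE values quoted in the text. The short sketch below sorts the models by RMSE and computes the fractional RMSE reduction of one model relative to another; the helper name is illustrative.

```python
# (R2, RMSE in W/m^2) as quoted in the text for each model.
results = {
    "RF": (0.7212, 50.4582),
    "ANN": (0.5772, 62.1447),
    "CNN": (0.5751, 62.2959),
    "GRNN": (0.3127, 79.23),
    "Liu and Jordan": (0.2829, 81.5813),
}

# Models ordered from lowest to highest RMSE.
ranking = sorted(results, key=lambda name: results[name][1])


def rmse_reduction(model, baseline):
    """Fractional RMSE reduction of `model` relative to `baseline`."""
    return 1.0 - results[model][1] / results[baseline][1]


for name in ranking:
    r2, rmse = results[name]
    print(f"{name:15s} R2 = {r2:.4f}  RMSE = {rmse:.2f}")
print(f"RF vs Liu and Jordan: {rmse_reduction('RF', 'Liu and Jordan'):.1%} lower RMSE")
print(f"RF vs ANN: {rmse_reduction('RF', 'ANN'):.1%} lower RMSE")
```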
RF is relatively costly in the modeling phase, because it requires constructing multiple decision trees and performing extensive splitting, particularly for large datasets. Once the model is trained, however, its spatialization speed is not slow, and its superior prediction accuracy makes it a viable option for real-time DIFRA forecasting. Additionally, leveraging cloud computing or GPU acceleration can further enhance RF’s training efficiency, allowing it to maintain high accuracy while improving computational performance, making it more suitable for large-scale meteorological data prediction. Future research can explore integrating ensemble learning or time-series models (e.g., LSTM or Transformer) to enhance both computational efficiency and predictive accuracy, meeting real application needs.
3.5. Limitations and Further Research
Although this study has made some progress in DIFRA modeling, it remains limited in data accuracy, model applicability, and generalization.
First, data precision is limited. The ground observation data for DIFRA only cover June to December 2016, which does not fully capture the annual meteorological variations. As a result, there is a lack of direct observational support during spring and early winter, which may lead to an inadequate simulation of seasonal changes in DIFRA. In addition, the limited temporal and spatial resolutions of the atmospheric parameters used as model inputs affect the model’s precision and its response to extreme weather. Specifically, the relatively low spatial resolution of the ERA5 cloud data makes it difficult to capture local cloud variations, especially in mountainous areas with complex topography, which limits the ability of the model to respond to short-term variations in DIFRA. Furthermore, the current aerosol data are on a daily scale. Although they have a spatial resolution of 0.01°, they do not fully capture rapid changes in aerosol concentration, which have a significant impact on DIFRA.
Second, the DIFRA model was primarily trained on the mountainous terrain in the Chongqing area. While it demonstrated good stability and accuracy in that region, its applicability and generalization capability in other geographical environments, such as plains, deserts, or coastal areas, still need further validation. The local atmospheric circulation effects induced by complex terrains, such as valley and slope winds, can affect the spatial distribution of aerosols and water vapor, thereby altering the observed DIFRA. These micrometeorological factors have not been sufficiently explored in the current study.
To address these issues, future research will focus on improving data precision and enhancing the model’s generalization capability. On the one hand, the study will extend the temporal coverage of ground observation data to obtain a complete annual record of DIFRA, including spring and early winter, and incorporate satellite data with higher spatiotemporal resolution (such as Himawari-8, CALIPSO, etc.) along with data assimilation techniques, to improve the reliability of the input variables and thereby enhance the model’s response to short-term and local environmental changes. On the other hand, to further validate and enhance the generalization capability of the RF model across different terrain and climatic conditions, future research will follow a more detailed roadmap for cross-regional testing and adaptation. Specifically, this roadmap includes the following:
(1) Data Collection and Quality Control: Comprehensive, high-quality datasets will be gathered from representative regions, such as plains, deserts, coastal areas, and high-latitude zones. These datasets will include not only ground observation data (covering variables like temperature, humidity, wind speed, pressure, etc.) but also high-resolution satellite data (such as remote sensing images, cloud images, and radiation data), so that the diverse environmental conditions of each region are well captured. Strict data standardization and quality control procedures will be enforced to guarantee the reliability, consistency, and sufficient spatiotemporal resolution required for model training and validation.
(2) Performance Evaluation: Once the data are collected, the RF model’s performance will be systematically evaluated in each region. This will be accomplished by comparing the model’s predictions with local observational data and using statistical metrics, such as the root mean square error and the correlation coefficient, to quantify the model’s accuracy and stability. This step is essential for identifying and quantifying any region-specific biases or limitations, thus providing a scientific basis for subsequent model adaptations.
(3) Model Optimization and Hybrid Framework Development: Based on the specific climatic and topographic characteristics of different regions, advanced techniques, such as transfer learning, domain adaptation, and parameter tuning, will be employed to optimize the RF model. Regional micrometeorological factors (e.g., local atmospheric circulation and terrain-induced shading effects) will be integrated as auxiliary variables to better capture the local environmental influences. Moreover, the optimized RF model will be combined with other data-driven approaches (such as deep learning and ensemble learning) and traditional physical models to construct a hybrid prediction framework capable of accurately capturing the complexities of nonlinear atmospheric processes and regional micrometeorological effects.
(4) Enhanced Temporal Resolution: High-resolution radiation observation data at minute-level or hour-level intervals will be integrated with regional meteorological models (for example, WRF-Chem) to further refine the simulation of diurnal variations and localized atmospheric dynamics. Advanced data assimilation techniques will be used to closely couple real-time observations with model predictions, ensuring the model can effectively capture short-term dynamic changes in the local environment.
Through this phased and targeted approach, the adaptability and robustness of the RF model in simulating DIFRA across varied geographical settings will be significantly improved, thereby strengthening its broader applicability in solar energy resource assessment, photovoltaic power optimization, and climate change analysis.