Article

Assessment of a Temperature-Based Artificial Neural Network Designed for Global Solar Radiation Estimation in Regions with Sparse Experimental Data

by Enrique González-Plaza 1,*, David García 2 and Jesús-Ignacio Prieto 1,*
1 Department of Physics, University of Oviedo, c/Federico García Lorca, nº18, 33007 Oviedo, Spain
2 Department of Energy, University of Oviedo, c/Wifredo Ricart, s/n, 33204 Gijón, Spain
* Authors to whom correspondence should be addressed.
Sustainability 2024, 16(24), 11201; https://doi.org/10.3390/su162411201
Submission received: 19 November 2024 / Revised: 15 December 2024 / Accepted: 18 December 2024 / Published: 20 December 2024

Abstract

The aim is to evaluate a model of monthly mean global solar radiation based on a simple ANN that uses geographic and temperature data as input variables and is designed for estimations in regions with few radiometric stations. Using data from 414 Spanish stations, the performance of the model is evaluated when both the number and the percentage of data collected for training the network are significantly modified while maintaining the clustering algorithms. The statistical indicators obtained show a compromise between achieving a lower mean error for all stations and limiting the maximum error at each station. In the worst case, the average error is less than 10% for all stations, and the maximum local error only exceeds 20% in less than 2% of the estimates. The least accurate predictions seem to be related to climate types where the clearness index tends to be higher in winter than in summer, which is the case in some locations on the northern Spanish coast. The results are consistent with estimates obtained for 16 non-Spanish stations, selected within the same input data range, suggesting that the variation of the clearness index over the year could be an important factor for local climate characterization.

1. Introduction

Renewable energy sources, mainly wind and solar, are currently considered essential for sustainable access to humanity’s basic resources, i.e., water, food, and energy [1], and for climate change mitigation, which is critical to protect humans, wildlife, and ecosystems [2]. Solar energy serves as a cornerstone for policymakers, engineers, and researchers [3] and is also discussed in educational projects for sustainability [4]. Reliable data on solar resources are therefore needed to support the sustainable growth and impact of solar installations [5,6].
Solar resource assessment requires relatively expensive direct measurement equipment operating at qualified meteorological stations and providing long data sets to facilitate the validation of solar radiation models. Since radiometric stations are relatively scarce, especially in less developed countries [7,8], it is interesting to develop models that allow indirect estimation of global solar radiation (GSR) from abundantly recorded meteorological variables such as temperature.
Numerous GSR models have been developed over decades [9,10,11]. The accuracy of the estimates depends not only on the methodology and on the number and type of influential variables considered in the model but also on the quality and the spatial and temporal distribution of the data used [12,13]. Models have even been proposed that use non-meteorological input variables, such as declination [14] or latitude [15], sacrificing accuracy for simplicity. In general, most GSR models are acceptable for the localities or regions where they were validated but less accurate when extended to other areas [16].
There is a trade-off between the accuracy of a model, its complexity, and its generality, i.e., its adaptability to large regions. Accuracy at the local scale can be increased by using more variables, but a simple model may have advantages for predictions over large regions owing to its higher number of degrees of freedom [17]. In addition, generality can be facilitated by using dimensionless variables and complete functional relationships, i.e., dimensionally homogeneous equations. Otherwise, the parameters or coefficients of the functional relationships necessarily depend on variables that are not explicit in the model [11]. This observation has received little emphasis so far and applies to both regression-based and artificial intelligence-based models.
The most widely used models for estimating GSR express the clearness index as a function of the relative sunshine duration, so they are dimensionally homogeneous equations with various mathematical forms. Most equations use site-dependent parameters, but some of them are intended to be valid worldwide. Around 90% of the variability in data recorded at 59 European stations has been explained using relative sunshine hours, elevation, and the monthly index as input variables [18], but sunshine duration data are not generally available. Seeking a temperature-based approach, and as an evolution of the classical model of Hargreaves and Samani [19], a model of the monthly average of the atmospheric clearness index was previously introduced [20]. It uses the square root of $\Delta T/T_{\min}$ as the only meteorological variable, where $\Delta T$ is the difference between the monthly average of daily maximum temperature, $T_{\max}$ (in K), and the monthly average of daily minimum temperature, $T_{\min}$ (in K). It also assumes that the proportionality coefficient depends on the ratio $z/L$ between the elevation $z$ (in m) and the distance to the sea $L$ (in km). This simple and dimensionally homogeneous model has been evaluated with acceptable results in localities of the northern Spanish coast [11,12,13,17,20] and in two large areas of central and southern Spain with very different climatology and latitude. The comparison with other temperature-based models in each of the three zones showed two things: the advantages of this model for obtaining general equations applicable in each zone, and the influence of latitude on the geographical variability of the model coefficients [12].
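For reference, the two temperature-based forms mentioned in this paragraph can be written compactly as follows, taking the clearness index as $H/H_0$. The Hargreaves and Samani coefficient $k_r$ follows the standard form of that model. The function $a(z/L)$ is left unspecified because its fitted expression is given in [20], not in this article.

```latex
% Classical Hargreaves-Samani form: k_r carries units of K^{-1/2},
% so the equation is not dimensionally homogeneous
\frac{H}{H_0} = k_r \sqrt{T_{\max} - T_{\min}}

% Dimensionally homogeneous form described in the text [20]
\frac{H}{H_0} = a\!\left(\frac{z}{L}\right) \sqrt{\frac{\Delta T}{T_{\min}}},
\qquad \Delta T = T_{\max} - T_{\min}
```

Dividing $\Delta T$ by $T_{\min}$ removes the temperature unit from the square root. This is why the coefficient $a$ can be dimensionless.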
To analyze the influence of latitude, mathematical methodology, and possible seasonal effects, a model based on a simple ANN was recently developed to estimate the monthly clearness index for the set of 105 stations located in the three areas of peninsular Spain previously studied. It uses $\Delta T/T_{\min}$, $z/L$, the latitude, and a monthly index as input variables. This approach improved the accuracy of previous models, both for statistical indicators averaged over the set of stations and for local deviations [13].
The varied geography and orography of Spain have been considered in many studies to characterize the solar resource, using different methods and data from fewer on-ground stations than in the present article [21,22,23]. Access to the network of the Agroclimatic Information System for Irrigation (SIAR) of the Spanish Ministry of Agriculture, Fisheries and Food has made it possible in this paper to evaluate the model using data from a total of 414 on-ground stations.
The main objective of the paper is to answer the following three research questions:
(a)
Is the robustness of the model, based on a simple ANN using the most commonly available input variables, acceptable when the percentage of data collected for network testing is significantly increased while maintaining the clustering algorithms?
(b)
What would the result be if a similar percentage of data were used for training and testing?
(c)
Does the procedure allow the detection of any climatic characteristics to improve the estimates without adding input variables?
To investigate the first question, the scenario described in the previous paper is taken as the starting point (Case 0), using 53 stations for ANN training and 52 for testing. Case 1 refers to the ANN trained with the same 53 stations as in Case 0 but validated with the remaining 361 stations available after adding the SIAR database.
In the analysis of the second question, Case 2 refers to the behavior of the ANN re-trained with 206 stations and validated with the remaining 208 stations.
Finally, to address the third question and as a preliminary step for more comprehensive future work on the extension of the procedure to other countries, the ANN results for the Spanish stations are complemented with ANN results for 16 stations included in the Meteonorm database [24].
Regarding the article’s structure, the Materials and Methods section describes the physical and mathematical foundations of the model, the data, and the analysis procedure employed. The Results section shows the statistical distribution of data in the cases analyzed, which is considered an interesting contribution. It also uses relative errors, both local and averaged over the total data set, as statistical indicators of accuracy. The Discussion section then interprets these results in response to the research questions posed as objectives.
From a practical point of view, the model procedure may be of interest for low-cost solar resource assessments at the microclimate scale, once the model has been validated over a wide range of input variables. Such assessments would use data from secondary thermometric networks, terrain elevation maps, and GIS techniques.
From an academic point of view, the article is an additional contribution to the scarce general GSR models. In these models, dimensionless variables are surprisingly rare, despite being more orthodox from a physics point of view and more advantageous from a computational perspective.

2. Materials and Methods

2.1. Model Fundamentals

Any physical model is based on a functional relationship between quantities, even if its mathematical form is unknown. In this article, the monthly average global solar irradiation on a horizontal surface $H$ is assumed to depend on the following quantities: the monthly average extraterrestrial irradiation $H_0$, the already defined temperature amplitude $\Delta T$, the monthly average of daily minimum temperature $T_{\min}$, the elevation $z$, the distance to the sea $L$, the latitude $\phi$, and a monthly index $d$, i.e.:
$$H = f\left(H_0, \Delta T, T_{\min}, z, L, \phi, d\right) \tag{1}$$
Parametric model equations are called incomplete when they cannot be converted into dimensionally homogeneous expressions, i.e., when they cannot be rewritten in terms of dimensionless groups formed by products of powers of the original ordinary quantities. Dimensional homogeneity analysis therefore makes it possible to detect the absence of influential variables in a model. Using this type of reasoning, the original Hargreaves and Samani model was previously converted into a complete equation by adding $T_{\min}$, $z$, and $L$ as variables influencing the clearness index [20]. The procedure is also applicable to transfer functions used in ANN-based models, which satisfy relationships of the following type:
$$y_j = f\left(\sum_{i=1}^{n} w_{ij}\, x_i + k_j\right) \tag{2}$$
where $y_j$, $x_i$, $w_{ij}$, and $k_j$ denote components of the output vector, the input vector, the connection weight matrix, and the independent term, respectively [25]. Buckingham’s Pi theorem provides an operational method to obtain the influencing dimensionless groups, whose number is generally smaller than the original number of ordinary variables. If the application of the theorem does not lead to a solution, it is because the starting relationship is incomplete. In the case of Equation (1), the application of the theorem leads to the following equivalent expression, which justifies the selection of model input variables to estimate the atmospheric clearness index:
$$\frac{H}{H_0} = f\left(\frac{\Delta T}{T_{\min}}, \frac{z}{L}, \phi, d\right) \tag{3}$$
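The count of dimensionless groups behind Equation (1) can be checked with a small dimensional-matrix computation. This is a minimal sketch, assuming irradiation in J/m² (i.e., kg s⁻²), temperatures in K, and $z$ and $L$ both as lengths. Expressing $z$ in m and $L$ in km only introduces a constant factor, which does not affect the analysis. By the Pi theorem, the number of groups equals the number of variables minus the rank of the matrix.

```python
import numpy as np

# Dimensional exponents of each variable in Equation (1), in the base
# dimensions mass, time, length and temperature.
# Irradiation in J/m^2 reduces to kg s^-2, i.e. mass^1 time^-2.
variables = ["H", "H0", "dT", "Tmin", "z", "L", "phi", "d"]
dim_matrix = np.array([
    # H   H0  dT  Tmin z   L   phi d
    [1,   1,  0,  0,   0,  0,  0,  0],   # mass
    [-2, -2,  0,  0,   0,  0,  0,  0],   # time
    [0,   0,  0,  0,   1,  1,  0,  0],   # length
    [0,   0,  1,  1,   0,  0,  0,  0],   # temperature
])

rank = np.linalg.matrix_rank(dim_matrix)
n_groups = len(variables) - rank   # Buckingham Pi: k = n - r
print(f"rank = {rank}, dimensionless groups = {n_groups}")

# Check that each group used as a model input has zero net dimensions
groups = {
    "H/H0":    {"H": 1, "H0": -1},
    "dT/Tmin": {"dT": 1, "Tmin": -1},
    "z/L":     {"z": 1, "L": -1},
    "phi":     {"phi": 1},
    "d":       {"d": 1},
}
for name, powers in groups.items():
    exponents = np.zeros(len(variables))
    for var, p in powers.items():
        exponents[variables.index(var)] = p
    assert np.allclose(dim_matrix @ exponents, 0), name
```

The mass and time rows are proportional, so the rank is 3 and the eight variables reduce to five groups. These five groups match the arguments of the clearness-index relation above.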

2.2. Data Bases

Table S1 in Supplementary Data shows the geographical and topographical data, as well as the Köppen–Geiger climate classification of the 414 radiometric stations used in this article, either for training or validation purposes. Stations No.1 to 105 are those used in previous works [12,13,17], operated by the Spanish State Meteorological Agency, Spanish Ministry of Environmental Issues, Environmental Climatology Information Network of the Andalusian Regional Government, Service of Environmental Information of the Principality of Asturias, and City Council of Gijón. Stations No.106 to 414 are the ones introduced in this paper, operated by the Spanish Ministry of Agriculture, Fisheries and Food. Figure 1 shows the distribution of the 414 stations, according to the Köppen–Geiger climate classification [26]. The variety of geographical, topographical, and climatic characteristics considered is representative of practically the whole of peninsular Spain, while stations No.369 to 376 facilitate the extension of the study to the Balearic Islands. At almost all stations, records averaged over a minimum of 14 years of the last two decades are used.
Table S2 in Supplementary Data shows experimental values of monthly averages of maximum and minimum daily temperatures and of daily global solar irradiation on a horizontal surface. It also shows calculated values of monthly averages of daily extraterrestrial irradiation, totaling 4968 values (414 stations × 12 months) for each variable.
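The article does not state which formulation was used to calculate the monthly averages of daily extraterrestrial irradiation. The sketch below uses one common textbook approach, which should be taken as an assumption. It uses a solar constant of 1367 W/m², Cooper's declination equation, a simple eccentricity correction, and averaging over every day of a non-leap year.

```python
import math

G_SC = 1367.0  # solar constant, W/m^2 (assumed value)
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def daily_h0(lat_deg: float, day_of_year: int) -> float:
    """Daily extraterrestrial irradiation on a horizontal surface, MJ/m^2."""
    phi = math.radians(lat_deg)
    # Cooper's equation for the solar declination
    delta = math.radians(23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0)))
    # Correction for the Earth-Sun distance
    e0 = 1.0 + 0.033 * math.cos(math.radians(360.0 * day_of_year / 365.0))
    # Sunset hour angle (radians), clamped to handle polar day or night
    cos_ws = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(cos_ws)
    h0 = (24 * 3600 * G_SC / math.pi) * e0 * (
        math.cos(phi) * math.cos(delta) * math.sin(ws)
        + ws * math.sin(phi) * math.sin(delta)
    )
    return h0 / 1e6  # J/m^2 -> MJ/m^2

def monthly_mean_h0(lat_deg: float) -> list:
    """Average the daily values over all days of each month."""
    means, day = [], 1
    for n_days in DAYS_IN_MONTH:
        values = [daily_h0(lat_deg, day + k) for k in range(n_days)]
        means.append(sum(values) / n_days)
        day += n_days
    return means

h0 = monthly_mean_h0(40.0)  # a latitude within the range of the Spanish stations
print([round(v, 1) for v in h0])
```

Dividing each measured monthly mean $H$ by the corresponding value from such a routine gives the clearness index used as the ANN target.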
Data compiled from the Meteonorm database [24] for stations in other countries are labeled as stations No.415 to 430 in Tables S3 and S4 of Supplementary Data.
Figure 2 shows the data distributions of the input variables used in Case 0, which fall within the following ranges:
$0.0122 \le \Delta T/T_{\min} \le 0.0756$
$36.29^\circ \le \phi \le 43.58^\circ$
$0.22 \le z/L \le 376.65$
Figure 3 shows the data distributions of the input variables used in Cases 1 and 2, which fall within the following ranges:
$0.0122 \le \Delta T/T_{\min} \le 0.0787$
$36.29^\circ \le \phi \le 43.58^\circ$
$0.00 \le z/L \le 377.77$

2.3. Hierarchical Clustering

Following the procedure of previous work [27], the available stations were classified into representative clusters by means of a bottom-up hierarchical clustering algorithm based on Euclidean distance and the median merging criterion. Hierarchical clustering has been used in many areas, such as the analysis of residential energy consumption in Spain [28]. It has also recently been recommended for solar radiation clustering. In a comparison of four methods using data from 76 Spanish stations, two of them in northern coastal provinces, it gave the results closest to the Köppen–Geiger classification [29].
Since some variables have values several orders of magnitude higher than others, the data were first normalized with the following equations to avoid over-representation of these variables in the clustering procedure:
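$$\left(\frac{\Delta T}{T_{\min}}\right)_i^* = \frac{\left(\frac{\Delta T}{T_{\min}}\right)_i - \min_{j=1,\dots,n}\left(\frac{\Delta T}{T_{\min}}\right)_j}{\max_{j=1,\dots,n}\left(\frac{\Delta T}{T_{\min}}\right)_j - \min_{j=1,\dots,n}\left(\frac{\Delta T}{T_{\min}}\right)_j} \tag{4}$$
$$\left(\frac{z}{L}\right)_i^* = \frac{\left(\frac{z}{L}\right)_i - \min_{j=1,\dots,n}\left(\frac{z}{L}\right)_j}{\max_{j=1,\dots,n}\left(\frac{z}{L}\right)_j - \min_{j=1,\dots,n}\left(\frac{z}{L}\right)_j} \tag{5}$$
$$\phi_i^* = \frac{\phi_i - \min_{j=1,\dots,n}\phi_j}{\max_{j=1,\dots,n}\phi_j - \min_{j=1,\dots,n}\phi_j} \tag{6}$$
where $n$ is the number of stations and $\left(\Delta T/T_{\min}\right)_i$ is the annual average value for the $i$-th station, calculated from the monthly average values $\left(\Delta T/T_{\min}\right)_{im}$ using the following equation:
$$\left(\frac{\Delta T}{T_{\min}}\right)_i = \frac{1}{12}\sum_{m=1}^{12}\left(\frac{\Delta T}{T_{\min}}\right)_{im} \tag{7}$$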
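The annual averaging of $\Delta T/T_{\min}$ and the min-max normalization of the three clustering variables can be sketched as follows. The station values are hypothetical and chosen only to fall inside the ranges reported in Section 2.2. The sketch assumes that each variable varies across stations, because otherwise the denominator would be zero.

```python
import numpy as np

def annual_mean(monthly: np.ndarray) -> np.ndarray:
    """Average the 12 monthly values of each station (one row per station)."""
    return monthly.mean(axis=1)

def min_max(x: np.ndarray) -> np.ndarray:
    """Rescale a per-station variable to [0, 1] using the min and max over stations."""
    return (x - x.min()) / (x.max() - x.min())

# Hypothetical data for 5 stations: monthly dT/Tmin (5 x 12), z/L and latitude (deg)
rng = np.random.default_rng(0)
dt_tmin_monthly = rng.uniform(0.012, 0.079, size=(5, 12))
z_over_l = np.array([0.0, 3.2, 45.0, 150.0, 377.8])
lat = np.array([36.3, 38.9, 40.4, 42.1, 43.6])

# One row per station, columns: normalized dT/Tmin, z/L, latitude
features = np.column_stack([
    min_max(annual_mean(dt_tmin_monthly)),
    min_max(z_over_l),
    min_max(lat),
])
print(features.round(3))
```

Without this step, $z/L$, which spans hundreds of units, would dominate the Euclidean distances over $\Delta T/T_{\min}$, which spans only a few hundredths.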

2.4. ANN Characteristics

From a biological standpoint, a neuron can be defined as a living element placed inside a network of similar cells, which provide it with a series of inputs; based on internal calculations, the neuron then produces an output [30]. Following the structure of an ANN, its elements can be sorted into different groups. The neurons of the input layer receive the numerical values of the variables $x_1, \dots, x_n$ used to obtain the output variables $a_1, \dots, a_k$ of the network, and they pass this information on to the elements of the hidden layers. The number of hidden layers can be as high as needed to achieve the required behavior. However, the complexity of the network, and therefore the effort needed to optimize it, increases with this number. The output of each neuron is computed using Equation (2). The variable $y_j$ represents the output of the $j$-th element, which is the result of the activation function $f$ applied to a linear combination of the outputs of the previous layer. Thus, $x_i$ is the input from the $i$-th neuron of the previous layer, $w_{ij}$ is the weight (gain factor) applied to each excitation from the previous layer, and $k_j$ denotes an independent term. The function $f$ is introduced to produce a non-linearity in the system. According to the literature, it could be, among other options, an arctangent, a sigmoid, or a ramp function [25].
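A forward pass built from Equation (2) can be sketched as below. The layer sizes follow the structure described in Section 2.4: four dimensionless inputs, three hidden layers of 15 neurons, and one output. The sigmoid activation in every layer is an assumption; the article lists it only as one possible choice. Encoding the monthly index $d$ as a single scaled number and the random initial weights are also assumptions made only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer(x, w, k):
    """Equation (2) for a whole layer: y_j = f(sum_i w_ij x_i + k_j)."""
    return sigmoid(x @ w + k)

def forward(x, params):
    """Propagate the input vector through each layer in turn."""
    for w, k in params:
        x = layer(x, w, k)
    return x

# Assumed architecture: 4 inputs (dT/Tmin, z/L, phi, d), three hidden layers
# of 15 neurons, and a single output neuron for the clearness index
sizes = [4, 15, 15, 15, 1]
rng = np.random.default_rng(42)
params = [(rng.normal(0.0, 0.5, (n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = np.array([0.4, 0.1, 0.6, 0.5])  # normalized inputs (illustrative values)
out = forward(x, params)
n_params = sum(w.size + k.size for w, k in params)
print(out, n_params)  # n_params is 571
```

The parameter count is 571 (4·15 + 15, plus 2 × (15·15 + 15), plus 15 + 1). This shows why reducing the inputs to four dimensionless groups keeps the number of weights small.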
The complexity of an ANN makes it unfeasible to find an analytic solution directly, given the number of variables involved, so a training process is required. To this end, a cost function is defined and used in a convergence algorithm, which aims to obtain, for each neuron, the weights and biases in Equation (2) that optimize this function. The application of Buckingham’s theorem to the functional relation underlying the ANN simplifies the structure of the ANN, providing computational advantages due to the smaller number of variables and weights. The overall results are evaluated by comparing the network output with the target values by means of statistical indicators.
Multiple training algorithms can be used in this process, and each of them exists in many variants; a review can be found in [31]. In general, none of them should be preferred over the others a priori. Depending on the structure of the ANN, some may reach the solution faster or be more suitable for more complex networks; however, the study of the best optimization process is outside the scope of this paper. As already mentioned, the ANN was previously introduced and applied to data recorded at 105 stations in three large areas of Spain (Case 0); its behavior will be evaluated in the following sections with a different set of data. The network has 3 hidden layers with 15 neurons each. In this manuscript, a modification of the gradient algorithm is used to find the final parameters of the ANN during the training process. One of the network parameters ($k_j$ or $w_{ij}$) is chosen at random and modified by adding or subtracting a certain quantity. If the change improves the cost function, it is kept. The process is then repeated until the cost function reaches a certain threshold.
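The parameter update described in this paragraph can be sketched as a perturb-and-accept loop. The article calls its method a modification of the gradient algorithm, but this sketch implements only the rule as described and computes no gradients. The step size, threshold, iteration cap, cost function, and toy single-neuron problem are all assumptions, not values reported in the article.

```python
import numpy as np

def train_random_perturbation(params, cost, step=0.05, threshold=1e-3,
                              max_iter=20_000, seed=0):
    """Perturb one randomly chosen weight or bias by +/- step and keep the
    change only if the cost decreases; stop at a cost threshold or after
    max_iter attempts. `params` is a list of NumPy arrays modified in place."""
    rng = np.random.default_rng(seed)
    best = cost(params)
    for _ in range(max_iter):
        if best <= threshold:
            break
        arr = params[rng.integers(len(params))]           # a weight matrix or bias vector
        idx = tuple(rng.integers(s) for s in arr.shape)   # one entry within it
        old = arr[idx]
        arr[idx] = old + rng.choice([-step, step])        # add or subtract a fixed quantity
        new = cost(params)
        if new < best:
            best = new        # improvement: keep the change
        else:
            arr[idx] = old    # no improvement: revert
    return best

# Toy usage on a single sigmoid neuron with synthetic targets
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, (50, 3))
y_true = 1.0 / (1.0 + np.exp(-(X @ np.array([1.0, -2.0, 0.5]) + 0.3)))
weights, bias = np.zeros(3), np.zeros(1)

def mse(p):
    w, b = p
    y = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return float(np.mean((y - y_true) ** 2))

initial = mse([weights, bias])
final = train_random_perturbation([weights, bias], mse, threshold=1e-5)
print(f"cost: {initial:.5f} -> {final:.6f}")
```

Because a change is kept only when it lowers the cost, the cost never increases. The trade-off is that the search can stall in a local minimum, and it needs one cost evaluation per attempted change.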

2.5. Statistical Indicators

Several methods have been proposed to assess the fit between experimental observations and model estimates, but none of them is free of limitations [32]. Since percentage results are easier to interpret, this paper compares models using dimensionless statistical indicators. These are the relative root-mean-square error (RRMSE), the relative mean bias error (RMBE), and the coefficient of determination $R^2$. To facilitate comparisons with the results of other authors, three more indicators have been calculated: the Nash–Sutcliffe model efficiency coefficient (NSE), the normalized root-mean-square error (NRMSE), and the normalized mean bias error (NMBE). The last two use the mean value of the experimental data as the reference for normalization. In addition, the centered pattern RMS difference provides a concise summary of the agreement between estimates and observations. It combines their correlation, their root-mean-square difference, and the ratio of their variances, as can be deduced from the following relationship:
$$E_n = \sqrt{1 + \sigma_{sn}^2 - 2\,\sigma_{sn} R} \tag{8}$$
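where $E_n$ is the centered pattern RMS difference normalized by the standard deviation of the experimental data. $\sigma_{sn}$ is the standard deviation of the clearness index estimates, also normalized by the standard deviation of the experimental data, and $R$ is the correlation coefficient between estimates and experimental data.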
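The indicators listed above can be sketched as follows. The formulas are assumptions where the article gives no explicit equation. RRMSE and RMBE are computed from point-by-point relative errors, and NRMSE and NMBE are normalized by the mean of the experimental data, as stated. $R^2$ is taken as the squared Pearson correlation. The final check confirms that Equation (8) equals the RMS of the mean-removed differences divided by the standard deviation of the observations.

```python
import numpy as np

def indicators(obs, est):
    """Dimensionless fit indicators; the relative forms of RRMSE/RMBE and
    R^2 as the squared Pearson correlation are assumptions."""
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    err = est - obs
    rel = err / obs                                  # relative error of each value
    r = np.corrcoef(obs, est)[0, 1]                  # correlation coefficient R
    sigma_sn = est.std() / obs.std()                 # normalized standard deviation
    return {
        "RRMSE_%": 100 * np.sqrt(np.mean(rel ** 2)),
        "RMBE_%": 100 * np.mean(rel),
        "NRMSE_%": 100 * np.sqrt(np.mean(err ** 2)) / obs.mean(),
        "NMBE_%": 100 * np.mean(err) / obs.mean(),
        "R2": r ** 2,
        "NSE": 1 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2),
        "E_n": np.sqrt(1 + sigma_sn ** 2 - 2 * sigma_sn * r),  # Equation (8)
    }

# Synthetic monthly clearness index values for one station (illustrative only)
rng = np.random.default_rng(3)
obs = np.array([0.45, 0.50, 0.55, 0.60, 0.62, 0.65, 0.68, 0.66, 0.60, 0.52, 0.47, 0.44])
est = 0.97 * obs + rng.normal(0.0, 0.02, obs.size)
stats = indicators(obs, est)

# E_n equals the RMS of the mean-removed differences divided by the std of obs
centered = (est - est.mean()) - (obs - obs.mean())
print(np.isclose(stats["E_n"], np.sqrt(np.mean(centered ** 2)) / obs.std()))  # True
```

This identity is what allows a Taylor diagram to show correlation, normalized standard deviation, and $E_n$ in a single two-dimensional plot.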
Regarding the performance at the local scale, the assessments will be made in terms of monthly relative errors and their probabilistic distribution.
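The local-scale assessment can be sketched as a count of monthly relative errors beyond ±20%, both overall and per station. A per-station count separates stations that exceed the limit in a few months from those that exceed it in more than three. The error classes used for the histogram are hypothetical, and the toy data are synthetic.

```python
import numpy as np

def exceedance_summary(obs, est, limit=20.0):
    """obs, est: arrays of shape (n_stations, 12) with monthly clearness index."""
    rel_err = 100 * (est - obs) / obs                 # monthly relative errors, %
    exceed = np.abs(rel_err) > limit
    per_station = exceed.sum(axis=1)                  # months above the limit, per station
    bins = [-100, -20, -10, 0, 10, 20, 100]           # hypothetical error classes, %
    hist, _ = np.histogram(np.clip(rel_err, -99.9, 99.9), bins=bins)
    return {
        "n_exceed": int(exceed.sum()),
        "pct_exceed": 100 * exceed.mean(),
        "stations_with_exceedance": np.flatnonzero(per_station > 0),
        "stations_over_3_months": np.flatnonzero(per_station > 3),
        "histogram": hist,
    }

# Toy data: 3 stations; station 0 is off by +30% in 4 months, station 1 by -24% once
obs = np.full((3, 12), 0.5)
est = obs.copy()
est[0, :4] = 0.65
est[1, 0] = 0.38
summary = exceedance_summary(obs, est)
print(summary["n_exceed"], summary["stations_over_3_months"])  # 5 [0]
```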

3. Results

3.1. Number of Clusters and Selection of Training and Testing Data

Among the various criteria in the literature for determining the number of clusters, the one proposed by Kaushik has been adopted [27]. This procedure is based on the construction of the dendrogram. It recommends that the number of clusters be the number of vertical lines cut by a horizontal line that can traverse the maximum vertical distance in the dendrogram without intersecting a cluster.
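The clustering step and the dendrogram criterion can be sketched with SciPy. The sketch uses Euclidean distance with median linkage on normalized features. It sets the number of clusters at the largest gap between successive merge heights, which is where a horizontal line can move farthest without crossing a merge. The three synthetic point groups are hypothetical, and reading "largest gap" this way is an interpretation of the criterion, not the authors' code.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def clusters_by_largest_gap(features):
    """Bottom-up clustering (Euclidean distance, median linkage). The number
    of clusters is taken where the gap between successive merge heights is
    largest. Median linkage can produce small inversions in merge heights;
    np.diff then gives negative gaps, which argmax simply ignores."""
    Z = linkage(features, method="median", metric="euclidean")
    gaps = np.diff(Z[:, 2])                  # height differences between successive merges
    i = int(np.argmax(gaps))                 # largest gap lies between merges i and i+1
    n_clusters = len(features) - (i + 1)     # clusters remaining after merge i
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return n_clusters, labels

# Hypothetical normalized features: three well-separated groups of 10 stations
rng = np.random.default_rng(7)
centers = np.array([[0.1, 0.1, 0.1], [0.5, 0.9, 0.5], [0.9, 0.2, 0.8]])
features = np.vstack([c + rng.normal(0.0, 0.01, (10, 3)) for c in centers])

n_clusters, labels = clusters_by_largest_gap(features)
print(n_clusters, labels)
```

With real station data, the clusters need not be balanced. The text reports this for the 414 stations, where one cluster gathers 92% of them.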
From the dendrogram shown in Figure 4, it is deduced that 4 clusters are obtained for the 414 stations considered in Cases 1 and 2. The stations included in each cluster are specified in Table 1, and the distributions of stations and clusters in the normalized space are shown in Figure 5. Cluster No.1 includes only 3 stations, which are very close to the Cantabrian coastline and have a Cfb-type climate. The 14 stations included in Cluster No.2 are also located on the north coast; 11 of them have a Cfb climate and the remaining ones a Csb climate. Cluster No.3 includes 7 stations with climate type Csa, 5 of type BSh, 4 of type BSk, and 2 of type BWh. Some of them have relatively high $z/L$ values, especially station No.24, whose $z/L$ value is surpassed only by the stations of Cluster No.1. Cluster No.4 is much larger than the others: it includes 381 stations, which represent 92% of the stations, and has a greater climatic variety.
For Case 2, approximately half of the stations in each cluster were randomly selected for training or validation purposes, amounting to a total of 206 training stations and 208 testing stations. Table 2 lists the distribution of the training stations in the clusters. It can be observed that 189 of the training stations belong to Cluster No.4, representing 92% of the training stations.
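The cluster-wise random selection described above can be sketched as follows. Each cluster is shuffled, and its first half (rounded down) goes to training. The rounding rule and the seed are assumptions, so this sketch will not reproduce the exact 206/208 split of the article. The station numbers and cluster labels in the test are hypothetical.

```python
import random
from collections import defaultdict

def split_half_per_cluster(station_ids, labels, seed=0):
    """Randomly assign about half of each cluster's stations to training
    and the rest to testing, so every cluster is represented in both sets."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for sid, lab in zip(station_ids, labels):
        by_cluster[lab].append(sid)
    train, test = [], []
    for lab in sorted(by_cluster):
        members = by_cluster[lab][:]
        rng.shuffle(members)
        n_train = len(members) // 2      # floor: the rounding rule is an assumption
        train += members[:n_train]
        test += members[n_train:]
    return sorted(train), sorted(test)
```

Splitting cluster by cluster keeps the training set roughly proportional to the cluster sizes. This matches the text's observation that Cluster No.4 supplies about 92% of the training stations.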

3.2. ANN Performance for the Set of 414 Spanish Stations

Figure 6 compares the experimental values and the neural network estimates for Case 1, using symbols to differentiate the stations used for training and testing in Case 0 and the remaining stations added from the SIAR network. Figure 7 compares the results obtained after having re-trained the neural network for Case 2.
Table 3 allows the comparison of the statistical results in both cases. The RRMSE values obtained in Case 2 for the training and testing stations are very similar, around 6.9%, which indicates that the stations chosen for training acceptably represent the testing stations. Figure 8 compares the two cases by means of a Taylor diagram, which is based on Equation (8) [33].

4. Discussion

From the figures and tables, it can be deduced that the results are relatively satisfactory for Case 1, with an RRMSE value of 9.28% for the 4968-value data set. In addition, most of the monthly relative errors at each station are within ±20%, even though in this case the network was trained with less than 13% of the station set. Table 4 shows the statistical distribution of the monthly relative errors, with only 92 values above 20%, equivalent to 1.85% of the total data. However, as Figure 6 shows, the neural network is clearly biased for many stations in the SIAR network, providing estimates below the experimental data.
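The percentages quoted in this paragraph follow directly from the counts given in the text. There are 414 stations with 12 monthly values each, and 53 training stations in Case 1:

```latex
4968 = 414 \times 12, \qquad
\frac{92}{4968} \approx 0.0185 = 1.85\,\%, \qquad
\frac{53}{414} \approx 0.128 = 12.8\,\% < 13\,\%
```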
In Case 2, after re-clustering and selection of training stations, an RRMSE value of 6.89% is obtained for the data set. Figure 8 highlights the better statistical indicators obtained in Case 2 for the set of stations. As for the monthly relative errors, the number of values above 20% is reduced to about half of that in Case 1, as can be seen from Table 5. Figure 9 compares the monthly relative error distributions, showing a noticeable improvement in Case 2. The comparison between Figure 2a and Figure 3a suggests that the main cause of this improvement could be the higher percentage of stations used for training in mid-latitudes.
In Case 2 there are 49 values of monthly relative errors greater than 20%, spread over 16 stations. Figure 10a plots the values for the stations where the monthly relative errors are higher than 20% in 3 months at most, while Figure 10b allows comparing the results for the remaining 5 stations where the monthly relative errors are higher than 20% in more than 3 months.
Figure 10a shows that, for 11 stations, relative error values exceed 20% in three months at most. All of these stations are included in Cluster No.4, except for station No.100, which belongs to Cluster No.2. For these stations, the least accurate estimates occur mainly in November, December, and January, with the least satisfactory results for station No.368. Figure 10b shows 25 monthly relative error values higher than 20% in the spring and summer months for stations No.87–90. For station No.162, the 7 values of monthly relative error higher than 20% are more moderate and do not have a marked seasonal character. All five stations are classified as Cfb-type climate. Stations No.87–90 are located in a coastal province of northern Spain and are included in Cluster No.2, which is characterized by high values of latitude and, to a lesser degree, of $z/L$. However, station No.162 is located on the Castilian plateau and is included in Cluster No.4. Therefore, it is not evident that the largest errors are associated with high values of these variables, since all stations of Cluster No.1 and most of Cluster No.2 present moderate monthly relative errors.
Figure 11a shows the low variability of the monthly clearness index measured for stations No.87–90, with the peculiarity that the difference between the clearness index values in June and December, $\Delta CI$, is negative. This observation contrasts with the seasonal variability of the clearness index shown in Figure 11b for the remaining 12 stations that also present some monthly error higher than 20%.
Among the 414 stations analyzed, $\Delta CI$ is negative at only 3 other stations, namely No.85, 91, and 98. Figure 11c shows the variability of the monthly clearness index for these stations, as well as for station No.97, whose data have the lowest positive value of $\Delta CI$, namely 0.005. This last station and the 7 already mentioned with $\Delta CI < 0$ are included in Clusters No.1 and 2. All of them are in the Principality of Asturias, although they are managed by different official agencies.
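The indicator $\Delta CI$ used in this discussion can be computed directly from the monthly data. The clearness index is $H/H_0$ for each month, and $\Delta CI$ is its June value minus its December value. Stations with $\Delta CI < 0$ are then flagged. The two stations below are hypothetical, and the months are assumed to be ordered January to December.

```python
import numpy as np

def delta_ci(h, h0):
    """June-minus-December difference of the monthly clearness index H/H0.
    h, h0: arrays of shape (n_stations, 12), months ordered January..December."""
    ci = np.asarray(h, dtype=float) / np.asarray(h0, dtype=float)
    return ci[:, 5] - ci[:, 11]    # index 5 = June, index 11 = December

# Hypothetical stations: the first is clearer in June, the second in December
ci = np.full((2, 12), 0.5)
ci[0, 5], ci[0, 11] = 0.60, 0.50
ci[1, 5], ci[1, 11] = 0.45, 0.50
h0 = np.full((2, 12), 30.0)        # placeholder extraterrestrial irradiation, MJ/m^2
h = ci * h0

d = delta_ci(h, h0)
print(d, np.flatnonzero(d < 0))    # the second station (index 1) is flagged
```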
It is interesting to note that in Case 1, stations No.87–90 and 162 also have the highest number of monthly relative errors above 20%. Figure 12 shows that the trends are similar in both cases but with more moderate errors in Case 1.
Furthermore, it should be noted that data from stations No.1–71, located in southern Spain, were used in previous work to compare 14 GSR models based on regression and temperatures [12]. To evaluate their use as general models, the fit coefficients were replaced by functions of the $z/L$ ratio. The best averages for the set of stations, $RRMSE = 7.45\%$ and $R^2 = 0.9693$, were obtained using the modified Adaramola model [12,34]. This model expresses the clearness index as a linear function whose only variable is the monthly average daily mean temperature. It predicted annual mean errors of less than 20% for all stations except station No.24, located in Cádiz, where the error was 50.32%. The most probable cause is that the value $z/L = 150$ at this location is notably higher than at the rest of the stations. The ANN predicts annual mean errors for station No.24 of 2.76% in Case 1 and 6.63% in Case 2. It can therefore be concluded that the probability of obtaining unacceptable estimates at the local scale is lower using the neural network-based procedure.
As a complement to the analysis carried out for the Spanish stations and as a preliminary step to more comprehensive future work, the performance of the neural network has also been evaluated using data from stations located in other countries. Table 6 shows the data of the monthly clearness index for 16 non-Spanish stations with input variables within the following ranges:
$0.020 \le \Delta T/T_{\min} \le 0.067$
$37.02^\circ \le \phi \le 42.93^\circ$
$0.016 \le z/L \le 41.667$
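A simple check that the non-Spanish stations lie within the input ranges of the Spanish data set used in Cases 1 and 2 can be sketched as follows. The two dictionaries are not real stations. They are the corner points of the ranges listed above: all lower bounds together, then all upper bounds together. Both ranges are axis-aligned boxes, so checking these two points is enough. As the discussion of station No.418 shows, being inside the ranges does not guarantee that the climate type is represented in the training data.

```python
# Input ranges of the Spanish data set used in Cases 1 and 2 (Section 2.2)
TRAINING_RANGES = {
    "dT_over_Tmin": (0.0122, 0.0787),
    "latitude_deg": (36.29, 43.58),
    "z_over_L": (0.00, 377.77),
}

def within_training_range(station: dict) -> bool:
    """True if every input variable lies inside the training range."""
    return all(lo <= station[k] <= hi for k, (lo, hi) in TRAINING_RANGES.items())

# Lower and upper corners of the ranges reported for the 16 non-Spanish stations
non_spanish_bounds = [
    {"dT_over_Tmin": 0.020, "latitude_deg": 37.02, "z_over_L": 0.016},
    {"dT_over_Tmin": 0.067, "latitude_deg": 42.93, "z_over_L": 41.667},
]
print(all(within_training_range(s) for s in non_spanish_bounds))  # True
```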
As shown in Figure 13a, using the neural network trained in Case 2, the relative error of the clearness index is less than 20% for 12 stations during all months of the year. For the remaining four stations, Figure 13b shows that the monthly relative errors for station No.418 are greater than 20% in all months of the year, with no appreciable seasonal differences. This result seems to be explained mainly by the fact that this station has a Dfa climate in the Köppen–Geiger classification, and there are no stations of this class among the 414 Spanish stations analyzed. The errors are highest for station No.426 in the spring and summer months, from December to February for station No.417, and from October to January for station No.427. Figure 13b also shows that the same stations present the highest errors, with similar monthly trends, when estimates are made using the neural network trained in Case 1. In that case, the errors are more moderate, except for station No.426. The remarkably high errors obtained at this station can be interpreted as related to its negative value of $\Delta CI$, as Figure 14a shows. By contrast, Figure 14b shows that stations No.417 and 427 have clearness index trends similar to those of the stations with lower errors. This is consistent with the fact that, when using the network trained in Case 1, the errors at these stations exceed 20% in only 3 and 1 months, respectively.
The difference between the June and December clearness index is also negative for station No.430, located in Yinchuan, China, with some similarity to the behavior represented in Figure 11c for stations No.85, 91, 97, and 98 on the Spanish north coast, and also with acceptable monthly errors. However, the ANN estimates for station No.430 are pending corroboration because, among the 414 Spanish stations analyzed, there is only one with the same BWk type climate, namely station No.1, located at the Plataforma Solar de Almería facilities, in the desert area of Tabernas.
In summary, the choice of the best model is a matter of preference between objectives. The behavior of the ANN in Case 2 is more appropriate if obtaining the lowest mean error for the set of stations is a priority. Case 1 leads to more moderate values of the maximum monthly relative errors, with a mean error for the set of stations that can be considered acceptable. This is especially notable given that in this case the network was trained with less than 13% of the set of stations.
In any case, the less accurate estimates seem to be related to climate types where the clearness index tends to be higher in winter than in summer, which is the case in a few locations on the northern Spanish coast. This result is consistent with the estimates obtained for the Japanese locality of Sendai. It suggests that the sign of the difference between the clearness indices in June and December may be a basic indicator of climatic variability, complementing the Köppen–Geiger classification.
Future work is expected to improve the ANN model, designed to estimate GSR over large regions and based on temperature and geographic input data, using algorithms that incorporate climate variability as a variable for selecting training weather stations.

5. Conclusions

Hierarchical clustering, based on normalized values of relative temperature amplitude, latitude, and the ratio of elevation to distance to the sea, is suitable for the characterization of solar radiation data.
A simple ANN is capable of estimating GSR from temperature, orography and geography data with acceptable accuracy, both with respect to error averages for the input data set and with respect to local errors, which is interesting for low-cost solar resource assessments over wide areas.
The application of Buckingham’s theorem to the functional relation underlying the ANN simplifies the structure of the ANN, providing computational advantages due to the smaller number of variables and weights.
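The sketch below makes the reduction in parameters concrete. It uses the symbols of the Nomenclature and assumes a single-layer network $y_j = f(\sum_i w_{ij} x_i + k_j)$ with a logistic activation, whose output is the clearness index $H/H_0$. These are assumptions for illustration, and the weights are hypothetical rather than trained values. Under these assumptions, replacing five raw inputs ($\phi$, $z$, $L$, $T_{max}$, $T_{min}$) with three groups ($\phi$, $z/L$, $\Delta T/T_{min}$) reduces the parameter count from 6 to 4.

```python
import math


def sigmoid(a: float) -> float:
    """Logistic activation (assumed; the activation used in the paper is not stated here)."""
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, w, k):
    """Single-layer pass: y_j = f(sum_i w[i][j] * x[i] + k[j])."""
    return [
        sigmoid(sum(w[i][j] * x[i] for i in range(len(x))) + k[j])
        for j in range(len(k))
    ]


def n_parameters(n_inputs: int, n_outputs: int = 1) -> int:
    """Weights w_ij plus independent terms k_j of a single-layer network."""
    return n_inputs * n_outputs + n_outputs


# Raw inputs (phi, z, L, T_max, T_min) versus grouped inputs (phi, z/L, Delta T/T_min).
print("raw:", n_parameters(5), "grouped:", n_parameters(3))

# Hypothetical normalized inputs (phi*, (z/L)*, (Delta T/T_min)*) and untrained weights.
x = [0.5, 0.2, 0.7]
w = [[0.4], [-0.3], [0.6]]
k = [-0.5]
kt = forward(x, w, k)[0]  # assumed output: clearness index H/H0
h0 = 8.0                  # hypothetical monthly H0 in kWh/m2
print(f"K_T = {kt:.4f}, H = {kt * h0:.3f} kWh/m2")
```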
The percentage of data used for ANN training influences the quality of the results. In this work, with the same network structure, using 50% of the data for training gave a lower average error for the set of 4968 monthly values measured at 414 Spanish stations. Using 18% of the data for training, by contrast, gave lower local errors.
The results once again corroborate the usefulness of the $z/L$ ratio as a proxy for both elevation and the thermal regulation effect of the sea.
The ANN trained with data from Spanish stations generally performs well at other locations around the world whose input variables lie within the same ranges.
The difference between the clearness indices in June and December could be used in the hierarchical clustering of climate data as a complement to the Köppen–Geiger classification. This possibility should be studied in future work.

Supplementary Materials

Supplementary Data are available at https://doi.org/10.17632/ktrfbhf7bj.2 (accessed on 17 December 2024).

Author Contributions

Conceptualization, J.-I.P.; methodology, J.-I.P.; software, E.G.-P. and D.G.; validation, E.G.-P., D.G. and J.-I.P.; formal analysis, D.G. and J.-I.P.; investigation, E.G.-P., D.G. and J.-I.P.; resources, J.-I.P.; data curation, E.G.-P. and D.G.; writing-review and editing, J.-I.P.; supervision, J.-I.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The geographical and climatic characteristics of the weather stations and the experimental data used in this work are provided as Supplementary Data at https://doi.org/10.17632/ktrfbhf7bj.2.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

$d$  Monthly index
$E_n$  Normalized centered pattern RMS difference, $E_n = \sqrt{1 + \sigma_{sn}^2 - 2\,\sigma_{sn} R}$, where $R$ is the correlation coefficient between simulated and observed values
$H$  Monthly average of global solar irradiation on a horizontal surface (kWh/m²)
$H_0$  Monthly average of extraterrestrial solar irradiation on a horizontal surface (kWh/m²)
$k_j$  Component of the independent (bias) term
KGCC  Köppen–Geiger climate classification
$L$  Distance to the sea (km)
$n$  Number of data
NMBE  Normalized mean bias error, $\mathrm{NMBE} = \dfrac{\sum_{i=1}^{n}(s_i - o_i)/n}{\bar{o}}$
NRMSE  Normalized root-mean-square error, $\mathrm{NRMSE} = \dfrac{\sqrt{\sum_{i=1}^{n}(s_i - o_i)^2/n}}{\bar{o}}$
NSE  Nash–Sutcliffe model efficiency, $\mathrm{NSE} = 1 - \dfrac{\sum_{i=1}^{n}(s_i - o_i)^2}{\sum_{i=1}^{n}(o_i - \bar{o})^2}$
$o_i$  Observed value
$R^2$  Coefficient of determination, $R^2 = \left[\dfrac{\sum_{i=1}^{n}(s_i - \bar{s})(o_i - \bar{o})}{\sqrt{\sum_{i=1}^{n}(s_i - \bar{s})^2\,\sum_{i=1}^{n}(o_i - \bar{o})^2}}\right]^2$
RMBE  Relative mean bias error, $\mathrm{RMBE} = \dfrac{1}{n}\sum_{i=1}^{n}\dfrac{s_i - o_i}{o_i}$
RRMSE  Relative root-mean-square error, $\mathrm{RRMSE} = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}\left(\dfrac{s_i - o_i}{o_i}\right)^2}$
$s_i$  Simulated value
$T_{max}$  Monthly average of daily maximum temperature (K)
$T_{min}$  Monthly average of daily minimum temperature (K)
$w_{ij}$  Component of the connection weight matrix
$x_i$  Component of the input vector
$y_j$  Component of the output vector
$z$  Elevation above sea level (m)
$(z/L)^*$  Normalized $z/L$
$\Delta CI$  Difference between the clearness index values in June and December (June value minus December value)
$\Delta T$  Temperature difference (K), $\Delta T = T_{max} - T_{min}$
$(\Delta T/T_{min})^*$  Normalized $\Delta T/T_{min}$
$\phi$  Latitude (°)
$\phi^*$  Normalized latitude
$\sigma_o$  Standard deviation of experimental data, $\sigma_o = \sqrt{\sum_{i=1}^{n}(o_i - \bar{o})^2/n}$
$\sigma_s$  Standard deviation of simulated data, $\sigma_s = \sqrt{\sum_{i=1}^{n}(s_i - \bar{s})^2/n}$
$\sigma_{sn}$  Normalized standard deviation, $\sigma_{sn} = \sigma_s/\sigma_o$
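The indicators defined above can be computed with a short function. The Python sketch below follows the Nomenclature definitions, using population (1/n) standard deviations and taking $R$ in $E_n$ as the Pearson correlation coefficient. The ×100 scaling to percentages is assumed so that the output is comparable with Table 3. The function name and the sample data are hypothetical.

```python
import math
from statistics import fmean


def indicators(sim, obs):
    """Return the Nomenclature indicators; RRMSE, RMBE, NRMSE, NMBE and E_n in %."""
    if len(sim) != len(obs) or not obs:
        raise ValueError("sim and obs must be non-empty sequences of equal length")
    n = len(obs)
    s_bar, o_bar = fmean(sim), fmean(obs)
    diff = [s - o for s, o in zip(sim, obs)]
    rel = [(s - o) / o for s, o in zip(sim, obs)]
    ss_o = sum((o - o_bar) ** 2 for o in obs)
    ss_s = sum((s - s_bar) ** 2 for s in sim)
    cov = sum((s - s_bar) * (o - o_bar) for s, o in zip(sim, obs))
    r = cov / math.sqrt(ss_s * ss_o)  # Pearson correlation coefficient R
    sigma_sn = math.sqrt(ss_s / n) / math.sqrt(ss_o / n)  # sigma_s / sigma_o
    return {
        "RRMSE": 100 * math.sqrt(fmean([e * e for e in rel])),
        "RMBE": 100 * fmean(rel),
        "NRMSE": 100 * math.sqrt(fmean([d * d for d in diff])) / o_bar,
        "NMBE": 100 * fmean(diff) / o_bar,
        "NSE": 1 - sum(d * d for d in diff) / ss_o,
        "R2": r * r,
        "sigma_sn": sigma_sn,
        # max() guards against tiny negative values caused by rounding
        "E_n": 100 * math.sqrt(max(0.0, 1 + sigma_sn**2 - 2 * sigma_sn * r)),
    }


obs = [3.1, 4.5, 5.8, 6.9, 6.2, 4.0]  # hypothetical observed monthly H (kWh/m2)
sim = [3.3, 4.2, 6.1, 6.6, 6.5, 4.1]  # hypothetical ANN estimates (kWh/m2)
for name, value in indicators(sim, obs).items():
    print(f"{name:>8}: {value:.4f}")
```

Because $E_n$ is the centered RMS difference divided by $\sigma_o$, it is insensitive to a constant bias. That bias is captured separately by NMBE and RMBE.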

References

  1. Renewable Energy–Powering a Safer Future. Available online: https://www.un.org/en/climatechange/raising-ambition/renewable-energy (accessed on 10 September 2024).
  2. Solar Energy, Wildlife, and the Environment. Available online: https://www.energy.gov/eere/solar/solar-energy-wildlife-and-environment (accessed on 9 December 2024).
  3. Khalil, M.; Sheik, S.A. Advancing Green Energy Integration in Power Systems for Enhanced Sustainability: A Review. IEEE Access 2024, 12, 151669–151692.
  4. Kauniuk, G.; Fursova, T.; Cherniuk, A.; Mezeria, A.; Zaika, S.; Dolmatov, O. Using Solar Energy in Eco-Project Based Learning for Sustainability. In Proceedings of the IEEE 4th KhPI Week on Advanced Technology (KhPIWeek), Kharkiv, Ukraine, 2–6 October 2023.
  5. Maka, A.; Alabid, J. Solar energy technology and its roles in sustainable development. Clean Energy 2022, 6, 476–483.
  6. Obaideen, K.; AlMallahi, M.N.; Alami, A.H.; Ramadan, M.; Abdelkareem, M.A.; Shehata, N.; Olabi, A.G. On the contribution of solar energy to sustainable developments goals: Case study on Mohammed bin Rashid Al Maktoum Solar Park. Int. J. Thermofluids 2021, 12, 100123.
  7. Gevorgian, A.; Pernigotto, G.; Gasparella, A. Addressing Data Scarcity in Solar Energy Prediction with Machine Learning and Augmentation Techniques. Energies 2024, 17, 3365.
  8. Porfirio, A.C.S.; Ceballos, J.C.; Britto, J.M.S.; Costa, S.M.S. Evaluation of Global Solar Irradiance Estimates from GL1.2 Satellite-Based Model over Brazil Using an Extended Radiometric Network. Remote Sens. 2020, 12, 1331.
  9. Qin, W.; Wang, L.; Lin, A.; Zhang, M.; Xia, X.; Hu, B.; Niu, Z. Comparison of deterministic and data-driven models for solar radiation estimation in China. Renew. Sustain. Energy Rev. 2018, 81, 579–594.
  10. Chen, J.-L.; He, L.; Yang, H.; Ma, M.; Chen, Q.; Wu, S.-J.; Xiao, Z.-L. Empirical models for estimating monthly global solar radiation: A most comprehensive review and comparative case study in China. Renew. Sustain. Energy Rev. 2019, 108, 91–111.
  11. Prieto, J.-I.; García, D. Global solar radiation models: A critical review from the point of view of homogeneity and case study. Renew. Sustain. Energy Rev. 2022, 155, 111856.
  12. Prieto, J.-I.; García, D. Modified temperature-based global solar radiation models for estimation in regions with scarce experimental data. Energy Convers. Manag. 2022, 268, 115950.
  13. González-Plaza, E.; García, D.; Prieto, J.-I. Monthly Global Solar Radiation Model Based on Artificial Neural Network, Temperature Data and Geographical and Topographical Parameters: A Case Study in Spain. Sustainability 2024, 16, 1293.
  14. Hassan, M.A.; Khalil, A.; Kaseb, S.; Kassem, M.A. Independent models for estimation of daily global solar radiation: A review and a case study. Renew. Sustain. Energy Rev. 2018, 82, 1565–1575.
  15. Ilbuodo, J.M.; Bonkoungou, D.; Tassembedo, S.; Koalaga, Z. General Models for Monthly Average Daily Global Solar Irradiation. Sci. J. Energy Eng. 2024, 12, 81–90.
  16. Meenal, R.; Inmanuel Selvakumar, A. Assessment of SVM, empirical and ANN based solar radiation prediction models with most influencing input parameters. Renew. Energy 2018, 121, 324–343.
  17. Prieto, J.-I.; García, D.; Santoro, R. Comparative Analysis of Accuracy, Simplicity and Generality of Temperature-Based Global Solar Radiation Models: Application to the Solar Map of Asturias. Sustainability 2022, 14, 6749.
  18. Paulescu, M.; Stefu, N.; Calinoiu, D.; Paulescu, E.; Pop, N.; Boata, R.; Mares, O. Ångström–Prescott equation: Physical basis, empirical models and sensitivity analysis. Renew. Sustain. Energy Rev. 2016, 62, 495–506.
  19. Hargreaves, G.H.; Samani, Z.A. Estimating potential evapotranspiration. J. Irrig. Drain. Eng. ASCE 1982, 108, 225–230.
  20. Prieto, J.-I.; Martínez-García, J.C.; García, D. Correlation between global solar irradiation and air temperature in Asturias, Spain. Sol. Energy 2009, 83, 1076–1085.
  21. Urraca, R.; Martinez-de Pison, E.; Sanz-Garcia, A.; Antoñanzas, J.; Antoñanzas-Torres, F. Estimation methods for global solar radiation: Case study evaluation of five different approaches in central Spain. Renew. Sustain. Energy Rev. 2017, 77, 1098–1113.
  22. Rodríguez-Benítez, F.J.; Arbizu-Barrena, C.; Santos-Alamillos, F.J.; Tovar-Pescador, J.; Pozo-Vázquez, D. Analysis of the intra-day solar resource variability in the Iberian Peninsula. Sol. Energy 2018, 171, 374–387.
  23. Bueso, M.C.; Paredes-Parra, J.M.; Mateo-Aroca, A.; Molina-García, A. A characterization of metrics for comparing satellite-based and ground-measured global horizontal irradiance data: A principal component analysis application. Sustainability 2020, 12, 2454.
  24. Meteotest, AG. Meteonorm, v7.3.0.26247; Meteotest: Bern, Switzerland, 2018.
  25. Rasamoelina, A.D.; Adjailia, F.; Sinčák, P. A Review of Activation Function for Artificial Neural Network. In Proceedings of the IEEE 18th World Symposium on Applied Machine Intelligence and Informatics, Herlany, Slovakia, 23–25 January 2020.
  26. Bernabé, A.C.; García, E.F.; Sánchez, B.P.; Rebull, T.T.; Mariño, B.L.; Pinto, E.C.; García, J.V.M.; Fresneda, R.R.; Fullat, R.B. Mapas Climáticos de España (1981–2010) y ETo (1996–2016); Agencia Estatal de Meteorología: Madrid, Spain, 2018; (In Spanish).
  27. Kaushik, S. Clustering | Introduction, Different Methods, and Applications (Updated 2023). Available online: https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering (accessed on 29 June 2023).
  28. García-López, J.; Dominguez-Amarillo, S.; Sendra, J.J. Clustering Open Data for Predictive Modeling of Residential Energy Consumption across Variable Scales: A Case Study in Andalusia, Spain. Buildings 2024, 14, 2335.
  29. García-Gutiérrez, L.; Voyant, C.; Notton, G.; Almorox, J. Evaluation and Comparison of Spatial Clustering for Solar Irradiance Time Series. Appl. Sci. 2022, 12, 8529.
  30. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133.
  31. Abdolrasol, M.G.M.; Hussain, S.M.S.; Ustun, T.S.; Sarker, M.R.; Hannan, M.A.; Mohamed, R.; Ali, J.A.; Mekhilef, S.; Milad, A. Artificial Neural Networks Based Optimization Techniques: A Review. Electronics 2021, 10, 2689.
  32. Ritter, A.; Muñoz-Carpena, R. Performance evaluation of hydrological models: Statistical significance for reducing subjectivity in goodness-of-fit assessments. J. Hydrol. 2013, 480, 33–45.
  33. Taylor, K.E. Summarizing multiple aspects of model performance in a single diagram. J. Geophys. Res. 2001, 106, 7183–7192.
  34. Adaramola, M.S. Estimating global solar radiation using common meteorological data in Akure, Nigeria. Renew. Energy 2012, 47, 38–44.
Figure 1. Distribution of weather stations according to the Köppen–Geiger climate classification.
Figure 2. Data distribution in Case 0: (a) as a function of latitude; (b) as a function of $z/L$; (c) as a function of $(\Delta T/T_{min})\cdot 100$.
Figure 3. Data distribution in Cases 1 and 2: (a) as a function of latitude; (b) as a function of $z/L$; (c) as a function of $(\Delta T/T_{min})\cdot 100$.
Figure 4. Dendrogram obtained after clustering the 414 stations.
Figure 5. Distribution of stations and clusters in normalized space.
Figure 6. Comparison between ANN estimates and experimental data for Case 1.
Figure 7. Comparison between ANN estimates and experimental data for Case 2: (a) Training data; (b) testing data.
Figure 8. Taylor diagram for Cases 1 and 2.
Figure 9. Comparison between monthly relative error distributions.
Figure 10. Stations with monthly relative errors greater than 20%: (a) in at most three months; (b) in more than three months.
Figure 11. Variability of the monthly clearness index at (a) stations with any monthly relative error greater than 20% and $\Delta CI < 0$; (b) stations with any monthly relative error greater than 20% and $\Delta CI > 0$; (c) remaining stations with $\Delta CI < 0$ or close to zero.
Figure 12. Comparison between Case 1 and Case 2 for stations with monthly relative errors greater than 20% in more than three months.
Figure 13. Relative errors of the monthly clearness index in non-Spanish stations: (a) with values lower than 20% in all months; (b) with values higher than 20% in any month.
Figure 14. Variation of the monthly clearness index at non-Spanish stations: (a) with $\Delta CI < 0$; (b) with $\Delta CI > 0$.
Table 1. Distribution of stations in each cluster.
| Cluster No.1 | Cluster No.2 | Cluster No.3 | Cluster No.4 |
|---|---|---|---|
| 91, 92, 103 | 85, 86, 87, 88, 89, 90, 93, 94, 95, 97, 98, 99, 100, 367 | 4, 7, 9, 10, 11, 13, 15, 17, 18, 20, 23, 24, 44, 60, 389, 399 | Rest of stations |
Table 2. Distribution of training stations in each cluster for Case 2.
| Cluster No.1 | Cluster No.2 | Cluster No.3 | Cluster No.4 |
|---|---|---|---|
| 91, 103 | 86, 88, 90, 93, 95, 97, 99 | 4, 10, 18, 20, 24, 44, 60, 399 | 2, 6, 8, 12, 14, 16, 22, 26–42 (even), 46–58 (even), 62–84 (even), 96, 101, 105–365 (odd), 369–387 (odd), 391–397 (odd), 401–413 (odd) |
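The training share of Case 2 can be checked by expanding the station ranges in Table 2. The Python sketch below transcribes the table and assumes that each station contributes twelve monthly values (414 × 12 = 4968). The variable names are illustrative.

```python
# Training stations of Case 2 per cluster, transcribed from Table 2.
# range(a, b + 1, 2) reproduces the "(even)" and "(odd)" station ranges.
CASE2_TRAINING = {
    1: [91, 103],
    2: [86, 88, 90, 93, 95, 97, 99],
    3: [4, 10, 18, 20, 24, 44, 60, 399],
    4: [2, 6, 8, 12, 14, 16, 22]
    + list(range(26, 43, 2))
    + list(range(46, 59, 2))
    + list(range(62, 85, 2))
    + [96, 101]
    + list(range(105, 366, 2))
    + list(range(369, 388, 2))
    + list(range(391, 398, 2))
    + list(range(401, 414, 2)),
}

N_STATIONS = 414
MONTHS = 12  # assumed: twelve monthly means per station (414 x 12 = 4968)

per_cluster = {c: len(s) for c, s in CASE2_TRAINING.items()}
n_train = sum(per_cluster.values())
print(per_cluster)
print(f"{n_train} of {N_STATIONS} stations ({100 * n_train / N_STATIONS:.2f}%), "
      f"{n_train * MONTHS} of {N_STATIONS * MONTHS} monthly values")
```

The sketch counts 206 training stations (2, 7, 8, and 189 per cluster), or about 49.8% of the 414 stations. This agrees with the 50% training share stated in the Conclusions.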
Table 3. Statistical indicators for Cases 1 and 2.
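| Case | RRMSE (%) | RMBE (%) | NRMSE (%) | NMBE (%) | NSE | $R^2$ | $\sigma_{sn}$ | $E_n$ (%) |
|---|---|---|---|---|---|---|---|---|
| Case 1 | 9.28 | −5.41 | 9.44 | −0.68 | 0.4591 | 0.6762 | 0.9638 | 58.64 |
| Case 2 | 6.89 | −0.30 | 6.33 | −0.67 | 0.7570 | 0.7628 | 0.9282 | 49.01 |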
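The $E_n$ column can be cross-checked against $\sigma_{sn}$ and $R^2$ using the $E_n$ definition in the Nomenclature. The snippet below assumes $R = +\sqrt{R^2}$, that is, a positive correlation. The function name is illustrative.

```python
import math

# sigma_sn, R^2 and reported E_n (%) from Table 3.
TABLE3 = {
    "Case 1": (0.9638, 0.6762, 58.64),
    "Case 2": (0.9282, 0.7628, 49.01),
}


def e_n_percent(sigma_sn: float, r2: float) -> float:
    """E_n = sqrt(1 + sigma_sn^2 - 2 sigma_sn R), with R = +sqrt(R^2) assumed."""
    r = math.sqrt(r2)
    return 100 * math.sqrt(1 + sigma_sn**2 - 2 * sigma_sn * r)


for case, (sigma_sn, r2, reported) in TABLE3.items():
    print(f"{case}: recomputed E_n = {e_n_percent(sigma_sn, r2):.2f}%, reported {reported}%")
```

Both recomputed values (58.64% and 49.01%) agree with Table 3 to two decimals. This supports reading $R$ in the $E_n$ formula as the correlation coefficient $\sqrt{R^2}$.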
Table 4. Case 1: Probability P that the relative error is greater than a given value x, and corresponding number N of values.
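| x (%) | P (%) | N |
|---|---|---|
| 0 | 100 | 4968 |
| 1 | 91.34 | 4538 |
| 2 | 82.63 | 4105 |
| 3 | 74.07 | 3680 |
| 4 | 64.65 | 3212 |
| 5 | 56.98 | 2831 |
| 6 | 50.08 | 2488 |
| 7 | 43.80 | 2176 |
| 8 | 38.53 | 1914 |
| 9 | 34.02 | 1690 |
| 10 | 30.03 | 1492 |
| 11 | 26.43 | 1313 |
| 12 | 23.11 | 1148 |
| 13 | 19.99 | 993 |
| 14 | 16.43 | 816 |
| 15 | 12.72 | 632 |
| 16 | 9.50 | 472 |
| 17 | 6.94 | 345 |
| 18 | 4.85 | 241 |
| 19 | 3.08 | 153 |
| 20 | 1.85 | 92 |
| 25 | 0.14 | 7 |
| 30 | 0.02 | 1 |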
Table 5. Case 2: Probability P that the relative error is greater than a given value x, and corresponding number N of values.
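| x (%) | P (%) | N |
|---|---|---|
| 0 | 100 | 4968 |
| 1 | 87.24 | 4334 |
| 2 | 75.34 | 3743 |
| 3 | 64.17 | 3188 |
| 4 | 53.10 | 2638 |
| 5 | 43.20 | 2146 |
| 6 | 34.60 | 1719 |
| 7 | 26.99 | 1341 |
| 8 | 20.87 | 1037 |
| 9 | 15.88 | 789 |
| 10 | 11.74 | 583 |
| 11 | 8.09 | 402 |
| 12 | 6.04 | 300 |
| 13 | 4.39 | 218 |
| 14 | 3.48 | 173 |
| 15 | 2.72 | 135 |
| 16 | 2.19 | 109 |
| 17 | 1.69 | 84 |
| 18 | 1.37 | 68 |
| 19 | 1.19 | 59 |
| 20 | 0.99 | 49 |
| 25 | 0.52 | 26 |
| 30 | 0.34 | 17 |
| 35 | 0.28 | 14 |
| 40 | 0.12 | 6 |
| 45 | 0.04 | 2 |
| 50 | 0.02 | 1 |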
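Tables 4 and 5 can be read as empirical exceedance curves: for each threshold x, N counts the estimates whose relative error exceeds x, and P = 100·N/4968. The sketch below assumes that absolute values of the monthly relative errors are used, which is consistent with P = 100% at x = 0. The error list and the function name are hypothetical.

```python
def exceedance(rel_errors_pct, thresholds):
    """Return (x, P, N) for each threshold x, with N = #{|e| > x} and P = 100 N / n."""
    n = len(rel_errors_pct)
    rows = []
    for x in thresholds:
        count = sum(1 for e in rel_errors_pct if abs(e) > x)
        rows.append((x, 100 * count / n, count))
    return rows


errors = [-25.0, 3.0, 12.0, -0.5, 21.0]  # hypothetical relative errors (%)
for x, p, count in exceedance(errors, [0, 10, 20]):
    print(f"x = {x:>2}%  P = {p:6.2f}%  N = {count}")

# Selected (table, x, P, N) entries from Tables 4 and 5; both use 4968 values.
TOTAL = 4968
CHECKS = [
    ("Table 4", 20, 1.85, 92),
    ("Table 4", 25, 0.14, 7),
    ("Table 4", 30, 0.02, 1),
    ("Table 5", 20, 0.99, 49),
    ("Table 5", 50, 0.02, 1),
]
for table, x, p, count in CHECKS:
    print(f"{table}, x = {x}%: 100*N/{TOTAL} = {100 * count / TOTAL:.2f}% (reported {p}%)")
```

The final loop confirms that the selected entries of both tables satisfy P = 100·N/4968 to two decimals.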
Table 6. Monthly clearness index, geographical and climatic data for non-Spanish stations [14].
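| No. | Location | ϕ (°) | z/L | KGCC | JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 415 | Ahar (Azerbaijan) | 38.43 | 8.763 | Csa | 0.5433 | 0.5791 | 0.5644 | 0.5679 | 0.6035 | 0.6391 | 0.6491 | 0.6430 | 0.6437 | 0.6243 | 0.5596 | 0.5372 |
| 416 | Bragança (Portugal) | 41.82 | 4.035 | Csb | 0.4478 | 0.5464 | 0.5346 | 0.5781 | 0.5941 | 0.6427 | 0.6792 | 0.6673 | 0.6418 | 0.4974 | 0.4678 | 0.3889 |
| 417 | Çanakkale (Turkey) | 40.13 | 15.789 | Csa | 0.3427 | 0.3613 | 0.4481 | 0.4965 | 0.5373 | 0.5657 | 0.5977 | 0.5683 | 0.5462 | 0.4774 | 0.4051 | 0.3306 |
| 418 | Elkins (USA) | 38.88 | 2.096 | Dfa | 0.4129 | 0.4294 | 0.4643 | 0.4565 | 0.4869 | 0.4937 | 0.4794 | 0.4921 | 0.4801 | 0.4513 | 0.4261 | 0.4135 |
| 419 | Faro (Portugal) | 37.02 | 11.887 | Csa | 0.5754 | 0.6196 | 0.6209 | 0.6563 | 0.6645 | 0.6934 | 0.7082 | 0.6847 | 0.6616 | 0.6302 | 0.5972 | 0.5324 |
| 420 | Isparta (Turkey) | 37.85 | 10.324 | Csb | 0.4680 | 0.4699 | 0.5100 | 0.5246 | 0.5660 | 0.6105 | 0.6340 | 0.6104 | 0.6329 | 0.5477 | 0.5224 | 0.4554 |
| 421 | Monte Real (Portugal) | 39.83 | 5.844 | Csb | 0.5307 | 0.5019 | 0.5701 | 0.5707 | 0.5837 | 0.5795 | 0.6424 | 0.6464 | 0.6138 | 0.5783 | 0.5623 | 0.5323 |
| 422 | New York JFK (USA) | 40.65 | 2.275 | Cfa | 0.4001 | 0.4715 | 0.4911 | 0.4691 | 0.5034 | 0.5159 | 0.5111 | 0.5878 | 0.5047 | 0.5835 | 0.4591 | 0.4057 |
| 423 | Pocatello Airp. (USA) | 42.92 | 1.440 | Dfb | 0.4682 | 0.5359 | 0.5420 | 0.5854 | 0.5726 | 0.6337 | 0.6550 | 0.6644 | 0.6413 | 0.6167 | 0.4901 | 0.4647 |
| 424 | Roma-Ciampino (Italy) | 41.80 | 5.984 | Csa | 0.4059 | 0.4596 | 0.5008 | 0.4791 | 0.5176 | 0.5428 | 0.5673 | 0.5484 | 0.5079 | 0.4748 | 0.4003 | 0.3598 |
| 425 | San Francisco (USA) | 37.62 | 41.667 | Csb | 0.4616 | 0.5287 | 0.5594 | 0.5824 | 0.5814 | 0.6174 | 0.6290 | 0.6224 | 0.6338 | 0.5702 | 0.5108 | 0.5033 |
| 426 | Sendai (Japan) | 38.27 | 5.402 | Cfa | 0.4943 | 0.5067 | 0.4834 | 0.4840 | 0.4353 | 0.3642 | 0.3380 | 0.3779 | 0.3698 | 0.4353 | 0.4651 | 0.4795 |
| 427 | Sofia (Bulgaria) | 42.68 | 2.754 | Cfb | 0.3446 | 0.4166 | 0.4244 | 0.4209 | 0.4542 | 0.4926 | 0.5064 | 0.5030 | 0.4600 | 0.4150 | 0.3590 | 0.2905 |
| 428 | Turpan (China) | 42.93 | 0.016 | BWk | 0.4676 | 0.5407 | 0.5500 | 0.5700 | 0.5753 | 0.5593 | 0.5497 | 0.5615 | 0.5933 | 0.5820 | 0.5127 | 0.4401 |
| 429 | Viterbo (Italy) | 42.43 | 7.815 | Csa | 0.4728 | 0.5214 | 0.5356 | 0.5073 | 0.5714 | 0.5937 | 0.6205 | 0.6087 | 0.5566 | 0.5214 | 0.4765 | 0.4307 |
| 430 | Yinchuan (China) | 38.48 | 1.106 | BWk | 0.5943 | 0.5999 | 0.5671 | 0.5743 | 0.5836 | 0.5556 | 0.5441 | 0.5425 | 0.5528 | 0.5779 | 0.5874 | 0.5804 |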
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
