Assessing Neural Network Approaches for Solar Radiation Estimates Using Limited Climatic Data in the Mediterranean Sea

One of the most crucial variables in agricultural meteorology is solar radiation (Rs), yet it is measured at a very limited number of weather stations because of the high cost of installing and maintaining the sensors. Moreover, the quality of the data is usually low because of sensor failure and/or lack of calibration, which has led scientists to search for new approaches such as neural network models. Thus, improving traditional solar radiation estimation models under minimum data availability is still needed for different purposes. In this work, several neural network models, namely Multilayer Perceptron (MLP), Support Vector Machines (SVM), Extreme Learning Machine (ELM), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM), were developed and assessed with different temperature-based input variable configurations in Southern Spain (a weather station located on the Mediterranean Sea coast). The performances were analyzed using different statistical indices: Root Mean Square Error (RMSE), Mean Bias Error (MBE), and Nash-Sutcliffe model efficiency coefficient (NSE).


Introduction
During the last decades, an exponential increase in the Earth's pollution has alarmed governments worldwide. Moreover, very high population growth, together with climate change, accentuates the problems of energy needs and guaranteed food supply, which have become two of the major challenges facing our current society. One of the main measures adopted has been to increase the use of renewable energy, especially solar energy.
In this context, accurate estimations of solar radiation (Rs) are of high importance not only to estimate the available solar energy on a particular day, but also to compute agronomical parameters such as the reference evapotranspiration (which determines the quantity of water evaporated and transpired from a hypothetical grass reference surface). Measuring solar radiation is more difficult than measuring other meteorological parameters such as temperature or relative humidity. Consequently, weather stations that measure solar radiation are far fewer than those that measure these other parameters, with a reported ratio of 1:500 [1]. Besides, when quality control procedures [2] are applied, solar radiation usually contains the largest quantity of flagged data [2,3].
In order to address this problem, several methods have been developed to estimate solar radiation: (i) methods based on empirical relationships with different available meteorological parameters such as sunshine duration, air temperature, relative humidity, extraterrestrial radiation, and cloud cover, among others [4-11]; (ii) estimations using data from nearby stations [7,12,13]; (iii) satellite-based methods [14-19]; (iv) Machine Learning (ML) models [16,20-23]; and (v) other approaches [24,25]. ML models efficiently extract high-dimensional and complex features from the different inputs in order to map them to an output [26]; this is the reason why ML models have become one of the most commonly used methodologies to estimate solar radiation and other hydro-meteorological parameters [27-30]. In this regard, the capability of Support Vector Regression (SVR) was studied for a weather station in Iran [31], showing a better performance than the empirical models and the PSO-based (Particle Swarm Optimization) model tested. Ref. [32] assessed the use of Artificial Neural Networks (ANN) in Turkey, obtaining better results with ANN than with other physical or statistical models. Ref. [23] implemented a Long Short-Term Memory (LSTM) model and an ANN model in Cape Verde, with LSTM performing better in terms of RMSE. Ref. [33] evaluated Convolutional Neural Networks (CNN) to forecast short-term solar radiation. Ref. [34] evaluated a hybrid model combining LSTM and CNN in Australia.
However, the search for new neural network approaches to improve solar radiation estimates is not so common [35]. Thus, the main objectives of this work are: (i) the assessment of several ML models (Multilayer Perceptron, MLP; Support Vector Machines, SVM; Extreme Learning Machine, ELM; Convolutional Neural Networks, CNN; and Long Short-Term Memory, LSTM) to estimate solar radiation from limited climatic data (air temperature and relative humidity) recorded at a coastal station in Southern Spain; and (ii) the assessment of Bayesian optimization to tune the different ML models.

Source of Data
This work was carried out at the Almuñecar station (see Figure 1), a coastal location in the semiarid region of Andalusia (latitude 36°45'6'' N, longitude 3°40'44'' W, and 29 m above mean sea level). The dataset consisted of intra-hourly temperature and relative humidity (recorded every 30 min) and the daily extraterrestrial solar radiation (reference to be added). This weather station belongs to the Agronomic Information Network of Andalusia (RIA), and its data can be downloaded at https://www.juntadeandalucia.es/agriculturaypesca/ifapa/ria/servlet/FrontController (accessed on 21 September 2020). The dataset covered a total of 18 years, from 2000 to 2018, and was split into training (from 2000 to 2013) and testing data (from 2014 to 2018). In addition, 20% of the training data was used as validation to find the best-fitting hyperparameters of the models. Table 1 shows the maximum, mean, and minimum values of the variables used in this work.
In order to estimate solar radiation, three input configurations were assessed: (1) the 48 half-hourly temperature values of the day; (2) the 48 half-hourly temperature values and the 48 half-hourly relative humidity values; and (3) the inputs of the second configuration plus the daily extraterrestrial solar radiation. In the last case, LSTM could not be implemented because it requires all inputs to have the same time dimension.
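The three input configurations and the chronological split can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names `build_inputs` and `chronological_split`, the tuple layout of a day, and the choice to take the validation share from the end of the training period are all assumptions, since the text only states that 20% of the training data is used for validation.

```python
from typing import List, Sequence


def build_inputs(temp: Sequence[float], rh: Sequence[float], ra: float,
                 config: int) -> List[float]:
    """Assemble one day's input vector for configuration 1, 2 or 3."""
    if len(temp) != 48 or len(rh) != 48:
        raise ValueError("expected 48 half-hourly values per variable")
    if config == 1:
        return list(temp)                      # 48 inputs
    if config == 2:
        return list(temp) + list(rh)           # 96 inputs
    if config == 3:
        return list(temp) + list(rh) + [ra]    # 97 inputs
    raise ValueError("config must be 1, 2 or 3")


def chronological_split(days, last_train_year=2013, val_fraction=0.2):
    """Split (year, features, target) tuples, assumed sorted by date."""
    train = [d for d in days if d[0] <= last_train_year]
    test = [d for d in days if d[0] > last_train_year]
    n_val = int(round(len(train) * val_fraction))
    # Assumption: the most recent part of the training period is held out.
    fit = train[:len(train) - n_val]
    val = train[len(train) - n_val:]
    return fit, val, test
```

The third configuration yields the 97 inputs mentioned in the results; keeping the split chronological avoids leaking later years into training.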

Multilayer Perceptron (MLP)
A multilayer perceptron (MLP) is a model inspired by the functioning of neurons in the human brain. It is composed of a given number of fully interconnected neurons distributed in different layers (the input, hidden, and output layers). The input and output layers correspond to the input and output variables of the model, respectively, while the remaining neurons are located in the hidden layers. The learning process, which uses the training dataset and backpropagation, is not visible to the user, which is why these layers are called 'hidden'. The MLP structure and configuration (number of neurons, number of hidden layers, activation function, and optimization function) determine the final efficiency of the model. For further details, see [22].
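The layered structure described above can be illustrated with a forward pass through fully connected layers. This is only a sketch: the weights are random and untrained, the layer sizes (48 inputs, 8 hidden neurons, 1 output) are hypothetical, and backpropagation is not shown.

```python
import random


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b) for each neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_forward(x, layers):
    """Apply each (weights, biases, activation) layer in order."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


rng = random.Random(0)
w1, b1 = init_layer(48, 8, rng)   # hidden layer
w2, b2 = init_layer(8, 1, rng)    # output layer (one Rs estimate)
layers = [(w1, b1, relu), (w2, b2, identity)]
estimate = mlp_forward([20.0] * 48, layers)
```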

Support Vector Regression (SVR)
The Support Vector Machine (SVM) for classification is based on the search for a hyperplane that separates two or more classes with maximized margins; the concept can be readily extended to regression tasks, where it is known as Support Vector Regression (SVR). Its use has been widely assessed in different hydrological and solar radiation estimations [36-38], with promising results due to its ability to work in a high-dimensional feature space (using a kernel function). For further details, see [38,39].

Extreme Learning Machine (ELM)
The Extreme Learning Machine (ELM) was first proposed by [40] as a single hidden layer feedforward neural network (SLFNN) with a particular feature: the input weights and hidden biases are randomly generated, while the output weights are analytically calculated. Consequently, the model does not need an iterative learning process, which results in a very low computational cost. For further details, see [17,38,41,42].

Convolutional Neural Network (CNN)
Convolutional Neural Network models are frequently used in image processing applications, although their use with 1D data has given promising results in hydro-meteorological estimations [34,43-45]. Their functionality is based on two main operations, convolution and pooling. The convolution is a mathematical operation on two matrices (the input data and a kernel) that produces a new one (the feature map). The pooling operation, in turn, reduces the dimensionality of the feature map using the maximum or average function. For further details, see [34,46].
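The two operations can be sketched for a 1D series such as the 48 half-hourly temperature values. The functions below are illustrative only: `conv1d` applies the kernel without flipping it (cross-correlation, the usual convention in deep learning libraries), uses no padding and a stride of 1, and `pool1d` uses non-overlapping windows; none of these choices is specified in the text.

```python
def conv1d(signal, kernel):
    """Valid (no padding) 1D cross-correlation with stride 1."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def pool1d(feature_map, size, mode="max"):
    """Non-overlapping pooling; a trailing incomplete window is dropped."""
    if mode == "max":
        reducer = max
    else:
        reducer = lambda window: sum(window) / len(window)  # average pooling
    return [reducer(feature_map[i:i + size])
            for i in range(0, len(feature_map) - size + 1, size)]
```

With a kernel of length 3, a 48-value series gives a feature map of 46 values, which pooling of size 2 reduces to 23.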

Long Short-Term Memory (LSTM)
Long Short-Term Memory models are based on Recurrent Neural Networks (RNN) and were first introduced by [47]. The main purposes of this approach were to model long-term dependencies and to address the vanishing gradient problem. To this end, the LSTM model contains three gates (input, output, and forget) that control the information flowing into and out of the memory cell over arbitrary time intervals. Figure 2 shows the structure of a memory cell. For further details, see [16,30].
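The role of the three gates can be illustrated with one time step of a single-unit cell in its commonly used formulation. This is a sketch under stated assumptions: the input is a scalar, and the `params` dictionary with its gate names and `(w_x, w_h, b)` layout is hypothetical, not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell with scalar input.

    params maps each gate name to a tuple (w_x, w_h, b).
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # share of the old cell state to keep
    i = sigmoid(pre("input"))         # share of the candidate to write
    o = sigmoid(pre("output"))        # share of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate cell content
    c = f * c_prev + i * g            # new cell state
    h = o * math.tanh(c)              # new hidden state
    return h, c
```

Processing one day would amount to calling `lstm_step` once per half-hourly value, carrying `h` and `c` from one step to the next.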

Bayesian Optimization
ML models can be configured through parameters (called hyperparameters) that modify their architecture and have a great impact on their final efficiency. The process of selecting them from a hyperparameter space is usually known as tuning. A wrong configuration could lead to overfitting or underfitting. A common practice is to select these hyperparameters by trial and error. Although this technique can yield good results on some occasions, the results may lie in a local minimum. In this context, Ref. [48] first proposed Bayesian optimization to address this problem. The main advantage of this method is that it considers past evaluations when choosing a new set of hyperparameters (from a pre-defined hyperparameter space), so the algorithm does not spend time on non-promising configurations.
The hyperparameter space for each ML model in this work was as follows:
(1) MLP: the number of hidden layers (from 1 to 5), the number of neurons in each layer (from 1 to 150), the activation function (relu, sigmoid, and tanh), and the maximum number of training epochs (100).
(2) SVM: the kernel (linear, poly, rbf, and sigmoid), the C parameter (from 0.01 to 10), and the epsilon (from 0 to 10).
(3) ELM: the maximum number of neurons (250) and the activation function (linear, sigmoid, and tanh).
(4) LSTM: the number of LSTM layers (from 1 to 3), the number of units in each layer (from 10 to 150), the number of hidden layers (from 1 to 2), the number of neurons in each layer (from 1 to 10), the maximum number of training epochs (100), and the activation function (relu, sigmoid, and tanh).
(5) CNN: the number of CNN layers (from 1 to 2, each composed of convolutional layers and a pooling layer), the number of convolutional layers per CNN layer (from 1 to 2), the number of filters (from 10 to 20), the number of kernels (from 1 to 5), the type of pooling layer (maximum or average), the pooling size (from 1 to 3), the number of hidden layers (from 1 to 5), the number of hidden neurons (from 1 to 15), the maximum number of training epochs (100), and the activation function (relu, sigmoid, and tanh).
It is worth mentioning that the Bayesian optimization ran for 50 iterations, 40 of which were randomly generated.
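The randomly generated part of the search can be sketched by drawing configurations from the spaces listed above. The snippet covers only that random phase for the MLP and SVM spaces; the remaining evaluations, which a Bayesian optimizer proposes from a surrogate model of past results, are not shown. The dictionary layout, the function names, and uniform sampling of C and epsilon are assumptions.

```python
import random

# Ranges transcribed from the MLP and SVM search spaces in the text.
MLP_SPACE = {
    "hidden_layers": ("int", 1, 5),
    "neurons_per_layer": ("int", 1, 150),
    "activation": ("choice", ["relu", "sigmoid", "tanh"]),
}
SVM_SPACE = {
    "kernel": ("choice", ["linear", "poly", "rbf", "sigmoid"]),
    "C": ("float", 0.01, 10.0),
    "epsilon": ("float", 0.0, 10.0),
}


def sample(space, rng):
    """Draw one configuration from a hyperparameter space."""
    config = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "int":
            config[name] = rng.randint(spec[1], spec[2])  # inclusive bounds
        elif kind == "float":
            config[name] = rng.uniform(spec[1], spec[2])  # assumed uniform
        else:
            config[name] = rng.choice(spec[1])
    return config


def random_phase(space, n_random=40, seed=0):
    """Generate the randomly drawn configurations of the search."""
    rng = random.Random(seed)
    return [sample(space, rng) for _ in range(n_random)]
```

Each sampled configuration would then be trained and scored on the validation data, and those scores would seed the surrogate model used for the remaining evaluations.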

Data Standardization
Data standardization is a common data preprocessing operation for machine learning models, in which the data are rescaled to have a mean of 0 and a standard deviation of 1. Its purpose is to prevent ML models from being influenced by the different ranges of the inputs. It can be expressed as Equation (1):

$$x' = \frac{x - \mu}{\sigma} \qquad (1)$$

where x' is the standardized data, x represents the input data, µ is the mean value of the training dataset, and σ represents the standard deviation of the training dataset.
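A minimal sketch of this step, in which the mean and standard deviation are computed on the training data only and then reused for the validation and test data. The function names are hypothetical, and the use of the population standard deviation (dividing by n) is an assumption, since the text does not say which estimator was used.

```python
import math


def fit_standardizer(train_column):
    """Mean and population standard deviation of one training feature."""
    n = len(train_column)
    mu = sum(train_column) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in train_column) / n)
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply x' = (x - mu) / sigma with statistics from the training set."""
    if sigma == 0:
        raise ValueError("constant feature: standard deviation is zero")
    return [(v - mu) / sigma for v in values]
```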

Statistical Analysis
All the performances were evaluated using the following parameters: Mean Bias Error (MBE), Root Mean Square Error (RMSE), and Nash-Sutcliffe model efficiency coefficient (NSE). The MBE, RMSE, and NSE are defined in Equations (2)-(4), respectively.
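The three indices can be sketched under their commonly used definitions. The sign convention for MBE (estimated minus observed, so that positive values indicate overestimation) is an assumption, as are the function names.

```python
import math


def mbe(observed, estimated):
    """Mean Bias Error; positive means overestimation (assumed sign)."""
    return sum(e - o for o, e in zip(observed, estimated)) / len(observed)


def rmse(observed, estimated):
    """Root Mean Square Error, in the units of the observations."""
    squared = sum((e - o) ** 2 for o, e in zip(observed, estimated))
    return math.sqrt(squared / len(observed))


def nse(observed, estimated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance
```

MBE and RMSE are expressed in the units of solar radiation, whereas NSE is dimensionless.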

Results
In terms of MBE, RMSE, and NSE, the results of estimating solar radiation at the Almuñecar weather station using the different input configurations and models are shown in Table 2. Using the input configuration of 48 temperature values, CNN was the best model in RMSE (3.6864 MJ m⁻² d⁻¹) and NSE (0.7278), whereas the best MBE value (0.7238 MJ m⁻² d⁻¹) was obtained by LSTM. In terms of NSE, the MLP, ELM, and LSTM models performed below 0.7. All the models had RMSE values below 4.0 MJ m⁻² d⁻¹, although ELM (RMSE = 3.9524 MJ m⁻² d⁻¹) and MLP (RMSE = 3.9305 MJ m⁻² d⁻¹) were very close to this value. Regarding MBE, the MLP, SVM, and ELM models were above 1.0 MJ m⁻² d⁻¹, whereas the CNN and LSTM approaches obtained a better performance (0.8086 MJ m⁻² d⁻¹ and 0.7238 MJ m⁻² d⁻¹, respectively).
Concerning the input configuration of 48 temperature + 48 relative humidity values, SVM obtained the best performance in terms of RMSE (3.1836 MJ m⁻² d⁻¹) and NSE (0.7969), although MLP obtained the best MBE value of all the configurations (−0.0724 MJ m⁻² d⁻¹). Regarding RMSE and NSE, the model ranking was SVM, MLP, CNN, LSTM, and ELM, in this order. It is worth mentioning that all the models outperformed their previous performance using only 48 temperature values.
The last configuration consisted of mixing intra-hourly (temperature and relative humidity) and daily data (extraterrestrial solar radiation) for a total of 97 inputs. The results outperformed all the previous RMSE and NSE values; the best model was again SVM (RMSE = 2.5640 MJ m⁻² d⁻¹ and NSE = 0.8683), closely followed by CNN (RMSE = 2.6609 MJ m⁻² d⁻¹ and NSE = 0.8581), ELM (RMSE = 2.7920 MJ m⁻² d⁻¹ and NSE = 0.84386), and MLP (RMSE = 2.8138 MJ m⁻² d⁻¹ and NSE = 0.8414), in this order. On the other hand, regarding MBE, the ranking changed to MLP (0.3521 MJ m⁻² d⁻¹), ELM (0.3950 MJ m⁻² d⁻¹), CNN (0.5344 MJ m⁻² d⁻¹), and SVM (0.6915 MJ m⁻² d⁻¹), in this order. Figure 3 shows the estimations for the best (SVM using 48 T, 48 RH, and Ra) and the worst (ELM using 48 T) configurations.

Discussion
In general, the results obtained with the different machine learning models proposed in this work outperformed locally calibrated empirical solar radiation estimates for this station [11] in RMSE (3.61 MJ m⁻² d⁻¹ using Hargreaves-Samani, 3.64 MJ m⁻² d⁻¹ using Annandale, 3.58 MJ m⁻² d⁻¹ using Bristow-Campbell, and 3.67 MJ m⁻² d⁻¹ using Allen), as well as those reported for different regions of Spain [49]. In terms of machine learning modelling, these models also outperformed the RMSE values of an MLP approach using temperature, relative humidity, and pressure as the only climatic input variables in Tucumán, Argentina [35]. On the other hand, modelling solar radiation with input climatic data such as sunshine duration and cloud cover, among others, gave better estimations [31,37].
It is worth mentioning that combining daily and half-hourly (30 min) input variables is recommended to improve solar radiation estimations when using machine learning models; the inputs are therefore not required to share the same time frame. Besides, SVM is highly recommended to estimate solar radiation (with CNN and LSTM performing very closely). Furthermore, the use of Bayesian optimization to tune hyperparameters is strongly suggested instead of the commonly used trial and error techniques.

Conclusions
Different machine learning models using several input configurations (based only on air temperature and relative humidity) were implemented and evaluated in Almuñecar (a coastal location in Southern Spain). Firstly, the dataset was split into two parts, the training (14 years) and the testing (4 years) sub-series, and different configurations, hyperparameters, and models were evaluated. The results indicated that the use of Bayesian optimization and SVM is highly recommended, because of their high efficiency and low computational cost, in places where solar radiation data cannot be collected.
In future works, variables derived from temperature and relative humidity could be explored, as well as the performance of the models at a regional scale (using several stations for training while a new one is held out for testing).

Figure 1. Location of the weather station of Almuñecar (Southern Spain).

Figure 2. Structure of a Long Short-Term Memory (LSTM) memory cell. 'x' represents an input, 'h' a hidden state, and 'c' a cell state; sigmoid and tanh represent the respective activation functions.

Figure 3. Linear regression of (a) SVM solar radiation estimates using 48 T + 48 RH + Ra (best model with the best configuration) and (b) ELM using 48 T (worst model and configuration).

Table 1. Statistics of maximum, mean, and minimum temperature, relative humidity, and solar radiation (Max: Maximum, Min: Minimum).

Table 2. Mean Bias Error (MBE), Root Mean Square Error (RMSE), and Nash-Sutcliffe model efficiency coefficient (NSE) values for the different models and input configurations.