Daily Estimation of Global Solar Irradiation and Temperatures Using Artificial Neural Networks through the Virtual Weather Station Concept in Castilla and León, Spain

This article addresses the interpolation of daily data of global solar irradiation and of maximum, average, and minimum temperatures. The data were measured at the agrometeorological stations belonging to the Agro-climatic Information System for Irrigation (SIAR, in Spanish) of the Region of Castilla and León, in Spain, and the interpolation is performed through the concept of the Virtual Weather Station (VWS), which is implemented with Artificial Neural Networks (ANNs). This serves to estimate data at any point of the territory from its geographic coordinates (i.e., longitude and latitude). The ANNs, of the Multilayer Feed-Forward Perceptron (MLP) type, are trained daily with data recorded at 53 agrometeorological stations, and the results are validated at the station of Tordesillas (Valladolid). The ANN models for daily interpolation were tested with one, two, three, and four neurons in the hidden layer, over a period of 15 days (from 1 to 15 June 2020), with a root mean square error (RMSE, MJ/m²) of 1.23, 1.38, 1.31, and 1.04, respectively, for the daily global solar irradiation. The interpolation of ambient temperature also performed well when applying the VWS concept, with an RMSE (°C) of 0.68 for the maximum temperature with an ANN of four hidden neurons, 0.58 for the average temperature with three hidden neurons, and 0.83 for the minimum temperature with four hidden neurons.


Introduction
Agricultural productivity can be increased by knowing and predicting crop yields more precisely under various conditions. This is a key concept in both precision agriculture and agricultural modelling. Several authors have studied the techniques applied in precision agriculture and in the modelling of crop production that involve meteorological variables, with the objective of improving quality, profitability, resource use efficiency and sustainability [1][2][3]. Among these techniques, the application of variable doses of water, fertilizers and agrochemicals (while considering agrometeorological conditions), as well as the estimation of production (based on the evolution of meteorological variables and the physiological response of crops), are the most frequently used and are currently adopted by many farmers. Indeed, in most cases, crop recommendations are based on data recorded from field studies that compile their conditions (soil and environment) [4].
The impact of global solar irradiation on the Earth's surface has a significant influence on a country's economy, including, for example, agricultural productivity, renewable energy use, food security and human health risks [5], as reported in [6][7][8][9][10]. Loghmari et al. [21] developed and evaluated two monthly spatial interpolation models of global solar radiation, for the purpose of predicting global solar radiation within a distance of more than 50 km in southern and central Tunisia: an artificial neural network (ANN) that obtained better results than a model based on IDW.
To spatially fill gaps (nowcasting) in micrometeorological data sets (wind, humidity and temperature), Gunawardena et al. [22] employed Multivariate Linear Regression (MLR) and ANNs at eight locations, using measurements from three nearby weather stations and covering scales from 100 m to 5 km. The measurements were made from December 2016 to June 2017 in the Cadarache Valley, in southeastern France, a region marked by complex terrain where spatial variability is high on small length scales. The study demonstrated that both methods are acceptable.
Also notable is the approach described in the European Commission's MARS (Monitoring Agriculture with Remote Sensing) Crop Yield Forecasting System (MCYFS) wiki [23], in which the observed weather is interpolated to the centre of a 25 by 25 km grid cell, where the weather data are considered homogeneous, and the temperature, sunshine, humidity and wind speed are expected to change gradually at distances of 50 to 150 km.
Geographic Information Systems (GIS) offer different options to analyze and represent the spatial heterogeneity of the incident solar radiation in a given area. Martín and Dominguez [24] described the methods for estimating the distribution of solar radiation in geographical areas from a sample of data, using deterministic techniques (global polynomial interpolation, local polynomial interpolation, inverse distance weighting and radial basis functions) and geostatistical techniques (kriging and co-kriging), and applied them to the summer solstice of 2011 with data from 45 stations in Spain. The global polynomial method presents interpolations closer to the real value, while the geostatistical methods generally present very low squared errors (universal kriging and ordinary co-kriging are those that show the best adequacy in the results).
The data, which are collected at discrete weather stations, can only be meaningful when represented by surfaces. Spatial interpolation methods help to convert the point data into surfaces by estimating missing values for areas where data are not collected. In addition to the objective, the total number of data points, their location and their distribution in the study area affect the accuracy and efficiency of the interpolation. Keskin et al. [25] investigated the optimal spatial interpolation method for mapping meteorological data (precipitation, temperature and wind speed) in the northern part of Turkey, comparing IDW, kriging, radial basis and natural neighbour interpolation. The investigation used data from January 2005 and resulted in a three-location average RMSE for temperature of 0.94 °C with IDW, 0.75 °C with kriging and 0.70 °C with natural neighbour.
Yazar [26] performed spatial interpolation of solar radiation with data from 81 agrometeorological stations over heterogeneous agricultural areas in Southeastern Turkey, which include different crop species, irrigation techniques, and topographical and other conditions. Ordinary Kriging (OK) was applied on its own, and Ordinary Co-Kriging (OCK) was applied with solar-radiation-related data (air temperature, vapour pressure deficit and digital elevation model) to reduce the error, with up to 21% accuracy, which allowed for better evaluation and management of crop development and yield.
Leirvik and Yuan [5] employed statistical methods (Random Forest (RF); Linear Regression (LR); Generalized Additive Regression (GAM); Least Squares Dummy Variable (LSDV); Ordinary Kriging (OK); and combinations, as LR + OK, GAM + OK, and LSDV + OK) to interpolate missing values in a monthly dataset spanning nearly five decades of global solar irradiation over the Earth's surface, highlighting the benefits of using Machine Learning in environmental research.
Antonić et al. [27] used ANN models for monthly mean values of meteorological variables (air temperature, daily minimum and maximum air temperature, relative humidity, precipitation, global solar irradiation and evapotranspiration) through data obtained from 127 meteorological stations in Croatia. The inputs used (elevation, latitude, longitude, month and time series of the respective climatic variables) were from two meteorological stations. The quality of the results allows the construction of spatial distributions of the average climate for a given period, which would be useful for dendroecological analysis.
Siqueira et al. [28] performed the generation of synthetic daily solar irradiation series from spatial interpolation based on ANNs, employing geographic variables (latitude, longitude and altitude) and meteorological variables (precipitation, maximum and minimum temperature), which were easily available. The data were measured during the months of November (from 2001 to 2006) over seven locations in Pernambuco, Brazil.
Many climate studies need to generate predictions of a climate variable at a given location using values from other locations. Snell et al. [29] conducted a spatial interpolation of daily maximum surface air temperatures using ANNs, so as to generate estimates at 11 locations in the central U.S. continent, using information from a network of surrounding stations for the 4-and 16-point cases and over a 63-year period (from 1931 to 1993) that were used as input and output vectors for the ANNs. The results obtained are better than the spatial average, nearest neighbour and inverse distance methods, and the potential of using ANNs for downscaling General Circulation Models (GCMs) of temperature is discussed.
Rigol et al. [30] performed a spatial interpolation of daily minimum air temperature using an ANN trained with input variables (date, field variables and neighbouring temperature observations) for a full year, covering an area of 100 km × 100 km in Yorkshire, UK, analyzing the internal weights of the inputs to estimate the degree of spatial correlation between neighbouring stations, and the most influential variables contributing to the trend. The test performance is RMSE = 3.15 °C for ANN (33-1-1), RMSE = 1.26 °C for ANN (19-4-1), and RMSE = 1.15 °C for ANN (45-4-1).
Zambon et al. [31] reviewed Industry 4.0 procedures suitable for the agricultural sector, while pointing out that the 4.0 revolution in agriculture is still limited to a few innovative companies. Additionally, environmental variability and stochastic events contribute to a high degree of uncertainty in the supply chain and a lack of predictability in agricultural operations. This is where recent technologies related to the digital age, such as precision agriculture, which uses positioning technologies combined with the application of sensors and data, provide digital information in all agricultural processes.
In this paper, the concept of a Virtual Weather Station (VWS) is used: meteorological data from real stations are employed to estimate data at a nearby location that does not have a weather station. As part of the VWS development, the performance of ANN models for interpolating each separate meteorological variable (global solar irradiation, and maximum, average and minimum temperatures) was evaluated. The performance of the models is compared with that obtained by Franco et al. [11], who proposed the use of a VWS in places where meteorological data are needed, as an alternative to their acquisition, when it is not possible to install a meteorological station. In that work, the ANN models were used with all the variables of the same place, whereas in this article the estimation of each variable (solar irradiation and temperatures) is carried out separately (an ANN model for each meteorological variable).

Materials and Methods
In this section, the following points are described: (1) the meteorological data used with the tested geographic interpolation models, corresponding to global daily solar irradiation and ambient temperature (maximum, average and minimum), as well as information on the location of the agro-meteorological stations where these data were recorded; (2) the ANN models designed for the estimation of the analyzed meteorological variables; and (3) the statistics used to analyze the accuracy of the results obtained by the ANN-based interpolation models that have been examined.

Daily Data on Global Solar Irradiation and Ambient Temperature (Maximum, Average and Minimum)
The daily average data of global solar irradiation and ambient temperature (maximum, mean and minimum) used in this article, for a 15-day period (from 1 to 15 June 2020), were collected at the 54 agrometeorological stations (Appendix A) belonging to the Agro-climatic Information System for Irrigation (SIAR, Sistema de Información para el Asesoramiento al Riego, in Spanish), located in the Castilla and León Region, in the North-central part of the Iberian Peninsula, as shown in the map presented in Figure 1 and in Table A1 (data of altitude, latitude and longitude). SIAR is a project financed by the Ministry of Environment and Rural and Maritime Areas of Spain, which is managed by the Agricultural Technological Institute of Castilla and León (ITACyL, Instituto Tecnológico Agrario de Castilla y León, in Spanish), through the Meteorological Information Service [32]. The SIAR project helps farmers to manage irrigation water in an optimal way, advising them on the doses to be applied at each time of the year, depending on the phenological stage of the crop, by calculating the reference evapotranspiration (ETo).
Within the agrometeorological stations of the SIAR network, solar irradiance is measured by a Skye SP1110 pyranometer (Campbell Scientific, Inc., North Logan, UT, USA), consisting of a silicon photocell sensitive to radiation between 350 and 1100 nm, while the ambient temperature is measured by a Pt-1000 temperature sensor, which is based on the variation of platinum resistance with temperature. The linearization and amplification electronics for these sensors are located next to a Vaisala HMP45C probe (Campbell Scientific, Inc., North Logan, UT, USA), which is used to measure ambient temperature and relative humidity, in the temperature ranges of -40 to 60 °C, and 0 to 100%, respectively.
The climatic classification for the location of most agrometeorological stations is Csb, with some located in areas classified as Cfb, Csa and BSk types [33], according to the Köppen-Geiger climate classification.

Estimation of Solar Irradiation and Ambient Temperature Using Artificial Neural Networks
The architectures of the ANNs used for the evaluated geographic interpolation models are illustrated in Figure 2. All of them contain two inputs (longitude and latitude) and one output, which can be the daily global solar irradiation, or the daily mean values of the ambient temperature (maximum, average, or minimum).
The ANNs were implemented in MATLAB with the feedforwardnet function, dimensioned with the input and output data vectors, which determine the size of the respective layers. This generates a Multilayer Feed-Forward Perceptron (MLP) type ANN with a single hidden layer, in which the activation function selected for the neurons in the hidden layer was the hyperbolic tangent sigmoid (tansig), while the transfer function selected for the neurons in the output layer was linear (purelin). The Levenberg-Marquardt back-propagation (BP-LM) algorithm (trainlm) was applied to achieve fast optimization [34,35].
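As an illustration of the architecture just described, the following minimal Python sketch evaluates a 2-H-1 network with a hyperbolic tangent hidden layer (tansig is mathematically equal to tanh) and a linear output. It is not the MATLAB implementation used in this work: the function names (init_mlp, forward, n_parameters), the random initial weights and the example inputs are assumptions made only for illustration.

```python
import math
import random

def init_mlp(n_hidden, seed=0):
    """Random small weights for a 2-input, n_hidden, 1-output network (illustrative only)."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(n_hidden)],
        "b1": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": rng.uniform(-0.5, 0.5),
    }

def forward(params, lon, lat):
    """Hidden layer: tanh (equivalent to tansig); output layer: linear (purelin)."""
    hidden = [
        math.tanh(w[0] * lon + w[1] * lat + b)
        for w, b in zip(params["W1"], params["b1"])
    ]
    return sum(v * h for v, h in zip(params["W2"], hidden)) + params["b2"]

def n_parameters(n_hidden):
    """2 weights + 1 bias per hidden neuron, plus n_hidden output weights + 1 output bias."""
    return 3 * n_hidden + n_hidden + 1

if __name__ == "__main__":
    for h in (1, 2, 3, 4):
        net = init_mlp(h)
        # The inputs here are placeholder scaled coordinates, not station data.
        print(f"ANN (2-{h}-1): {n_parameters(h)} parameters, output = {forward(net, -0.2, -0.1):.3f}")
```

Under this parameter count, the largest network tested, ANN (2-4-1), has 17 adjustable parameters, so each daily fit to 53 stations remains well determined.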
The training of the ANNs was performed with the train function, with matrices of input and output data vectors, and was carried out daily with data from 53 agrometeorological stations of the SIAR network (all the stations of this network except the agrometeorological station of Tordesillas, used in the validation phase of the results), over a period of 15 days (from 1 to 15 June 2020). Finally, the sim function was used to test the ANNs previously trained with 1, 2, 3, and 4 neurons in the hidden layer, estimating each meteorological variable studied separately over the same 15 days at the station located in Tordesillas (Valladolid, Figure 1), with geographic coordinates 41°30′32″ N and 4°59′20″ W and an altitude of 658 m above mean sea level, used as reference for the validation. The period from 1 to 15 June was chosen because it is the period of the year when agricultural activity is the highest in the Iberian Peninsula, coinciding with the end of winter crops and the beginning of summer crops.
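The daily procedure described above (fit on 53 stations, estimate at the held-out station, repeat for 1 to 4 hidden neurons) can be sketched in Python as follows. This is only a sketch under stated assumptions: plain batch gradient descent stands in for the Levenberg-Marquardt algorithm, the inputs are scaled to [-1, 1] over a rough bounding box, and the station coordinates and the "measured" field are synthetic placeholders, not SIAR data. All function and variable names are hypothetical.

```python
import math
import random

def dms_to_decimal(deg, minutes, seconds, negative=False):
    """Convert degrees/minutes/seconds to decimal degrees (west or south -> negative)."""
    value = deg + minutes / 60.0 + seconds / 3600.0
    return -value if negative else value

def scale(v, lo, hi):
    """Map v from [lo, hi] to [-1, 1]; input scaling is an assumption of this sketch."""
    return 2.0 * (v - lo) / (hi - lo) - 1.0

def train_mlp(X, y, n_hidden, epochs=1000, lr=0.05, seed=0):
    """Fit a 2-n_hidden-1 MLP (tanh hidden, linear output) by batch gradient descent.

    Gradient descent replaces Levenberg-Marquardt here purely to keep the sketch short.
    """
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [0.0] * n_hidden            # start from the "predict the mean" model
    b2 = sum(y) / len(y)
    n = len(X)

    def hidden(x0, x1):
        return [math.tanh(w[0] * x0 + w[1] * x1 + b) for w, b in zip(W1, b1)]

    for _ in range(epochs):
        gW1 = [[0.0, 0.0] for _ in range(n_hidden)]
        gb1 = [0.0] * n_hidden
        gW2 = [0.0] * n_hidden
        gb2 = 0.0
        for (x0, x1), t in zip(X, y):
            h = hidden(x0, x1)
            err = sum(w * hj for w, hj in zip(W2, h)) + b2 - t
            gb2 += err
            for j in range(n_hidden):
                gW2[j] += err * h[j]
                d = err * W2[j] * (1.0 - h[j] ** 2)   # error back-propagated through tanh
                gW1[j][0] += d * x0
                gW1[j][1] += d * x1
                gb1[j] += d
        for j in range(n_hidden):
            W2[j] -= lr * gW2[j] / n
            b1[j] -= lr * gb1[j] / n
            W1[j][0] -= lr * gW1[j][0] / n
            W1[j][1] -= lr * gW1[j][1] / n
        b2 -= lr * gb2 / n

    def predict(x0, x1):
        return sum(w * hj for w, hj in zip(W2, hidden(x0, x1))) + b2

    return predict

def rmse(estimated, observed):
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / len(observed))

# Synthetic stand-in for ONE day of data (made up, not SIAR measurements).
LON_RANGE, LAT_RANGE = (-7.1, -1.7), (40.0, 43.3)   # rough bounding box, assumed

def synthetic_field(x0, x1):
    """Hypothetical smooth daily field (MJ/m2) over the scaled coordinates."""
    return 25.0 + 1.5 * x0 - 1.0 * x1

rng = random.Random(42)
train_X = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(53)]   # 53 training stations
train_y = [synthetic_field(x0, x1) for x0, x1 in train_X]

# Held-out validation point: Tordesillas, 41°30'32" N, 4°59'20" W.
target = (scale(dms_to_decimal(4, 59, 20, negative=True), *LON_RANGE),
          scale(dms_to_decimal(41, 30, 32), *LAT_RANGE))

results = {}
for n_hidden in (1, 2, 3, 4):
    predict = train_mlp(train_X, train_y, n_hidden)
    fit = rmse([predict(x0, x1) for x0, x1 in train_X], train_y)
    results[n_hidden] = (fit, predict(*target))
    print(f"ANN (2-{n_hidden}-1): training RMSE = {fit:.3f}, estimate at target = {results[n_hidden][1]:.2f}")
```

In the study itself this loop would be repeated for each of the 15 days and for each of the four variables, and the 15 daily estimates at the reference station would then be compared with the values recorded there.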

Statistics for the Validation of the ANN Models
The accuracy of the results obtained by the ANN models in the validation phase was analyzed using the following statistics: the Root Mean Square Error (RMSE, in MJ/m² for solar irradiation and in °C for temperature), using Equation (1); and the coefficient of determination (R²), as an indicator of the level of model fit, using Equation (2).
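Equations (1) and (2) are not reproduced in this text. As a hedged illustration, the Python sketch below computes both statistics under their conventional definitions: RMSE as the square root of the mean squared difference between estimated and observed values, and R² as one minus the ratio of the residual to the total sum of squares. The R² variant is an assumption, since other definitions exist (for example, the squared correlation coefficient), and the function names are hypothetical.

```python
import math

def rmse(estimated, observed):
    """Root mean square error, in the units of the variable (MJ/m2 or degrees C)."""
    n = len(observed)
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n)

def r_squared(estimated, observed):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for e, o in zip(estimated, observed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

if __name__ == "__main__":
    # Placeholder values for illustration only, not measurements from the study.
    observed = [20.0, 22.0, 24.0, 26.0]
    estimated = [21.0, 22.0, 23.0, 26.0]
    print(f"RMSE = {rmse(estimated, observed):.3f}, R2 = {r_squared(estimated, observed):.3f}")
```

With 15 daily values per variable at the reference station, each statistic would be computed once per variable and per network size.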

Results
This section presents the results obtained by the ANN models for the daily estimation of global solar irradiation (1) and ambient temperature (maximum (2), average (3), and minimum (4)) at the agrometeorological reference station SIAR, located in Tordesillas, Valladolid, Castilla and León, Spain.

ANN Models for Estimating Daily Global Solar Irradiation at the Reference Station
The results of the ANN models presented in Figure 2a for estimating daily global solar irradiation at the reference station are shown in Table 1. The best result is obtained with the ANN (2-4-1), with RMSE = 1.04 MJ/m², which improves on the best ANN result of Franco et al. [11] for the summer months (1.63 MJ/m², obtained using the rectified linear unit activation function).

ANN Models for the Estimation of the Maximum Daily Temperature in the Reference Station
The results of the ANN models shown in Figure 2b for the estimation of the daily maximum temperature at the reference station are presented in Table 2. The best result is obtained with the ANN (2-4-1), with RMSE = 0.68 °C, which improves on the best ANN result of Franco et al. [11] for the summer months (1.28 °C, obtained using the sigmoid activation function).

ANN Models for the Estimation of the Average Daily Temperature in the Reference Station
The results of the ANN models shown in Figure 2c for estimating the daily mean temperature at the reference station are presented in Table 3. The best result is obtained with the ANN (2-3-1), with RMSE = 0.58 °C, which improves on the best ANN result of Franco et al. [11] for the summer months (0.99 °C, obtained using the hyperbolic tangent activation function).

ANN Models for the Estimation of the Minimum Daily Temperature in the Reference Station
The results of the ANN models shown in Figure 2d for the estimation of the daily minimum temperature at the reference station are presented in Table 4. The best result is obtained with the ANN (2-4-1), with RMSE = 0.83 °C, which improves on the best ANN result of Franco et al. [11] for the summer months (1.55 °C, obtained using the hyperbolic tangent activation function).

Discussion
In this paper, ANNs were used to perform spatial weather estimation using data measured by SIAR agrometeorological stations in Castilla and León (Spain), one of the largest regions in Europe (94,224 km², of which more than half is agricultural land). Meteorological data from both the area near the reference station and the neighbouring areas were used, which improved the performance of the ANN models. Loghmari et al. [21] applied an ANN model using the available meteorological data in the target area, with a recorded Average Relative Root Mean Square Error (ARRMSE) of 6.4%, while the IDW model estimated the global solar radiation measured in nearby areas with an error of 5.11%.
The data set used by Franco et al. [11] to interpolate the values of the most important meteorological variables in agriculture using an ANN comprised daily precipitation (mm), evapotranspiration ETo (mm), mean daily air temperature (°C), maximum temperature (°C), minimum temperature (°C), mean daily relative humidity (%), maximum relative humidity (%), minimum relative humidity (%), mean wind speed (m/s) and total solar irradiation (MJ/m²), recorded during the summer months (June, July and August) by the same SIAR agrometeorological stations in the territory of Castilla and León, Spain.
In this paper, a separate ANN model is built for each daily variable studied (global solar irradiation, and maximum, average and minimum temperatures) from the geographic coordinates (longitude and latitude) of the location to be estimated. This achieves better RMSE values (1.04 MJ/m², 0.68 °C, 0.58 °C, and 0.83 °C, respectively) than the ANN models of Franco et al. [11], who analyzed ten meteorological variables simultaneously in the same ANN during the summer months and obtained RMSE values of 1.63 MJ/m², 1.28 °C, 0.99 °C, and 1.55 °C, respectively, for the same variables.
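To put the comparison in relative terms, the short Python sketch below computes the percentage reduction in RMSE for each variable from the values quoted in this section. The relative-reduction measure and the names used are choices made for this illustration. Because the two studies cover different periods (15 days in June 2020 versus the summer months), the figures are indicative only.

```python
# RMSE values quoted in this section (MJ/m2 for irradiation, degrees C for temperatures).
rmse_this_work = {
    "global solar irradiation": 1.04,
    "maximum temperature": 0.68,
    "average temperature": 0.58,
    "minimum temperature": 0.83,
}
rmse_franco = {
    "global solar irradiation": 1.63,
    "maximum temperature": 1.28,
    "average temperature": 0.99,
    "minimum temperature": 1.55,
}

def relative_reduction(new, reference):
    """Percentage by which `new` is lower than `reference`."""
    return 100.0 * (reference - new) / reference

reductions = {
    name: relative_reduction(rmse_this_work[name], rmse_franco[name])
    for name in rmse_this_work
}

for name, value in reductions.items():
    print(f"{name}: RMSE reduced by {value:.1f}%")
# Prints reductions of about 36.2%, 46.9%, 41.4% and 46.5%, in the order listed above.
```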

Conclusions
Precision agriculture can improve the performance of crops, and thus increase agricultural productivity, by considering a precise knowledge of the meteorological variables that affect them in their development. The number of agrometeorological station networks is increasing, but it is still interesting to have data from the specific location of the crops, which can be obtained by interpolating the data measured by the agrometeorological station network. Strong et al. [36] assessed and evaluated the barriers to the adoption of smart agriculture through the Internet of Things (IoT) among Brazilian farmers in the Rio Grande do Sul, where they found that elements such as compatibility, complexity, testability, and visibility were the predictors of farmers' adoption of innovative solutions. As for ANN models, they were analyzed in this paper to describe the importance of their application for the adoption of climate-smart agriculture.
Kilelu et al. [37] carried out a report on the development of enterprises providing agricultural services in the context of the transformation of agricultural value chains and food systems in the dairy sector in Kenya, where they have the potential to provide innovation support to entrepreneurial farmers as well as contribute to the sustainable growth of the sector.
In this article, ANN models were used to interpolate the data measured daily by the SIAR network of agrometeorological stations in the Region of Castilla and León (Spain) for several meteorological variables: global solar irradiation, maximum, average and minimum temperatures, from the geographical coordinates of the location where the interpolation was carried out, by means of an ANN model for each of the variables studied. This study uses meteorological data available in the target region (areas close to the reference station) and in neighbouring regions (areas far from the reference station). The possibility of having synthetic meteorological data that best represent the local meteorology at each place and time is therefore very important to be able to apply advanced agricultural forecasting techniques that, for example, are related to the knowledge of the phenological behaviour of plants of productive interest, to the prediction of the necessary irrigation doses and the incidence of pests and diseases, or to the estimation of the potential product of the crops [38][39][40].
The results obtained in this study are better than those obtained previously for the same SIAR network by applying a single ANN model for all meteorological variables (10 variables). The key to this improvement is the use of simpler ANN models, one per variable, which provide more accurate estimates (Occam's razor).
In addition, the results obtained from the VWS in this study can be applied to make the prediction, at the same location, of the global solar irradiation of the next day with the ANN models developed by Diez et al. [34], and to estimate the hourly distribution of the ambient temperature, during the 24 h of the day, with the ANN models developed by Diez et al. [35], as well as the prediction of the values, for the next day, of the temperature (maximum, average and minimum).
Future studies that develop these ANN models for the interpolation of meteorological variables from geographic coordinates for crop production could include a predictor variable that directly affects the variable to be estimated (in a sloping terrain, its orientation to interpolate solar irradiation, or in the case of temperatures, the type of vegetation cover) that would increase the accuracy of the ANN models.

Acknowledgments:
The authors wish to acknowledge the European Union for supporting this work by means of the FUSILLI project (H2020-FNR-2020-1/CE-FNR-07-2020), and CYTED (the Ibero-American Program of Science and Technology for Development) for supporting this work through collaboration with the RITMUS network.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

Appendix A
Appendix A shows the information (altitude, latitude and longitude) of the 54 agrometeorological stations belonging to the Agro-climatic Information System for Irrigation (SIAR) InfoRiego [32], located in the nine provinces of Castilla and León Region, Spain, in Table A1.