A New Index Assessing the Viability of PAR Application Projects Used to Validate PAR Models

: Photosynthetically active radiation ( PAR ) is a useful variable to estimate the growth of biomass or microalgae. However, it is not always feasible to access PAR measurements; in this work, two sets of nine hourly PAR models were developed. These models were estimated for mainland Spain from satellite data, using multilinear regressions and artiﬁcial neural networks. The variables utilized were combinations of global horizontal irradiance, clearness angle cosine, relative humidity, and air temperature. The study territory was divided into regions with similar features regarding PAR through clustering of the PAR clearness index ( k PAR ). This methodology allowed PAR modeling for the two main climatic regions in mainland Spain (Oceanic and Mediterranean). MODIS 3 h data were employed to train the models, and PAR data registered in seven stations across Spain were used for validation. Usual validation indices assess the extent to which the models reproduce the observed data. However, none of those indices considers the exceedance probabilities, which allow the assessment of the viability of projects based on the data to be modeled. In this work, a new validation index based on these probabilities is presented. Hence, its use, along with the other indices, provides a double and thus more complete validation.


Introduction
In recent years, the growing global demand for food and biomass crops has pushed the research efforts to learn more about the factors that affect the growth and production of different crops in order to improve agricultural productivity. In this context, researchers seek not only to improve the efficiency of the processes involved in classical food crops, but they also aim to find new, clean and eco-friendly sources of power, such as obtaining biofuels and bioalcohols from biomass and the potential uses of microalgae in bioreactors. Microalgae are a promising field of study of the so-called third-generation biofuels that can contribute to reducing greenhouse gas emissions since they are non-polluting and sustainable. For instance, there is an ongoing project called the development of advanced technologies of microalgae for a circular economy (ALGATEC-CM S2018/BAA-4532) that addresses the coupling of biofuel production with the extraction of high-added value products to achieve the economic viability that microalgae technology needs. The effectiveness of the microalgae growth is mainly determined by the availability and concentration of nutrients as well as by the climatic variables, especially temperature and the ranges of photosynthetically active radiation (PAR) that are the best for biofuel or bioalcohol production. To this effect, modeling and simulating PAR and temperature conditions are key tools for the researchers involved in this project to discern which are the most suitable places to locate an outdoor pilot plant.
All types of crops need photosynthesis to grow and produce the leaves, fruits, or any type of bioproduct that interests us. Thus, controlling the ambient conditions to 1. To present a methodology to develop PAR models based on satellite data provided by a Moderate-Resolution Imaging Spectroradiometer (MODIS). The models' development utilized data from each cluster to be specific for each region. Subsequently, a site adaptation technique was applied to the best models to test if there was an improvement in the results.

2.
To introduce a statistical index to assess the goodness of the developed models. This index is defined in Equation (12) (PAR could be replaced by any other modeled variable) and allows the estimation of the goodness of a model. Utilizing the defined Exceedance Probability Index (EPI), we can make a joint comparison of all those exceedance probabilities corresponding to modeled and measured values.

Materials and Methods
Two datasets were used; the first one was utilized for the development and the second one for the training of the models. GHI and PAR training data were downloaded from MODIS Land Products (MCD18) [44], particularly, the products handled were MCD18A1 [45] and MCD18A2 [46], respectively. This dataset covers a grid from 35.3 • N to 44.0 • N in latitude and from 9.5 • W to 3.5 • E in longitude. It has a resolution of 5 km (0.045 × 0.045 • ) and covers the period from 1 January 2018 to 31 December. Hourly T and dew point (T d ) data were provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) [47]. These data cover a grid from 35.3 • N to 44.0 • N in latitude and from 9.5 • W to 3.5 • E in longitude. In this case, the resolution was 11 km (0.1 × 0.1 • ) from 1 January 2018 to 31 December 2018. These grids correspond to the location of the Iberian Peninsula where mainland Spain is located.
Data from seven PAR measurement stations belonging to the GEOPAR project (Project CGL2016-79284-P AEI/FEDER/UE) were used to validate the models. These stations provide PAR, GHI, RH, and T minute data. Each of these stations includes Apogee SP-110 and Apogee SQ-110 pyranometers for GHI and PAR measuring, respectively, and a HOBO S-THB Temperature/Humidity M002 psychrometer for RH and T measuring. Figure 1 illustrates the locations of these stations, and Table 1 summarizes these data.

Materials and Methods
Two datasets were used; the first one was utilized for the development and the second one for the training of the models. GHI and PAR training data were downloaded from MODIS Land Products (MCD18) [44], particularly, the products handled were MCD18A1 [45] and MCD18A2 [46], respectively. This dataset covers a grid from 35.3° N to 44.0° N in latitude and from 9.5° W to 3.5° E in longitude. It has a resolution of 5 km (0.045 × 0.045°) and covers the period from 1 January 2018 to 31 December. Hourly T and dew point (Td) data were provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) [47]. These data cover a grid from 35.3° N to 44.0° N in latitude and from 9.5° W to 3.5° E in longitude. In this case, the resolution was 11 km (0.1 × 0.1°) from 1 January 2018 to 31 December 2018. These grids correspond to the location of the Iberian Peninsula where mainland Spain is located.
Data from seven PAR measurement stations belonging to the GEOPAR project (Project CGL2016-79284-P AEI/FEDER/UE) were used to validate the models. These stations provide PAR, GHI, RH, and T minute data. Each of these stations includes Apogee SP-110 and Apogee SQ-110 pyranometers for GHI and PAR measuring, respectively, and a HOBO S-THB Temperature/Humidity M002 psychrometer for RH and T measuring. Figure 1 illustrates the locations of these stations, and Table 1 summarizes these data.   Result of k PAR clustering over the territory of study. The oceanic cluster is represented in green, while the Mediterranean one is in yellow. The locations of the seven stations used for validating the models are also represented. For obtaining RH data to develop the models, T and T d data provided by ECMWF were used according to Equation (1), derived from the Magnus formula and the Arden Buck Equation [48].
where RH is the relative humidity, T the air temperature, T d is the dew point, b = 18.678, c = 257.14 • C, and d = 234.5 • C. The clearness index (k t ) and solar zenith angle cosine (cos θ) were also employed for developing and validating the models. These variables were calculated following Equations (2) and (4).
where G 0 is the hourly global extraterrestrial irradiance, calculated as Equation (3) indicates: where I SC is the solar constant and equal to 1367 Wm −2 , φ the latitude, ω i the hour angle at the beginning of the measurement period, ω f the hour angle at the end of the measurement period, δ the declination angle, and ε the eccentricity of the terrestrial orbit [49,50].
To obtain the models, multilinear regressions and ANNs were used. ANNs are a machine learning technique proposed by [51], inspired by the connections of biological neurons. Briefly, this technique consists of an iterative process in which the network assigns different weights to the input signals until it reaches the nearest output to a target dataset [52][53][54]. This process is the so-called network training, once the ANN is trained can be fed with inputs to produce an estimated value.
The raw data from MCD18A1 [45] and MCD18A2 [46] provided by MODIS are 3 h averages covering mainland Spain with a resolution of 5 km. Modified Akima's piecewise cubic Hermite interpolation [55,56] helped to produce hourly estimates based on MODIS data. Then, this estimated hourly dataset served to train different types of models. The proposed methodology is focused on developing regional models. With that purpose, the use of a clustering of the field of study contributed to constructing models for a whole region. This methodology employs a clustering of k PAR , defined as the ratio between global extraterrestrial PAR (PAR 0 ) and PAR reaching the Earth's surface, to separate the territory of study into two areas with similar characteristics.
A clustering served to establish the regions with similar behavior of PAR. The clustering was applied to k PAR , defined in Equation (5). where the PAR 0 value was estimated as 39.8% of G 0 according to [57,58]. The clustering technique employed was the k-means method, which consists of minimizing the sum of squared distances between the centroid of the group and each object of this group. There are many clustering methods, but k-means represents a widely used method, and its results over mainland Spain fitted well with the orographic and climatological characteristics of the territory. The clustering divided the field of study into two regions with similar behavior of k PAR .
Once the regions with similar behavior were set, the development of both multilinear regression and ANN models for each region begun. The models were prepared using data provided by MODIS and ECMWF from all the points of the grid belonging to each region and subsequently validated with data provided by the stations located in the corresponding region.
The set of variables used in each model was the same for both multilinear regression and ANN models, as Table 2 explains. Combinations of sets of variables that are commonly used in the bibliography were selected as GHI, RH, T, k t , cos θ, and the product k t ·cos θ. Examples of models that utilize these variables can be found in [25,26,[32][33][34][38][39][40][41][59][60][61][62][63]. The aim making different combinations of them is to produce different models so that we can compare them and choose which model is the best and which variables are more interesting for PAR modeling. In the first five models, combinations of these variables always including GHI were utilized, since the presence of this variable is the key to obtaining good results according to [25]. However, in the last four models, the combination of variables always included the product of k t and cos θ, as this product obtained promising results in [41]. Table 2. Set of variables used in each model.
where y new mod is the corrected model value, y mod is the value of the original model, x local is a local measured value, and the coefficients a and b are the coefficients obtained in a linear fitting between measured and modeled values. The statistics used to estimate the goodness of the models developed were the slope and the intercept of the linear regression between measured and modeled PAR, the mean bias error (MBE), the root mean square error (RMSE), and the determination coefficient (R 2 ), whose expressions are shown from Equation (7) to Equation (11).
Agronomy 2021, 11, 470 6 of 17 where n is the number of recordings, σ PAR−Mod is the variance of the modeled PAR, σ PAR−Mea is the variance of the measured PAR, and σ PAR−Mod PAR−Mea is the covariance of both modeled and measured PAR. Additionally, to estimate the goodness of the models from the point of view of their representativeness when assessing the viability of PAR projects, a new index was defined. Indeed, the new index-called the Exceedance Probability Index (EPI)-is based on the exceedance probabilities, which are widely utilized to evaluate the viability of solar power plant projects. The proposed EPI estimates the difference between the exceedance probabilities for the modeled and measured variable (PAR could be replaced by any other modeled variable) as Equation (12) indicates. This difference is normalized dividing by the exceedance probability of the measured variable. According to this expression, when the exceedance probabilities of the measured and modeled values are close, EPI tends to zero.
where PAR T is a threshold PAR value, PAR min is the lower PAR value, PAR max is the higher PAR value, PAR M is the modeled PAR, and PAR O is the measured PAR.

Regional PAR Models
The clustering of k PAR conducted over the field of study permitted the development of regional PAR models. This procedure allows the separation of the field of study into regions that present similar behavior and characteristics concerning PAR. Hence, by using data from each cluster, it is possible to develop models for any place located in the area delimited by the cluster.
3.1.1. k PAR Clustering Figure 1 shows the division of the territory into two regions obtained after clustering k PAR following the k-means method. The number of optimal clusters employed was two. Details of the method can be found in [58].
Compared to the climatological regions present in Spain [67,68], the two regions obtained throughout the clustering analysis fit with the Oceanic and the Mediterranean areas, respectively. Indeed, Oceanic climate is present in northern and northwest regions of Spain and fit reasonably with the green cluster, whereas eastern, southern, and central areas of Spain have a Mediterranean climate and fit well with the yellow cluster, though there are two different climatic varieties. In the eastern and coastal zones, the Mediterranean subtype is standard, and in the inner zones, the Mediterranean subtype is continental. In mainland Spain, there is another climate called the mountain climate. As its name indicates, this climate is typical of mountain ranges and high locations. This climate appears in mainland Spain in the main mountain ranges, such as the Pyrenees, Central Range, Iberian Range, and the Sierra Nevada. The cluster division did not account for this mountain climate, but the inclusions of the green cluster into the yellow cluster are located in the main mountain ranges in mainland Spain. This indicates that the mountain climate locations have been included in the green cluster [58,69].
According to this division, three stations are placed in the oceanic cluster: Lugo-USC; Villaviciosa-SERIDA; and Vitoria-NEIKER. However, the remaining four stations belong to the Mediterranean cluster (Albacete-ITAP; Córdoba-IFAPA; Salamanca-CIALE; and Zaragoza-AULA DEI). Therefore, the validation of the oceanic models employed only data from the three oceanic stations, and the validation of the Mediterranean models employed data from the other four stations.
Since the field of study was divided into two regions, there were only two sets of nine models generated, one set for the oceanic region and another set for the Mediterranean region. Thus, the same set of equations was used for validating the three oceanic stations, and another set of equations was utilized for validating the Mediterranean stations, as Table 3 illustrates. ANN setup had one input layer and one output layer, and the number of hidden layers was chosen according to the best performance reached. All ANNs were feedforward type and were fed with the set of inputs corresponding to each model. In total, 70 % of training data served to train the ANN, while the remaining 30 % of data validated the performance of the ANN to find the optimal number of hidden layers. In this case, the ANN training for each region utilized data from the corresponding region.
Validating with data from surface stations obtained the results illustrated in Figures 2 and 3. The statistics utilized to evaluate the performance of the different models were the slope and intercept in a scatterplot, the mean bias error (MBE), the root mean square error (RMSE), and the EPI.

Site-Adaptation
To verify if the results obtained by Model 1 with MR-chosen as the best model for the first ranking-could be improved through a correction using local data, the site-adaptation technique for both the Mediterranean and Oceanic regions was applied. Likewise, this same technique was also applied to MR Model 8, as it was chosen as the best model for the second ranking, to check if an improvement could be achieved.
To obtain parameters a and b of Equation (6), half of the measured and modeled PAR data were fitted, and the other half of the data were used to validate the new model. Thus, half of the data randomly selected from Lugo-USC; Villaviciosa-SERIDA; and Vitoria-NEIKER were utilized to calculate a and b for the Oceanic region, and the same with data from Albacete-ITAP; Córdoba-IFAPA; Salamanca-CIALE; and Zaragoza-AULA DEI to calculate a and b for the Mediterranean region. The equation for the site-adaptation correction to the Oceanic region was (13), whereas for the Mediterranean region was Equation (14). Figure 4 shows a comparison between the results of Model 1 (using MR) and siteadaptation correction, whereas Figure 5 shows a comparison between the results of Model 8 (using MR), which was chosen as the best one from the assessment carried out from the EPI, and site-adaptation correction. Validating with data from surface stations obtained the results illustrated in Figures  2 and 3. The statistics utilized to evaluate the performance of the different models were the slope and intercept in a scatterplot, the mean bias error (MBE), the root mean square error (RMSE), and the EPI.

Site-Adaptation
To verify if the results obtained by Model 1 with MR-chosen as the best model for the first ranking-could be improved through a correction using local data, the site-adaptation technique for both the Mediterranean and Oceanic regions was applied. Likewise, this same technique was also applied to MR Model 8, as it was chosen as the best model for the second ranking, to check if an improvement could be achieved.
To obtain parameters a and b of Equation (6), half of the measured and modeled PAR data were fitted, and the other half of the data were used to validate the new model. Thus, half of the data randomly selected from Lugo-USC; Villaviciosa-SERIDA; and Vitoria-NEIKER were utilized to calculate a and b for the Oceanic region, and the same with data from Albacete-ITAP; Córdoba-IFAPA; Salamanca-CIALE; and Zaragoza-AULA   Figure 4 shows a comparison between the results of Model 1 (using MR) and siteadaptation correction, whereas Figure 5 shows a comparison between the results of Model 8 (using MR), which was chosen as the best one from the assessment carried out from the EPI, and site-adaptation correction.

Discussion
The clustering led to a division into two regions, one of them located in the north and northwest of mainland Spain and the other one located in the central, southern, and eastern areas of the territory. The division obtained in the clustering coincides with the two main climatological regions in mainland Spain [67,68]: Oceanic and Mediterranean. The northern cluster fits reasonably well with areas of Spain that have an oceanic climate, while the southern cluster coincides with the areas that have a Mediterranean climate. The presence of mountain ranges can explain the inclusions of the oceanic cluster in the Mediterranean one [58].
According to the results, it is not easy to infer what the best model is. It is necessary to assess all the statistics together to choose the best option. However, all the statistics used except the EPI are based on the comparison between the observed value and the corresponding modeled value. For its part, EPI is not based on a point-to-point comparison from which a compound index is obtained. Instead, the index is calculated comparing exceedance probabilities, which are the result of an entire set of points. Therefore, EPI is the best index to assess the goodness of a model that is going to be used to estimate the viability of a project. The rest of the statistics are used to assess the extent to which the model adequately reproduces the observed data. In that sense, for the evaluation of the models, we should use the EPI separately from the rest of the statistics, and propose two rankings of models: one focused on the degree of goodness when reproducing the observed data, and another focused on the goodness when assessing the viability of possible projects.
First, the models were ranked according to the first five statistics (the slope, the intercept, the R 2 , the RMSE, and the MBE). Analyzing the performance of both MR and ANN models from the slope, R 2 , and RMSE, the worst combination of variables is that of Models 6 and 8, that is, the models that did not include GHI as an input. As Figures 2 and 3 illustrate, the slopes obtained with Models 6 and 8 either with MR or ANN for all the stations were the farthest from one. The same happened with the R 2 ; the lower values were found in Models 6 and 8. Regarding RMSE, the highest values were obtained with Models 6 and 8. Indeed, the RMSEs of Models 6 and 8 were more than twice the RMSEs of the other models in the case of MR. The slope was close to one in the rest of the models. All the models obtained R 2 higher than 0.99, except Models 6 and 8. In the case of the RMSE, lower results were obtained with Models 1 and 7. According to the intercept, the worst models are 4 and 6 for both MR and ANN, and Model 9 in the case of MR whose intercepts were in all cases above 16 µmol m −2 s −1 . On the other hand, the three best models are 1, 2, and 3, because their intercepts in all cases were below 25 µmol m −2 s −1 . Regarding the MBE, the closest to zero per cent results were obtained with Model 8 when using MR, and Model 7 in the case of ANN; for these models, the MBE statistics were in all cases within the range (−2%-2%). However, the worst models according to their MBE are 4 and 9, since their numbers were the farthest to zero per cent in all cases. Thus, according to these five statistics (the slope, the R 2 , the RMSE, the intercept, and the MBE), the best performance was obtained with Model 1, for both MR and ANN models.
Regarding the differences between ANN and MR models, in the case of the slope, they were not significant in Models 1, 2, 6, and 7. Model 3 was the only one in which the MR slopes were higher than ANN, and the difference between them was in the order of one hundredth in all stations. Inversely, the ANN slopes were higher than the MR slopes in Models 4, 5, 8, and 9; additionally, in these models, the largest differences reached three hundredths. Likewise, the differences between the intercepts of MR and ANN were not significant in Models 1, 2, 6, and 7. Only in Model 3 the ANN intercepts were above the MR intercepts (except in the case of Córdoba-IFAPA station); in the remaining models, MR intercepts were higher than ANN, and these differences reached the maximum in Model 9 where they were above 10 µmol m −2 s −1 for all stations and up to 30 µmol m −2 s −1 in Córdoba-IFAPA station. The differences between the R 2 obtained with MR and ANN were negligible, except in the case of Model 3 whose differences found on Córdoba-IFAPA and Salamanca-CIALE stations reached the order of one hundredth. Regarding the MBE, the differences were negligible in Models 1, 2, and 6. The largest differences in MBE were found in Model 9 (up to 1.71%), and the MBEs of ANN were closer to zero than the MBE of MR. In the case of RMSE, the differences were not significant in Models 1, 2, 6, and 8. In Models 4, 5, and 9, there were only significant differences in Villaviciosa-SERIDA and Zaragoza-AULA DEI stations. In Model 7, the RMSE of ANN was higher than the RMSE of MR. The largest differences (higher than 3.50%) were found in Model 3 in Córdoba-IFAPA and Salamanca-CIALE stations.
As the difference between Model 1 using ANN and MR was very small, Model 1 was chosen with MR as the best model because of its time of computation, which was more than two times lower compared to ANN.
The second ranking compared the EPI results. This index seemed to have similar behavior regarding the MBE and intercept, since EPI was determined from exceedance probabilities, that is, the complementary percentiles. If a model overestimates (or underestimates), its percentiles should be higher (or lower) than the observed data percentiles. Hence, EPI moved away from zero, the same as the MBE and intercept. Indeed, as happened with the intercept and MBE, in the case of EPI, the worst models were 4 and 9, whose average EPIs on all stations were, respectively, 0.043 and 0.039 for MR and 0.032 and 0.025 for ANN. The rest of the models were quite similar-except 5-obtaining EPI in the order of 0.01. However, Model 8 appeared as the best one of the MR models (its average EPI on all stations was 0.009), whereas for ANN models, Model 3 had the best EPI results by a narrow margin (its average EPI on all stations was 0.010). Thus, Model 8 with MR was chosen as the best model for the second ranking due again to its lower time of computation.
It is a remarkable fact that Model 8 is among the worst models according to the rest of the indices and that GHI is not an input variable for this model. As previously mentioned, the slope, the intercept, the R 2 , the RMSE, and the MBE estimate the closeness of the model to the observed data point by point, so GHI is a necessary variable to adequately reproduce the PAR, as this variable is intrinsically associated with GHI (some PAR models are limited to expressing a ratio of GHI). In this work, the best combination of variables to achieve this purpose without GHI was the one proposed in Model 8 (k t , cos θ, T, and RH). Nevertheless, though GHI is not used explicitly, in fact, it is utilized implicitly since k t is defined as the ratio between the variables GHI and G 0 . Thus, the information carried in GHI is one way or another needed to obtain good PAR estimates.
According to Figure 4, the results utilizing the site-adaptation technique improved slightly in Albacete-ITAP station. However, in the remaining stations, some statistics improved but others got worse. For instance, in Cordoba-IFAPA, the MBE improved, but the slope got worse, the same as in Vitoria-NEIKER, or in Villaviciosa-SERIDA and Zaragoza-AULA DEI where the slope improved but MBE worsened, whereas in Lugo-USC, the MBE improved, but the slope remained almost the same, and in Salamanca-CIALE, the MBE improved, but the slope got worse. In all cases, the intercept improved significantly using the site adaptation, whereas there were no significant differences in RMSE and R 2 .
According to the EPI, the site adaptation improved the results in Albacete-ITAP, Lugo-USC, Salamanca-CIALE, and Zaragoza-AULA DEI, while in the remaining three stations, EPI got worse.
Bearing this in mind, it can be concluded that there was no clear improvement either by using the site-adaptation correction to Model 1-assessing the slope, the intercept, the R 2 , the RMSE, and the MBE-or using the site-adaptation correction to Model 8-assessing the EPI. Models 1 and 8 were compared to four hourly PAR models of the bibliography [32,34,41,59]. The following equations describe the models of the bibliography chosen for the comparison, where the coefficients were transformed to the units used in this work when needed: In [41], the model in Equation (15) is proposed using the product of the clearness index (k t ) and the cosine of the solar zenith angle (cos θ).
The model described in Equation (16) was introduced in [59].
The last model was introduced in [32] and is expressed in Equation (18).
According to the results shown in Figure 6, the best performance of models found in the literature was reached with the model described in Equation (15). Comparing the slopes, in all the seven locations, the models closer to one were those obtained by the model proposed in [41]. No clear tendency on the intercepts was observed. Comparing the R 2 , the models [32,34,59] were close to one in all the stations. However, the model proposed in [41] was further from one. Regarding the MBE statistics, the lowest values were also obtained using the model proposed in [41]. However, the lowest RMSE was obtained with the model proposed in [34] by a scarce difference. As far as EPI is concerned, the lowest results were those of the model proposed in [41]. The differences between the EPI of model [41] and the remaining models of the bibliography, which reached one magnitude order, are remarkable.
Comparing to the performance of Models 1 and 8, the slopes found with Model 1 were closer to one than those of the model proposed in [41], whereas the slopes obtained using Model 8 and the model proposed in [41] were quite similar. The intercepts of the model proposed in [41] were clearly closer to zero than those of Models 1 and 8, except in Córdoba-IFAPA and Salamanca-CIALE. Regarding R 2 , the results of Model 1 were the closest to one, while those of Model 8 and the model proposed in [41] were similar. The MBE obtained with the model proposed in [41] was further from zero than in the case of Models 1 and 8. The same happened with RMSE, but in this case, the difference between the numbers of Model 1 and the model proposed in [41] was higher than the difference between Model 8 and the model proposed in [41]. Regarding EPI, the values of the model proposed in [41] were higher than those obtained in Models 1 and 8.
were also obtained using the model proposed in [41]. However, the lowest RMSE was obtained with the model proposed in [34] by a scarce difference. As far as EPI is concerned, the lowest results were those of the model proposed in [41]. The differences between the EPI of model [41] and the remaining models of the bibliography, which reached one magnitude order, are remarkable.

Conclusions
The new index allows the assessment of the viability of PAR projects, as it is based on exceedance probabilities, unlike the rest of the indices, which estimate the closeness of the model to the observed data point by point. Indeed, a good model according to these indices must also be representative of the degree of viability of the project, but it is possible that a bad model according to these indices turns out to adequately represent that viability. This is the case shown in the work, where Model 8 is the best model according to EPI, but it is one of the worst according to the rest of the indices. EPI adds information from a different point of view to the traditional statistics indices and could help in decision making to assess the suitability of PAR resources in an arbitrary location. EPI could be also useful for discerning which PAR model is more appropriate to estimate PAR in order to assess its potential regarding industrial exploitations such as agro-food cultivation, vegetables for biomass cultivation, or micro-algae cultivation to obtain biofuels.
A methodology for developing regional hourly models from 3 h satellite data was presented as well. The clustering utilizing k-means methods led to a division of the field of study into two regions with similar features regarding PAR. For each of these two regions, a set of nine multilinear regression and nine artificial neural network models was developed and validated.
As PAR is among the main variables to estimate the growth of biomass and microalgae, these models were validated bearing in mind that the viability of PAR projects can be applied to a priori estimate biomass production or the viability of a microalgae growth in different regions across mainland Spain, for instance. The clustering division of the territory fitted in with the two main climatic areas of Spain. This information will be useful in futures studies to assess the best species of microalgae cultivation according to the level of PAR irradiance received and the most suitable region to set it up.
The results showed small differences in the best models between multilinear regressions and artificial neural networks, but the higher time consumption of artificial neural networks made the multilinear regression models a better option. For the first ranking considered-indices which estimate the closeness of the model to the observed data point by point-all the models that did not include GHI as an input obtained the worst results, remarking the importance of GHI for PAR modeling in this case (the best performance was reached by Model 1, which only utilizes GHI as input). However, in the case of assessing the representativeness of a model when estimating the viability of a PAR project, GHI is not required explicitly for the model (Model 8), but indeed implicitly throughout k t . This result illustrates that a model with traditional statistics, which can appear to be a bad model in the sense of replicating ground measurements, can be a good model from the point of view of the exceedance probabilities and may add valuable information for industrial exploitation.
A correction using a site-adaptation technique was introduced. The statistical results of the site adaptation to Model 1 and Model 8 improved some statistics but got worse in others. Thus, there was no clear improvement in the use of site adaptation in this case.
Finally, a comparison between Model 1 and 8 against four models of the bibliography was carried out. The results of the comparison showed that the performance of the models presented in this work was better than those of the bibliography, though the slopes and the R 2 of Model 8 and the model proposed in [41] were similar, and the intercepts of the model proposed in [41] were the lowest.   [45,46] (accessed on 8 April 2020). Data provided by ECMWF used in this study are openly available at https://cds.climate.copernicus.eu/ at doi:10.24381/cds.e2161bac, reference number [47] (accessed on 17 April 2020). Measured data belonging to GEOPAR project are available to bona fide researchers, subject to an agreement. For further details of the data and how to request access, contact the corresponding author.