Evaluation of High-Resolution Satellite-Derived Solar Radiation Data for PV Performance Simulation in East Africa

Palmer, Diane; Blanchard, Richard

doi:10.3390/su132111852


by Diane Palmer * and Richard Blanchard

Centre for Renewable Energy Systems Technology, Loughborough University, Loughborough LE11 3TU, UK

* Author to whom correspondence should be addressed.

Sustainability 2021, 13(21), 11852; https://doi.org/10.3390/su132111852

Submission received: 23 September 2021 / Revised: 21 October 2021 / Accepted: 22 October 2021 / Published: 27 October 2021

(This article belongs to the Section Energy Sustainability)


Abstract

Access to reliable, clean, modern cooking enhances life chances. One option is photovoltaic cooking systems. Accurate solar data are needed to ascertain to what extent these can satisfy the needs of local people. In this paper, we investigate how to choose the most accurate satellite-derived solar irradiance database for use in Africa. This is necessary because there is a general shortage of ground measurements for Africa. The solar data are needed to model the output of solar cooking systems, such as a solar panel, battery and electric pressure cooker. Four easily accessible global horizontal irradiation (GHI) satellite databases are validated against ground measurements using a range of statistical tests. The results demonstrate the impact of the mathematical measure used and the phenomenon of balancing errors. Fitting of the satellite model to the appropriate climate zone and/or nearby measurements improves accuracy, as does higher spatial and temporal resolution of input parameters. That said, all four of the databases reviewed were found to be suitable for simulating PV yield in East Africa.

Keywords:

solar radiation; satellite-derived irradiance; global horizontal irradiance; clear sky model; ground stations; validation

1. Introduction

Accurate knowledge of incoming solar radiation at specific locations is very important for many applications. In the context of this research, it is required for modelling PV yield as input to solar cooking systems for the Modern Energy Cooking Services project (https://mecs.org.uk/, accessed on 26 October 2021). Worldwide, nearly three billion people rely on solid fuel for cooking and heating. This has health and environmental implications. Women and children especially are exposed to smoke, resulting in respiratory illnesses, cataracts, heart disease and cancer. Much time and human energy is expended in firewood collection. Reliance on wood fuel contributes to climate change and local forest degradation. The Modern Energy Cooking Services Programme (MECS) is investigating how to rapidly transition from biomass to genuinely “clean” cooking (e.g., with electricity). The aim of this ongoing research is to investigate the possibility of developing a solar power system that can support individual electric cooking systems in off-grid situations. Such a system might comprise a solar panel, battery and a cooking device such as an electric hob or electric pressure cooker. A detailed solar resource assessment is necessary to discover to what extent such solar-enabled cooking can supply people’s needs.

Europe has a relatively dense network of well-maintained weather stations which provide publicly available data. In Africa, the situation is quite different. There are proportionately few ground sensors and a dearth of accessible weather measurements [1]. Thus, an alternative source of information must be sought. Satellite-derived radiation datasets are widely regarded as the most accurate alternative. However, not all solar datasets are created equal. Moreover, as yet, there is no standardised approach for choosing the most suitable solar irradiation dataset [2].

It is difficult to select a dataset from published validation statistics. These use different locations, temporal resolutions, methods of error calculation, data filtering and data aggregation processes. Yang and Bright (2020) [3] suggest that due to uncertainties in ground records, it is better to ask if the database under investigation is sufficient for the intended purpose, or if one dataset performs better than another, rather than relying on error and bias values (see also [4]).

The goal of this article is to determine which of four easily available satellite-derived global horizontal products is to be preferred for modelling PV output in East Africa. (Incidentally, hourly PV output may also be calculated from cloud data and ambient temperature change [5]). This is novel because solar global horizontal irradiation (GHI) satellite databases have only previously been compared in South Africa [6] where the solar market is established.

There are three specific manuscript objectives:

To evaluate and compare GHI satellite datasets in East Africa with a view to advising which database to use where. GHI satellite data are mostly verified against data from the archive of the Baseline Surface Radiation Network (BSRN), based at the World Radiation Monitoring Centre (WRMC). However, there are just three BSRN monitoring stations on the continent of Africa (Algeria, Namib Desert and South Africa), as opposed to 13 in the USA and 11 in Europe. Additionally, the West has many other ground stations, which, although accurate, do not belong to the BSRN network. Africa is very short of ground-based solar radiation sensors in general.
To establish whether the different clear sky models utilised by satellite-derived solar radiation datasets affect the outcomes of the dataset values in East Africa. Clear sky models differ in complexity of algorithm, atmospheric inputs, temporal and spatial resolution of atmospheric inputs and location where the model was fitted.
To compare and contrast solar GHI satellite data with measurements from ground stations in East Africa.

The paper is organised as follows. Section 2 describes the satellite-derived GHI databases compared in this research, and the ground station data obtained for comparative validation by this project. Section 3 summarises the methods used. Section 4 investigates the extent to which comparative accuracy of databases can be ascertained without weather station data; additionally, it explains and discusses the results of a multiplicity of statistical tests used to differentiate between the four GHI satellite datasets. Finally, Section 5 and Section 6 present the discussion, conclusions and main messages of this research.

2. Instruments, Places and Measurements

There are many satellite-derived solar radiation databases. For this research, up-to-date, high-temporal resolution GHI ones were required (scales of one minute to one hour). It was also necessary to select those which cover Africa, as some are confined to India, Europe or the USA. Suitable candidate datasets include free products, e.g., Solemi [7], available upon request. There are also paid-for services: Meteonorm [8], Reuniwatt [9], SoDa [10], SolarAnywhere [11], Solargis [12], 3E [13] and 3Tier [14]. The four databases selected for use in this paper (detailed in Section 2.1) were instantly downloadable and free (except for Solcast, which has a generous free allowance for researchers).

Remotely sensed solar data also have a role to play in climate studies, e.g., radiative transfer, solar radiation variability and the 11-year solar cycle [15]. Such archives have extensive spatial coverage and comprise several decades of data [16].

The ground measurements used for validation were the only ones available to the authors during the COVID-19 pandemic outbreak (2020–2021) when this paper was written. The ground data locations are described in Section 2.2.

All the database time series values were averaged or rounded to the nearest time period end so that inner database joins could be performed to enable subsequent analysis. The global horizontal irradiance values in all databases have compatible units, being recorded in Wh/m², except for the ground measurements from Galu and Munje (see below). These daily values were divided by 10 (the average number of daylight hours in the day in Kenya) to convert them to Wh/m².
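The alignment step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the sample timestamps and the GHI values are all assumptions, and readings that fall exactly on an interval boundary are assumed to belong to the interval ending at that instant.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def period_end(ts, minutes):
    """Return the end of the `minutes`-long interval that contains ts."""
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = (ts - day_start).total_seconds()
    step = minutes * 60
    # Ceiling division: a reading exactly on a boundary stays on that boundary.
    n_steps = -(-elapsed // step)
    return day_start + timedelta(seconds=n_steps * step)


def average_to_period_end(readings, minutes):
    """Average (timestamp, value) pairs into bins stamped with the period end."""
    bins = defaultdict(list)
    for ts, value in readings:
        bins[period_end(ts, minutes)].append(value)
    return {end: sum(vals) / len(vals) for end, vals in bins.items()}


def inner_join(ground, satellite):
    """Keep only the timestamps that are present in both series."""
    common = sorted(ground.keys() & satellite.keys())
    return [(ts, ground[ts], satellite[ts]) for ts in common]


# Hypothetical one-minute ground readings from 10:01 to 10:30.
start = datetime(2016, 7, 1, 10, 0)
ground_1min = [(start + timedelta(minutes=m), 100.0 + m) for m in range(1, 31)]
ground_15 = average_to_period_end(ground_1min, 15)

# Hypothetical 15-minute satellite values; 10:45 has no ground counterpart.
satellite_15 = {
    datetime(2016, 7, 1, 10, 15): 110.0,
    datetime(2016, 7, 1, 10, 45): 130.0,
}
joined = inner_join(ground_15, satellite_15)
```

Only the 10:15 record survives the join here, which is the behaviour wanted for validation: every statistic is then computed on pairs that exist in both sources.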

2.1. The Satellite-Derived GHI Databases Used in This Research

Two of the solar radiation products under investigation here are produced by the Climate Monitoring Satellite Application Facility (CMSAF) of the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT). The Joint Research Centre (JRC) Photovoltaic Geographical Information System [17] versions are used.

The first of these is SARAH, the Surface Solar Radiation DataSet-Heliosat. These data are available at hourly intervals and at a spatial resolution of 0.05° (5.6 km). Extensive validation has been performed by Urraca et al. [18]. SARAH employs observations from the Meteosat Visible Infra-Red Imager (MVIRI) and the Spinning Enhanced Visible and Infrared Imager (SEVIRI) instruments carried by EUMETSAT geostationary Meteosat satellites. The Heliosat-2 algorithm is utilised. It subtracts cloud properties recorded by the satellite sensor from clear sky irradiance. Clear sky radiation is obtained via the SPECMAGIC method (SPECtral Mesoscale Atmospheric Global Irradiance Code [19]). Inputs to SPECMAGIC comprise aerosol properties, total column water vapour and ozone in the form of a monthly look-up table for processing speed. SPECMAGIC was fitted at two European sites.

The second is CMSAF. CMSAF data are supplied at 15 min, hourly, daily and monthly intervals, with 0.05° spatial resolution. CMSAF uses the same instruments, inputs and algorithms as SARAH, but the look-up table is updated continuously with 3-hourly satellite-derived values of atmospheric inputs [20].

The third satellite-derived GHI solar database examined is CAMS (Copernicus Atmosphere Monitoring Service) [21]. Temporal resolution is one minute to one month. (One-, five-, fifteen- and sixty-minute data are used here). It is spatially interpolated to the point of interest. Again, CAMS uses Meteosat/SEVIRI, but this time, the Heliosat-4 model [22] is applied. Heliosat-4 combines inputs from the McClear clear sky model and the McCloud cloud properties model [22]. The McClear model (also used by this research) takes as inputs the solar position, ground reflectance, ground elevation (Shuttle Radar Topography Mission (SRTM)) and atmospheric particulates (with 3-h temporal resolution) zoned according to simplified Köppen Climate Classification (tropic, mid-latitude or sub-Arctic). The McCloud algorithm divides clouds into four types (low, medium, high or thin ice) and treats these separately.

The last solar irradiance product studied in this paper is Solcast [23]. (Five-, fifteen- and sixty-minute temporal scales are used here.) This is a paid-for service. (The author suggests that the PVsyst version is chosen, regardless of intended software, for ease of analysis.) Satellite inputs include those from Meteosat. As with CAMS, a clear sky model (REST2v5, parameterised in the U.S.) and a cloud model (proprietary in this case) are used [24]. Atmospheric inputs are from MERRA-2 reanalysis [25]. MERRA-2 temporal resolution is hourly, but the spatial resolution is 50 km. The ground altitude data incorporated are likewise of low spatial resolution.

All the satellite-derived GHI databases reviewed here use input data from the same satellite sensors. All are semi-empirical (fitted to ground measurements somewhere to some extent). CMSAF might be expected to be more accurate than SARAH because atmospheric data are three-hourly rather than monthly. This also applies to CAMS. Differences may also arise from the different clear sky models (SPECMAGIC versus McClear) and cloud properties models. Solcast has high temporal resolution of atmospheric variables but these have low spatial resolution, as do the ground elevation data inputs. The four satellite-derived databases analysed in this research are summarised in Table 1.

2.2. The Ground-Based Data Used in This Research

The ground-based data measurements used in this research are from two sources. The first is from two locations for a solar nano-grids project [26] (Figure 1). The second author was involved in this project.

The two villages in Kenya where the measurement instruments are located are:

Lemolo B (latitude: −0.01°; longitude: 36.04°), in a semi-arid region of Kenya (Köppen–Geiger climate classification Aw, tropical savannah);
Echareria (latitude: −0.35°; longitude: 36.22°) with a Köppen–Geiger climate classification of Csb. That is, it enjoys a “Mediterranean” climate with a dry summer and mild wet winter.

Data are available from Lemolo B for July 2016 to December 2017, and from Echareria for September 2016 to October 2017. The data logging interval varies slightly but is generally 7 consecutive one-second values at the end of each minute (UTC). One-, five-, fifteen- and sixty-minute and daily averages were calculated for the purposes of this analysis. The measurement instrument was a CS300 (SP-110) APOGEE PYR-P silicon photovoltaic detector (calibration uncertainty at 1000 W/m² less than 3%, traceable to the World Radiometric Reference (WRR) in Davos, Switzerland). The data were quality controlled as described in Appendix A.

The second source of ground data measurements is daily global horizontal irradiance data for two locations in a ground water management project [27] (Figure 1).

The details of the two villages in Kwale County, Kenya, where measurement took place are:

Galu: latitude −4.35°, longitude 39.57°; and
Munje: latitude −4.51°, longitude 39.46°.

Both are Köppen–Geiger climate classification Af, tropical rainforest. They are coastal, near Mombasa. The measurement equipment is Maplin Professional Solar Powered Wi-Fi Weather Station (Maplin N23DQ), which records solar radiation every 5 min (accuracy ± 3–7%). This is aggregated to daily totals before being made public. There is no way of obtaining any further information about these data.

During the COVID-19 pandemic when this article was written, these were the only ground GHI measurements that were possible to obtain. There are few solar data for Africa in any eventuality. Comparisons with all GHI satellite databases were affected equally, so none of them was unfairly disadvantaged.

3. Methodology

The following statistical tests were used to validate the GHI satellite data against the ground values: normalised Root Mean Square Error (nRMSE), normalised Mean Bias Error (nMBE), hourly average, hourly standard deviation, trendlines, Pearson Product-Moment Correlation Coefficient, average GHI per hour of day, average GHI per day of year and frequency distribution.

4. Results

4.1. Selection of GHI Satellite Database without Ground Validation

Initially, we attempted to choose a suitable database to simulate PV output for a cooking system without the support of ground-based measurements, which is normally the situation throughout most of Africa. Two databases were selected for comparison: SARAH and CMSAF. Ten years of data (2007–2016) were analysed. The port of Dar es Salaam, on the Tanzanian coast, was taken as the example. Dar es Salaam is Köppen Climate Classification subtype “Aw” (Tropical Savannah Climate). It is located on the coast of the Indian Ocean, at an elevation of 10–60 m, in the southern hemisphere.

Direct comparison methods were selected from the many statistical metrics available, as they are well known and simple to apply. As can be seen in Table 2, overall summations and averages do little to distinguish between the two GHI satellite databases. Yearly totals and hourly averages are almost the same. Mean standard deviation of each hourly GHI value over 10 years is not particularly high, at around 10% of maximum hourly values, although SARAH does vary more than CMSAF.

More useful for distinguishing between the datasets are average hour of day values, which show that SARAH is nearly always greater than CMSAF, except at the end of the day (Figure 2). Here, SARAH is overestimating (or CMSAF is underestimating), although the differences are not very large, except for the last daylight hour.
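An hour-of-day comparison of this kind can be sketched as below. The function names and the four sample values per database are invented for illustration; they are not taken from SARAH or CMSAF.

```python
from collections import defaultdict
from datetime import datetime


def mean_by_hour_of_day(series):
    """Average hourly GHI values grouped by hour of day (0-23)."""
    groups = defaultdict(list)
    for ts, ghi in series:
        groups[ts.hour].append(ghi)
    return {hour: sum(v) / len(v) for hour, v in sorted(groups.items())}


def hourly_difference(profile_a, profile_b):
    """Difference a - b for every hour of day present in both profiles."""
    return {h: profile_a[h] - profile_b[h] for h in profile_a if h in profile_b}


# Invented hourly values (Wh/m²) for two days at noon and in the last daylight hour.
sarah = [
    (datetime(2016, 1, 1, 12), 800.0), (datetime(2016, 1, 2, 12), 700.0),
    (datetime(2016, 1, 1, 17), 100.0), (datetime(2016, 1, 2, 17), 120.0),
]
cmsaf = [
    (datetime(2016, 1, 1, 12), 780.0), (datetime(2016, 1, 2, 12), 690.0),
    (datetime(2016, 1, 1, 17), 130.0), (datetime(2016, 1, 2, 17), 140.0),
]

sarah_profile = mean_by_hour_of_day(sarah)
cmsaf_profile = mean_by_hour_of_day(cmsaf)
diff = hourly_difference(sarah_profile, cmsaf_profile)
```

A positive entry in `diff` means the first database is higher at that hour. The sign can flip between midday and the last daylight hour, a pattern that yearly totals cannot show.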

Monthly differences tell us that SARAH GHI is less than CMSAF in May–November, when precipitation is at its lowest. SARAH GHI is more than CMSAF in the other months (Figure 3). Therefore, one of the models is not responding to cloud cover as well as the other. Again, the differences are not very big (average monthly difference: 3.4%).

Daily averages also indicate that SARAH GHI is less than CMSAF in May–October (cooler dry season) (Figure 4).

Looking at the frequency chart (Figure 5), CMSAF GHI is greater than SARAH for GHI between 226–426 Wh/m² and 476–550 Wh/m². That is, SARAH GHI is less than CMSAF at low-medium GHI values. It is likely that these are occurring in the dry season (from the daily and monthly graphs) and from 5:00 p.m. to 6:00 p.m. (from the hour-of-day graph, Figure 2).

The statistical tests so far indicate that there is a difference between the two GHI satellite datasets but do not give any guidance on which is preferable for the intended purpose. SARAH has lower irradiance values in the dry season, suggesting that it is possibly less representative of the true situation, but the evidence for this is weak.

The problem of model validation without measurement data has been discussed in the discipline of hydrology [28], but there are no references on this topic in the solar PV field, despite it being a very common problem. In hydrology, nearby data are used, but solar data change rapidly over short distances [29]. Therefore, the second suggestion of using values from the literature is adopted in the following investigation. Theoretical clear sky values from Meteonorm are compared to the two GHI satellite databases at Dar es Salaam for one year (Figure 6).

It may be seen that SARAH tracks the clear sky values more closely. The general trend is for SARAH GHI values to be higher than those of CMSAF. While this would indicate accuracy in a desert, Dar es Salaam is a tropical savannah, also known as a tropical wet and dry climate. Therefore, variance well below clear sky values in the wet season (November–May) is anticipated. Turning to the dry season (May–November), both GHI satellite databases occasionally exceed the clear sky value, SARAH 16% of the year and CMSAF 9% of the year.
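The share of the year in which a database exceeds the clear sky value can be computed with a few lines. This is a sketch under stated assumptions: the article does not say whether the percentages were counted over days or hours, so the function simply counts paired values, and the optional `tolerance` parameter is a hypothetical addition.

```python
def exceedance_fraction(satellite, clear_sky, tolerance=0.0):
    """Fraction of paired values where satellite GHI exceeds clear sky GHI.

    Pairs with a clear sky value of zero (night) are ignored.
    `tolerance` is a relative margin, e.g. 0.02 ignores excesses below 2%.
    """
    if len(satellite) != len(clear_sky):
        raise ValueError("series must be the same length")
    pairs = [(s, c) for s, c in zip(satellite, clear_sky) if c > 0]
    if not pairs:
        raise ValueError("no daylight values to compare")
    exceed = sum(1 for s, c in pairs if s > c * (1.0 + tolerance))
    return exceed / len(pairs)


# Invented values: two of the four daylight pairs exceed the clear sky value.
satellite = [500.0, 620.0, 300.0, 710.0, 0.0]
clear_sky = [600.0, 600.0, 600.0, 700.0, 0.0]
share = exceedance_fraction(satellite, clear_sky)
share_with_margin = exceedance_fraction(satellite, clear_sky, tolerance=0.02)
```

Because a theoretical clear sky value is close to an upper bound for GHI at a site, a high exceedance share is a warning sign that can be checked without any ground data.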

The foregoing comparison with theoretical values again indicates that there is a difference between the two GHI satellite datasets. There is weak evidence to suggest that SARAH is less accurate.

To conclude this section, it may be deduced that some idea of which database better relates to reality may be obtained by comparing them to local climate descriptions and seasonal behaviour. Comparison with clear sky values is another alternative. In both cases, any inference reached is somewhat arguable, and there appears to be a strong need for validation with ground measurements.

4.2. Accuracy of the GHI Satellite Databases Determined by Ground Validation

The following four sections describe the comparison of GHI satellite databases to ground-based measurements. To commence, the measure of differences between the ground-based data measurements and the four satellite-derived databases under investigation was determined by calculating the normalised Root Mean Square Error (nRMSE), normalised by the mean of inputs.
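For reference, nRMSE can be computed as in the sketch below. The text says the error is normalised "by the mean of inputs"; the code assumes this means the mean of the ground measurements, which is a common convention but is not confirmed by the article. The sample values are invented.

```python
import math


def nrmse(ground, satellite):
    """RMSE of satellite against ground, divided by the mean ground value."""
    if len(ground) != len(satellite) or not ground:
        raise ValueError("series must be non-empty and equal in length")
    squared = [(s - g) ** 2 for g, s in zip(ground, satellite)]
    rmse = math.sqrt(sum(squared) / len(squared))
    return rmse / (sum(ground) / len(ground))


# Invented paired values (Wh/m²): the errors are +10, -10 and +30.
ground = [100.0, 200.0, 300.0]
satellite = [110.0, 190.0, 330.0]
value = nrmse(ground, satellite)  # about 0.096, i.e. roughly 9.6%
```

Squaring before averaging means the single 30 Wh/m² error dominates the result, so nRMSE penalises occasional large misses more heavily than frequent small ones.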

4.2.1. nRMSE of the Four GHI Satellite Databases

Looking at the highest temporal resolution data first, the only possible comparison was between CAMS and McClear (the clear sky model) because these are the only databases (of those investigated) for which one-minute interval data are available. The average value per minute was calculated for this purpose from the ground-based data. As would be expected, CAMS performs better than the clear sky model at this level, because it accounts for cloud fields, although both deliver a suitable nRMSE (Table 3), considering the time interval.

Moving up to five-minute interval data, CAMS and Solcast data were compared. The CAMS values and ground-based measurements were calculated as the average of the period. The Solcast data were downloaded directly at this temporal resolution. It may be seen from Table 3 that Solcast is more accurate than CAMS at this timing. The same may be observed for 15-min interval data, which are directly available at this resolution from both CAMS and Solcast (Table 3). Solcast has almost the same value of nRMSE for 15-min data as for 5-min data, whereas CAMS has a different value. This suggests that the method of aggregation is having an impact.

Juxtaposition of more GHI satellite databases and another Song site (Echareria) was possible for hourly data, because of greater data availability at this resolution. The results are illustrated in Figure 7. The ground measurements were averaged to 60-min intervals, but all the other datasets were available ready-prepared at this granularity. It is evident that the SARAH database performs poorly, being no better (Echareria) or worse (Lemolo) than the clear sky model. At Lemolo, next best is Solcast, with CMSAF and CAMS being the most accurate, with little between them (within the range of pyranometer uncertainty). At Echareria, CMSAF is third best, Solcast second and CAMS slightly outperforms Solcast, to give overall best accuracy. nRMSE values are lower for all databases at Lemolo due to its semi-arid climate. CAMS would be anticipated to deliver good results because it is a modern model. However, outperforming Solcast is surprising, because CAMS has lower temporal resolution input data. The small improvement in nRMSE must be due to the higher spatial resolution of CAMS input data.

The raw numbers upon which Figure 7 is based are given in Appendix B. Figure 7 is based on 2016 data only because this is the last year for which SARAH and CMSAF are currently available. However, the same pattern is observable between Solcast and CAMS if 2017 data are included to take advantage of the remaining ground measurements (Appendix B, Table A4). Comparison of the same hours for both Song sites also gives the same order of performance (Appendix B, Table A4).

Normalised mean bias error (nMBE) values for the two sites additionally reveal virtually the same pattern of accuracy between databases (Table 4). Positive nMBE results demonstrate (on average) under-estimation in all cases. All nMBEs are low due to cancellation (mitigation of positive and negative values).
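The cancellation effect is easy to reproduce. In the sketch below, the sign convention (ground minus satellite, so that a positive value indicates under-estimation) is an assumption chosen to match the statement above, and the normalised mean absolute error is included only as a contrast; it is not a metric reported in the article. All values are invented.

```python
def nmbe(ground, satellite):
    """Mean of (ground - satellite), divided by the mean ground value.

    Assumed sign convention: positive means the satellite value is,
    on average, below the ground measurement.
    """
    if len(ground) != len(satellite) or not ground:
        raise ValueError("series must be non-empty and equal in length")
    bias = sum(g - s for g, s in zip(ground, satellite)) / len(ground)
    return bias / (sum(ground) / len(ground))


def nmae(ground, satellite):
    """Mean absolute error divided by the mean ground value (contrast only)."""
    if len(ground) != len(satellite) or not ground:
        raise ValueError("series must be non-empty and equal in length")
    mae = sum(abs(g - s) for g, s in zip(ground, satellite)) / len(ground)
    return mae / (sum(ground) / len(ground))


# Invented values: errors of -60, +60, -50 and +50 Wh/m² cancel exactly.
ground = [200.0, 400.0, 600.0, 400.0]
satellite = [260.0, 340.0, 650.0, 350.0]
bias = nmbe(ground, satellite)
typical_error = nmae(ground, satellite)
```

Here the bias is zero although every individual value is wrong by 50 to 60 Wh/m², which is why a low nMBE on its own says little about hour-by-hour accuracy.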

Moving on to daily data granularity allowed the inclusion of two more Kenyan locations, Galu and Munje (Upgro project). Figure 8 shows that SARAH fares the worst at this interval at Lemolo and Echareria. There is little to choose between the other databases at Lemolo and Echareria. Having said that, Solcast performs well at this timescale, being best at three of four sites. Note that this graph was based on 235 days of 2016 data only, because this was all that matched in the GHI satellite dataset and the ground-based measurement records from the logger at Lemolo, Echareria and Galu. Thus, observations may be subject to anomaly caused by the low quantity of data. (Only 2017 data were available for Munje (182 days), obviating the use of SARAH and CMSAF data.) The raw numbers upon which Figure 8 is based are given in Appendix B (Table A5). Galu and Munje are at sea level, whereas Lemolo and Echareria are situated at 1961 m and 1594 m, respectively.

Thus, it appears that accuracy of satellite-derived databases is dependent on climate, temporal resolution, height above sea level and method of deriving one-minute, five-minute and hourly data from the original fifteen-minute satellite interval.

4.2.2. Instantaneous Accuracy of the GHI Satellite Databases

The data in GHI satellite databases are generally taken as representative of the whole time period of their resolution, e.g., 15 min. However, satellite images are taken at an instant in time and, in fact, only reflect that instant. Therefore, a further comparison was made between the satellite values and the ground-based one-second value closest to the end time of those values, rather than with the average of ground-based readings for the whole period, as detailed above. The end time of the satellite 15-min interval was used as the best compromise. In fact, the satellite image may be taken any time in the 15-min interval, but for prepared GHI values, this time is not stored.
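Matching each satellite timestamp to the nearest one-second ground reading can be sketched with a binary search. The function name, the 60-second `max_gap` cut-off and the sample readings are assumptions made for this illustration; the logger times must be sorted in ascending order.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_reading(times, values, target, max_gap=timedelta(seconds=60)):
    """Return the value whose timestamp is closest to target.

    `times` must be sorted. Returns None when the closest reading is
    further than `max_gap` from the target (an assumed cut-off).
    """
    i = bisect_left(times, target)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(times[j] - target))
    if abs(times[best] - target) > max_gap:
        return None
    return values[best]


# Seven invented one-second readings ending at 10:15:00, as the logger records.
first = datetime(2016, 7, 1, 10, 14, 54)
times = [first + timedelta(seconds=k) for k in range(7)]
values = [500.0 + k for k in range(7)]

at_interval_end = nearest_reading(times, values, datetime(2016, 7, 1, 10, 15, 0))
no_match = nearest_reading(times, values, datetime(2016, 7, 1, 10, 30, 0))
```

Returning `None` when no reading is close enough keeps gaps in the logger record from being silently paired with a value taken many minutes away.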

The results of this analysis are given in Table 5 below. At the hourly resolution, there is little to choose between databases, except for SARAH. Solcast outperforms CAMS for 15-min data.

Although the foregoing discussion demonstrates that selection of the most accurate satellite model is not clear-cut, depending on location, resolution and method of ascertaining accuracy, the CAMS model would seem to be a good choice for most Kenyan sites. It is free to download and current.

4.2.3. Managing Changing Uncertainties and Preserving the Temporal Pattern of the Data

The nRMSE and nMBE measures employed above utilise the sum of squared residuals, which assumes that the size of the error term does not differ across values. This does not hold true for the GHI satellite databases under investigation, as is obvious from the frequency charts (see later). Additionally, these methods consider each data value at each time separately. They lose any pattern which may exist between previous and subsequent values. A performance metric capable of respecting the relationship between data points is the Pearson Product-Moment Correlation Coefficient (PMCC) [30].

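PMCC fits a trendline through a scatterplot of two data variables. Its value, r, is an indication of how well the data match the line of best fit. In general, r ranges from −1 (a perfect inverse relationship) through 0 (no linear relationship between the two datasets) to 1 (a perfect positive relationship) [31]; for satellite and ground GHI, the values of interest lie between 0 and 1.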
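A plain implementation of the coefficient is given below as a sketch; the function name and sample values are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Invented hourly values (Wh/m²).
ground = [100.0, 200.0, 300.0, 400.0]
satellite = [120.0, 210.0, 330.0, 390.0]
r = pearson_r(ground, satellite)

# A series that is 20% too high everywhere still correlates perfectly.
r_biased = pearson_r(ground, [1.2 * g for g in ground])
```

The second call illustrates a limitation worth keeping in mind: r measures how well the two series move together, not whether they agree in magnitude, so it complements nRMSE and nMBE rather than replacing them.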

The PMCC values for hourly GHI data for Lemolo and Echareria are shown in Table 6.

According to this metric, CAMS and CMSAF jointly have the best accuracy at Lemolo, followed by Solcast, with SARAH last. (That is, the same as the nRMSE ranking). At Echareria, CAMS is best, followed by Solcast, then CMSAF, with SARAH coming last—again, the same as nRMSE comparison.

4.2.4. Statistics for the Ground-Based Measurements and GHI Satellite Databases

Having determined the accuracy of the four GHI satellite databases under investigation in this research, the effect on solar irradiation values is explored. Table 7 details the findings. For Lemolo, SARAH is closest in terms of overall solar radiation sum and hourly average to the logger measurements, due to its smaller standard deviation. Compensating errors are occurring more frequently than for the other GHI satellite databases. CMSAF and CAMS are remarkably different, considering their similar nRMSE values. All the databases tend to over-estimate, using this measure, with CMSAF being the worst at this. However, they perform this over-estimation in just one-third of daylight hours, under-estimating for most of the time. Analysis of values for Echareria generates a contrasting set of observations (Table 7). Solcast has the most accurate overall solar radiation sum and hourly average with the smallest standard deviation, and SARAH has the greatest mismatch at this site. Again, all the databases tend to under-estimate in most hours.
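The apparent contradiction between an over-estimated total and under-estimation in most hours can be checked with a short summary function. This is a sketch with invented numbers; the dictionary keys are hypothetical names.

```python
def over_under_summary(ground, satellite):
    """Share of daylight hours over- and under-estimated, plus net sum difference."""
    pairs = [(g, s) for g, s in zip(ground, satellite) if g > 0]
    if not pairs:
        raise ValueError("no daylight values to compare")
    over = sum(1 for g, s in pairs if s > g)
    under = sum(1 for g, s in pairs if s < g)
    total_ground = sum(g for g, _ in pairs)
    total_satellite = sum(s for _, s in pairs)
    return {
        "share_over": over / len(pairs),
        "share_under": under / len(pairs),
        "sum_difference_pct": 100.0 * (total_satellite - total_ground) / total_ground,
    }


# Invented day: five small under-estimates and one large over-estimate.
ground = [100.0, 300.0, 500.0, 700.0, 500.0, 300.0]
satellite = [90.0, 290.0, 480.0, 900.0, 490.0, 290.0]
summary = over_under_summary(ground, satellite)
```

In this made-up day the satellite series is too low in five hours out of six, yet its total is about 5.8% too high because of one large over-estimate. This is the kind of compensating error that sums and averages conceal.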

Thus, the accuracy and usefulness of GHI satellite database appear to vary from location to location in the same African country, despite the same quality control procedures being applied. Suitable performance at one site cannot be taken as a guide for the country as a whole.

The data are now decomposed for closer examination. In the case of percentage difference of satellite value to logger per hour, Solcast shows the greatest similarity at Lemolo. The other databases cluster closely together, further away from the logger and Solcast. Figure 9 demonstrates this observation in the form of trendlines. The busy data series plots are hidden for clarity. At Echareria, CMSAF and CAMS are jointly closest to the logger, with the trendlines of Solcast and SARAH being at greater distances (Appendix B, Figure A1).

Looking at the average hourly GHI value for each discrete daylight hour from each data source for Lemolo (Figure 10), compared to the logger, all datasets over-estimate, except SARAH, which tracks the logger closely at midday. (Note: this observation is not the consequence of incorrect time stamps. This has been tested and all databases aligned to the nearest hour (nn:00): 60 min CAMS reports at nn:00, as does Solcast (PVSyst version), SARAH at nn:06 and CMSAF at nn:51).

In the case of Echareria, all databases over-estimate, noticeably at noon. Solcast tracks the logger the closest (Appendix B, Figure A2).

However, looked at in terms of percentage differences, all the GHI satellite databases are only around 10% different from the ground data value in the early afternoon hours at Lemolo, i.e., the most productive hours for PV (Figure 11), although using this measure, CAMS is frequently most accurate. This suggests that any of them may function well for the purpose of PV performance simulation. All databases also have similar comparative hourly differences to logger readings at Echareria (Appendix B, Figure A3).

On a daily basis, SARAH has the most similar values to the ground measurement at Lemolo on average (Table 8). (Solcast has the nearest value at Echareria). At Lemolo, SARAH under-estimates in summer and over-estimates in winter (Appendix B, Figure A4). The other databases are only inclined to this trend to a minimal degree (Figure 12). The average values in Table 8 hide the observation that Solcast really has the closest daily values to the ground measurements, with CAMS and CMSAF also performing well, and SARAH less so (Figure 12).

A study of the frequency distribution of GHI at Lemolo shows that SARAH has too many low values as compared to the logger, and CAMS has too few. All the databases mirror the logger reasonably well between 100 Wh/m² and 1000 Wh/m². SARAH tracks it the best, then Solcast and CMSAF, with CAMS coming last. All have too many very high values (Figure 13). There are also too many high values in all GHI satellite databases at Echareria (Appendix B, Figure A5).
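A frequency distribution of this type can be built by counting values per GHI bin. The 100 Wh/m² bin width and 1200 Wh/m² upper limit below are assumptions for illustration; the article does not state the bins used for its figures, and the sample values are invented.

```python
def ghi_histogram(values, bin_width=100.0, upper=1200.0):
    """Count daylight GHI values per bin; values above `upper` go in the last bin."""
    n_bins = int(upper // bin_width)
    counts = [0] * n_bins
    for v in values:
        if v <= 0:
            continue  # skip night-time values
        idx = min(int(v // bin_width), n_bins - 1)
        counts[idx] += 1
    return counts


def bin_differences(reference_counts, other_counts):
    """Per-bin surplus (+) or deficit (-) of the other series versus the reference."""
    return [o - r for r, o in zip(reference_counts, other_counts)]


# Invented hourly values (Wh/m²).
logger = [50.0, 150.0, 250.0, 950.0, 1050.0]
satellite = [150.0, 160.0, 260.0, 1050.0, 1100.0]

logger_counts = ghi_histogram(logger)
satellite_counts = ghi_histogram(satellite)
surplus = bin_differences(logger_counts, satellite_counts)
```

Positive entries at the top of `surplus` correspond to "too many very high values", and negative entries at the bottom to "too few low values".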

Study of further statistics does not clarify the issue of choice of satellite GHI database to any extent. CAMS has some of the best nRMSE and nMBE values but SARAH has the most realistic frequency distribution. SARAH can have either the best or worst accuracy in respect of sums and averages depending on location. Solcast has trendlines of percentage difference to logger closest to zero (i.e., best match) for hourly and daily values at both Lemolo and Echareria. Solcast also does well on a daily basis. All databases are likely to overestimate in the middle of the day by around 10% when most PV production occurs, so in this regard, they are evenly matched.

5. Discussion

Photovoltaic system designers, including those with electric cooking loads, need to consider the accuracy of solar databases and not take them verbatim. For example, it might be prudent to consider a 10% buffer in the design of a photovoltaic domestic cooking system to account for the variation and uncertainty in the results. Whilst in the past, this might have added to system costs, the rapid fall in the price of photovoltaic modules means that this should not be an additional economic burden and has the potential to ensure greater end-user service satisfaction. However, further investigation is needed with more ground stations in other parts of Africa.

Which database performs best is site-dependent. In the case of all the databases, it is a matter of how well equal and opposite errors balance, rather than few errors. Some have many small errors, others have fewer large ones, and this varies from location to location, season to season and time of day. Modelled databases exhibit long-range power law correlations, whereas the trend for measured values is white noise behaviour [32].

Looking at the resolution of the atmospheric inputs (Table 1), it would be expected that SARAH would have the worst performance. A review of the different analytical tests and four Kenyan sites investigated here shows that it sometimes does, but not always. Occasionally, it can exhibit the best accuracy of the GHI satellite databases examined. From the inputs table (Table 1), Solcast should be best. It is at Echareria, but not at Lemolo. The accuracy of the GHI satellite databases is thus influenced in a complex way by the resolution of atmospheric inputs, the performance of the clear sky model utilised and the use of a cloud properties model. It depends on how well the clear sky model performs in the particular climate zone (or for the percentage of clear, almost clear, partly clear and totally cloudy days) at the site of interest. That is, it is site-dependent. In addition, it has been found that some clear sky models are more sensitive to uncertainties in inputs than others [33].

To clarify the findings of this paper, Appendix B, Table A6 summarises the accuracy ranking of each database in relation to the other three for each performance metric, temporal resolution and site. Looking at the hourly and daily data at Lemolo, SARAH achieves the best accuracy most often, but it also achieves the worst accuracy most often. At Echareria, Solcast has the highest number of best scores, with SARAH having the highest number of worst scores. Taking both sites together, CAMS and Solcast jointly perform the best and SARAH the worst. An alternative is to calculate the average rank (Appendix B, Table A7), where the lowest score is best. Looking at hourly data only, CAMS is best at both Lemolo and Echareria. Taking both sites together, CAMS achieves the highest accuracy, followed by Solcast in second place, then CMSAF and, lastly, SARAH.

6. Conclusions

It is not feasible to verify a satellite-derived GHI model without ground measurements, although an informed guess as to which model is likely to perform satisfactorily may be made via comparison with knowledge of local seasons and climate and/or clear sky data.

The comparative accuracy, and therefore the selection, of satellite-derived GHI databases has been shown to be site-dependent. Therefore, the datasets which by chance have been fitted to ground measurements close to the site of interest, or datasets which employ data from greater numbers of ground stations in their construction, are likely to display superior performance. Performance is evidently influenced by climate and height above sea level, although the role these factors play is not clear from the analysis carried out here. Temporal resolution and the methods of deriving 1-min, 5-min and hourly data from the original 15-min satellite interval also play a part. Additional factors are the compatibility of the clear sky model with the climate zone of the site of interest, the spatial and temporal resolution of the clear sky model input parameters, the susceptibility of the clear sky model to imperfections in its inputs, and the inclusion of a cloud properties model.

Based on the preceding analysis, the CAMS model, as a publicly available, up-to-date and fairly accurate resource, appears to be an appropriate option for PV simulation in East Africa. However, in general, all the databases deliver figures within around 10% of the ground measurement values in the middle of the day (little more than pyranometer uncertainty). This contrasts with findings in the U.K., where one database clearly outranked the others [34]. Whether this level of accuracy is sufficient to model the provision of energy for pressure cookers, hotplates, etc., has yet to be determined and will be addressed in a subsequent publication. A simple investigation, using the PVGIS Performance of Off-Grid PV Systems tool to model the type of solar cooking system usage envisaged by the MECS project (300 W solar panel, battery 24 V/75 Ah, 1.0 kWh daily consumption for two meals) at Lemolo, revealed some difference between databases. SARAH models 25 days per year with an empty battery, and 283 days with a full one. CMSAF predicts 11 days per year with an empty battery, and 304 days with a full one. SARAH generally anticipated a lower state of battery charge, suggesting longer cooking times.
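As a rough cross-check on the system size quoted above, the nominal energy figures can be worked out directly. This is a minimal sketch using only the numbers given in the text; it ignores depth-of-discharge limits, conversion losses and temperature effects, all of which the PVGIS off-grid tool accounts for in more detail, and the variable names are illustrative.

```python
# Nominal figures quoted in the text for the MECS-style cooking system.
panel_power_w = 300.0
battery_voltage_v = 24.0
battery_capacity_ah = 75.0
daily_consumption_kwh = 1.0

# Nominal battery energy: volts x amp-hours, converted from Wh to kWh.
battery_energy_kwh = battery_voltage_v * battery_capacity_ah / 1000.0

# Days of cooking a full battery could cover with no solar input (idealised).
autonomy_days = battery_energy_kwh / daily_consumption_kwh

# Equivalent hours at rated panel power needed to supply one day's cooking.
full_power_hours = daily_consumption_kwh * 1000.0 / panel_power_w

print(f"Battery energy: {battery_energy_kwh:.2f} kWh")
print(f"Idealised autonomy: {autonomy_days:.2f} days")
print(f"Rated-power hours per day: {full_power_hours:.2f} h")
```

Under these idealised assumptions the battery holds a little under two days of cooking energy, which is why a modest difference in modelled irradiation between databases can change the number of empty-battery days.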

The impression of accuracy is determined by which mathematical measure is employed. Solar radiation publications commonly use average hourly/daily nRMSE or nMBE, but these aggregate values can mask trends. Equal and opposite errors may counterbalance. Hourly trendlines of percentage differences and nRMSE values rank the GHI satellite databases in the same order of accuracy at Echareria (Appendix B, Figure A1 and Figure 7), but not at Lemolo (Figure 7; Figure 9).
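The counterbalancing of equal and opposite errors can be illustrated with a minimal sketch. It assumes the common definitions in which the mean bias error and root mean square error are normalised by the mean of the measured values; the function names and the sample values are hypothetical and are not taken from the data used in this study.

```python
import math

def nmbe(modelled, measured):
    """Normalised mean bias error in %, relative to the mean measured value."""
    n = len(measured)
    bias = sum(m - o for m, o in zip(modelled, measured)) / n
    return 100.0 * bias / (sum(measured) / n)

def nrmse(modelled, measured):
    """Normalised root mean square error in %, relative to the mean measured value."""
    n = len(measured)
    mse = sum((m - o) ** 2 for m, o in zip(modelled, measured)) / n
    return 100.0 * math.sqrt(mse) / (sum(measured) / n)

# Hypothetical hourly GHI values in Wh/m2 (illustrative only).
logger = [200.0, 400.0, 600.0, 400.0]
# Satellite errors of +100, -100, +100, -100 Wh/m2: equal and opposite.
satellite = [300.0, 300.0, 700.0, 300.0]

bias_pct = nmbe(satellite, logger)      # signed errors cancel
spread_pct = nrmse(satellite, logger)   # squared errors do not cancel
print(f"nMBE = {bias_pct:.1f} %, nRMSE = {spread_pct:.1f} %")
```

In this constructed case the bias is zero even though every hourly value is wrong by 100 Wh/m², which is the situation described above: a good aggregate score need not imply good agreement at a specific time.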

Although making up the difference over the long term is acceptable for calculating the profitability of a solar farm, it is of little use when investigating whether an individual solar panel can power a cooking device at a specific time. If a battery is used in the cooking system, daily data become applicable. Viewed on an hour of day (Figure 11) or daily basis (Figure 8), none of the satellite-derived GHI databases substantially outperforms the others. In any one hour, one will be more accurate than the others, but there is no consistency as to which one this is.

The location dependence of GHI databases’ accuracy means that superior precision at one site (or a small number of sites) cannot be taken as a guide for East Africa or one country there as a whole. Ground-based measurements (e.g., for one year) are necessary to select the more accurate GHI satellite database at each specific location. It is hoped the next steps in the MECS project will include setting up a ground station in the region.

If it were known which clear sky model performs best in each climate, it would be possible to select a satellite GHI database appropriately. This would overcome the problem that GHI satellite databases are used where there are no ground measurements, but ground measurements are needed to choose the most accurate GHI satellite database. Only one study has investigated this [35]. However, in that study the 29 Köppen zones are simplified into five. This is too coarse: Kenya is classed as arid, for which REST2 is reported as the best clear sky model, but this has not been found to be so at all sites in this study.

Our specific recommendations are as follows. (1) The CAMS model generally gives suitable results in East Africa; (2) nevertheless, some leeway (e.g., 10%) should be allowed for variation and uncertainty in the results. (3) PMCC and other trendline analyses which maintain the relationship between successive values have proved to be helpful in interpreting solar time series.
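To illustrate why a measure such as PMCC, which keeps successive values paired in time, adds information beyond sums and averages, the sketch below computes the Pearson coefficient for a hypothetical daily profile and a copy shifted by one hour. The values and the function name are illustrative assumptions, not data from this study.

```python
import math

def pmcc(x, y):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical hourly GHI profile in Wh/m2 (illustrative only).
logger = [0.0, 200.0, 600.0, 800.0, 600.0, 200.0]
# Same values arriving one hour early: identical sum and mean.
shifted = [200.0, 600.0, 800.0, 600.0, 200.0, 0.0]

same_total = sum(logger) == sum(shifted)
r = pmcc(logger, shifted)
print(f"Totals equal: {same_total}, PMCC = {r:.2f}")
```

The two series have the same total, so a comparison of sums or averages would report a perfect match, whereas the correlation coefficient exposes the timing mismatch.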

Finally, the goal of this research was to establish which satellite-derived solar irradiance dataset is the most suitable for simulating PV yield in East Africa. The initial findings presented here suggest that all four databases reviewed are suitable for this task. Future work will include comparing modelled PV output based on GHI satellite datasets to actual output. This may further enhance our understanding of suitability.

Author Contributions

Conceptualisation, methodology, validation, formal analysis, investigation, data curation, writing—original draft preparation, D.P.; resources, writing—review and editing, supervision, project administration, funding acquisition, R.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the UK Foreign, Commonwealth and Development Office grant IATI Identifier: GB-GOV-1-300123: Modernising Energy Cooking Services, and through the Innovate UK Energy Catalyst Round 6: Project Number 10528, Productive Use of DC Solar Power in Africa to Improve Quality of Rural Life. The APC was also funded by these two sources.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets related to this article can be found as follows: PVGIS-SARAH at https://re.jrc.ec.europa.eu/pvg_tools/en/tools.html (accessed on 26 October 2021), an open-source online data repository hosted at the Joint Research Centre EU Science Hub [17]; PVGIS-CMSAF at https://re.jrc.ec.europa.eu/pvg_tools/en/tools.html (accessed on 26 October 2021), an open-source online data repository hosted at the Joint Research Centre EU Science Hub [17]; CAMS at http://www.soda-pro.com/web-services/radiation/cams-radiation-service (accessed on 26 October 2021), an open-source online data repository hosted at the research center O.I.E. of Mines ParisTech (Center Observation, Impact, Energy) [21]; Solcast at https://solcast.com/ (accessed on 26 October 2021), a commercial data supplier with support for researchers [23]; McClear clear sky model at http://www.soda-pro.com/web-services/radiation/cams-mcclear (accessed on 26 October 2021), an open-source online data repository hosted at the research center O.I.E. of Mines ParisTech (Center Observation, Impact, Energy) [21]; ground-based Kenyan data at Solar Nano Grids (SoNG) [26], a research partnership (data available on contact); ground-based Kenyan data at Gro for GooD: Groundwater Risk Management for Growth and Development [27], downloaded from https://metadata.bgs.ac.uk/geonetwork/srv/eng/catalog.search#/metadata/5cfd5112-e0c0-41cb-e054-002128a47908 (accessed on 26 October 2021).

Conflicts of Interest

The authors declare no conflict of interest.

Glossary

Abbreviation	Definition
BSRN	Baseline Surface Radiation Network
CAMS	Copernicus Atmosphere Monitoring Service
CMSAF	Climate Monitoring Satellite Application Facility
ETR	Extraterrestrial irradiation
EUMETSAT	European Organisation for the Exploitation of Meteorological Satellites
GHI	Global horizontal irradiation
h	Solar Elevation
JRC	Joint Research Centre
MECS	Modern Energy Cooking Services Programme
nMBE	Normalised Mean Bias Error
MERRA-2	Modern Era Retrospective Analysis for Research and Applications, Version 2
MVIRI	Meteosat Visible Infra-Red Imager
nRMSE	Normalised Root Mean Square Error
PMCC	Pearson Product-Moment Correlation Coefficient
PV	Photovoltaic
PVGIS	Photovoltaic Geographical Information System
QC	Quality Control
SARAH	Surface Solar Radiation Dataset-Heliosat
SEVIRI	Spinning Enhanced Visible and Infrared Imager
SoNG	Solar Nano Grids
SPECMAGIC	SPECtral Mesoscale Atmospheric Global Irradiance Code
SRTM	Shuttle Radar Topography Mission
WRMC	World Radiation Monitoring Centre

Appendix A

Quality Control of Lemolo and Echareria Data

Data quality control checks summarised in Table A1 were applied. The chosen tests were selected from Journée and Bertrand (2011) [35] and Laitia et al. (2014) [36]. These are based on guidance from Baseline Surface Radiation Network (BSRN) from the World Radiation Monitoring Centre (WRMC). The procedures were chosen with regard to availability of data (i.e., no beam or diffuse irradiation data were available).

Table A1. Quality criteria of GHI data used in temporal drift tests (TD), physical threshold tests (PT), step tests (S), persistence tests (P) and spatial consistency tests (SC).

Type of Test	Test Name	Test Description	Quality Criteria
TD	Temporal Drift	Clock drift detection	i. Comparative hourly plots between datasets
TD	Temporal Drift	Clock drift detection	ii. Comparative hourly plots between datasets and clear sky values ¹.
PT	Upper Limit	Upper bound when comparing surface solar radiation data against the extraterrestrial solar radiation ².	GHI/ETR < 1 if h > 2°
PT	Upper Clear sky Limit	Upper bound when comparing surface solar radiation data against the clear sky solar radiation ¹.	GHI/Clear sky irradiance <= 1.1 if h > 2°
PT	Lower Limit	Lower bound for heavily overcast conditions with low atmospheric transparency.	GHI ≥ 0.03 × ETR
PT	Clear sky hours	Number of clear sky hours ³.
PT	Daily Lower Limit	Lower bounds for GHI in heavily overcast conditions with low atmospheric transparency. The daily mean µ is calculated from data when the sun is above the horizon (daylight hours).	µ (GHI/ETR) ≥ 0.03
S	Step	Plausible rate of change between two successive timestamps.	$\left(\frac{GHI(t)}{ETR(t)} - \frac{GHI(t-1)}{ETR(t-1)}\right) < 0.75$ if h > 2°
S	Shadow	Shadow contamination: rapid drop of values followed by sudden increase.	$\left(\frac{GHI(t)}{ETR(t)} - \frac{GHI(t-1)}{ETR(t-1)}\right) > 0.1$ if h > 2°
P	Persistence	Check for variability of measurements/sensor failure. The daily mean µ and standard deviation σ are calculated from data when the sun is above the horizon (daylight hours).	$\frac{1}{8} \cdot \mu\left(\frac{GHI}{ETR}\right) \leq \sigma\left(\frac{GHI}{ETR}\right) \leq 0.35$
SC	Spatial Consistency/Sum	Comparison of the sum of GHI for 990 h in the period under review when both weather instruments report data.
SC	Completeness of data	Percentage of hours in the measurement period for which data exists.

¹ The McClear clear sky model was used because of its easy accessibility (download from http://www.soda-pro.com/web-services/radiation/cams-mcclear (accessed on 26 October 2021)). It is a physical model and employs a look-up table on satellite-derived aerosols, water vapour and ozone data.
² Extraterrestrial irradiation and solar elevation angle were obtained from the solaR package in R software (Perpiñán 2012).
³ Clear sky periods were identified from the simple model of Collares-Pereira and Rabl (1979) (GHI/ETR > 0.6) due to lack of measured diffuse irradiance.
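The per-timestamp threshold checks in Table A1 can be sketched as follows. This is an illustrative implementation, not the code used in this study: the function name `qc_hour` and its arguments are hypothetical, only the upper limit, lower limit and step criteria and the GHI/ETR > 0.6 clear sky rule are covered, and the step criterion is applied as a signed difference exactly as written in the table.

```python
def qc_hour(ghi, etr, elevation_deg, prev_ghi=None, prev_etr=None):
    """Apply a subset of the Table A1 checks to one timestamp.

    ghi, etr      : measured GHI and extraterrestrial irradiation (same units)
    elevation_deg : solar elevation h in degrees
    prev_ghi/etr  : values at the previous timestamp, if available
    Returns a dict of booleans; True means the criterion is satisfied.
    """
    # Lower limit: GHI >= 0.03 x ETR.
    result = {"lower_limit": ghi >= 0.03 * etr}
    # Ratio-based checks only apply when the sun is more than 2 degrees up.
    if elevation_deg > 2.0 and etr > 0.0:
        kt = ghi / etr
        result["upper_limit"] = kt < 1.0      # GHI/ETR < 1
        result["clear_sky"] = kt > 0.6        # clear sky hour classification
        if prev_ghi is not None and prev_etr:
            # Step test: change in GHI/ETR between successive timestamps.
            result["step"] = (kt - prev_ghi / prev_etr) < 0.75
    return result

# Hypothetical values in Wh/m2 (illustrative only).
good = qc_hour(ghi=700.0, etr=1000.0, elevation_deg=45.0,
               prev_ghi=600.0, prev_etr=950.0)
bad = qc_hour(ghi=1100.0, etr=1000.0, elevation_deg=45.0)
print(good)
print(bad)
```

The daily tests (daily lower limit and persistence) would need the daylight mean and standard deviation of GHI/ETR for each day and are omitted here for brevity.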

For the temporal drift test, there is no evidence of incorrect timestamp values. The results of most of the other tests are presented in Table A2. Mostly, these are very good, with only a few errors around sunrise and sunset.

Table A2. Percentage of hours containing data which failed QC limit, step, shadow and persistence tests for Lemolo and Echareria.

Test Name	Lemolo B	Echareria
Upper Limit	22	15
Upper Clear sky Limit	0.9	7
Lower Limit	9	9
Daily Lower Limit	0	4
Step	0.1	0
Shadow	2.7	7
Persistence	0	0

Proceeding to the spatial consistency test, both villages achieved similar values, allowing for the difference in climate. There is a high level of completeness of data for Lemolo on an hourly basis. Echareria data are fragmentary but available.

Table A3. Results of spatial consistency tests.

Test Name	Lemolo B	Echareria
Percentage of Clearsky hours in test period	54	50
Average GHI of clear sky hours Wh/m²	700	500
Sum of GHI for 990 h when both data loggers report data kWh/m²	237	219
Percentage completeness of data 2016–2017	93	11
Percentage completeness of data 2017 only	100	6

In general, the results of the tests indicate that Lemolo B and Echareria data loggers have produced data of good quality. There are few outliers, little shading and nothing to suggest instrument failure.

On the other hand, the loggers at Lemolo and Echareria may or may not be absolutely vertical. However, there is no way of obtaining any further information. The measurement is in millivolts, so a tiny difference will have a large impact at low values, i.e., morning and evening hours.

Appendix B

Table A4. nRMSE of satellite models and clear sky models at Lemolo and Echareria, 60-min interval data.

HOURLY	No. Values Lemolo	nRMSE % Lemolo	No. Values Echareria	nRMSE % Echareria	nRMSE % Lemolo 715 Values
PVGIS-SARAH 2016	3489	82.3	715	80.8	65.7
Solcast 2016	3489	37.5	715	57.8	50.2
PVGIS_CMSAF 2016	3489	34	715	68.6	35.4
CAMS 2016	3489	31.1	715	53.6	33.3
Solcast 2017	8102	38.6	386	68.5
Solcast 2016, 17	11,590	37.5	1100	61.6
CAMS 2017	8102	38.6	386	65.7
CAMS 2016, 17	11,590	36.5	1100	57.9
McClear 2016	3489	65.9	715	99.8	77.5
McClear 2017	8102	73.7	386	100.6
McClear 2016, 17	11,590	71.3	1100	100.1

Table A5. nRMSE of satellite models at four Kenyan sites, daily data.

DAILY	nRMSE % Lemolo (365 Days)	nRMSE % Echareria (365 Days)	nRMSE % Galu (235 Days)	nRMSE % Munje (182 Days)
PVGIS-SARAH 2016	22.3	26.9	22.9
PVGIS_CMSAF 2016	10.5	12.3	15.6
CAMS	10.6	15.7	24.6	52.2
Solcast	9.2	11.4	27.5	53.2

Figure A1. Trendlines of percentage difference to logger for four satellite-derived databases of average hourly GHI at Echareria.

Figure A2. Average GHI per hour of day (satellite values and ground measurements) for Echareria.

Figure A3. Absolute percentage differences in average GHI per discrete hour between four satellite-derived GHI databases and ground measurements for Echareria.

Figure A4. Average GHI per day of year (satellite values and ground measurements) for Lemolo.

Figure A5. Frequency distribution of hourly GHI (satellite values and ground measurements) for Echareria.

Table A6. Accuracy ranking of each database for each performance metric for each temporal resolution for each site.

		Lemolo					Echareria					Galu				Munje
Time Interval	Test	Best	2nd Best	3rd Best	Worst of 4	Worst of 5	Best	2nd Best	3rd Best	Worst of 4	Worst of 5	Best	2nd Best	3rd Best	Worst of 4	Best	2nd Best
One min	nRMSE	CAMS	McClear
5 min	nRMSE	Solcast	CAMS
15 min	nRMSE	Solcast	CAMS
15 min	Instant nRMSE	Solcast	CAMS
60 min	nRMSE	CAMS	CMSAF	Solcast	McClear	SARAH	CAMS	Solcast	CMSAF	SARAH	McClear
60 min	Instant nRMSE	CAMS	Solcast	CMSAF	McClear	SARAH
60 min	nMBE	CAMS	CMSAF	Solcast	SARAH		CAMS	Solcast	CMSAF	SARAH
60 min	Hourly average	SARAH	Solcast	CAMS	CMSAF		Solcast	CMSAF	CAMS	SARAH
60 min	Hourly Std Dev	SARAH	CAMS	Solcast	CMSAF		Solcast	CAMS	CMSAF	SARAH
60 min	Trend closest to Logger	Solcast	CAMS	CMSAF	SARAH		CAMS	CMSAF	Solcast	SARAH
60 min	Pearson	CAMS	CMSAF	Solcast	SARAH		CAMS	Solcast	CMSAF	SARAH
60 min	Average GHI per hour of day	SARAH	CAMS	Solcast	CMSAF		Solcast	CAMS	CMSAF	SARAH
Daily	nRMSE	Solcast	CAMS	CMSAF	SARAH		Solcast	CMSAF	CAMS	SARAH		CMSAF	SARAH	CAMS	Solcast	CAMS	Solcast
Daily	Daily average	SARAH	Solcast	CAMS	CMSAF		Solcast	CMSAF	CAMS	SARAH
Daily	Average GHI per Day of Year	Solcast	CAMS	CMSAF	SARAH
	Frequency Distribution	SARAH	Solcast	CMSAF	CAMS

Table A7. Calculation of average rank of each database across all performance metrics for hourly data.

	Lemolo					Echareria					Both
Rank	1	2	3	4	Mean	1	2	3	4	Mean	Mean
CAMS	4 × 1	3 × 2	1 × 3	0	3.25	4 × 1	2 × 2	1 × 3	0	2.75	3
CMSAF	1 × 1	3 × 2	2 × 3	3 × 4	6.25	0	2 × 2	5 × 3	0	4.75	5.5
Solcast	1 × 1	2 × 2	5 × 3	0	5	3 × 1	3 × 2	1 × 3	0	3	4
SARAH	2 × 1	0	0	5 × 4	5.5	0	0	0	7 × 4	7	6.25
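The arithmetic behind Table A7 can be reproduced with a short sketch. Judging from the tabulated values, the "Mean" column appears to be the rank-weighted count of results divided by four (the number of rank positions); this reading is inferred from the numbers rather than stated explicitly, and the function and variable names are illustrative.

```python
# Number of times each database achieved rank 1, 2, 3 and 4 on the
# hourly metrics, copied from Table A7.
counts = {
    "CAMS":    {"Lemolo": [4, 3, 1, 0], "Echareria": [4, 2, 1, 0]},
    "CMSAF":   {"Lemolo": [1, 3, 2, 3], "Echareria": [0, 2, 5, 0]},
    "Solcast": {"Lemolo": [1, 2, 5, 0], "Echareria": [3, 3, 1, 0]},
    "SARAH":   {"Lemolo": [2, 0, 0, 5], "Echareria": [0, 0, 0, 7]},
}

def rank_score(rank_counts):
    """Sum of (count x rank) divided by the number of rank positions."""
    weighted = sum(rank * n for rank, n in enumerate(rank_counts, start=1))
    return weighted / len(rank_counts)

# Score per site, then the mean of the two site scores ("Both" column).
site_scores = {db: {site: rank_score(c) for site, c in sites.items()}
               for db, sites in counts.items()}
both = {db: sum(s.values()) / len(s) for db, s in site_scores.items()}

# Lowest score is best.
ordering = sorted(both, key=both.get)
print(both)
print(ordering)
```

The resulting order (CAMS, Solcast, CMSAF, SARAH) matches the ranking given in the Discussion.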

References

1. Meyer, R. Industry Insight: On-Site Measurements for PV Projects—Is It Really Necessary? ESI Afr. 2015, 42, 1. Available online: https://www.esi-africa.com/industry-sectors/renewable-energy/industry-insight-on-site-solar-measurements-for-solar-pv-projects-is-it-really-necessary/ (accessed on 12 January 2021).
2. Solargis. How to Choose the Right Dataset for Evaluation of Solar Projects—The MASTER Approach. 2020. Available online: https://solargis.com/ebook-how-to-choose-solar-resource-data (accessed on 12 January 2021).
3. Yang, D.; Bright, J.M. Worldwide validation of 8 satellite-derived and reanalysis solar radiation products: A preliminary evaluation and overall metrics for hourly data over 27 years. Sol. Energy 2020, 210, 3–19.
4. Bilbao, J.; Roman, R.; Miguel, A. Turbidity Coefficients from normal direct solar irradiance in Central Spain. Atmos. Res. 2014, 143, 73–84. Available online: https://www.sciencedirect.com/science/article/pii/S0169809514000854?via%3Dihub (accessed on 13 October 2021).
5. Gandoman, F.H.; Abdel Aleem, S.H.E.; Omar, N.; Ahmadi, A.; Alenezi, Q. Short-term solar power forecasting considering cloud coverage and ambient temperature variation effects. Renew. Energy 2018, 123, 793–805.
6. Amillo, A.M.G.; Ntsangwane, L.; Huld, T.; Trentmann, J. Comparison of satellite-retrieved high-resolution solar radiation datasets for South Africa. J. Energy S. Afr. 2018, 29, 63–76.
7. Solemi. Available online: https://wdc.dlr.de/data_products/SERVICES/SOLARENERGY/description.php (accessed on 12 January 2021).
8. Meteonorm. Available online: Meteonorm.com (accessed on 12 January 2021).
9. Reuniwatt. Available online: https://reuniwatt.com/en/ (accessed on 12 January 2021).
10. Soda. Available online: http://www.soda-pro.com/web-services/radiation/helioclim-3-archives-for-pay (accessed on 12 January 2021).
11. SolarAnywhere. Available online: https://www.solaranywhere.com/ (accessed on 12 January 2021).
12. Solargis. Available online: Solargis.com (accessed on 12 January 2021).
13. 3E. Available online: https://www.3e.eu/data-services/solar-resource-data/ (accessed on 12 January 2021).
14. 3 Tier. Available online: https://www.3tier.com/en/support/solar-online-tools/ (accessed on 12 January 2021).
15. Efstathiou, M.; Varotsos, C.A. On the 11 year solar cycle signature in global total ozone dynamics. Meteorol. Appl. 2013, 20, 72–79.
16. Cracknell, A.P.; Varotsos, C.A. New aspects of global climate-dynamics research and remote sensing. Int. J. Remote Sens. 2011, 32, 579–600.
17. PVGIS. Available online: https://re.jrc.ec.europa.eu/pvg_tools/en/tools.html (accessed on 12 January 2021).
18. Urraca, R.; Gracia-Amillo, A.M.; Koubli, E.; Huld, T.; Trentmann, J.; Riihelä, A.; Lindfors, A.V.; Palmer, D.; Gottschalg, R.; Antonanzas-Torres, F. Extensive validation of CM SAF surface radiation products over Europe. Remote Sens. Environ. 2017, 199, 171–186.
19. SPECMAGIC. Available online: http://gnu-magic.sourceforge.net/ (accessed on 12 January 2021).
20. Qu, Z.; Oumbe, A.; Blanc, P.; Espinar, B.; Gesell, G.; Gschwind, B.; Klüser, L.; Lefèvre, M.; Saboret, L.; Schroedter-Homscheidt, M.; et al. Fast radiative transfer parameterisation for assessing the surface solar irradiance: The Heliosat-4 method. Meteorol. Z. 2017, 26, 33–57.
21. CAMS (Copernicus Atmosphere Monitoring Service). Available online: http://www.soda-pro.com/web-services/radiation/cams-radiation-service (accessed on 12 January 2021).
22. Schroedter-Homscheidt, M.; Hoyer-Klick, C.; Killius, N.; Lefevre, M.; Wald, L.; Wey, E.; Saboret, L. User’s Guide to the CAMS Radiation Service-Status December 2017. Available online: https://www.researchgate.net/publication/324542911_User%27s_Guide_to_the_CAMS_Radiation_Service_-_Status_December_2017 (accessed on 10 December 2020).
23. Solcast. Available online: https://solcast.com/ (accessed on 12 January 2021).
24. Bright, J. Solcast: Validation of a satellite-derived solar irradiance dataset. Sol. Energy 2019, 189, 435–449.
25. MERRA-2. Available online: https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2/ (accessed on 12 January 2021).
26. UK Engineering and Physical Sciences (EP/L002612/1) Research Project: Solar Nano Grids (SoNG). Available online: http://songproject.co.uk/ (accessed on 12 January 2021).
27. Gro for GooD: Groundwater Risk Management for Growth and Development. Available online: https://upgro.org/consortium/gro-for-good/ (accessed on 12 January 2021).
28. Is There a Way of Calibrating and Validating Sediment Yield Model without Observed Sediment Data. Available online: https://www.researchgate.net/post/Is_there_a_way_of_calibrating_and_validating_sediment_yield_model_without_observed_sediment_data (accessed on 26 May 2021).
29. Palmer, D.; Koubli, E.; Cole, I.; Betts, T.; Gottschalg, R. Satellite or ground-based measurements for production of site specific hourly irradiance data: Which is most accurate and where? Sol. Energy 2018, 165, 240–255.
30. Bennett, N.D.; Croke, B.F.W.; Guariso, G.; Guillaume, J.H.A.; Hamilton, S.H.; Jakeman, A.J.; Marsilli-Libelli, S.; Newham, L.T.H.; Norton, J.P.; Perrin, C.; et al. Characterising performance of environmental models. Environ. Model. Softw. 2012, 40, 1–20. Available online: https://www.researchgate.net/publication/285693132_Characterising_performance_of_environmental_models (accessed on 12 January 2021).
31. Laerd Statistics. Available online: https://statistics.laerd.com/statistical-guides/pearson-correlation-coefficient-statistical-guide.php (accessed on 12 January 2021).
32. Varotsos, C.A.; Efstathiou, M.; Cracknell, A.P. Plausible reasons for the inconsistencies between the modeled and observed temperatures in the tropical troposphere. Geophys. Res. Lett. 2013, 40, 4906–4910.
33. Polo, J.; Antonanzas-Torres, F.; Vindel, J.M.; Ramirez, L. Sensitivity of satellite-based methods for deriving solar radiation to different choice of aerosol input and models. Renew. Energy 2014, 68, 785–792. Available online: https://ideas.repec.org/a/eee/renene/v68y2014icp785-792.html (accessed on 12 January 2021).
34. Sun, X.; Bright, J.; Gueymard, C.A.; Acord, B.; Wang, P.; Engerer, N. Worldwide performance assessment of 75 global clear-sky irradiance models using Principal Component Analysis. Renew. Sustain. Energy Rev. 2019, 111, 550–570. Available online: https://www.sciencedirect.com/science/article/abs/pii/S1364032119302187 (accessed on 12 January 2021).
35. Journée, M.; Bertrand, C. Quality control of solar radiation data within the RMIB solar measurements network. Sol. Energy 2011, 85, 72–86.
36. Laitia, L.; Andreis, D.; Zottele, F.; Giovannini, L.; Panziera, L.; Toller, G.; Zardi, D. A solar atlas for the Trentino region in the Alps: Quality control of surface radiation data. Energy Procedia 2014, 59, 336–343.

Figure 1. Map of Kenya showing the locations of ground-based measurement stations in red.

Figure 2. Ten-year average GHI per hour of day for Dar es Salaam.

Figure 3. Ten-year average monthly GHI (kWh/m²) at Dar es Salaam.

Figure 4. Ten-year daily average GHI for Dar es Salaam.

Figure 5. Frequency distribution of hourly GHI satellite values (2007–2016) for Dar es Salaam.

Figure 6. Comparison of Meteonorm (Clear sky), SARAH and CMSAF hourly GHI for Dar es Salaam 2016.

Figure 7. nRMSE of Satellite models and clear sky model at Lemolo (3489 values) and Echareria (715 values), hourly data.

Figure 8. nRMSE of satellite models at four Kenyan sites, daily data.

Figure 9. Trendlines of percentage difference to logger for four satellite-derived databases of average hourly GHI at Lemolo.

Figure 10. Average GHI per hour of day (satellite values and ground measurements) for Lemolo (2016–2017 data).

Figure 11. Absolute percentage differences in average GHI per discrete hour between four satellite-derived GHI databases and ground measurements for Lemolo (2016–2017 data).

Figure 12. Trendlines of percentage difference to logger for four satellite-derived databases of average GHI per day of year for Lemolo.

Figure 13. Frequency distribution of hourly GHI (satellite values and ground measurements) for Lemolo (2016–2017 data).

Table 1. Models and data inputs of satellite-derived GHI datasets under review.

Database	Satellite Model	Clear Sky Model	Cloud Properties Model	Temporal Resolution of Clear Sky Inputs	Spatial Resolution of Clear Sky Inputs
SARAH	Heliosat-2	SPECMAGIC	-	Monthly	125 km
CMSAF	Heliosat-2	SPECMAGIC	-	3-hourly	125 km
CAMS	Heliosat-4	McClear	McCloud	3-hourly	125 km
Solcast	Proprietary	REST2v5	Proprietary	Hourly	50 km

Table 2. Overall comparison of two satellite-derived GHI datasets at Dar es Salaam (2005–2016, 87,648 h).

Statistical Measure	CMSAF	SARAH	% Difference
Average annual in-plane irradiation kWh/m²	1650.67	1664.12	−0.81
10-year average hourly GHI Wh/m²	188.28	189.81	−0.81
Mean std dev of each hourly GHI value Wh/m²	73.98	82.10	−9.90

Table 3. nRMSE of Satellite model and clear sky model at Lemolo, one-, five- and fifteen-minute interval data.

Time Interval (min)	Satellite Model	No. Values Lemolo	nRMSE % Lemolo
1	CAMS	1143	76
1	McClear	1143	162
5	CAMS	72,611	166
5	Solcast	72,611	47
15	CAMS	45,487	125
15	Solcast	45,487	45

Table 4. nMBE of Satellite models at Lemolo and Echareria, 60-min interval data.

HOURLY	No. Values Lemolo	nMBE % Lemolo	No. Values Echareria	nMBE % Echareria
PVGIS-SARAH 2016	3489	0.35	715	0.47
Solcast 2016	3489	0.16	715	0.34
PVGIS_CMSAF 2016	3489	0.18	715	0.39
CAMS 2016	3489	0.16	715	0.31

Table 5. Instantaneous nRMSE of Satellite models and clear sky model at Lemolo, 15- and 60-min interval data.

Time Interval (min)	GHI Satellite Database/Clearsky Model	No. Values Lemolo	nRMSE % Lemolo
15	Solcast	4916	52
15	CAMS	4916	130
60	CAMS	1161	47
60	Solcast	1161	47
60	PVGIS_CMSAF	364	53
60	McClear	1161	60
60	PVGIS-SARAH	364	295

Table 6. PMCC values for hourly GHI satellite data (2016) for Lemolo and Echareria.

Database	Lemolo PMCC	Echareria PMCC
SARAH	0.875	0.877
CMSAF	0.978	0.905
CAMS	0.978	0.946
Solcast	0.971	0.907

Table 7. General statistics for Lemolo and Echareria.

Location and No. Hours	GHI Wh/m²	Logger	SARAH	CMSAF	CAMS	Solcast
Lemolo 3489 h	Sum	842,684	835,475	910,815	888,160	881,534
	Hourly average	242	240	261	255	253
	% difference sum/avg to Logger		−1	8	5	5
	Hourly Std dev	322	337	358	345	347
	% difference std dev to Logger		5	11	7	8
	% of hours under-estimating		70	63	69	68
Echareria 715 h	Sum	157,929	189,617	172,963	179,699	167,135
	Hourly avg	221	266	242	252	234
	% difference sum/avg to Logger		20	10	14	6
	Hourly Std dev	314	360	354	350	335
	% difference std dev to Logger		15	13	11	7
	% of hours under-estimating		65	67	67	64

Table 8. Average daily GHI difference for both SoNG sites.

% Avg Daily Difference to Logger	No. Days	SARAH GHI	CMSAF GHI	CAMS GHI	Solcast GHI
Lemolo	152	0	8	6	4
Echareria	22	22	12	16	4

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
