Quality Control of Global Horizontal Irradiance Estimates through BSRN, TOACs and Air Temperature/Sunshine Duration Test Procedures

Solar Radiation (SR) data are required for many disciplines and applications. Ground measurements of SR are affected by technical and operational errors, and several approaches have therefore been developed to detect these errors. This study compares two quality tests for hourly Global Horizontal Irradiance (GHI): the tests of the Baseline Surface Radiation Network (BSRN) of the World Meteorological Organization (WMO), and the tests based on Top of Atmosphere irradiance and Clear sky irradiance (TOACs) on a horizontal plane. Each test uses its own thresholds to pass data, which leads to different results. A newly developed quality test is also presented that uses Sunshine Duration (SD) and Air Temperature (AT) to check hourly GHI; it is applied to data from 20 meteorological stations in northeast Iraq. The new method was validated using independent high-quality data from six stations in various regions with the same climate regime. The method consists of several tests that compare ground data with upper and lower limits derived from radiation at the top of the atmosphere and from a clear-sky radiation model, and that use the relationships of SD and AT with SR to identify data values of dubious quality. The rates of error flags generally range from 1% to 27%. The findings show that SD and AT can support other quality tests and detect nearly 2% more dubious data values than the BSRN and TOACs tests. According to the validation results, the SD test behaves like a consistency check, whereas the AT test does not; AT can nevertheless be used to test the plausibility of the data. The rationale for using AT in this study may not carry over to other climate conditions. The results suggest that combining tests can improve the quality of ground data, especially when the components of SR are unavailable. Using other climate variables for further checks is another possibility.


Introduction
Solar radiation (SR) is one of the most important climate elements. It affects other climate variables and is crucial for research fields including climate change, renewable energy, agriculture, architecture and hydrology [1,2]. High-quality SR data are therefore needed. SR can be estimated from satellite images [3][4][5][6] and modelled using climate variables [7][8][9][10]; in both cases, high-quality ground data are required for validation. At ground level, SR is measured precisely by pyranometers, albeit with some uncertainty due to technical issues of the instruments, which include the cosine effect, temperature response, sensitivity, non-linearity, spectral range and thermal offset [11]. In addition, there are operational and installation errors arising from miscalibration, a lack of regular cleaning of dust, snow, water droplets and bird droppings, and shadows cast on the equipment by nearby trees and buildings [12]. These factors cause systematic and random errors in the data, which have been reported in the literature [13][14][15][16][17]. Unlike most other meteorological instruments, SR sensors therefore need high maintenance to sustain their performance and deliver high-quality data, and their records need to be checked regularly before being used in scientific studies [1,11,18]. New pyranometers have been developed in response, and some equipment errors, e.g., those related to snow melt on the pyranometer dome, have been almost eliminated [19].
Climate 2018, 6, 69 2 of 21
Several studies and organisations have proposed Quality Control (QC) models for SR data that detect these errors using a variety of tests. The tests recommended by the Baseline Surface Radiation Network (BSRN) of the World Meteorological Organization (WMO) are the most widely used in the literature [2,13,15,17,18]. This method was proposed by Long and Dutton [20].
Other prominent tests depend on the Top of Atmosphere irradiance (TOA) on the horizontal plane and on Clear Sky (Cs) models, which test the physically possible limit and the extremely rare limit, respectively. This test is named TOACs in this study and was presented by Geiger et al. [21]; it has been applied widely in the literature [22][23][24]. A comparison of the BSRN and TOACs tests has not been documented in depth in the literature. The National Renewable Energy Laboratory (NREL) of the United States has also developed QC software for SR data based on the ratio between global and beam radiation [25], which cannot be applied when only GHI is available.
Others have used 120% of TOA (a 20% increase) as the upper test limit of SR to check for major problems in the data (Kendrik et al., 1994, cited in [11,12]). Some studies [1,26] have used combinations of tests from the literature, such as a BSRN subtest, a TOACs subtest and other tests, as a quality check without comparing and investigating the different results of each test for the same target.
Comparisons between solar radiation components, based on the relationship between diffuse, direct (beam) and global radiation, have been studied extensively as tests of the consistency and plausibility of SR data [11][12][13][27]. Some studies have used the relationship between beam, diffuse and global radiation as an index to detect errors. For example, several studies have assumed that direct radiation is lower than GHI, that diffuse radiation is lower than 110% of GHI, and that the sum of direct and diffuse radiation is within ±8% of GHI [11,12,25,27,28]. However, pyranometers, pyrheliometers and sun trackers need high maintenance and are costly [29]. Meteorological stations therefore often lack the capability to record SR data, particularly all of its components. This means that most of the above tests cannot be applied when only GHI is available, which is the problem this study addresses.
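The three component relations quoted above can be written as a single check. The sketch below is a minimal illustration, not the implementation used in the cited studies. It assumes that "direct radiation" means the beam component projected onto the horizontal plane (DNI × cos of the zenith angle), and the function name `component_consistency_ok` is hypothetical.

```python
def component_consistency_ok(ghi, dni, dif, cos_zen, tol=0.08):
    """Check the component relations quoted in the text for one time step.

    ghi, dif : global and diffuse horizontal irradiance (W/m^2)
    dni      : direct normal irradiance (W/m^2)
    cos_zen  : cosine of the solar zenith angle
    tol      : allowed relative closure error (the +/-8% quoted in the text)
    """
    if ghi <= 0:
        # Relative limits are meaningless for non-positive GHI; treat as failed.
        return False
    direct_h = dni * max(cos_zen, 0.0)                    # beam irradiance on the horizontal
    beam_ok = direct_h < ghi                              # direct lower than GHI
    diffuse_ok = dif < 1.10 * ghi                         # diffuse lower than 110% of GHI
    closure_ok = abs(direct_h + dif - ghi) <= tol * ghi   # sum within +/-8% of GHI
    return beam_ok and diffuse_ok and closure_ok
```

With only GHI recorded, none of these three relations can be evaluated.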
GHI data have also been checked using statistical indices, such as the first and third quartiles, to determine the rates of lower and upper outliers in the data. These rates are used to check that a station operates normally: low outlier rates indicate good data quality, and vice versa [11,12,14]. The ratio of the standard deviation to the mean of GHI, together with TOA data, has also been used as a conditional operator in an equation for a persistence test of GHI; for further detail see [28]. This study also uses some statistical indices to set the test limits. One published study argues that minor deviation errors in daily GHI data can be detected using satellite-based products [30].
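As an illustration of quartile-based outlier rates, the sketch below computes the shares of values below a lower fence and above an upper fence. The Tukey fences Q1 − k·IQR and Q3 + k·IQR with k = 1.5 are an assumption here, and the cited studies may define the limits differently. The function name `outlier_rates` is hypothetical; it uses only Python's standard `statistics.quantiles` (Python 3.8+).

```python
import statistics

def outlier_rates(values, k=1.5):
    """Return (lower_rate, upper_rate): shares of values outside the quartile fences.

    Fences are Q1 - k*IQR and Q3 + k*IQR (Tukey's rule, assumed here).
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = sum(v < q1 - k * iqr for v in values) / len(values)
    upper = sum(v > q3 + k * iqr for v in values) / len(values)
    return lower, upper
```

Following the interpretation in the text, low rates on both sides would be read as normal station operation.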
Among other climate variables, Sunshine Duration (SD) and Air Temperature (AT) can also be used to test the quality of GHI data. SD has been used in [31] and to test daily [12] and monthly [11] GHI data. Moradi [22] also investigated a model based on the lower limit of the SD index to test daily global radiation, especially for stations where the direct and diffuse radiation components are not available (see also [24]). More recently, Journée et al. [28] tested hourly Direct Normal Irradiance (DNI) data with SD. This study applies a different approach to test hourly GHI with SD.
Uses of AT to assess GHI data are scarce in the literature, despite the known relationship between the two parameters [7,32,33]. Several models use AT to estimate GHI [7], and AT has been used to test longwave radiation [13]. The same study also used a lower AT bound for snow melt to test the sum of global shortwave radiation, whereas the present study uses a new method.
The literature review shows that further research into the quality control of GHI observations is required. The aim of this study was, therefore, to compare and evaluate the results of two sets of tests (TOACs and BSRN) for hourly GHI data from 20 stations whose data had not previously been quality assured or tested. The analysis assesses the reliability of each test, since the tests share the same targets but use different conditions. This study also introduces a simple new test based on the relationship of GHI with AT and SD, which is useful for stations that do not record diffuse and direct radiation. It is validated with high-quality SR data from six stations with the same climate types as the study area. Finally, SD and AT are combined to enhance the results and to detect errors in those variables as well as in GHI.

Dataset
Hourly data of GHI (W/m²), SD (minutes per hour) and average AT (°C) were collected from seven automatic and thirteen tower meteorological stations in the Kurdistan region of northeast Iraq. SD is not recorded at the tower stations. For validation, openly available one-minute data of all SR components with SD were collected from three Australian stations [34], and one-minute data of all SR components with AT from three BSRN stations [35]. Figure 1 shows the climate regions according to the Köppen classification and the locations of the stations [36,37]. Tables 1-3 show the geographical information, pyranometer types and timescales for each station; the timescale of the data varies between stations. Only daytime data with a sun elevation angle above 15° were used. This avoids a high rate of errors due to the cosine effect at low sun angles and unreliable AT test results when the elevation angle is low [1,11,28], although some researchers have suggested a 7° threshold [26].
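The 15° elevation filter needs the solar elevation for each hour. The sketch below is an approximation under stated assumptions: it uses Cooper's declination formula and an hour angle of 15° per hour from true local solar time, which is typically accurate to about a degree. A precise algorithm such as SPA [38] would be preferable in practice. The function names are hypothetical.

```python
import math

def solar_elevation_deg(lat_deg, day_of_year, solar_hour):
    """Approximate solar elevation angle (degrees) from true local solar time.

    Declination: Cooper's formula; hour angle: 15 deg per hour from solar noon.
    Accuracy of roughly a degree is assumed adequate for a 15 deg cut-off.
    """
    decl = math.radians(23.45) * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))
    lat = math.radians(lat_deg)
    sin_alt = (math.sin(lat) * math.sin(decl)
               + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    # Clamp to guard asin against tiny floating-point overshoots.
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_alt))))

def is_daytime_sample(lat_deg, day_of_year, solar_hour, min_elev=15.0):
    """True if the hour is kept for QC (sun elevation above min_elev degrees)."""
    return solar_elevation_deg(lat_deg, day_of_year, solar_hour) > min_elev
```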
Figure 1. Climate regions, station types and their locations in the study area [36,37].

Method
First, the hourly time series of GHI, AT and SD for each station were plotted together for daytime recordings with a sun elevation angle above 15°. GHI was also plotted alone as a fingerprint plot, in which the x-axis represents the day of the year, the y-axis represents the hour of the day, and each GHI value is shown on a colour scale from blue to red. This checks for any major problems in the data before every single observation is tested. Second, to obtain high-quality GHI data, the methods described in the following sections were implemented, as illustrated in Figure 2.
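A fingerprint plot is essentially a 24 × N matrix of hour of day versus day of year. The sketch below builds that matrix from (timestamp, GHI) pairs using the standard library only, leaving missing hours as NaN. The function name and input layout are assumptions. The matrix could then be drawn with any image plot, for example matplotlib's `imshow` with a blue-to-red colour map.

```python
import calendar
import math
from datetime import datetime

def fingerprint_grid(records, year):
    """Arrange hourly GHI values for one year as a fingerprint matrix.

    records: iterable of (datetime, ghi) pairs in local solar time.
    Returns 24 rows (hour of day, y-axis), each holding one value per day of
    the year (x-axis). Hours without data remain NaN.
    """
    n_days = 366 if calendar.isleap(year) else 365
    grid = [[math.nan] * n_days for _ in range(24)]
    for ts, ghi in records:
        if ts.year == year:
            day_index = ts.timetuple().tm_yday - 1   # 0-based day of year
            grid[ts.hour][day_index] = ghi
    return grid

# Example: one value at noon on 1 January 2016 (a leap year).
example = fingerprint_grid([(datetime(2016, 1, 1, 12), 510.0)], 2016)
```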

Missing Value (NA) Detection
Detecting missing hourly values in the time series and setting them to NA is important to mark missing observations, which can later be used for different purposes, such as comparing two observations in the time series or comparing ground data with satellite data while avoiding inappropriate comparisons [1,16,27]. We automatically checked all hourly time series for gaps and unreliable fill values such as 999 and (///), and set them to NA. The one-minute data of the BSRN and Australian stations were aggregated to hourly data. For this aggregation, the time series was first completed so that every minute has an entry (missing minutes set to NA); if any minute in an hour is NA, the value for that hour becomes NA.
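The NA handling and hourly aggregation described above can be sketched as follows. The fill codes mirror the examples in the text; the dictionary input layout, labelling each hour by its start time, and the function names are assumptions for illustration.

```python
import math
from datetime import datetime, timedelta

FILL_CODES = {"999", "///", ""}   # raw entries treated as missing (examples from the text)

def parse_value(raw):
    """Convert one raw logger entry to float, mapping fill codes and junk to NaN."""
    raw = raw.strip()
    if raw in FILL_CODES:
        return math.nan
    try:
        return float(raw)
    except ValueError:
        return math.nan

def hourly_means(minute_data, start, hours):
    """Aggregate one-minute values to hourly means with strict NA propagation.

    minute_data: dict {datetime: float}. Minutes absent from the dict are treated
    as NA, which completes the series on a regular one-minute grid. An hour with
    any NA minute becomes NA; otherwise it is the mean of its 60 minutes.
    Hours are labelled by their start time (an assumed convention).
    """
    result = {}
    for h in range(hours):
        t0 = start + timedelta(hours=h)
        values = [minute_data.get(t0 + timedelta(minutes=m), math.nan) for m in range(60)]
        result[t0] = math.nan if any(math.isnan(v) for v in values) else sum(values) / 60
    return result

# Example: a complete hour followed by an hour missing its last minute.
start = datetime(2016, 6, 1, 10)
minutes = {start + timedelta(minutes=m): 500.0 for m in range(60)}
minutes.update({start + timedelta(hours=1, minutes=m): 400.0 for m in range(59)})
hourly = hourly_means(minutes, start, 2)
```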

Comparison between BSRN and TOACs Tests
GHI data were checked against physically possible minimum and maximum limits using two tests. The first is the BSRN subtest, which has two targets (a lower and an upper limit):

$$-4\ \mathrm{W/m^2} < \mathrm{GHI}_{\mathrm{GD}} < \frac{S_0}{S_e^{2}}\,1.5\,(\cos\theta)^{1.2} + 100\ \mathrm{W/m^2} \qquad (1)$$

with:
GD: ground data;
$S_0$: solar constant, equal to 1367 W/m²;
$\cos\theta$: cosine of the solar zenith angle;
$S_e$: Earth-Sun distance in astronomical units, used to adjust the solar constant over the year. For instance, its highest value is 1.016682 in July (summer) and its lowest is 0.983277 in January (winter) for latitude 35° N and longitude 45° E [20,38].
These tests have been applied in several case studies (see Section 1). They require the solar zenith angle and the Earth-Sun distance for each hour, together with the solar constant (Equation (1)). These can be obtained from many sources, such as SPA [38], or by Equation (3). The first BSRN subtest (Equation (1)) is compared to the first TOACs subtest (Equation (2)) for the same two targets:

$$0.03\,\mathrm{TOA} < \mathrm{GHI}_{\mathrm{GD}} < \mathrm{TOA} \qquad (2)$$

The second condition of Equation (1) (flag 2) for the BSRN subtest is compared to the second condition of Equation (2) (flag 3) for the TOACs subtest to detect values above the upper physically possible limit. The first condition of Equation (1) (flag 4) for the BSRN subtest is compared to the first condition of Equation (2) (flag 5) for the TOACs subtest to detect values below the lower physically possible limit (Table 4). The TOACs tests are described in detail in [21]. They require TOA, which is available from sources such as SoDa [39] and SPA [38] and can be calculated for any location and time by Equation (3).
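The upper and lower physically possible limits of Equations (1) and (2) translate directly into flags 2 to 5. The sketch below assumes that the horizontal TOA irradiance is $S_0/S_e^2 \cdot \cos\theta$ (the usual form, used here as a stand-in for Equation (3), which is not reproduced in this section). It approximates $S_e$ with a simple cosine formula that comes within about 10⁻⁴ of the extreme values quoted above; values from SPA [38] or SoDa [39] would be used in practice. The function names are hypothetical.

```python
import math

S0 = 1367.0  # solar constant used in the text (W/m^2)

def earth_sun_distance_au(day_of_year):
    """Approximate Earth-Sun distance Se in AU (simple cosine model, an assumption)."""
    return 1.0 - 0.01672 * math.cos(math.radians(0.9856 * (day_of_year - 4)))

def toa_horizontal(cos_zen, se):
    """Extraterrestrial irradiance on a horizontal plane: S0 / Se^2 * cos(zenith)."""
    return S0 / se ** 2 * max(cos_zen, 0.0)

def ppl_flags(ghi, cos_zen, se):
    """Physically-possible-limit flags for one hourly GHI value (W/m^2).

    BSRN, Equation (1): flag 4 if GHI <= -4, flag 2 if GHI >= upper bound.
    TOACs, Equation (2): flag 5 if GHI <= 3% of TOA, flag 3 if GHI >= TOA.
    NA values are left unflagged here because they are handled separately.
    """
    flags = set()
    if math.isnan(ghi):
        return flags
    mu0 = max(cos_zen, 0.0)
    toa = toa_horizontal(cos_zen, se)
    bsrn_upper = S0 / se ** 2 * 1.5 * mu0 ** 1.2 + 100.0
    if not ghi > -4.0:
        flags.add(4)
    if not ghi < bsrn_upper:
        flags.add(2)
    if not ghi > 0.03 * toa:
        flags.add(5)
    if not ghi < toa:
        flags.add(3)
    return flags
```

Flags 2 and 3 are the paired upper-limit checks and flags 4 and 5 the paired lower-limit checks, so counting each pair over a station's record gives the kind of comparison reported in the Results.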
A second comparison between the BSRN and TOACs tests concerns other subtests, which detect observations beyond extremely rare limits; these limits are tighter than those of the subtests in Equations (1) and (2). The first is the second BSRN subtest, which has the same requirements as Equation (1) and is calculated by Equation (4). Both second subtests have the same target: the BSRN condition of Equation (4) (flag 6) is compared to the condition of Equation (5) (flag 7), which represents the second TOACs subtest.
The second TOACs subtest compares the ground data with 110% of the clear-sky radiation value [21]. In principle, the ground data should be lower than the clear-sky radiation [26,40]; if they are higher, the data value is flagged for further checks. There are a number of models for estimating clear-sky radiation [41]. One simple model is based on the hours of daylight, a constant coefficient and the radiation at the top of the atmosphere [42]. Another model uses the Linke turbidity factor to describe the clarity of the sky [43]. For more detail on clear-sky radiation, the reader is referred to Reno et al. [41]. This study used the McClear model for clear-sky radiation, a physical model that uses several inputs, mostly derived from satellite images; full details can be found in [40].
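Flags 6 and 7 can be sketched in the same way. Equation (4) is not reproduced in this section, so the BSRN extremely rare limits used below (−2 W/m² < GHI < $S_0/S_e^2 \cdot 1.2 \cdot (\cos\theta)^{1.2}$ + 50 W/m²) are the values commonly published for the BSRN procedure [20] and are an assumption here. The clear-sky GHI (for example from McClear) is taken as an input, and the function name `erl_flags` is hypothetical.

```python
import math

S0 = 1367.0  # solar constant (W/m^2)

def erl_flags(ghi, cos_zen, se, ghi_clear):
    """Extremely-rare-limit flags for one hourly GHI value (W/m^2).

    flag 6: outside the BSRN extremely rare limits (assumed published values).
    flag 7: not below 110% of clear-sky GHI (TOACs second subtest).
    """
    flags = set()
    if math.isnan(ghi):
        return flags                      # NA values are handled separately
    mu0 = max(cos_zen, 0.0)
    bsrn_upper = S0 / se ** 2 * 1.2 * mu0 ** 1.2 + 50.0
    if not (-2.0 < ghi < bsrn_upper):
        flags.add(6)
    if not ghi < 1.1 * ghi_clear:
        flags.add(7)
    return flags
```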

Sunshine Duration Test
The relationship between SD and GHI, and its use as a persistence test of GHI data, have been described extensively in the literature [11,12,14,22,28]. The SD test is a good option for areas where only GHI is available and the radiation components are not, as demonstrated by Moradi [22] for daily data. In such areas, the comparison test for checking data consistency cannot be applied because it depends on diffuse and direct beam radiation in addition to global radiation. Previous studies [22,24] used the lower limit of SD with a clearness index to test GHI data. Here, both the lower and upper bounds of SD are applied. First, at the lower bound (SD = 0), GHI in a given time interval should not exceed the maximum possible diffuse radiation; otherwise, the data value is flagged as being of dubious quality according to Equation (6). This is because under cloudy conditions, when SD is zero, pyranometers record mainly diffuse radiation.
Because measured diffuse radiation is unavailable in the case study, the maximum diffuse radiation is set to 35% of TOA radiation, based on a satellite-derived database available for the study area from SoDa [39]. This value was chosen by comparing the maximum diffuse radiation in the satellite-derived database with various fractions of TOA for the study area. The test relies on the fact that direct (beam) radiation does not exceed 120 W/m² when the sunshine duration is zero [28,29]. The contribution of direct radiation to GHI is therefore low when SD is zero, and most of the GHI in this situation is diffuse radiation. The test is therefore designed to flag hours in which GHI is high while SD is zero: the condition is set as in Equation (6), so that a value passes the test if the condition is true and fails otherwise. This type of error can arise from miscalibration and operational problems such as high radiation reflected from nearby equipment. However, high GHI with zero SD can also occur naturally under broken clouds or a bright cloudy sky, though not regularly, so a low flag rate is considered acceptable. Moreover, hourly data are mean values that can cover several sky conditions within the hour.
For the upper bound of SD, if SD exceeds 83% of a given time interval (more than 50 min in an hour), GHI should be above 35% of TOA; for SD between 50% and 83% (30 to 50 min in an hour), GHI should be above 10% of TOA. This is checked with Equation (7). The test detects data values affected by partial shading of the sensor, partial malfunction, bird droppings on the sensor and other forms of dirt. It also detects systematic errors in values above 3% of TOA that were not caught by the lower limit of the TOACs test.
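$$SD = 0\ \text{min in } 60 \;\&\; \mathrm{GHI}_{\mathrm{GD}} < M_{\mathrm{diff}} \qquad (6)$$

with $M_{\mathrm{diff}}$ (M-diff): possible maximum diffuse radiation, equal to 35% of TOA in this study.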
$$SD > 50\ \text{min in } 60 \;\&\; \mathrm{GHI}_{\mathrm{GD}} > 0.35\,\mathrm{TOA}; \qquad 30 < SD \le 50\ \text{min in } 60 \;\&\; \mathrm{GHI}_{\mathrm{GD}} > 0.10\,\mathrm{TOA} \qquad (7)$$

This test was validated with data from three Australian stations (Table 3) by comparing the SD tests of Equations (6) and (7) (flags 8 and 9) with flag 14, the consistency test that becomes possible when all SR components are available (the conditions are shown in Table 4). The consistency test serves as the reference because it uses the data records of three pyranometers. The error-detection rates of flags 8 and 9 in these high-quality data are therefore compared with the rate of flag 14 to evaluate how well the SD tests pass or flag data relative to the consistency test.
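Equations (6) and (7) reduce to a few comparisons per hour. The sketch below assumes SD in minutes per hour and treats the 30 to 50 min band as 30 < SD ≤ 50, which is one reading of the boundaries given in the text. The function name `sd_flags` and the parameter `m_diff_frac` are hypothetical.

```python
import math

def sd_flags(ghi, sd_min, toa, m_diff_frac=0.35):
    """Sunshine-duration flags for one hourly value.

    ghi: hourly GHI (W/m^2); sd_min: sunshine duration (min per hour);
    toa: horizontal TOA irradiance (W/m^2); m_diff_frac: M-diff as a fraction of TOA.
    flag 8 (Equation (6)): SD == 0 but GHI is not below M-diff.
    flag 9 (Equation (7)): SD > 50 min but GHI <= 35% of TOA,
                           or 30 < SD <= 50 min but GHI <= 10% of TOA.
    """
    flags = set()
    if math.isnan(ghi) or math.isnan(sd_min):
        return flags                      # NA values are handled separately
    if sd_min == 0 and not ghi < m_diff_frac * toa:
        flags.add(8)
    if sd_min > 50 and not ghi > 0.35 * toa:
        flags.add(9)
    elif 30 < sd_min <= 50 and not ghi > 0.10 * toa:
        flags.add(9)
    return flags
```

Counting flags 8 and 9 over the validation stations and setting them against the flag 14 rate mirrors the validation described above.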

Air Temperature Test
Using AT for quality checks of GHI data has not been widely described in the literature, although the two variables are significantly related, especially during daytime. SR is one of the main drivers of temperature change: when it is transmitted through the atmosphere, a fraction of the radiation reaches the Earth's surface and is converted into thermal energy. However, several regional, local and climatological factors affect these processes.
Several models have used AT to estimate GHI [7,24], so the relationship between the two variables can also be used to test GHI data, for example by using the monthly mean AT over all observations. We assume that if AT is higher than its monthly mean, GHI should be above 10% of TOA (Equation (8)), and if AT is lower than half its monthly mean, GHI should be lower than the maximum possible diffuse radiation (Equation (9)). These are the hypotheses used to detect possible errors in the data. Some flags may be raised by local factors that affect temperature change rather than by errors in GHI; in the climate regions studied, however, the flag rate is not expected to reach 3% of the data. The test is applied only when the sun elevation angle is above 15°, and its limits are set low (10% and 35% of TOA) to reduce the influence of the other factors discussed above, because the temperature response to GHI is slightly delayed by absorption, conduction and heat transfer. Other QC tests of GHI data in the literature, namely comparison and statistical tests, also contain conditionals and therefore do not test all the data [1,2,12-14,28]. The new AT test is checked in semi-arid and Mediterranean climate regions; some modifications might be needed for other climate regions.
$$AT > \overline{AT}_{m} \;\&\; \mathrm{GHI}_{\mathrm{GD}} > 0.10\,\mathrm{TOA} \qquad (8)$$

$$AT < \overline{AT}_{m}/2 \;\&\; \mathrm{GHI}_{\mathrm{GD}} < 0.35\,\mathrm{TOA}\ (M_{\mathrm{diff}}) \qquad (9)$$

where $\overline{AT}_{m}$ is the daytime hourly mean AT of month m. The hourly mean of AT and half of that mean are calculated for each month using only hours with a sun elevation angle above 15°. For example, all hourly AT data for January are used to calculate the mean and half-mean that test the January GHI data, and so on for each month.
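The AT test needs the monthly daytime mean AT first and then two comparisons per hour. The sketch below assumes that the hourly records have already passed the 15° elevation filter and that monthly means are positive (°C), as in the study area; with negative means, "half the mean" would need rethinking. The names `monthly_daytime_means` and `at_flags` are hypothetical.

```python
import math
from collections import defaultdict

def monthly_daytime_means(records):
    """Mean AT per month from (month, at) pairs of daytime hours; NaN ignored."""
    sums, counts = defaultdict(float), defaultdict(int)
    for month, at in records:
        if not math.isnan(at):
            sums[month] += at
            counts[month] += 1
    return {m: sums[m] / counts[m] for m in sums}

def at_flags(ghi, at, toa, at_mean, m_diff_frac=0.35):
    """Air-temperature flags for one hourly value.

    flag 10 (Equation (8)): AT above its monthly mean but GHI <= 10% of TOA.
    flag 11 (Equation (9)): AT below half its monthly mean but GHI >= M-diff.
    Assumes a positive monthly mean AT (deg C).
    """
    flags = set()
    if math.isnan(ghi) or math.isnan(at):
        return flags                      # NA values are handled separately
    if at > at_mean and not ghi > 0.10 * toa:
        flags.add(10)
    if at < at_mean / 2 and not ghi < m_diff_frac * toa:
        flags.add(11)
    return flags
```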
The AT test is useful because:
1. Other tests, such as the upper and lower limit tests and the extremely rare limit tests, cannot detect errors in the middle of the data range.
2. It can be used when the comparison test based on SR components cannot be applied because diffuse and direct beam radiation are not recorded.
3. Temperature is recorded at almost all stations, and the uncertainty of temperature recording equipment is minimal [29,32].
4. It is an option when sunshine records are unavailable at a station.
5. It can be used for further checks to demonstrate the quality of solar radiation data, and its results can be compared with those of other tests.
However, the test has limitations: local factors and climate conditions affect its results, and some natural GHI conditions might be flagged as errors. The test is validated with data from three BSRN stations (Table 3), following a procedure similar to the SD validation: the AT tests of Equations (8) and (9) (flags 10 and 11) are compared with the flag 14 consistency test.

Combining Air Temperature and Sunshine Duration Tests
To reduce uncertainty, detect errors in both AT and GHI, and enhance the SD test, we combine Equations (6) and (8) into one new test (Table 4, flag 12) and Equations (7) and (9) into another (Table 4, flag 13). Each new test expresses all of its conditions on the three variables in a single conditional, which checks the variables against one another to help show whether an error lies in GHI, AT or SD. For instance, a data value is flagged if AT is above its monthly mean and SD is above 50 min in 60 min, but GHI is lower than 10% of TOA. Similarly, a data value is flagged if AT is less than half its monthly mean and SD is zero, but GHI is above 35% of TOA.
All input parameters for the test equations were calculated or downloaded from the relevant sources for each hour in true local solar time for all stations in the study area.
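The two three-variable checks described above can be sketched as one function. Which example corresponds to flag 12 and which to flag 13 is an assumption based on the AT equation each example contains. The function name `combined_flags` is hypothetical.

```python
import math

def combined_flags(ghi, at, sd_min, toa, at_mean, m_diff_frac=0.35):
    """Three-variable AT/SD/GHI checks for one hourly value.

    flag 12 (assumed mapping): AT above its monthly mean and SD > 50 min,
            yet GHI below 10% of TOA.
    flag 13 (assumed mapping): AT below half its monthly mean and SD == 0,
            yet GHI above M-diff (35% of TOA).
    """
    flags = set()
    if any(math.isnan(x) for x in (ghi, at, sd_min)):
        return flags                      # NA values are handled separately
    if at > at_mean and sd_min > 50 and ghi < 0.10 * toa:
        flags.add(12)
    if at < at_mean / 2 and sd_min == 0 and ghi > m_diff_frac * toa:
        flags.add(13)
    return flags
```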

Quality Control Flags
As suggested by Maxwell [25] and applied by Younes et al. [11] and Moradi [22], none of the data were modified or deleted; they were only flagged. The data were checked by all tests separately, and each subtest had its own flag number. If the condition of an equation (or the relevant part, for equations with two conditions) is true, the data value passes; otherwise it fails and is flagged with the appropriate error flag number (Table 4, Figure 2). Some flags may be removed by aggregating hourly data to daily data, whereas others cannot, because the flaw in the data affects the quality of the aggregate. Flagging is an easy, automatic way to count, check, delete or interpolate any observation according to its flag number.
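The non-destructive flagging scheme can be sketched in plain Python: each test returns the set of error flags it raises, measured values are never changed, and a value with no error flag receives flag 1. The record layout (dicts) and the helper names `apply_tests` and `flag_percentages` are hypothetical.

```python
def apply_tests(records, tests):
    """Attach flags to every record without modifying the measured values.

    records: list of dicts with the hourly inputs (hypothetical layout).
    tests: callables taking one record and returning a set of error flags.
    Returns new dicts with an added "flags" entry; {1} means the value passed.
    """
    flagged = []
    for rec in records:
        raised = set()
        for test in tests:
            raised |= test(rec)
        flagged.append({**rec, "flags": raised or {1}})
    return flagged

def flag_percentages(flagged):
    """Percentage of records carrying each flag number."""
    counts = {}
    for rec in flagged:
        for flag in rec["flags"]:
            counts[flag] = counts.get(flag, 0) + 1
    return {flag: 100.0 * n / len(flagged) for flag, n in sorted(counts.items())}
```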

Counting All Tests
Flag 1 (data pass) is evaluated for different sets of tests: TOACs and BSRN alone, these combined with the SD tests, these combined with the AT tests, and these combined with both the SD and AT tests (Figure 2). Unlike previous studies [2,17], this procedure assesses the rate of each test set separately. This is important because, in the previous sections, all data were tested with each subtest individually, whereas here we assess which test or tests each data value passed.
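Evaluating flag 1 for different test sets amounts to ignoring the flags of tests outside the set. The sketch below groups flag numbers by test family as described in this paper (BSRN 2, 4, 6; TOACs 3, 5, 7; SD 8, 9; AT 10, 11). Whether the combined flags 12 and 13 belong to the "SD and AT" set is left open, and the names `TEST_FAMILIES` and `pass_rate` are hypothetical.

```python
TEST_FAMILIES = {            # error flags belonging to each family of tests
    "BSRN": {2, 4, 6},
    "TOACs": {3, 5, 7},
    "SD": {8, 9},
    "AT": {10, 11},
}

def pass_rate(flag_sets, families):
    """Percentage of values that pass (flag 1) when only `families` are applied.

    flag_sets: one set of raised error flags per hourly value.
    families: names from TEST_FAMILIES evaluated together.
    """
    active = set().union(*(TEST_FAMILIES[name] for name in families))
    passed = sum(1 for flags in flag_sets if not (flags & active))
    return 100.0 * passed / len(flag_sets)
```

For example, `pass_rate(flags, ["TOACs", "AT"])` would correspond to combining the TOACs tests with the AT tests.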

Physically possible limit tests (flags 2-5): comparison of GHI ground data against TOA and against the TOA-based limit of Equation (1) for the upper limit, and against 3% of TOA and −4 W/m² for the lower limit. These tests check for major errors and flag failures.

Results
The QC test procedures were applied to GHI data from 20 meteorological stations in northeastern Iraq. All results are presented in Tables 5-7 and Figures 3-7. Table 5 shows the results for the tower stations, comparing the BSRN (flags 2, 4 and 6) and TOACs (flags 3, 5 and 7) subtests, together with the AT tests (flags 10 and 11). Flag 2 is compared with flag 3, and flag 4 with flag 5, to detect observations above the upper and below the lower physically possible limits, respectively. Similarly, flag 6 is compared with flag 7 to detect extremely rare observations. Table 6 shows the corresponding results for the automatic stations, adding the SD tests (flags 8 and 9) and the combined AT and SD tests (flags 12 and 13), since SD is available only at the automatic stations. Table 7 shows the validation of the SD tests (flags 8 and 9) against the consistency test (flag 14) at three Australian stations, and of the AT tests (flags 10 and 11) against the same consistency test at three BSRN stations. The pass rates (flag 1) for the individual tests and their combinations are shown in Figures 6 and 7 for the tower and automatic stations, respectively. Examples of general checks of the time series are shown in Figures 3 and 4, and the limits of the tests at different hours are shown for one station in Figure 5. The NA detection rates for the 20 stations are shown in Tables 5 and 6.

General Check and NA Detection
All available GHI, SD and AT data are shown for a selection of stations in Figure 3. There are systematic errors in the GHI data for Maydan station (Figure 3b) from January to March 2016. Other errors in the GHI data are found for Kalar station (Figure 3d) from the end of 2015 until January 2016, and for Mazne station (Figure 3c). Errors in SD data are present for Bazian (Figure 3a) and Kalar (Figure 3d) stations, especially in the hot summer months, whereas both GHI and AT are normal for the first two stations.
Data gaps (NA) are shown in Tables 5 and 6. Among the automatic stations, the highest rate of missing hourly values (11.3%) was found at Maydan and the lowest at Halsho (3%). No missing values were detected at the tower stations except at Hojava, at a negligible rate (0.3%).
The fingerprint plots of Halabja, Dukan and Kalar (Figure 4b-d) show systematic errors in April and September of each year. The other stations appear nearly normal in the fingerprint plots; Bazian is shown as an example (Figure 4a).

Comparing BSRN and TOACs
The limits of the BSRN and TOACs tests are shown in Figure 5 for different hours of the day. For the physically possible upper limit test (flags 2 and 3), most automatic and tower stations passed the flag 2 check, whereas low rates of flag 3 were recorded at some stations (Tables 5 and 6).
All data values passed the flag 4 BSRN test, whereas the TOACs flag 5 was raised for 9.53%, 6.14%, 5.55% and 1.46% of the data values recorded at Kalar, Banmqan, Mazne and Surdash, respectively. The error rate for the same flag is lower than 1.15% at the other stations (Tables 5 and 6).
The data were checked for extremely rare observations with flags 6 and 7. The rate of flag 6 is zero at all stations except Banmqan, which recorded 0.75%. In contrast, flag 7 was raised at much higher rates at most stations, the highest being 27.18%, 17.45%, 12.98% and 11.65% at Surdash, Banmqan, Aliawa and Kalarikon, respectively. The lowest rates for this flag were 0.03% and 0.46% at Halabja and Darband, and the values at the remaining stations range from 1% to 9% (Tables 5 and 6).

Sunshine Duration Test
This test is applied only to the automatic stations. The rate of flag 8 is near zero at three stations but reached 1.68% at Halabja and 7.36% at Kalar. Flag 9 registered high rates of 17.64%, 12.3%, 7.38% and 3.93% at Halsho, Bazian, Kalar and Maydan, respectively, whereas the rates at the three other stations were below 1% (Table 6). In the validation of this test (Table 7), the rate of flag 8 was near 0%, and the flag 14 consistency test was zero at two Australian stations and near zero (0.02%) at the third. Similarly, flag 9 recorded low rates of 0.59%, 0.26% and 0.43% at the same three stations. Note: some flags were not applied because SD is not recorded at the tower stations.
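The SD checks compare the sunshine recorded in an hour with the GHI measured in the same hour. The exact conditions are defined in Table 4, which is not reproduced here, so the sketch below only illustrates the structure as we read it from the Discussion: flag 8 for hours with much sunshine but low GHI (pointing to shading or dirt on the pyranometer) and flag 9 for hours with little or no sunshine but high GHI (pointing to a faulty SD recorder). The 50-minute upper SD limit comes from the text; the GHI fractions and the low-SD threshold are hypothetical placeholders, as are the names `SdLimits` and `sd_flags`.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class SdLimits:
    sd_high_min: float = 50.0   # upper SD limit, minutes per hour (from the text)
    ghi_low_frac: float = 0.2   # HYPOTHETICAL: "low" GHI as a fraction of TOA
    sd_low_min: float = 0.0     # HYPOTHETICAL: "no sunshine" threshold, minutes
    ghi_high_frac: float = 0.6  # HYPOTHETICAL: "high" GHI as a fraction of TOA


DEFAULT_LIMITS = SdLimits()


def sd_flags(ghi: float, sd_minutes: float, toa: float,
             limits: SdLimits = DEFAULT_LIMITS) -> Set[int]:
    """Return flags 8 and 9 for one hour (GHI and TOA in W/m2, SD in minutes)."""
    flags = set()
    # Flag 8: sunny hour but GHI far too low (shading, dirt, sensor fault).
    if sd_minutes >= limits.sd_high_min and ghi < limits.ghi_low_frac * toa:
        flags.add(8)
    # Flag 9: no sunshine recorded but GHI high (often an SD recorder fault).
    if sd_minutes <= limits.sd_low_min and ghi > limits.ghi_high_frac * toa:
        flags.add(9)
    return flags
```

Keeping the thresholds in a frozen dataclass makes it easy to rerun the test with limits tuned to another climate, which the Discussion notes may be necessary.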

Air Temperature Test
The occurrence rate of flag 10 is below 1% at most automatic and tower stations. The two highest rates were 4.54% and 3.31% at Kalar and Mazne. Similarly, the rate of flag 11 is below 1% at 11 stations, and the highest rate (1.81%) was recorded at Halsho station; the rates at the remaining stations were below 1.8% (Tables 5 and 6). Table 7, which compares this test with a consistency check, shows that the rates of flags 10 and 11 were 0% while flag 14 reached 1.87% at Petrolina station. At Sede Boqer, the same two flags recorded low rates of 0.04% and 0.58%, whereas flag 14 reached 2.5%. At Carpentras, flags 10 and 11 rose to 0.93% and 2.08% of the data, while flag 14 had a very low rate of only 0.03%.
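The AT conditions are described in the Discussion: the monthly mean AT and half of the monthly mean AT are used together with 10% and 35% of TOA. One reading of that description, sketched below, flags hours warmer than the monthly mean with GHI below 10% of TOA, and hours colder than half the monthly mean with GHI above 35% of TOA. Assigning the first condition to flag 10 and the second to flag 11, computing the mean per calendar month of each year, and the function names are all our assumptions.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Optional, Set, Tuple


def monthly_mean_at(
    records: Iterable[Tuple[datetime, Optional[float]]],
) -> Dict[Tuple[int, int], float]:
    """Mean air temperature (deg C) per (year, month); None values are skipped."""
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for ts, at in records:
        if at is None:
            continue
        key = (ts.year, ts.month)
        sums[key] += at
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def at_flags(ts: datetime, ghi: float, at: float, toa: float,
             means: Dict[Tuple[int, int], float]) -> Set[int]:
    """Flags 10 and 11 for one daytime hour (assumed mapping, see lead-in).

    Note: with a negative monthly mean, 'half the mean' lies above the mean,
    so this rule is only meaningful for warm-climate monthly means.
    """
    mean = means[(ts.year, ts.month)]
    flags = set()
    if at > mean and ghi < 0.10 * toa:
        flags.add(10)  # warm hour with very low GHI (e.g., shading, dirt)
    if at < 0.5 * mean and ghi > 0.35 * toa:
        flags.add(11)  # unusually cold hour with high GHI
    return flags


# Usage: two July days with a monthly mean of 35 deg C.
recs = ([(datetime(2016, 7, 1, h), 30.0) for h in range(24)]
        + [(datetime(2016, 7, 2, h), 40.0) for h in range(24)])
means = monthly_mean_at(recs)
```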

Combining Air Temperature and Sunshine Duration Test
Flag 12 rates were zero at all stations except one, which recorded 3.45%. The rate of flag 13 is below 0.5% at two stations and zero at the other five stations (Table 6).

Data Pass (Flag 1)
The result of flag 1, which indicates that a data value passed, is presented for four combinations of tests. First, according to all BSRN tests, the GHI data showed a flag 1 pass rate of 100% at all stations except Banmqan (99.75%) (Figures 6 and 7). Second, the flag 1 results according to the TOACs tests are quite different. The two lowest pass rates were recorded at Surdash and Banmqan stations (71% and 76%). Aliawa, Kalar and Kalarikon had pass rates below 90%, and all other stations ranged from 90% to 99%. Third, when the AT test was combined with the previous two tests, the pass rate was generally 1-2% lower than that of the TOACs tests (Figures 6 and 7).
Finally, the SD tests, combined with the AT, TOACs and BSRN tests, were applied only to the automatic stations. The results of SD with TOACs and BSRN are shown in Figure 6. The flag 1 pass rate was lowest at Halsho, Kalar and Bazian (76%, 81% and 84%, respectively), whereas the pass rates at the other four stations were above 90%. The result of the combined AT and SD tests with TOACs and BSRN is quite similar to that of TOACs alone (Figure 7).
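Flag 1 marks a value that raised no error flag in the chosen set of tests, so each pass rate depends on which flags are included. The sketch below shows that bookkeeping for per-hour sets of raised flags. The grouping of flags into test sets follows our reading of the text (BSRN: 2, 4, 6; TOACs: 3, 5, 7; AT: 10, 11; SD: 8, 9) and is an assumption, as is excluding missing values (NA) from the denominator; `TEST_SETS` and `pass_rate` are hypothetical names.

```python
from typing import Iterable, Optional, Set

# Assumed grouping of error flags per test set (our reading of the text).
TEST_SETS = {
    "BSRN": {2, 4, 6},
    "TOACs": {3, 5, 7},
    "AT": {10, 11},
    "SD": {8, 9},
}


def pass_rate(hourly_flags: Iterable[Optional[Set[int]]],
              tests: Iterable[str]) -> float:
    """Percentage of non-missing hours that pass (flag 1) the selected tests.

    hourly_flags: one entry per hour; None marks NA, otherwise the set of
    error flags raised for that hour.
    """
    selected = set().union(*(TEST_SETS[name] for name in tests))
    checked = passed = 0
    for flags in hourly_flags:
        if flags is None:
            continue  # NA hours are excluded from the denominator
        checked += 1
        if not (flags & selected):
            passed += 1
    return 100.0 * passed / checked if checked else float("nan")


hours = [set(), {7}, {10}, {8}, None, set()]
```

Adding a test set can only lower the pass rate, because a value then has more ways to fail; this is why adding AT lowers the TOACs-based rate by a small margin.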

Discussion
The application of the sets of QC tests to the GHI data of both station types in northeastern Iraq has shown high data quality. Data gaps are generally very limited at the tower stations, whereas all automatic stations have some missing values, at rates similar to those in previously published studies [16,44]. The QC results for GHI show slight differences between the BSRN and TOACs sets of tests. The results of the GHI quality checks based on SD vary with the station and with the chosen lower and upper limits of SD. Small error rates are detected by the AT tests at all stations. The errors flagged by a combination of SD and AT are generally low. Evaluating SD and AT with a consistency check supports the SD test but not the AT tests. The rate of AT test errors at the validation stations is also low.
The general reliability and error rates of GHI, SD and AT can be highlighted by plotting them against each other (Figure 3) or, for GHI, by fingerprint plots (Figure 4). These plots are useful for checking all data, but their results are clearer when they cover one to three years of data rather than longer periods. The figures represent hourly data, and when minute data are aggregated to hourly values, some errors become difficult to detect. Figure 3 reveals some equipment errors in the GHI data when compared with the AT time series. Figure 3b shows that both GHI and AT have errors; this type of error is not easily detected by comparing the two variables. In the fingerprint plot, some errors would be easy to see if the plot represented one- or ten-minute data, whereas only some errors are visible in hourly data (Figure 4).
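The effect of aggregation can be made concrete with a toy example: if the sensor is shaded during one ten-minute interval, the hourly mean drops by only a fraction of that deficit, which is easy to miss on an hourly plot. The readings below are invented purely for illustration, and `hourly_mean` is a hypothetical helper.

```python
def hourly_mean(ten_minute_values):
    """Average six ten-minute GHI readings (W/m2) into one hourly value."""
    if len(ten_minute_values) != 6:
        raise ValueError("expected six ten-minute readings per hour")
    return sum(ten_minute_values) / 6.0


# Illustrative values only: a steady 600 W/m2 hour, and the same hour with
# one ten-minute reading dropped to 120 W/m2 (e.g., brief shading).
clean = [600.0] * 6
shaded = [600.0, 600.0, 120.0, 600.0, 600.0, 600.0]

drop = hourly_mean(clean) - hourly_mean(shaded)
print(f"hourly drop: {drop:.0f} W/m2 ({100 * drop / hourly_mean(clean):.0f} %)")
# prints "hourly drop: 80 W/m2 (13 %)"
```

An 80% deficit in one ten-minute reading becomes a 13% deficit in the hourly value, well within normal hour-to-hour variability under broken cloud.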
The comparison of the two tests shows diverse results, both for each test as a whole and for the individual error flags (Tables 5 and 6). Previous studies have generally applied either BSRN or TOACs. The difference between the two tests is evident from Tables 5 and 6 and Figures 6 and 7, because the limits of the BSRN tests are more relaxed than those of the TOACs tests (Figure 5).
Errors at the upper physically possible limit (flags 2 and 3) are detected by both tests, but more observations are flagged as errors by the TOACs test than by the BSRN test (Tables 5 and 6). This is because the TOACs test is based on the TOA irradiance (Equation (1)), whereas the BSRN test is based on an increased TOA (Equation (2), Figure 5). The result of each test agrees with published studies [1,16,21,22,24,30] that applied each test separately.
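Assuming that Equation (1) is the usual TOA irradiance on a horizontal plane and Equation (2) the standard BSRN physically possible maximum for GHI (neither equation is reproduced in this section), a worked value shows how far apart the two upper limits are. Here S_a is the solar constant adjusted for the Earth-Sun distance, taken as 1367 W/m² for simplicity, and μ0 = cos θz = 0.5 (a solar zenith angle of 60°).

```latex
\begin{aligned}
\mathrm{TOA} &= S_a\,\mu_0 = 1367 \times 0.5 = 683.5\ \mathrm{W\,m^{-2}},\\
\mathrm{BSRN}_{\max} &= S_a \times 1.5\,\mu_0^{1.2} + 100
  = 1367 \times 1.5 \times 0.5^{1.2} + 100 \approx 993\ \mathrm{W\,m^{-2}}.
\end{aligned}
```

Under these assumptions the BSRN upper limit is about 45% higher than TOA at this sun position, so values between the two are flagged only by TOACs.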
The most important finding is that the lower limit of the BSRN test (flag 4) leaves errors, even significant systematic errors, undetected (Tables 5 and 6, Figures 3 and 4). This is because the lower limit is set to -4 W/m². In contrast, many data values are flagged by the TOACs test (flag 5) (Tables 5 and 6, Figure 3). This might be due to full or partial shading of the sensor, dirt on the sensor or sensor malfunction [11,16,26]. The lower limit of the BSRN test can be useful when checking day- and night-time data in cold regions. Physically, GHI cannot be negative, but negative values occur because the calibration calculation depends on the temperature difference between the dark and bright areas in the active part of the sensor, which arises at night [2,15].
Another notable aspect is that the BSRN error rates for the extremely rare limit test (flag 6) are zero, while flag 7 of the TOACs test, which targets the same condition, recorded high rates at most stations. This large difference between the two tests is related to their limits (Figure 5). Similarly high flag 7 error rates are reported in the literature [17,26,28]. Those studies explained that this happens at high latitudes, and even at mid-latitudes, when cloud-reflected radiation reaches the sensor so that GHI exceeds the clear-sky radiation. In our case study, the high error rates are explained by this reason and by the following ones. First, at Surdash and Banmqan stations, the errors might stem from operational issues, because most other tests also detect errors at these two stations. Second, in the other cases, the errors might be related to the clear-sky model, which is known to underestimate [45], especially at solar elevation angles of 16-20°.
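To check whether flag 7 hits cluster at low sun, as suggested above for solar elevations of 16-20°, the flagged hours can be binned by solar elevation. The sketch uses textbook approximations (Cooper's declination formula and an hour angle of 15° per hour from local solar noon) rather than a full solar position algorithm. The function names and the (day of year, solar hour) input format are hypothetical.

```python
import math
from typing import List, Tuple


def solar_elevation_deg(lat_deg: float, day_of_year: int, solar_hour: float) -> float:
    """Approximate solar elevation (degrees) from latitude, day and local solar time."""
    decl = math.radians(23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0)))
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))
    lat = math.radians(lat_deg)
    sin_alt = (math.sin(lat) * math.sin(decl)
               + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_alt))))


def share_in_low_sun_band(flag7_hours: List[Tuple[int, float]], lat_deg: float,
                          band: Tuple[float, float] = (16.0, 20.0)) -> float:
    """Fraction of flag-7 hours whose solar elevation lies inside `band` (degrees).

    flag7_hours: (day_of_year, solar_hour) pairs for hours that raised flag 7.
    """
    if not flag7_hours:
        return 0.0
    low, high = band
    inside = sum(1 for doy, hour in flag7_hours
                 if low <= solar_elevation_deg(lat_deg, doy, hour) <= high)
    return inside / len(flag7_hours)
```

A large share inside the band would point to clear-sky model underestimation at low sun rather than to sensor or operational problems.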
The SD test recorded two high flag 8 error rates, which are partly related to errors in the SD recorder. The other error rates indicate that the pyranometer needed to be checked for partial shading or dirt contamination, especially in April and September (Figure 4), when most of the flag 8 errors were detected (Table 6). For the lower limit of the SD test (flag 9), the high error rates at Bazian and Kalar stations during particular periods are related to systematic errors in SD itself, not in the GHI data (Figure 3a,d), as seen in the flag 13 result. This indicates that when another variable is used for QC of GHI data, that variable should itself be checked before the analysis. Some studies describe ways of testing an SD recorder [28,32]. Other studies have used SD to test GHI data without checking SD against another variable, such as AT, for more accurate results [22,24]. Therefore, we compared SD with the maximum possible diffuse TOA radiation and with AT (flags 12 and 13). In this way, both SD and GHI are tested (Table 6). Comparing flags 8 and 9 at the case study stations (Table 6) and at the validation stations (Table 7) shows good agreement; the error rates are consistent except where the errors are related to SD rather than to GHI. Moreover, flag 9 recorded low rates of at most 0.59%, and flag 14 is nearly zero in all validation cases. These findings generally support the use of SD as a consistency check.
Published uses of AT for QC of GHI are limited. The results here demonstrate good agreement between the lower and upper limits of the AT test and other tests, for example at Kalar, Dukan, Surdash and Mazne stations (Tables 5 and 6). This holds especially for the upper limit, which supports the results of other tests; those errors have the same causes as flag 5. For low AT and high GHI, the AT result also supports other tests at most tower stations and at the Kalar and Halabja automatic stations. When AT (flags 10 and 11) is compared with the consistency test (flag 14), the validation station results do not support using AT as a consistency test (Table 7). This is because the error rates detected by the AT tests at Carpentras and Petrolina are mainly related to local conditions, whereas the rates at Sede Boqer tend to be consistent. The low rates of dubious data values (nearly 2%) flagged by the AT tests in high-quality data tend to support the use of AT as a plausibility test (Table 7). This supports our earlier argument for the AT test, which is based on no more than 3% errors in a station dataset. Other studies have used the other components of SR [2,17,22,24] or only SD [22,24], whereas our model is based on AT, which is available at most stations.
Another interesting point of this study relates to the error rates identified by flags 12 and 13, which indicate that these errors are generally not in the GHI data but in the SD and AT variables.
The flag 1 rates, which indicate data passing the various sets of tests, reveal a large difference between the BSRN and TOACs tests. This indicates that the BSRN test is not adequate when only GHI is available, because even systematic errors are not detected (Figures 3, 4, 6 and 7). By contrast, the TOACs test detected several systematic and other errors, although some other observations were also flagged because of the test limits. The pass rates when the AT tests are added are nearly 2% lower than those based on TOACs, which indicates that some data values are flagged as dubious only by the AT test. The pass rates according to the SD test are low because of errors in the SD recorder, which were detected by combining SD with the AT test (Table 6, Figures 6 and 7). The results reveal that, owing to their chosen limits, the SD and AT tests detected some dubious data that had not been detected by any other test.
The tests have the following limitations. First, the general plot (Figure 3) is not always reliable, especially for a large number of observations. Second, the chosen limits for the SD and AT tests might not be optimal. For example, we set the upper limit of SD to 50 minutes, and a range of 30-50 minutes, in one hour, based on an assumption about why SD is high while GHI is low; there might be some questionable data below that limit. Previous studies have used only one condition as a lower limit [22,24]. Our limits for these two tests lie near the middle of the data for the upper limit and 7% away from the TOACs limit for the lower limit (Figure 5). This is important for identifying errors near that border and for complementing the other tests. Third, the monthly mean AT and half of the monthly mean AT are used to test GHI against 10% and 35% of TOA, which may also not be optimal. This is mainly because such situations can occur naturally, especially when AT is below half of its monthly mean and GHI is above 35% of TOA. These conditions need to be adapted when the AT test is used in other climate regions.
Because minimum and maximum AT are not recorded at many stations, and AT does not increase much within one hour [32], we used the mean AT in the test. Some studies have estimated GHI from minimum and maximum AT [7,46]. Although we validated SD and AT with the consistency test, which tests every single observation, the AT and SD tests have limited ranges defined by conditional arguments (Table 4), which means that they do not test every single observation in the time series. Generally, the AT results for testing GHI data agree well with other tests, and AT is useful for enhancing the SD test.

Conclusions
This study applied quality control approaches to flag hourly Global Horizontal Irradiance (GHI) data values of dubious quality, comparing the BSRN and TOACs tests for the upper and lower limits of physically possible and extremely rare observations. New quality checks based on Sunshine Duration (SD) and Air Temperature (AT), designed for stations where not all solar radiation components are available, detected further errors in the data at seven automatic stations and thirteen tower stations in semi-arid and Mediterranean climate regions. The new tests were validated with high-quality meteorological data from six stations in various regions around the globe with similar climate types. The results demonstrate large percentage differences between BSRN and TOACs for each subtest owing to their different limits. This indicates that BSRN cannot be used when only GHI is available, because most errors will not be detected. However, the rare limit of TOACs detected high error rates, which need to be addressed before deciding on the final QC results. SD can be used as a partial consistency test, which is supported by the validation results. In contrast, AT was not supported as a suitable consistency test. Nevertheless, AT can possibly be used as a general check of GHI data, especially when the components of solar radiation data are unavailable. The AT test detected very low error rates in high-quality data at the validation stations. Further research is required to compare BSRN and TOACs tests in other areas. Using several arguments

Figure 1. Climate regions, station types and their locations in the study area [36,37].

Figure 2. Flowchart of the proposed quality control.

Figure 3. Scatterplots of GHI in W/m² (first y-axis) and of SD in minutes per hour and AT in °C (second y-axis) for the full hourly time series at (a) Bazian, (b) Maydan, (c) Mazne, where SD is unavailable, and (d) Kalar stations.

Figure 6. Bar chart of flag 1 (passed data) according to each set of tests at the tower stations. 'Others' refers to the previous tests, such as BSRN and TOACs.

Figure 7. Bar chart of flag 1 (passed data) according to each set of tests at the automatic stations.

Table 1. Description of the processed ten-minute global horizontal irradiance data measured by a Kipp and Zonen CMP6 pyranometer at the tower stations.

Table 2. Description of the processed hourly global horizontal irradiance data measured by a Vaisala QMS101 pyranometer at the automatic stations.

Table 3. Description of the processed high-quality hourly GHI, DNI and Diffuse Horizontal Irradiance (DHI) data measured by Kipp and Zonen equipment at six stations elsewhere in the world, used for validation.

Table 4. Flag numbers and descriptions of the quality control approach.

Table 5. Ratio of missing values (NA) and error flags in the tower station data.

Table 6. Ratio of missing values (NA) and error flags at the automatic stations.

Table 7. Ratio of error flags at the validation stations.
