Development and Validation of Machine-Learning Clear-Sky Detection Method Using 1-Min Irradiance Data and Sky Imagers at a Polluted Suburban Site, Xianghe

: Clear-sky detection (CSD) is of critical importance in solar energy applications and surface radiative budget studies. Existing CSD methods are not sufﬁciently validated due to the lack of high-temporal resolution and long-term CSD ground observations, especially at polluted sites. Using ﬁve-year high resolution ground-based solar radiation data and visual inspected Total Sky Imager (TSI) measurements at polluted Xianghe, a suburban site, this study validated 17 existing CSD methods and developed a new CSD model based on a machine-learning algorithm (Random Forest: RF). The propagation of systematic errors from input data to the calculated global horizontal irradiance (GHI) is conﬁrmed with Mean Absolute Error (MAE) increased by 99.7% (from 20.00 to 39.93 W · m − 2 ). Through qualitative evaluation, the novel Bright-Sun method outperforms the other traditional CSD methods at Xianghe site, with high accuracy score 0.73 and 0.92 under clear and cloudy conditions, respectively. The RF CSD model developed by one-year irradiance and TSI data shows more robust performance, with clear/cloudy-sky accuracy score of 0.78/0.88. Overall, the Bright-Sun and RF CSD models perform satisfactorily at heavy polluted sites. Further analysis shows the RF CSD model built with only GHI-related parameters can still achieve a mean accuracy score of 0.81, which indicates RF CSD models have the potential in dealing with sites only providing GHI observations. (accessed on 10 August We collected all the available MERRA-2 and AERONET data over Xianghe from January 2005 to December 2009. The SURFRAD data were accessed from radiation/surfrad/ (accessed on 10 August 2021). We collected the irradiance and TSI data at GWN and PSU from January 2009 to December 2009. Speciﬁcally, we thank Jamie M. Bright, Hongrong Shi and Xinlei Han for their help. The authors would like to thank the anonymous reviewers and editors for their valuable suggestions.


Introduction
Surface irradiance is vital in many different fields such as agriculture, atmospheric science, building design and engineering [1]. Clouds are a major modulator of surface irradiance causing dramatic difference from a clear-sky counterpart, based on which clearsky detection (CSD) methods on 1 min irradiance time series are developed (i.e., Gueymard et al. [2] and references therein). These CSD methods typically adopt global horizontal irradiance (GHI), and sometimes direct normal irradiance (DNI) or diffuse horizontal irradiance (DHI) [3][4][5] to build linear classifiers in nature across the boundary of cloudy and clear skies. Thus, they seem to be impossible to obtain good discrimination results under complex conditions. Many criteria for the magnitude and temporal variability of surface irradiance are used for cloud screening. Nevertheless, such methods, especially the thresholds they used, are mostly suitable for a specific climate and then lack spatial generalization [6]. Recently, Bright et al. [7] proposed a novel and globally applicable CSD method (Bright-Sun), which does not suffer limitations of existing CSD methods (i.e., inability at high zenith angle, and over-conservative or over-relaxed criteria). However, Irradiance measurements from January 2005 to December 2009 are taken at Xianghe (39.75 • N, 116.95 • E), a baseline surface radiation network (BSRN) site in the North China Plain. The annual mean aerosol optical depth (AOD) at 550 nm is 0.63, and a large day-today variation is observed (the standard deviation of 0.56) at Xianghe [16]. GHI is measured by a Kipp and Zonen CM21 pyranometer. An Eppley NIP pyrheliometer and a B&W pyranometer, installed on a solar tracker, measure DNI and DHI, respectively. The 1 min measurements are quality controlled by using the BSRN recommended procedures and uploaded to the BSRN data archive. The measurement uncertainties are about 6%, 3%, and 6% for GHI, DNI, and DHI, respectively [17].
TSI-440 is a full color sky camera that uses a solid-state charge-coupled device to take images at a rate of one minute during daytime. A picture with 352 × 288 pixels of sky is obtained by looking downward onto a hemispherical mirror. The intense beam irradiance is blocked by a shadow-band that prevents flares and protects the imager optics. A redto-blue threshold is used to distinguish between clear and cloudy pixels from 24-bit JPEG format images. The cloud cover is calculated as the number of cloudy pixels divided by the total number of pixels within field of view of 160 • . TSI-based sky discrimination results are further improved by manual inspection of raw TSI images to correct apparent errors, such as TSI misclassification of thin cloud into clear and misclassification of heavy haze into cloud. There are 307,995 clear samples and 441,810 cloudy samples in this study [18]. Each 1 min irradiance measurement is labeled with sky condition according to the corrected TSI Remote Sens. 2021, 13, 3763 3 of 14 measurements. Labeled data points of 0.75 million are used for CSD methods validation and the RF CSD model development.
Several CSD methods consider the measured irradiance time series alongside the corresponding clear-sky irradiance estimates from a cloudless-sky irradiance model. All 1 min clear-sky GHI (GHI cs ), DNI (DNI cs ), and DHI (DHI cs ) are calculated with the high-performance REST2 model here [19]. Sun et al. [20] evaluated the performance of 75 cloudless-sky irradiance models against ground stations worldwide and found that the REST2v5 model ranks 2nd globally. The REST-2 required inputs are solar zenith angle (SZA), extraterrestrial irradiance (GHI 0 ), site pressure, precipitable water vapor (PWV), ozone amount, nitrogen dioxide amount, AOD at 550 nm, Ångström exponent (AE), and surface albedo [19].
Generally, NASA's Modern-ERA Retrospective Analysis for Research and Application, version 2 (MERRA-2) reanalysis data are used to drive the REST2 model. Here, we compare the inputs extracted from MERRA-2 against the Aerosol Robotic Network (AERONET) [21] in Figure 1. Figure 1a shows the validation result of daily MERRA-2 AOD with AERONET measurements at Xianghe. The coefficient of determination (R 2 ) of 371 matched pairs is 0.59, and the Root Mean Square Error (RMSE) is 0.47. Meanwhile, the Mean Absolute Error (MAE) is 0.27. MERRA-2 AOD is underestimated, with the slope of the linear regression at 0.56, especially in heavy polluted conditions (AOD > 0.5). The result indicates that the aerosol products in the MERRA-2 show large error at Xianghe [22]. In Figure 1b, we found a good correlation of MERRA-2 and AERONET PWV with the values of R 2 , RMSE, MAE of 0.97, 0.30 and 0.19. The slope of linear fit indicates a slight overestimation of MERRA-2 reanalysis (1.06). According to the total 1006 matched pairs (Figure 1c), the R 2 of AE achieved at 0.23, the RMSE is 0.31, and MAE is 0.25, respectively. Nonetheless, the performance of AE is not as well as those of AOD and PWV. The linear regression shows that the MERRA-2 AE is overestimated for low AE. the total number of pixels within field of view of 160°. TSI-based sky discrimination results are further improved by manual inspection of raw TSI images to correct apparent errors, such as TSI misclassification of thin cloud into clear and misclassification of heavy haze into cloud. There are 307,995 clear samples and 441,810 cloudy samples in this study [18]. Each 1 min irradiance measurement is labeled with sky condition according to the corrected TSI measurements. Labeled data points of 0.75 million are used for CSD methods validation and the RF CSD model development.
Generally, NASA's Modern-ERA Retrospective Analysis for Research and Application, version 2 (MERRA-2) reanalysis data are used to drive the REST2 model. Here, we compare the inputs extracted from MERRA-2 against the Aerosol Robotic Network (AER-ONET) [21] in Figure 1. Figure (Figure 1c), the R 2 of AE achieved at 0.23, the RMSE is 0.31, and MAE is 0.25, respectively. Nonetheless, the performance of AE is not as well as those of AOD and PWV. The linear regression shows that the MERRA-2 AE is overestimated for low AE.   (Figure 2a), while REST2-calculated GHIcs by MERRA-2 data performs as insufficiently as that by AERONET (Figure 2b), evidenced by a higher MAE (33.93 W·m −2 ) and RMSE (43.94 W·m −2 ). The results confirm the propagation of errors from input data to the calculated cloudless-sky irradiance, which may also influence the judgement of ''clear'' periods in conventional CSD methods [7]. Therefore, some required inputs, namely AOD at 550 nm, PWV, and AE are extracted from AERONET, and others from MERRA-2 reanalysis.  Figure 2 shows the comparison of calculated GHI cs with measured clear-sky GHI. The REST2-calculated GHI cs using AERONET data has a good agreement with measured clear-sky GHI with MAE of 20.00 W·m −2 , and RMSE of 26.39 W·m −2 (Figure 2a), while REST2-calculated GHI cs by MERRA-2 data performs as insufficiently as that by AERONET (Figure 2b), evidenced by a higher MAE (33.93 W·m −2 ) and RMSE (43.94 W·m −2 ). The results confirm the propagation of errors from input data to the calculated cloudless-sky irradiance, which may also influence the judgement of "clear" periods in conventional CSD methods [7]. Therefore, some required inputs, namely AOD at 550 nm, PWV, and AE are extracted from AERONET, and others from MERRA-2 reanalysis. Remote Sens. 2021, 13, x FOR PEER REVIEW 4 of 15

Conventional CSD Methods
The conventional CSD methods [1] can be separated into two broad categories according to the data they adopt, i.e., CSDsky (the detection of clear-sky without any clouds in sky dome) in which GHI and/or DHI are used, and CSDsun in which only DNI is used to detect clear sky where the sun is not obscured by cloud. CSDsun is obviously less stringent than CSDsky. Only CSDsky methods are used for comparison, since the aim of this study is to detect clear skies with no cloud (cloudless sky). 17 CSDsky methods are briefly introduced, whereas detailed information can be found in references [1] and [7].
Some of the quantities used by the CSDsky methods appear in Appendix A and are explained as follows. Polo et al. [23] used a 2 × 2 matrix consisting of the ratio of the covariance and standard deviation of GHI and GHIcs. Clear skies are determined if the determinant of that matrix is lower than 0.005. The method was modified by Alia-Martinez et al. [24] who introduced an iteration procedure to select potential clear-sky GHI measurements. Clear skies require that Kc values exceed 0.80 [25], which tends to misclassify passing intermittent clouds as clear skies [7]. Batlles et al. [26] suggested Kt < Ktt and Kd > Kdt for the detection of clear skies. Perez et al. [27] detected clear periods with ε > 6.2. Reno and Hansen [28] recommended the following tests: the absolute mean and maximum differences in GHI and GHIcs are < 75 W·m −2 ; ΔL between GHI and GHIcs is between −5 and 10; the standard deviation of change rate of GHI should be < 0.005; the maximum difference between temporal variations of GHI and GHIcs is < 8 W·m −2 . Ellis et al. [29] optimized this method and made it applicable to GHI time series with varying time steps (1-30 min). Ineichen [30] compared the magnitude and temporal variability of DNI and GHI0 to detect clear skies. Kt' < 0.65 is used for discrimination of clear sky [4]. This method was modified by adding three extra constraints: Kt < 0.82, the stability of Kt within three hours < 0.01, and AOD < 0.5. Differences in the mean and maximum value, the irradiance increment, the standard deviation and the maximum of irradiance increment of GHI versus GHIcs as well as DNI versus DNIcs in a 10 min window are used in Inman et al. [31]. Lefevre et al. [5] determined clear skies by Kd < 0.3 and the standard deviation of Kt' within 90 min < 0.02. Xie and Liu [32] derived cloud fraction from GHI and DNI to identify clear sky with cloud fraction of zero. Long and Ackerman [33] suggested to normalize GHI and DHI by clear sky counterparts that are functions of the cosine of SZA (μ), i.e., (a × μ b ). Clear skies are detected if GHI and DHI measurements satisfy following criteria: 1) GHI < 1250 × μ 1.2 and GHI > 1000×μ 1.2 for SZA < 78.5°, or GHI > 900 × μ 1.2 for SZA > 78.5°; 2) DHI < 150 × μ 0.5 ; 3) GHI(t)-GHI(t-1) should fall within a specified μ-dependent range of concomitant GHI0 difference; 4) the standard deviation of normalized DHI differences between adjacent minutes within a 11 min window should be lower than 5. Garcia et al. [34] revised this method by introducing a function of parameter b to AOD. Zhang et al. [35] designed a

Conventional CSD Methods
The conventional CSD methods [1] can be separated into two broad categories according to the data they adopt, i.e., CSD sky (the detection of clear-sky without any clouds in sky dome) in which GHI and/or DHI are used, and CSD sun in which only DNI is used to detect clear sky where the sun is not obscured by cloud. CSD sun is obviously less stringent than CSD sky . Only CSD sky methods are used for comparison, since the aim of this study is to detect clear skies with no cloud (cloudless sky). 17 CSD sky methods are briefly introduced, whereas detailed information can be found in references [1,7].
Some of the quantities used by the CSD sky methods appear in Appendix A and are explained as follows. Polo et al. [23] used a 2 × 2 matrix consisting of the ratio of the covariance and standard deviation of GHI and GHI cs . Clear skies are determined if the determinant of that matrix is lower than 0.005. The method was modified by Alia-Martinez et al. [24] who introduced an iteration procedure to select potential clear-sky GHI measurements. Clear skies require that K c values exceed 0.80 [25], which tends to misclassify passing intermittent clouds as clear skies [7]. Batlles et al. [26] suggested K t < K tt and K d > K dt for the detection of clear skies. Perez et al. [27] detected clear periods with ε > 6.2. Reno and Hansen [28] recommended the following tests: the absolute mean and maximum differences in GHI and GHI cs are < 75 W·m −2 ; ∆L between GHI and GHI cs is between −5 and 10; the standard deviation of change rate of GHI should be < 0.005; the maximum difference between temporal variations of GHI and GHI cs is < 8 W·m −2 . Ellis et al. [29] optimized this method and made it applicable to GHI time series with varying time steps (1-30 min). Ineichen [30] compared the magnitude and temporal variability of DNI and GHI 0 to detect clear skies. K t ' < 0.65 is used for discrimination of clear sky [4]. This method was modified by adding three extra constraints: K t < 0.82, the stability of K t within three hours < 0.01, and AOD < 0.5. Differences in the mean and maximum value, the irradiance increment, the standard deviation and the maximum of irradiance increment of GHI versus GHI cs as well as DNI versus DNI cs in a 10 min window are used in Inman et al. [31]. Lefevre et al. [5] determined clear skies by K d < 0.3 and the standard deviation of K t ' within 90 min < 0.02. Xie and Liu [32] derived cloud fraction from GHI and DNI to identify clear sky with cloud fraction of zero. Long and Ackerman [33] suggested to normalize GHI and DHI by clear sky counterparts that are functions of the cosine of SZA (µ), i.e., (a × µ b ). Clear skies are detected if GHI and DHI measurements satisfy following criteria: (1) GHI < 1250 × µ 1.2 and GHI > 1000 × µ 1.2 for SZA < 78.5 • , or GHI > 900 × µ 1.2 for SZA > 78.5 • ; (2) DHI < 150 × µ 0.5 ; (3) GHI(t)-GHI(t-1) should fall within a specified µdependent range of concomitant GHI 0 difference; (4) the standard deviation of normalized DHI differences between adjacent minutes within a 11 min window should be lower than 5. Garcia et al. [34] revised this method by introducing a function of parameter b to AOD. Zhang et al. [35] designed a CSD sky method based on incremental differences in both GHI and GHI cs . The tolerance of absolute differences between the measured and clear-sky increments within a 30 min period should be 0.1.
Considering clouds may impose negligible impact on GHI whilst significantly influencing DHI, Bright et al. [7] propose a CSD method (Bright-Sun) exhibits extra discretization power by including analysis on DHI. The Bright-Sun CSD method consists of three stages: firstly, the input clear-sky irradiance curve is optimized, secondly, a tri-component (GHI, DNI and DHI) multi-criteria analysis is performed, and finally a duration filter is applied. Through qualitative evaluation, the Bright-Sun method suffers less limitations of the existing CSD methods, and presents superior and more consistent global performance than existing CSD methods in five distinct stations.
In total, 7 methods out of 17 conventional CSD methods, namely Batlles, Garcia, Inceichen16, Lefevre, Long, Perez, and Bright-Sun utilize DHI components. Eight methods (AliaMartinez, Ellis, Inman, Polo, Quesada, Reno, and Zhang) use GHI cs , two methods adopt both GHI cs and DNI cs to detect cloudless conditions (Inman and Xie), and only Bright-Sun uses both GHI cs and DHI cs .

Machine-Learning Methods
RF is an ensemble learning algorithm using the concept of bagging. It has similar principle with existing CSD methods, which constructs a set of decision rules by thresholds of predictor variables. RF receives a number of decision tree classifiers from sub-samples of the data and combines them. It is advantageous to build classifiers as this reduces the variance of the classification and avoids over-fitting. The accuracy is assessed by the residual mean square of the out-of-bag data. The prediction process integrates the results of all regression trees [36]. A critical feature of RF is that the feature importance can be evaluated by the impurity, which may make the learning procedure simpler and faster. Therefore, a RF model is developed to classify clear and cloudy skies.

Model Construction and Sensitivity Analysis
The RF CSD model is implemented on the Python platform using the Ensemble module in the scikit-learn toolbox [37].

Choice of Input Features
As illustrated in Section 2.2, there are numbers of the quantities used by the CSD sky methods to classify clear and cloudy skies. Though ML specializes in big data with multiply features, huge consumption of computer resources should also be considered. Meanwhile, accurate calculation of clear-sky irradiance needs many parameters; however, many ground sites may provide only irradiance measurements due to the limited equipment. To select the input feature with more generality and accelerate computing, µ and four GHI-related features, including K d , K t , GHI difference in adjacent minutes (∆GHI), and the standard deviation of GHI within 10 min period (Std10), are used to detect clear and cloudy skies. It should be noted that DHI as an early warning signal for potential clouds is a good index to classify sky conditions [7]. Therefore, we select K d as an agent of DHI ( Figure 3). Generally, K d and K t are the sign of the influence of atmosphere and cloud on solar radiation, and ∆GHI and Std10 show the temporal variation of GHI. The histogram of input features under clear and cloudy conditions in 2005 is shown in Figure 3. All the variables are normalized to the range of 0-1 by Equation (1); x represents input features.

Model Construction
The flowchart of developing a RF CSD model in this study is shown in Figure 4. The dataset is divided into two groups: the data in 2005 (one year) for training and 2006-2009 (four years) for testing. In our case, the number of training and testing records are 181,710 and 568,095, respectively. The grid search and 10-fold cross validation are applied in hyperparameter optimization. The 10-fold cross validation splits total training dataset into k (k = 10) consecutive folds, and each fold is then used once as a validation while the k -1 remaining folds form the training set. The model with optimal hyperparameters is thereafter applied to the testing dataset to evaluate model performance. The thresholds, intervals and optimums of hyperparameters for the RF model are shown in Table 1. The performance metric used in this study is accuracy score, which is defined as the number of correct predictions divides total predictions. The mean accuracy score in 10-fold validation ('best_score_') of the optimal model is 0.81.

Model Construction
The flowchart of developing a RF CSD model in this study is shown in Figure 4. The dataset is divided into two groups: the data in 2005 (one year) for training and 2006-2009 (four years) for testing. In our case, the number of training and testing records are 181,710 and 568,095, respectively. The grid search and 10-fold cross validation are applied in hyperparameter optimization. The 10-fold cross validation splits total training dataset into k (k = 10) consecutive folds, and each fold is then used once as a validation while the k − 1 remaining folds form the training set. The model with optimal hyperparameters is thereafter applied to the testing dataset to evaluate model performance. The thresholds, intervals and optimums of hyperparameters for the RF model are shown in Table 1. The performance metric used in this study is accuracy score, which is defined as the number of correct predictions divides total predictions. The mean accuracy score in 10-fold validation ('best_score_') of the optimal model is 0.81. The RF CSD model also provides the importance of features. Compared to other inputs, K d , K t and Std10 show relatively higher importance in an RF model, with the magnitude of 0.38, 0.14, and 0.32, respectively (Table 2). After determining the optimal hyperparameters, the optimal RF model is used to detect clear sky in the testing dataset.  remaining folds form the training set. The model with optimal hyperpar after applied to the testing dataset to evaluate model performance. The t vals and optimums of hyperparameters for the RF model are shown in T formance metric used in this study is accuracy score, which is defined a correct predictions divides total predictions. The mean accuracy score in tion ('best_score_') of the optimal model is 0.81.

Results
The performance of 17 CSD sky methods (including Bright-Sun) and the RF CSD model is presented in Figure 5. Regarding to the conventional CSD sky methods except Bright-Sun, higher clear-sky accuracy score often associates with lower cloudy-sky accuracy score, and vice versa. For instance, Quesada works very well to detect clear skies (almost all clear skies are properly detected) while the performance of detection of cloudy skies is barely satisfactory (cloudy accuracy score is only about 0.2). This is associated with loose criteria of these methods for detecting clear skies which misclassifies cloudy samples into clear skies (16 January 2009 in Figure 6). On the other hand, Long and Perez adopt very strict criteria for clear sky detection, which naturally excludes most clear sky samples (the clear-sky accuracy score < 0.1), but a high cloudy-sky accuracy score (~1.0) is achieved. In  Figure 6); however, it has bad performance under polluted conditions (15 January 2009 in Figure 6). It seems difficult for these conventional methods to achieve good performance under both clear and cloudy skies, especially at heavy polluted sites such as Xianghe.
In qualitative analysis, Bright-Sun provides the best performance among the 17 CSD sky methods, whose accuracy score are 0.73 (clear-sky) and 0.92 (cloudy-sky), respectively. The accuracy score of Bright-Sun is higher than those of methods with relatively high accuracy score under both clear and cloudy conditions, i.e., Ineichen06 approximate clear-and cloudy-sky accuracy score (0.78 and 0.88), the RF CSD model is more "balanced" compared to Bright-Sun. Figure 7 evaluates the frequency distribution of visual inspected CSD results (clear and cloudy) and accuracy score of Bright-Sun and RF CSD models under different AOD backgrounds. The clear-sky accuracy score of Bright-Sun decreases (from 0.85 to 0.60) with the increase of AOD (from 0 < AOD ≤ 0.2 to 0.5 < AOD), while the cloudy-sky accuracy score is generally high (about 0.9), and slightly increases (from 0.90 to 0.93) with the increase of AOD. Although Bright-Sun has a relatively low clear-sky accuracy score (0.6) under heavy polluted conditions, it is still a reliable CSD method with a high mean accuracy score in polluted areas.
The clear accuracy score of the RF CSD model is higher than that of Bright-Sun under clean conditions (0 < AOD ≤ 0.5) with over 0.91, and slightly lower than that of Bright-Sun under heavy polluted conditions (0.5 < AOD) with only 0.57. Meanwhile the cloudy accuracy score of RF CSD model increases significantly (from 0.60 to 0.91) with the increase of AOD (from 0 < AOD ≤ 0.2 to 0.5 < AOD), which is lower than that of Bright-Sun. In general, the RF CSD model can also be used to obtain high-accuracy CSD results.  Figure 5). The mean accuracy score of the RF CSD model (0.84) is about equal to that of Bright-Sun (0.84). Considering the relatively approximate clear-and cloudy-sky accuracy score (0.78 and 0.88), the RF CSD model is more "balanced" compared to Bright-Sun. Figure 7 evaluates the frequency distribution of visual inspected CSD results (clear and cloudy) and accuracy score of Bright-Sun and RF CSD models under different AOD backgrounds. The clear-sky accuracy score of Bright-Sun decreases (from 0.85 to 0.60) with the increase of AOD (from 0 < AOD ≤ 0.2 to 0.5 < AOD), while the cloudy-sky accuracy score is generally high (about 0.9), and slightly increases (from 0.90 to 0.93) with the increase of AOD. Although Bright-Sun has a relatively low clear-sky accuracy score (0.6) under heavy polluted conditions, it is still a reliable CSD method with a high mean accuracy score in polluted areas.
The clear accuracy score of the RF CSD model is higher than that of Bright-Sun under

Inuput Features
Since partial sites may only be equipped with GHI observations, we further studied the performance of the RF CSD model when DHI is not considered as an input feature. The optimal parameters and mean accuracy score of the RF CSD model without considering kd are shown in Table 3. The mean accuracy score of the RF CSD model without kd (0.81) is lower than that of the model with DHI-related parameters, but better than CSDsky methods except Bright-Sun. The result shows that the RF CSD model can achieve over a 0.8 accuracy score with only GHI components as inputs.

Training Set Size
As shown in Figure 7, the accuracy score of RF CSD model has similar pattern with the frequency distribution of AOD, which implies the sample size may be an important factor for RF algorithm. For looking for the "least" training set size, it is useful to compare the accuracy score of models trained by various length data with same testing set. We also apply 10-fold cross validation to perform the hyperparameter optimization on 1-month (January), 3-months (January to March), 6-months (January to June), and 9-months (January to September) training set, and optimums are shown in Table 4. The mean accuracy score of models on same testing set are quite close. The mean accuracy score increases slightly from 0.83 to 0.84 between the model trained by 3 months data and the model trained by 1 year data. In other words, although the accuracy score increases along with the training set size, 3-month ground-truth CSD data is enough to develop a RF CSD model with relatively high accuracy score.

Inuput Features
Since partial sites may only be equipped with GHI observations, we further studied the performance of the RF CSD model when DHI is not considered as an input feature. The optimal parameters and mean accuracy score of the RF CSD model without considering k d are shown in Table 3. The mean accuracy score of the RF CSD model without k d (0.81) is lower than that of the model with DHI-related parameters, but better than CSD sky methods except Bright-Sun. The result shows that the RF CSD model can achieve over a 0.8 accuracy score with only GHI components as inputs.

Training Set Size
As shown in Figure 7, the accuracy score of RF CSD model has similar pattern with the frequency distribution of AOD, which implies the sample size may be an important factor for RF algorithm. For looking for the "least" training set size, it is useful to compare the accuracy score of models trained by various length data with same testing set. We also apply 10-fold cross validation to perform the hyperparameter optimization on 1-month (January), 3-months (January to March), 6-months (January to June), and 9-months (January to September) training set, and optimums are shown in Table 4. The mean accuracy score of models on same testing set are quite close. The mean accuracy score increases slightly from 0.83 to 0.84 between the model trained by 3 months data and the model trained by 1 year data. In other words, although the accuracy score increases along with the training set size, 3-month ground-truth CSD data is enough to develop a RF CSD model with relatively high accuracy score.

Validation over SURFRAD Network
In order to check the validity of the RF CSD model at the other sites, here we select 1-year (2009) radiation and TSI data from two SURFRAD sites (Goodwin Creek, Mississippi (GWN), and Penn. State Univ., Pennsylvania: PSU). The dataset in GWN and PSU is divided into two groups: the data from January to March (3 months, same length as Section 5.2) for training and from April to December (9 months) for testing. With the exactly hyperparameter optimization strategies (grid search and 10-fold cross validation), optimal models of these two sites were obtained, separately. Table 5 provides a compilation of RF CSD models trained and tested using data from GWN and PSU. It consists of four classes: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). TP means that the models correctly identify a state as clear, whereas TN means that a state is correctly identified as cloudy. FP refer to a clear sky misclassified as cloudy sky and FN is a state identified as clear by the RF CSD model while it is actually cloudy. The high mean accuracy score is achieved at both GWN and PSU (0.89 and 0.97). Therefore, training a RF CSD model is considered effective in CSD problems.

Conclusions
This study developed a RF CSD model to detect clear skies using one-year minutely surfaced irradiance data. The performance of the newly developed model is compared to the existing CSD sky methods, especially the novel Bright-Sun method, at Xianghe, a heavy polluted site in China. Major conclusions are as follows.
The propagation of systematically errors from input data (using MERRA-2 instead of AERONET) to the calculated GHI cs is impressive with MAE increased by 99.7% (from 20.00 to 39.93 W·m −2 ). With most existing CSD sky methods it is difficult to obtain high accuracy scores under both clear and cloudy conditions at heavily polluted sites such as Xianghe. Some methods work very well under clear skies but not under cloudy conditions because they adopt loose criteria for clear sky detection; for instance, Quesada's clear-sky accuracy score is almost 1, but the cloudy-sky accuracy score is only about 0.2. On the contrary, some other methods, such as Long and Perez, tell a different story because very strict criteria are adopted to detect clear skies, and their clear-sky accuracy score are under 0.1.
Bright-Sun provides better performance than the other 16 CSD sky methods, whose accuracy score is 0.73 (clear-sky) and 0.92 (cloudy-sky). The Bright-Sun integrated many concepts from the existing methods. It requires inputs of µ, GHI, DHI, GHI cs , DHI cs and the local standard time (LST). Though the Bright-Sun model would perform optimization of GHI cs and DHI cs on a day-by-day basis, the calculation of these two variables still requires relatively tedious data preparation, e.g., the number of required inputs of REST2 is nine. The RF CSD model demonstrates similar mean accuracy scores as that of Bright-Sun (0.84), but they are more "balanced" under clear and cloudy conditions.
The results in this study imply the applicability of the Bright-Sun CSD method in polluted sites and the capacity of the ML technique to obtain reliable CSD results. Besides, the RF CSD model built with a three-month training dataset and only GHI-related information can still achieve a high accuracy score (0.83 and 0.81, respectively), which is better than conventional CSD sky methods except Bright-Sun. Additionally, training a RF CSD model at the other sites has been shown to be effective in solving CSD problems. Nonetheless, cautions should be taken, since the ML model is subject to some limitations, such as the requirement of a large number of samples (such as over three months in this study). Application of the RF CSD model to other regions with different climatology still needs further systematical tests.

Data Availability Statement:
The data used in this study are available on request from the corresponding author.