Optimal Solar Zenith Angle Deﬁnition for Combined Landsat-8 and Sentinel-2A/2B Data Angular Normalization Using Machine Learning Methods

: Data from Landsat-8 and Sentinel-2A/2B are often combined for terrestrial monitoring because of their similar spectral bands. The bidirectional reﬂectance distribution function (BRDF) effect has been observed in both Landsat-8 and Sentinel-2A/2B reﬂectance data. However, there is currently no deﬁnition of solar zenith angle ( θ sz ) that is suitable for the normalization of the BRDF-adjusted reﬂectance from the three sensors’ combined data. This paper describes the use of four machine learning (ML) models to predict a global θ sz that is suitable for the normalization of bidirectional reﬂectance from the combined data in 2018. The observed θ sz collected globally, and the three locations in the Democratic Republic of Congo (26.622 ◦ E, 0.356 ◦ N), Texas in the USA (99.406 ◦ W 30.751 ◦ N), and Finland (25.194 ◦ E, 61.653 ◦ N), are chosen to compare the performance of the ML models. At a global scale, the ML models of Support Vector Regression (SVR), Multi-Layer Perception (MLP), and Gaussian Process Regression (GPR) exhibit comparably good performance to that of polynomial regression, considering center latitude as the input to predict the global θ sz . GPR achieves the best overall performance considering the center latitude and acquisition time as inputs, with a root mean square error (RMSE) of 1.390 ◦ , a mean absolute error (MAE) of 0.689 ◦ , and a coefﬁcient of determination ( R 2 ) of 0.994. SVR shows an RMSE of 1.396 ◦ , an MAE of 0.638 ◦ , and an R 2 of 0.994, following GPR. For a speciﬁc location, the SVR and GPR models have higher accuracy than the polynomial regression, with GPR exhibiting the best performance, when center latitude and acquisition time are considered as inputs. GPR is recommended for predicting the global θ sz using the three sensors’ combined data.


Introduction
The polar orbit satellite Landsat-8, launched by National Aeronautics and Space Administration (NASA) [1,2] and the Sentinel-2A and 2B satellites, launched by ESA [3], have similar spectral bands. Together, these three satellites provide 10-30 m moderate spatial resolution multi-spectral global coverage. The combination of the three satellites presents a new solution for global moderate-resolution landcover monitoring. Compared with a single satellite, the combination of the three satellites, taking advantage of their complementary revisit interval patterns, provides a 2.9-day global median average revisit interval [4,5]. This would benefit numerous remote sensing applications, such as deforestation [6], fire monitoring [7], agriculture dynamics [8], and ice velocity detection [9].
Both Landsat-8 and Sentinel-2A/2B have sun-synchronous polar orbits. Their view angles are ±10.3 • (Sentinel-2) and ±7.5 • (Landsat-8) from the nadir view when acquiring observations, resulting in non-Lambertian surface directional reflectance effects. The magnitude of these effects varies as a function of geometry (i.e., the relation between the sun, the object, and the sensor), and is usually described by the bidirectional reflectance distribution function (BRDF). The BRDF effects should be corrected to provide stable and 2B satellites orbit at an altitude of 786 km and have an incline of 98.62 • . The scanning angle for both the satellites is ±10.3 • and the swath width is 290 km. Both sensors revisit the same location every 10 days, giving a combined revisit interval of five days. Sentinel-2 has an equatorial crossing time of 10:30 [17].

Global θ sz Metadata Records for Landsat-8 and Sentinel-2A/2B
The Landsat-8 observation metadata records were bulk downloaded from the United States Geological Survey (USGS) Landsat archive metadata database [18]. Metadata records with approximate scene dimensions of 185 km × 180 km are defined in the Worldwide Reference System (WRS-2) [19]. The metadata records acquired from January 1-December 31, 2018, were extracted. Only the images acquired during the daytime were used in this study.
The Sentinel-2A/2B metadata records were downloaded from the USGS Earth explorer [20]. Each metadata record is defined in a fixed 109 km × 109 km tile projected in the Universal Transverse Mercator (UTM) map projection [3,21]. The Sentinel-2 projected tiles are stored as Standard Archive Format for Europe (SAFE) files [17] and cut along orbit swaths. All the metadata records acquired from 1 January-31 December, 2018, were extracted. Only the images acquired in descending orbits were used in this study.
For each Sentinel-2A/2B metadata record, the following information was used: "Acquisition Start Date," "Acquisition End Date," "Sun Zenith Angle Mean," "Center Latitude dec," and "Center Longitude dec." The Act for each Sentinel-2 metadata record was computed as the average of the "Acquisition Start Date" and "Acquisition End Date," and the "Sun Zenith Angle Mean" was defined as the θ sz value for each scene. "Center Latitude dec" and "Center Longitude dec" were used as the Lat and Lon for each metadata record, respectively.
There are duplicated observations for Landsat-8 and Sentinel-2A/2B during 2018 because the satellites have a repeat circle of 16 days (Landsat) and 10 days (Sentinel). Large amounts of redundant data pose a challenge to the training of ML models. A large redundant dataset requires more memory to fit the data, and more consumption time is required to train and to extract useful features from the data. An increase in the size of the redundant data volume would decrease the predictive ability and effectiveness of a machine learning algorithm. [22,23]. Thus, a global dataset for the study was established by selecting every tenth line of the Landsat-8 and Sentinel-2A/2B datasets collected during 2018. The three datasets were then merged together from top to bottom. In total, 361,826 metadata records were used in this study, with 25,737 from Landsat-8, 161,799 from Sentinel-2A, and 174,290 from Sentinel-2B. As θ sz varies smoothly in space and time [12], it constitutes a good representation for the global θ sz dataset for the combination of Landsat-8 and Sentinel-2A -2B in 2018. The number of metadata records for Sentinel-2 are larger than that of Landsat-8. The proposed ML model is first trained from the combination of the three sensors and then predicts θ sz ; thus, this, not even available, dataset would infect the derived θ sz values slightly and become more inclined to Sentinel-2 data.

Local θ sz Metadata Records for Landsat-8 and Sentinel-2A/2B
To examine the ML models' performance on local θ sz , observations collected by Landsat-8 and Sentinel N, over path/row 188/17 or 189/17 for Landsat-8 and tile  number T35NMA for Sentinel-2) (henceforth referred to, for brevity, as Congo, Texas, and  Finland). These three locations were selected as they have a different latitude/longitude but span a large latitudinal range. Locations with different latitudes are used to examine the ML models' prediction performance because θ sz will change with latitude for a given local time.

Methodology
Following the optimal θ sz definition criteria described in [12], Lat, Lat and Act, and Lat and Lon and Act were used as the input variables into different ML models to predict θ sz respectively, ensuring that θ sz can be modeled at any location and date to produce consistent BRDF normalized reflectance data for the combination of Landsat-8 and Sentinel-2A -2B. This requirement enables a large volume of data auto processing for the three sensors combined. For generating multi-temporal composite produce, a consistent θ sz definition is representative of the same day in the compositing period, if it can be modelled at any location and any date [24]. Statistical metrics were used to evaluate the fitting performance between the predicted θ sz values given by each model and the testing datasets, ensuring that the difference between the observed and normalized θ sz was minimized, because it will introduce unreliability into the semi-empirical BRDF model to normalize reflectance data at the θ sz that are different from the θ sz used to invert BRDF model parameters [11]. These statistical metrics are the root mean squared error (RMSE), mean absolute error (MAE), and the coefficient of determination (R 2 ).
The process flow of the θ sz retrieval based on ML models is illustrated in Figure 1, and the process can be summarized as preprocessing, training, and prediction. In the preprocessing step, the input variables, e.g., Lat, Lon, and Act, were normalized and scaled, in which values were shifted and rescaled to ensure that they ended up with a value between 0 and 1. After this the normalized variables were fitted for the ML models as inputs. In addition, the global θ sz datasets were split randomly into training and testing datasets, with 70% of the data for training and 30% for testing in each set. In the training process, ten-fold cross-validation [25] was used to optimize the hyper-parameters of each model and check the overall performance of each regression method. In the crossvalidation, the datasets were randomly and equally dispatched into k groups (k = 10). In each validation process, (k − 1) groups were used as the training instances and one group was treated as the test instance. An evaluation score was obtained for the model in each validation process, giving a total of k evaluation scores after looping for every test instance. Finally, the hyper-parameters for each model were optimized in the crossvalidation process using the average of the evaluation score. In the prediction process, the predicted θ sz was estimated by each of the ML models using the best hyper-parameters obtained from the cross-validation process. Particularly, polynomial regression and four ML models were used to predict θ sz for the normalization. Remote Sens. 2021, 13, x FOR PEER REVIEW 5 of 21

Polynomial Regression Model
For reference, the single variable Lat was used as the input to a 6 th -degree polynomial to retrieve constant latitudinal sz θ . In this study, the following 6 th -degree polynomial regression was used as a benchmark model to compare with ML models, for both the latitudinal fix sz θ model and the physical astronomical model, described in Section 1, built upon the following polynomial regression [12,13]: where x is the Lat for each metadata record and y is the predicted sz θ from the polynomial regression.

ML Regression Models
ML has been successfully applied in many regression [14,15,[26][27][28] and classification [29][30][31] tasks. Despite the good fitting performance of parametric models, i.e., polynomial regression, the approximation of these models is usually limited by the explicit relationships they rely on. Unlike parametric regression, ML models learn the parameters and functional form from the data. In this study, four ML models were used to predict sz θ for the combination of Landsat-8 and Sentinel-2A/2B.

Polynomial Regression Model
For reference, the single variable Lat was used as the input to a 6 th -degree polynomial to retrieve constant latitudinal θ sz . In this study, the following 6 th -degree polynomial regression was used as a benchmark model to compare with ML models, for both the latitudinal fix θ sz model and the physical astronomical model, described in Section 1, built upon the following polynomial regression [12,13]: where x is the Lat for each metadata record and y is the predicted θ sz from the polynomial regression.

ML Regression Models
ML has been successfully applied in many regression [14,15,[26][27][28] and classification [29][30][31] tasks. Despite the good fitting performance of parametric models, i.e., polynomial regression, the approximation of these models is usually limited by the explicit relationships they rely on. Unlike parametric regression, ML models learn the parameters and functional form from the data. In this study, four ML models were used to predict θ sz for the combination of Landsat-8 and Sentinel-2A/2B.

Regularized Linear Regression
RLR was used to predict θ sz from different input variables. The output results were assumed to be a linear sum according to the weighted input variables, x = [x 1 , x 2 , · · · , x n ] T , then, θ sz = x T w. The maximum likelihood of the output results was obtained by minimizing the squared errors of the weights. Additionally, the weights w = [w 1 , w 2 , · · · , w n ] T can be estimated by least squares minimization.

Support Vector Regression
SVR is based on support vector machines, which are nonlinear ML models for classification and regression [32,33]. In the SVR model, a linear model was used to estimate the output θ sz after transforming the input variables into a high-dimensional space. A kernel function was used to transform the input data to ensure that a linear model can be fitted. After mapping the training data into hyperspace, Vapnik's ε-insensitive cost function was used to optimize the parameters of the linear model, as follows: where an error e = y −ŷ within a ε-defined margin is ignored, while the influence of the samples outside this linear margin is penalized. The parameter C regularizes the trade-off between model complexity and error frequency. In this work, we use the SVR implementation of the Python package Scikit-Learn [34], which is based on LIBSVM [35].
There are the following two key parameters: the margin distance, defined by ε, is used to evaluate the sample data that fall outside some boundaries, and the C factor is used to balance the weighting between complexity and accuracy. As well as the ε distance and C factor, the kernel type must be selected; a radial basis function kernel was used. To optimize the hyperparameters, we considered values of 1, 10, and 100 for the C factor and values of 0.001, 0.01, 0.1, and 0.2 for the ε distance in the experiments.

Gaussian Process Regression
Gaussian process models [22,36] are probabilistic ML models designed for regression and classification problems. They offer powerful regression ability in vegetation biophysical parameter retrieval [37,38], and landcover classification [39]. A GPR model assumes that the observed θ sz is a function of the input variables using a joint Gaussian distribution of the available observations with zero mean and covariance matrix K, as follows: where k * is the covariance between the training inputs and the testing inputs, k * * is the autocovariance for the testing inputs, and K + σ 2 I n defines the covariance matrix of the noise for training inputs. The GPR model uses Bayes' principle to define a posterior distribution and likelihood function over the output predictedŷ given the new input and the training dataset. The mean value of the posterior distribution is used as the prediction, and the confidence intervals of the prediction are derived from the likelihood function. In our implementation, GPR implementation of the Python package Scikit-Learn was used to derive the predicted θ sz . When learning a GPR model, some parameters related to the covariance or kernel functions also need to be specified. The RBF kernel (Radial-basis function kernel) was chosen, which is a stationary squared exponential kernel. The hyperparameters to be tuned for RBF kernel include magnitude, characteristic length, and noise variance. Additionally, the maximum-likelihood method was applied for parameter tuning in this study. Compared with SVR, GPR not only gives a prediction mean for the new observation, but also provides a full probabilistic posterior distribution.

Multi-Layer Perception
Multi-layer perception network (MLP) is a classical type of feedforward artificial neural network and has been widely used in different fields of remote sensing [15,27]. There are the following three parts in the MLP: an input layer, a hidden layer, and an output layer. A non-linear activation function was used in each node of neuron in the hidden and output layers. Additionally, there can be multiple layers of neurons in each hidden layer. The backpropagation iterative optimization process is used to adjust the connection weight between the hidden layers of the neurons by minimizing the estimation error [40]. The MLP model used a representation of neurons to achieve highly precise estimates of the nonlinear relationship between the input variable and the output result. Since the network is fully connected, each node in one layer connects to every node in the next layer with a certain weight. Therefore, the output c of each neuron is as follows: where a i and w i are the input and weight for the neuron, respectively, b i is the bias of the neuron, and ϕ is the activation function used. In our implementation, MLP Regressor in the Python Scikit-Learn package was used. The parameters for MLP include the number of hidden layers, number of neurons for each hidden layer, optimizer, and activation function. The ReLU activation function was used, and the Adam optimizer was chosen. To find the optimum parameters, the number of hidden layers is tested with 1, 2 and 3, and the number of neurons for each hidden layer is tested with (200), (200, 100) and (200, 140, 70). Figure 2 shows the observed θ sz plotted against the Lat and Lon for the global data in 2018 by Landsat-8, Sentinel-2A, Sentinel-2B, and the three sensors combined. For all three sensors, the variation of θ sz with latitude is apparent. As can be seen in Figure 2, θ sz increases from the equator area to the polar regions. Compared with Sentinel-2, Landsat-8 acquires more observations around the two polar regions; this is because of the different satellite imagery acquisition strategies [17,41]. There are few observed θ sz collected over the ocean, an abrupt change for θ sz was observed in the region at about 30 • N, −30 • E. This is because there is an interval of a few months between the observations. Additionally, for each grid point in the graph, if there were more than one θ sz sensed, the first was chosen to plot.

Global θ sz Distribution and Variations for Landsat-8 and Sentinel-2A/2B
Across the whole year, θ sz varies because the satellite local overpass time changes with the latitude and because of seasonal changes in the position of the sun. Figure 3 illustrates θ sz as a function of Lat and Act for Landsat-8, Sentinel-2A, Sentinel-2B, and the three sensors combined in 2018. As both Landsat-8 and Sentinel-2 have a polar orbit, a similar θ sz variation with respect to the Lat and Act for Landsat-8 and Sentinel-2 can be seen from Figure 3. Table 1 summarizes the global mean average, median average, minimum, maximum, and standard deviation of θ sz for each sensor and the three sensors combined in 2018. Across the whole year, sz θ varies because the satellite local overpass time changes with the latitude and because of seasonal changes in the position of the sun. Figure 3 illustrates sz θ as a function of Lat and Act for Landsat-8, Sentinel-2A, Sentinel-2B, and the three sensors combined in 2018. As both Landsat-8 and Sentinel-2 have a polar orbit, a similar sz θ variation with respect to the Lat and Act for Landsat-8 and Sentinel-2 can be seen from Figure  3.       Figure 4 shows a boxplot of the observed θ sz distribution for each of the sensor and the three sensors combined. The dotted orange lines show the median average for each θ sz data set. It is clear that the medium of θ sz for Landat-8 is higher than for Sentinel-2 and the three sensors combined due to the different equatorial crossing times of Landsat-8 and Sentinel-2. The medium value of θ sz for Sentinel-2 and the three sensors combined are on the same level because the number of observed θ sz for Sentinel-2 accounts for a large proportion of the three sensors combined.
8, Sentinel-2 gives a lower sz θ , with a maximum, minimum, and mean of 83.38, 14.74, and 43.89° for Sentinel-2A and 88.87, 14.76, and 44.89° for Sentinel-2B. This is because the average equatorial crossing time of Landsat-8 is 30 min earlier than Sentinel-2 (10:00 for Landsat-8, 10:30 for Sentinel-2). There are slight differences between the sz θ of Sentinel-2A and Sentinel-2B because they have a phase delay of 180° between them [3]. The global mean average difference in sz θ between Landsat-8 and Sentinel-2A/2B is 6.09°/5.09°.    Table 2 summarizes the accuracy metrics, with respect to the observed sz θ for the three sensors combined, for all five methods when using different combinations of input variables. When only using one input variable (Lat or Act) to predict sz θ , the SVR and MLP models achieve a comparable performance to that of polynomial regression. The GPR model is slightly better than polynomial regression, with 2 R = 0.526, MAE = 10.590°,  Table 2 summarizes the accuracy metrics, with respect to the observed θ sz for the three sensors combined, for all five methods when using different combinations of input variables. When only using one input variable (Lat or Act) to predict θ sz , the SVR and MLP models achieve a comparable performance to that of polynomial regression. The GPR model is slightly better than polynomial regression, with R 2 = 0.526, MAE = 10.590 • , and RMSE = 12.463 • with Lat as the input variable, compared with values of 0.525, 10.597 • , and 12.473 • , respectively, for polynomial regression. When more than one variable is used for the fitting process, the nonlinear ML models exhibit a significant improvement over the accuracy of the linear RLR model, as can be seen for the combination of Lat and Act, and Lat and Lon and Act in Table 2. Considering Lat and Act as inputs, all three nonlinear ML models achieve acceptable results. Specifically, GPR gives the best results in terms of R 2 (0.994), MAE (0.689 • ), and RMSE (1.390 • ), followed by SVR with values of 0.994, 0.638 • , and 1.396 • , respectively. Note that using three variables as inputs (Lat and Lon and Act) does not improve the global-scale prediction results of the GPR and SVR models compared to the use of two input variables (Lat and Act). This is consistent with the results discussed in Section 4.1, as Lat and Act are the two main factors determining the variation of θ sz .  Figure 5 compares the regression performance of different models using Lat (left) and Act (right). The dotted points denote the data used for training (yellow) and testing (gray). The polynomial regression parameterized single input, colored in blue, is plotted as the reference line. The plot on the left shows a bowl-shaped observed θ sz against Lat for all three nonlinear ML models, representing the polar solar-synchronous orbit geometry. The RLR model (orange) is a straight line, which does not fit much of the data along the center latitude in the x-axis. The GPR, MLP, and SVR models exhibit a regular regression fitting line of θ sz against latitude, with GPR (red) and SVR (green) providing a better fit than MLP (purple). Some differences between GPR and SVR appear at latitudes above 60 • north and below 60 • south, where GPR gives a better fit to the θ sz in high-latitude areas, with an RMSE of 12.463 • (GPR) compared to 12.597 • (SVR). The gray area illustrates the GPR θ sz fitting for the 95% confidence interval. There are few data located around 60 • S for the observed θ sz , because around this latitude most parts of the earth are occupied by ocean and a very small proportion of land ( Figure 2). Both of the two ML models predicted a smooth regression fitting line in this area, reflecting the ML models' prediction ability when the data were sparse or missed. The right-hand plot compares the regression performance of the ML models against the polynomial regression for θ sz against Act. The RLR model (orange) still follows a straight line, whereas all three nonlinear ML models give a sinusoidal shape. The performance of these three models is quite similar, with an R 2 around 0.1, indicating a relatively weak relationship between θ sz and Act. Figure 6 shows the results of an importance test of the input variables with respect to the predicted θ sz for the four ML regression models. The importance of each input variable in a prediction was measured using the MAE, which was computed by leaving out each input variable in turn and performing a test, and then computing the average value. After each test, a stable variable importance was established. A higher MAE value indicates that the input variable that has been left out has more importance to the model. For the RLR model, all three input variables have nearly the same importance to the fitting model because of the linear form of the combination of input variables. For all three nonlinear ML models, Lat has a strong correlation to the predicted θ sz , as shown in Figure 6. Act has a moderate correlation to the fitted θ sz , while there is only a weak correlation between Lon and θ sz . The results agree with our physical intuition regarding how each of the input variables contributes to the predicted θ sz , as discussed in Section 4.1. Remote Sens. 2021, 13, x FOR PEER REVIEW 11 of 21 Comparison of the performance of the polynomial regression and ML models using the plotted data. Figure 6 shows the results of an importance test of the input variables with respect to the predicted sz θ for the four ML regression models. The importance of each input variable in a prediction was measured using the MAE, which was computed by leaving out each input variable in turn and performing a test, and then computing the average value. After each test, a stable variable importance was established. A higher MAE value indicates that the input variable that has been left out has more importance to the model. For the RLR model, all three input variables have nearly the same importance to the fitting model because of the linear form of the combination of input variables. For all three nonlinear ML models, Lat has a strong correlation to the predicted sz θ , as shown in Figure 6. Act has a moderate correlation to the fitted sz θ , while there is only a weak correlation between Lon and sz θ . The results agree with our physical intuition regarding how each of the input variables contributes to the predicted sz θ , as discussed in Section 4.1.   Comparison of the performance of the polynomial regression and ML models using the plotted data. Figure 6 shows the results of an importance test of the input variables with respect to the predicted sz θ for the four ML regression models. The importance of each input variable in a prediction was measured using the MAE, which was computed by leaving out each input variable in turn and performing a test, and then computing the average value. After each test, a stable variable importance was established. A higher MAE value indicates that the input variable that has been left out has more importance to the model. For the RLR model, all three input variables have nearly the same importance to the fitting model because of the linear form of the combination of input variables. For all three nonlinear ML models, Lat has a strong correlation to the predicted sz θ , as shown in Figure 6. Act has a moderate correlation to the fitted sz θ , while there is only a weak correlation between Lon and sz θ . The results agree with our physical intuition regarding how each of the input variables contributes to the predicted sz θ , as discussed in Section 4.1.   Figure 7 shows a scatterplot of the observed θ sz against the predicted θ sz for the four ML models, taking Lat and Act as the input variables on a global scale. The RLR model presents the most disperse scatterplot, implying that relatively large errors could occur if a linear model was used to derive θ sz . This is because complex nonlinear relationships exist between the input variables and θ sz . It can be observed that the three nonlinear ML models produce a distribution of data around the 1:1 line (gray line). All of them achieve MAE values below 1 • (SVR: 0.638 • , GPR: 0.689 • , and MLP: 0.873 • ), which is less than the global mean average difference of θ sz between Landsat-8/Sentinel-2A (6.09 • ) and Landsat-8/Sentinel-2B (5.09 • ), implying that the ML models have an excellent ability to fit this nonlinear complex relationship. There are some outliers when θ sz was larger than 70 • for all the SVR, GPR, MLP models. This is because when polar-orbiting Landsat-8 and Sentinel-2 transit over polar regions, the observed θ sz varied greatly over the area with same Lat but different Lon [4] and also because Lat and Act were used as inputs. We do not include Lon to predict θ sz in Figure 7.

Performance of ML Models for Global θ sz Prediction
ML models produce a distribution of data around the 1:1 line (gray line). All of them achieve MAE values below 1° (SVR: 0.638°, GPR: 0.689°, and MLP: 0.873°), which is less than the global mean average difference of sz θ between Landsat-8/Sentinel-2A (6.09°) and Landsat-8/Sentinel-2B (5.09°), implying that the ML models have an excellent ability to fit this nonlinear complex relationship. There are some outliers when sz θ was larger than 70° for all the SVR, GPR, MLP models. This is because when polar-orbiting Landsat-8 and Sentinel-2 transit over polar regions, the observed sz θ varied greatly over the area with same Lat but different Lon [4] and also because Lat and Act were used as inputs. We do not include Lon to predict sz θ in Figure 7.    Figure 8 shows the distribution of the predicted θ sz using Lat and Act as the input variables for different ML models. The RLR model appears in the form of a fitted plane. The nonlinear fitted models have a very similar distribution to the observed θ sz (see Figure 3), with smooth variations in space and time, indicating good θ sz predictions for the three sensors combined.
As all three nonlinear ML models achieve good accuracy in terms of estimating the observed θ sz for the three sensors combined at the global scale, the computation time and RAM consumption were also investigated. A desktop computer with an Intel Core i7-8700 3.20 GHz CPU and 16 GB RAM running Windows 10 was used as the computing environment. Table 3 summarizes the computation times and RAM consumption of the different models for θ sz regression using Lat and Act as inputs. Compared with the other three nonlinear models, RLR has the lowest runtime and memory consumption (1.36 s and 501.7 MB, respectively). When the regression models become more complex, the computation time and RAM consumption increase (1747.27 s and 5118.5 MB for the GPR model, 7090.03 s and 4999.9 MB for the SVR model). The nonlinear fitted models have a very similar distribution to the observed sz θ (see Figure 3), with smooth variations in space and time, indicating good sz θ predictions for the three sensors combined. As all three nonlinear ML models achieve good accuracy in terms of estimating the observed sz θ for the three sensors combined at the global scale, the computation time and RAM consumption were also investigated. A desktop computer with an Intel Core i7-8700 3.20 GHz CPU and 16 GB RAM running Windows 10 was used as the computing environment. Table 3 summarizes the computation times and RAM consumption of the different models for sz θ regression using Lat and Act as inputs. Compared with the other three nonlinear models, RLR has the lowest runtime and memory consumption (1.36 s and 501.7 MB, respectively). When the regression models become more complex, the computation time and RAM consumption increase (1747.27 s and 5118.5 MB for the GPR model, 7090.03 s and 4999.9 MB for the SVR model).

Performance of ML Models for Local θ sz Prediction
The left-hand side of Figures 9-11 shows θ sz from the three sensors combined plotted as a function of Act for 2018 over the northeast of the Congo (Figure 9), Texas ( Figure 10) and Finland (Figure 11). θ sz against Lat was not shown, as there are nearly no variations for observed Lat over the three selected locations through 2018.
The variations in θ sz from the three sensors combined appear to be sinusoidal for the Congo because the solar geometric position swings back and forth around the equator.

Performance of ML Models for Local sz θ Prediction
The left-hand side of Figures 9-11 shows sz θ from the three sensors combined plotted as a function of Act for 2018 over the northeast of the Congo (Figure 9), Texas ( Figure  10) and Finland ( Figure 11). sz θ against Lat was not shown, as there are nearly no variations for observed Lat over the three selected locations through 2018.
The variations in sz θ from the three sensors combined appear to be sinusoidal for the Congo because the solar geometric position swings back and forth around the equator. In 2018, the mean average sz θ values for Landsat-8 and Sentinel-2A/2B were 31.023, 26.977, and 27.204°, respectively. The maximum variation in sz θ over the Congo in 2018 for the three sensors combined was 17.006°, with a maximum of 35.666° observed on June 23 from Landsat-8 and a minimum of 18.660° observed on October 1 from Sentinel-2B.   Finland (Figure 11, left) shows a similar bowl-shaped sz θ variations to Texas, but spans even greater ranges due to the higher latitude. The mean average sz θ values for Landsat-8 and Sentinel-2A/2B were 60.344, 56.505, and 56.932°, respectively, over 2018. The maximum variation in sz θ for the three sensors combined over Finland in 2018 was 46.514°, with a maximum of 84.859° from Landsat-8 on December 11 and a minimum of 38.345° from Sentinel-2B on June 23. The right-hand sides of Figures 9-11 compare the regression performance of the different ML models for predicting sz θ , considering Act as the input variable for the Congo, Texas, and Finland, respectively. Polynomial regression is shown as a reference. For all cases, the nonlinear ML models fit the change in sz θ with respect to Act. The RLR model The right-hand sides of Figures 9-11 compare the regression performance of the different ML models for predicting θ sz , considering Act as the input variable for the Congo, Texas, and Finland, respectively. Polynomial regression is shown as a reference. For all cases, the nonlinear ML models fit the change in θ sz with respect to Act. The RLR model produces a straight line for both cases. The regression results of SVR, GPR, and MLP reflect the variation of θ sz in the three locations.
Tables 4-6 present summary statistics for the predicted θ sz considering the inputs of Act, Lat and Act, and Lat and Lon and Act given by the different ML models for the Congo, Texas and Finland in 2018, respectively. Considering Act as the input variable, all of the nonlinear ML models achieve better results than the RLR linear model. SVR and GPR obtain better results than polynomial regression in all cases, achieving RMSEs of 1.769 and 1.810 • for the Congo, 1.099 and 1.162 • for Texas, and 0.187 and 0.181 • for Finland, respectively, compared with 1.943 • (Congo), 1.228 • (Texas), and 0.207 • (Finland) for polynomial regression. Compared with polynomial regression, SVR and GPR reduced the RMSE by 9.0 and 6.8% for Congo, 10.5 and 5.4% for Texas and 9.7 and 12.6% for Finland, respectively. The major difference between SVR and GPR occurs when Lat and Act, and Lat and Lon and Act are considered for training. For the Congo, GPR achieves an RMSE of 0.987 • when Lat and Act are considered as inputs, compared with 1.279 • for SVR; similar differences appear at the other two locations. This demonstrates that GPR achieves better regression performance when Lat and Act are considered. Additionally, the same results were found when Lat and Lon and Act were considered as inputs.

Discussion
The combination of Landsat-8 and Sentinel-2A/2B has been widely used to monitor landcover use and for landcover mapping [42]. The BRDF effect has been demonstrated in both Landsat-8 and Sentinel-2 reflectance data [10]. Semi-empirical approaches have been advocated for the normalization of Landsat-8 and Sentinel-2 directional reflectance data to the nadir view. However, no suitable definition of θ sz has yet been presented for normalizing the combined Landsat-8 and Sentinel-2A/2B directional reflectance data to produce reliable time series. θ sz has the potential to change greatly over space and time because of the latitudinal variation of the local cross time and also because of seasonal changes in solar position. The annual reflectance variations caused by changes in θ sz can be as large as 0.053 for the red band and 0.065 for the NIR band at latitudes of 50 • [43]. This is nontrivial and constitutes a serious issue for remote sensing applications.
In this paper, Landsat-8 and Sentinel-2A/2B metadata records collected in 2018 were analyzed. Combining the three sensors, the minimum and maximum θ sz are 14.739 and 89.985 • , and the mean value is 44.802 • with a standard deviation of 18.250 • . The differences in the global mean average θ sz between Landsat-8 and Sentinel-2A and between Landsat-8 and Sentinel-2B are 6.09 and 5.09 • , respectively. These differences are caused by the 30minute difference in the mean equator overpass time (Landsat-8: 10:00, Sentinel-2: 10:30), which necessitates the normalization of θ sz for the three sensors to eliminate the effects of BRDF from θ sz variations.
The novelty of this research lies in that the four ML models were used to learn and optimize the functional forms of these models and then predict the values of θ sz for the combination of the three sensors. The results showed that the GPR model achieved a higher accuracy to estimate the θ sz compared to the polynomial and astronomical physical models. Further, research on the quantification of θ sz for the combination of Landsat-8 and Sentinel-2A -2B is lacking and defining a θ sz suitable for normalizing the BRDF time series given the data set is required.
The four ML models, i.e., RLR, SVR, GPR, and MLP, were selected to predict θ sz for a combination of these three sensors. In the RLR model, the predicted θ sz is assumed to be a linear weighted sum of the input variables; thus, the performance of the RLR model is limited by its simple function form. The SVR model is a regression version of the traditional support vector classification model, and it estimates θ sz values by optimizing the penalty function after delivering at a sparse solution. GPR is a probabilistic approximation to non-parametric data distribution regression models; in the prediction, both a predictive mean value and result variance can be learned. The MLP model is a basic type of an artificial neural network, it can fit any nonlinear relationship through a non-linear activation function and back-propagation iterative optimization. We noted that the advantage of the MLP model compared to SVR and GPR is not substantial in θ sz prediction. This is possible because that MLP-base model presents a better performance when the datasets are non-linear, complex, and redundant, etc., image and natural language data [44,45].
The optimal θ sz definition for normalizing combined Landsat-8 and Sentinel-2A/2B reflectance data was considered as the following criteria: θ sz can be modelled at any location and date, and the difference between the observed θ sz and θ sz for normalization are minimized. Lat, Lat and Act, and Lat and Lon and Act were used as the input variables into different ML models to predict θ sz , respectively. RMSE, MAE, and R 2 metrics were used to evaluate the regression accuracy of the different ML models. Instead of an astronomical physical model [12], which builds upon polynomial and uses physical knowledge and mathematical relationships to convert geometrical coordinates to θ sz , four ML models were used to fit the values of θ sz . The functional forms of these models were learned and optimized from the global θ sz data from all three sensors. The performance of the astronomical physical model depends on the estimation accuracy of polynomial regression. The research results showed that the GPR and SVR models are slightly better than the polynomial for global θ sz data, and an improvement was made at all of the three locations' θ sz datasets.
With Lat as the input variable, the nonlinear ML models achieved RMSEs of 12.597 • (SVR), 12.463 • (GPR), and 12.470 • (MLP) and R 2 values of 0.516 (SVR), 0.526 (GPR), and 0.525 (MLP) when comparing the predicted θ sz with the test θ sz at the global scale. This prediction accuracy is comparable to that of the reference polynomial regression, which achieved an RMSE of 12.473 • and an R 2 of 0.525, with GPR slightly better. SVR, GPR, and MLP achieved RMSEs of 1.396, 1.390, and 1.504 • and R 2 values of 0.994, 0.994, and 0.993, respectively, when considering Lat and Act as the input variables; these values compare favorably with the linear RLR model (an RMSE of 17.484 • and an R 2 of 0.067). There is little further improvement when Lat and Lon and Act are used as input variables. A relative importance test showed that Lat is most closely correlated with the variation of θ sz , followed by Act. Lon has a weak relationship with θ sz at a global scale. This is consistent with our physical knowledge that θ sz varies with the latitudinal variation of local cross time and the seasonal changes in solar position.
The performance of the ML models was further evaluated at specific locations (Congo, Texas, and Finland). In the case of Finland, the predicted θ sz against the Act gave RMSEs of 0.187 • (SVR), 0.181 • (GPR), and 0.230 • (MLP) and R 2 values of 1.000 (SVR), 1.000 (GPR), and 1.000 (MLP), all of which are better than the linear RLR values of 14.443 • (RMSE) and −0.039 (R 2 ). Compared with the precision statistics from the polynomial regression (0.207 • for RMSE and 1.000 for R 2 ), MLP achieved comparable accuracy, and SVR and GPR performed better, reducing the RMSE by 9.7% (SVR) and 12.6% (GPR), compared with the polynomial. For the Congo and Texas, the nonlinear models (SVR, GPR, and MLP) exhibited more favorable performances than the linear RLR model. Compared with the well-established polynomial model, MLP produced a performance comparable to that of polynomial regression, and SVR and GPR gave greater accuracy in all cases. These results indicate that, although polynomial regression performs well, it is constrained by its explicit and simple parametric relations. On the contrary, for ML models, functional parameters are learned and optimized from data; thus, an improved performance can be achieved, compared with polynomial regression, in the case of complex nonlinear relations. GPR performs better than SVR, especially when Lat and Act, and Lat and Lon and Act were considered as input variables.
The resource consumption of each of the ML models was also investigated. The resource consumption rises with the model complexity, with training times of 7090.03 s for SVM, 1747.27 s for GPR, and 2152.50 s for MLP, compared with 1.36 s for RLR. All of the nonlinear models have a greater computational cost than the linear model. However, once the learning model has been trained, the model can be stored for further use; this is a clear advantage of the ML models over physical models.

Conclusions
In this study, the variation in θ sz for the combination of Landsat-8, Sentinel-2A, and Sentinel-2B was quantified for the year 2018. Throughout the year, the minimum and maximum θ sz for the three sensors combined are 14.739 and 89.985 • , respectively, giving a mean value of 44.802 • with a standard deviation of 18.250 • . As Landsat-8 crosses the equator 30 min before Sentinel-2, the global mean average θ sz difference is 6.09 • (between Landsat-8 and Sentinel-2A) and 5.09 • (between Landsat-8 and Sentinel-2B).
The four ML models were explored to train the forms and to optimize the model parameters and then used to estimate the values of θ sz for a combination of three sensors. The ML models exhibit a performance comparable to that of polynomial regression when Lat is used to predict the global θ sz . When Lat and Act were considered as the inputs, the nonlinear ML models achieve an obvious improvement in prediction accuracy compared with the linear model. The GPR model achieved the best overall model performance when using Lat and Act were used as the inputs, with an RMSE of 1.390 • , MAE of 0.689 • , and R 2 of 0.994, followed by the SVR model with an RMSE of 1.396 • , MAE of 0.638 • , and R 2 of 0.994. In addition, comprehensive analysis of the model regression for specific locations (Congo, Texas, and Finland) were discussed. Considering Act as input variable, the SVR and GPR models achieve more accurate estimations than the polynomial regression in all cases, implying that ML models can stratify θ sz definition criteria better than polynomial regression. GPR achieved the best results, especially when Lat and Act, and Lat and Lon and Act were considered as the input variables. The GPR model is recommended for the prediction of global θ sz for the three sensors combined in 2018.
In this study, more than 350,000 metadata records were obtained by down-sampling one-tenth of the combined dataset of Landsat-8 and Sentinel-2A/2B during 2018. A scalable Gaussian process [23] that has been specially designed for huge data volumes could be used to handle the full dataset. In the future, the proposed optimal θ sz could be used for the normalization of bidirectional reflectance to produce consistent global time series for the combination of Landsat-8 and Sentinel-2A -2B and quantify the effects of normalized θ sz on the reflectance data. Landsat-9, proposed for launch in mid-2021, will be placed into the current Landsat-7 orbit and will carry a refined version of the Landsat-8 sensor payload [46]. Thus, Landsat-9 data could be added to the experiment, giving a combination of Landsat-8/9 and Sentinel-2A/2B from which to derive θ sz values suitable for applying BRDF normalization, thus resulting in more consistent time series data.