Modeling Bidirectional Polarization Distribution Function of Land Surfaces Using Machine Learning Techniques

Liu, Siyuan; Lin, Yi; Yan, Lei; Yang, Bin

doi:10.3390/rs12233891


¹ Beijing Key Lab of Spatial Information Integration and 3S Application, Institute of Remote Sensing and Geographic Information System, School of Earth and Space Sciences, Peking University, Beijing 100871, China

² Guangxi Key Laboratory of Remote Measuring System, Guilin University of Aerospace Technology, Guilin 541004, China

³ College of Electrical and Information Engineering, Hunan University, Changsha 410082, China

* Author to whom correspondence should be addressed.

Remote Sens. 2020, 12(23), 3891; https://doi.org/10.3390/rs12233891

Submission received: 20 October 2020 / Revised: 23 November 2020 / Accepted: 25 November 2020 / Published: 27 November 2020


Abstract

Accurate estimation of the polarized reflectance (R_p) of land surfaces is critical for remote sensing of aerosol optical properties. In the last two decades, many data-driven bidirectional polarization distribution function (BPDF) models have been proposed for accurate estimation of R_p, among which the generalized regression neural network (GRNN) based BPDF model has been reported to perform the best. GRNN, however, is only one relatively simple machine learning (ML) technique for solving non-linear problems. Many other ML techniques have been reported to work well on non-linear problems and consequently may provide better performance in BPDF modeling. However, the incorporation of various ML techniques into BPDF modeling and the comparison of their performance have not been well documented. In this study, three widely used ML algorithms, i.e., support vector regression (SVR), K-nearest-neighbor (KNN), and random forest (RF), were applied to BPDF modeling. Using measurements collected by the Polarization and Directionality of the Earth’s Reflectance instrument onboard the PARASOL satellite (POLDER/PARASOL), non-linear relationships between R_p and the input variables, i.e., Fresnel factor (F_p), scattering angle (SA), and reflectance at 670 nm (R₆₇₀) and 865 nm (R₈₆₅), were built using these ML algorithms. Results showed that, taking F_p, SA, R₆₇₀, and R₈₆₅ as input variables, the performance of the four ML-based BPDF models (the three new models and the GRNN-based model) was quite similar. The KNN-based BPDF model provided slightly better results and improved on the accuracy of the semi-empirical BPDF models by 9.55% in terms of the overall root mean square error (RMSE). Experiments with different configurations of input variables suggested that using multi-band reflectance as input provided better results than using vegetation indices. The RF-based BPDF model using reflectance at all six bands as input variables produced the best results, improving the overall accuracy by 6.62% compared with the GRNN-based BPDF model. Among all the input variables, reflectance at absorbing spectral bands, e.g., 490 nm and 670 nm, played a more significant role in RF-based BPDF modeling owing to the dominance of the polarized portion in the total reflectance. Fresnel factor and scattering angle were also important for BPDF modeling. This study confirmed the feasibility of applying ML techniques to more accurate BPDF modeling, and the RF-based BPDF model proposed in this study can be used to increase the accuracy of remote sensing of the complete aerosol properties.

Keywords: bidirectional polarization distribution function (BPDF); land surfaces; machine learning; random forest; POLDER


1. Introduction

Radiation scattered by the Earth’s surface is partially polarized [1,2,3]. Polarized reflectance (R_p) of the Earth’s surface reveals essential information as, on one hand, it characterizes the optical properties of land surface [4,5]; and on the other hand, it serves as the boundary condition for retrieval of aerosol optical properties [6,7,8]. It has been reported that R_p follows an anisotropic distribution pattern [9,10,11]. The angular distribution of R_p can be parameterized by the bidirectional polarization distribution function (BPDF). Accurate BPDF models are thus essential for estimation of R_p, and are important for studies of land surface and aerosol.

Polarized characteristics of land surfaces have been widely studied at various scales [5,10,12,13]. The targets include different types of vegetation [1,2,5,14,15,16,17,18], soil [19,20], snow and ice [4,11], urban surfaces [21], as well as other basic elements including soot [4,22] and man-made targets [3], among others. These studies have also promoted the development of BPDF models. In the past three decades, several BPDF models have been proposed based on measurements from various polarimetric sensors over various land surface types [23]. These models fall into two general categories: physical models and data-driven models. The physical models start by simulating the first scattering of radiation by leaves [14,24], and were later combined with biophysical properties of vegetation, e.g., leaf inclination angle [15,19]. Physical models offer good physical interpretability because they are built on the radiative transfer process. However, they are not as accurate as expected, especially in forward scattering directions when the viewing angle is large [10,19]. Monte-Carlo ray tracing based vector radiative transfer models for both leaf [25] and canopy [26,27] have recently been proposed; they simulate the propagation of a large number of rays within hundreds of voxels. They simulate polarized reflectance more accurately, but are rather complicated and computationally expensive. In contrast, data-driven methods fit the polarization measurements directly. These models are easy to use and provide satisfactory performance, and have thus been preferred for modeling the polarized reflectance of land surfaces.

There are two types of data-driven BPDF models: semi-empirical and machine-learning-based (ML-based) models. Semi-empirical models combine physical models with several free parameters, which are derived empirically by fitting the model to measurements [9,10,11,20,21,28,29]. These models are easy to use and can achieve an accuracy of nearly 0.003 in terms of root-mean-square error (RMSE) [23]. All the semi-empirical models indicate that R_p is non-linearly related to factors describing sun-sensor geometry, refractive index, and/or vegetation coverage [10,20,21,29]. ML-based BPDF models exploit the strength of ML techniques in solving non-linear regression problems [30,31,32,33] to build a relationship between R_p and the corresponding input variables. Recently, a generalized regression neural network (GRNN) based BPDF model was proposed [34]. Compared to the semi-empirical BPDF models, this GRNN-based BPDF model reduces the errors by 13.4% on average, indicating the good performance of ML-based models.

With the development of ML techniques, various ML algorithms have been proposed, among which GRNN is a relatively simple neural network algorithm [34]. Several other popular algorithms, including simpler ones such as K-nearest-neighbor (KNN) [35], more complicated ones such as support vector regression (SVR) [30], and ensemble methods such as random forest (RF) [36], have also shown good performance in solving non-linear regression problems. They have been widely applied in many fields of remote sensing, such as classification and change detection [37,38] and retrieval of biophysical and biochemical properties of vegetation [36,39,40], among others. It is thus possible to build ML-based BPDF models using these popular algorithms. Such work is important because it may yield an even more accurate BPDF model than the GRNN-based one, and consequently help to improve the retrieval accuracy of aerosol optical depth and micro-physical properties of the atmosphere. However, to the best of our knowledge, this work has not been well documented.

Given this, this paper built three ML-based BPDF models using the SVR, KNN, and RF algorithms. The objective of this paper is to comprehensively explore the performance of these ML-based BPDF models. For this goal, we (1) compared the accuracy of the proposed ML-based BPDF models with that of the GRNN-based model as well as the widely used semi-empirical models; (2) analyzed the advantages of the ML-based BPDF models over the semi-empirical BPDF models; (3) improved the accuracy of the ML-based models by selecting the optimal configuration of the model input variables; and (4) discussed the limitations and potential of the ML-based BPDF models.

The paper is organized as follows. Section 2 gives an introduction of the data used and brief descriptions on the ML-based algorithms and the semi-empirical models. The results of ML-based BPDF models and the comparison with the semi-empirical models are presented in Section 3. Section 4 discusses the advantages, improvement, limitations, and potential of the ML-based BPDF models. Finally, Section 5 summarizes this paper.

2. Data and Methods

2.1. POLDER/PARASOL BRDF-BPDF Database

In this study, the most up-to-date POLDER/PARASOL BRDF-BPDF database released in January 2017 was used [41]. This database provides a great number of bidirectional reflectance and polarized reflectance data of globally distributed earth targets observed from POLDER/PARASOL. It has been widely used for the studies on both BRDF and BPDF modeling of land surfaces [11,23,42,43]. Although the data acquisition period of POLDER/PARASOL is from early 2005 to October 2013, only data acquired in 2008 were used to generate the database because of its best acquisition continuity. The atmospherically corrected surface bidirectional reflectance factor (BRF) at six wavebands from visible to near-infrared spectral region (i.e., 490 nm, 565 nm, 670 nm, 765 nm, 865 nm, and 1020 nm) was provided in the database. Besides, the top-of-atmosphere polarized reflectance (R_p) was measured at three wavebands, i.e., 490 nm, 670 nm, and 865 nm. However, only atmospherically corrected surface R_p at 865 nm was provided in the database, because (i) the atmospheric correction has larger uncertainty for shorter wavelengths, and (ii) the surface R_p is generally assumed to be spectrally neutral [41]. The data were allocated and stored according to 16 International Geosphere Biosphere Program (IGBP) classes. In each IGBP class, 50 selected targets with best quality were provided for every month of the year, i.e., ideally there were 600 selected targets for a given IGBP class. Note that although the POLDER pixels have a resolution of 6.2 × 6.2 km, only pixels with more than 75% occupation of one land cover type were kept, which guaranteed the relatively high homogeneity of the selected targets in the database [41]. The database can be freely downloaded from the PANGAEA website (doi:10.1594/PANGAEA.864090).

Statistics of the selected observations used in this study are listed in Table 1. Missing observations (with fill values in the database) were excluded from the analysis. Moreover, to suppress aerosol effects, observations with an Aero field value bigger than 5 were also excluded. A target was removed if all of its observations were excluded, so the number of selected targets was less than 600 per IGBP class. A small proportion of the polarized reflectance values in each IGBP class is negative. Unlike positive values, which indicate that the polarization direction of the reflected light is perpendicular to the scattering plane, negative values indicate a polarization direction parallel to the scattering plane. The scattering plane here is the principal plane, i.e., the plane containing the sun and view directions. Negative R_p always occurs in the back-scattering direction, when the scattering angle is close to 180°. Negative R_p accounts for nearly 7.24% of the observations on average in the database, as shown in Table 1.

2.2. Machine Learning (ML)-Based BPDF Models

Four machine learning regression algorithms, namely GRNN, SVR, KNN, and RF, were used in this study to build the ML-based BPDF models. The configuration of the input variables is introduced in Section 2.2.1, followed by a brief description of each of the four ML algorithms. All four algorithms were implemented in MATLAB (R2019b). A summary of the four ML algorithms is given in Table A1.

2.2.1. Selection of Input Variables

The polarized radiation is primarily generated by the specular reflection which occurs at the land surfaces. The Fresnel function, F_p, can describe this process and is widely used for BPDF modeling [9,23,34]. Analogously, F_p is used as one of the input variables in this study. It is a function of the scattering angle (SA) and the refractive index of the air-surface interface (N), as

F_{p} (α, N) = \frac{1}{2} \left[ \left( \frac{N \cos α^{'} - \cos α}{N \cos α^{'} + \cos α} \right)^{2} - \left( \frac{N \cos α - \cos α^{'}}{N \cos α + \cos α^{'}} \right)^{2} \right],    (1)

where α is the incident angle, which can be calculated from SA through α = (π - SA) / 2. α^{'} is the refractive angle and is related to α by \sin (α) = N \sin (α^{'}). Here, N is fixed to 1.5, which is commonly accepted [24,29].
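As an illustration of Equation (1), the following minimal Python sketch evaluates F_p for a given scattering angle. The function name `fresnel_factor` and the use of degrees for the input are assumptions made for this example; the study itself was implemented in MATLAB.

```python
import math

def fresnel_factor(sa_deg, n=1.5):
    """Fresnel factor F_p of Equation (1) for a scattering angle in degrees.

    `n` is the refractive index N (1.5 in this study). The function name and
    the degree-based interface are illustrative assumptions.
    """
    alpha = (math.pi - math.radians(sa_deg)) / 2.0   # incident angle
    alpha_r = math.asin(math.sin(alpha) / n)         # sin(alpha) = N sin(alpha')
    first = (n * math.cos(alpha_r) - math.cos(alpha)) / (n * math.cos(alpha_r) + math.cos(alpha))
    second = (n * math.cos(alpha) - math.cos(alpha_r)) / (n * math.cos(alpha) + math.cos(alpha_r))
    return 0.5 * (first ** 2 - second ** 2)

# F_p vanishes at SA = 180 degrees (normal incidence) and is positive in between.
for sa in (60.0, 90.0, 120.0, 180.0):
    print(sa, fresnel_factor(sa))
```

Under Equation (1), the second term vanishes at the Brewster angle (tan α = N), so F_p there reduces to half the square of (N² - 1)/(N² + 1).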

Another input variable is the scattering angle (SA), which has a great impact on R_p. Previous studies showed that, for a given target, R_p tends to be larger when SA is smaller, and it approaches zero or even becomes negative as SA approaches 180° [10,23]. SA characterizes the sun-sensor geometry and can be calculated from the sun zenith angle θ_{s}, the view zenith angle θ_{v}, and the relative azimuth angle φ between the sun and view directions as

\cos (SA) = - \cos (θ_{s}) \cos (θ_{v}) - \sin (θ_{s}) \sin (θ_{v}) \cos (φ).    (2)
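Equation (2) can be sketched in a few lines of Python. The function name `scattering_angle` and the degree-based interface are assumptions for illustration; the clamping step is an added numerical safeguard, not part of the equation.

```python
import math

def scattering_angle(theta_s_deg, theta_v_deg, phi_deg):
    """Scattering angle SA (degrees) from Equation (2).

    Inputs are the sun zenith, view zenith, and relative azimuth angles in
    degrees (an interface chosen for this example).
    """
    ts, tv, phi = (math.radians(a) for a in (theta_s_deg, theta_v_deg, phi_deg))
    cos_sa = -math.cos(ts) * math.cos(tv) - math.sin(ts) * math.sin(tv) * math.cos(phi)
    cos_sa = max(-1.0, min(1.0, cos_sa))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_sa))

print(scattering_angle(30.0, 30.0, 180.0))
print(scattering_angle(0.0, 0.0, 0.0))
```

With this sign convention, equal zenith angles and φ = 0 give SA = 180°, while φ = 180° with both zenith angles at 30° gives SA = 120°.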

Moreover, as documented in [10,21], vegetation coverage was found to be negatively correlated with R_p, given that vegetation produces lower R_p than soil [21]. Consequently, surface reflectance at two vegetation-sensitive wavebands, i.e., 670 nm (R₆₇₀) and 865 nm (R₈₆₅), was included among the input variables.

Therefore, four variables, F_p, SA, R₆₇₀ and R₈₆₅ were selected as the input of the ML-based models, i.e., the size of input matrix was n-by-4, where n was the number of observations.

2.2.2. Generalized Regression Neural Networks (GRNN)

GRNN is one of the radial basis function (RBF) based neural networks. It builds a joint probability density function of the training input and the output variable. GRNN has four layers: the input, pattern, summation, and output layers [44]. For each input vector X (a 1-by-4 vector in this study), each neuron of the pattern layer is calculated from the input and a specific training sample X_{i} through a Gaussian RBF

p_{i} = \exp \left[ - \frac{(X - X_{i})^{T} (X - X_{i})}{2 σ^{2}} \right],    (3)

where p_{i} is the i-th neuron of the pattern layer, and σ is the smoothing parameter of the Gaussian RBF, determining the size of the sensitive area of the model. Then two neurons of the summation layer are calculated: the simple sum of the pattern neurons, and the sum of the pattern neurons weighted by the corresponding output (i.e., R_p) of the training samples. Finally, the output layer gives the estimated R_p as the ratio between the two summation neurons (the weighted sum divided by the simple sum). A small σ may lead to overfitting, whereas a large σ may result in under-fitting of the training samples.

As a smoothing factor controlling the shape of the Gaussian RBF, σ has a significant impact on the model performance. Cross-validation was applied to find the optimal σ. The search range of σ was set from zero to 0.2 with a step of 0.01, as documented in [34]. The GRNN was built in MATLAB using the function newgrnn.
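The pattern, summation, and output layers described above can be sketched as follows. This is a plain-Python illustration of Equation (3) and the ratio of the two summation neurons, not the MATLAB newgrnn implementation used in the study; the function name `grnn_estimate` and the toy data are hypothetical.

```python
import math

def grnn_estimate(x, train_x, train_y, sigma):
    """Estimate the output for query vector x from training samples.

    Pattern layer: Gaussian RBF of Equation (3) for each training sample.
    Summation layer: simple sum and output-weighted sum of pattern neurons.
    Output layer: ratio of the weighted sum to the simple sum.
    """
    weighted_sum = 0.0
    simple_sum = 0.0
    for xi, yi in zip(train_x, train_y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        p = math.exp(-sq_dist / (2.0 * sigma ** 2))
        weighted_sum += p * yi
        simple_sum += p
    # If every pattern neuron underflows to zero, no estimate is available.
    return weighted_sum / simple_sum if simple_sum > 0.0 else float("nan")

# Toy one-dimensional data (hypothetical): a small sigma follows the nearest
# sample closely, a large sigma averages over all samples.
toy_x = [[0.0], [1.0]]
toy_y = [0.0, 1.0]
print(grnn_estimate([0.0], toy_x, toy_y, sigma=0.05))
print(grnn_estimate([0.0], toy_x, toy_y, sigma=100.0))
```

The two printed values illustrate the over- and under-fitting tendencies of small and large σ. Note that σ = 0 itself cannot be evaluated in this sketch because of the division by σ².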

2.2.3. Support Vector Regression (SVR)

SVR is the regression implementation of the support vector machine (SVM). Among the various types of SVR algorithm, the classical and widely used ε-SVR was used in this study [45,46,47]. ε-SVR uses a kernel function to transform the input data into a high-dimensional feature space, and then uses the support vectors whose training errors lie outside the ε margin to build a regression function with hyperparameters. A nonlinear Gaussian RBF with an adjustable parameter γ was used as the kernel function in this study. γ controls the radius of the RBF, which reflects the size of the sensitive area of each support vector. A larger γ indicates a narrower sensitive area of the RBF, making the model more likely to be over-fitted. In the loss function, a regularization constant C controls the penalty on the support vectors, whereas observations within the ε-tube are not penalized. A larger C also increases the risk of overfitting. In this study, ε was set to 0.01; the appropriate setting of γ and C is crucial to the model performance.
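To make the roles of γ and ε concrete, the sketch below shows the Gaussian RBF kernel in the parameterization used by LIBSVM, exp(-γ‖u - v‖²), and the ε-insensitive loss that leaves errors inside the ε-tube unpenalized. It illustrates these two ingredients only and does not train an SVR; the function names are assumptions for this example.

```python
import math

def rbf_kernel(u, v, gamma):
    """Gaussian RBF kernel exp(-gamma * ||u - v||^2); larger gamma means a narrower kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def eps_insensitive_loss(measured, estimated, eps=0.01):
    """Loss that is zero inside the eps-tube and grows linearly outside it."""
    return max(0.0, abs(measured - estimated) - eps)

# The same pair of points looks much less similar under a large gamma.
print(rbf_kernel([0.0], [1.0], gamma=0.1), rbf_kernel([0.0], [1.0], gamma=10.0))
# An error of 0.005 falls inside the tube; an error of 0.03 is penalized by 0.02.
print(eps_insensitive_loss(0.020, 0.025), eps_insensitive_loss(0.05, 0.02))
```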

The optimizing ranges of the γ and C parameters were from 10⁻⁵ to 10² and from 10⁻² to 10², respectively, which is similar to the configuration used in [48]. The parameters were then optimized by minimizing the merit function

RMSECV (γ_{k}, C_{k}) = \frac{\sum_{i = 1}^{m} {RMSE}_{i} (γ_{k}, C_{k})}{m},    (4)

where RMSECV (γ_{k}, C_{k}) is the root mean square error of cross-validation using the k-th combination of γ and C; m is the number of folds of the cross-validation; and {RMSE}_{i} is the root mean square error of the model estimation in the i-th iteration of the cross-validation. The optimization process was repeated for each IGBP class. The optimization was performed using the MATLAB function fminsearchbnd, with the boundary values mentioned above. The ε-SVR was implemented using the MATLAB interface of LIBSVM V3.24 [49].

2.2.4. K-Nearest-Neighbor (KNN) Regression

KNN is a simple supervised learning algorithm for both classification and regression. For regression, KNN bases its estimate on the values of the K nearest neighbors of the query input point. The Euclidean distance is the most common choice for measuring the distance between two multi-dimensional points (four-dimensional points in this study) and was thus used in this study. After the K neighbors are selected, the estimate for the query point is given by the weighted value of the selected neighbors through a specific weighting function [50]

w_{i} = \frac{1}{d_{i}^{t}} / \sum_{j = 1}^{K} \frac{1}{d_{j}^{t}},    (5)

where d_{i}, i = 1, 2, …, K, is the distance between the query point and the i-th nearest neighbor; t generally takes the value 0, 1, or 2 [35,50,51]. In this study, t was fixed to 1 [51,52], i.e., the weights were proportional to the inverse of the distance between the neighbor and the query point, so that nearby points contributed more to the estimated value than faraway points. The model-estimated polarized reflectance of the query point is then calculated as the weighted average of the nearest points.
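The inverse-distance weighting of Equation (5) can be sketched with a brute-force neighbor search in plain Python. The study used MATLAB's KDTreeSearcher and knnsearch; the function name `knn_estimate`, the handling of zero distances, and the toy data below are assumptions for this example.

```python
import math

def knn_estimate(query, train_x, train_y, k, t=1):
    """Weighted KNN regression with weights from Equation (5)."""
    def distance(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    # Brute-force search for the k nearest training points (Euclidean distance).
    neighbours = sorted(
        ((distance(query, xi), yi) for xi, yi in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    # Assumption: when a neighbour coincides with the query (d = 0) and t > 0,
    # return the mean of the coincident targets to avoid dividing by zero.
    coincident = [y for d, y in neighbours if d == 0.0]
    if coincident and t > 0:
        return sum(coincident) / len(coincident)
    inverse = [1.0 / d ** t for d, _ in neighbours]
    total = sum(inverse)
    return sum(w / total * y for w, (_, y) in zip(inverse, neighbours))

# Toy one-dimensional data (hypothetical).
toy_x = [[0.0], [1.0], [3.0]]
toy_y = [0.0, 1.0, 3.0]
print(knn_estimate([0.25], toy_x, toy_y, k=2, t=1))  # closer neighbour dominates
print(knn_estimate([0.25], toy_x, toy_y, k=2, t=0))  # t = 0 gives a plain average
```

With t = 0 all weights are equal, so the estimate reduces to the unweighted mean of the K neighbors; with t = 1 the nearer neighbor pulls the estimate toward its own value.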

As a key parameter of the KNN algorithm, K has a great impact on the model performance. A smaller K is more likely to lead to overfitting, whereas a larger K may give under-fitted results. In this study, K was optimized within the range from zero to 200 using the cross-validation method, which was repeated for each IGBP class. The structure for fast querying of nearest points was built using the MATLAB function KDTreeSearcher, and the query was performed using the function knnsearch.

2.2.5. Random Forest (RF) Regression

RF regression is a non-parametric ensemble learning algorithm [53]. It grows many decision trees and bases its estimate on the results of all trees, so it generally gives efficient and robust performance [54]. In the training procedure, RF regression uses a bootstrapping approach to randomly select about two-thirds of the samples from the training dataset for each tree, and the remaining one-third are used to calculate the out-of-bag (OOB) error, which represents the performance of the built model. To grow each tree independently, RF randomly selects about one-third of the input variables (two of the four in this study) each time when growing each individual tree. Using the standard Classification and Regression Tree (CART), the best splitting variable and the best splitting value are determined by minimizing a weighted impurity G of the left and right nodes after the split

G (x_{i}, v_{i j}) = \frac{n_{l}}{N_{t}} H (X_{l}) + \frac{n_{r}}{N_{t}} H (X_{r}),    (6)

where v_{i j}, the splitting point, is the j-th value of the splitting variable x_{i}; n_{l} and n_{r} are the numbers of training samples in the left and right nodes after the split, respectively; N_{t} is the number of training samples in the node to be split; and H (X_{l}) and H (X_{r}) are the impurity functions of the left and right nodes, respectively. For the regression problem, the mean squared error (MSE) serves as H (X). MSE is defined as the mean squared deviation between the training targets and their average within a node after the split. In the prediction procedure, the average of the predictions of all trees is taken as the estimated output value for each query input observation.
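The split criterion of Equation (6) can be illustrated for a single splitting variable as follows. This sketch searches the observed values of one variable for the split that minimizes the weighted MSE impurity; it is not the MATLAB TreeBagger implementation, and the function names and the convention that samples with x ≤ v go to the left node are assumptions.

```python
def mse_impurity(targets):
    """H(X): mean squared deviation of the targets from their node average."""
    mean = sum(targets) / len(targets)
    return sum((y - mean) ** 2 for y in targets) / len(targets)

def best_split(x, y):
    """Return (split value, G) minimizing Equation (6) for one variable."""
    n_t = len(y)
    best_value, best_g = None, float("inf")
    # The largest observed value is skipped: it would leave the right node empty.
    for v in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= v]
        right = [yi for xi, yi in zip(x, y) if xi > v]
        g = len(left) / n_t * mse_impurity(left) + len(right) / n_t * mse_impurity(right)
        if g < best_g:
            best_value, best_g = v, g
    return best_value, best_g

# Toy data (hypothetical): the targets change between x = 2 and x = 3.
print(best_split([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 1.0]))
```

In a full random forest this search would be repeated over the randomly selected candidate variables at each node, and the variable and value with the smallest G would be used for the split.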

The two key parameters of RF regression, i.e., the number of trees (ntree) and the minimum number of observations in the terminal nodes, or leaves (nodesize), have a significant impact on the model performance. A smaller ntree with a larger nodesize indicates a less dense forest with shallower trees. nodesize was set to 5 in this study to balance the training accuracy and the generalization ability of the model [36]. It was reported that an ntree bigger than 100 guarantees the stability of the RF model [54], so ntree was set to 100 in this study. The RF regression was implemented using the TreeBagger function in MATLAB, with the option “OOBPredictorImportance” set to “on” to calculate the importance of all input variables. The RF modeling and importance calculation were repeated 100 times in this study to reduce the uncertainty, and the averaged results were used.

2.3. Semi-Empirical BPDF Models

To assess the performance of the ML-based BPDF models, semi-empirical models were taken as a reference. Semi-empirical BPDF models have been widely used to estimate the surface R_p, given the sun-sensor geometry and the model-required information [55,56]. These models were initially proposed for different land covers and were built on measurements from different instruments [9,10,11,21,23,28,29,57]. They have been shown to yield relatively high accuracy by introducing empirical free parameters into the modeling process. Table 2 briefly introduces six representative semi-empirical models, i.e., the Nadal–Bréon, Waquet, Maignan, Litvinov, Diner, and Xie-Cheng models. The estimation accuracy of these models was used in this study for comparison with the ML-based BPDF models.

2.4. Optimization and Selection of Model Parameters

For each IGBP class, the data were randomly divided into two parts, i.e., the training dataset and the validation dataset. For the SVR-, KNN-, and GRNN-based models, model parameters were optimized using 10-fold cross-validation within the training dataset. That is, for each candidate parameter within the optimizing range, the training dataset was randomly divided into 10 parts; in each of the 10 iterations, nine parts were used to build the model and the remaining part was used for validation. The average of the root mean square errors (RMSEs) of the 10 iterations was calculated. After all candidate parameters had been searched, the parameter yielding the smallest average RMSE was chosen as optimal, and the model was then built on the whole training dataset using the optimal parameter.
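The parameter search described above, which averages the fold RMSEs as in Equation (4) and keeps the candidate with the smallest average, can be sketched in plain Python. The helper names, the one-dimensional Gaussian smoother standing in for the regression model, the synthetic data, and the use of one fixed random partition for all candidates are assumptions made for this example.

```python
import math
import random

def rmse(measured, estimated):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(measured, estimated)) / len(measured))

def rmse_cv(predict, xs, ys, param, m=10, seed=0):
    """Average RMSE over m folds for one candidate parameter."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)           # random partition of the training data
    folds = [idx[i::m] for i in range(m)]
    fold_rmse = []
    for held_out in folds:
        held = set(held_out)
        train_x = [xs[i] for i in idx if i not in held]
        train_y = [ys[i] for i in idx if i not in held]
        estimates = [predict(xs[i], train_x, train_y, param) for i in held_out]
        fold_rmse.append(rmse([ys[i] for i in held_out], estimates))
    return sum(fold_rmse) / m

def select_parameter(predict, xs, ys, candidates, m=10):
    """Return the candidate with the smallest cross-validated RMSE."""
    return min(candidates, key=lambda p: rmse_cv(predict, xs, ys, p, m))

def smoother(x, train_x, train_y, sigma):
    """Hypothetical stand-in model: 1-D Gaussian kernel smoother."""
    weights = [math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2)) for xi in train_x]
    total = sum(weights)
    if total == 0.0:
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total

# Synthetic data (not POLDER measurements): noisy sine curve.
rng = random.Random(1)
xs = [i / 50.0 for i in range(50)]
ys = [math.sin(2.0 * math.pi * x) + rng.gauss(0.0, 0.1) for x in xs]
candidates = [0.01 * k for k in range(1, 21)]   # 0.01 to 0.20; zero is not evaluable here
best_sigma = select_parameter(smoother, xs, ys, candidates)
print(best_sigma, rmse_cv(smoother, xs, ys, best_sigma))
```

After the search, the model would be rebuilt on the whole training dataset with the selected parameter, as described in the text.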

For each semi-empirical model and each IGBP class, fitting the model to the observations of each target in the training dataset gave hundreds of sets of target-based free parameters. The median of these target-based parameters was then chosen as the class-based free parameter, i.e., the a priori free parameter. This scheme for generating the a priori free parameters has been widely used in previous studies [10,23,55]. Finally, the model with the a priori free parameter was taken as the a priori model for R_p estimation.

2.5. Evaluation and Comparison

The procedure for evaluating and comparing the ML-based and semi-empirical BPDF models is illustrated in Figure 1. The step-by-step approach can be briefly described as follows:

Step 1: Divide the POLDER/PARASOL BPDF database into two parts, 75% as the training dataset and 25% as the validation dataset. Each dataset includes the input variables, involving F_p, SA, R₆₇₀, and R₈₆₅, and the output R_p.

Step 2: (i) For GRNN-, SVR-, and KNN-based BPDF models, use the training dataset to optimize the model parameters, i.e., σ for GRNN, C and γ for SVR and K for KNN. For the RF-based BPDF model, set the ntree to 100; (ii) For the six semi-empirical models, use the training dataset to obtain the a priori free parameters.

Step 3: (i) Use the input variables of the validation dataset to run the ML-based models with the optimal parameters selected in Step 2 to estimate R_p; (ii) Use the validation dataset to run the six semi-empirical models with the a priori free parameters obtained in Step 2 to estimate R_p.

Step 4: Compare the estimated R_p with the measured R_p from the validation dataset, calculate the RMSE and evaluate models’ performance for both ML-based and semi-empirical models. Intercompare the accuracy of the SVR-, KNN-, and RF-based BPDF models with that of the GRNN-based BPDF model.
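The bookkeeping in Steps 1 and 4 can be sketched as below: a random 75%/25% split, the RMSE and correlation coefficient between measured and estimated values, and a relative improvement between two RMSEs. The function names are hypothetical, and defining the improvement as the relative reduction in RMSE is an assumption; the text does not spell out the formula.

```python
import math
import random

def split_train_validation(n, train_fraction=0.75, seed=0):
    """Randomly split n observation indices into training and validation sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * n))
    return idx[:cut], idx[cut:]

def rmse(measured, estimated):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(measured, estimated)) / len(measured))

def correlation(measured, estimated):
    """Pearson correlation coefficient between measured and estimated values."""
    mean_m = sum(measured) / len(measured)
    mean_e = sum(estimated) / len(estimated)
    cov = sum((a - mean_m) * (b - mean_e) for a, b in zip(measured, estimated))
    var_m = sum((a - mean_m) ** 2 for a in measured)
    var_e = sum((b - mean_e) ** 2 for b in estimated)
    return cov / math.sqrt(var_m * var_e)

def improvement_percent(rmse_reference, rmse_new):
    """Assumed definition: relative RMSE reduction with respect to a reference model."""
    return 100.0 * (rmse_reference - rmse_new) / rmse_reference

train_idx, val_idx = split_train_validation(100)
print(len(train_idx), len(val_idx))
print(improvement_percent(0.004, 0.003))
```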

3. Results

Among the four ML-based models, the GRNN-, SVR-, and KNN-based BPDF models required optimization of the model parameters, following Section 2.4 and Figure 1. The selected optimal parameters for these three models are shown in Table 3, whereas the ntree of the RF model was fixed to 100. ML-based models with the optimal model parameters, and semi-empirical models with the calculated a priori free parameters (not shown), were used to estimate R_p in the validation dataset.

The RMSEs (×100) of the semi-empirical and the ML-based models are listed in Table 4. All four ML-based BPDF models outperform the semi-empirical models for all IGBP classes. Among the ML-based models, the KNN-based BPDF model gives the best results in terms of RMSE for most of the IGBP classes, i.e., 10 out of 16 classes. It is followed by the GRNN- and SVR-based models, with the best performance for six and three classes, respectively. The RF-based model gives the worst results of the four ML-based models, with the best result only for IGBP 01. Nevertheless, the performances of the four ML-based models are quite close to each other, with differences within 0.0001 in terms of RMSE. Conversely, the accuracy of the six semi-empirical models varies greatly; the Xie-Cheng model performs best among them, yielding the best results for 12 IGBP classes, as shown in Table 4. The last column of Table 4 records the improvement (in percentage) of the KNN-based model over the Xie-Cheng model, and shows that the KNN-based model outperforms the Xie-Cheng model for all IGBP classes. The largest improvement, more than 18%, was seen in IGBP 06 and 07 (closed and open shrubland), whereas the smallest improvement, 3.57%, was found in IGBP 13 (urban and built-up area), the surface type for which the Xie-Cheng model was proposed. Besides, improvements of more than 10% were also found for IGBP 04 (deciduous broadleaf forest), 09 (savannas), 15 (snow and ice), and 16 (desert). Overall, compared with the Xie-Cheng model, the KNN-based model improved the accuracy by 9.55%. It is thus clear that the ML-based BPDF models are more accurate than the semi-empirical models.

Figure 2 shows the measurement-model plots of the best performing semi-empirical model, i.e., the Xie-Cheng model, the GRNN-based BPDF model, and the three newly built ML-based BPDF models. Both the Xie-Cheng and the ML-based BPDF models reproduced R_p in good correlation with the measurements, with correlation coefficients (Cor.) bigger than 0.8. The ML-based models yield higher Cor., improving on that of the Xie-Cheng model by 0.01 to 0.05, and their RMSEs are clearly lower than that of the Xie-Cheng model. The Xie-Cheng model tends to saturate when R_p becomes larger, i.e., R_p is either underestimated or overestimated in such cases. This saturation is obvious in Figure 2, especially for IGBP 02 (forest), 07 (shrubland), and 10 (grassland) when the measured R_p is bigger than 0.3%, 0.8%, and 0.8%, respectively. Such saturation, however, does not appear in the four ML-based models. Additionally, the Xie-Cheng model failed to estimate negative R_p, whereas the ML-based models can reproduce it, as can be seen from Figure 2.

Figure 3 illustrates the capability of the ML-based BPDF models to reproduce negative R_p when observations of all 16 IGBP classes (18580 observations) are taken into consideration. There is not much difference in accuracy among the four models in terms of both RMSE (around 0.0025) and Cor. (around 0.076), although the SVR-based BPDF model yields a slightly lower RMSE. Most of the measured negative R_p values are no less than −0.3%, except for IGBP 15 (ice and snow, 25.8% of all the observations with negative R_p), whose negative R_p can reach −1%. However, all four models clearly overestimate negative R_p, with most of the modeled R_p ranging from −0.2% to 0.2% or even greater.

4. Discussion

Polarized reflectance has been found to have a non-linear relationship with sun-sensor geometry and/or vegetation coverage [10,23,34]. In this study, ML techniques were adopted for parameterization of such non-linear relationship. We built three ML-based models, i.e., the SVR-, KNN-, and RF-based BPDF models for the estimation of the polarized reflectance. The performances of these ML-based BPDF models were compared to that of the GRNN-based model and six widely used semi-empirical BPDF models using the POLDER/PARASOL measurements. Experiments suggested that (1) the accuracy of the three newly proposed ML-based models and the GRNN-based model were quite close to each other, among which the KNN-based model showed slightly better performance; (2) the ML-based BPDF models provided comprehensively better performance than the widely used semi-empirical models (Table 4). These results thus confirmed the feasibility of accurate estimation of polarized reflectance of land surfaces using the machine learning techniques.

4.1. Advantages of the ML-Based BPDF Models

One advantage of the ML-based BPDF models over the semi-empirical models found in this study is their ability to avoid modeling saturation in the forward scattering directions. For the semi-empirical models, the scheme for selecting the class-based a priori free parameters reduced the a priori modeling accuracy. The selected a priori free parameters determined the maximum polarized reflectance of land surfaces in the forward scattering directions. They cannot, however, represent the polarized characteristics of every target. Saturation occurred especially for targets whose polarized reflectance deviated strongly from that determined by the selected a priori free parameters. In contrast, the ML-based models simultaneously considered the impact of multiple input variables, i.e., the Fresnel function, the scattering angle, and multi-band reflectance. They directly built a non-linear function between the polarized reflectance of land surfaces and the input variables without a fixed mathematical expression, and can thus avoid the saturation limitation.

Another advantage of the ML-based BPDF models is their ability to model the negative polarized reflectance of land surfaces. Negative polarized reflectance was generally observed in the backscattering directions, where multiple successive scattering occurred [10]. The ML-based BPDF models learned this phenomenon from the training dataset and could reproduce a similar signature when used to estimate polarized reflectance. That is, if the polarized reflectance is negative for a specific case (e.g., backscattering directions) in the training dataset, the ML-based BPDF models are likely to give a negative estimate when the input variables are similar to that case. The semi-empirical BPDF models, in contrast, neglected this phenomenon and could only give positive estimates.

4.2. Further Improvements Using Different Configuration of Input Variables

Selection of input variables is critical for the performance of ML algorithms [30,36,50,58]. Among the various information provided in the POLDER/PARASOL BRDF-BPDF database, only four variables (F_p, SA, R₆₇₀, and R₈₆₅) were used as the inputs of the ML-based BPDF models in this study. Naturally, one may ask whether different configurations of input variables, e.g., adding reflectance at other wavebands or vegetation indices (VIs), could improve the performance of the ML-based BPDF models. To answer this question, we designed four experiments, each with a different configuration of input variables. These input configurations are listed in Table 5. Note that F_p and SA were used in all these experiments because they are directly related to the polarized characteristics of land surfaces. C0 is the original configuration used in Section 3. In C1, reflectance at 565 nm (R₅₆₅) was added. Unlike C0 and C1, C2 used vegetation indices as input variables. The selected VIs were of two types, i.e., normalized difference (ND) and simple ratio (SR) [59]:

{ND}_{ref, idx} = \frac{R_{ref} - R_{idx}}{R_{ref} + R_{idx}}, (7)

{SR}_{ref, idx} = \frac{R_{ref}}{R_{idx}}, (8)

where R_{ref} and R_{idx} are reflectance at the reference and index wavelengths, respectively. The index wavelength was fixed to 670 nm, whereas the reference wavelength referred to 865 nm or 565 nm. Finally, C3 adopted reflectances at all the available bands in the BRDF-BPDF database. The optimized model parameters of different configurations were listed in Table A3.
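As a minimal sketch of Equations (7) and (8), the snippet below computes the ND and SR indices with 670 nm as the index wavelength and 865 nm or 565 nm as the reference wavelength. The reflectance values, the function names, and the dictionary keys are hypothetical and chosen only for illustration. The exact set of indices used in C2 is given in Table 5, not here.

```python
def normalized_difference(r_ref, r_idx):
    """Equation (7): ND = (R_ref - R_idx) / (R_ref + R_idx)."""
    return (r_ref - r_idx) / (r_ref + r_idx)

def simple_ratio(r_ref, r_idx):
    """Equation (8): SR = R_ref / R_idx."""
    return r_ref / r_idx

# Hypothetical reflectances of a vegetated pixel (not measured values).
r565, r670, r865 = 0.08, 0.05, 0.40

# Index wavelength fixed at 670 nm; reference wavelength is 865 nm or 565 nm.
vegetation_indices = {
    "ND_865_670": normalized_difference(r865, r670),
    "SR_865_670": simple_ratio(r865, r670),
    "ND_565_670": normalized_difference(r565, r670),
    "SR_565_670": simple_ratio(r565, r670),
}

for name, value in vegetation_indices.items():
    print(f"{name} = {value:.4f}")
```

Note that ND = (SR − 1)/(SR + 1) for the same band pair, so the two index types are monotonic transformations of each other and encode the same band ratio on different scales.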

Figure 4 and Table 6 show the improvements of the ML-based BPDF models with different configurations of input variables over the GRNN-based BPDF model with C0 as input variables. The results suggested that using multi-band reflectance as input variables provided better performance than using VIs, and that reflectance at more wavebands led to a more accurate model. Compared with C0, the RMSEs of C1 and C3 were reduced for most of the IGBP classes. Moreover, C3 performed even better than C1, indicating that the more reflectances were used, the better the performance. When VIs were used as input variables (i.e., C2), the accuracy decreased for most of the IGBP classes (green values in Table 6). For IGBP 04 (deciduous broadleaf forest), 10 (grassland), and 16 (desert), the accuracy even decreased by more than 5%. This suggested that VIs could not yield results as good as those of reflectance, which was also indicated in [60].

When all available reflectances were utilized as input variables, the RF-based BPDF model performed the best. Compared with the GRNN-based model under C0, the overall RMSE of the RF-based BPDF model decreased by 6.62% (subfigure in Figure 4). The decrease reached 16.97% and 14.90% for IGBP 06 and 07 (closed and open shrubland), respectively (Table 6). This was expected for two reasons. On the one hand, more input variables allowed the RF algorithm to randomly select more variables each time it grew a tree, making each individual tree more robust. On the other hand, as an ensemble tree model, RF is more stable than other ML algorithms when solving regression problems whose input variables are highly correlated. As seen from the matrix of correlation coefficients of the input variables (Table A2), the Fresnel factor and the scattering angle were highly correlated, and reflectances from 565 nm to 865 nm were moderately correlated with each other. With more input variables, RF is less likely to choose a pair of highly correlated variables when growing an individual tree. For the RF-based BPDF model of each IGBP class, the importance of the eight input variables is given in Table 7. Among the eight input variables, R₄₉₀ is the most important one for seven IGBP classes, followed by F_p and R₆₇₀, which have the highest importance for four and three IGBP classes, respectively. This can be explained by the fact that for most land surfaces, e.g., vegetation, polarized reflectance accounts for a larger share of the total reflectance, or even dominates it, in absorbing spectral regions (e.g., 490 nm, 670 nm, and/or some shortwave infrared wavebands) [55,59]. Conversely, in non-absorbing wavebands, polarized scattering is negligible compared with the total reflectance. As a result, reflectance at absorbing bands explains more of the polarized information.
The Fresnel factor conveys the polarized reflection generated by specular reflection, and was assumed in this and previous studies to be the only generator of polarized reflectance. It has thus been consistently used as a key parameter in all the physical and semi-empirical BPDF models. The importance of F_p obtained here further supported this assumption. According to Table 7, the Fresnel factor, the scattering angle, and reflectance at 490 nm and 670 nm were important overall for all the IGBP classes in RF-based BPDF modeling, as they were always in the top two important input variables for every IGBP class. The relatively high importance of the scattering angle can be attributed to its high correlation with the Fresnel factor (Table A2); the increase of polarized reflectance with decreasing scattering angle may also contribute. The importance of the input variables provides helpful information for future studies on BPDF modeling.
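The correlation screening behind Table A2 can be sketched as follows: a Pearson coefficient is computed per land-cover class and then summarized by its mean and standard deviation across classes. The sample values, the class names, and the choice of the population standard deviation are assumptions made for illustration. The study itself reports averages over the 16 IGBP classes.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical samples per class: (scattering angle in degrees, Fresnel factor).
# The Fresnel factor is made to decrease with scattering angle, as in Table A2.
samples_by_class = {
    "class_a": ([100, 120, 140, 160, 175], [0.050, 0.030, 0.016, 0.006, 0.001]),
    "class_b": ([90, 110, 130, 150, 170], [0.060, 0.040, 0.022, 0.010, 0.002]),
}

per_class_r = [pearson(sa, fp) for sa, fp in samples_by_class.values()]
mean_r = statistics.fmean(per_class_r)
std_r = statistics.pstdev(per_class_r)  # population std; an assumption

print(f"mean r = {mean_r:.3f}, std = {std_r:.3f}")
```

A strongly negative mean with a small spread, as in the F_p and SA entry of Table A2, indicates that the two variables carry largely redundant information in every class.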

In addition to its higher accuracy, the RF-based model required no parameter to be optimized. This made the training process simpler and consequently less time-consuming than that of the other ML-based BPDF models. The RF-based BPDF model with reflectances at all six available bands as input variables is thus recommended in this study for estimating the polarized reflectance of land surfaces.

4.3. Limitations of the ML-Based BPDF Models

There are two potential limitations of the ML-based BPDF models. One limitation is that the ML-based BPDF models cannot guarantee high accuracy when estimating negative R_p, as shown in Figure 3. The overestimation might be attributed to the impact of neighboring observations in the backscattering area [34]. The observations with negative R_p were generally concentrated in a very small region where the corresponding SA is close to 180°, whereas R_p in the surrounding directions was generally positive. Moreover, negative R_p values are close to zero (generally greater than −0.003) and smaller in magnitude than positive R_p values. These two factors made the estimation of negative R_p easily affected by the surrounding positive R_p.
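The neighbor effect can be illustrated with a minimal nearest-neighbor sketch on a single feature, the scattering angle. The training values are hypothetical, and the unweighted average used here is a simplification: the KNN configuration in Table A1 includes a distance power parameter that is not reproduced.

```python
def knn_predict(query, train_x, train_y, k):
    """Average the targets of the k training points closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

# Hypothetical training data: R_p turns slightly negative only near SA = 180 deg.
train_sa = [168, 170, 172, 174, 176, 178, 179]
train_rp = [0.0040, 0.0030, 0.0022, 0.0012, 0.0004, -0.0010, -0.0020]

query_sa = 179.5
estimate_small_k = knn_predict(query_sa, train_sa, train_rp, k=2)
estimate_large_k = knn_predict(query_sa, train_sa, train_rp, k=6)

print(f"k = 2: {estimate_small_k:+.5f}")  # only the negative neighbors are used
print(f"k = 6: {estimate_large_k:+.5f}")  # positive neighbors dominate the mean
```

With few neighbors the estimate stays negative, but once the neighborhood extends into the surrounding positive observations, the small negative values are outweighed and the estimate becomes positive.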

Another limitation is that the parameters optimized in this study (Table 3) might not be directly applicable to measurements from other platforms. This is because these parameters were optimized only against the POLDER/PARASOL measurements and might be sensor-dependent. Further optimization and training might be needed to apply these models to other polarization measurements, such as those from the airborne AirMSPI [61], MICROPOL [29], and RSP [62] instruments, and from the space-borne sensors DPC/GF-5 [63] and 3MI/EPS-SG [64].

4.4. Potential Applications of the ML-Based BPDF Models

Using the implementation details given in Table A1 and the optimized parameters listed in Table 3 (with only four input variables) or Table A3 (with eight input variables), the four ML-based BPDF models trained on the POLDER BPDF database can be directly transferred to estimate polarized reflectance from remote sensing data of any platform. If polarized observations are available, the model parameters should be optimized within the suggested boundaries listed in Table A1, except for the RF-based model. The optimization aims to find the parameter value that produces the minimum cross-validation RMSE. Although practical, this transfer requires further accuracy assessment with polarized data from other polarimetric platforms to confirm its stability. Further investigation using polarized observations from multiple sensors as the training database may produce more generalized optimized model parameters.
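The parameter search described above, i.e., choosing the value with the minimum cross-validation RMSE, can be sketched for the number of neighbors K in a nearest-neighbor regressor. This is a pure-Python illustration under stated assumptions: the synthetic data, the candidate grid, the interleaved fold assignment, and the unweighted neighbor average are all hypothetical and do not reproduce the MATLAB implementation of Table A1.

```python
import math
import random

def knn_predict(query, xs, ys, k):
    """Unweighted mean of the targets of the k nearest training points."""
    order = sorted(range(len(xs)), key=lambda i: math.dist(xs[i], query))
    nearest = order[:k]
    return sum(ys[i] for i in nearest) / len(nearest)

def cv_rmse(xs, ys, k, n_folds=10):
    """RMSE pooled over all held-out samples of an n-fold cross validation."""
    n = len(xs)
    squared_error = 0.0
    for fold in range(n_folds):
        test_idx = set(range(fold, n, n_folds))  # interleaved folds
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            prediction = knn_predict(xs[i], train_x, train_y, k)
            squared_error += (prediction - ys[i]) ** 2
    return math.sqrt(squared_error / n)

def select_k(xs, ys, candidates):
    """Return the candidate K with the smallest cross-validation RMSE."""
    scores = {k: cv_rmse(xs, ys, k) for k in candidates}
    return min(scores, key=scores.get), scores

# Synthetic, already shuffled data standing in for (inputs, polarized reflectance).
rng = random.Random(0)
xs = [(rng.random(), rng.random()) for _ in range(200)]
ys = [math.sin(3 * a) + 0.5 * b + rng.gauss(0, 0.05) for a, b in xs]

k_opt, scores = select_k(xs, ys, candidates=range(1, 42, 5))
print(f"selected K = {k_opt}, CV RMSE = {scores[k_opt]:.4f}")
```

Real observations would need to be shuffled before the folds are formed, since interleaved folds on ordered data can leak neighboring samples between training and test sets.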

The polarized reflectance of land surfaces serves as a necessary boundary condition for the retrieval of aerosol optical depth (AOD) [7,21]. POLDER/PARASOL provided the polarized reflectance of various Earth targets with errors smaller than 0.001 [21]. It has been reported that a typical error of 0.001 in surface polarized reflectance could lead to an error of 0.04 in AOD [10]. Therefore, accurate BPDF models are required for an accurate estimation of surface polarized reflectance. The semi-empirical BPDF models listed in this study produced an overall RMSE of 0.0030. Their error could be even greater for special surface types, e.g., urban areas and snow and ice (Table 4). The KNN- and RF-based BPDF models used in this study reduced the overall error to 0.0027 and 0.0025 (Table 4 and Table 6), respectively. This reduction, however, could not significantly improve the AOD retrieval accuracy. Nevertheless, more accurate polarized reflectance is also helpful for the retrieval of the microphysical properties of aerosols. For this purpose, the four ML-based BPDF models used in this study can be used for more precise remote sensing of the complete aerosol properties [56].
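The arithmetic behind this statement can be made explicit with a short sketch. It assumes that the reported sensitivity (an AOD error of 0.04 for a polarized reflectance error of 0.001 [10]) scales linearly and that the overall RMSE can stand in for the reflectance error. Both are simplifying assumptions, and the function names are hypothetical.

```python
RP_ERROR_REF = 0.001   # reference error on surface polarized reflectance [10]
AOD_ERROR_REF = 0.04   # corresponding AOD error reported in [10]

def aod_error(rp_error):
    """AOD error implied by a polarized reflectance error (linear scaling assumed)."""
    return AOD_ERROR_REF * rp_error / RP_ERROR_REF

def relative_improvement(rmse_old, rmse_new):
    """Reduction of RMSE in percent of the old value."""
    return 100.0 * (rmse_old - rmse_new) / rmse_old

# Overall RMSEs quoted in the text (rounded to four decimals).
overall_rmse = {"semi-empirical": 0.0030, "KNN": 0.0027, "RF": 0.0025}

for name, rmse in overall_rmse.items():
    print(f"{name:>14}: RMSE = {rmse:.4f} -> AOD error of about {aod_error(rmse):.3f}")

print(f"KNN vs semi-empirical: {relative_improvement(0.0030, 0.0027):.1f}% lower RMSE")
```

Under these assumptions, the implied AOD error drops only from about 0.12 to about 0.10, which is consistent with the remark that the reduction does not significantly improve AOD retrieval. The 10% obtained from the rounded RMSEs is close to the 9.55% reported in the text, which was presumably computed from unrounded values.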

5. Conclusions

In this study, three ML-based BPDF models were proposed using the SVR, KNN, and RF algorithms. These models were evaluated and compared with the GRNN-based and semi-empirical BPDF models using the widely used POLDER/PARASOL measurements. When the Fresnel factor, the scattering angle, and two-band reflectance (R₆₇₀ and R₈₆₅) were taken as the input variables, the performances of the ML-based models were quite similar. The KNN-based BPDF model gave slightly better accuracy, decreasing the overall RMSE by 9.55% compared with the best semi-empirical model. The ML techniques were confirmed to perform better overall than the semi-empirical regressions because they could both avoid model saturation and reproduce negative polarized reflectance. When reflectances at all six available bands were taken as input variables, the RF-based BPDF model was the most accurate and is thus recommended. The improvement achieved by the RF-based BPDF model over the GRNN-based model was 6.62%. Analyses of the importance of the input variables of the RF-based BPDF model suggested that reflectance at 490 nm is the most important input variable for most land surfaces, and that the Fresnel factor and reflectance at 670 nm also play necessary roles.

This study confirmed the feasibility of applying machine learning techniques to more accurate BPDF modeling. Notably, the RF-based BPDF model proposed in this study could be used to improve the accuracy of complete aerosol retrieval. The efficiency of the RF algorithm makes it easy to retrain against measurements collected by other polarimetric sensors. This paper offers insight into applying machine learning techniques to accurate BPDF modeling and thus provides an alternative solution for the accurate estimation of the polarized reflectance of land surfaces.

Author Contributions

Conceptualization, L.Y. and B.Y.; Data curation, S.L. and B.Y.; Formal analysis, S.L., Y.L., and B.Y.; Funding acquisition, Y.L., L.Y., and B.Y.; Investigation, B.Y.; Methodology, S.L.; Supervision, Y.L., L.Y., and B.Y.; Validation, S.L.; Visualization, S.L.; Writing—original draft, S.L.; Writing—review and editing, Y.L., L.Y., and B.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (no. 2017YFB0503004), the Natural Science Foundation of Hunan Province, China (no. 2019JJ50047), the National Natural Science Foundation of China (no. 41801227), the National Natural Science Foundation of China (no. 31870531), and the National Key R&D Program of China (no. 2017YFC0210100).

Acknowledgments

We thank the Collaborative Innovation Center for Karst Mountain Ecological Environmental Protection and Resource Utilization (Project of Education Department of Guizhou Province) for its support to this work. We also thank the project “High Resolution Quantitative Remote Sensing Based on Skylight Polarization Field” of SAFEA (State Administration of Foreign Experts Affairs), Ministry of Science and Technology of China, for supporting this work. Finally, we thank the CNES (Centre National d’Etudes Spatiales) for preprocessing the POLDER/PARASOL data and making the BPDF-BRDF database publicly available.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Summary of the four ML algorithms used in this study for BPDF modeling.

Algorithm	Software	Functions or Toolbox	Model Parameters	Boundaries of Parameters to be Optimized	Optimizing Method
GRNN	MATLAB (R2019b)	Modeling: function newgrnn; Prediction: function sim	σ: basis radius of Gaussian function	σ: [0, 0.2] step 0.01	10-fold cross validation
SVR	MATLAB (R2019b)	LIBSVM V3.24 (MATLAB interface)	ε: minimum training error of support vectors, =0.01; γ: parameter controlling basis radius of Gaussian function; C: regularization parameter	γ: [10⁻⁵, 10²]; C: [10⁻², 10²]	10-fold cross validation, optimization following MATLAB function fminsearchbnd
KNN	MATLAB (R2019b)	Modeling: function KDTreeSearcher; Search: function knnsearch	t: power of the distance, =1; K: number of nearest points	K: [0, 200] step 10	10-fold cross validation
RF	MATLAB (R2019b)	Modeling: function TreeBagger with 'PredictorSelection' set to 'curvature' and 'OOBPredictorImportance' set to 'on'; Prediction: function predict	ntree: number of trees, =100; nodesize: minimum size of leaf nodes, =5; repeated 100 times	None	None

Table A2. Matrix of the correlation coefficient of the eight input variables. Each listed correlation coefficient is the average of correlation coefficients of the 16 IGBP classes, with the corresponding standard deviation listed in the brackets.

	F_p	SA	R₆₇₀	R₈₆₅	R₄₉₀	R₅₆₅	R₇₆₅	R₁₀₂₀
F_p	1	−0.943 (0.004)	−0.262 (0.239)	−0.375 (0.253)	−0.112 (0.115)	−0.282 (0.261)	−0.369 (0.252)	−0.170 (0.147)
SA		1	0.337 (0.254)	0.473 (0.273)	0.117 (0.095)	0.346 (0.293)	0.464 (0.273)	0.220 (0.148)
R₆₇₀			1	0.567 (0.339)	0.367 (0.302)	0.762 (0.271)	0.584 (0.351)	0.305 (0.199)
R₈₆₅				1	0.296 (0.311)	0.609 (0.289)	0.994 (0.003)	0.481 (0.132)
R₄₉₀					1	0.469 (0.302)	0.304 (0.316)	0.163 (0.189)
R₅₆₅						1	0.620 (0.292)	0.311 (0.181)
R₇₆₅							1	0.474 (0.129)
R₁₀₂₀								1

Table A3. Optimal parameters of the GRNN-, SVR-, and KNN-based BPDF models for the different configurations of input variables, i.e., C1, C2, and C3. For the RF-based BPDF models, ntree was set to 100 and nodesize was set to 5.

	C1				C2				C3
IGBP Class ID	GRNN	SVR		KNN	GRNN	SVR		KNN	GRNN	SVR		KNN
IGBP Class ID	σ	γ	C	K	σ	γ	C	K	σ	γ	C	K
IGBP01	0.02	4.75	26.97	100	0.05	23.22	0.08	200	0.03	19.36	0.06	70
IGBP02	0.03	8.08	11.86	80	0.05	19.41	1.00	200	0.03	14.29	5.85	80
IGBP03	0.04	24.23	0.10	150	0.04	21.30	1.15	200	0.05	10.54	1.45	80
IGBP04	0.03	29.78	0.07	130	0.04	6.13	3.96	200	0.03	15.62	0.10	80
IGBP05	0.03	13.42	4.68	70	0.1	1.38	0.18	90	0.03	19.36	1.01	70
IGBP06	0.02	39.15	1.89	20	0.03	30.91	1.00	70	0.02	30.93	9.58	20
IGBP07	0.02	43.82	1.00	30	0.02	62.38	0.60	150	0.02	67.68	4.55	20
IGBP08	0.05	20.30	0.92	80	0.06	20.80	6.71	190	0.05	19.46	0.95	50
IGBP09	0.02	51.82	6.76	130	0.03	24.91	0.72	190	0.02	32.08	20.67	90
IGBP10	0.02	1.83	9.41	110	0.02	19.35	11.46	110	0.02	7.40	3.57	50
IGBP11	0.03	14.65	17.31	170	0.06	14.80	3.56	200	0.03	33.18	2.75	100
IGBP12	0.02	31.18	0.09	40	0.04	10.10	4.56	170	0.03	27.70	9.34	40
IGBP13	0.05	32.13	0.07	110	0.08	0.04	6.28	60	0.03	19.47	4.80	60
IGBP14	0.03	17.78	1.97	200	0.05	18.97	0.06	200	0.04	25.38	3.55	100
IGBP15	0.01	28.69	5.05	200	0.01	29.76	13.21	200	0.01	22.60	8.63	90
IGBP16	0.02	50.50	0.30	30	0.04	17.05	3.27	150	0.02	23.51	2.93	10

References

Vanderbilt, V.C.; Grant, L.; Daughtry, C.S.T. Polarization of Light Scattered by Vegetation. Proc. IEEE 1985, 73, 1012–1024.
Curran, P. The Relationship between Polarized Visible-Light and Vegetation Amount. Remote Sens. Environ. 1981, 11, 87–92.
Bradley, C.L.; Diner, D.J.; Xu, F.; Kupinski, M.; Chipman, R.A. Spectral Invariance Hypothesis Study of Polarized Reflectance With the Ground-Based Multiangle SpectroPolarimetric Imager. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8191–8207.
Peltoniemi, J.I.; Jarvinen, J.; Zubko, N.; Gritsevich, M. Spectropolarimetric characterization of pure and polluted land surfaces. Int. J. Remote Sens. 2020, 41, 4865–4878.
Suomalainen, J.; Hakala, T.; Puttonen, E.; Peltoniemi, J. Polarised bidirectional reflectance factor measurements from vegetated land surfaces. J. Quant. Spectrosc. Radiat. Transf. 2009, 110, 1044–1056.
Deuze, J.L.; Breon, F.M.; Devaux, C.; Goloub, P.; Herman, M.; Lafrance, B.; Maignan, F.; Marchand, A.; Nadal, F.; Perry, G.; et al. Remote sensing of aerosols over land surfaces from POLDER-ADEOS-1 polarized measurements. J. Geophys. Res. Atmos. 2001, 106, 4913–4926.
Xie, D.H.; Cheng, T.H.; Zhang, W.; Yu, J.; Li, X.J.; Gong, H.L. Aerosol type over east Asian retrieval using total and polarized remote Sensing. J. Quant. Spectrosc. Radiat. Transf. 2013, 129, 15–30.
Wang, H.; Yang, L.K.; Zhao, M.R.; Du, W.B.; Liu, P.; Sun, X.B. The Normalized Difference Vegetation Index and Angular Variation of Surface Spectral Polarized Reflectance Relationships: Improvements on Aerosol Remote Sensing Over Land. Earth Space Sci. 2019, 6, 982–989.
Nadal, F.; Breon, F.M. Parameterization of surface polarized reflectance derived from POLDER spaceborne measurements. IEEE Trans. Geosci. Remote Sens. 1999, 37, 1709–1718.
Maignan, F.; Breon, F.M.; Fedele, E.; Bouvier, M. Polarized reflectances of natural surfaces: Spaceborne measurements and analytical modeling. Remote Sens. Environ. 2009, 113, 2642–2650.
Yang, B.; Zhao, H.M.; Chen, W. Modeling polarized reflectance of snow and ice surface using POLDER measurements. J. Quant. Spectrosc. Radiat. Transf. 2019, 236, 106578–106585.
Martin, W.E.; Hesse, E.; Hough, J.H.; Sparks, W.B.; Cockell, C.S.; Ulanowski, Z.; Germer, T.A.; Kaye, P.H. Polarized optical scattering signatures from biological materials. J. Quant. Spectrosc. Radiat. Transf. 2010, 111, 2444–2459.
Sun, Z.Q.; Wu, D.; Lv, Y.F.; Lu, S. Optical Properties of Reflected Light From Leaves: A Case Study From One Species. IEEE Trans. Geosci. Remote Sens. 2019, 57, 4388–4406.
Nilson, T.; Kuusk, A. A Reflectance Model for the Homogeneous Plant Canopy and Its Inversion. Remote Sens. Environ. 1989, 27, 157–167.
Rondeaux, G.; Herman, M. Polarization of Light Reflected by Crop Canopies. Remote Sens. Environ. 1991, 38, 63–75.
Sun, Z.Q.; Huang, Y.H.; Bao, Y.L.; Wu, D. Polarized Remote Sensing: A Note on the Stokes Parameters Measurements From Natural and Man-Made Targets Using a Spectrometer. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4008–4021.
Yang, B.; Knyazikhin, Y.; Lin, Y.; Yan, K.; Chen, C.; Park, T.; Choi, S.H.; Mottus, M.; Rautiainen, M.; Myneni, R.B.; et al. Analyses of Impact of Needle Surface Properties on Estimation of Needle Absorption Spectrum: Case Study with Coniferous Needle and Shoot Samples. Remote Sens. 2016, 8, 563.
Grant, L.; Daughtry, C.S.T.; Vanderbilt, V.C. Polarized and Specular Reflectance Variation with Leaf Surface-Features. Physiol. Plant. 1993, 88, 1–9.
Breon, F.M.; Tanre, D.; Lecomte, P.; Herman, M. Polarized Reflectance of Bare Soils and Vegetation: Measurements and Models. IEEE Trans. Geosci. Remote Sens. 1995, 33, 487–499.
Litvinov, P.; Hasekamp, O.; Cairns, B.; Mishchenko, M. Reflection models for soil and vegetation surfaces from multiple-viewing angle photopolarimetric measurements. J. Quant. Spectrosc. Radiat. Transf. 2010, 111, 529–539.
Xie, D.H.; Cheng, T.H.; Wu, Y.; Fu, H.; Zhong, R.F.; Yu, J. Polarized reflectances of urban areas: Analysis and models. Remote Sens. Environ. 2017, 193, 29–37.
Peltoniemi, J.I.; Gritsevich, M.; Hakala, T.; Dagsson-Waldhauserova, P.; Arnalds, O.; Anttila, K.; Hannula, H.R.; Kivekas, N.; Lihavainen, H.; Meinander, O.; et al. Soot on Snow experiment: Bidirectional reflectance factor measurements of contaminated snow. Cryosphere 2015, 9, 2323–2337.
Yang, B.; Zhao, H.M.; Chen, W. Semi-empirical models for polarized reflectance of land surfaces: Intercomparison using space-borne POLDER measurements. J. Quant. Spectrosc. Radiat. Transf. 2017, 202, 13–20.
Vanderbilt, V.C.; Grant, L. Plant Canopy Specular Reflectance Model. IEEE Trans. Geosci. Remote Sens. 1985, 23, 722–730.
Kallel, A. Leaf polarized BRDF simulation based on Monte Carlo 3-D vector RT modeling. J. Quant. Spectrosc. Radiat. Transf. 2018, 221, 202–224.
Kallel, A. Two-scale Monte Carlo ray tracing for canopy-leaf vector radiative transfer coupling. J. Quant. Spectrosc. Radiat. Transf. 2020, 243, 106815.
Kallel, A.; Gastellu-Etchegorry, J.P. Canopy polarized BRDF simulation based on non-stationary Monte Carlo 3-D vector RT modeling. J. Quant. Spectrosc. Radiat. Transf. 2017, 189, 149–167.
Diner, D.J.; Xu, F.; Martonchik, J.V.; Rheingans, B.E.; Geier, S.; Jovanovic, V.M.; Davis, A.; Chipman, R.A.; McClain, S.C. Exploration of a Polarized Surface Bidirectional Reflectance Model Using the Ground-Based Multiangle SpectroPolarimetric Imager. Atmosphere 2012, 3, 591–619.
Waquet, F.; Leon, J.F.; Cairns, B.; Goloub, P.; Deuze, J.L.; Auriol, F. Analysis of the spectral and angular response of the vegetated surface polarization for the purpose of aerosol remote sensing over land. Appl. Opt. 2009, 48, 1228–1236.
Ichii, K.; Ueyama, M.; Kondo, M.; Saigusa, N.; Kim, J.; Alberto, M.C.; Ardo, J.; Euskirchen, E.S.; Kang, M.; Hirano, T.; et al. New data-driven estimation of terrestrial CO₂ fluxes in Asia using a standardized database of eddy covariance measurements, remote sensing data, and support vector regression. J. Geophys. Res. Biogeosci. 2017, 122, 767–795.
Forkuor, G.; Hounkpatin, O.K.L.; Welp, G.; Thiel, M. High Resolution Mapping of Soil Properties Using Remote Sensing Variables in South-Western Burkina Faso: A Comparison of Machine Learning and Multiple Linear Regression Models. PLoS ONE 2017, 12, e0170478.
Kotta, J.; Kutser, T.; Teeveer, K.; Vahtmae, E.; Parnoja, M. Predicting Species Cover of Marine Macrophyte and Invertebrate Species Combining Hyperspectral Remote Sensing, Machine Learning and Regression Techniques. PLoS ONE 2013, 8, e63946.
Cui, Y.K.; Chen, X.; Xiong, W.T.; He, L.; Lv, F.; Fan, W.J.; Luo, Z.L.; Hong, Y. A Soil Moisture Spatial and Temporal Resolution Improving Algorithm Based on Multi-Source Remote Sensing Data and GRNN Model. Remote Sens. 2020, 12, 455.
He, Y.H.; Yang, B.; Lin, H.; Zhang, J.Q. Modeling Polarized Reflectance of Natural Land Surfaces Using Generalized Regression Neural Networks. Remote Sens. 2020, 12, 248.
Gilichinsky, M.; Heiskanen, J.; Barth, A.; Wallerman, J.; Egberth, M.; Nilsson, M. Histogram matching for the calibration of kNN stem volume estimates. Int. J. Remote Sens. 2012, 33, 7117–7131.
Shah, S.H.; Angel, Y.; Houborg, R.; Ali, S.; McCabe, M.F. A Random Forest Machine Learning Approach for the Retrieval of Leaf Chlorophyll Content in Wheat. Remote Sens. 2019, 11, 920.
Maxwell, A.E.; Warner, T.A.; Fang, F. Implementation of machine-learning classification in remote sensing: An applied review. Int. J. Remote Sens. 2018, 39, 2784–2817.
Zerrouki, N.; Harrou, F.; Sun, Y.; Hocini, L. A Machine Learning-Based Approach for Land Cover Change Detection Using Remote Sensing and Radiometric Measurements. IEEE Sens. J. 2019, 19, 5843–5850.
Liang, L.; Di, L.P.; Zhang, L.P.; Deng, M.X.; Qin, Z.H.; Zhao, S.H.; Lin, H. Estimation of crop LAI using hyperspectral vegetation indices and a hybrid inversion method. Remote Sens. Environ. 2015, 165, 123–134.
Loozen, Y.; Rebel, K.T.; de Jong, S.M.; Lu, M.; Ollinger, S.V.; Wassen, M.J.; Karssenberg, D. Mapping canopy nitrogen in European forests using remote sensing and environmental variables with the random forests method. Remote Sens. Environ. 2020, 247.
Breon, F.M.; Maignan, F. A BRDF-BPDF database for the analysis of Earth target reflectances. Earth Syst. Sci. Data 2017, 9, 31–45.
Ding, A.X.; Jiao, Z.T.; Dong, Y.D.; Zhang, X.N.; Peltoniemi, J.I.; Mei, L.L.; Guo, J.; Yin, S.Y.; Cui, L.; Chang, Y.X.; et al. Evaluation of the Snow Albedo Retrieved from the Snow Kernel Improved the Ross-Roujean BRDF Model. Remote Sens. 2019, 11, 1611.
Roy, D.P.; Li, Z.B.; Zhang, H.K.K. Adjustment of Sentinel-2 Multi-Spectral Instrument (MSI) Red-Edge Band Reflectance to Nadir BRDF Adjusted Reflectance (NBAR) and Quantification of Red-Edge Band BRDF Effects. Remote Sens. 2017, 9, 1325.
Specht, D.F. A General Regression Neural Network. IEEE Trans. Neural Netw. 1991, 2, 568–576.
Smola, A.J.; Scholkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222.
Chen, X.W.; Huang, W.M.; Yao, G.W. Wind Speed Estimation From X-Band Marine Radar Images Using Support Vector Regression Method. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1312–1316.
Axelsson, C.; Skidmore, A.K.; Schlerf, M.; Fauzi, A.; Verhoef, W. Hyperspectral analysis of mangrove foliar chemistry using PLSR and support vector regression. Int. J. Remote Sens. 2013, 34, 1724–1743.
Feret, J.B.; le Maire, G.; Jay, S.; Berveiller, D.; Bendoula, R.; Hmimina, G.; Cheraiet, A.; Oliveira, J.C.; Ponzoni, F.J.; Solanki, T.; et al. Estimating leaf mass per area and equivalent water thickness based on leaf optical properties: Potential and limitations of physical modeling and machine learning. Remote Sens. Environ. 2019, 231, 110959.
Chang, C.C.; Lin, C.J. LIBSVM: A Library for Support Vector Machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 1–27.
Franco-Lopez, H.; Ek, A.R.; Bauer, M.E. Estimation and mapping of forest stand density, volume, and cover type using the k-nearest neighbors method. Remote Sens. Environ. 2001, 77, 251–274.
Katila, M.; Tomppo, E. Selecting estimation parameters for the Finnish multisource National Forest Inventory. Remote Sens. Environ. 2001, 76, 16–32.
Sun, H.; Wang, Q.; Wang, G.X.; Lin, H.; Luo, P.; Li, J.P.; Zeng, S.Q.; Xu, X.Y.; Ren, L.X. Optimizing kNN for Mapping Vegetation Cover of Arid and Semi-Arid Areas Using Landsat Images. Remote Sens. 2018, 10, 1248.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Belgiu, M.; Dragut, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
Liu, S.Y.; Yang, B.; Zhang, Z.H.; Xiang, Y.; Wu, T.X.; Zhao, Y.S.; Zhang, F.Z. Influence of polarized reflection on airborne remote sensing of canopy foliar nitrogen content. Int. J. Remote Sens. 2020, 41, 4879–4900.
Dubovik, O.; Herman, M.; Holdak, A.; Lapyonok, T.; Tanre, D.; Deuze, J.L.; Ducos, F.; Sinyuk, A.; Lopatin, A. Statistically optimized inversion algorithm for enhanced retrieval of aerosol properties from spectral multi-angle polarimetric satellite observations. Atmos. Meas. Tech. 2011, 4, 975–1018.
Litvinov, P.; Hasekamp, O.; Cairns, B. Models for surface reflection of radiance and polarized radiance: Comparison with airborne multi-angle photopolarimetric measurements and implications for modeling top-of-atmosphere measurements. Remote Sens. Environ. 2011, 115, 781–792.
Zhou, R.K.; Wu, D.S.; Zhou, R.Y.; Fang, L.M.; Zheng, X.Y.; Lou, X.W. Estimation of DBH at Forest Stand Level Based on Multi-Parameters and Generalized Regression Neural Network. Forests 2019, 10, 778.
Sims, D.A.; Gamon, J.A. Relationships between leaf pigment content and spectral reflectance across a wide range of species, leaf structures and developmental stages. Remote Sens. Environ. 2002, 81, 337–354.
Liang, L.; Di, L.P.; Huang, T.; Wang, J.H.; Lin, L.; Wang, L.J.; Yang, M.H. Estimation of Leaf Nitrogen Content in Wheat Using New Hyperspectral Indices and a Random Forest Regression Algorithm. Remote Sens. 2018, 10, 1940.
Diner, D.J.; Xu, F.; Garay, M.J.; Martonchik, J.V.; Rheingans, B.E.; Geier, S.; Davis, A.; Hancock, B.R.; Jovanovic, V.M.; Bull, M.A.; et al. The Airborne Multiangle SpectroPolarimetric Imager (AirMSPI): A new tool for aerosol and cloud remote sensing. Atmos. Meas. Tech. 2013, 6, 2007–2025.
Chowdhary, J.; Cairns, B.; Mishchenko, M.; Travis, L. Retrieval of aerosol properties over the ocean using multispectral and multiangle photopolarimetric measurements from the Research Scanning Polarimeter. Geophys. Res. Lett. 2001, 28, 243–246.
Li, Z.Q.; Hou, W.Z.; Hong, J.; Zheng, F.X.; Luo, D.G.; Wang, J.; Gu, X.F.; Qiao, Y.L. Directional Polarimetric Camera (DPC): Monitoring aerosol spectral optical properties over land from satellite observation. J. Quant. Spectrosc. Radiat. Transf. 2018, 218, 21–37.
Fougnie, B.; Marbach, T.; Lacan, A.; Lang, R.; Schlussel, P.; Poli, G.; Munro, R.; Couto, A.B. The multi-viewing multi-channel multi-polarisation imager – Overview of the 3MI polarimetric mission for aerosol and cloud characterization. J. Quant. Spectrosc. Radiat. Transf. 2018, 219, 23–32.

Figure 1. Flowchart for evaluations of ML-based BPDF models and comparisons between ML-based and semi-empirical models. CV denotes the 10-fold cross-validation process.

Figure 2. Scatter plots of modeled versus measured polarized reflectance for the Xie-Cheng model, the GRNN-based model and the three proposed ML-based models (i.e., SVR-based, KNN-based and RF-based models) for five typical IGBP classes, i.e., evergreen broadleaf forest (IGBP 02), open shrubland (IGBP 07), grassland (IGBP 10), snow and ice (IGBP 15), and desert (IGBP 16). The horizontal (measurement) and vertical (model) axes range from −0.5% to 4%. Root mean square errors (RMSEs) ×100 and the correlation coefficients between modeled and measured values (Cors) are given in the upper left and lower right corners of each panel, respectively. The color of each bin represents the number of points on a logarithmic scale; bins with warmer colors contain many more observations than bins with cooler colors.

Figure 3. Scatter plots of modeled versus measured negative polarized reflectance for all IGBP classes using the four ML-based BPDF models, i.e., the GRNN- (a), SVR- (b), KNN- (c), and RF- (d) based models. The horizontal (measurement) and vertical (model) axes range from −1% to 0 and from −1% to 0.6%, respectively. The black solid line and the black dotted line in each panel represent the zero line and the 1:1 line, respectively. The color of each bin represents the number of points on a logarithmic scale; bins with warmer colors contain many more points than bins with cooler colors.

Figure 4. Improvements (in percent) of the GRNN- (green), SVR- (yellow), KNN- (blue), and RF- (orange) based BPDF models in terms of RMSE with C1, C2, and C3 as input variables, respectively. The improvement for each bar is computed by comparing the RMSE with that of the GRNN-based model with C0 as input variables. The five bars for each model give the improvements for the five typical IGBP classes shown in Figure 2. Bars with colors from dark to light represent IGBP 02, 07, 10, 15, and 16, respectively. A negative value indicates a decrease in accuracy. The subfigure shows the overall improvement for each model, computed from results aggregated over all 16 IGBP classes.

Table 1. Statistics of selected POLDER/PARASOL observations and targets. The column “Prop. of Neg.” gives the proportion of the negative polarized reflectance.

IGBP Class ID	IGBP Class	Observations	Targets	Prop. of Neg.
01	Evergreen Needleleaf Forest	40,800	564	6.72%
02	Evergreen Broadleaf Forest	33,868	551	6.90%
03	Deciduous Needleleaf Forest	35,986	536	7.50%
04	Deciduous Broadleaf Forest	43,075	595	5.98%
05	Mixed Forest	40,112	581	6.93%
06	Closed Shrubland	38,431	341	7.21%
07	Open Shrubland	87,052	599	6.99%
08	Woody Savannas	42,984	556	6.32%
09	Savannas	47,740	589	6.65%
10	Grassland	70,778	599	6.37%
11	Permanent Wetlands	33,180	553	8.52%
12	Croplands	58,557	575	5.98%
13	Urban and Built-Up	26,547	515	11.63%
14	Cropland/Natural Vegetation Mosaic	40,284	572	5.56%
15	Snow and Ice	207,028	554	8.94%
16	Desert	102,265	597	5.96%
All		948,687	8,877	7.24%

Table 2. Six semi-empirical models used for comparison in this study.

Model	Formula	Free Parameters	Description	Ref.
Nadal–Bréon	$R_{p} = ρ [1 - \exp (- β \frac{F_{p}}{μ_{s} + μ_{v}})]$	$ρ, β$	Based on POLDER/ADEOS measurements, developed for natural land surfaces.	[9]
Waquet	$R_{p} = ξ \cdot F_{p} \cdot S (θ_{s}, σ) \cdot S (θ_{v}, σ)$	$ξ, σ$	Incorporated a shadowing function $S (θ_{s}, σ)$ . Based on air-borne MICROPOL and developed for forest, cropped and urban surfaces.	[29]
Maignan	$R_{p} = \frac{C \cdot \exp (- θ_{i}) \cdot \exp (- NDVI) \cdot F_{p}}{4 (μ_{s} + μ_{v})}$	$C$	Added NDVI to the model and $C \cdot \exp (- θ_{i})$ considered the attenuation from leaf surface. Based on POLDER/PARASOL measurements and developed for 14 IGBP classes.	[10]
Litvinov	$R_{p} = \frac{α π F_{p}}{4 \cos ϑ (μ_{s} + μ_{v})} f (σ, ϑ) f_{s h} (SA, k_{r})$	$α, σ, k_{r}$	Added a shadowing function $f_{s h} (SA, k_{r})$ and considered a Gaussian distribution of facets, $f (σ, ϑ)$ . Based on air-borne RSP measurements and developed for vegetation and soil surfaces.	[57]
Diner	$R_{p} = \frac{ξ \cdot F_{p}}{8 π μ_{s} μ_{v} \cos ϑ}$	$ξ$	Based on the ground-based GroundMSPI and developed for grass surface.	[28]
Xie–Cheng	$R_{p} = A \cdot F_{p} \cdot f_{s h} (SA, k_{r}) \cdot \exp (- 0.7 NDVI)$	$A, k_{r}$	Based on POLDER/PARASOL measurements and developed for urban areas.	[21]
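
As a hedged illustration of how the one-line formulas in Table 2 translate into a computation, the sketch below evaluates the Nadal–Bréon form. The function name, the argument names and the numeric inputs are illustrative assumptions, not fitted parameters from this study. μ_s and μ_v are taken to be the cosines of the solar and viewing zenith angles, which is the usual convention but is not spelled out in the table.

```python
import math


def nadal_breon_rp(f_p, theta_s_deg, theta_v_deg, rho, beta):
    """Polarized reflectance from the Nadal-Breon form listed in Table 2.

    f_p is the Fresnel factor; rho and beta are the two free parameters.
    mu_s and mu_v are ASSUMED to be cosines of the solar and viewing
    zenith angles (given here in degrees).
    """
    mu_s = math.cos(math.radians(theta_s_deg))
    mu_v = math.cos(math.radians(theta_v_deg))
    return rho * (1.0 - math.exp(-beta * f_p / (mu_s + mu_v)))


# Illustrative inputs only; these are not values from the paper.
rp = nadal_breon_rp(f_p=0.02, theta_s_deg=30.0, theta_v_deg=40.0,
                    rho=0.01, beta=50.0)
```

In this form R_p is zero when F_p is zero and saturates at ρ as F_p grows, so ρ acts as an upper bound on the modeled polarized reflectance.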

Table 3. Optimal parameters of the GRNN-, SVR-, and KNN-based BPDF models. For RF-based BPDF models, ntree was set to 100 and nodesize was set to 5.

IGBP Class ID	GRNN σ	SVR γ	SVR C	KNN K
IGBP01	0.03	6.35	17.95	50
IGBP02	0.03	10.00	4.88	80
IGBP03	0.04	15.96	2.41	70
IGBP04	0.03	27.11	0.06	80
IGBP05	0.03	18.73	1.53	60
IGBP06	0.02	11.04	2.82	40
IGBP07	0.02	43.82	1.00	30
IGBP08	0.05	10.62	1.23	60
IGBP09	0.02	37.25	0.06	60
IGBP10	0.02	8.04	3.09	40
IGBP11	0.03	20.70	0.22	60
IGBP12	0.03	24.73	0.09	80
IGBP13	0.02	19.45	0.97	30
IGBP14	0.04	17.78	0.06	70
IGBP15	0.01	9.32	2.33	90
IGBP16	0.02	12.13	2.58	30
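
To make the roles of σ and K in Table 3 concrete, the following minimal sketch implements the textbook prediction rules for GRNN (a Gaussian-kernel weighted mean of the training targets) and KNN regression (an unweighted mean of the K nearest targets). It is not the authors' implementation: Euclidean distance on already-scaled inputs, the function names and the toy data are all assumptions made for illustration. The toy data set is tiny, so K = 3 is used instead of the tabulated values.

```python
import math


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def grnn_predict(train_x, train_y, query, sigma):
    """GRNN output: Gaussian-kernel weighted mean of training targets."""
    weights = [math.exp(-_sq_dist(x, query) / (2.0 * sigma ** 2))
               for x in train_x]
    total = sum(weights)
    if total == 0.0:
        # Every kernel underflowed: the query is too far for this sigma.
        raise ValueError("query too far from all training samples")
    return sum(w * y for w, y in zip(weights, train_y)) / total


def knn_predict(train_x, train_y, query, k):
    """KNN regression: unweighted mean of the k nearest training targets."""
    order = sorted(range(len(train_x)),
                   key=lambda i: _sq_dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


# Toy 1-D data (hypothetical): targets are a linear function of the input.
train_x = [[i / 10] for i in range(11)]
train_y = [2.0 * x[0] for x in train_x]

grnn_value = grnn_predict(train_x, train_y, [0.5], sigma=0.03)  # sigma as for IGBP01
knn_value = knn_predict(train_x, train_y, [0.5], k=3)
```

A smaller σ or K makes the prediction follow the nearest samples more closely, while larger values average over a wider neighborhood; this trade-off is what the per-class optimization in Table 3 settles.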

Table 4. RMSEs (×100) between modeled and measured polarized reflectance for the six semi-empirical BPDF models and for the four ML-based BPDF models. The best performance, corresponding to the lowest RMSE for each IGBP class, is in bold italic. The improvement of the KNN-based model over the Xie-Cheng model is shown in the column “Impro. of KNN”. The row “Overall” denotes the RMSE of the results aggregated from all IGBP classes.

IGBP Class ID	Semi-Empirical BPDF Models						ML-Based BPDF Models				Impro. of KNN
IGBP Class ID	Nadal–Bréon	Waquet	Maignan	Litvinov	Diner	Xie–Cheng	GRNN	SVR	KNN	RF	Impro. of KNN
01	0.349	0.365	0.317	0.360	0.439	0.317	0.302	0.296	0.296	0.294	6.81%
02	0.211	0.234	0.216	0.216	0.318	0.211	0.192	0.193	0.192	0.196	9.16%
03	0.376	0.438	0.319	0.374	0.522	0.327	0.306	0.297	0.303	0.310	7.15%
04	0.236	0.284	0.221	0.242	0.357	0.223	0.200	0.199	0.198	0.201	11.14%
05	0.264	0.289	0.258	0.270	0.354	0.261	0.243	0.245	0.243	0.245	6.69%
06	0.140	0.180	0.148	0.143	0.319	0.131	0.108	0.110	0.108	0.109	18.08%
07	0.177	0.215	0.167	0.183	0.382	0.156	0.129	0.129	0.128	0.129	18.23%
08	0.214	0.246	0.217	0.219	0.321	0.210	0.201	0.197	0.196	0.200	6.72%
09	0.230	0.269	0.210	0.234	0.356	0.208	0.185	0.191	0.187	0.188	10.13%
10	0.232	0.278	0.208	0.241	0.420	0.204	0.175	0.176	0.173	0.174	14.95%
11	0.370	0.393	0.314	0.374	0.464	0.317	0.293	0.287	0.291	0.295	8.11%
12	0.234	0.282	0.229	0.244	0.442	0.227	0.215	0.217	0.215	0.218	5.06%
13	0.323	0.336	0.349	0.336	0.494	0.318	0.320	0.308	0.307	0.308	3.57%
14	0.245	0.295	0.229	0.256	0.438	0.225	0.216	0.208	0.212	0.219	5.91%
15	0.456	0.559	0.461	0.504	0.749	0.467	0.422	0.420	0.419	0.427	10.16%
16	0.169	0.203	0.180	0.178	0.369	0.168	0.148	0.149	0.148	0.152	11.98%
Overall	0.304	0.297	0.360	0.324	0.497	0.296	0.270	0.268	0.268	0.272	9.55%
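
The “Impro. of KNN” column can be read as a relative reduction in RMSE. The sketch below assumes the definition (RMSE_ref − RMSE_model) / RMSE_ref, which is consistent with the tabulated values but is not stated explicitly in the caption; the function name is hypothetical.

```python
def rmse_improvement_percent(rmse_reference, rmse_model):
    """Relative RMSE reduction of a model against a reference, in percent.

    ASSUMED definition: 100 * (reference - model) / reference.
    """
    return 100.0 * (rmse_reference - rmse_model) / rmse_reference


# "Overall" row of Table 4 (RMSE x100): Xie-Cheng = 0.296, KNN = 0.268.
overall = rmse_improvement_percent(0.296, 0.268)
```

With the three-decimal values printed in the table this gives roughly 9.46%, slightly below the 9.55% reported; the gap is presumably due to the table entries being rounded before display.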

Table 5. Configurations of input variables. C1, C2, and C3 are the three alternative configurations; C0 is the original configuration used in Section 3.

Abbreviation	Configuration of Input Variables
C0	F_p+SA+R₆₇₀+R₈₆₅
C1	F_p+SA+R₆₇₀+R₈₆₅+R₅₆₅
C2	F_p+SA+NDVI+ND_565,670+SR_865,670+SR_565,670
C3	F_p+SA+R₆₇₀+R₈₆₅+R₄₉₀+R₅₆₅+R₇₆₅+R₁₀₂₀

Table 6. Improvements (in percent) of the four ML-based BPDF models in terms of RMSE with C1, C2, and C3 as input variables, respectively, compared with the accuracy of the GRNN-based BPDF model with C0 as input variables. Increases in accuracy have positive values, whereas decreases have negative values. In each configuration, the best improvements are in bold italic. The row “Overall” denotes the improvement of results aggregated from all 16 IGBP classes for each model. The “RMSE of RF” column records the RMSE (×100) of the RF-based BPDF model with C3 as input variables.

IGBP ID	C1 GRNN	C1 SVR	C1 KNN	C1 RF	C2 GRNN	C2 SVR	C2 KNN	C2 RF	C3 GRNN	C3 SVR	C3 KNN	C3 RF	RMSE of RF (C3)
01	1.10	2.01	2.65	4.10	2.55	2.42	1.86	3.02	0.74	2.42	2.67	6.09	0.284
02	4.33	5.35	4.78	5.32	1.75	0.97	2.68	0.86	4.51	6.89	4.78	9.55	0.174
03	0.45	3.54	1.50	0.86	−1.84	−2.24	−1.75	−3.84	0.55	5.69	2.06	4.80	0.291
04	0.25	0.96	0.50	3.11	−7.28	−5.79	−5.51	−5.77	0.37	2.72	1.06	6.00	0.188
05	1.93	2.35	1.95	2.38	−4.47	−4.12	−4.75	−4.62	2.28	3.48	2.41	6.00	0.229
06	9.92	9.33	10.77	11.27	0.64	−0.61	−0.03	0.34	10.99	11.62	11.87	16.97	0.090
07	6.40	7.49	8.07	7.91	−0.31	−0.61	−1.11	−0.43	7.45	13.01	9.35	14.90	0.110
08	2.15	5.56	5.10	5.39	0.48	1.52	1.86	0.44	2.39	8.80	5.72	10.10	0.181
09	0.01	0.88	−1.22	4.16	0.52	−0.31	−0.20	−0.24	0.45	3.61	−0.49	8.96	0.169
10	0.01	0.91	0.77	4.89	−4.41	−5.57	−4.16	−3.49	−6.28	2.10	1.91	9.92	0.158
11	0.01	4.38	−0.52	0.91	−2.44	−2.80	−2.94	−3.50	−0.13	4.32	0.01	3.00	0.284
12	4.17	4.75	4.19	4.49	2.10	1.68	2.73	1.84	3.84	7.54	4.36	8.70	0.197
13	0.26	2.22	3.68	4.46	2.29	2.81	0.83	3.40	−0.96	3.48	3.68	5.40	0.303
14	−1.85	4.47	2.45	3.40	−0.60	2.18	0.93	0.27	−0.12	5.92	2.79	6.65	0.202
15	0.04	1.49	0.47	1.28	−2.64	−1.85	−1.16	−2.53	0.53	3.68	1.53	6.29	0.395
16	3.48	1.58	3.85	0.68	−1.79	−2.26	−2.00	−7.62	8.00	8.60	8.62	6.62	0.138
Overall	0.73	2.29	1.46	2.34	−1.80	−1.36	−1.00	−1.82	0.96	4.43	2.37	6.62	0.252

Table 7. Importance of the eight input variables given by random forest, averaged over 100 runs for each IGBP class. The most important variable is in bold italic for each IGBP class.

IGBP ID	F_p	SA	R₆₇₀	R₈₆₅	R₄₉₀	R₅₆₅	R₇₆₅	R₁₀₂₀
01	1.321	1.604	1.342	0.786	1.201	1.122	1.102	0.917
02	1.090	1.241	1.304	1.460	2.360	1.423	1.531	1.213
03	1.407	1.234	0.946	0.533	0.816	0.671	0.624	0.767
04	0.780	0.825	1.350	0.524	2.789	1.143	0.568	0.311
05	1.319	1.438	1.659	1.064	2.258	1.451	1.343	0.476
06	1.674	1.373	1.394	0.787	2.939	1.409	0.914	1.581
07	1.362	1.642	5.082	1.571	1.818	1.420	2.524	2.566
08	1.357	1.181	2.051	0.622	1.628	1.514	1.233	1.049
09	1.654	1.359	2.668	0.885	3.119	1.955	1.715	1.868
10	1.305	1.607	2.107	0.305	0.939	1.279	0.229	0.268
11	1.630	1.468	1.099	0.724	0.649	0.605	0.631	0.816
12	1.580	1.535	2.675	1.395	2.949	1.772	1.799	2.349
13	1.072	0.990	0.597	0.290	0.457	0.444	0.339	0.264
14	1.501	1.457	1.388	0.812	1.365	1.309	1.052	0.569
15	1.655	1.350	1.955	2.129	2.888	1.840	1.986	2.509
16	1.490	1.791	2.763	2.634	1.463	2.679	2.544	3.018
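
Because the bold italic highlighting is not visible in plain text, the most important variable per class can be recovered by taking the largest score in each row. The short sketch below does this for three rows copied from Table 7; the variable and function names are illustrative assumptions.

```python
# Column order as in Table 7.
variables = ["F_p", "SA", "R670", "R865", "R490", "R565", "R765", "R1020"]

# Three rows copied from Table 7 (IGBP 01, 07 and 16).
importance = {
    "01": [1.321, 1.604, 1.342, 0.786, 1.201, 1.122, 1.102, 0.917],
    "07": [1.362, 1.642, 5.082, 1.571, 1.818, 1.420, 2.524, 2.566],
    "16": [1.490, 1.791, 2.763, 2.634, 1.463, 2.679, 2.544, 3.018],
}


def top_variable(scores):
    """Name of the variable with the largest importance score."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return variables[best_index]


top = {igbp_id: top_variable(scores) for igbp_id, scores in importance.items()}
```

For these three rows the top-ranked inputs are SA, R₆₇₀ and R₁₀₂₀, respectively, which shows that the dominant predictor differs between land cover classes.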

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).


