A Comparison of Estimating Crop Residue Cover from Sentinel-2 Data Using Empirical Regressions and Machine Learning Methods

: Quantifying crop residue cover (CRC) on field surfaces is important for monitoring the tillage intensity and promoting sustainable management. Remote-sensing-based techniques have proven practical for determining CRC, however, the methods used are primarily limited to empirical regression based on crop residue indices (CRIs). This study provides a systematic evaluation of empirical regressions and machine learning (ML) algorithms based on their ability to estimate CRC using Sentinel-2 Multispectral Instrument (MSI) data. Unmanned aerial vehicle orthomosaics were used to extracted ground CRC for training Sentinel-2 data-based CRC models. For empirical regression, nine MSI bands, 10 published CRIs, three proposed CRIs, and four mean textural features were evaluated using univariate linear regression. The best performance was obtained by a three-band index calculated using (B2 − B4) / (B2 − B12), with an R 2cv of 0.63 and RMSE cv of 6.509%, using a 10-fold cross-validation. The methodologies of partial least squares regression (PLSR), artificial neural network (ANN), Gaussian process regression (GPR), support vector regression (SVR), and random forest (RF) were compared with four groups of predictors, including nine MSI bands, 13 CRIs, a combination of MSI bands and mean textural features, and a combination of CRIs and textural features. In general, ML approaches achieved high accuracy. A PLSR model with 13 CRIs and textural features resulted in an accuracy of R 2cv = 0.66 and RMSE cv = 6.427%. An RF model with predictors of MSI bands and textural features estimated CRC with an R 2cv = 0.61 and RMSE cv = 6.415%. The estimation was improved by an SVR model with the same input predictors ( R 2cv = 0.67, RMSE cv = 6.343%), followed by a GPR model based on CRIs and textural features. The performance of GPR models was further improved by optimal input variables. A GPR model with six input variables, three MSI bands and three textural features, performed the best, with R 2cv = 0.69 and RMSE cv = 6.149%. This study provides a reference for estimating CRC from Sentinel-2 imagery using ML approaches. The GPR approach is recommended. A combination of spectral information and textural features leads to an improvement in the retrieval of CRC.


Introduction
Crop residues, such as stalks, stems, leaves, and seed pods, are materials left on the surface of agricultural fields after harvest. Crop residue cover (CRC), as an indicator of the amount of crop residue left on the soil surface, plays an important role in evaluating tillage intensity [1,2]. CRC is a key input parameter in models to predict the impact of agricultural systems on soil organic carbon, greenhouse gas emissions, and crop production [3,4]. Therefore, it is critical to estimate CRC accurately, thereby enabling the evaluation of the effectiveness of tillage practice and promoting agricultural sustainability.
Currently, the measurement of CRC depends on manual survey-based methods, such as the linepoint transect. These methods are time-consuming and labor-intensive, thus making systematic and continuous quantification of CRC over large areas difficult [5,6]. Alternatively, remote sensing is an efficient technique to acquire CRC spatially and temporally in a rapid, accurate, and objective manner [7,8]. Remote sensing techniques used to estimate CRC can be classified into empirical regression based on crop residue indices (CRIs), classification [9,10], spectral unmixing [11], and spectral angle methods [12,13]. The most widely used among these methods is empirical regression constructed from a linear or nonlinear relationship between CRC and the CRIs, also known as "the CRI technique".
A series of CRIs have been designed to improve the detection of CRC. One such CRI, cellulose absorption index (CAI), was developed by Nagler et al. [14] based on a unique absorption feature associated with cellulose and lignin at 2100 nm in the spectra of crop residue. The CAI has proven very accurate in quantifying CRC [15]. However, its application is limited because hyperspectral imagery is currently only available on airborne or proximal platforms. Various broadband indices have been developed for mapping CRC using multispectral satellite imagery, such as the Advanced Spaceborn Thermal Emission and Reflectance Radiometer (ASTER) and the Landsat family. The wellknown ASTER CRIs include the lignin cellulose absorption index (LCA), and the shortwave infrared (SWIR) normalized difference residue index (SINDRI) calculated from SWIR bands [16,17], which are effective at measuring CRC due to the precise measurement of cellulose and lignin absorption features [18]. The CRIs developed from the Landsat imageries are also primarily dependent on SWIR bands, such as normalized difference tillage index (NDTI) [19], normalized difference indices (NDI, NDI5, NDI7) [20], normalized difference senescent vegetation index (NDSVI) [21], and normalized difference residue index (NDRI) [22]. Previous studies showed that NDTI performed the best in most cases [15], and NDRI showed an advantage in minimizing the effects of green vegetation [22]. Most of the above indices are limited to normalized difference ratios of two bands. Index formulations, e.g., simple ratio and normalized difference ratio, and band selection have impacts on the performance of the estimation [23]. It is likely that indices based on more than two bands and different formulations would lead to further improvements in the retrieval of variables from remote sensing imagery.
The Sentinel-2 satellites launched in 2015 and 2017 carry Multi-spectral Instruments (MSI) with 13 spectral bands. Apart from the three red-edge bands, the MSI bands have corresponding spectrum regions to the Landsat 8 Operational Land Imager (OLI). Moreover, Sentinel-2 data possess finer spatial and temporal resolutions, which is more suitable for agricultural monitoring. However, there is a lack of studies on evaluating the ability of Sentinel-2 data in CRC estimation [4,24].
Besides the empirical regression approach, various machine learning (ML) regression algorithms have been developed and are popular in bio-geophysical variable retrieval due to their computational efficiency and effectiveness [25,26]. ML algorithms have shown advantages in capturing the nonlinear relationship of input features and retrieval targets and can be massively multivariate, involving several variables. Commonly used ML methods include artificial neural network (ANN), support vector machine (SVM), Gaussian processes regression (GPR) [27][28][29]. To date, very few studies have investigated the use of ML regression for the prediction of CRC [30]. Therefore, evaluating the performance of ML regressions in CRC estimation is urgently needed. CRIs that are designed to enhance the discrimination of crop residue from soil are reportedly more correlated to CRC than spectral reflectances [31]. Therefore, it is worth exploring the use of CRIs as input features in the ML models for the prediction of CRC. Furthermore, several studies reported that textural features were useful in estimating vegetation parameters (i.e., leaf area index (LAI)) and combining both textural features and spectral information provided much improvement compared with using spectral information only [32][33][34]. In addition, Jin et al. [35] evaluated three processing techniques to estimate CRC: regressions based on CRIs, textural features, and combinations of CRIs with textural analyses. They found that textural features were correlated with CRC and the regression based on a combination of textural features and CRIs yields a better result than the other two approaches. Based on these studies, it is necessary to evaluate the potential of textural features used as input predictors in the machine learners for the prediction of CRC.
To address the gaps, this study focused on evaluating the potential of Sentinel-2 data on CRC estimation using an empirical regression approach and ML regression techniques. We hypothesize that using different combinations of MSI bands, CRIs, and textural features could provide improved estimates of CRC than using any single source feature alone. Therefore, we evaluated and compared the performance of univariate regression against individual MSI bands, CRIs and texture features, as well as the retrieval accuracies of partial least squares regression (PLSR), ANN, support vector regression (SVR), GPR, and random forest (RF) associated with MSI bands, CRIs, and their combinations with textural features.

Study Area
The study area is located in Changchun (43°5′N-45°15′N, 124°18′E-127°2′E), Jilin province, Northeast China ( Figure 1). The region is situated in a temperate continental climate zone with four seasons, characterized by a hot and rainy summer and cold and dry winter. The test area is about 320 × 450 m. Maize is the main crop in this region. Local maize cultivars are planted and harvested in May and October, respectively. The variability of in-ground CRC is due to the different harvesting traditions that local farmers have been implemented.

Unmanned Aerial Vehicle Imagery
Field CRC measurements were obtained with an unmanned aerial vehicle (UAV) on March 29, 2018. We used a DJI Inspire 1 quadcopter (DJI, Shenzhen, China). The UAV carried a high resolution digital camera with a 1/2.3'' CMOS sensor and 12.4 million effective pixels to generate images with red, green, and blue (RGB) channels. At 100 m flight height, the captured 537 images, retaining horizontal and vertical overlaps of at least 80% and 75%, respectively, and possessing a spatial resolution of 4.4 cm. The Pix4D software (Pix4D SA, Lausanne, Switzerland) and ENVI 5.3 were applied for image mosaicking and georectification with 12 ground control points obtained from Google Earth as the reference.

Classification of UAV RGB Orthomosaics and Calculation of CRC Data to 20 m Scale
SVM is a powerful supervised classification technique and provided a superior performance compared to most other image classification methods [36,37]; therefore, it was used to classify the UAV RGB orthomosaics of the study area ( Figure 1). The classification of the RGB orthomosaics was finished by SVM embedded in the ENVI 5.3 software. The main land cover types included soil, crop residue, tillage shadows, plastics, and burned areas. The red, green, and blue digital numbers from the RGB orthomosaic of these ground components show discrepancies, which were used in the SVM classification. The whole UAV RGB orthomosaic was clipped into 14 subsets and classified one by one. The regions of interest (ROIs) of soil, crop residue, shadows, plastics and burned areas, were collected based on the manual interpretation of the RGB image, for the image had a super-high resolution of 4.4 cm. ROIs were created for each subset by using the ROI tool embedded in the ENVI 5.3 software. In total, we created more than 1000 ROIs for the four classes. Each ROI covered about 20-30 pixels in average. The separability between the ROIs of crop residue and the other classes was guaranteed higher than 1.8. Of these, 70% were used for training and the remaining were used for evaluation. The classification was evaluated according to overall, user's, and producer's accuracies.
The fractional cover of crop residue is calculated as the number of pixels in the crop residue class divided by the total number of pixels in a given grid cell (or as the mean value of the grid cell from a binary raster). The grid cell in this study is 20 m, i.e., the spatial resolution of Sentinel-2 MSI imagery. We created precisely aligned polygon grids for Sentinel-2 20 m data using the create fishnet tool in ArcMap v.10.5. The CRC was calculated to grid cells by using the zonal statistics tool in ArcMap v.10.5 based on the classification of UAV image. The study area covered 243 20-m pixels. The UAVderived CRC (hereafter UAV-CRC, n = 243) was used for the development of the CRC estimation model.

Empirical Regressions
Empirical regression has long been the most popular method in optical remote sensing analysis [25]. In this study, we selected 10 CRIs to conduct univariate regression analysis with UAV-CRC, respectively. Eight of the ten indices were based on normalized difference format like the NDTI. Simple tillage index (STI) was the ratio of near infrared (NIR) band and SWIR band [2,19]. The deaf fuel index (DFI) was developed based on four bands of 1, 2, 6, and 7 in the Moderate Resolution Imaging Spectroradiometer (MODIS) and showed good potential in estimating steppe residue cover [38].
In addition to the ten published indices, we constructed three index formulations to investigate the potential of improving retrieval accuracy of CRC based on Sentinel-2 bands. The formulae of CAI and lignin Cellulose absorption index (LCA) [39] were adopted to calculate three-band indices (Equations (1) and (2)). These two indices have shown good performance in estimating CRC [39]. Using the NDRI formula as a basis, we constructed the third formulation by substituting one of the two bands in the denominator of NDRI with a third band (Equation (3)). These three formulations were composed of three bands. We named them three-band index 1 (3BI1), 3BI2, and 3BI3 3BI1 , , = 100 × (0.5 × (ρ + ρ − ρ ) (1) where ρ represents band of Sentinel-2.
In order to select a best-performing index, all possible band combinations were correlated with the UAV-measured CRC according to the three formulae. The band combination with the highest coefficient of determination (R 2 ) for each formula was selected as the optimal index. The optimal indices output from the three formulae are listed in Table 1. All the three indices used Sentinel-MSI band 2, 4, and 12. The 3BI3 also used band 11.  [21] shortwave green normalized difference index SGNDI (B3 − B12)/ (B3 + B12) [13] crop residue cover index CRCI The performance of the nine bands of Sentinel-2 with a spatial resolution of 20 m on CRC estimation were also investigated. Furthermore, we extracted mean textural feature, which was reported as being correlated to CRC [35], by using a gray level co-occurrence matrix from Sentinel-2 10 and 20 m imagery, respectively. The mean textural features of each Sentinel-2 10 m band-B2, B3, B4, and B8-were aggregated to 20 m grid cells to match the pixel size of Sentinel-2 20 m image using the zonal statistics tool in Arcmap v.10. A pre-primary analysis showed that the aggregated 20 m mean textures appear more correlated with CRC in comparison to the mean textural features, directly derived from 20 m MSI imagery. Therefore, the aggregated mean textural features of band 2, 3, 4, and 8 were employed.
Each of the bands, indices, and textural features was used as a single explanatory variable for retrieval of CRC. Preprimary analysis showed that these predictors had linear relationships with the UAV-CRC. Therefore, we restricted the fitting method to ordinary least-squares linear regression.

Machine Learning Methods
In contrast to empirical regression, ML algorithms are robust to noisy data and can approximate multivariate non-linear relationships. The PLSR is a linear non-parametric regression model. ANN, GRP, SVR, and RF, also referred to as non-linear non-parametric models, apply non-linear transformations. We grouped input predictors into four categories, i.e., nine MSI bands, 13 CRIs, a combination of MSI bands and the four aggregated textural features, and a combination of 13 CRIs and those textural features. The output parameter was the corresponding UAV-CRC measurements.
The PLSR decomposes both the dependent and independent variables into a number of principal components, and can accommodate highly correlated variables and over-fitting. It is a bilinear calibration method that utilizes data compression by reducing a large number of measured collinear variables to a few non-correlated factors [41].
ANNs have been widely used for estimating terrestrial variables from remote sensing data [25,42]. We used a back-propagation ANN that consisted of input, hidden and output layers. The activation functions in the hidden and output nodes were set to "sigmoid" and linear, respectively. The Levenberg-Marquardt minimization algorithm was used to calibrate the synaptic coefficients [43]. In order to optimize the structural parameters of ANN for the network, we varied both the momentum coefficient and learning rate from 0.1 to 1.0, at a step of 0.05. The number of nodes in the hidden layer was varied from three to seven. The mean squared error with a value of 1e-6 was used as the performance threshold. The optimal network was selected in terms of the absolutely average error between the validation data and predictions.
The SVR model by Vapnik et al. [44], is a popular tool for solving non-linear problems. In the SVR procedure, the n-dimensional input variables are conveyed into a new feature space with higher dimensions using kernel functions, and consequently, the optimal separating hyperplanes are developed [45]. In this study, a Gaussian kernel was applied as the kernel function. Other kernel types-linear, quadratic, and cubic-were tested, but the Gaussian kernel method was found to be the most reliable. The remaining parameters, including the box constraint mode, kernel scale, and the epsilon mode, were optimized.
GPR is another popular kernel-based ML method for the non-linear regression problem [46]. In the process of training data, GPR first elicited a prior GPR to constrain the possible forms of the unknown function and then updates this prior in the light of training samples to generate a posterior GPR as the final functional model [47]. We used an exponential kernel and a constant basic function. The parameter sigma was optimized.
The RF is an ensemble-learning algorithm that combines a large set of decision trees to improve prediction accuracy [48]. The RF presents several advantages: it is not sensitive to noise or overfitting, and it can handle thousands of input variables without variable deletion and runs efficiently [28,49]. In order to implement the RF, two parameters needed to be set up: the number of trees and the number of features. In this study, we applied grid searching and varied the number of trees from 50 to 500 at a step size of 20. The number of features is all the input variables. The optimal network was selected in terms of the absolutely average error between the validation data and predictions. A diagram of the workflow is provided in Figure 2. The hyperparameters of ANN, GPR, SVR, and RF are represented in the supplementary materials (Table S1-S4).

Model Calibration and Validation
The performances of the above estimation algorithms were evaluated using R 2 and root mean square error (RMSE) with respect to the UAV measured CRC. In order to avoid over-fitting of the ML algorithms and dependence on a single random partitioning of the datasets, as well as guarantee that all samples were used for both training and validation, we used a repeated 10-fold cross-validation procedure in the training process and evaluated the performance of each approach, e.g., as reported in other studies [23,28,50]. Samples were randomly split into 10 equal-sized sub-datasets, and they were trained and tested 10 times. For each time, nine sub-datasets were used iteratively for calibration and the remaining sub-dataset for validation. The cross-validation process was then repeated 10 times. By repeating the training procedure 10 times, all observations are used for both calibration and validation, with each observation used for validation only once in one of 10 iterations. The R 2 cv and RMSEcv of each algorithms represented here are the average of 10 iterations. The R 2 cv and RMSEcv of each fold for each algorithms is supplied in the Supplementary Materials (Table S5-S11). Figure 3 shows examples of the SVM classification results. The classification had a high accuracy ( Table 2). The overall accuracy was about 98.06%. The producer's accuracy for the crop residue class was 97.47% and the user's accuracy was 98.86%. The corresponding values for the soil class were 98.73% and 97.20%.   Figure 4 highlights the correlation matrices of all the predictor variables derived from Sentinel-2 and the UAV measured CRC. All bands were highly correlated with CRC with Pearson's correlation coefficients between 0.53 and 0.75, except band 12. Band 4 had the strongest relationship (R = 0.75) with CRC in these nine bands. Most of the 10 published indices showed high positive correlations with the exception of MCRC. NDRI had a correlation of 0.77 with CRC, followed by NDI7 and STI. The proposed 3BI2 was the most positively correlated with CRC (R = 0.78). The proposed 3BI1 was the most negatively correlated with CRC (R = −0.78), followed by 3BI3 and NDSVI. The mean textures extracted from bands 2, 3, 4, and 8 were also correlated with CRC. The mean texture derived from band 4 (named B4_mean) performed better than the other textures. In general, CRIs were more correlated with CRC than bands and textural features. The 26 variables showed different degrees of correlation with each other. The performance of the above variables on estimating CRC was first analyzed by least-squares linear regression between each of variables and the UAV measured CRC. The results of the 10-fold cross-validation for the models (Table 3) revealed that the models based on MSI bands presented various performances, with R 2 cv value ranging from 0.07 to 0.58 and RMSEcv ranging from 10.902% to 7.185%. Among the nine bands, the best model was fitted by band 4, which yielded the highest correlation with CRC (R 2 cv = 0.58, RMSEcv = 7.185%), followed by models based on band 5 and 8a. The regression relationship between band 4 and CRC is shown in Figure 5a. The most poorly performing band was band 12, with an R 2 cv of 0.07 and RMSEcv of 10.902%. The performance order of MSI bands from the highest to lowest with respect to RMSEcv values was B4, B5, B7, B8a, B3, B6, B2, B11, and B12. The models based on published indices yielded various retrieval results, with R 2 cv values ranging from 0.15 to 0.61 and RMSE ranging from 6.663%to 10.336% (Table 3). NDRI calculated by band 4 and band 12 performed best, with an R 2 cv of 0.61 and RMSEcv of 6.663%, followed by NDI7 and STI, which were calculated by bands 8a and 12. The DFI, a combination of bands 12, 11, 4, and 8a, also presented a reliable estimate. NDTI provided an accuracy of R 2 cv equal 0.50 and RMSEcv equal to 7.585%. Other indices presented relatively poor performance, especially MCRC, which was proposed for differentiating between conventional and conservation tillage [17]. Among all the CRIs, the best retrieval accuracy was achieved by 3BI2 (R 2 cv = 0.63, RMSEcv = 6.509%), followed by 3BI1. These two indices, developed from the formulae of CAI and LCA, performed better than the 10 published indices. Adding a third band at the basis of NDRI, 3BI3 slightly improved the accuracy compared to NDRI. The three proposed indices were associated with MSI bands 4, 12, and 2. Although band 12 was not correlated to CRC, indices utilizing it performed better than those without it, such as NDI7 versus NDI5 and NDRI versus NDSVI. This is likely due to the cellulose and lignin absorption in residue near band 12 [22]. The scatter plots between CRC and the best four indices (NDRI, 3BI1, 3BI2, and 3BI3) are shown in Figure 5b-e.

Empirical Regression Analysis Using Sentinel-2 Bands, Indices, and Textural Features for CRC Estimation
Of the four aggregated mean textural features investigated, the mean feature derived from MSI band 4 led to the most accurate estimates of CRC (R 2 cv = 0.49, RMSEcv = 8.071%), followed by B8_mean and B3_mean. B2_mean performed poorly. The regressed relationship between B4_mean and CRC is shown in Figure 5f. Models based on textural features presented higher RMSE values than those based on bands and indices. Generally, CRIs performed best.

Performance of PLSR on estimating CRC
PLSR, an extension of multiple linear regression technique, is particularly useful in the case of predicting a set of dependent variables from a large set of independent variables. The above MSI bands, CRIs, and textural features were correlated with CRC in different degrees. Combinations of these features via PLSR were expected to offer advantages over univariate regression, including enhancing the sensitivity to crop residue, reducing the influence of background spectral noise, and providing structural information via textural features. Therefore, we applied four groups of variables, i.e., MSI nine bands, 13 CRIs, a combination of nine bands and four mean textural features, and a combination of 13 CRIs and those textural features, to estimate CRC via PLSR ( Figure 2). Table 4 presents a comparative analysis of the performance level of PLSR models based on the four groups of input variables. The PLSR model based on the nine MSI bands generated better results than any univariate models based on the MSI bands in terms of R 2 cv and RMSEcv. Similarly, the PLSR model based on the 13 CRIs yielded a higher accuracy of estimating CRC compared to NDRI and NDTI. In addition, the PLSR combining nine bands and four textural features contributed to a lower RMSEcv compared to that based on nine bands. The retrieval accuracy of PLSR was further improved by using 13 CRIs and four textural features as input features (R 2 cv = 0.66, RMSEcv = 6.427%). The addition of textural features enriched the effective information for CRC estimation. The scatter plot of measured CRC values versus estimated value by the best PLSR model is shown in Figure 6. It is noted that samples with CRC measurements below a value of 20% were systematically overestimated and those higher than 20% were underestimated. Table 4. The performances of partial least squares regression (PLSR) associated with four groups of input predictors on CRC estimation. The best-performing input variable combination is highlighted in boldface.  Table 4). The different colors indicate the 10-fold subsets. Table 5 shows the cross-validation results of the four non-linear non-parametric algorithms-ANN, GPR, SVR, and RF-with four groups of input predictors. Generally, every approach presented reliable estimates with acceptable R 2 cv and RMSEcv values. The ANN models were highly correlated with the UAV-CRC, but with higher RMSE values than the results derived from the RF. These four ML algorithms based on MSI bands outperformed the large majority of univariate regressions while not performing as well as 3BI2. Further analysis was executed by considering CRIs as input variables as well as a combination of bands and textural features, and a combination of CRIs and textural features ( Figure 2). Table 5. Performance of artificial neural network (ANN), Gaussian process regression (GPR), support vector regression (SVR), and random forest (RF) associated with four groups of input predictors on CRC estimation. The best-performing input variable combination for each algorithm is highlighted in boldface. Compared to utilizing MSI bands as input predictors, the use of the other three groups of input variables improved the performance of the four non-linear ML approaches (Table 5). In terms of ANN, the retrieval accuracy was significantly improved by replacing bands with either CRIs, a mixture of bands and textural features, or a combination of CRIs and textures. The highest accuracy (R 2 cv = 0.69, RMSEcv = 7.273%) was obtained by using bands and textures as input predictors. Similarly, the retrieval accuracy obtained by employing RF with either CRIs, a mixture of bands and textures, or a mixture of CRIs and textures, was also improved compared to only using bands as input predictors. The best RF performance was found with bands and textures as input variables (R 2 cv = 0.61, RMSEcv = 6.415%). Although using CRIs to replace bands in GPR and SVR models obtained little improvement, these two algorithms associated with the latter two groups of variables obviously improved their performance. For GPR model, the lowest RMSEcv and the highest R 2 cv were obtained by combining CRIs and textures (R 2 cv = 0.66, RMSEcv = 6.352%). The best-performing model was SVR associated with bands and textures (R 2 cv = 0.67, RMSEcv = 6.343%). The results of these four ML models also demonstrated that adding texture information improved retrieval accuracy.

Algorithms
When comparing the PLSR and ML models, the best RF, GPR, and SVR performed better than the best PLSR model. A higher performance for these three non-linear ML models implies that the relationship between CRC and input variables may be non-linear. PLSR cannot cope with complex nonlinear relations [23]. The GPR with CRIs and textural features as input variables and the SVR with an input of bands and textural features are robust for CRC estimation.

Optimization of the GPR model
As noted earlier, one of the advantages of GPR is its ability to provide insight into the relevance of input predictors. The relevance can be interpreted by , which is a parameter of the covariance function of GPR: K , = exp (− , ) . High values mean that relations largely extend along that predictor; hence, the lower the , the more relevant the predictor [51]. We calculated for each groups of input predictors and illustrated it in Figure 7. The plot of for the nine bands (Figure 7a) shows that the most relevant bands were encountered in the red (B4), green (B3), and the second SWIR band (B12). Indices associated with these three bands have proven powerful in estimating CRC ( Table 3). Calculation of values for the 13 indices (Figure 7b) showed that 3BI1, NDSVI, SGNDI, and NDRI were more relevant to CRC than other CRIs. Those indices were related to B3, B4, and B12. When texture information and bands were inputted into GPR algorithm together, B4 and B3_mean were more relevant to CRC than other variables (Figure 7c). B2_mean was not as relevant to CRC as the other three textural features. As for the combination of CRIs and textures as input predictors, NDI5, NDSVI, 3BI1 and 3BI2 presented lower than other indices (Figure 7d). The mean textures of B3, B4, and B8 were also relevant to CRC. We explored the possibility of optimizing the GPR model by excluding the least important predictors according to the relevance of each variable to CRC. After the model evaluation using the entire set of input variables, we used a stepwise elimination method to identify the optimal input combination in such a way to reduce the number of input variables, beginning at the variable with the highest . We ended up with the combination that provided the lowest RMSE. The slash-filled bars in Figure 7 were final input variables for each group. Table 6 lists the accuracies retrieved by GPR models with optimal input variables. Compared to Table 5, the accuracy of each GPR algorithm was improved by selecting input variables according to value. Compared to the original nine bands as input variables, the RMSEcv was improved from 6.593% to 6.4% by using B3, B4, B5, B6, B11, and B12 as input predictors. In terms of the GPR model based on CRIs, the 13 CRIs were reduced to NDI5, NDRI, SGNDI, NDSVI, and 3BI1 with a retrieval accuracy of R 2 cv equal to 0.68 and RMSEcv equal to 6.445%. The combination of NDI5, NDSVI, 3BI1, and mean textures of B3, B4, and B8 also improved GPR retrieval accuracy. The optimal variables among bands and textural features were B4, B8a, and B12, and mean textures of B3, B4, and B8. The GPR associated with these variables outperformed other GPRs, enhancing R 2 cv from 0.66 to 0.69 and RMSEcv from 6.382% to 6.149%. The scatter plots between estimated CRC and measured CRC demonstrate that the CRC estimated lies close to the 1:1 line (Figure 8). Compared to the estimation results by PLSR (Figure 6), overestimation and underestimation were improved. The above results indicated that this is an effective way to identity optimal input variables by using values. The performance of GPR model was further improved by using bands and mean textural features information.   Table 6). The different colors indicated the 10-fold subsets.

Discussion
In this study, we investigated empirical regression and ML methods for the purpose of estimating CRC from Sentinel-2 MSI data. Regression analysis using established CRIs and proposed CRIs based on Sentinel-2 MSI data were explored, together with the applications of PLSR and four non-linear non-parametric ML approaches (ANN, SVR, GPR, and RF), using (1) nine MSI bands, (2) 13 CRIs, (3) nine bands and four mean textural features, and (4) 13 CRIs and those textural features as predictor variables, respectively. Overall, the results demonstrated that PLSR, SVR, GPR, and RF provided higher retrieval accuracy when textural features and CRIs were used as input predictors. The GPR model performance was further improved by using optimized input predictors. GPR is recognized as a promising method for estimating CRC. A further discussion of the elements of this study is presented below.

Empirical Regression Analysis for CRC Estimation
Univariate regressions examining the relationship between individual bands, CRIs, and mean textural features and CRC were conducted. Most of the bands, CRIs, and textural features were correlated to CRC. The best-performing index among the published CRIs was NDRI with an accuracy of R 2 cv of0.61 and RMSEcv of 6.663%. The proposed index 3BI2, constructed with band 2, 4, and 12, obtained the best performance with R 2 cv = 0.63 and RMSEcv = 6.509%, followed by 3BI1 and 3BI3. These three-band indices performed better than two-band indices listed in Table 1. Our results confirmed that, as noted by Verrelst et al. [23], when multiple bands are available there is no reason to limit the estimation to two-band indices. Notwithstanding the promising results, we acknowledge that further comprehensive work is needed to evaluate the reliability of the three proposed indices to estimate CRC in the context of different residue types under various circumstances, such as residue decomposition stage, residue and soil moisture content, soil brightness, and the presence of green vegetation [7,18].

Machine learning approaches for CRC estimation
In this study, five ML techniques (PLSR, ANN, SVR, GPR, and RF) were evaluated and the importance of input variables for these approaches was investigated. Moreover, the possibility of optimizing a GPR model for improving the retrieval accuracy of CRC was explored. 4.2.1. Impact of Training Samples on ML Algorithm Performance ML algorithms typically require a large sample set to build an effective model. In order to analyze the effect of the number training samples on ML performance, each ML's ability to handle limited training samples was examined. The same training samples were used for each algorithm to compare ML performance against a matching set of variables. A uniform percentage of the total samples for each algorithm was chosen at twenty percent intervals starting at 40% and ending at 80%. The sampling for each set was repeated 50 times. The models were trained with inputs of nine bands and four mean textures as an example. The results are presented in Table 7. Results demonstrate that with bands and texture measures as input variables, the best-performing model is SVR across all training sample size trials. The increase in training sample size improved the accuracy for CRC estimation, starting at RMSE equal 6.544% for 40% training samples of the whole sample population and improving to 6.354% for an 80% training sample size. The increased with the increase in training samples. In general, all ML approaches improved accuracies with the increase in training sample size relative to total sample population.

Contribution of Input Predictors on ML Accuracy
First, the five methods were trained and run with MSI bands as input features, yielding significantly higher accuracies than regressions based on individual bands. When the 13 CRIs from Table 1 were used as input features for a comparative analysis, the performances of the five models were further improved. However, regardless of whether nind MSI bands or 13 CRIs were used as input predictors for PLSR, ANN, GPR, and SVR, none of these models performed better than the univariate regression based on 3BI2 in terms of RMSE. When using mean textures and bands as input variable, the GPR, SVR, and RF models achieved better results than 3BI2. These three models also achieved a good performance by using mean textures and CRIs as predictors. Among the limited studies on the use of remote sensing indices as input features in ML models for the prediction of vegetation variables, Shah et al. [28] showed the possibility of using vegetation indices in the RF for estimating chlorophyll. Our results showed the potential for using CRIs and textural features together as input variables in ML approaches for the enhanced estimation of CRC. Moreover, these results indicated that input predictors have an impact on the performance of ML models and highlighted the importance of mean textural feature to obtain better estimates of CRC.
The PLSR is a powerful tool that can model several response variables simultaneously while effectively addressing strong collinear variables [52]. The retrieval accuracy of PLSR was improved by combining mean textures with bands. A combination of mean textures and CRIs via PLSR also enhanced the estimation of CRC. We extracted mean textural features from Sentinel-2 images, and correlated then with CRC. The results demonstrated that a combination of CRIs and mean textures via PLSR can enhance the retrieval accuracy of CRC.
ANN underperformed the other four ML models in terms of RMSE, regardless of the input features. This is likely to do with the fact that ANN is often used with large training datasets [53] and it does not perform well when used with input variables deviating from the small dataset presented during the training stage [54,55]. In this study, we had a limited size of training dataset (n = 243).
Recently, SVR gained popularity for the estimation of vegetation biophysical variables. Yang et al. [56] applied the SVR to estimate leaf area index and leaf chlorophyll density of rice, and found that SVR outperformed linear non-parametric methods. Our result showed that when bands or CRIs were used as input variables, SVR did not perform as well as GPR and RF. However, the SVR model based on a combination of bands and textures outperformed the other methods.
Among the four non-linear non-parametric techniques, GPR and RF have the ability to optimize input variables for the purpose of enhancing model simplicity and improved accuracy. We did not investigate the importance of input variables for RF model because many studies have already done so [28,57]. We emphasized the application of the σ in the GPR covariance function to optimize input predictors. Our results demonstrate that the use of the optimal input variables filtered by σ for each groups of variables enhanced the prediction accuracy of GPR. From Figure 7a, the MSI bands identified as highly important in the estimation of CRC were B3, B4, B5, B6, B11, and B12. When the nine MSI bands and four textural features were combined (Figure 7c), B12 was also identified as the optimal variable. B12 (2190 nm) is near 2100 nm, where the reflectance of crop residue was mainly controlled by lignin and cellulose [58]. Moreover, indices in Table 1 associated with B12 provided a good retrieval of CRC (Table 3). It is likely that the selection of B12 can provide useful information about CRC. From Figure 7b, NDI5, NDRI, SGNDI, NDSVI, and 3BI1 were identified as the optimal input variables. It is worth noting that some of the CRIs that had high correlations with CRC were not identified as final input variables, such as 3BI2 and 3BI3, according to the values of σ. However, 3BI2 was selected as one of the optimal input variables when CRIs and textural features were combined (Figure 7d). Such a result agrees with the conclusion by Shah et al. [28] that remote sensing indices may perform variously when used in combination as input variables in a machine learner. Three of the four textural features were identified as optimal input variables not only when used in combination with bands but also when used in combination with CRIs (Figure 7c-d). The best input features for GPR were B4, B8a, B12, and mean textures of B3, B4, and B8. The results highlight the power of the GPR algorithms for identifying the optimal information for CRC estimation and the importance of selecting proper input variables for obtaining robust outputs.

Importance of Textural Features for CRC Estimation
Image texture is a quantification of the spatial variation of image tone values, which can be related to spatial distribution of vegetation [59]. The above-ground organization of crop residue elements is represented in texture, which is supplementary to the spectral image and may provide additional information about crop residue. In this study, although mean textural features extracted from Sentinel-2 10 bands have produced a low correlation with the UAV-derived CRC, we demonstrate that textural features have a high capability to provide an improvement in estimating CRC along with spectral information such as spectral bands and CRIs. For each ML algorithm, the combination of mean texture and MSI bands resulted in improvement in estimating CRC compared with MSI bands only. The combination of mean texture and CRIs also led to increasing estimation accuracy. Our experimental result confirmed Jin et al.'s findings [35]. They compared the power using eight textural measures extracted from Landsat 8 OLI bands to map CRC with CRIs based on PLSR. Their study showed that the combination of textural features and CRIs performed better than CRIbased variables for estimating CRC. Such similar conclusions suggest that textural information along with spectral information could potentially improve estimating CRC compared to using spectral information only. The benefits of combining textural measures and spectral information in estimating vegetation canopy parameters (i.e., LAI and vegetation fractional coverage) have been proved [32][33][34]. Zhou et al. [32] evaluated the performance of vegetation indices (VIs), texture measures, and combinations of VIs and texture measures on LAI estimation. They found that the approach based on a combination of VIs and textural feature yielded higher accuracy than the VI-based approach and texture-based approach. Based on these studies, it can be concluded that the accuracy of estimated variables based on remote-sensing data could be increased by considering textural information.

Limitations of the Experiment
We presented a low-cost method for mapping CRC at ultrahigh resolution in this study. UAVderived RGB orthomosaics were classified into two classes: crop residue and background. Our results showed a high classification accuracy, which provides a good basis for training and evaluating Sentinel-2 data-based models. It should be noted that the inversion only relied on UAV-CRC data. Several studies demonstrated that the UAV orthomosaics interpretation enables the cost-effective creation of large and comprehensive datasets in comparison to field work [60,61]; therefore, field measures of CRC were not included in this paper, given the ultrahigh resolution of UAV data. We acknowledge that there is space for improvement in the classification accuracy by comparing different classification methods [62], such as object-based image analysis, or logistic regression, which is a powerful statistical learning method for a two-class situation [63].

Conclusion
In this study, we focused on the estimation of CRC based on empirical regressions and machine learning methods from Sentinel-2 imagery. Based on UAV collected CRC and near-synchronous Sentinel-2 image, inversion models were established and validated. The results show that 3BI1, 3BI2, and 3BI3 improved the sensitivity to CRC and the estimation accuracy compared to the published CRIs. MSI bands, CRIs, and their combinations with mean textural features were used as input variables for PLSR, ANN, GPR, SVR and RF models. The five ML approaches with bands as input variables yielded accuracies second to those methods based on CRIs, the combination of indices and textural features, and the combination of bands and textural features, respectively. The SVR with bands and textural features yielded excellent performance (R 2 cv = 0.67, RMSEcv = 6.343%). The GPR based on CRIs and textural features also obtained high accuracy (R 2 cv = 0.66, RMSEcv = 6.352%). The retrieval accuracy of each GPR model optimized by was further improved. The best-performing model was a GPR with input variables of red, NIR, the second SWIR bands, and mean textural features of blue, red, and NIR bands, obtaining an accuracy of R 2 cv equal 0.69 and RMSEcv equal 6.149%. Comparing empirical regression and machine learning methods, it can be concluded that (1) most of the five ML models with the optimal input variables performed better than univariate regression, (2) using a scheme of combining mean textural features with spectral information could lead to higher accuracy of estimating CRC than spectral information alone, and (3) GPR is recommended for CRC estimation.
Supplementary Materials: The hyperparameters for ML models and the 10-fold cross-validation result of each fold for each algorithm. The following are available online at www.mdpi.com/2072-4292/12/9/1470/s1, Table S1: Optimized hyperparameters for ANN with 9 bands and 4 mean textural features as input predictors. Table S2: Optimized hyperparameters for SVR with 9 bands and 4 mean textural features as input predictors. Table S3: Optimized hyperparameters for RF with 9 bands and 4 mean textural features as input predictors. Table S4: Optimized hyperparameters for GPR with 9 bands and 4 mean textural features as input predictors. Table S5: The R 2 cv and RMSEcv (%) values for each fold of the 10-fold cross-validation of the univariate regressions based on MSI bands. Table S6: The continuation of Table S5. Table S7: The R 2 cv and RMSEcv (%) values for each fold of the 10-fold cross-validation of the univariate regressions based on CRIs. Table S8: The continuation of Table S7. Table S9: The continuation of Table S8. Table S10: The R 2 cv and RMSEcv (%) values for each fold of the 10-fold cross-validation of the univariate regressions based on textural features. Table S11: R 2 cv and RMSEcv (%) values for each fold of the 10-fold cross-validation of the five ML approaches with 9 bands and 4 mean textural features as input predictors.