Estimation of Sugarcane Yield Using a Machine Learning Approach Based on UAV-LiDAR Data

Abstract: Sugarcane is a multifunctional crop mainly used for sugar and renewable bioenergy production. Accurate and timely estimation of the sugarcane yield before harvest plays a particularly important role in the management of agroecosystems. The rapid development of remote sensing technologies, especially Light Detection and Ranging (LiDAR), significantly enhances aboveground fresh weight (AFW) estimations. In our study, we evaluated the capability of LiDAR mounted on an Unmanned Aerial Vehicle (UAV) in estimating the sugarcane AFW in Fusui county, Chongzuo city of Guangxi province, China. We measured the height and the fresh weight of sugarcane plants in 105 sampling plots, and eight variables were extracted from the field-based measurements. Six regression algorithms were used to build the sugarcane AFW model: multiple linear regression (MLR), stepwise multiple regression (SMR), generalized linear model (GLM), generalized boosted model (GBM), kernel-based regularized least squares (KRLS), and random forest regression (RFR). The results demonstrate that RFR (R² = 0.96, RMSE = 1.27 kg m⁻²) performs better than the other models in terms of prediction accuracy. The final fitted sugarcane AFW distribution maps exhibited good agreement with the observed values (R² = 0.97, RMSE = 1.33 kg m⁻²). Canopy cover, the distance to the road, and tillage methods all have an impact on sugarcane AFW. Our study provides guidance for calculating the optimum planting density, reducing the negative impact of human activities, and selecting suitable tillage methods in actual cultivation and production.


Introduction
Sugarcane is a crop that grows in the tropics and subtropics and serves numerous economic and ecological functions [1,2]. In addition to its well-known edible properties, sugarcane lignocellulosic biomass (including bagasse, straw, and tops) is a cheap, abundant, and renewable raw material that promotes sustainable development and can be utilized for biofuel, bioenergy, and several valuable biomolecules [3][4][5]. However, the problems of bioenergy production, such as carbon dioxide released from burning or decomposing biomass and competition in agricultural land use, still need to be solved [6]. China is the third-largest sugar consumer in the world, and the overwhelming majority (approximately 90%) of its sugar originates from sugarcane [7]. With the increasing demand for sugar and renewable energy, timely, spatially explicit, and precise estimation of yield is important for optimizing the layout of sugarcane production and management.
Crop yield is usually obtained by direct weighing or by remote sensing estimation. Field measurement is costly and laborious when sufficient inventory data must be obtained for large areas, whereas remote sensing-based estimation can greatly improve efficiency. Earth observation (EO) data play important roles in global crop condition monitoring and yield estimation [8]. In general, three types of information can be used to estimate crop yield. First, crop growth models (CGMs), which take soil properties, climatic variables, and so on as input parameters, can forecast yield by simulating crop physical processes [9]. Second, low-resolution satellite images, acquired by optical sensors with a resolution between 250 m and several kilometers, play an important role in yield prediction at the regional level [10]. Third, light detection and ranging (LiDAR) is applicable to yield estimation because it can obtain related information such as height at a fine spatial resolution [11]. In addition to these three methods, other approaches (including the use of synthetic aperture radar (SAR) data and regression models built on remotely sensed indicators or on mixed information together with bio-climatic predictor variables) are also useful for quantifying the expected yield [9,10]. LiDAR, also referred to as laser scanning, is an active remote sensing technology that can directly obtain information on the vertical vegetation structure [12]. LiDAR has been recognized as a promising technology to characterize vegetation aboveground biomass (AGB) [13,14]. Wu et al. [15] concluded that LiDAR-derived height and intensity metrics could map fine-scale AGB with low uncertainty at the plot level. Numerous LiDAR-derived candidate metrics, in particular height metrics, exhibit good correlations with forest biomass [16]. Previous studies have thus demonstrated the potential of LiDAR data in estimating AGB.
However, most studies have focused on forest ecosystems [17][18][19][20] rather than agroecosystems [21]. The estimation of sugarcane aboveground fresh weight (AFW) in agroecosystems is similar to that of vegetation AGB in other ecosystems. The high density of sugarcane poses significant challenges for the application of LiDAR, but the simplicity of sugarcane's morphological structure is conducive to the estimation of AFW from height. Compared with optical sensors, LiDAR installed on an unmanned aerial vehicle (UAV) is more advantageous for acquiring high-density point clouds and fine-resolution structure data [22]. Thus, in-depth investigations on the feasibility of UAV-LiDAR technology in the estimation of sugarcane AFW are necessary.
A reliable statistical algorithm for relating remote sensing data to yield is critical for crop yield estimation. Because this remains one of the challenging problems in agriculture, many previous studies have compared diverse regression models in search of a more accurate approach [23][24][25][26]. Machine learning algorithms have become an important decision support tool in large-scale crop yield estimation [27,28]. For example, Leroux et al. [29] found that the random forest regression (RFR) model exhibited better performance than multiple linear regression for maize yield estimation. Sakamoto [30] effectively estimated the spatial distribution of United States corn and soybean yields using the RFR model. Gyamerah et al. [31] successfully forecasted groundnut and millet yield based on quantile random forest and the Epanechnikov kernel function. Nevertheless, it remains unknown whether the RFR algorithm is also feasible for sugarcane AFW prediction.
In China, Guangxi province is the largest sugarcane planting base, and Chongzuo city is the largest sugarcane planting base in Guangxi province. The seven counties under the jurisdiction of Chongzuo city have been designated by the Ministry of Agriculture as "double high" (high yield and high sugar content) sugarcane bases. The overall aim of our study was to accurately estimate the AFW of the sugarcane plantations in Fusui county, Chongzuo city of Guangxi province, China. The yield can be affected by management modes in farming programs and unplanned human disturbance activities. The management modes include tillage method and planting density, which can be reflected by canopy cover. The influence of human disturbance activities varies with the distance of sugarcane to the road. Thus, the specific objectives of our research were threefold: (1) identify appropriate variables and compare the performance of different regression algorithms on sugarcane AFW modeling systematically; (2) investigate the potential of UAV-LiDAR data in yield prediction and generate AFW maps of the study area; (3) examine the impacts of canopy cover, the distance to the road, and tillage methods on sugarcane AFW estimation.

Study Area and Field Sampling
The study area is located in Fusui county (Figure 1), Chongzuo city of Guangxi province, and it extends across 22.55°N-22.57°N, 107.80°E-107.82°E. According to the Chongzuo Meteorological Bureau (http://www.chongzuo.gov.cn/zjcz/zrdl/t61340.shtml), the local mean annual temperature is 21.7 °C and the annual precipitation is approximately 1200 mm. Given its location in the subtropical monsoon climate zone, the climate characteristics are suitable for sugarcane cultivation.
Sugarcane is generally planted using two tillage methods (intensification and individualization) in early February and harvested in early December each year. The field sampling was carried out in late November before ripening. We surveyed thirty-five sites, each containing three plots. In total, 105 sample plots of 1 m × 1 m were randomly selected to obtain sugarcane inventory data. The geographic coordinates (longitude and latitude) of each plot were recorded using a GPSMAP 639csx receiver made by GARMIN, a Taiwan manufacturer. In each plot, we cut the aboveground part of all sugarcane plants and kept the leaves on the stem. The number of sugarcane plants in each plot ranged from nine to eighteen. The height and the fresh weight of each plant were measured.
Remote Sens. 2020, 12, x FOR PEER REVIEW

UAV-LiDAR Data Preprocessing and LiDAR-Derived Metrics Extraction
We collected the raw UAV-LiDAR data over four days in November 2019 using an LR1601-IRIS Lidar point cloud data acquisition system installed on a DJI M600 UAV platform. The whole study area covers about 1 km², and the total cost was roughly 2000 USD (United States dollars). The main technical indicators of the LiDAR are described in Table 1. In total, 52 flight transects were flown to cover the entire area, and the overlap between adjacent routes was set to be greater than 30%. The flight altitude was 100 m above the ground with a uniform velocity of 1 m s⁻¹. The WGS-84 coordinate system and UTM projection were used for the point cloud data.
The segmentation boundary of the LiDAR data was established in Google Earth software and saved in KML format. We then imported the path data together with the LiDAR data into Point Cloud Producer software to obtain all the UAV-LiDAR point clouds. The point cloud dataset included five columns of information: easting, northing, elevation, intensity, and airline_ID. Point density was approximately 175 points m⁻², which was dense enough for laser pulses to penetrate the sugarcane canopy and reach the exposed surface. The DEM (Digital Elevation Model) and DSM (Digital Surface Model) were generated using ENVI LiDAR software with a spatial resolution of 0.2 m. The height of sugarcane was obtained by subtracting the DEM raster from the DSM raster in ENVI 5.3.1 software. Then the average height, the variance of height, the square of average height, and the range of height were calculated at a spatial resolution of 1 m in ArcGIS 10.4 software. The LiDAR-derived variables were highly consistent with the field survey data.
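The height-metric step described above can be illustrated with a minimal sketch. It is written in Python rather than the ENVI and ArcGIS tools used in the study, and the function names (`canopy_height`, `block_metrics`), the clipping of negative heights to zero, and the use of the population variance are all assumptions made for illustration. It subtracts the DEM from the DSM and aggregates 5 × 5 blocks of 0.2 m pixels into 1 m cells.

```python
from statistics import mean, pvariance

def canopy_height(dsm, dem):
    """Canopy height = DSM - DEM per pixel (negative values clipped to 0, an assumption)."""
    return [[max(s - g, 0.0) for s, g in zip(srow, grow)]
            for srow, grow in zip(dsm, dem)]

def block_metrics(chm, factor=5):
    """Aggregate factor x factor pixel blocks (5 x 5 pixels of 0.2 m = one 1 m cell)
    and compute the four height metrics named in the text."""
    rows, cols = len(chm), len(chm[0])
    out = []
    for r0 in range(0, rows - factor + 1, factor):
        out_row = []
        for c0 in range(0, cols - factor + 1, factor):
            vals = [chm[r][c]
                    for r in range(r0, r0 + factor)
                    for c in range(c0, c0 + factor)]
            m = mean(vals)
            out_row.append({
                "mean": m,                        # average height
                "variance": pvariance(vals),      # variance of height (population form)
                "mean_sq": m * m,                 # square of average height
                "range": max(vals) - min(vals),   # range of height
            })
        out.append(out_row)
    return out

# Toy 5 x 5 rasters (made-up values): flat ground at 100 m, canopy 2.0-2.4 m tall.
dem = [[100.0] * 5 for _ in range(5)]
dsm = [[100.0 + 2.0 + 0.1 * c for c in range(5)] for _ in range(5)]
metrics = block_metrics(canopy_height(dsm, dem))
print(metrics[0][0])
```

On real rasters the same aggregation would be applied to every 1 m cell of the study area, giving one set of metrics per cell for the prediction step.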

Different Regression Algorithms for Sugarcane AFW Prediction
Six different algorithms, including multiple linear regression (MLR), stepwise multiple regression (SMR), generalized linear model (GLM), generalized boosted model (GBM), kernel-based regularized least squares (KRLS), and random forest regression (RFR), were selected to construct the prediction models in our study. These models are classified into linear models and nonlinear models. R-squared (R²) and root mean square error (RMSE) between fitted and observed data were used as the evaluation indices for model accuracy. The training set and test set contained 78 and 27 samples, respectively.
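As a hedged illustration of the two evaluation indices, the sketch below computes RMSE and R² in Python (the study itself used R). It assumes that R² means the coefficient of determination, 1 − SS_res/SS_tot, which the text does not state explicitly. The helper `rmse_percent` reproduces the relative RMSE reported in the results, and the observed and fitted values are made-up toy numbers.

```python
import math

def rmse(observed, fitted):
    """Root mean square error between observed and fitted values."""
    n = len(observed)
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, fitted)) / n)

def r_squared(observed, fitted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse_percent(rmse_value, mean_observed):
    """RMSE expressed as a percentage of the mean observed value."""
    return 100.0 * rmse_value / mean_observed

# Toy AFW values in kg m^-2 (hypothetical, for illustration only).
observed = [18.0, 20.0, 22.0, 19.0]
fitted = [17.5, 20.5, 21.0, 19.5]
print(rmse(observed, fitted), r_squared(observed, fitted))

# Relative RMSE from the figures reported in the paper: 1.33 / 19.50.
print(round(rmse_percent(1.33, 19.50), 1))
```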
Indicators related to height are commonly used to build models to estimate vegetation AGB [32,33]. Because sugarcane height and weight, both measured in the field on individual plants, showed a significant positive correlation (Figure 2), we selected eight field survey metrics as predictor variables. Details of the variables are summarized in Table 2. All regression algorithms were run in R software (R version 3.6.1).
MLR is a mathematical algorithm that uses multiple independent variables to obtain the predicted value. MLR models have been widely used in various scientific fields given their universality and well-founded theoretical basis [34]. To prevent collinearity, all-subsets regression was implemented using the "leaps" package in R to filter the variables in Table 2. The general form of the model is:
$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n$$
where $y$ is AFW in our study, $\beta_0$ to $\beta_n$ are unknown linear regression coefficients, and $X_1$ to $X_n$ represent predictor variables.
SMR is a variable selection method for the MLR model that is used to obtain the highest coefficient of determination. The stepwise regression method combines forward selection with backward elimination, adding and deleting variables as required at each step [35]. Previous research has shown that the stepwise Akaike information criterion (AIC) method for variable selection is more appropriate than other stepwise methods [36]. We therefore selected important variables from the available variables based on the AIC principle. SMR was conducted using the "stepwise" package in the R statistical environment.
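To make the linear model and the AIC criterion concrete, the following Python sketch (not the authors' R code) fits the MLR coefficients by ordinary least squares through the normal equations and scores a fitted model with n·ln(RSS/n) + 2k. That AIC form, the helper names, and the toy data are assumptions for illustration. A stepwise procedure would repeatedly add or drop a predictor and keep the change that lowers this score.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_mlr(X, y):
    """Least-squares coefficients [b0, b1, ..., bn] via the normal equations."""
    design = [[1.0] + list(row) for row in X]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(beta, row):
    """y = b0 + b1*X1 + ... + bn*Xn for one observation."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

def aic(beta, X, y):
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2k (assumed form)."""
    n = len(y)
    rss = sum((t - predict(beta, r)) ** 2 for r, t in zip(X, y))
    return n * math.log(rss / n) + 2 * len(beta)

# Toy data (hypothetical): two height-like predictors and a noisy response.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
y_exact = [1.0 + 2.0 * a + 0.5 * b for a, b in X]
y_noisy = [4.1, 5.4, 9.2, 10.4, 14.0]

beta_exact = fit_mlr(X, y_exact)          # recovers roughly [1.0, 2.0, 0.5]
full = fit_mlr(X, y_noisy)                # model with both predictors
X1 = [[r[0]] for r in X]
reduced = fit_mlr(X1, y_noisy)            # candidate model with one predictor dropped
print(aic(full, X, y_noisy), aic(reduced, X1, y_noisy))
```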

Machine Learning Regression Models
Among machine learning regression algorithms, we chose GLM, GBM, KRLS, and RFR as predictive models. The response variable is the fresh weight of sugarcane, and the predictor variables are all shown in Table 2. GLM is an extended form of the conventional linear model. Various distributions (e.g., normal, binomial, Poisson, and gamma) are included in the GLM framework. We chose Poisson regression because of its superiority in fitting the logarithmic model of variables.
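For reference, a Poisson GLM can be written as follows. This is a sketch that assumes the canonical log link (the default for the Poisson family in R's `glm`); the text does not state which link was used. The symbols match the MLR notation, with $y$ the AFW and $X_1$ to $X_n$ the predictors.

```latex
\log\bigl(\mathbb{E}[y \mid X_1, \dots, X_n]\bigr) = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n
\quad\Longleftrightarrow\quad
\mathbb{E}[y \mid X_1, \dots, X_n] = \exp\bigl(\beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n\bigr)
```

Under this form the predictors act multiplicatively on the expected AFW, which is one reading of "the logarithmic model of variables".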
We used a boosted regression tree (BRT) analysis in GBM and implemented the GBM model through the "gbm" package in the R statistical environment. BRT combines the strengths of regression trees and boosting [37]. These strengths are advantageous for modeling non-parametric relationships between AFW and predictor variables.
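The idea of combining regression trees with boosting can be sketched as below. This is a deliberately simplified Python illustration, not the "gbm" implementation: it uses single-split trees (stumps) on one predictor, squared-error loss, and hypothetical parameter names (`n_trees`, `shrinkage`) and toy data. Each new stump is fitted to the residuals of the current ensemble and added with a small weight.

```python
def fit_stump(x, residuals):
    """Best single-split regression stump on one predictor (least squares)."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    return best[1:]

def boost(x, y, n_trees=200, shrinkage=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)                     # start from the mean response
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [t - p for t, p in zip(y, pred)]
        thr, lmean, rmean = fit_stump(x, residuals)
        stumps.append((thr, lmean, rmean))
        pred = [p + shrinkage * (lmean if xi <= thr else rmean)
                for xi, p in zip(x, pred)]
    return base, shrinkage, stumps

def predict(model, xi):
    """Sum the shrunken stump outputs on top of the base prediction."""
    base, shrinkage, stumps = model
    return base + shrinkage * sum(l if xi <= thr else r for thr, l, r in stumps)

# Toy data (hypothetical): plot mean height in m and AFW in kg m^-2.
heights = [2.0, 2.4, 2.8, 3.0, 3.3, 3.6]
afw = [12.0, 14.5, 17.0, 18.0, 20.5, 22.0]
model = boost(heights, afw)
print(predict(model, 3.1))
```

The small shrinkage value trades more trees for a smoother fit, which is the usual way boosted models limit overfitting.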
KRLS applies the kernel trick to conduct regression estimation. It is a machine learning approach that allows users to solve regression problems with ease of use and interpretability [38]. In particular, KRLS has a flexible hypothesis space and provides closed-form estimates for the predicted values, which broadens the application range of the model. We implemented the KRLS model through the "KRLS" package in the R statistical environment.
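The closed-form estimate mentioned above can be illustrated with a short Python sketch. It is not the "KRLS" package: it assumes a Gaussian kernel with a hypothetical bandwidth `sigma2` and regularization weight `lam`, omits the standardization and automatic tuning a real implementation would perform, and uses toy data. The coefficients solve (K + λI)c = y, and predictions are kernel-weighted sums of those coefficients.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def gaussian_kernel(u, v, sigma2):
    """k(u, v) = exp(-||u - v||^2 / sigma2)."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(u, v)) / sigma2)

def fit_krls(X, y, sigma2=1.0, lam=0.1):
    """Closed-form coefficients c = (K + lam * I)^-1 y."""
    n = len(X)
    A = [[gaussian_kernel(X[i], X[j], sigma2) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(A, list(y))

def predict_krls(coef, X, x_new, sigma2=1.0):
    """Prediction is a kernel-weighted sum over the training points."""
    return sum(c * gaussian_kernel(x_new, xj, sigma2) for c, xj in zip(coef, X))

# Toy data (hypothetical): one height predictor and AFW in kg m^-2.
X = [[2.0], [2.5], [3.0], [3.5]]
y = [12.0, 15.0, 18.5, 21.0]
coef = fit_krls(X, y)
print(predict_krls(coef, X, [2.75]))
```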
RFR is a well-known and powerful machine learning method proposed by Breiman [39] in which numerous decision trees are trained on the samples and used for prediction. Using a large set of decision trees improves on the classification and regression trees method [40]. During model building, two main parameters must be optimized: the number of decision trees (ntree) and the number of predictors selected at random as split candidates at each node (mtry). Finally, predictions for new data are obtained by averaging the predictions of all regression trees [41]. RFR has been widely used in the estimation of AGB given its outstanding characteristics, such as insensitivity to noise in training data and the ability to handle nonlinear problems [39,42].
We implemented the RFR model through the "randomForest" package in the R statistical environment.
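The roles of ntree and mtry can be seen in a minimal Python sketch of the algorithm. This is a simplified illustration, not the "randomForest" package; the stopping rule `min_size`, the fixed `seed`, and the toy data are assumptions. Each tree is grown on a bootstrap sample, draws mtry candidate predictors at every node, and the forest prediction is the average over all trees.

```python
import random

def best_split(rows, y, features):
    """Least-squares split over the candidate features; None if no split exists."""
    best = None
    for f in features:
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [t for r, t in zip(rows, y) if r[f] <= thr]
            right = [t for r, t in zip(rows, y) if r[f] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((t - lm) ** 2 for t in left)
                   + sum((t - rm) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    return best

def grow_tree(rows, y, mtry, rng, min_size=2):
    """Grow one regression tree, drawing mtry candidate predictors at each node."""
    features = rng.sample(range(len(rows[0])), mtry)
    split = best_split(rows, y, features) if len(y) > min_size else None
    if split is None:
        return sum(y) / len(y)                 # leaf: mean response
    _, f, thr = split
    left = [(r, t) for r, t in zip(rows, y) if r[f] <= thr]
    right = [(r, t) for r, t in zip(rows, y) if r[f] > thr]
    return (f, thr,
            grow_tree([r for r, _ in left], [t for _, t in left], mtry, rng, min_size),
            grow_tree([r for r, _ in right], [t for _, t in right], mtry, rng, min_size))

def predict_tree(tree, row):
    """Walk from the root to a leaf; leaves are floats, internal nodes are tuples."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree

def fit_forest(rows, y, ntree=100, mtry=1, seed=0):
    """Fit ntree trees, each on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(grow_tree([rows[i] for i in idx], [y[i] for i in idx], mtry, rng))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all regression trees."""
    return sum(predict_tree(t, row) for t in forest) / len(forest)

# Toy data (hypothetical): [mean height in m, height range in m] and AFW in kg m^-2.
rows = [[2.0, 0.3], [2.4, 0.5], [2.8, 0.4], [3.0, 0.6], [3.3, 0.5], [3.6, 0.7]]
afw = [12.0, 14.5, 17.0, 18.0, 20.5, 22.0]
forest = fit_forest(rows, afw, ntree=50, mtry=1)
print(predict_forest(forest, [3.1, 0.5]))
```

Because each leaf stores a mean of training responses, a forest prediction always lies within the range of the observed AFW values, so the model cannot extrapolate beyond the sampled plots.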

Canopy Cover, the Distance to the Road and Tillage Method Information Extraction
To analyze agricultural management factors associated with sugarcane AFW, we considered canopy cover, the distance to the road, and tillage methods. The yield information corresponding to each management factor was extracted from the point cloud dataset (*.txt) and the AFW raster data (*.dat), which were derived from further LiDAR data analysis and programming in R software. Canopy cover is defined as the ratio of the vertical projection area of the vegetation canopy to the total statistical area. Prior to calculating the canopy cover, the normalized point cloud data were generated in LiDAR360 software and then used as input to the "canopy cover" function to produce the final results. We built six categories of regions of interest (ROIs) in ENVI software according to the distance of the sugarcane planting area to the road: 0-2, 2-4, 4-6, 6-8, 8-10 m, and greater than 10 m. Tillage methods were divided into intensification and individualization according to the boundary of the farm. Intensive cultivation is within the business scope of local farms, and other farmland is cultivated individually. Finally, we combined the canopy cover, distance to the road, and tillage method information with the predicted sugarcane yield according to the longitude and latitude of each grid cell in ENVI Classic software. A significance test was conducted in R software.
Figure 3 shows the fitting results of the different models. Non-linear models exhibited significantly improved performance compared with the two linear models. Across the machine learning methods, in the order GLM, GBM, KRLS, and RFR, R² gradually increased and RMSE gradually decreased. The RFR algorithm had the highest R² and the lowest RMSE of all models.
Figure 4a depicts the scatter of the observed and fitted AFW based on the RFR model and LiDAR-derived data, which was used to assess the accuracy of the map.
The RMSE was 1.33 kg m⁻² and the average AFW of the sampling regions was 19.50 kg m⁻², implying an RMSE% of approximately 6.8%. Moreover, the trend line fitted well, with an R² as high as 0.97. These results confirm the feasibility of using LiDAR point cloud data and the RFR algorithm to estimate the AFW of sugarcane. As noted in the AFW spatial distribution map (Figure 4b

Analysis of Different Influencing Factors
The average values of AFW increased as the canopy cover increased. However, when canopy cover was greater than 0.7, the results were basically unchanged (Figure 5a). Moreover, AFW at a distance of 0-2 m from the road was evidently lower than at other distances, and only slight changes were noted when the distance was greater than 2 m (Figure 5b). Regarding tillage methods, Figure 5c shows the differences between individualization and intensification. The average AFW values of sugarcane fields planted individually and intensively were approximately 16.84 kg m⁻² and 17.43 kg m⁻², respectively. Significant differences were noted between these two groups.
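The per-class averages reported here can be reproduced in principle with a grouping step like the Python sketch below. The class edges follow the six distance categories defined in the methods. The function names, the treatment of a distance of exactly 10 m as belonging to the last class, and all cell values are hypothetical.

```python
from collections import defaultdict

BREAKS = [2, 4, 6, 8, 10]  # class edges in metres, as listed in the methods

def distance_class(d):
    """Assign a distance to the road (m) to one of the six classes."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    for lo, hi in zip([0] + BREAKS, BREAKS):
        if lo <= d < hi:
            return f"{lo}-{hi} m"
    return ">10 m"  # boundary handling at exactly 10 m is an assumption

def mean_afw_by_class(cells):
    """cells: iterable of (distance_to_road_m, afw_kg_per_m2) pairs, one per grid cell."""
    groups = defaultdict(list)
    for d, afw in cells:
        groups[distance_class(d)].append(afw)
    return {k: sum(v) / len(v) for k, v in groups.items()}

# Made-up grid cells for illustration; not data from the study.
cells = [(0.5, 14.0), (1.5, 15.0), (3.0, 17.5), (7.2, 17.0), (12.0, 18.0)]
means = mean_afw_by_class(cells)
print(means)
```

The same grouping applies to canopy cover classes or tillage method labels by swapping the classifier function.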

Comparison of Diverse Regression Models
We explored the potential of building various regression models based on ground-measured data, including linear and nonlinear algorithms. Except for GLM, nonlinear models outperformed linear models in prediction accuracy. Because the models differ in their mathematical characteristics, predictor variables were selected and assessed according to different criteria, with variable importance in the regression serving as an important principle [43]. All-subsets regression was applied to avoid multicollinearity in the traditional linear model. The average height of sugarcane was excluded from the SMR model based on the lowest AIC value.
RFR is a highly recommended machine learning approach for establishing complex nonlinear relationships, and it tends to distribute importance among multiple predictors [44]. A possible explanation for the superior performance of RFR is that each tree is independent, and each variable can demonstrate significance in different trees. Furthermore, the RFR model mitigated the problems of collinearity and overfitting during AFW estimation. Previous studies have demonstrated that the RFR model can accurately predict sugarcane yield [45,46], and our conclusion is consistent with them. When comparing the accuracy of all models, the highest R² (0.96) and the lowest RMSE (1.27 kg m⁻²) indicated that RFR was the most appropriate model for the estimation of sugarcane AFW. Scornet et al. [47] suggested that the ability to deal with small sample sizes has contributed to the popularity of the RFR methodology. In our study, we surveyed thirty-five field sites, which contained 105 plots. Increasing the number of field measurements to analyze the influence of sample size on the prediction results of the model is the focus of our future research.

The Feasibility of UAV-LiDAR Data for AFW Prediction
As a rapidly developing active remote sensing technology, UAV-LiDAR has distinct advantages compared with traditional LiDAR platforms (satellite, aerial, and terrestrial), such as easy operability, light weight, financial viability, and flexibility in acquisition and sensor integration [48,49]. Most importantly, UAV laser scanning systems fly at lower speeds and altitudes than airborne laser scanning and therefore provide higher point density [50]. Detailed characteristics can be extracted from the denser point cloud [51,52]. Yin and Wang [53] realized individual tree detection and delineation, including height, crown diameter, and crown clumping density, using LiDAR data (91 points m⁻²) collected from a UAV. Almeida et al. [54] monitored the structure of mixed-species forest restoration plantations with three structural variables (canopy height, gap fraction, and leaf area index) from a UAV-borne LiDAR system, and obtained a high R² (0.84) for AGB. Most previous studies focused on the application of UAV-LiDAR data in the prediction of forest AGB, whereas our study demonstrated that it is also feasible to use this technology for sugarcane.
In our study, data captured from UAV-LiDAR were well suited for AFW prediction and the generation of sugarcane AFW maps in the specific study region. We analyzed several features related to vegetation height from LiDAR data with an average density of 175 points m⁻², which contributed significantly to the precise sugarcane AFW estimation (R² = 0.97 and RMSE = 1.33 kg m⁻²) at fine spatial resolution. The multirotor UAV provided a stable platform for the LiDAR sensor by reducing vibration. On the one hand, the morphological structure of sugarcane is simple, making it convenient for LiDAR to detect vegetation structure information. On the other hand, the sugarcane plant is much taller than other crops, such as rice and wheat, which helps to distinguish sugarcane in point cloud data when it is planted adjacent to other crops. In addition, the positive correlation between sugarcane height and weight also contributes to the accuracy of the results.
Given the abovementioned advantages and the availability of lighter LiDAR sensors in the future, UAV-borne LiDAR systems will cover a wider range of applications. This study may pave the way for further analysis of UAV-LiDAR systems and large-scale sugarcane AFW estimations. The fusion of multisource remote sensing data may bring vitality to future research. New-generation missions, such as NASA's Global Ecosystem Dynamics Investigation (GEDI), NASA's Ice, Cloud, and land Elevation Satellite-2 (ICESat-2), and the NASA-Indian Space Research Organization (ISRO) Synthetic Aperture Radar (NISAR), have the potential to collect useful data for estimating biomass at a national or global scale [55].

Insight on Agricultural Management on Sugarcane Production
We investigated the influence of agricultural management factors, including canopy cover, the distance to the road, and tillage methods, on sugarcane AFW in our study. Our results have profound significance for combining scientific research with practical application for the purpose of enhancing sugarcane yield.
Sugarcane is planted in the form of seedlings. Canopy cover directly reflects planting density. Competition explains why an increase in the sugarcane population does not necessarily lead to a bumper harvest. Competitive interaction among individuals affects the effective use of resources, mainly solar energy, soil nutrients, and soil moisture. Different crops have different optimum planting densities. Moderate to high density is a sound approach for improving lint yield under a late sowing date [56]. Low-density triticale, regardless of cultivar, performed best in low-rainfall regions across the Mediterranean Basin [57]. Our results clearly demonstrate that a canopy cover of 0.6-0.7 makes the most effective use of germplasm resources. In addition to the interference of individual competition, the distance to the road also has a significant impact on AFW. Figure 5b demonstrates poor sugarcane growth in the area closest to the road, and the influence of human activities (e.g., trampling by people and the rolling of vehicles) may be the best explanation for this observation. In addition to the negative effects of human activities, moisture is likely to evaporate and nutrients are likely to be lost in marginal soil because the release of organic carbon and nitrogen from litter is inhibited in the marginal area [58], which can also reduce sugarcane production. Sustainable agricultural intensification is a primary trend of land use to feed a growing population [59,60]. It can achieve many functions, including but not limited to greenhouse gas mitigation [61] and high energy efficiency [62]. The results (Figure 5c) demonstrate that intensive cultivation is worth popularizing because it can effectively improve AFW. Sustainable intensification was practiced from 1980 to 2014 in Northern China with sustained high crop production [63].
China's agriculture is still in the transition stage from conventional agriculture to modern agriculture. Increased crop production can be realized by further agricultural intensification rather than expanding the cultivated land area [64]. In the long term, popularizing farming systems for sustainable intensification must continue to be pursued in the future.

Conclusions
In our study, the effectiveness of UAV-LiDAR data is demonstrated by providing proper metrics according to the sugarcane structural parameters. We evaluated the capacity of different regression algorithms (i.e., MLR, SMR, GLM, GBM, KRLS, and RFR) to estimate sugarcane AFW. The RFR model outperforms other regression models in terms of the accuracy of prediction, providing an important method for mapping the spatial distribution of sugarcane AFW with high resolution. Furthermore, the derived sugarcane AFW map demonstrates the capacity for further analysis of the influence of agricultural management factors, including the planting density, the distance to the road, and tillage methods. The main contribution of our research results is the ability to supply theoretical support for practical farming. We believe that increased sugarcane yield is possible via the scientific calculation of the optimum planting density, the reduction of negative effects caused by human activities, and the choice of suitable tillage methods.