Integration of Boosted Regression Trees and Cellular Automata—Markov Model to Predict the Land Use Spatial Pattern in Hotan Oasis

: The simulation and prediction of the land use changes is generally carried out by cellular automata—Markov (CA-Markov) model, and the generation of suitable maps collection is subjective in the simulation process. In this study, the CA-Markov model was improved by the Boosted Regression Trees (BRT) to simulate land use to make the model objectively. The weight of ten driving factors of the land use changes was analyzed in BRT, in order to produce the suitable maps collection. The accuracy of the model was veriﬁed. The outcomes represent a match of over 84% between simulated and actual land use in 2015, and the Kappa coe ﬃ cient was 0.89, which was satisfactory to approve the calibration process. The land use of Hotan Oasis in 2025 and 2035 were predicted by means of this hybrid model. The area of farmland, built-up land and water body in Hotan Oasis showed an increasing trend, while the area of forestland, grassland and unused land continued to show a decreasing trend in 2025 and 2035. The government needs to formulate measures to improve the utilization rate of water resources to meet the growth of farmland, and need to increase ecological environment protection measures to curb the reduction of grass land and forest land for the ecological health.


Introduction
The over-utilization of land resources with the increase of population has caused a series of ecological and environmental problems, such as forest reduction, soil erosion, climate change, etc., which have made the unmatched contradiction of the population, resources and global environmental change more prominent [1,2]. Currently, the simulation and prediction of land use changes are mainly carried out by establishing mathematical models [3], which mainly include the Markov model [4], Cellular Automata model [5][6][7], System Dynamics model [8,9], and Artificial Neural Networks [10,11]. The Cellular Automata-Markov (CA-Markov) combines the ability to simulate the spatial variation of complex systems of CA model and the long-term prediction of the Markov model. Therefore, it has been used in land use simulation [12][13][14], species distribution [15], and ecological environment [16,17].
Currently, the issue of the use of the CA-Markov model for land use is given more attention [18,19]. The research of application of the CA-Markov model for land use in a wide range of regions, and the related research is increasing year-by-year. The CA-Markov model, which has been calibrated for the

Input Data
The remote sensing monitoring data of the land use in Hotan Oasis in 1995,2000,2005,2010, and 2015 were downloaded from the Resource and Environment Data Cloud Platform (http://www.resdc.cn), for that the data before 1995 has larger error due to technical reasons, such as the accuracy of the remote sensing data, the data missing in a year which was derived from the fusion of other nearby years. The digital grid is 30 m × 30 m. The data was generated by manual visual interpretation, based on each period of remote sensing images of the Landsat TM/ETM as the data source.
Six primary land use types were divided into in this study: farmland, forestland, grassland, water body, built-up land, and unused land. The secondary catalogue and explanation of land use classification can be queried through the above website. The farm land includes dry land and paddy field; grassland includes natural grassland with coverage of 5-20%, the mixed grassland of natural grassland and improved grassland with total coverage of 20-50%, and the mixed grassland of natural grassland, improved grassland and mowed grassland with total coverage of more than 50%; unused land including sand, Gobi, saline-alkali land, swampland, bare land, bare rock texture land, etc. For analyzing the factors that affect the land use distribution in Hotan Oasis, ten factors were selected

Input Data
The remote sensing monitoring data of the land use in Hotan Oasis in 1995,2000,2005,2010, and 2015 were downloaded from the Resource and Environment Data Cloud Platform (http://www.resdc.cn), for that the data before 1995 has larger error due to technical reasons, such as the accuracy of the remote sensing data, the data missing in a year which was derived from the fusion of other nearby years. The digital grid is 30 m × 30 m. The data was generated by manual visual interpretation, based on each period of remote sensing images of the Landsat TM/ETM as the data source.
Six primary land use types were divided into in this study: farmland, forestland, grassland, water body, built-up land, and unused land. The secondary catalogue and explanation of land use classification can be queried through the above website. The farm land includes dry land and paddy field; grassland includes natural grassland with coverage of 5-20%, the mixed grassland of natural grassland and improved grassland with total coverage of 20-50%, and the mixed grassland of natural grassland, improved grassland and mowed grassland with total coverage of more than 50%; unused Sustainability 2020, 12, 1396 4 of 13 land including sand, Gobi, saline-alkali land, swampland, bare land, bare rock texture land, etc. For analyzing the factors that affect the land use distribution in Hotan Oasis, ten factors were selected according to the previous studies [30], such as the elevation, slope, annual precipitation, annual temperature, groundwater depth, population density, gross domestic product density, distance to water body, distance to roads, and distance to the village centers (Table 1), the data were obtained from the Resource and Environment Data Cloud Platform (http://www.resdc.cn).

BRT Model
BRT is a self-learning method based on Classified And Regression Trees (CART). The BRT combines the strengths of two algorithms: regression trees (models that relate a response to their predictors by recursive binary splits) and boosting (an adaptive method for combining many simple models to give improved predictive performance). It generates multiple regression trees by random selection and self-learning, which can improve the stability and prediction accuracy of the model [31,32]. The BRT model in R can be downloaded from the previous research [32].
At present, the BRT has been well applied to the driving force of the environmental pollution and land use change [33,34]. The BRT can calculate the percentage of influence of the independent variable on the dependent variable, as well as the relationship between the independent variable and the dependent variable when the other independent variables are considered as mean value or constant value [35]. The interaction between independent variables was not need to be considered in BRT, while the output is visible and easy to be understood.
Generally, the receiver-operating curve (ROC) is used to evaluate the classification accuracy of the model [36]. The ROC value (area under the curve) is a common criterion for evaluating the predictive performance of the predictive model. It is generally considered that when the ROC value is greater than 0.75, the independent variable results in a good explanatory power for the dependent variable [37]. The analysis used R language package of the BRT can be downloaded from the previous research [32].
In the study, 1 × 10 4 sample points are randomly selected by ArcGIS. The land use value is assigned to be as the binary structure (zero, one) in the BRT model. That means, "one" is for the area of one unchanged land use type from 1995 to 2015, and "zero" is for the rest area during the years. The bagging fraction is 0.75, the learning rate is 0.005, and the tree complexity is 5 in BRT.

CA-Markov Model
The CA model and Markov model are discrete dynamic models of time and state. They are widely used in the simulation and prediction of land use [38]. According to the previous studies [12,14,38,39], the CA model can be expressed as: where S is states of discrete cellular, t is the time instant, t + 1 is the coming future time instant respectively, f is the transformation rule of local space, N is the cellular field. The Markov model for predicting land use changes can be presented using the conditional probability equation as: where S t and S t+1 is land use status at time of t and t + 1, respectively, P ij is the transition probability matrix, i and j are the land use type at time of t and t + 1, respectively, n is the number of land use types.
The CA-Markov model can not only predict accurately in quantity, but also visually express the results in spatial distribution [14,40,41]. However, in the CA-Markov model, when the suitability map collection of the type of land use was making, the Analytical Hierarchy Process was generally used to determine the weight of each suitability map for the factor to create the suitability map collection of the type of land use, which is strong subjective.

BRT-CA-Markov Model
In this study, BRT-CA-Markov model combines BRT model with CA-Markov model to simulate land use, which is more objective. The BRT-CA-Markov model is based on the percentage value of the influencing factors of each type of land use, calculated by BRT, as the pretreatment of the model, which was used to calculate the weights of each corresponding suitability map collection. The suitability raster map value was standardized from 0 to 255 as the CA transition parameters by fuzzy membership function (e.g., J-shaped monotonically decreasing, Sigmoidal monotonically decreasing and linear function), with higher values indicating greater suitability. The suitability raster map for each factor was multiplied by each corresponding weight, so as to create the suitability maps collection by the superposition operation of the MCE (multi-criteria evaluation) to run the CA-Markov model in IDRISI 17.0 (Clark Labs, Clark University, Worcester, MA, USA) [6]. The suitability map of the CA-Markov model is more objective and reasonable, due to the quantitative analysis results of the BRT are directly applied to the production of suitability maps collection as the weights of the influencing factors.
The main implementation of processing architecture of the BRT-Markov-CA model is shown in Figure 2.
The Kappa coefficient can be used to analyze the accuracy assessment of the land use classification by comparing the actual classification results. In addition, it can also be used to measure the similarity of two images, which can be applied to assess the stationarity of the landscape [6,42,43]. The Kappa coefficient (K c ) is calculated as where P 1 is the percentage of the observed agreement, and P 2 is the percentage of the expected agreement. When K ≤ 0.4, the number of grids overlapping between the simulated map and the actual map is small, the similarity between the two maps is low, the difference is high, and the simulation accuracy is low. When 0.4 < Kappa < 0.75, it is considered good, and Kappa ≥ 0.75 is better [44].
The simulated land use spatial distribution map of Hotan Oasis was compared to the actual map in 2015 by "Cross-Tabulation" in IDRISI 17.0 using the Kappa coefficient to test the spatial simulation accuracy of the model. The Kappa coefficient can be used to analyze the accuracy assessment of the land use classification by comparing the actual classification results. In addition, it can also be used to measure the similarity of two images, which can be applied to assess the stationarity of the landscape [6,42,43]. The Kappa coefficient (Kc) is calculated as where P1 is the percentage of the observed agreement, and P2 is the percentage of the expected agreement. When K ≤ 0.4, the number of grids overlapping between the simulated map and the actual map is small, the similarity between the two maps is low, the difference is high, and the simulation accuracy is low. When 0.4 < Kappa < 0.75, it is considered good, and Kappa ≥ 0.75 is better [44].
The simulated land use spatial distribution map of Hotan Oasis was compared to the actual map in 2015 by "Cross-Tabulation" in IDRISI 17.0 using the Kappa coefficient to test the spatial simulation accuracy of the model. Table 2 shows that the factors have good explanatory power for land use distribution, where all the values of ROC greater than 0.75. The weights of influence of the factors on the land use types can be seen in Table 3. The factor of distance from the village centers had the greatest influence on the distribution of the farm land, the factor of distance from roads had the greatest influence on the forestland, and the factor of population density, factor of distance to water body and factor of gross domestic product density had the greatest influence on the grassland, water body and built-up land, respectively. It indicated that the human factors that affected the land use distribution was major. Human activities such as the urban construction, irrigation projects construction, and road construction, etc., mainly changed the types of the land use.  Table 2 shows that the factors have good explanatory power for land use distribution, where all the values of ROC greater than 0.75. The weights of influence of the factors on the land use types can be seen in Table 3. The factor of distance from the village centers had the greatest influence on the distribution of the farm land, the factor of distance from roads had the greatest influence on the forestland, and the factor of population density, factor of distance to water body and factor of gross domestic product density had the greatest influence on the grassland, water body and built-up land, respectively. It indicated that the human factors that affected the land use distribution was major. Human activities such as the urban construction, irrigation projects construction, and road construction, etc., mainly changed the types of the land use.

BRT-CA-Markov Model Validation
The BRT-CA-Markov model was used to predict the land use in 2015. Transition probability matrix and transition area between 1995 and 2005 were obtained by the IDRISI. The suitability raster map value was standardized by the fuzzy membership functions including J-shaped monotonically decreasing and Sigmoidal monotonically decreasing/increasing. Suitability maps collection was produced (Figure 3) considered the influence of factors (Table 3). A standard 5 × 5 contiguity filter [6] was used as the neighborhood definition in this study. The land use in 2005 was considered as a starting point, and selected ten CA iterations to simulate the predicted land use in 2015.

Factors
Farmland

BRT-CA-Markov Model Validation
The BRT-CA-Markov model was used to predict the land use in 2015. Transition probability matrix and transition area between 1995 and 2005 were obtained by the IDRISI. The suitability raster map value was standardized by the fuzzy membership functions including J-shaped monotonically decreasing and Sigmoidal monotonically decreasing/increasing. Suitability maps collection was produced (Figure 3) considered the influence of factors (Table 3). A standard 5 × 5 contiguity filter [6] was used as the neighborhood definition in this study. The land use in 2005 was considered as a starting point, and selected ten CA iterations to simulate the predicted land use in 2015.  The distribution of the simulation land use in 2015 can be seen in Figure 4. Spatial error between simulation and actual land use in 2015 also can be seen in Figure 5. From the accuracy assessment for the simulation used by ArcGIS 10.2 (ESRI, Redlands, CA, USA) (Table 4), the Percentage Error of the spatial distribution of each land use was less than 16% and the Kc was 0.89. Therefore, it is practicable to use the BRT-CA-Markov model to predict land use in 2025 and 2035. The distribution of the simulation land use in 2015 can be seen in Figure 4. Spatial error between simulation and actual land use in 2015 also can be seen in Figure 5. From the accuracy assessment for the simulation used by ArcGIS 10.2 (ESRI, Redlands, CA, USA) (Table 4), the Percentage Error of the spatial distribution of each land use was less than 16% and the Kc was 0.89. Therefore, it is practicable to use the BRT-CA-Markov model to predict land use in 2025 and 2035.   The distribution of the simulation land use in 2015 can be seen in Figure 4. Spatial error between simulation and actual land use in 2015 also can be seen in Figure 5. From the accuracy assessment for the simulation used by ArcGIS 10.2 (ESRI, Redlands, CA, USA) (Table 4), the Percentage Error of the spatial distribution of each land use was less than 16% and the Kc was 0.89. Therefore, it is practicable to use the BRT-CA-Markov model to predict land use in 2025 and 2035.

Land Use in 2025 and 2035
In this study, the land use in 2015 was considered as the initial year, the relative influence of factors on land use distribution, the suitability map collections of the land use, the transition probability matrix and transition area of the land use from 2005 to 2015 and ten CA iterations were input into the BRT-CA-Markov model to predict the spatial distribution of land use in 2025. Similarly, according to the previous study [14], the prediction of land use in 2035 should be set up 2025 as the initial year, and the transition probability matrix and transition area of land use from 2005 to 2015 and ten CA iterations should be input into the BRT-CA-Markov model.
The land use areas of each type were calculated by ArcGIS 10.2 (Table 5)

Discussion
There are many factors affect the accuracy of the CA-Markov model, such as the grid size, the selection of influencing factors and the processing methods of the suitable maps collection [3]. In this study, the grid of 30 m × 30 m was selected, and as many as ten factors that affect the land changes were analyzed for the simulation. The driving factor of the land use considered was more than other studies, even such as the factor of groundwater depth was assessed [14,45]. Many studies generally used the CA-Markov model based on the Analytic Hierarchy Process in operating of the MCE, which was been integrated in the IDRISI [22,41,46]. However, the simulation would be affected by the

Discussion
There are many factors affect the accuracy of the CA-Markov model, such as the grid size, the selection of influencing factors and the processing methods of the suitable maps collection [3]. In this study, the grid of 30 m × 30 m was selected, and as many as ten factors that affect the land changes were analyzed for the simulation. The driving factor of the land use considered was more than other studies, even such as the factor of groundwater depth was assessed [14,45]. Many studies generally used the CA-Markov model based on the Analytic Hierarchy Process in operating of the MCE, which was been integrated in the IDRISI [22,41,46]. However, the simulation would be affected by the variance in expert opinions in the AHP, for that was usually relies on subjective standards in obtaining the weights of factors used. In recent years, the model was improved based on Frequency Ratio, which had yielded a good performance and accuracy. However, analyzing the factors was also artificial in generating suitable maps collection in that method [21]. In this study, the CA-Markov model based on the BRT model achieved a high satisfactory accuracy, and the simulation process was objective in the generation of the suitable maps collection due to the BRT. In order to verify the advantages of the BRT-CA-Markov further, the CA-Markov model based on the other models such as Analytic Hierarchy Process in the Hotan River Basin should be studied, and the accuracy of results will be compared.
For the total research area, the simulation has achieved a high satisfactory accuracy. However, the inconsistency in the land use seems to be in particular at some areas, such as the 224th regiment field in the southwest of the study area. The inconsistency of farmland is mainly because that Xinjiang Construction Corps is large-scale reclaiming the unused land, used the irrigation projects, to meet the planning of regional agricultural development during the periods. The prediction and simulation of future land use is based on the existing policies for all the land use models. The major policy adjustments made by the governments will cause more inconsistency in the simulation, which needs further discussion.
The increase tendency of the area of farmland indicated that the scope of reclaim wasteland was increase according to the study, although the China's Grains for Green Policy (GGP) has driven one of the world's largest conservation projects which was intended to sacrifice farmland for the amenity and environmental values provided by forestland, grassland and wetland [47]. The increasing area of farmland would need more water. On the other hand, water resources for the Hotan Oasis is scarce. Therefore, it is necessary for the government to formulate policies to increase the popularization and implementation of water-saving irrigation technology, and to improve the water use efficiency to meet the growth of farmland. Although the water body was probably increased due to the construction of hydraulic projects, that might not meet the requirements of water for the expansion of the farmland.
The decreasing trend of area of forestland and grassland indicates that the ecological environment is degenerating gradually, and it is necessary to strengthen water and soil conservation measures to curb this trend. The tendency of area of built-up land was increased, which was indicated that the residents demand for urbanization level increased continuously with the continuous development of economy and population, furthermore, the governments needs to make plans to prevent urban disorder expanding.

Conclusions
The creation process of suitability maps collection in the Boosted Regression Trees-cellular automata-Markov (BRT-CA-Markov) model is more objective to the CA-Markov model based on Analytical Hierarchy Process (AHP), also the BRT-CA-Markov model has yielded a good performance and accuracy. That will be useful in the research on land use prediction. The prediction of the land use in Hotan Oasis in 2025 and 2035 used the model is feasible. From the simulation, the areas of farmland, built-up land and water body in Hotan Oasis showed an increasing trend, while the areas of forestland, grassland and unused land continued to show a decreasing trend in 2025 and 2035. The government needs to formulate measures to improve the utilization rate of the water resources to meet the growth of farmland in such an arid and water-scarce area in Hotan Oasis. Meanwhile they need to monitor and control disorder expanding of farmland and increase ecological environment protection measures to curb the reduction of grassland and forestland for the ecological health.
Furthermore, the spatial-temporal pattern and optimal allocation of water-land resources under a changing environment in the Hotan River Basin will be studied, according to the results of this study and the spatial distribution of water resources in Hotan River Basin in the future.

Conflicts of Interest:
The authors declare no conflict of interest.