Regional Terrain Complexity Assessment Based on Principal Component Analysis and Geographic Information System: A Case of Jiangxi Province, China

Regional terrain complexity assessment (TCA) is an important theoretical foundation for geological feature identification, hydrological information extraction and land resources utilization. However, the previous TCA models have many disadvantages; for example, comprehensive consideration and redundancy information analysis of terrain factors is lacking, and the terrain complexity index is difficult to quantify. To overcome these drawbacks, a TCA model based on principal component analysis (PCA) and a geographic information system (GIS) is proposed. Taking Jiangxi province of China as an example, firstly, ten terrain factors are extracted using a digital elevation model (DEM) in GIS software. Secondly, PCA is used to analyze the information redundancy of these terrain factors and deal with data compression. Then, the comprehensive evaluation of the compressed terrain factors is conducted to obtain quantitative terrain complexity indexes and a terrain complexity map (TCM). Finally, the TCM produced by the PCA method is compared with those produced by the slope-only, the variation coefficient and K-means clustering models based on the topographic map drawn by the Bureau of Land and Resources of Jiangxi province. Meanwhile, the TCM is also verified by the actual three-dimensional aerial images. Results show that the correlation coefficients between the TCMs produced by the PCA, slope-only, variable coefficient and K-means clustering models and the local topographic map are 0.894, 0.763, 0.816 and 0.788, respectively. It is concluded that the TCM of the PCA method matches well with the actual field terrain features, and the PCA method can reflect the regional terrain complexity characteristics more comprehensively and accurately when compared to the other three methods.


Introduction
Regional terrain complexity assessment (TCA) is a meaningful foundation for identifying regional geological features, estimating groundwater level and river information and effectively utilizing the land resources in order to promote social and economic development [1][2][3][4]. Hence, it is significant to strengthen the TCA research and produce terrain complexity maps (TCM) in the areas with mountainous and hilly landforms [5][6][7].
The modeling of TCA includes multiple processes, such as acquisition of data sources, assessment unit division, selection of terrain factors and assessment model building. Recently, the technologies of remote sensing (RS) and geographic information systems (GIS) provide accurate and abundant data for regional TCA; therefore, the terrain factors used in this study are also acquired based on RS and GIS [8][9][10][11]. Furthermore, it is very important to select an appropriate model for TCA. In response to this issue, some scholars have proposed a lot of TCA-related models in recent years. Some of these models describe the regional terrain complexity only using a single terrain factor; for example, Tianwen, et al. [12] and Huaxing, et al. [13] selected slope angle as the terrain factor of TCA; Long et al. [14] used the fractal dimension value to describe the terrain complexity; Ashenfelter and Eberhard [15] adopted the angle between the space planes to describe the terrain complexity. These studies are unable to comprehensively reflect the terrain features of the study areas because they select limited terrain factors for TCA. On the other hand, some scholars used several types of terrain factors to determine the regional terrain features. For example, Chambers, et al. [16] pointed out that the regional differentiation laws of terrain complexity can be reflected by the synthesis calculations of terrain factors; Huaxing, Liu and Tang [13] explored the composite terrain factors based on a digital elevation model (DEM) to reflect the change characteristics of topographic relief and folds respectively from the holistic and local perspectives; Rawat and Joshi [17] studied the regional terrain characteristics in the land suitability classification; Thompson, et al. [18] analyzed the effects of DEM with different spatial resolutions on TCA and land landscape simulation.
In any case, there are still several problems in the previous models (variation coefficient method, k-means clustering method, etc.) which need to be solved, such as the need to introduce more terrain factors to TCA models, the lack of consideration of redundant information between terrain factors and the quantitative computation of the terrain complexity indexes. That is to say, the previous method cannot deal with the redundant information between the ten terrain factors. As a result, the TCA accuracy may be reduced. Furthermore, it is difficult for the previous methods to calculate the terrain complexity indexes quantitatively. Therefore, it is necessary and significant to propose a novel model which is appropriate for quantitatively and accurately analyzing a lot of terrain factors and can solve multifactor collinearity problems.
In this study, an excellent and widely acknowledged multivariate statistical method, namely principal component analysis (PCA), is proposed to work with TCA to overcome the drawbacks of the previous models. The PCA method can effectively calculate the component scores from the extracted principal components. Generally, the calculated component scores can be considered as the quantitative terrain complexity indexes, which are helpful to introduce regional topographic features into engineering applications. Then, the comprehensive assessment results can be accurately calculated by the PCA method [19]. The PCA model has been successfully used in various research fields [20][21][22][23]. For example, Li et al. [24] proposed the PCA and high-dimensional model representation techniques to estimate the probabilistic power flow; Geng, et al. [25] used the PCA based long short-term memory network to predict short-term wind speed time series; Schwartz, et al. [26] proposed a robust PCA method to conduct change detection in ultrawideband synthetic aperture radar images; Gao, et al. [27] researched network intrusion detection using extreme learning machine and adaptive PCA methods. However, the PCA method has received little attention in the field of regional TCA. Therefore, the PCA method is used to deal with the terrain factor compression and quantitative calculation of TCA.
The Jiangxi province of China is taken as the research area because the terrain of Jiangxi province is very complex, with types of high mountains and hills and so on. Moreover, complex terrain environments limit the local geological survey and land use planning. Hence, this paper firstly extracts several terrain factors using the DEM data in the GIS software and proposes the PCA method to calculate the terrain complexity indexes of Jiangxi province. Then, the influences of the extracted terrain factors on terrain complexity are explored. Moreover, the TCA results of PCA are compared with those of the slope-only method (considering only the slope map as the TCA map), variation coefficient method and k-means clustering model to discuss the modeling performance of PCA.

Introduction of Jiangxi Province and Data Sources
Jiangxi province is located in Eastern China, between 24 • 29 14" N~30 • 04 41" N and 113 • 34 36" E1 18 • 28 58" E, with a length of 620 km from north to south and 490 km from east to west, covering an area of 166,900 km 2 ( Figure 1). Jiangxi province is mainly hilly and mountainous, with extensive basins and valleys. This province is relatively flat in the northern part, surrounded by mountains on three sides in the eastern, western and southern parts, and is hilly in the middle part, making it a huge basin inclined toward Poyang Lake and opening to the north. The Ganjiang river, Fuhe river, Xinjiang river, Xiuhe river and Raohe river are the five major rivers that flow through Jiangxi province, and Poyang Lake is the largest freshwater lake in China. This province has a subtropical humid climate, with annual average rainfall ranging from 1341 to 1940 mm. with those of the slope-only method (considering only the slope map as the TCA map), variation coefficient method and k-means clustering model to discuss the modeling performance of PCA.

Introduction of Jiangxi Province and Data Sources
Jiangxi province is located in Eastern China, between 24°29′14″ N~30°04′41″ N and 113°34′36″ E~118°28′58″ E, with a length of 620 km from north to south and 490 km from east to west, covering an area of 166,900 km 2 ( Figure 1). Jiangxi province is mainly hilly and mountainous, with extensive basins and valleys. This province is relatively flat in the northern part, surrounded by mountains on three sides in the eastern, western and southern parts, and is hilly in the middle part, making it a huge basin inclined toward Poyang Lake and opening to the north. The Ganjiang river, Fuhe river, Xinjiang river, Xiuhe river and Raohe river are the five major rivers that flow through Jiangxi province, and Poyang Lake is the largest freshwater lake in China. This province has a subtropical humid climate, with annual average rainfall ranging from 1341 to 1940 mm.

Description of Terrain Factors
The data sources used for TCA modeling include (1) the topography data acquired through field survey; (2) the aerial images taken by the unmanned aerial vehicle on 1st February 2018 (freely downloaded from Google Earth 7.1.8.3036 (32-bit) with spatial resolution of 1.09 m); (3) DEM data with 100 m resolution taken on 11th February 2000 for extracting terrain factors (freely downloaded from http://www.gscloud.cn/) [28,29]; (4) the local topographic map with 1:250,000 scale. The Jiangxi province is divided into 16,713,615 grid cells with 100 m grid resolution-this is because this resolution can effectively characterize the topographical features of the study area, and it will not lead to too much computation as a result of mass grid cells [30].
In this study, ten terrain factors (including elevation (m), slope (°), plan curvature, profile curvature, relief amplitude (m), surface roughness, surface cutting depth (m), gully density (m/m 2 ), elevation variation coefficient and slope length (m)) from the above data sources are extracted by the spatial analysis function of ArcGIS 10.2 software. All the ten terrain factors are respectively divided into five classes using the natural breakpoint method [31][32][33]. This is because the natural breakpoint

Description of Terrain Factors
The data sources used for TCA modeling include (1) the topography data acquired through field survey; (2) the aerial images taken by the unmanned aerial vehicle on 1st February 2018 (freely downloaded from Google Earth 7.1.8.3036 (32-bit) with spatial resolution of 1.09 m); (3) DEM data with 100 m resolution taken on 11th February 2000 for extracting terrain factors (freely downloaded from http://www.gscloud.cn/) [28,29]; (4) the local topographic map with 1:250,000 scale. The Jiangxi province is divided into 16,713,615 grid cells with 100 m grid resolution-this is because this resolution can effectively characterize the topographical features of the study area, and it will not lead to too much computation as a result of mass grid cells [30].
In this study, ten terrain factors (including elevation (m), slope ( • ), plan curvature, profile curvature, relief amplitude (m), surface roughness, surface cutting depth (m), gully density (m/m 2 ), elevation variation coefficient and slope length (m)) from the above data sources are extracted by the spatial analysis function of ArcGIS 10.2 software. All the ten terrain factors are respectively divided into five classes using the natural breakpoint method [31][32][33]. This is because the natural breakpoint method can effectively identify the frequency distribution characteristics of terrain factors to obtain the best class division scheme, and it has been successfully used in the research of the class divisions by many other scholars [34][35][36][37]. The purpose of class division is to better represent the distribution rules of these terrain factors ( Figure 2). In fact, the standardized results of terrain factors are the final input variables of all the TCA models.
(1) Elevation The elevation [36,38] of the study area ranges from 2 to 2147 m. The mountains with high elevation mainly distribute around the Jiangxi province, while the central area of Jiangxi province has relatively low elevation. Generally speaking, the higher the elevation in the southern hilly area, the greater the terrain complexity ( Figure 2a).
(2) Slope According to the slope classification standard and the actual slope characteristics, the slopes of the study area are divided into five classes as flat slope, gentle slope, moderate slope, steep slope and acute slope [39][40][41][42]. The area with gentle slope covers around 33.88% of the total study area. The greater the slope, the more complex the terrain. To produce TCM based on the slope-only model, the flat slope, gentle slope, moderate slope, steep slope and acute slope in Jiangxi province are respectively considered as very low, low, moderate, high and very high complex terrain levels.
(3) Plan curvature and profile curvature Plan curvature and profile curvature respectively reflect the change characteristics of concave and convex terrains from the horizontal direction and vertical direction [43]. The plan curvature ( Figure 2b) can be determined by calculating the slope aspect of a slope map. Meanwhile, the profile curvature ( Figure 2c) can be determined by calculating the slope twice on the basis of the DEM map [44]. In general, the greater the profile curvature or plan curvature, the higher the terrain complexity. However, compared with profile curvature, plan curvature has less correlation with terrain complexity.
(4) Relief amplitude The relief amplitude of the study area can be obtained through the statistical test and the maximum height difference method. The greater the relief amplitude, the higher the terrain complexity [45]. According to Liu [46], the study area can be divided into five grades: plain region, tableland, hill, small rolling mountain and mountain. The results in Figure 2d show that the main terrain relief types of Jiangxi province are hill and small rolling mountain.
(5) Surface cutting depth Surface cutting depth is defined as the difference between average elevation and minimum elevation in a certain area of one point on the ground surface [47]. In this study, the calculation radius of surface cutting depth is set to be 12 grid cells and the calculation results are shown in Figure 2e. Generally, a greater surface cutting depth means higher terrain complexity [48].
(6) Surface roughness The surface roughness, which is an important index reflecting surface relief and soil erosion degree, is defined as the ratio of surface area to its projected area in a certain grid cell ( Figure 2f). The surface roughness can be determined using the grid calculator of ArcGIS software according to Equation (1), where Rou is the surface roughness value, and S is the slope value of the study area. The greater the surface roughness, the higher the terrain complexity [49].
(7) Gully density The gully density ( Figure 2g) is defined as the total length of the river network in a unit area [50,51]. The river network in a study area can be extracted by the hydrological analysis tool of ArcGIS 10.2 software. Then, we can calculate the gully density through the grid calculator, as shown in Equation (2), where D S is the gully density, and L is length of the river networks in a unit area A.
(8) Elevation variation coefficient Elevation variation coefficient ( Figure 2h) can be expressed as the ratio of the standard deviation of elevations to their average value in a certain area. This terrain factor is in direct proportion to the terrain complexity [52,53].
(9) Slope length Slope length (Figure 2i) suggests the projected length of the maximum ground distance on a horizontal plane, from a point on the ground to its starting point in the direction of water flow [54,55]. Slope length affects terrain complexity indirectly and its formula is expressed in Equation (3), where m indicates the flow length on the ground surface in the direction of water flow, and S is the slope value of the ground surface.

Modeling Procedures of TCA
The modeling procedures of TCA using the PCA method are shown in Figure 3: (1) All ten types of terrain factors are extracted and graded from the DEM data using spatial analysis tools and data management tools in ArcGIS 10.2 software.

Modeling Procedures of TCA
The modeling procedures of TCA using the PCA method are shown in Figure 3: (1) All ten types of terrain factors are extracted and graded from the DEM data using spatial analysis tools and data management tools in ArcGIS 10.2 software.
(2) The PCA method is used to conduct quantitative calculation of topographic complexity indexes based on the covariance matrix of the above terrain factors. The principal components with cumulative contribution rates greater than 90% are adopted to calculate the component scores, which are considered the terrain complexity indexes.
(3) The slope-only, variation coefficient and k-means clustering methods are respectively used to assess the topographic complexity for comparisons. Their TCA results are also classified into five levels, namely very low, low, moderate, high and very high terrain complexity.
(4) The TCMs of Jiangxi province are produced in ArcGIS 10.2 by the above four methods. (5) The accuracy of these TCMs is evaluated using the correlation coefficients between the TCMs and the topographic map and also evaluated through matching with the actual three-dimensional aerial images. (3) The slope-only, variation coefficient and k-means clustering methods are respectively used to assess the topographic complexity for comparisons. Their TCA results are also classified into five levels, namely very low, low, moderate, high and very high terrain complexity.
(4) The TCMs of Jiangxi province are produced in ArcGIS 10.2 by the above four methods. (5) The accuracy of these TCMs is evaluated using the correlation coefficients between the TCMs and the topographic map and also evaluated through matching with the actual three-dimensional aerial images.

DEM data in Jiangxi Province
Correlations between TCMs and topographic map, and the actual 3dimensional aerial images for accuracy assessment Terrain factors extracted from DEM in GIS software Information compression of Terrain factors using PCA

Calculation of component scores as terrain complexity indexes
Compared with those calculated by Slope-only, variation coefficient and K-means clustering methods

Principal Component Analysis
The PCA method aims to use the idea of dimensionality reduction to transform multiple factors into a few principal components. Each principal component can reflect a piece of information of the original variables [56]. Furthermore, the contained information of each principal component is not repeated with that of other principal components. The PCA converts a given set of related variables into another set of unrelated variables through a linear transformation. Then, these new uncorrelated variables are arranged in descending order of variance. Initially, when using the PCA method, the dimensionality of all terrain factors should be unified. Supposing the original data set

Principal Component Analysis
The PCA method aims to use the idea of dimensionality reduction to transform multiple factors into a few principal components. Each principal component can reflect a piece of information of the original variables [56]. Furthermore, the contained information of each principal component is not repeated with that of other principal components. The PCA converts a given set of related variables into another set of unrelated variables through a linear transformation. Then, these new uncorrelated variables are arranged in descending order of variance. Initially, when using the PCA method, the dimensionality of all terrain factors should be unified. Supposing the original data set x ij , i = 1, 2, · · · · · · , n; j = 1, 2, · · · · · · m is composed by n grid cells with m types of terrain factors, the data set x ij can be standardized by column as x ij − x j σ j , i = 1, 2, · · · · · · , n, j = 1, 2, · · · · · · , m ij is the standardized result of x ij , x j is the mean value of x j , and σ j is the standard deviation of x j . The correlation coefficient matrix of x * ij is first calculated. Then, the eigenvalues λ and their corresponding eigenvectors are calculated based on the correlation coefficient matrix. The contribution rate of each eigenvalue can be obtained using Equation (6), where k is the number of eigenvalues.
According to the eigenvalue and its eigenvector, the physical significance of the principal components is explained. According to the load coefficient value of the selected principal component in each terrain factor, the weight coefficient value of each terrain factor in the selected principal component is calculated by the following formula: where a ij and l ij are respectively the weight coefficient value and load coefficient value of the j-th terrain factor in the i-th principal component; λ i is the eigenvalue corresponding to the i-th principal component. Then, the comprehensive factor of each principal component can be calculated as where F i is the comprehensive factor of all the terrain factors, and m is the number of terrain factors. Finally, the comprehensive scores of the selected principal components are obtained by Equation (9), where V is the total contribution rate of the selected principal components; T is the number of retained principal components; V i is the contribution rate of the i-th principal component.

Variation Coefficient Method
The variation coefficient, which is an objective weighting method, directly uses the information contained in each factor to calculate the weight of the factor [57,58]. The basic idea of this method is that, in the evaluation factor system, the greater the difference of the values contained in a factor, the more the influence of a factor on the evaluated object. The differences between the values contained in a factor can be measured using the variation coefficient of this factor as where VC j is the variation coefficient of the j-th factor, also known as the standard deviation coefficient; σ j and x j are respectively the standard deviation and mean values of the j-th factor. Then, the weight of each factor is calculated as

K-Means Clustering Model
The K-means clustering model [59,60] firstly sets the number of clusters K and the initial center point of each cluster and then updates the center point and optimizes the clustering results in an iterative way. The specific steps are as follows: (1) We set a training sample S = {x 1 , x 2 , · · · , x m }, where x m ∈ R n and m represents the number of samples. K clusters of (SD 1 , SD 2 , · · · , SD K ) are set as the output. Meanwhile, K clustering center points are selected and denoted as u 1 , u 2 , · · · , u k .
(2) Empty K clusters in turn as SD j = {}, j = 1, 2, · · · , k. For each sample x p in the training set S, its distance to the K center points is respectively calculated, and then it is classified into the corresponding cluster with the smallest distance to the center point of the cluster. In Equations (12) and (13), C p denotes the cluster which x p belongs to (3) For each cluster SD i , its center point µ i is recalculated according to Equations (14) and (15), where the sum_ f eature (SD i ) is the sum of the characteristic values of all samples in cluster SD i , and the sum_number(SD i ) is the number of samples in cluster SD i .
(4) Repeat steps 2 and 3 until the cost function is less than the given threshold. Equation (16) is set as the cost function, which denotes the sum of Euclidean distances between all samples in the sample set S and the center points of their corresponding clusters.

Terrain Factor Analysis
The selected terrain factors are standardized, and then the Kaiser-Meyer-Olkin (KMO) test and Bartlett spherical test of the PCA method are performed based on the standardized data. Results show that the KMO test value of the standardized data is 0.790 (greater than 0.6) and its Bartlett spherical test value is less than 0.05, indicating that the PCA method is appropriate for dealing with the selected terrain factors. The PCA method is conducted on all the grid cells in Jiangxi province through the covariance matrix of the selected terrain factors. Then, the eigenvalues and explanatory contribution rates of all principal components are obtained, as shown in Table 1. According to the results in Table 1, the principal components that can be used to describe the terrain factors are selected. Relevant studies suggest that the main information contained in primitive variables can be effectively interpreted by the corresponding principal components when their cumulative contribution rate reaches 90% [61]. It can be seen from Table 1 that the explanatory cumulative contribution rate of the first five principal components F i1 ∼ F i5 can reach 91.78%. Therefore, the F i1 ∼ F i5 (Table 2) are used to describe the terrain characteristics of Jiangxi province. It can be seen from Tables 1 and 2 that the explanatory contribution rate of F i1 is 55.179%, and the maximum load in F i1 is S, with a load coefficient of 0.96, followed by the Rel and Rou, with load coefficients of 0.956 and 0.944, respectively. Therefore, F i1 is mainly used to describe the slope and relief amplitude of the study area. The explanatory contribution rate of F i2 is 11.843%, Ecv is the maximum load in F i2 , with a load coefficient of 0.840, and Gd is the second maximum load in F i2 , with a load coefficient of 0.477, indicating that F i2 is mainly used to describe the elevation variation coefficient and gully density of the study area. In addition, the explanatory contribution of F i3 is 10.072% and the maximum load in F i3 is Plc, with a load coefficient of 0.927, which suggests that F i3 is mainly used to describe the plan curvature. Based on Equation (8), the comprehensive factors of the selected five principal components F i1~Fi5 are calculated as follows:

Terrain Complexity Assessment
According to Equation (9) and the five principal components F i1 ∼ F i5 , the terrain complexity index F of each grid cell in Jiangxi province can be calculated as The TCA results of Jiangxi province using the PCA method are shown in Figure 4a. The areas with very high and high terrain complexity levels are mainly distributed in the surrounding areas of Jiangxi province with continuous mountain ranges, accounting for 1.5% and 6.0% of the total area, respectively. These very high and high terrain complexity indexes are mainly affected by the great values of slope, elevation, relief amplitude, surface cutting depth, surface roughness and profile curvature. On the contrary, the very low and low complexity regions account for the largest area (36.6% and 40.7%, respectively) and are mainly distributed in central parts of Jiangxi province with small values of slope, elevation, relief amplitude, profile curvature and gully density. Furthermore, the moderate terrain complex areas account for 15.1% of the total area and are evenly distributed in the study area. The terrain complexity of Jiangxi province is at a low and middle level on the whole. The overall terrain complexity of the study area is in line with its topographic and geomorphic characteristics, which can be mainly described as middle and low mountains supplemented by hilly areas. Meanwhile, the TCA produced by the slope-only method is shown in Figure 4b for comparison, suggesting that the overall distribution rule of both TCMs is consistent.

TCA Using Variation Coefficient Method
According to the selected terrain factors, the variable coefficient of each terrain factor is determined. In addition, the weight values of all terrain factors are calculated based on their variable coefficients, as shown in Table 3. Then, these weight values are put into Equation (11) to calculate the terrain complexity indexes of Jiangxi province. The produced TCA results are shown in Figure 5a. It can be seen from Table 5 that the very low, low, moderate, high and very high terrain complexity areas account for around 36.64%, 40.72%, 15.14%, 6.04% and 1.46% of the total area, respectively.

TCA Using Variation Coefficient Method
According to the selected terrain factors, the variable coefficient of each terrain factor is determined. In addition, the weight values of all terrain factors are calculated based on their variable coefficients, as shown in Table 3. Then, these weight values are put into Equation (11) to calculate the terrain complexity indexes of Jiangxi province. The produced TCA results are shown in Figure 5a. It can be seen from Table 5 that the very low, low, moderate, high and very high terrain complexity areas account for around 36.64%, 40.72%, 15.14%, 6.04% and 1.46% of the total area, respectively.

TCA Using K-Means Clustering Model
The selected standardized terrain factors are put into the k-means clustering model. First, the number of clusters is set to 5, the clustering center point is randomly determined, and the number of iterations is set to 25. Then, each grid cell with ten standardized terrain factors is classified into the corresponding cluster with the smallest distance to the final center point of the cluster through 25 iterative calculation. The five final center points of the clusters are shown in Table 4. Finally, all the grid cells are classified into five types of different terrain complexity levels, as shown in Figure 5b and Table 5.

Accuracy Assessment of the Four Models
In order to analyze the TCA accuracy of the PCA, slope-only, variable coefficient and k-means clustering models, the correlation coefficients between the TCMs produced by the four models and the local topographic map obtained from the government field surveys are calculated, respectively. A greater correlation coefficient value means that the produced TCM and the topographic map matches better, which further suggests higher TCA accuracy.
In this study, the topographic map with a scale of 1: 250,000, surveyed by the Bureau of Land and Resources of Jiangxi province in 2015, is used to evaluate the TCA accuracy of the four models. In the topographic map of Jiangxi province, the topographic types are classified as five categories of plain, hill, low mountain, middle mountain and mountain, which are assigned values of 1, 2, 3, 4, 5, respectively ( Figure 6). In order to compare the TCMs with the topographic map, the produced TCMs are classified into five terrain complexity levels of very low, low, moderate, high and very high, which are also assigned values of 1, 2, 3, 4, 5, respectively. Then, the correlation coefficients between the topographic map and the TCMs produced by the PCA, slope-only, variable coefficient and K-means clustering models are calculated as 0.894, 0.763, 0.816 and 0.788, respectively. The results show that the TCM produced by the PCA method has the highest correlation coefficient with the topographic map compared to those produced by the slope-only, variable coefficient and k-means clustering models. That is to say, the PCA model has much higher TCA accuracy than the other three types of models. A greater correlation coefficient value means that the produced TCM and the topographic map matches better, which further suggests higher TCA accuracy. In this study, the topographic map with a scale of 1: 250,000, surveyed by the Bureau of Land and Resources of Jiangxi province in 2015, is used to evaluate the TCA accuracy of the four models. In the topographic map of Jiangxi province, the topographic types are classified as five categories of plain, hill, low mountain, middle mountain and mountain, which are assigned values of 1, 2, 3, 4, 5, respectively ( Figure 6). In order to compare the TCMs with the topographic map, the produced TCMs are classified into five terrain complexity levels of very low, low, moderate, high and very high, which are also assigned values of 1, 2, 3, 4, 5, respectively. Then, the correlation coefficients between the topographic map and the TCMs produced by the PCA, slope-only, variable coefficient and K-means clustering models are calculated as 0.894, 0.763, 0.816 and 0.788, respectively. The results show that the TCM produced by the PCA method has the highest correlation coefficient with the topographic map compared to those produced by the slope-only, variable coefficient and k-means clustering models. That is to say, the PCA model has much higher TCA accuracy than the other three types of models.

TCA Compared with Three-Dimensional Aerial Images
In order to more intuitively reflect the consistency between the TCA performance and the actual terrain features, the TCM produced by the PCA method is compared with three-dimensional aerial images of the study area. Eight locations in the TCM are randomly selected, and the three-

TCA Compared with Three-Dimensional Aerial Images
In order to more intuitively reflect the consistency between the TCA performance and the actual terrain features, the TCM produced by the PCA method is compared with three-dimensional aerial images of the study area. Eight locations in the TCM are randomly selected, and the three-dimensional aerial images of these locations (Figure 7a-h)  (1) The topographic complexity levels of Figure 7a,d are mainly high and very high, which is highly consistent with the three-dimensional aerial images of Figure 7a,d. Generally, the topography types of Figure 7a,d are dominated by mountains and valleys.
(2) The topographic complexity levels in both Figure 7b,c are low and the corresponding elevation variations are small. Meanwhile, the topographic features shown in the three-dimensional aerial images of Figure 7b,c are plain and hill. Hence, the matching degree between the TCA result and the three-dimensional aerial image is very high.
(3) A mixture of several kinds of topographic complexity levels is shown in Figure 7e, which is also very consistent with the topographic features of rolling hills and low mountains shown in the aerial image of Figure 7e. The above analysis shows that the TCA accuracy of the PCA method is high and the terrain complexity maps match well with the actual three-dimensional aerial images. (3) A mixture of several kinds of topographic complexity levels is shown in Figure 7e, which is also very consistent with the topographic features of rolling hills and low mountains shown in the aerial image of Figure 7e. The above analysis shows that the TCA accuracy of the PCA method is high and the terrain complexity maps match well with the actual three-dimensional aerial images.

Limitations and Research Prospects
Although the PCA method used in this study shows better TCA performance than the other three types of methods, there are still some limitations in the processes of PCA modeling. Firstly, it is necessary to further test the accuracy of the original DEM data and strive to obtain DEM data with

Limitations and Research Prospects
Although the PCA method used in this study shows better TCA performance than the other three types of methods, there are still some limitations in the processes of PCA modeling. Firstly, it is necessary to further test the accuracy of the original DEM data and strive to obtain DEM data with higher accuracy on this basis. Secondly, the DEM spatial resolution of 100 m is relatively rough; hence, DEM data with higher spatial resolution is needed in order to implement TCA modeling in further research. Thirdly, only ten kinds of terrain factors are extracted from the DEM data in this study; it is significant to extract more terrain factors from DEM data, such as ridge lines, valley lines, slope forms, etc. In the next study, more terrain factors will be introduced as the input variables of the PCA method. Fourthly, the main features extracted by the PCA are the linear correlations among all these terrain factors, and the subsequent research will extract more nonlinear features among the terrain factors. Therefore, kernel principal component analysis [62] or other nonlinear data compression methods can be considered in order to carry out this study. Finally, the accuracy of terrain complexity assessment should be analyzed and verified by combining it with field investigation in the next research, and the engineering applications of TCA results in various fields should be explored to enhance the research significance of this study.
In addition, for the use of the PCA method, all the data sets are put into the PCA method to calculate terrain complexity indexes. However, there is a lack of statistical validation of the results, as a result, an estimate of the goodness of the model itself is not present in this study. Hence, compared to the unsupervised models used in this study, some supervised mathematical models will be used to implement TCA in the next study. For the supervised mathematical models, some of the data sets are used to create a model and the remaining are used to give an estimate of the goodness of the model itself. Furthermore, it is necessary to make sure that the accuracy of the description of the territory using only the new variables is homogeneous over the whole territory. Hence, in the next study, we need to repeat the construction of the models many times using each time a subset of the available data (typically two or three of the datasets) extracted randomly.

Conclusions
There are some disadvantages in the previous TCA studies, such as few terrain factors used for TCA modeling, the lack of information compression between terrain factors and the qualitative TCA modeling processes. In order to overcome these disadvantages, a PCA method is introduced to carry out TCA in this study. The Jiangxi province of China with complex terrain features is used as the study area. Ten terrain factors including elevation, slope, plan curvature, profile curvature, relief amplitude, surface roughness, surface cutting depth, gully density, elevation variation coefficient and slope length are selected as input variables of PCA. The TCM produced by PCA is compared with the results of slope-only, variation coefficient and K-means clustering models and is also verified by aerial images obtained by field survey.
It can be concluded that the PCA method can accurately and reliably describe the terrain complexity distribution features; the produced TCM is very consistent with the actual terrain features. In addition, the proposed PCA method has considerably higher accuracy than the other three models. The main contributions of this study include the comprehensive consideration of ten related terrain factors to address TCA, the removal of redundant information in these terrain factors and the quantitative calculation of terrain complexity indexes. Hence, the TCM produced in this study can be applied to improve the local geological investigation and land planning utilization.