Improving ASTER GDEM Accuracy Using Land Use-Based Linear Regression Methods: A Case Study of Lianyungang, East China

The Advanced Spaceborne Thermal-Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM) is important to a wide range of geographical and environmental studies. Its accuracy, to some extent associated with land-use types reflecting topography, vegetation coverage, and human activities, impacts the results and conclusions of these studies. In order to improve the accuracy of ASTER GDEM prior to its application, we investigated ASTER GDEM errors based on individual land-use types and proposed two linear regression calibration methods, one considering only land use-specific errors and the other considering the impact of both land-use and topography. Our calibration methods were tested on the coastal prefectural city of Lianyungang in eastern China. Results indicate that (1) ASTER GDEM is highly accurate for rice, wheat, grass and mining lands but less accurate for scenic, garden, wood and bare lands; (2) despite improvements in ASTER GDEM2 accuracy, multiple linear regression calibration requires more data (topography) and a relatively complex calibration process; (3) simple linear regression calibration proves a practicable and simplified means to systematically investigate and improve the impact of land-use on ASTER GDEM accuracy. Our method is applicable to areas with detailed land-use data based on highly accurate field-based point-elevation measurements.


Introduction
Digital elevation models (DEMs) are important input data for a wide range of geographical and environmental research [1][2][3][4][5][6] and can be acquired by various methods, including spatial interpolation based on ground control points (GCPs) [7,8], airborne or spaceborne LiDAR [9,10] and terrestrial laser scanning [11,12]. Owing to topographic, political, climatic, financial and/or technical obstacles [13,14], however, such methods are not suitable for producing DEMs of large areas, for example Chinese cities or counties that cover thousands of square kilometers.
Spaceborne platforms such as the Terra Advanced Spaceborne Thermal-Emission and Reflection Radiometer (ASTER) can solve that problem by providing free DEM datasets that cover large areas for most of the globe. Compared to the earlier version, the 30-m resolution ASTER Global DEM (ASTER GDEM) dataset released in 2011 (hereafter GDEM2) has significantly improved spatial coverage, horizontal resolution, and accuracy. Yet data anomalies remain in the GDEM2, which can produce noticeable elevation errors on a local scale [15]. Many studies have successfully explored both vertical [16][17][18][19] and relative height accuracy, since absolute vertical accuracy complements Earth superficial shape (ESS) descriptions [20].
Land-cover information is generally derived from the classification of moderate-resolution remote-sensing images [29] (for example, 30 m Landsat TM/ETM+). As such, remotely sensed land-cover data is often less accurate than land-use data; the latter is mainly used for regional public administration such as cadastral management, regional planning and taxation, and is generally obtained through field surveys or sub-meter aerial photography [30,31]. Additionally, land-use data describes how land is used and can reflect topographic and vegetative features. Generally, higher elevations with more complicated topographical features correspond to less labor- or capital-intensive land-use categories such as wood and reserve land. In flat areas, construction or arable land is more prominent than less labor- or capital-intensive land-use types [32][33][34]. This suggests that land-use data is more suitable for calibrating DEMs than land-cover data. So far, however, land-use data has rarely been used for improving ASTER GDEM accuracy.
This study aims at calibrating ASTER GDEM2 via land-use data. We examined two linear regression calibration methods: the first is based on simple linear regression models, which consider only the impact of land-use on GDEM errors; the second is based on a multiple linear regression model and considers the impacts of both land-use and topography. Both methods were tested on the eastern coastal Chinese city of Lianyungang and the results compared. The detailed objectives of this study are to investigate the relationship between GDEM2 errors and a wide range of land-use types, and to propose a practicable land use-specific calibration method for improving the accuracy of GDEM2 in geographical areas for which high-resolution DEM datasets are lacking.

Study Area
Lianyungang is located on the Chinese coast in north-eastern Jiangsu Province (118.4°-119.8° E, 34.0°-35.1° N) and has a temperate monsoon climate [35]. The study area comprises the three districts of the urban area of Lianyungang: Xinpu, Haizhou and Lianyun (Figure 1A). The study area has a population of over one million and an area of ~633 km² and includes various land-use types (Figure 1B).
Since sea-level elevations at the immediate coast were labelled as 0 in GDEM2 [36], we excluded these areas from the study (that is, the coastal areas with 0 value in GDEM2), regardless of land-use types, to avoid unexpected errors in further analysis.

Advanced Spaceborne Thermal-Emission and Reflection Radiometer Digital Elevation Model (ASTER DEM) and Land-Use Datasets
The 30 m ASTER GDEM2 dataset for the study area was downloaded from the International Scientific and Technical Data mirror site, Computer Network Information Center, Chinese Academy of Sciences (http://www.gscloud.cn). It was provided in GeoTIFF format, with the Universal Transverse Mercator (UTM) Zone 50N Projection and the World Geodetic System (WGS) 1984 as the horizontal datum, and the Earth Gravitational Model 1996 (EGM96) as the vertical datum.
Although the ASTER GDEM2 was released in 2011, the Terra satellite was launched in 1999. The exact acquisition time of the tiles producing ASTER GDEM is unknown, limiting the selection of land-use data. However, a cadastral survey was conducted in 2005 to build a cadastral geodatabase integrating urban and rural areas and to map land-use at a 1:2000 scale for the study area. The 2005 land-use data obtained for the study area is midway between the satellite's 1999 launch and the data release in 2011, so that we can expect reasonable correspondence between the ASTER GDEM data and land-use data. The geodatabase includes 24 land-use types, which are not detailed here due to limited space. These were reclassified into 11 types according to China's Current Land-Use Classification (GB/T21010-2007) [37] (Figure 1B): mining (mostly salterns in the south-eastern town of Xuwei in Lianyun District; salterns are considered mining land in China), urban, rural, scenic, wood, bare, grass, wheat, rice, water (mostly inland) and tide, and garden lands (see Table A1 for description of these land-use types). The reclassified land-use types were then used to improve the GDEM2. The land-use data was re-projected from GCS_Xian_1980 to the Universal Transverse Mercator (UTM) zone 50N, in correspondence with the GDEM2.

Reference Data
Elevations measured using the Global Navigation Satellite System (GNSS) were used as ground-truth data. GNSS measurements are taken at the decimeter-centimeter level [38] and have an RMSE (root mean square error) of 30 mm, which is lower than that of GDEM2 [39]. The GNSS-based point measurements in the local cadastral survey geodatabase (over 30,000 points evenly distributed across the study area), referenced to the 1985 National Height Datum in China, were converted into orthometric height with the Earth Gravitational Model 1996 (EGM96) by subtracting the height of the geoid.
Two independent sets of sample points were randomly selected from the GNSS measurements of the study area: the first sample set (calibration data) was used for calibration by comparing it with the original GDEM2, while the second sample set (validation data) was used for assessing the performance of the method by comparing it with the calibrated GDEM2. A stratified sampling procedure was performed using an ArcGIS add-in known as the Sampling Design Tool, which supports various sampling methods [40]: we randomly selected 100 points (with a minimum distance of 100 m between any two points) for each reclassified land-use type except for grass land (74 points, because it was relatively scarce). As a result, each of the two sets consisted of 1074 points.
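The sampling constraint described above can be illustrated with a minimal Python sketch. It is not the Sampling Design Tool's algorithm, which is not described here; it is a simple greedy stand-in that shuffles the candidate points of each land-use type and keeps a point only if it lies at least the minimum distance from every point already kept. The function names and the example grid are hypothetical.

```python
import math
import random

def sample_min_distance(points, n, min_dist, rng):
    """Greedily pick up to n points that are pairwise at least min_dist apart."""
    candidates = list(points)
    rng.shuffle(candidates)
    chosen = []
    for p in candidates:
        if all(math.dist(p, q) >= min_dist for q in chosen):
            chosen.append(p)
            if len(chosen) == n:
                break
    return chosen

def stratified_sample(points_by_land_use, n_per_class, min_dist, seed=0):
    """Apply the same constrained draw independently to each land-use type."""
    rng = random.Random(seed)
    return {
        land_use: sample_min_distance(pts, n_per_class, min_dist, rng)
        for land_use, pts in points_by_land_use.items()
    }

# Hypothetical projected coordinates in metres (a 50 m grid), for illustration only.
grid = [(x * 50.0, y * 50.0) for x in range(10) for y in range(10)]
samples = stratified_sample({"wood": grid, "rice": grid}, n_per_class=5, min_dist=100.0)
```

A greedy draw of this kind can return fewer than the requested number of points when a class is scarce, which is consistent with grass land contributing only 74 points.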

Statistical Measures
We first compared the original GDEM2 and the calibration data by subtraction for each of the 11 land-use classes, in order to investigate how elevations in the two datasets varied with land-use type. The (maximum and minimum) elevation difference between the original GDEM2 and the calibration data (Dorig), the mean error (ME), absolute mean error (AME), standard deviation error (STD), and root mean square error (RMSE) were used to characterize the vertical deviation of the original GDEM2 from the calibration data.
The ME reveals the tendency (bias) of GDEM2 errors. The AME avoids the mutual offsetting of negative and positive values by taking absolute errors, and thus gives a better estimate of the average error magnitude. The STD is the deviation of Dorig from the ME, and the RMSE is the deviation of Dorig from ground-truth values. Lower STD and RMSE mean a more concentrated distribution of Dorig. These statistical measures have been widely used for assessing the vertical accuracy of DEMs [26,[41][42][43]. They were computed using the following equations:

$$D_{\mathrm{orig},i} = Z_{\mathrm{GDEM2},i} - Z_{\mathrm{elev},i}$$

$$ME = \frac{1}{n}\sum_{i=1}^{n} D_{\mathrm{orig},i}, \qquad AME = \frac{1}{n}\sum_{i=1}^{n} \left| D_{\mathrm{orig},i} \right|$$

$$STD = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left( D_{\mathrm{orig},i} - ME \right)^{2}}$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} D_{\mathrm{orig},i}^{2}}$$

where n is the number of sample points (e.g., 1074 for the study area as a whole), i the i-th sample point, ME the average value of the difference $D_{\mathrm{orig},i}$, STD the standard deviation error of $D_{\mathrm{orig},i}$, and RMSE the root mean square error of $D_{\mathrm{orig},i}$. In all equations, $D_{\mathrm{orig},i}$ is the difference between the original GDEM2 value $Z_{\mathrm{GDEM2},i}$ and the ground-truth elevation $Z_{\mathrm{elev},i}$ (represented here by GNSS elevations). Higher absolute values of Dorig indicate larger errors in the original GDEM2 elevations in relation to the GNSS elevations.
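As an illustration of these measures, the following Python sketch computes ME, AME, STD and RMSE from paired GDEM2 and reference elevations. The function name and the sample values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption rather than something stated in the text.

```python
import math

def error_stats(z_gdem, z_ref):
    """Return ME, AME, STD and RMSE of the differences z_gdem - z_ref."""
    d = [g - r for g, r in zip(z_gdem, z_ref)]
    n = len(d)
    me = sum(d) / n                          # mean error (bias)
    ame = sum(abs(x) for x in d) / n         # absolute mean error
    # Assumption: sample standard deviation with an n - 1 denominator.
    std = math.sqrt(sum((x - me) ** 2 for x in d) / (n - 1))
    rmse = math.sqrt(sum(x * x for x in d) / n)
    return {"ME": me, "AME": ame, "STD": std, "RMSE": rmse}

# Hypothetical elevations in metres, for illustration only.
stats = error_stats([12.0, 8.0, 30.0, 5.0], [10.0, 9.0, 26.0, 6.0])
```

In this toy example the positive and negative differences partly cancel in the ME but not in the AME, which is why the two measures are reported together.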
To explore the relative height accuracy of GDEM2, the false slope ratio (FSR) was used, which can be calculated by [20]:

$$FSR = \frac{B}{A + B} \times 100\%$$

where A and B are the total numbers of observed cases in which DEMs and GCPs report the same slope trend or different slope trends, respectively. Values of FSR range from 0 to 100. Lower values indicate higher relative height accuracy.
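A minimal Python sketch of the FSR calculation follows. How point pairs are formed is not specified in the text, so this sketch assumes that every pair of sample points is one observed case and that the "slope trend" is the sign of the elevation difference within the pair; both choices, and the function names, are illustrative assumptions.

```python
from itertools import combinations

def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def false_slope_ratio(z_dem, z_ref):
    """FSR in percent: share of point pairs whose DEM and reference trends differ."""
    same = different = 0
    for i, j in combinations(range(len(z_dem)), 2):
        if sign(z_dem[j] - z_dem[i]) == sign(z_ref[j] - z_ref[i]):
            same += 1          # counted in A
        else:
            different += 1     # counted in B
    return 100.0 * different / (same + different)

# Hypothetical elevations in metres, for illustration only.
fsr = false_slope_ratio([10.0, 12.0, 11.0, 15.0], [9.0, 13.0, 14.0, 14.0])
```

In flat terrain, small elevation errors are enough to flip the sign of a tiny true height difference, so FSR can be high even where absolute errors are low.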

Calibration Methods
Linear regression analysis was used to calibrate the ASTER GDEM. Linear regression analysis is a simple and effective technique to build quantitative relationships between variables in a wide range of fields including geographical studies [44][45][46]. It is noted that linear regression does not necessarily imply causation but indicates an association between variables. We performed simple linear regression to model the relationship between GDEM2 error and land-use types. For comparison, we conducted multiple linear regression to model the relationships between GDEM2 error, land-use types, and topographic factors. GDEM2 elevation value, slope, and aspect were included in the multiple linear regression model after they were extracted from the GDEM2 and GNSS datasets using ArcGIS.
In the multiple linear regression, land-use types and aspect were considered categorical variables, following the method proposed by Li et al. [47]. Aspect was divided into nine categories: eight direction categories, each having a range of 45°, plus a Flat category. In other words, the N category ranged from 337.5° to 22.5°, the NE category ranged from 22.5° to 67.5°, and so on. With the N category being the reference, the remaining eight categories were included as dummy variables (i.e., NE, E, SE, S, SW, W, NW, and Flat) in the process of modelling. This means that a sample point can be described with the eight dummy variables. For example, a sample point with an SE aspect can be described as SE = 1 with the other seven dummy variables being 0, and a sample point with an S aspect can be described as S = 1 with the other seven dummy variables being 0. If all eight dummy variables are 0, the sample point has an N aspect. Similarly, with mining land as the reference, the remaining 10 land-use types were also integrated as dummy variables.
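The dummy coding of aspect can be sketched in Python as follows. The function names are hypothetical; treating a negative aspect value as Flat follows the ArcGIS convention of assigning −1 to flat cells, and assigning a boundary value such as exactly 22.5° to the NE category is an assumption, since the text does not say which side the boundaries belong to.

```python
ASPECT_DUMMIES = ["NE", "E", "SE", "S", "SW", "W", "NW", "Flat"]  # N is the reference
SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def aspect_category(aspect_deg):
    """Map an aspect in degrees to one of nine categories (negative means Flat)."""
    if aspect_deg < 0:
        return "Flat"
    # Shift by half a sector so that N covers 337.5 to 22.5 degrees.
    return SECTORS[int(((aspect_deg % 360.0) + 22.5) // 45.0) % 8]

def aspect_dummies(aspect_deg):
    """Return the eight 0/1 dummy variables; all zeros means the N reference."""
    category = aspect_category(aspect_deg)
    return {name: int(name == category) for name in ASPECT_DUMMIES}

se_point = aspect_dummies(135.0)   # SE = 1, the other seven dummies are 0
n_point = aspect_dummies(350.0)    # all eight dummies are 0 (N reference)
```

The same pattern applies to land use: with mining land as the reference, each of the other 10 types gets one 0/1 column.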
As mentioned in the introduction, DEM elevation errors are controlled by land-use and topographic factors (here, slope and aspect). Thus, we can construct a model as $D_{\mathrm{orig}} = f(\text{Land use}, \text{Slope}, \text{Aspect})$, which, because $D_{\mathrm{orig}}$ is the difference between $Z_{\mathrm{GDEM2}}$ and $Z_{\mathrm{elev}}$, can be transformed into

$$Z_{\mathrm{elev}} = f(Z_{\mathrm{GDEM2}}, \text{Land use}, \text{Slope}, \text{Aspect})$$

Multiple linear regression models the relationship between two or more independent variables and a dependent variable by fitting a linear equation to the observed data, and is expressed as:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

where y is the dependent variable, $x_1, \ldots, x_k$ are the independent variables, $\beta_0, \ldots, \beta_k$ are the regression coefficients, and $\varepsilon$ is the error term. The model was constructed through stepwise regression, with the GDEM2 elevation, GDEM2 slope, 10 dummy variables for 11 land-use types, and 8 dummy variables for 9 aspect categories being used as candidate independent variables and GNSS elevation as the dependent variable. Stepwise regression remains prevalent in research due to its effectiveness in identifying the optimal variables among many for constructing a good predictive model [47,48].
Prior to modelling, correlation analysis was performed to examine the relationships between these variables, which would provide a basis for the modelling. As stepwise regression can exclude highly correlated independent variables from the resultant multiple linear regression model, the correlation analysis focused on the relationship between the dependent variable and the independent variables [47,48].
Both the simple and multiple linear regression analyses were used to calibrate the ASTER GDEM dataset for the study area. The two calibrated ASTER GDEM datasets were then compared to assess the performance of the calibration methods using the above-mentioned statistical measures.

Characterizing GDEM2 Errors
The elevation difference (Dorig) between the original GDEM2 and the calibration data (using the first set of 1074 sample points randomly selected from the GNSS elevation measurements) and its statistical measures were calculated to explore the relationship between errors in ASTER GDEM and individual land-use types. Marked elevation discrepancies were observed for wood, bare, scenic and garden lands (Figure 2). Table 1 and Figure 3 present the distribution of elevation difference Dorig for each land-use type. The range of Dorig was small for mining, rice, water and tide, urban, and wheat lands but wide for bare, garden, scenic and wood lands (Table 1). Boxplots display the dispersion of Dorig for each land-use type with statistical measurements including the maximum, minimum, median, upper, and lower quartile values (Figure 3). As seen in Figure 3, the ranges from lower to upper quartile values for bare, scenic, wood and garden lands are larger than for other land-use types. Additionally, the maximum and minimum values for these four land-use types are more extreme than for the others.
The statistical measures of elevation difference Dorig for each land-use type are given in Table 2. The ME values were negative for rural and garden lands and positive for other land-use types, revealing a land use-specific bias of the GDEM2 data for our study area. Bare, wood, garden and scenic lands were characterized by the highest AME, STD and RMSE values, which indicates that these had the lowest vertical accuracy among all land-use types. The FSR depicting relative height accuracy shows different features of the study area (Table 2). The FSR values were no more than 10.00% for bare, wood, garden and scenic lands, in contrast to those for water and tide, urban, rural, rice, wheat and grass lands.

Simple Linear Regression Analysis Using Only Land-Use Data
Simple linear regression analysis was used to investigate the relationship between the original GDEM2 and the calibration data for different land-use types, as shown in Figure 4. The correlation coefficient was as high as 0.950 for the study area as a whole, but varied with land-use type, being higher for mining, scenic, wood and garden lands (r > 0.90) but lower for grass, wheat, rice and water and tide lands (r < 0.5). The lowest correlation coefficient was observed for rice land (r = 0.109). Simple linear regression models were constructed for the entire study area and for each land-use type. The 11 land use-specific regression models were used to calibrate the GDEM2 and generate an improved DEM, termed GDEM2 sr .
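The land use-specific calibration can be sketched in Python as one least-squares line per land-use type, fitted on the calibration points and then applied to GDEM2 elevations of that type. This is a minimal sketch under stated assumptions: the reference (GNSS) elevation is taken as the response and the GDEM2 elevation as the predictor, and the function names and sample values are hypothetical, not the fitted models of Figure 4.

```python
from collections import defaultdict

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_by_land_use(samples):
    """samples: iterable of (land_use, z_gdem, z_gnss); returns one model per type."""
    groups = defaultdict(lambda: ([], []))
    for land_use, z_gdem, z_gnss in samples:
        groups[land_use][0].append(z_gdem)
        groups[land_use][1].append(z_gnss)
    return {land_use: fit_line(x, y) for land_use, (x, y) in groups.items()}

def calibrate(models, land_use, z_gdem):
    """Apply the model of the given land-use type to one GDEM2 elevation."""
    slope, intercept = models[land_use]
    return slope * z_gdem + intercept

# Hypothetical calibration points (land use, GDEM2 m, GNSS m), for illustration only.
calibration = [
    ("wood", 100.0, 95.0), ("wood", 200.0, 190.0), ("wood", 300.0, 285.0),
    ("rice", 4.0, 5.0), ("rice", 6.0, 6.0), ("rice", 8.0, 7.0),
]
models = fit_by_land_use(calibration)
calibrated_rice = calibrate(models, "rice", 10.0)
```

To calibrate a whole raster, the same lookup would be applied cell by cell using the land-use class of each cell.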

Multiple Linear Regression Analysis Using Land-Use and Topographical Data
Prior to modelling, we performed a correlation analysis for the ground-truth elevation value Z_elev and the independent variables, including the GDEM2 elevation value Z_GDEM2, GDEM2 slope, and the dummy variables for land-use and aspect. Table 3 shows a significant correlation between Z_elev and the independent factors. Using the analysis results shown above, a multiple linear regression model of Z_elev was constructed through stepwise regression with the variables listed in Table 3. Z_GDEM2, the south-west aspect, slope, west aspect, north-east aspect, woodland and garden land entered the model as their regression coefficients were significant based on their t-statistics with associated p-values (Table 4). We calibrated the GDEM2 using the resultant multiple linear regression model to produce a second improved DEM, GDEM2 mr .
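For readers who want to see how a model with continuous and dummy predictors is fitted, the following Python sketch solves the ordinary least-squares normal equations directly. It is not stepwise regression: the predictor set is fixed in advance, whereas the study selected variables stepwise. The predictor columns (GDEM2 elevation, slope, a wood dummy), the coefficients used to generate the toy data, and the function names are all hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk; rows hold the predictors without intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coef, row):
    """Calibrated elevation for one point from its predictor values."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], row))

# Hypothetical predictors: (GDEM2 elevation m, slope deg, wood dummy).
rows = [(10, 1, 0), (20, 3, 0), (50, 8, 1), (80, 15, 1), (35, 4, 0), (120, 20, 1)]
true_coef = [1.0, 0.9, -0.5, 3.0]          # made-up values used to build toy data
y = [predict(true_coef, r) for r in rows]  # noise-free "GNSS" elevations
coef = fit_ols(rows, y)
```

Because the toy data are noise-free, the fit recovers the generating coefficients; with real GNSS data the residuals would be non-zero and a statistics package would normally be used to obtain t-statistics and p-values.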

Assessment of Improved GDEM2s
After calibrations, the elevation difference (Dcal) between the calibrated GDEM2s (both GDEM2 sl and GDEM2 ml ) and the validation data (that is, the second sample set), and their statistical measures for the two results, were computed and used to assess the two methods (Table 5 and Figure 5).

GDEM2 Errors for Different Land-Use Types
As ASTER GDEM accuracy is influenced by topographical features and land cover, both of which are reflected by land-use [32][33][34], we first examined the elevation difference between the original GDEM2 and the calibration data (that is, the first sample set). The statistical measures of Dorig (Table 2) for our study area demonstrate that such influence varies with land-use type: negative ME values for rural and garden lands but positive ones for other land-use types. In addition, bare land has the greatest and most dispersed GDEM2 errors among all the land-use types. This is likely to be caused by slope and aspect, which influence GDEM2 error [49], because the bare land in Lianyungang is characterized by high elevation, complex terrain and no vegetation coverage.
Because positive and negative GDEM2 errors fluctuate around 0 and offset each other, ME values stay close to 0 and do not reflect the magnitude of GDEM2 errors well. This shortcoming can be rectified by using AME. Comparative analysis of the AME values for different land-use types shows the greatest GDEM2 errors in bare, wood, garden and scenic lands at high elevations, and the smallest errors in mining, water and tide, wheat and rice lands. Similar observations were made for the STD and RMSE values.
To explore the source of GDEM2 errors for different land-use types, elevation and slope for each land-use type were classified (Table 6). It is evident that most mining, water and tide, wheat and rice lands are flat with low elevations, while bare, wood, garden and scenic lands are undulating or even steep, with higher elevations in comparison with other land-use types (Table 6). This agrees with the reported relationships between topographical features and land-use [32][33][34]. According to the statistics shown in Tables 2 and 6, greater absolute errors are found in land-use types for which more sample points have higher slopes and elevations. This finding demonstrates the strong influence of high elevation and complex terrain on ASTER GDEM accuracy. The GDEM2 elevations were more accurate for rice, wheat, grass and mining lands, mainly in flat areas, and less accurate for scenic, garden, wood and bare lands with complex topographical features, consistent with the finding of the ASTER GDEM validation team [49].
The FSR is a good indicator of a DEM's relative height accuracy, and its combination with other statistical measures gives a comprehensive assessment of both the absolute and relative height accuracy of a DEM [20]. Table 2 shows that the GDEM2 in flat areas for rice, wheat and grass lands has smaller absolute errors but also lower relative height accuracy (higher FSR values). Undulating or steep areas for bare, garden and wood lands have higher relative height accuracy with lower FSR values, indicating that topography impacts both the absolute and relative height accuracy of GDEM2.
Our findings illustrate the contrasting influence of land-use on the accuracy of ASTER GDEM (Table 2), and justify the calibration of ASTER GDEM using simple linear regression models that consider only the impact of land-use on GDEM2 errors.

Assessment of Improved GDEM2 Data
The high correlation (r = 0.950) between the original GDEM2 and the calibration data (the first sample set) for the entire study area (Figure 4) justifies the use of the simple linear regression models to improve the accuracy of GDEM2. Most individual land-use types have correlation coefficients r > 0.5, the exceptions being grass, wheat, rice, and water and tide land. Rice land is generally quite flat (with suitable slopes of less than 3° [50]) and can be assumed to have constant elevations in a given geographical area (for example, in Lianyungang). Despite the low correlation (r = 0.109), other measures in Table 2 show low absolute errors for rice land. It is therefore reasonable to apply simple linear regression to calibrate ASTER GDEM elevations for rice land using GNSS measurements. It is also worth mentioning that the effect of applying the linear regression model for rice land approximates masking, that is, replacing the elevations with a constant value [49] (for rice land, that value is 5.90), as the slope of the regression model is almost 0 (the scatterplot for rice land in Figure 4).
The calibrated ASTER GDEM is remarkably improved (see Table 5 and Figure 6A). All the observed statistical measures in ASTER GDEM elevations decreased, particularly ME (which changed from 2.92 m to −0.98 m), suggesting reduced systematic error for the entire study area. The error range was also considerably reduced, from (−131.42, 225.33) to (−141.74, 181.05). Both STD and RMSE decreased as well, which indicates that our simple linear regression calibration method can effectively improve ASTER GDEM accuracy.
The multiple linear regression analysis unravels the relationships between GDEM2 error, land-use types, and topographical factors. Table 3 demonstrates relatively high correlation coefficients between ground-truth elevation values (Z_elev) and the independent variables, including GDEM2 elevation (Z_GDEM2), slope of GDEM2, and some land-use types (bare, mining, and wood lands). The variables entering the multiple linear regression model made contrasting significant contributions to the improvement of GDEM2 accuracy according to our stepwise regression analysis (Table 4). Changes in the statistical measures as shown in Tables 2 and 5 show that the multiple linear regression calibration method indeed improved GDEM2 accuracy.
Both calibration methods can improve GDEM2 accuracy.However, simple linear regression calibration, which only considered the impact of land-use, returned lower ME, AME and FSR but higher STD and RMSE values than the multiple linear regression calibration, which considered the impacts of both topography and land-use.

Effects of Sampling
To examine the effect of our sampling on the results, we exchanged the datasets for calibration and validation; that is, the second sample set was used for calibration and the first sample set for validation. The statistical measures of the differences between the original GDEM2 and GNSS data based on the second sample set are shown in Table 7, and the differences between the calibrated GDEM2 and GNSS data based on the first sample set are shown in Table 8. The ME, AME, and FSR measures for the simple linear regression method show considerable improvement from Table 7 to Table 8 (Figure 6B), despite slight increases in STD (from 27.38 to 28.18) and RMSE (from 27.45 to 28.16). This agrees well with what we observed in Tables 2 and 5, although there the STD and RMSE decreased. The simple linear regression method therefore proves capable of improving GDEM2 accuracy. In the assessment of the multiple linear regression method (Tables 7 and 8), all the statistical measures show improvement except for the AME in the inverse validation, which increased only from 15.22 to 16.01 (5.2%). This suggests that multiple linear regression improves GDEM2 accuracy more than simple linear regression, based on the assessment of improved GDEM2 data and the assessment of the sampling effect.
Although various methods have been proposed to improve ASTER GDEM accuracy, many of them (for example, regionalization based on multi-source data, the denoising algorithm, and the iterative fusion method [51][52][53][54]) were based on freely obtained DEM data (for example, the Shuttle Radar Topography Mission (SRTM) DEM), which contains intrinsic errors. The intrinsic errors may be propagated to the calibrated DEM data if the method is not well designed. With more data being included in the fusion process, there are more calculations and higher complexity in improving DEM accuracy. In addition, such a complex process makes it difficult for non-geodesy experts to understand its principles, and thus limits its application in environmental research. Both linear regression calibration methods presented in this study are, however, based on statistical analysis. The simple linear regression method is easier to apply than the multiple linear regression method, as it requires less data for modelling. In addition, a land-use classification scheme consisting of the reclassified 11 land-use types presented in this study proves effective at characterizing both vegetation coverage and buildings and can be produced from a cadastral database. Despite the systematic analysis of the ASTER GDEM error sources to improve the accuracy, the simple linear regression calibration method requires few steps for implementation [55]. It therefore provides a good reference for other areas with land-use datasets to obtain DEM data with relatively high accuracy.

Conclusions
To improve the accuracy of ASTER GDEM2 used for land use-related geographical and environmental research on the scale of cities or counties, this study used GNSS measurements to calibrate the ASTER GDEM2 through simple and multiple linear regression analysis. Both calibration methods can improve ASTER GDEM2. The simple linear regression method only considers the impact of land-use on elevation error, while the multiple linear regression method takes both land-use and topographical factors into account. Unlike the multiple linear regression calibration method, which requires additional data input in modelling, the simple linear regression calibration method is simple and easy to apply. It creates an opportunity to calibrate ASTER GDEM datasets for areas with detailed land-use data and good field-based elevation measurements. In cases without such data, detailed land-use information can be extracted from high-resolution satellite remote-sensing imagery.
Although more sample points used for calibration can help further improve ASTER GDEM accuracy, this also means a higher cost of acquiring field data.As such, to balance accuracy and cost, users need to determine a suitable number of sample points.It is recommended that more than 57 sample points should be used in the calibration to obtain a statistically acceptable result [56], but we recommend users take into account their available resources and the geographical size of their study area.

Figure 1. Study area consisting of Xinpu, Haizhou and Lianyun districts of Lianyungang: (A) GDEM2 at 30 m resolution; (B) land-use map for 2005, provided by Land and Resources Bureau of Lianyungang. Dark purple indicates the coastal areas with 0 value in GDEM2.

Figure 2. Land-use specific elevation differences (Dorig) between the original ASTER GDEM and the calibration data (the first sample set).

Figure 3. Boxplots illustrating the distribution of Dorig (the difference between the original GDEM2 and the first sample set).

Figure 4. Scatterplots of the original ASTER GDEM2 against the calibration data (that is, the first sample set). Land-use specific simple linear regression models (where the asterisk symbol indicates multiplication) and correlation coefficients are shown in each scatterplot.

Figure 6. The statistical measures of the elevation errors for the original GDEM2 and calibrated GDEMs (GDEM2 sl and GDEM2 ml ) for the calibration (A) and the effect of sampling (B).

Table 1. Frequency distribution of Dorig (the difference between the original GDEM2 and the first sample set).

Table 2. Statistical measures of Dorig (the difference between the original GDEM2 and the first sample set).

Table 3. The correlation coefficient values (represented by r) and the associated p-values (represented by p) between the elevation value Z_elev and the independent variables.

Table 4. The multiple linear regression model of Z_elev.

Table 5. Statistical measures of the elevation differences between the calibrated GDEM2 and the validation data. S indicates simple linear regression and M multiple linear regression.

Table 6. Original GDEM2 elevation and slope classification for each land-use type in the study area.

Table 7. Statistical measures of the difference between original GDEM2 and GNSS measurements by using the second sample set for calibration.

Table 8. Statistical measures of the difference between the calibrated GDEM2 and GNSS measurements by using the first sample set for validation (S signifies simple linear regression and M multiple linear regression).