Evaluating the Impact of Multi-Source Digital Elevation Model Quality on Archeological Predictive Modeling: An Integrated Framework Based on Machine Learning and SHAP-Based Interpretability Analysis

Yang, Jia; Zhao, Jianghong; Hao, Pengcheng; Zhang, Aomeng; Li, Xiaopeng; Tu, Ran; Zhang, Zhi

doi:10.3390/rs18060961


by Jia Yang ^1,2,3, Jianghong Zhao ^1,*, Pengcheng Hao ^1, Aomeng Zhang ^3, Xiaopeng Li ^4, Ran Tu ^2,5 and Zhi Zhang ^2,5

^1 School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 102616, China

^2 International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China

^3 School of Surveying and Urban Spatial Information, Henan University of Urban Construction, Pingdingshan 467036, China

^4 School of Atmospheric Sciences, Nanjing University of Information Science & Technology, Nanjing 210044, China

^5 Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China

^* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(6), 961; https://doi.org/10.3390/rs18060961

Submission received: 23 January 2026 / Revised: 14 March 2026 / Accepted: 18 March 2026 / Published: 23 March 2026

(This article belongs to the Special Issue GIS and RS for Spatial Documentation, Analysis and Interpretation in Multi-Scale Archaeological Applications)


Highlights

What are the main findings?

High-quality DEMs yield more geologically plausible feature attributions in SHAP analysis, reducing data-induced interpretive bias.
Vertical accuracy, rather than nominal pixel size, is the primary factor controlling the reliability of archeological predictive models. Copernicus DEM, with an RMSE of 2.19 m, and TanDEM-X show the best performance, whereas ASTER and ALOS 12.5 exhibit substantial vertical errors.

What are the implications of the main findings?

DEM selection should prioritize effective terrain realism rather than nominal spatial resolution in remote-sensing-driven archeological applications.
Integrating explainable AI frameworks helps mitigate data-driven interpretive bias and enhances the archeological validity of predictive modeling results, while providing a reproducible framework for rational DEM selection.

Abstract

Digital Elevation Models (DEMs) constitute a core data source for Archeological Predictive Modeling. However, how quality differences among multi-source DEMs propagate through complex models and subsequently affect predictive accuracy and geographic interpretation remains insufficiently understood. This study develops an integrated evaluation framework that combines machine learning with SHAP-based interpretability analysis to systematically compare the suitability of mainstream open access DEM products for archeological site prediction. The results indicate that (1) in terms of vertical accuracy, Copernicus DEM and TanDEM-X achieved the best performance, with RMSE values of 2.19 m and 2.31 m, respectively, whereas ASTER exhibited the lowest accuracy (RMSE = 6.44 m) and exaggerated terrain; (2) regarding model performance, Copernicus DEM-driven models demonstrated the highest robustness, achieving an AUC of 0.966 under the XGBoost algorithm; and (3) interpretability analysis revealed that different DEM products significantly reallocate the importance of key variables such as slope and the Topographic Wetness Index, potentially distorting scientific interpretations of ancient military defensive site-selection patterns. Copernicus DEM is therefore recommended as a priority data source. Moreover, while pursuing higher spatial resolution, equal attention must be paid to vertical accuracy and consistency with geomorphological logic.

Keywords:

digital elevation model; difference analysis; archeological predictive modeling; machine learning; SHAP; terrain factors

1. Introduction

Digital Elevation Models are fundamental spatial datasets for describing the morphology of the Earth’s surface and play a critical role in a wide range of disciplines, including geomorphological analysis, hydrological modeling, hazard assessment, ecological studies, and the spatial simulation of human activities [1,2]. With the rapid advancement of remote sensing technologies, multi-source, multi-scale, and high-precision DEM products have become increasingly accessible, substantially reducing the cost of terrain data acquisition and promoting their widespread application in interdisciplinary research [3]. In archeological studies in particular, DEMs serve as a crucial medium for characterizing the interactions between natural environments and human activities and have consequently become a primary reference dataset [4]. The theoretical foundation of Archeological Predictive Modeling (APM) lies in the dialectical integration of environmental determinism and cultural adaptation theory: ancient human societies, when selecting locations for settlements, burial grounds, or production sites, were profoundly constrained by geographic environmental factors such as topography, landforms, water resources, and microclimatic conditions. Among these environmental variables, DEMs, as a digital abstraction of surface morphology, not only provide elevation information directly but also constitute the primary source for deriving key terrain variables such as slope, aspect, curvature, terrain roughness, and topographic texture. These derivatives have fundamentally driven the transition of archeology from qualitative description toward quantitative and spatially explicit analysis [5,6,7,8].

Although numerous studies have focused on DEM accuracy assessment, such research has largely been concentrated in the fields of physical geography, including hydrological analysis, landslide detection [9], flood simulation [10,11], and geomorphological change monitoring [12]. In these applications, DEM quality is typically evaluated through comparison with high-precision reference datasets using statistical metrics such as Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the Kappa coefficient. However, these studies tend to emphasize the intrinsic geometric accuracy of DEMs while rarely extending the analysis to examine how DEM errors propagate through derived terrain variables and subsequently influence downstream application models [13,14,15,16]. This limitation is particularly evident in archeological predictive modeling, where statistical learning or machine learning methods are increasingly employed, yet the mechanisms by which DEM errors affect model behavior and prediction outcomes remain insufficiently explored.

Current archeological practice in DEM selection exhibits a pronounced empirical bias. ASTER GDEM [17,18,19,20,21,22,23,24] and SRTM DEM [25,26,27,28,29,30], as the earliest globally available medium-resolution products, continue to be widely adopted due to their ease of access and well-established application history. Typical spatial resolutions cluster around 30 m, with a smaller number of studies employing 90 m data for large-scale regional analysis. Based on these DEMs [31], researchers commonly derive morphometric variables such as slope, aspect, elevation, curvature, and terrain ruggedness, which are then used as core explanatory variables in machine learning or statistical models to describe correlations between site distributions and topographic conditions. Notably, with the gradual introduction of ALOS PALSAR DEM products (12.5 m) into archeological research over the past decade [32,33], an implicit consensus has emerged that higher spatial resolution inevitably leads to improved predictive performance. Consequently, many studies have directly treated ALOS PALSAR DEM products as a “natural upgrade” to 30 m ASTER or SRTM datasets [34,35,36], extracting more detailed slope, aspect, and local terrain indices for site probability modeling or spatial analysis, often without rigorous accuracy assessment or error analysis. From the perspective of remote sensing and terrain measurement, however, increased spatial resolution does not necessarily imply concurrent improvements in vertical accuracy, terrain realism, or archeological relevance.
Fundamental differences among DEM products in acquisition methods, surface penetration capability, sensitivity to vegetation and built structures, and filtering or interpolation algorithms can be substantially amplified in complex terrain or archeologically sensitive environments, thereby altering the statistical distributions of environmental variables and influencing how models learn relationships between sites and their surrounding environments.

As machine learning approaches become increasingly prevalent in archeological predictive modeling, model complexity and nonlinearity have intensified, rendering the influence mechanisms of DEM quality more obscure. In traditional statistical models, the pathways through which terrain variables affect outcomes are relatively transparent; by contrast, in models such as random forests, gradient boosting, or neural networks, nonlinear feature interactions may either amplify or mask the effects of DEM errors. Consequently, evaluating DEM suitability solely on the basis of final model performance metrics often fails to fully reveal its impact on internal decision-making processes. In particular, as machine learning models progressively replace traditional logistic regression as the dominant APM approach, high predictive accuracy may conceal structural biases induced by differences in input data quality, undermining the interpretability and transferability of results. In recent years, Explainable Artificial Intelligence techniques—especially SHAP (SHapley Additive exPlanations) [37], grounded in cooperative game theory—have demonstrated strong potential for enhancing model transparency in remote sensing research [38]. Temenos et al. [39] proposed a SHAP-integrated deep learning framework that successfully elucidated the attribution logic of complex spectral features in land-cover classification; Lu et al. [40] employed SHAP to quantify the contributions of terrain and environmental variables to desert-type mapping on the Tibetan Plateau, revealing model sensitivity to environmental gradients; and in digital soil mapping, Jeremy et al. [41] used local attribution analysis to identify sources of predictive uncertainty, thereby improving model credibility. However, within the field of spatial archeological predictive modeling, most studies focus on improving predictive accuracy. 
They often overlook how intrinsic errors in input datasets, especially those from multi-source DEMs, propagate through complex nonlinear models and distort interpretations of cultural heritage site-selection patterns. SHAP analysis helps evaluate differences in feature contributions across DEM products. It also helps verify whether the model’s decision logic aligns with archeological and geographical reasoning [36].

In response to these challenges, this study proposes an integrated evaluation framework for archeological predictive modeling that explicitly addresses quality differences among multi-source DEMs. By systematically comparing the effects of different DEM products on predictive performance and variable attribution structures within a unified study area and modeling workflow, this framework combines machine learning methods with SHAP-based interpretability analysis. Rather than focusing solely on prediction accuracy, the proposed approach reveals how variations in DEM quality reshape environmental factor contributions and model decision logic.

The primary contributions of this study are summarized as follows:

Establishment of a high-precision DEM-based evaluation system: Treeless hilly terrain is identified as an ideal experimental setting for DEM assessment. A 1 m resolution DEM generated from an airborne three-line camera is adopted as the reference benchmark, enabling precise calibration of mainstream open access DEM products. A Difference of DEM (DoD) analytical framework is introduced to support full-coverage DEM evaluation.
Construction of a three-level “data–factor–model” assessment framework: Systematic analysis is conducted across three hierarchical levels—raw DEM quality, consistency of derived terrain variables, and machine learning model performance—to comprehensively evaluate DEM impacts.
Integration of statistical diagnostics and model interpretability in archeological prediction: The influence of open access DEM products on archeological predictive model performance is assessed using SHAP-based interpretability analysis, complemented by Partial Dependence Plots to reveal nonlinear response relationships of key variables.

Overall, this study aims to provide an integrated pathway for archeological predictive research that begins with data quality assessment while simultaneously accounting for model performance and interpretability reliability. It further establishes a more comprehensive and reproducible evaluation paradigm for the rational selection and application of DEM data in archeological studies.

2. Study Area and Data

2.1. Study Area

The study area is located in Jiuquan City, Gansu Province, China, at the westernmost end of the Hexi Corridor, encompassing Dunhuang City, Guazhou County, and Yumen City (Figure 1). Owing to its inland continental location, the region is characterized by an arid climate with extremely low annual precipitation, resulting in extensive Gobi and desert landscapes. Oases and stable water sources historically provided the essential environmental conditions for the construction of the Han Dynasty Great Wall, while the dry climate has contributed to the long-term preservation of its rammed-earth structures. Despite more than a millennium of natural and anthropogenic processes, a large number of Han Great Wall beacon towers remain preserved across the region, making it an ideal area for archeological predictive modeling and DEM quality assessment.

2.2. DEM Datasets

The DEM datasets employed in this study encompass the most commonly used global products derived from synthetic aperture radar and optical stereo imaging. Detailed information regarding spatial resolution, height type, vertical datum and acquisition technology for each product is provided in Table 1.

TanDEM-X is a global DEM produced under the leadership of the German Aerospace Center, based on a twin-satellite bistatic SAR observation system. In this study, the 30 m resolution version is used. The original data were acquired between 2010 and 2015, and the bistatic X-band SAR configuration effectively mitigates temporal decorrelation effects. The Copernicus DEM (COP-DEM) is also derived from the TanDEM-X mission but has undergone extensive post-processing, including void filling and waterbody smoothing. The Shuttle Radar Topography Mission (SRTM) remains a foundational DEM product in archeological research and includes the SRTM1 (1 arc-second, approximately 30 m) and SRTM3 (3 arc-second, approximately 90 m) versions. SRTM data were acquired in February 2000 using C-band radar aboard the Space Shuttle Endeavour. NASADEM represents a reprocessed version of the original SRTM data produced by NASA, in which ASTER GDEM and ICESat laser altimetry data were integrated to correct voids and suppress noise. This reprocessing significantly improves vertical accuracy in complex terrain while maintaining consistency with SRTM in terms of coordinate and vertical reference systems. Optical stereo DEMs are generated from overlapping image pairs acquired from different viewing angles. ASTER GDEM v3, jointly developed by NASA and Japan’s Ministry of Economy, Trade and Industry, is a 30 m resolution product derived from near-infrared stereo imagery acquired by the Terra satellite. Its long data accumulation period introduces systematic biases in vegetated areas. The ALOS World 3D product (AW3D30), provided by the Japan Aerospace Exploration Agency, is generated by downsampling global 5 m stereo imagery acquired by the ALOS PRISM sensor between 2006 and 2011 to a 30 m resolution.

In addition, the ALOS 12.5 m DEM is included in this study. It should be noted that this dataset typically refers to a resampled product based on radiometrically terrain-corrected (RTC) L-band SAR backscatter from ALOS PALSAR. Although its nominal spatial resolution reaches 12.5 m, the effective representation of terrain detail largely depends on the underlying auxiliary DEM. The inclusion of this dataset aims to evaluate whether high oversampling rates provide tangible benefits for spatial feature extraction in archeological predictive modeling. More broadly, including DEMs with resolutions ranging from 12.5 m to 90 m enables a comprehensive assessment of how spatial granularity influences terrain derivatives. In general, coarser-resolution datasets such as the 90 m SRTM3 tend to smooth localized elevation variations, which may lead to the underestimation of slope gradients. This diversity in spatial resolution is therefore valuable for evaluating whether datasets with high oversampling rates, or those improved through additional post-processing refinements such as NASADEM and Copernicus DEM, offer measurable advantages in extracting subtle archeological spatial features.

2.3. High-Precision Reference Data

The reference dataset used in this study is a 1 m resolution Digital Surface Model generated from an airborne Three-Line Camera (TLC). The dataset was produced using synchronized pushbroom imaging from forward, nadir, and backward-looking linear arrays mounted on an aerial platform, enabling the acquisition of multi-angle surface imagery. Data acquisition was conducted in September 2021, covering Yumen City, Guazhou County, and Dunhuang City, with an east–west extent of approximately 500 km. The flight altitude was approximately 2000 m, and the total flight line length reached nearly 4920 km.

Through high-precision multi-view image matching and dense three-dimensional point cloud reconstruction, the dataset was processed into a regular gridded elevation product. This DEM accurately represents the elevation of surface features such as vegetation and buildings and effectively captures true surface undulations that integrate both natural terrain and anthropogenic structures, thereby providing a highly reliable benchmark for DEM accuracy assessment.

2.4. Archeological Sample Data

Due to their long construction history and the predominance of rammed-earth materials, many beacon towers have gradually disappeared into the surrounding Gobi environment. Therefore, accurately predicting the locations of vanished beacon towers using extant sites carries significant scientific value. Based on the Survey and Research on the Han Frontier in the Hexi Corridor [42], combined with visual interpretation of remote sensing imagery and field investigations, a total of 209 beacon towers were identified and mapped within the study area (Figure 1). The results indicate that the majority of beacon towers are concentrated in Dunhuang and Guazhou, with a smaller number distributed in Yumen.

Confirmed beacon tower locations obtained from archeological field surveys were used to construct the positive sample dataset. Archeological site data typically represent a “presence-only” dataset, lacking definitive absence information. To address this limitation, non-site samples were randomly generated within the study area. To prevent prediction bias caused by class imbalance, the ratio of positive to negative samples was strictly controlled at 1:1. This ensured that the models could learn environmental differences between suitable and unsuitable locations in a balanced manner.
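The 1:1 pseudo-absence sampling described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the rectangular extent, the synthetic site coordinates, and the function name `sample_pseudo_absences` are all assumptions, and a real workflow would sample inside the actual study-area polygon.

```python
import numpy as np

def sample_pseudo_absences(sites_xy, bounds, rng):
    """Draw as many uniformly random non-site points as there are sites (1:1 ratio).

    bounds = (xmin, ymin, xmax, ymax) is a hypothetical rectangular extent.
    """
    xmin, ymin, xmax, ymax = bounds
    n = len(sites_xy)
    x = rng.uniform(xmin, xmax, size=n)
    y = rng.uniform(ymin, ymax, size=n)
    return np.column_stack([x, y])

rng = np.random.default_rng(42)
bounds = (0.0, 0.0, 500_000.0, 100_000.0)  # assumed extent in metres

# Synthetic stand-in for the 209 mapped beacon towers.
sites_xy = rng.uniform([0.0, 0.0], [500_000.0, 100_000.0], size=(209, 2))
absences_xy = sample_pseudo_absences(sites_xy, bounds, rng)

# Stack into one labelled sample set: 1 = site, 0 = pseudo-absence.
samples_xy = np.vstack([sites_xy, absences_xy])
labels = np.concatenate([np.ones(len(sites_xy), dtype=int),
                         np.zeros(len(absences_xy), dtype=int)])
```

Uniform random points may occasionally fall on or near a known site; whether the study applied an exclusion buffer is not stated, so none is assumed here.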

3. Methods

3.1. Overall Workflow

This study constructs an integrated analytical framework that combines machine learning with SHAP-based interpretability analysis to quantify the effects of multi-source DEM quality differences on archeological predictive modeling. The workflow consists of four major stages (Figure 2). First, multiple open access DEM products, including ALOS, SRTM, and TanDEM-X, are selected, and a high-precision DEM generated from airborne TLC imagery is used as the reference benchmark. Data standardization is performed through the unification of coordinate systems, vertical datums, and spatial resolution. Second, qualitative visual comparison and quantitative Difference of DEM (DoD) analysis are employed to systematically evaluate the spatial consistency of terrain variables. Third, terrain-derived features are used as inputs to six machine learning models, and model robustness under different DEM accuracies is evaluated using metrics such as AUC, F1-score, and overall accuracy. Finally, the SHAP framework is introduced to interpret feature contributions, elucidating the pathways through which DEM errors propagate via terrain variables and influence model decision-making. This approach overcomes the “black-box” limitation of complex models and provides both accuracy assessment and scientifically grounded decision support for archeological site prediction.

3.2. Data Preprocessing

To eliminate systematic biases arising from differences in data acquisition and processing pipelines, multi-source DEM datasets were standardized by unifying spatial and vertical reference systems. This procedure addresses heterogeneity in sensor type, spatial resolution, and geodetic datum, thereby establishing a reliable foundation for DEM quality assessment and predictive modeling. The standardization process consists of three core steps: vertical datum unification, projection and coordinate transformation, and resampling.

3.2.1. Vertical Datum Correction

Global-scale DEM products are referenced to different vertical datums, including ellipsoidal heights relative to the WGS84 reference ellipsoid and orthometric heights referenced to geoid models such as EGM96. Inconsistencies in vertical datums can introduce systematic elevation offsets and reduce the comparability among datasets. To establish a unified vertical reference, the EGM2008 global geoid model released by the National Geospatial-Intelligence Agency was adopted in this study. Geoid undulation values were derived from the 2.5′ × 2.5′ EGM2008 grid for each pixel within the study area. For DEMs referenced to ellipsoidal height $h$, the orthometric height $H$ was computed using

$$H = h - N \tag{1}$$

where $N$ represents the geoid undulation. For DEMs originally referenced to the EGM96 geoid, elevations were further converted to the EGM2008 orthometric height system through geoid difference correction to ensure that all datasets were expressed within a consistent vertical datum. The accuracy of the conversion between ellipsoidal and orthometric heights largely depends on the precision of the adopted geoid model. The global geoid model EGM2008 provides centimeter- to decimeter-level accuracy at regional scales, significantly improving upon earlier models such as EGM96. In regions where geodetic observations are dense and geoid models have been well calibrated, the uncertainty of the geoid height is typically at the centimeter to sub-centimeter level, leading to orthometric height uncertainties on the order of 0.01 to 0.05 m. In contrast, in areas with sparse observations or complex terrain conditions, the errors of global geoid models may increase to several tens of centimeters according to published evaluations and case studies [43].
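As a worked illustration of Equation (1) and of the geoid difference correction, the sketch below applies both conversions to a few made-up elevations. The function names and all numeric values are assumptions for illustration; in practice the undulation arrays would be sampled per pixel from the EGM96 and EGM2008 grids.

```python
import numpy as np

def ellipsoidal_to_orthometric(h, n_egm2008):
    """Equation (1): H = h - N, with N the EGM2008 geoid undulation."""
    return np.asarray(h, dtype=float) - np.asarray(n_egm2008, dtype=float)

def egm96_to_egm2008(h_egm96, n_egm96, n_egm2008):
    """Shift EGM96 orthometric heights to EGM2008 via the geoid difference.

    First recover the ellipsoidal height (h = H_96 + N_96), then apply
    Equation (1) with the EGM2008 undulation: H_08 = H_96 + N_96 - N_08.
    """
    h = np.asarray(h_egm96, dtype=float) + np.asarray(n_egm96, dtype=float)
    return ellipsoidal_to_orthometric(h, n_egm2008)

# Made-up values in metres, for illustration only.
h_ellipsoidal = np.array([1100.0, 1250.5])
n_2008 = np.array([-45.2, -44.8])
n_96 = np.array([-45.5, -44.9])

H_from_ellipsoid = ellipsoidal_to_orthometric(h_ellipsoidal, n_2008)
H_from_egm96 = egm96_to_egm2008(np.array([1145.0, 1295.0]), n_96, n_2008)
```

The second function makes explicit that only the difference between the two geoid models enters the correction, so an EGM96-referenced DEM shifts by N_96 - N_08 at each pixel.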

3.2.2. Projection and Coordinate Transformation

To guarantee isotropic spatial measurement and accurate distance calculation, all datasets were projected into the Universal Transverse Mercator coordinate system (WGS84/UTM Zone 46N). During reprojection, bilinear interpolation was applied to recalculate grid values. This method preserves geometric accuracy while maintaining the continuity of terrain surfaces.

3.2.3. Data Resampling

The DoD analysis requires input datasets to share identical spatial resolution and grid alignment. Considering the trade-off between terrain detail preservation and computational efficiency, a target resolution of 1 m was selected. All DEM datasets were resampled using bilinear interpolation. Compared with the nearest-neighbor method, bilinear interpolation computes weighted averages from the four nearest neighboring cells, effectively reducing blocky artifacts and producing smoother terrain surfaces that better conform to natural geomorphological characteristics. This step is purely a technical prerequisite for grid alignment and should not be interpreted as an enhancement of spatial resolution or an improvement in accuracy. Consistent evidence for this is also provided by the robustness analysis of different resampling schemes presented in Figure S1 of the Supplementary Materials. In subsequent comparative analyses, however, original-resolution DEM datasets were retained for terrain factor derivation and modeling to avoid unnecessary information distortion.
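The four-neighbour weighting can be made concrete with a small NumPy sketch. It is not the resampling routine used in the study (which is not specified); the function name, the cell-centre registration, and the edge clamping are assumptions chosen to keep the example short.

```python
import numpy as np

def bilinear_resample(dem, factor):
    """Resample a 2-D grid to `factor` times as many cells along each axis.

    Each output value is a distance-weighted average of the four nearest
    input cells (cell-centre registration, edges clamped).
    """
    rows, cols = dem.shape
    # Output cell centres expressed in input cell coordinates.
    r = np.clip((np.arange(rows * factor) + 0.5) / factor - 0.5, 0, rows - 1)
    c = np.clip((np.arange(cols * factor) + 0.5) / factor - 0.5, 0, cols - 1)
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, rows - 1)
    c1 = np.minimum(c0 + 1, cols - 1)
    wr = (r - r0)[:, None]  # fractional row offset
    wc = (c - c0)[None, :]  # fractional column offset
    top = dem[np.ix_(r0, c0)] * (1 - wc) + dem[np.ix_(r0, c1)] * wc
    bottom = dem[np.ix_(r1, c0)] * (1 - wc) + dem[np.ix_(r1, c1)] * wc
    return top * (1 - wr) + bottom * wr

coarse = np.array([[100.0, 110.0],
                   [120.0, 130.0]])
fine = bilinear_resample(coarse, 2)
```

Because every output value is a convex combination of input cells, the resampled grid never leaves the original elevation range, which is one way to see that the step aligns grids without adding terrain information.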

3.3. DEM Quality Assessment

To systematically evaluate differences in terrain representation and geometric accuracy among DEM products, a comprehensive DEM quality assessment framework was developed. This framework integrates both qualitative and quantitative approaches to analyze their potential impacts on archeological predictive modeling. A high-precision airborne TLC DEM was employed as the reference dataset to compare spatial consistency, vertical error characteristics, and the reliability of derived terrain variables across multiple DEM sources.

Qualitative assessment was conducted through visual comparison and terrain profile analysis. Hillshade maps were generated for all DEM datasets using identical illumination parameters, enabling direct comparison of terrain texture, detail preservation, and macroscopic landform expression under consistent lighting conditions. This approach facilitates intuitive identification of differences in microtopographic representation, particularly in complex terrain. Representative profile transects crossing ridges, slopes, and valley bottoms were extracted across the study area. Elevation values along each transect were compared against the reference DEM, allowing direct evaluation of consistency in elevation amplitude, local extrema positions, and overall terrain trends. Overlaying elevation profiles highlights systematic vertical bias or localized anomalies within individual DEM products.

Quantitative assessment combined the DoD analysis with statistical error metrics to characterize vertical error distributions relative to the reference DEM [44]. Pixel-wise subtraction of each DEM from the benchmark DEM, after unifying coordinate systems and spatial resolution, produces DoD rasters that explicitly represent spatial patterns of elevation differences. DoD analysis captures not only overall error magnitude but also spatial heterogeneity across different geomorphic units [45]. On this basis, standard vertical accuracy metrics, including Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), were calculated. MAE was used to detect systematic overestimation or underestimation tendencies, while RMSE quantified overall error magnitude. Given the strong dependence of archeological predictive models on terrain derivatives, the consistency of key terrain factors generated from different DEMs was further evaluated. Elevation, slope, aspect, and topographic wetness index were selected as representative variables. All terrain factors were derived using identical algorithms and parameter settings, and their statistical distributions were compared to assess consistency and deviation in terrain expression.
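A minimal sketch of the DoD computation and the accompanying error statistics is given below. The sign convention (test DEM minus reference, so positive values mean overestimation) and the function name are assumptions; the signed mean error (ME) is reported alongside MAE and RMSE because only a signed statistic carries the direction of a bias.

```python
import numpy as np

def dod_metrics(dem, reference):
    """Return the DoD raster and summary error statistics.

    Assumed convention: DoD = dem - reference, so positive values mean the
    test DEM lies above the reference. NaN cells are treated as no-data.
    """
    dod = dem - reference
    err = dod[np.isfinite(dod)]
    stats = {
        "ME": float(err.mean()),                   # signed bias
        "MAE": float(np.abs(err).mean()),          # mean absolute error
        "RMSE": float(np.sqrt((err ** 2).mean())), # overall error magnitude
    }
    return dod, stats

# Tiny made-up grids in metres; one cell of the test DEM is no-data.
reference = np.full((2, 2), 10.0)
dem = np.array([[11.0, 9.0],
                [12.0, np.nan]])
dod, stats = dod_metrics(dem, reference)
```

Here the three valid differences are +1, -1 and +2 m, giving ME = 0.667 m, MAE = 1.333 m and RMSE = 1.414 m; both grids are assumed to be already co-registered on the same 1 m lattice.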

3.4. Archeological Predictive Modeling Framework

3.4.1. Feature Engineering

The selection of environmental variables is a critical step in constructing predictive models for Han Dynasty beacon towers. Based on high-quality DEM, twelve terrain and environmental variables were derived to comprehensively characterize spatial conditions ranging from macro-scale topographic structure to micro-scale surface morphology. Elevation serves as the most fundamental terrain variable, directly influencing strategic visibility and defensive advantage. Slope and aspect reflect terrain steepness and directional exposure to solar radiation and prevailing winds, both of which affect site stability and signal transmission efficiency.
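To show how a terrain derivative depends directly on the elevation grid and its cell size, the sketch below derives slope from a DEM with simple finite differences. The study does not state which slope algorithm was used, so `np.gradient` is an assumption; GIS packages commonly use a 3 × 3 kernel instead.

```python
import numpy as np

def slope_degrees(dem, cell_size):
    """Slope in degrees from finite-difference elevation gradients."""
    dz_dy, dz_dx = np.gradient(dem, cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Synthetic 30 m grid: a plane rising 15 m per cell towards the east.
x = np.arange(5) * 30.0
dem = np.tile(0.5 * x, (5, 1))
slope = slope_degrees(dem, cell_size=30.0)
```

On this plane the slope is arctan(0.5), about 26.57 degrees, in every cell. Any vertical error in neighbouring cells enters the gradient divided by the cell size, which is why noisy elevations at fine nominal resolution can distort slope more than the same noise at coarse resolution.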

Profile curvature and plan curvature quantify variations in slope gradients and contour curvature, respectively, enabling detailed characterization of erosion processes and local convex–concave landform properties. These metrics provide geometric support for identifying ridges or terraces with defensive advantages. Advanced statistical and geomorphometric indices were incorporated to further explore terrain influences on beacon tower placement. The Multi-Resolution Ridge Top Flatness Index (MRRTF) and Multi-Resolution Valley Bottom Flatness Index (MRVBF) were used to identify elevated flat areas and low-lying depositional zones, offering strong indicators of whether beacon towers occupied topographically dominant positions. The Topographic Position Index (TPI) effectively distinguishes discrete landform units such as peaks, valleys, and plains by comparing the elevation of a central cell with the mean elevation of its surrounding neighborhood.
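The TPI definition above translates almost directly into code. The following sketch uses a square moving window and excludes the central cell from the neighbourhood mean; the window shape, the radius, and the edge padding are assumptions, since the study's TPI parameters are not given here.

```python
import numpy as np

def tpi(dem, radius=1):
    """TPI = cell elevation minus the mean elevation of its neighbourhood.

    Uses a (2*radius+1)-cell square window, centre excluded, with edge
    values replicated at the borders.
    """
    k = 2 * radius + 1
    padded = np.pad(dem, radius, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    neighbour_sum = windows.sum(axis=(-2, -1)) - dem
    return dem - neighbour_sum / (k * k - 1)

# A flat surface with a single 8 m knoll in the middle.
dem = np.zeros((5, 5))
dem[2, 2] = 8.0
index = tpi(dem)
```

The knoll receives a positive TPI (8 m above its surroundings), the cells around it receive a slightly negative value (-1 m), and the remaining flat cells score zero.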

Terrain ruggedness was characterized using both the Terrain Ruggedness Index (TRI) and the Vector Ruggedness Measure (VRM), which quantify elevation variability and normal vector dispersion, respectively, thereby revealing surface fragmentation intensity. The Topographic Wetness Index (TWI) and Wind Shelter Index (WSI) were included as proxies for hydrological and climatic conditions. These indices reflect local moisture accumulation potential and terrain-induced wind attenuation, providing essential environmental constraints when considering the living conditions of frontier soldiers and the microclimatic requirements for beacon fire combustion. Spatial distributions of all environmental variables are shown in Figure 3.

To ensure feature independence and minimize the adverse effects of redundant information on model performance, correlation analysis and multicollinearity diagnostics were conducted. A Pearson correlation matrix was constructed to quantitatively assess relationships among the initial feature set. Results indicate an extremely strong linear correlation between slope and TRI (correlation coefficient = 0.96), as well as moderate positive correlations between elevation and both slope and TRI. Variance Inflation Factor (VIF) analysis yielded consistent findings (Figure 4a), with VIF values for slope and TRI reaching 20.0 and 18.7, respectively, far exceeding the commonly accepted threshold of 10. These results indicate that including all original features would substantially inflate parameter variance in ensemble models and undermine the reliability of SHAP-based interpretability analysis.

By removing the highly redundant TRI variable and reassessing feature independence, an optimized feature set was constructed (Figure 4b). After feature selection, overall correlation levels were markedly reduced. Although certain geomorphologically meaningful associations remained, such as those between elevation and slope and between profile curvature and TPI, the corresponding correlation coefficients were reduced to 0.65 and 0.69, respectively, which fall within acceptable limits for model training. The most significant improvement was observed in VIF values, with all retained features exhibiting VIF values below 3.0; notably, the VIF of slope decreased from 20.0 to 2.9. This feature optimization process effectively eliminated latent coupling effects among predictors and enhanced model robustness when handling multi-source DEM inputs.
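The screening logic can be reproduced on synthetic data. The sketch below computes the Pearson matrix and obtains each VIF as the corresponding diagonal element of the inverse correlation matrix, which equals 1 / (1 - R²) for the regression of that feature on the others. The three synthetic variables only mimic the slope, TRI and elevation relationship and are not the study's data.

```python
import numpy as np

def pearson_and_vif(X):
    """Pearson correlation matrix and VIFs for the columns of X.

    VIF_j is the j-th diagonal element of the inverse correlation matrix.
    """
    corr = np.corrcoef(X, rowvar=False)
    vif = np.diag(np.linalg.inv(corr))
    return corr, vif

rng = np.random.default_rng(0)
n = 500
slope = rng.normal(size=n)
tri = slope + 0.2 * rng.normal(size=n)        # nearly redundant with slope
elevation = 0.5 * slope + rng.normal(size=n)  # moderately related to slope

full = np.column_stack([slope, tri, elevation])
corr_full, vif_full = pearson_and_vif(full)

# Drop the redundant TRI-like column and re-check.
reduced = np.column_stack([slope, elevation])
corr_reduced, vif_reduced = pearson_and_vif(reduced)
```

With the redundant column present, the VIFs of the first two variables rise well above 10; once it is dropped, the remaining VIFs fall close to 1, mirroring the pattern reported for slope after TRI was removed.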

3.4.2. Machine Learning Models

Six machine learning algorithms commonly used in geographic modeling and classification tasks were selected to comprehensively evaluate the suitability of eight DEM products for beacon tower prediction in the Gansu region:

Logistic Regression (LR): A classical statistical classification model that maps linear combinations of terrain variables to site occurrence probabilities using a sigmoid function.

K-Nearest Neighbors (KNN): A distance-based classifier that assigns class labels based on Euclidean distance in feature space. Its sensitivity to local spatial similarity allows it to capture microtopographic influences on beacon tower placement.

Support Vector Machine (SVM): Employs kernel functions, such as the radial basis function, to project nonlinear features into higher-dimensional space and identify an optimal separating hyperplane that maximizes the margin between positive and negative samples. SVM is particularly suitable for archeological datasets with limited sample sizes.

Artificial Neural Network (ANN): Implemented using a multilayer perceptron architecture, ANN iteratively updates connection weights via backpropagation to model complex nonlinear relationships among terrain variables.

Random Forest (RF): An ensemble learning method that constructs multiple decision trees through bootstrap sampling and aggregates their predictions via majority voting. RF is highly robust to noise and effectively handles uncertainty introduced by different DEM products.

eXtreme Gradient Boosting (XGBoost): An optimized gradient boosting algorithm that incorporates regularization terms to prevent overfitting and uses second-order Taylor expansion to accelerate loss optimization, achieving high efficiency and predictive accuracy in large-scale geospatial applications.
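For readers less familiar with these classifiers, the two simplest decision rules in the list can be sketched in a few lines of standard-library Python. This is only an illustration under assumed toy inputs: the weights, the sample points, and the function names `logistic_probability` and `knn_predict` are hypothetical, and the study itself presumably relied on established library implementations.

```python
from collections import Counter
from math import dist, exp

def logistic_probability(features, weights, bias):
    """Map a linear combination of features to a probability via the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + exp(-z))

def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors
    (Euclidean distance in feature space)."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, already-scaled terrain features for five labeled samples.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.15, 0.15)]
train_y = [0, 0, 1, 1, 0]  # 1 = site, 0 = non-site

p_site = logistic_probability((0.0, 0.0), weights=(1.0, 1.0), bias=0.0)
label = knn_predict(train_X, train_y, query=(0.85, 0.85), k=3)
```

Because KNN depends directly on Euclidean distance, features on different scales would usually need to be standardized first; the toy values above are assumed to be scaled already.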

3.4.3. Model Training and Validation Strategy

To ensure objective assessment of DEM effects, a standardized training and optimization strategy was adopted. Known site locations and an equal number of randomly generated non-site samples were combined and split into training (70%) and testing (30%) sets using stratified random sampling. During training, five-fold cross-validation was employed to minimize random bias associated with data partitioning and enhance result stability. Grid search was subsequently applied to optimize key hyperparameters for each of the six machine learning models, ensuring that all DEM datasets were evaluated under optimal model configurations. This strategy maximized data utilization while providing a robust and reliable evaluation of model performance.
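The partitioning scheme can be sketched as follows. This is a standard-library illustration under assumed inputs (100 sites and 100 non-sites, arbitrary seed); `stratified_split` and `k_folds` are hypothetical helper names, and the folds here are plain shuffled folds, since the text does not state whether the cross-validation itself was stratified.

```python
import random

def stratified_split(labels, test_fraction=0.3, seed=42):
    """Return (train_idx, test_idx) that preserve the class ratio of `labels`."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def k_folds(indices, k=5, seed=42):
    """Yield (fit_idx, val_idx) pairs for k-fold cross-validation."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, val

# Hypothetical balanced sample: 100 sites (1) and 100 non-sites (0).
labels = [1] * 100 + [0] * 100
train_idx, test_idx = stratified_split(labels)
folds = list(k_folds(train_idx))
```

A grid search would then fit each candidate hyperparameter setting on every `fit` subset, score it on the matching `val` subset, and keep the setting with the best mean score, leaving the 30% test set untouched until the final evaluation.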

3.5. Evaluation Metrics for Archeological Prediction

3.5.1. Performance Metrics

To comprehensively assess predictive performance across models driven by the eight DEM products, multiple statistical evaluation metrics were employed. Let TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.

Overall Accuracy: Measures the proportion of correctly classified samples relative to the total sample size.

OA = \frac{TP + TN}{TP + TN + FP + FN}

(2)

F1-score: The harmonic mean of precision and recall, where Precision = TP/(TP + FP) and Recall = TP/(TP + FN), providing a balanced assessment of a model’s ability to correctly identify archeological sites, particularly under class-balanced conditions.

F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

(3)

Kappa Coefficient: Quantifies the agreement between predicted classifications and ground truth while accounting for chance agreement, offering a more robust evaluation of classification gain across different DEM inputs. Here P_{o} is the observed agreement (equal to OA) and P_{e} is the agreement expected by chance.

\mathrm{Kappa} = \frac{P_{o} - P_{e}}{1 - P_{e}}

(4)

ROC Curve and AUC: The Receiver Operating Characteristic curve evaluates model performance across varying classification thresholds, and the Area Under the Curve summarizes overall discriminative ability. AUC values range from 0 to 1, where 0.5 corresponds to random classification; values closer to 1.0 indicate stronger robustness and separability between site and non-site classes.
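The four metrics can be computed directly from the confusion-matrix counts and the predicted scores. The sketch below uses only the standard library and a toy set of eight samples (assumed values, not study results). The chance-agreement term P_e follows the usual two-class Cohen's kappa definition, and AUC is computed through its rank interpretation, which is one common equivalent of integrating the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 1 (site) and 0 (non-site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def overall_accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, tn, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def kappa(tp, tn, fp, fn):
    n = tp + tn + fp + fn
    p_o = (tp + tn) / n
    # Chance agreement: product of predicted and actual marginals per class.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

def auc(y_true, scores):
    """Probability that a random site outscores a random non-site (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (assumed values): four sites, four non-sites.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

counts = confusion_counts(y_true, y_pred)  # (3, 3, 1, 1)
oa = overall_accuracy(*counts)             # 0.75
f1 = f1_score(*counts)                     # 0.75
k = kappa(*counts)                         # 0.5
a = auc(y_true, scores)                    # 0.875
```

The functions omit guards for empty classes or zero denominators, which a production implementation would need.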

3.5.2. Model Interpretability Analysis

To elucidate the driving mechanisms underlying beacon tower selection from an archeological geographic perspective, an interpretability framework grounded in game theory and marginal effects was introduced to transform “black-box” models into archeologically meaningful decision-support tools.

SHAP (SHapley Additive exPlanations) decomposes model predictions into additive contributions from individual terrain variables by computing Shapley values within a cooperative game-theoretic framework. SHAP values quantify both the magnitude and direction of each variable’s influence on site occurrence probability, thereby revealing how DEM accuracy loss propagates through variable importance redistribution and affects predictive outcomes.
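The additive decomposition can be made concrete with an exact Shapley computation on a toy scoring function. Everything in this sketch is an assumption for illustration: the three-variable `model`, the instance `x`, and the all-zero `baseline` are hypothetical, and a single reference input replaces the background-data expectation that SHAP implementations typically use. Exact enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

# Hypothetical toy score with an elevation x TPI interaction.
def model(elev, slope, tpi):
    return 0.5 * elev - 0.3 * slope + 0.2 * elev * tpi

x = (1.0, 2.0, 3.0)         # instance to explain
baseline = (0.0, 0.0, 0.0)  # reference input (assumed)

def value_fn(subset):
    # Features in the coalition take the instance value, the rest the baseline.
    args = [x[j] if j in subset else baseline[j] for j in range(3)]
    return model(*args)

phi = shapley_values(value_fn, 3)  # approximately [0.8, -0.6, 0.3]
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline (0.5 here), and the interaction term is shared equally between the two variables involved.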

Partial Dependence Plots were used to explore marginal effects and variable interactions. Two-variable PDPs generate two-dimensional response surfaces that reveal nonlinear interactions between key terrain factors, enabling in-depth interpretation of spatial defensive logic and construction strategies underlying beacon tower deployment.
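A two-variable partial dependence surface is obtained by fixing the two variables of interest at each pair of grid values and averaging the model output over the observed values of all remaining variables. The sketch below shows that procedure in plain Python; the linear `predict` function, the three samples, and the grids are hypothetical placeholders for a fitted classifier and real terrain data.

```python
def partial_dependence_2d(predict, X, j, k, grid_j, grid_k):
    """Average prediction over X with features j and k forced to each grid pair."""
    surface = []
    for vj in grid_j:
        row = []
        for vk in grid_k:
            total = 0.0
            for sample in X:
                modified = list(sample)
                modified[j] = vj
                modified[k] = vk
                total += predict(modified)
            row.append(total / len(X))
        surface.append(row)
    return surface

# Hypothetical model over [slope (deg), TWI, elevation (km)].
def predict(sample):
    slope, twi, elev_km = sample
    return 0.1 * twi - 0.05 * slope + 0.2 * elev_km

X = [[5.0, 8.0, 1.5], [10.0, 6.0, 1.2], [2.0, 12.0, 1.8]]
grid_slope = [0.0, 5.0, 10.0]
grid_twi = [5.0, 10.0]

# surface[a][b] is the mean prediction at slope = grid_slope[a], TWI = grid_twi[b].
surface = partial_dependence_2d(predict, X, 0, 1, grid_slope, grid_twi)
```

For an additive model such as this one the surface is a plane; curvature or ridges in a real response surface are what indicate interactions between the two terrain factors.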

4. Results

4.1. DEM Quality Assessment Results

4.1.1. Qualitative Comparison

Visual comparison between the 1 m resolution high-precision DEM generated from airborne TLC imagery and eight global DEM products reveals the decisive role of spatial resolution in terrain detail representation (Figure 5). The TLC DEM, used as the reference benchmark, exhibits extremely high terrain fidelity and clearly resolves microtopographic features and gully textures. As spatial resolution decreases, terrain smoothing effects become increasingly pronounced. Although ALOS 12.5 m nominally has the highest spatial resolution among the evaluated products, its visual performance in terrain representation is relatively poor. Among the 30 m resolution DEMs, TanDEM-X, Copernicus DEM, and ALOS30 exhibit relatively high consistency with the reference DEM and successfully reproduce major terrain undulations; however, they show evident “valley filling” effects in the representation of fine-scale gullies. In contrast, ASTER GDEM displays conspicuous terrain artifacts and noise, likely resulting from cloud contamination or stereo-matching errors inherent to optical image-based production methods. SRTM3 performs weakest in visual comparison, with terrain features highly generalized and microtopographic signals almost entirely lost, leaving only the most basic ridge outlines discernible.

Cross-sectional profile comparisons (Figure 6) further illustrate performance differences in capturing surface microrelief. Profiles derived from COPDEM, ALOS30, and TanDEM-X most closely follow the TLC reference curve, particularly along ridgelines and valley bottoms with pronounced relief, demonstrating strong spatial fidelity and effective reconstruction of surface geometry. In contrast, ASTER GDEM profiles exhibit pronounced high-frequency “sawtooth” noise, with elevation values oscillating sharply above and below the reference surface. This non-natural fluctuation reflects artifacts introduced by optical stereo-matching algorithms. Although ALOS12.5 possesses higher nominal resolution, its profiles show a clear systematic positive bias and noticeable lag effects at slope transition zones, suggesting excessive smoothing or geometric displacement introduced during interferometric phase processing or resampling. Medium-resolution models such as SRTM1, SRTM3, and NASADEM display pronounced terrain generalization: sharp edges and small terraces evident in the TLC reference are stretched into gentle slopes, producing a “cut-and-fill” effect. SRTM3, constrained by its 90 m sampling interval, almost entirely loses the ability to represent micro-landform units. Vertical spacing between profile curves indicates that, in terrace-edge zones where archeological sites are likely to occur, height deviations and detail loss blur terrain inflection points, foreshadowing uncertainty in subsequent terrain factor derivation.

4.1.2. Quantitative Error Statistics

The DOD analysis between eight global DEM products and the 1 m resolution TLC reference dataset reveals substantial differences in vertical accuracy and terrain representation capability. Statistical indicators (Table 2) exhibit clear stratification in DEM performance. COPDEM demonstrates the highest overall consistency, with an RMSE of only 2.193 m, an MAE of 1.974 m, and a coefficient of determination (R²) as high as 0.9991. TanDEM-X ranks second, with an RMSE of 2.308 m, indicating excellent vertical reliability. NASADEM, as a reprocessed version of SRTM, shows markedly improved accuracy (RMSE = 2.529 m) relative to SRTM1 (3.797 m) and SRTM3 (3.986 m), confirming the effectiveness of its noise suppression and void-filling algorithms.

Notably, despite its nominally higher spatial resolution, ALOS12.5 yields error statistics (RMSE = 3.994 m) comparable to SRTM3, with an MAE (3.545 m) slightly exceeding that of SRTM3, indicating that high-frequency resampling does not substantially enhance elevation accuracy. ASTER GDEM performs worst, with an RMSE of 6.44 m and an MAE of 4.943 m, consistent with the extensive patchy artifacts observed in its spatial error distribution.
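The error statistics reported here can be reproduced from paired elevation samples with a few lines of code. The sketch below is a standard-library illustration on four made-up elevation pairs. The function name `dem_error_stats` is hypothetical, and R² is computed as 1 − SS_res/SS_tot, which is one common definition and an assumption here, since the text does not state which form was used.

```python
from math import sqrt

def dem_error_stats(dem, ref):
    """Mean error, MAE, RMSE and R^2 of DEM elevations against a reference."""
    diffs = [d - r for d, r in zip(dem, ref)]
    n = len(diffs)
    mean_error = sum(diffs) / n
    mae = sum(abs(e) for e in diffs) / n
    ss_res = sum(e * e for e in diffs)
    rmse = sqrt(ss_res / n)
    ref_mean = sum(ref) / n
    ss_tot = sum((r - ref_mean) ** 2 for r in ref)
    r2 = 1.0 - ss_res / ss_tot
    return {"ME": mean_error, "MAE": mae, "RMSE": rmse, "R2": r2}

# Made-up elevations in metres (not values from the study).
ref = [1200.0, 1300.0, 1400.0, 1500.0]
dem = [1202.0, 1299.0, 1403.0, 1500.0]

stats = dem_error_stats(dem, ref)
```

Because RMSE squares each difference before averaging, it is never smaller than MAE and grows faster when a few cells carry large errors, which is consistent with the wider RMSE to MAE gap reported for ASTER GDEM than for COPDEM.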

Spatial error maps (Figure 7) further elucidate DEM-specific error patterns under varying terrain conditions. COPDEM and TanDEM-X exhibit the most homogeneous error distributions, characterized by minimal random noise and predominantly uniform positive bias attributable to vegetation effects. In areas of strong relief, such as gullies and ridgelines, SRTM-series products and ALOS30 display pronounced terrain-dependent error structures. The error contours clearly delineate geomorphic frameworks, reflecting geometric distortion in medium- to low-resolution InSAR data over steep slopes. NASADEM shows improved spatial continuity relative to SRTM, with substantially reduced speckle noise and enhanced smoothness. ASTER GDEM exhibits strong randomness and discontinuity in error distribution; the superposition of low-frequency noise and high-frequency artifacts obscures microtopographic features, severely limiting its suitability for fine-scale analysis. ALOS12.5 error maps reveal clear interpolation-induced smoothing patterns, indicating that despite rich texture, its effective accuracy remains constrained by the quality of the underlying auxiliary DEM.

4.1.3. Consistency of Derived Terrain Factors

Statistical analysis of terrain factors derived from the eight DEM products reveals distinct differences in how sensors and spatial resolution influence terrain characterization. Although all DEMs exhibit broadly similar trends in macro-topographic patterns, significant heterogeneity emerges in micro-landform representation and in the frequency distributions of specific terrain variables.

Elevation distributions are highly consistent across all eight DEMs (Figure 8a), indicating strong reliability of the global DEMs in capturing basic elevation frameworks. In all cases, elevations are concentrated within the 1200–1900 m range, accounting for more than 55% of the study area. ALOS12.5 shows a slightly higher proportion below 1200 m compared to other products, but overall consistency suggests that elevation exerts a relatively stable influence on large-scale archeological predictive modeling.

In contrast, slope distributions differ markedly among DEMs as shown in Figure 8b. In low-slope areas (<2.9°), SRTM3, COPDEM, NASADEM, and SRTM1 exhibit very high frequencies, approximately 65–75%, forming a characteristic “L-shaped” decay distribution. ASTER GDEM displays a pronounced distributional bias, with its peak frequency occurring in the 2.9–8.7° range, approximately 50%, rather than at the lowest slope class, and consistently higher proportions in steeper slope intervals. This pattern indicates elevated noise or overfitting in ASTER’s representation of microtopographic relief, potentially leading to misclassification of low-slope archeological background environments in APM.
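The way vertical noise shifts a slope histogram away from the lowest class can be illustrated with a synthetic experiment. The sketch assumes NumPy and a hypothetical 1% ramp on a 30 m grid, to which uncorrelated noise with a 6 m standard deviation is added. Both the surface and the noise model are assumptions chosen for illustration; real DEM errors are spatially correlated, so the effect shown here is exaggerated.

```python
import numpy as np

def slope_degrees(dem, cell_size):
    """Slope in degrees from finite differences (np.gradient) on a regular grid."""
    dz_drow, dz_dcol = np.gradient(dem, cell_size)
    return np.degrees(np.arctan(np.hypot(dz_drow, dz_dcol)))

def low_slope_fraction(slope, threshold_deg=2.9):
    """Share of cells falling in the lowest slope class (< threshold)."""
    return float((slope < threshold_deg).mean())

# Hypothetical terrain: a gentle 1% ramp sampled on a 30 m grid.
cell = 30.0
cols = np.arange(200) * cell
clean = np.tile(0.01 * cols, (200, 1))

# Same surface with uncorrelated vertical noise (sd = 6 m, an assumed value).
rng = np.random.default_rng(1)
noisy = clean + rng.normal(scale=6.0, size=clean.shape)

frac_clean = low_slope_fraction(slope_degrees(clean, cell))
frac_noisy = low_slope_fraction(slope_degrees(noisy, cell))
```

On the clean ramp every cell falls below 2.9°, whereas on the noisy copy most cells are pushed into steeper classes, qualitatively matching the displaced peak described for ASTER GDEM.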

Aspect distributions reveal differences in surface texture recognition among sensors as shown in Figure 8c. While all DEMs show consistent multimodal distributions across major aspect classes, with southwest aspects generally dominant, substantial discrepancies arise in the identification of “flat” areas. ALOS12.5 and TanDEM-X identify significantly higher proportions of flat terrain, approximately 10.3% and 7.4%, respectively, whereas ASTER, SRTM3, and NASADEM identify almost none. These differences stem from varying algorithmic definitions of flatness and sensor-specific vertical precision constraints. Regarding distribution smoothness, the SRTM-series DEMs and COPDEM exhibit relatively stable trends, whereas ASTER and ALOS12.5 display pronounced fluctuations in specific directions such as north and southeast. For archeological predictive models that emphasize orientation preferences, such micro-aspect variability may lead to significant local shifts in predicted probability surfaces.

The Topographic Wetness Index, a compound variable integrating slope and contributing area, is highly sensitive to DEM quality as shown in Figure 8d. All DEM-derived TWI distributions are right-skewed, with peaks concentrated in the 5–10 range. ASTER GDEM exhibits an exceptionally sharp peak exceeding 70% within this interval, followed by a rapid decline at values > 10, indicating extreme behavior in hydrological simulation. SRTM3 and COPDEM show notably higher proportions in the 10–15 interval, suggesting greater sensitivity in identifying flow accumulation zones or potential wetland environments. ALOS12.5, ALOS30, NASADEM, and SRTM1 exhibit strong convergence in their TWI distributions, indicating good interchangeability among medium-resolution DEMs for archeological modeling. The observed dispersion in TWI distributions implies that DEM choice can lead to substantially different suitability assessments for water-proximal archeological sites.
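The sensitivity of TWI to slope errors follows from its usual definition, TWI = ln(a / tan β), where a is the specific catchment area and β the local slope. The sketch below assumes this standard form (the study may use a software-specific variant) and a hypothetical minimum-slope clamp to avoid division by zero on flat cells.

```python
from math import log, radians, tan

def twi(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographic Wetness Index ln(a / tan(beta)) with a clamp on flat cells."""
    beta = radians(max(slope_deg, min_slope_deg))
    return log(specific_catchment_area / tan(beta))

# Same hypothetical catchment area (100 m^2 per metre of contour), two slopes.
twi_gentle = twi(100.0, 2.0)
twi_steeper = twi(100.0, 8.0)
```

With the contributing area held fixed, a cell whose slope is overestimated from 2° to 8° loses more than one TWI unit, so noise-inflated slopes alone can move cells between the histogram classes discussed above.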

4.2. Performance Comparison of Archeological Predictive Models

To evaluate the performance of different DEM products in archeological predictive modeling, five core metrics were used: Accuracy, Precision, Recall, F1-score, and Area Under the Curve (AUC).

4.2.1. Overall Performance of Machine Learning Models

Comparison of six machine learning algorithms across eight DEM datasets reveals distinct influences of both data quality and model structure on predictive accuracy (Table S1 and Figure 9). Ensemble learning approaches exhibit clear advantages in archeological predictive modeling. Both XGBoost and Random Forest substantially outperform traditional linear or instance-based models across all accuracy-related metrics. Within the Copernicus DEM dataset, the XGBoost model achieves the highest AUC of 0.9658 observed in the entire study. Moreover, its F1 score of 0.8636 and overall accuracy of 0.8571 show strong internal consistency, demonstrating the superior generalization capability of gradient-boosted decision trees when applied to highly nonlinear and class-imbalanced spatial data such as archeological site distributions. The Random Forest model performs comparably to XGBoost, with accuracy values consistently ranging between 0.88 and 0.90 across multiple datasets, including ALOS30 and ASTER, highlighting its strong robustness to noise. In contrast, the KNN algorithm exhibits the weakest performance. For most datasets, its cross-validated AUC remains around 0.80, and its predictive accuracy on the TAN30 dataset is markedly lower than that of the other algorithms. This result reflects the inherent limitations of Euclidean distance-based classification logic in capturing complex terrain-constrained mechanisms. SVM and Logistic Regression demonstrate moderate predictive capability. Although relatively high accuracy can be achieved on specific datasets such as ASTER, noticeable fluctuations in recall suggest that these models are prone to missed detections when identifying low-probability archeological site samples [46].

4.2.2. Sensitivity of Predictive Accuracy to DEM Source

DEM quality and resolution emerge as key determinants of APM accuracy. Cross-comparison among the eight DEM products shows that COPDEM performs consistently well across multiple models. Under the XGBoost algorithm, COPDEM achieves the highest overall AUC (0.9658), with similarly strong performance under RF (AUC = 0.9606), underscoring its advantage in representing microtopographic features and archeological spatial relationships. TanDEM-X also demonstrates high stability, achieving an accuracy of 0.8929 and an F1-score of 0.8989 under XGBoost, highlighting the importance of high-quality radar-derived DEMs for capturing anthropogenic landform traces.

Despite its higher nominal resolution, ALOS12.5 consistently underperforms relative to ALOS30 and NASADEM. Under the KNN model, ALOS12.5 yields an AUC of only 0.7926, compared to 0.8093 for ALOS30. This pattern suggests that interpolation-induced smoothing may obscure key archeological–geographic signals. SRTM-series DEMs occupy a mid-to-lower tier in overall performance, with SRTM1 frequently producing F1-scores below 0.8, indicating that traditional spaceborne InSAR DEMs are increasingly being surpassed by newer-generation products such as COPDEM and TanDEM-X in fine-scale archeological prediction.

4.3. Results of Model Interpretability Analysis

SHAP-based attribution analysis reveals strong consistency in global driving factor rankings across models despite substantial differences in the input DEMs (Figure 10). Elevation emerges as the dominant predictor in all eight DEM-based models, consistently ranking first in feature importance. This finding strongly supports the central role of geomorphological determinism in regional archeological predictive modeling: elevation constrains resource accessibility, defensive advantage, and settlement suitability. However, rankings of secondary drivers differ markedly among DEM products, reflecting intrinsic differences in microtopographic representation.

In COPDEM- and ALOS30-based models, the Topographic Position Index ranks second, indicating strong capability in capturing relative landform positions such as ridges and terraces, features preferred for ancient military installations. In contrast, NASADEM emphasizes hydrological sensitivity, with WSI as the second most important factor, while SRTM3 and TanDEM-X substantially amplify the contribution of the VRM. This redistribution of feature importance reflects DEM geomorphological fidelity: high-quality DEMs enable models to learn archeologically meaningful terrain-position attributes, whereas noisier or heavily smoothed DEMs tend to drive models toward simpler metrics such as ruggedness or hydrological indices. This form of “interpretive bias” cautions that apparent scientific inferences derived from machine learning may be strongly modulated by the quality of underlying raster data.

5. Discussion

5.1. Impacts of DEM Quality on Archeological Research

The spatial granularity of a DEM directly determines the input quality of key terrain variables, such as slope, aspect, and topographic wetness index, in archeological predictive modeling. The TLC 1 m dataset provides rich microtopographic information that is capable of capturing subtle surface modification traces closely associated with human activities. Such features are often reduced to background noise or completely smoothed out in datasets with resolutions of 30 m or coarser. Although the ALOS 12.5 m product nominally offers higher spatial resolution, it is essentially derived through interpolation from the original 30 m data. While this process enhances visual smoothness, it does not genuinely introduce effective high-frequency terrain information. Moreover, generating complex terrain derivatives from high-resolution data sharply increases computational costs, because the number of cells grows with the inverse square of the cell size, and imposes substantially higher demands on computing hardware.

For archeological predictive modeling, coarse-resolution datasets such as SRTM3 tend to be excessively generalized, which may cause model failure in small-scale site prediction and restrict their applicability to analyses of regional or macro-scale settlement distribution patterns. Therefore, in archeological research, selecting DEMs with resolutions appropriately matched to the spatial scale of target site types is critical for balancing computational efficiency and predictive accuracy. The high-frequency noise exhibited by ASTER GDEM introduces a large number of spurious signals during curvature calculation, leading models to falsely interpret the terrain as containing abundant discontinuous undulations. This, in turn, produces fragmented patch patterns in prediction maps and substantially reduces the practical guidance value for archeological survey. Profile analysis reveals a key trend: vertical accuracy is more important than spatial resolution for maintaining geomorphological coherence. Although ALOS30 and SRTM1 show general consistency with the TLC reference in terms of macro-topography, the loss of fine-scale detail in their micro-profile representations limits their applicability in high-resolution archeological prediction.

Differences in accuracy performance among multi-source DEMs fundamentally stem from disparities in their underlying remote sensing technologies and post-processing workflows. The ability of COPDEM and TanDEM-X to maintain extremely low random error is primarily attributable to their use of high-frequency X-band interferometric radar technology. The shorter wavelength of X-band radar is more sensitive to surface undulations, and precise baseline control significantly reduces systematic tilt errors. The technical advancement of NASADEM over the original SRTM series highlights the importance of global adjustment using modern auxiliary datasets, such as ICESat laser altimetry, in improving local elevation accuracy [47].

DEM quality directly governs the reliability of environmental covariates [48,49], including slope, TPI, and flow accumulation. Quantitatively higher precision—such as the lower standard deviation observed in COPDEM—means that terrain operators can provide more realistic geographic signals in machine learning models, thereby enhancing the scientific interpretability of environmental drivers in SHAP analyses. In contrast, the use of high-RMSE data sources like ASTER increases the likelihood that inherent artifacts will be misidentified by algorithms as genuine terrain features. This leads to severe overfitting or spatial displacement in predicted site distributions, with non-natural noise fluctuations incorrectly interpreted as defensive landforms or irrigation features associated with ancient human settlement choices. Particularly for archeological sites that depend on microtopographic recognition, such as small beacon towers or ditches, the terrain-correlated errors observed in SRTM and ALOS12.5 may lead to the failure of hydrological simulation operations, thereby hindering accurate assessment of the true spatial relationships between sites and water sources.

By jointly considering statistical accuracy and spatial consistency, this study concludes that Copernicus DEM should be regarded as the preferred foundational terrain dataset for large-scale archeological predictive modeling in regions lacking LiDAR data. NASADEM represents a viable secondary option, especially for large-area analyses under constrained computational resources, where it offers favorable cost–performance efficiency. ASTER and ALOS12.5, although nominally offering high spatial resolution, exhibit insufficient effective accuracy and therefore should be applied with caution in high-precision archeological site selection analyses unless rigorous local calibration is conducted.

5.2. Model Sensitivity and Algorithm–Data Interactions

Figure 11 shows that COPDEM and TAN-30 demonstrate significant superiority across the various machine learning models, a phenomenon attributable to differences in their sensor imaging mechanisms and data post-processing algorithms. COPDEM, a global digital elevation model generated based on TerraSAR-X satellite radar interferometry, exhibits extremely high vertical accuracy and spatial consistency in areas with dramatic topographic relief. This enables the model to more accurately capture microtopographic features that are closely related to site selection, including subtle variations in slope and elevation. In ensemble learning models such as XGBoost and Random Forest, the COPDEM dataset achieves the highest AUC values at 0.966 and 0.961, respectively, further demonstrating the advantages of high-quality radar DEMs in reducing vegetation interference and restoring true surface morphology. In contrast, despite its nominally high spatial resolution, the prediction performance of ALOS12.5 repeatedly lags behind that of the 30 m resolution ALOS30 and NASADEM. This may reveal a key terrain modeling problem: high-resolution data may introduce pseudo-terrain noise during interpolation or lose crucial archeological geomorphological details during over-smoothing, leading to “overfitting” or “feature confusion” when the model extracts terrain discrimination features [50].

In terms of model performance, Figure 11 shows that the nonlinear ensemble algorithms RF and XGBoost exhibit stronger generalization ability than traditional statistical models such as LR and instance-based models such as KNN when processing heterogeneous DEM data. The RF and XGBoost models maintain an AUC above 0.9 on multiple data sources, including ASTER, TAN30, and NASADEM, and their built-in random feature selection mechanisms effectively alleviate the problem of high collinearity among topographic factors. The fact that logistic regression achieves the highest accuracy of 0.9167 on ASTER data suggests that when the DEM quality is sufficient to clearly define the spatial boundaries of the site, simple linear combinations can also achieve excellent classification results. However, the KNN model performs poorly across nearly all DEM schemes, with an AUC of only 0.7926 on the ALOS12.5 dataset, indicating that nearest-neighbor algorithms based on Euclidean distance struggle to effectively handle complex and non-normally distributed topographic feature vectors in archeological prediction tasks. The instability of neural networks indicates that in typical small-sample, high-dimensional data scenarios such as archeological site prediction, deep learning models often face difficulties in parameter optimization and a risk of overfitting, and their dependence on data quality is far greater than that of tree models [51,52].

5.3. Implications for Archeological Interpretation and Decision-Making

The interaction between slope and the Topographic Wetness Index was examined using three-dimensional Partial Dependence Plots. Because beacon towers typically serve specific military defensive and observational functions, their placement often reflects a trade-off between terrain slope and soil moisture conditions. Terrain slope represents construction difficulty and accessibility, while soil moisture indicates water accumulation potential and the availability of strategic resources. Analysis of the 3D interaction surfaces (Figure 12) reveals that, although all eight models exhibit broadly similar overall trends, namely that site probability peaks in areas characterized by high TWI and low slope, the geometric characteristics of the response surfaces differ in subtle yet critical ways. In the COPDEM and ALOS30 datasets, the interaction surfaces display more pronounced gradient variations, forming a steep “prediction plateau” in areas with low TWI of 5–10 and very low slope of 0–5°. This pattern indicates a high degree of certainty in identifying landforms that are both open and flat while offering potential access to water resources. In contrast, ASTER DEM, NASADEM, and SRTM1 DEM exhibit similar response patterns characterized by a strong dependence on slope and relatively limited sensitivity to fluctuations in TWI. For TanDEM-X and ALOS12.5, the predicted probability surfaces show greater variability, with multiple local peaks emerging across different TWI ranges. This behavior suggests that higher-resolution or more finely processed DEMs are better able to capture nuanced site-selection logic under complex terrain conditions. Due to its relatively coarse spatial resolution, the SRTM3 dataset produces interaction surfaces that appear spatially generalized. While it is capable of capturing broad topographic trends, it is clearly inferior to DEM products with resolutions of 30 m or finer in representing the micro-scale constraints imposed by detailed landform characteristics on site distribution.

The integration of SHAP-based interpretability analysis with partial dependence plots provides a novel dimension for evaluating DEM quality that goes beyond traditional single-metric assessments, namely, process robustness. Experimental results demonstrate that high spatial resolution does not necessarily translate into high predictive performance. In semi-arid regions such as Gansu Province, characterized by high terrain fragmentation and sparse vegetation cover, second-generation global DEMs with moderate spatial resolution (12.5–30 m) but optimized vertical accuracy, such as Copernicus DEM, perform best in capturing the interaction logic between slope and terrain wetness. This finding indicates that, for archeological predictive tasks that rely on micro-topographic discrimination, including the identification of Great Wall foundations and beacon tower platforms, the logical representation of terrain gradients in a DEM is more critical than simple pixel densification.

These findings highlight an often-overlooked risk in data-driven archeological research: interpretive bias introduced by raster data quality. Even when predictive performance metrics appear acceptable, the underlying explanatory narratives may diverge substantially depending on DEM choice. Consequently, archeological inferences drawn from machine learning models should be evaluated not only on statistical performance but also on the physical plausibility and cultural coherence of their driving variables [53].

5.4. Limitations and Prospects

Several limitations should be acknowledged. First, this study focuses on a single geomorphological and climatic setting, the arid Gobi landscape of the Hexi Corridor. While this setting is representative of many frontier regions in northwestern China, the transferability of results to humid, vegetated, or mountainous environments requires further validation. Second, the predictive models rely primarily on terrain-derived variables, excluding cultural, historical, and logistical factors such as ancient road networks, water infrastructure, and intervisibility relationships among beacon towers. Integrating these variables may further enhance model realism and interpretive depth. Third, this study focuses primarily on evaluating the impact of multi-source DEM quality on archeological predictive modeling performance. Although the beacon tower dataset used for modeling originates from previously published archeological surveys and has been cross-checked using remote sensing imagery, additional field verification would further enhance the robustness of the results. Due to the methodological focus of this research, extensive field investigations were not conducted within the scope of the present study. Future research should explore multi-scale DEM fusion strategies that combine the vertical accuracy of high-quality global DEMs with the local detail of UAV-derived or LiDAR-derived datasets. In addition, temporal analysis of DEMs may provide insights into landscape evolution and site preservation processes, offering a dynamic perspective on archeological visibility. Finally, expanding the interpretability framework to include causal inference and spatially explicit explanation methods may further strengthen the linkage between machine learning outputs and archeological theory.

6. Conclusions

Archeological predictive modeling is often constrained by reliance on a single DEM dataset, while the interactive effects between data source quality and modeling algorithms are frequently overlooked. To address this limitation, this study constructs a systematic framework that integrates DEM accuracy assessment, machine learning-based prediction, and SHAP-based interpretability analysis. From a full life-cycle perspective spanning “data source quality” to “model decision logic,” the framework aims to investigate how terrain data errors propagate into archeological predictive models. Based on an empirical analysis of multiple global open access DEM products in the Han Great Wall region of Gansu Province, the following main conclusions are drawn.

DEM vertical accuracy is the primary driver of archeological predictive reliability, rather than nominal spatial resolution alone. Significant accuracy stratification exists among global open access DEMs in their ability to represent microtopography in arid regions. Medium-resolution products with superior vertical accuracy, represented by the Copernicus DEM with an RMSE of 2.19 m, substantially outperform nominally higher-resolution but noisier datasets like ALOS 12.5 m in capturing the key geo-archeological logic underlying beacon tower placement, including preferences for elevated terrain and strategic visibility. Simply increasing pixel density does not substantially improve model performance; instead, predictive effectiveness depends critically on the physical fidelity of the original elevation observations and the authenticity of derived terrain features.
SHAP-based interpretability analysis reveals that models built on low-accuracy DEMs such as ASTER GDEM, although capable of achieving high statistical accuracy under certain algorithms, often anchor their decision logic to data noise or terrain artifacts. In contrast, feature contributions derived from high-accuracy DEMs, namely Copernicus DEM and TanDEM-X, are much more consistent with established archeological site-selection principles, including the interactive responses of slope and elevation. Large-scale spatial predictions should therefore be checked for structurally biased decision logic induced by input data errors, in order to avoid misleading interpretations of cultural heritage site-selection patterns.
The integrated evaluation framework linking “data quality–model performance–interpretability logic” addresses the “black-box” limitation of conventional archeological predictive modeling, which focuses solely on predictive accuracy. By emphasizing the role of explainable artificial intelligence in validating the geographical logic of models, this framework advances methodological transparency. The use of a high-precision airborne TLC DEM as the validation benchmark, in combination with SHAP and DOD analyses, provides end-to-end reliability assurance from data screening to decision interpretation for interdisciplinary spatial archeology.
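
The first conclusion above, that pixel density cannot compensate for vertical error, can be illustrated with a first-order error-propagation sketch. It is a simplification introduced here, not an analysis performed in this study. It assumes independent, zero-mean elevation errors in neighboring cells and a central-difference gradient, so that the gradient error has standard deviation σ_z / (√2 · d) for cell size d. Treating the RMSE values of Table 2 as σ_z is a further assumption: real DEM errors include bias and spatial correlation, so the result is best read as a pessimistic bound. The function name `slope_noise_deg` is hypothetical.

```python
import math


def slope_noise_deg(sigma_z_m: float, cell_size_m: float) -> float:
    """Approximate slope noise (degrees) caused by vertical error alone.

    Assumption: elevation errors in neighboring cells are independent and
    zero-mean with standard deviation sigma_z_m. For a central difference
    (z[i+1] - z[i-1]) / (2 * d), the error variance is
    2 * sigma_z**2 / (4 * d**2), i.e. std = sigma_z / (sqrt(2) * d).
    """
    if sigma_z_m < 0 or cell_size_m <= 0:
        raise ValueError("sigma_z_m must be >= 0 and cell_size_m must be > 0")
    gradient_std = sigma_z_m / (math.sqrt(2.0) * cell_size_m)
    # Express the gradient standard deviation as an equivalent slope angle.
    return math.degrees(math.atan(gradient_std))


if __name__ == "__main__":
    # RMSE values from Table 2, used here as a stand-in for sigma_z (assumption).
    for name, rmse_m, cell_m in [("ALOS12.5", 3.994, 12.5), ("COP-DEM", 2.193, 30.0)]:
        print(f"{name}: ~{slope_noise_deg(rmse_m, cell_m):.1f} deg slope noise")
```

Under these assumptions, `slope_noise_deg(3.994, 12.5)` (ALOS 12.5 m) gives roughly 13°, against roughly 3° for `slope_noise_deg(2.193, 30.0)` (Copernicus DEM). The finer grid divides a larger vertical error by a shorter baseline, which amplifies rather than suppresses noise in slope.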

Beyond its methodological contributions, this study also provides practical guidance for the use of DEM data in interdisciplinary archeological research and cultural heritage management. For archeologists, the proposed framework offers a systematic approach for selecting appropriate DEM datasets and evaluating how terrain data quality influences the reliability of archeological predictive modeling. By explicitly linking DEM accuracy with model performance and interpretability, the framework helps reduce the risk that terrain noise or elevation errors may lead to misleading spatial interpretations of site-selection patterns. For remote sensing and GIS specialists, the results highlight the importance of considering vertical accuracy and terrain representation capability when applying DEM products for geomorphometric analysis and terrain-based spatial modeling. For cultural heritage managers and conservation planners, an improved understanding of DEM reliability can support more informed landscape-scale assessments, including terrain interpretation, risk evaluation, and the prioritization of areas for archeological survey and heritage protection. By integrating DEM accuracy assessment, machine learning prediction, and explainable artificial intelligence, the framework contributes to improving methodological transparency in archeological predictive modeling and provides a reproducible workflow for evaluating how terrain data uncertainty propagates into spatial decision-making processes.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs18060961/s1, Figure S1: Comparison of DOD accuracy at different resolutions; Table S1: Machine Learning Model Accuracy Evaluation results; Table S2: Ranking of Driving Factors by Importance; Table S3: DEM_Accuracy_Summary.

Author Contributions

Conceptualization, J.Y. and J.Z.; methodology, J.Y.; software, A.Z.; validation, R.T., P.H. and Z.Z.; formal analysis, X.L.; investigation, J.Y.; resources, J.Z.; data curation, R.T.; writing—original draft preparation, J.Y.; writing—review and editing, J.Y.; visualization, A.Z.; supervision, J.Y.; project administration, J.Y.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 42471370), the Construction of the China-Central Asia Human and Environment “Belt and Road” Joint Laboratory and Joint Research on Ancient Human Culture and Environment in the Sulh River Basin (Grant No. 2022YFE0203800), and the Youth Innovation Promotion Association of CAS (Grant No. 2023135).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

TLC	Three-Line Camera
ALOS12	ALOS PALSAR DEM
COP-DEM	Copernicus DEM
ALOS30	ALOS AW3D30
SRTM	Shuttle Radar Topography Mission
DOD	Difference of DEMs
APM	Archeological predictive model

References

1. Okolie, C.J.; Smit, J.L. A Systematic Review and Meta-Analysis of Digital Elevation Model (DEM) Fusion: Pre-Processing, Methods and Applications. ISPRS J. Photogramm. Remote Sens. 2022, 188, 1–29.
2. Luo, L.; Wang, X.; Guo, H. Transitioning from Remote Sensing Archaeology to Space Archaeology: Towards a Paradigm Shift. Remote Sens. Environ. 2024, 308, 114200.
3. Luo, L.; Wang, X.; Guo, H.; Jia, X.; Fan, A. Earth Observation in Archaeology: A Brief Review. Int. J. Appl. Earth Obs. Geoinf. 2023, 116, 103169.
4. Chen, F.; Zhou, W.; Chen, C.; Ma, P. Extended D-TomoSAR Displacement Monitoring for Nanjing (China) City Built Structure Using High-Resolution TerraSAR/TanDEM-X and Cosmo SkyMed SAR Data. Remote Sens. 2019, 11, 2623.
5. Risbøl, O.; Bollandsås, O.M.; Nesbakken, A.; Ørka, H.O.; Næsset, E.; Gobakken, T. Interpreting Cultural Remains in Airborne Laser Scanning Generated Digital Terrain Models: Effects of Size and Shape on Detection Success Rates. J. Archaeol. Sci. 2013, 40, 4688–4700.
6. Gouma, M.; Van Wijngaarden, G.J.; Soetens, S. Assessing the Effects of Geomorphological Processes on Archaeological Densities: A GIS Case Study on Zakynthos Island, Greece. J. Archaeol. Sci. 2011, 38, 2714–2725.
7. Zhou, W.; Chen, F.; Guo, H.; Hu, M.; Li, Q.; Tang, P.; Zheng, W.; Liu, J.; Luo, R.; Yan, K.; et al. UAV Laser Scanning Technology: A Potential Cost-Effective Tool for Micro-Topography Detection over Wooded Areas for Archaeological Prospection. Int. J. Digit. Earth 2020, 13, 1279–1301.
8. Howland, M.D.; Jones, I.W.N.; Najjar, M.; Levy, T.E. Quantifying the Effects of Erosion on Archaeological Sites with Low-Altitude Aerial Photography, Structure from Motion, and GIS: A Case Study from Southern Jordan. J. Archaeol. Sci. 2018, 90, 62–70.
9. Xu, F.; Xu, Q.; Pu, C.; Wang, X.; Xu, P. Can Different Machine Learning Methods Have Consistent Interpretations of DEM-Based Factors in Shallow Landslide Susceptibility Assessments? J. Rock Mech. Geotech. Eng. 2025, 17, 7864–7881.
10. Hu, J.; Zhang, J.; Li, W.; Li, P.; Zhao, G.; Li, S.; Wang, L.; Li, Y.; Li, D.; Du, M. Investigating Sediment Connectivity of a Small Catchment on the Loess Plateau Using an Appropriate Index at an Optimal Spatial Resolution. J. Hydrol. 2025, 661, 133588.
11. Boulton, S.J.; Stokes, M. Which DEM Is Best for Analyzing Fluvial Landscape Development in Mountainous Terrains? Geomorphology 2018, 310, 168–187.
12. Xing, Z.; Chi, Z.; Yang, Y.; Chen, S.; Huang, H.; Cheng, X.; Hui, F. Accuracy Evaluation of Four Greenland Digital Elevation Models (DEMs) and Assessment of River Network Extraction. Remote Sens. 2020, 12, 3429.
13. Erasmi, S.; Rosenbauer, R.; Buchbach, R.; Busche, T.; Rutishauser, S. Evaluating the Quality and Accuracy of TanDEM-X Digital Elevation Models at Archaeological Sites in the Cilician Plain, Turkey. Remote Sens. 2014, 6, 9475–9493.
14. Becker, D.; De Andrés-Herrero, M.; Willmes, C.; Weniger, G.-C.; Bareth, G. Investigating the Influence of Different DEMs on GIS-Based Cost Distance Modeling for Site Catchment Analysis of Prehistoric Sites in Andalusia. ISPRS Int. J. Geo-Inf. 2017, 6, 36.
15. Polidori, L.; El Hage, M. Digital Elevation Model Quality Assessment Methods: A Critical Review. Remote Sens. 2020, 12, 3522.
16. Han, H.; Zeng, Q.; Jiao, J. Quality Assessment of TanDEM-X DEMs, SRTM and ASTER GDEM on Selected Chinese Sites. Remote Sens. 2021, 13, 1304.
17. Cuthbertson, P.; Ullmann, T.; Büdel, C.; Varis, A.; Namen, A.; Seltmann, R.; Reed, D.; Taimagambetov, Z.; Iovita, R. Finding Karstic Caves and Rockshelters in the Inner Asian Mountain Corridor Using Predictive Modelling and Field Survey. PLoS ONE 2021, 16, e0245170.
18. Li, L.; Li, Y.; Chen, X.; Sun, D. A Prediction Study on Archaeological Sites Based on Geographical Variables and Logistic Regression—A Case Study of the Neolithic Era and the Bronze Age of Xiangyang. Sustainability 2022, 14, 15675.
19. Nsanziyera, A.F.; Lechgar, H.; Fal, S.; Maanan, M.; Saddiqi, O.; Oujaa, A.; Rhinane, H. Remote-Sensing Data-Based Archaeological Predictive Model (APM) for Archaeological Site Mapping in Desert Area, South Morocco. Comptes Rendus Geosci. 2018, 350, 319–330.
20. Wang, Y.; Shi, X.; Oguchi, T. Archaeological Predictive Modeling Using Machine Learning and Statistical Methods for Japan and China. ISPRS Int. J. Geo-Inf. 2023, 12, 238.
21. Wang, T.; Zhang, M.; Li, Z. Explainable Machine Learning Links Erosion Damage to Environmental Factors on Gansu Rammed Earth Great Wall. npj Herit. Sci. 2025, 13, 366–385.
22. Wu, H.; Wang, X.; Wang, X.; Zhang, L.; Dong, S. Predictive Modeling for Neolithic Settlements in the Lingnan Region, South China. J. Archaeol. Sci. Rep. 2023, 49, 103992.
23. Yan, L.; Lu, P.; Chen, P.; Danese, M.; Li, X.; Masini, N.; Wang, X.; Guo, L.; Zhao, D. Towards an Operative Predictive Model for the Songshan Area during the Yangshao Period. ISPRS Int. J. Geo-Inf. 2021, 10, 217.
24. Zhu, X.; Chen, F.; Guo, H. A Spatial Pattern Analysis of Frontier Passes in China’s Northern Silk Road Region Using a Scale Optimization BLR Archaeological Predictive Model. Heritage 2018, 1, 15–32.
25. Guo, Y.; Wang, L.; Huang, J. Evolution of Toponymic Cultural Landscapes in Xinjiang’s Yulongkashi River Basin. npj Herit. Sci. 2025, 13, 428.
26. Hazra, S. Prediction of Archaeological Potential Site in Middle and Lower Course of Mayurakshi River Basin, Eastern India Using Logistic Regression Model and GIS. Herit. J. Multidiscip. Stud. Archaeol. 2020, 8, 875–890.
27. Koohpayma, J.; Makki, M.; Lentschke, J.; AlaviPanah, S.K. Predicting Potential Locations of Ancient Settlements Using GIS and Weights-of-Evidence Method (Case Study: North-East of Iran). J. Archaeol. Sci. Rep. 2021, 40, 103229.
28. Vaughn, S.; Crawford, T. A Predictive Model of Archaeological Potential: An Example from Northwestern Belize. Appl. Geogr. 2009, 29, 542–555.
29. Wang, J.; Soroush, M.; Maziar, S.; Khazraee, E. Predictive Modeling for Targeted Archaeological Survey of Arsacid Period Sites in the Iranian Borderland Region of the Araxes River Valley. J. Archaeol. Method Theory 2026, 33, 2.
30. Guechi, I.; Gherraz, H.; Korichi, A.; Alkama, D. Predicting Archaeological Sites Locations in Desert Areas, Using GIS-AHP-GeoTOPSIS Model: Southwestern Algeria, Bechar. Archaeologies 2023, 19, 471–499.
31. Jiao, M.; Li, M.; Lu, L.; Xue, X.; Dai, Z.; Wu, J. Spatiotemporal Distribution of Immovable Cultural Relics and Their Relationships with Geographic Elements in Sichuan–Chongqing Area. npj Herit. Sci. 2025, 13, 425–443.
32. Diwan, G.A. GIS-Based Comparative Archaeological Predictive Models: A First Application to Iron Age Sites in the Bekaa (Lebanon). Mediterr. Archaeol. Archaeom. 2020, 20, 143.
33. Liu, Y.; Du, X.; Bai, Y.; Chen, Q.; Liu, D. Multiple Bayesian Models Approach to Assessing Drivers of Cultural Heritage Spatial Distribution Illustrated Lushan County in China. npj Herit. Sci. 2025, 13, 409–428.
34. Wachtel, I.; Zidon, R.; Garti, S.; Shelach-Lavi, G. Predictive Modeling for Archaeological Site Locations: Comparing Logistic Regression and Maximal Entropy in North Israel and North-East China. J. Archaeol. Sci. 2018, 92, 28–36.
35. Alwi Muttaqin, L.; Heru Murti, S.; Susilo, B. MaxEnt (Maximum Entropy) Model for Predicting Prehistoric Cave Sites in Karst Area of Gunung Sewu, Gunung Kidul, Yogyakarta. In Proceedings of the Sixth Geoinformation Science Symposium, Yogyakarta, Indonesia, 26–27 August 2019.
36. Yang, J.; Luo, L.; Zhao, J.; Ji, D.; Sun, J.; Fan, J.; Fu, X.; Tu, R.; Wang, X. Explainable Artificial Intelligence with Negative Sample Optimization for Archaeological Site Prediction in Surkhandarya Uzbekistan. npj Herit. Sci. 2025, 13, 689.
37. Höhl, A.; Obadic, I.; Torres, M.Á.F.; Najjar, H.; Oliveira, D.; Akata, Z.; Dengel, A.; Zhu, X.X. Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing. IEEE Geosci. Remote Sens. Mag. 2024, 12, 261–304.
38. Li, C.; Hong, D.; Zhang, B.; Liao, T.; Yokoya, N.; Ghamisi, P.; Chen, M.; Wang, L.; Benediktsson, J.A.; Chanussot, J. Interpretable Foundation Models as Decryptors Peering into the Earth System. Innovation 2024, 5, 100682.
39. Temenos, A.; Temenos, N.; Kaselimi, M.; Doulamis, A.; Doulamis, N. Interpretable Deep Learning Framework for Land Use and Land Cover Classification in Remote Sensing Using SHAP. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5.
40. Lu, R.; Liu, S.; Duan, H.; Kang, W.; Zhi, Y. Combining the SHAP Method and Machine Learning Algorithm for Desert Type Extraction and Change Analysis on the Qinghai–Tibetan Plateau. Remote Sens. 2024, 16, 4414.
41. Rohmer, J.; Belbeze, S.; Guyonnet, D. Insights into the Prediction Uncertainty of Machine-Learning-Based Digital Soil Mapping through a Local Attribution Approach. Soil 2024, 10, 679–697.
42. Wu, R. Investigation and Research on the Han Dynasty Frontier Fortifications in the Hexi Corridor; Cultural Relics Press: Beijing, China, 2005; ISBN 978-7-5010-1756-0.
43. Eshagh, M.; Zoghi, S. Local Error Calibration of EGM08 Geoid Using GNSS/Levelling Data. J. Appl. Geophys. 2016, 130, 209–217.
44. Belloni, V.; Fugazza, D.; Hanson, K.; Scaioni, M.; Di Rita, M. Assessing Glacier Thickness Changes with Multi-Temporal UAV-Derived DEMs: The Evolution of Forni Glacier over the Period 2014–2022. Int. J. Appl. Earth Obs. Geoinf. 2025, 140, 104547.
45. Li, D.; Li, P.; Hu, J.; Bai, X.; Latifi, H.; Liu, L.; Yao, W. Mapping Catchment-Scale Soil Erosion and Deposition Using an Improved DoD Method Based on Multitemporal UAV-Borne Laser Scanning. IEEE Trans. Geosci. Remote Sens. 2025, 63, 1–13.
46. Yu, T.-K.; Chang, I.-C.; Chen, S.-D.; Chen, H.-L.; Yu, T.-Y. Predicting Potential Soil and Groundwater Contamination Risks from Gas Stations Using Three Machine Learning Models (XGBoost, LightGBM, and Random Forest). Process Saf. Environ. Prot. 2025, 199, 107249.
47. Li, H.; Zhao, J.; Yan, B.; Yue, L.; Wang, L. Global DEMs Vary from One to Another: An Evaluation of Newly Released Copernicus, NASA and AW3D30 DEM on Selected Terrains of China Using ICESat-2 Altimetry Data. Int. J. Digit. Earth 2022, 15, 1149–1168.
48. Li, W.; Li, P.; Yan, L.; Hu, J.; Wang, L.; Li, D.; Dan, Y.; Huang, L.; Zhao, G. Impacts of Spatial Resolutions of UAV-LiDAR-Derived DEMs on Erosion Modelling in the Hilly and Gully Loess Plateau. CATENA 2025, 255, 109059.
49. Carter, J.R. The Effect of Data Precision on the Calculation of Slope and Aspect Using Gridded DEMs. Cartographica 1992, 29, 22–34.
50. Zhang, X.; Guo, S.; Xia, Z.; Mu, H.; Wang, B.; Cui, B.; Fang, H.; Du, P. Integrating Multiple Terrain Features for Artefact Detection in the Newly Released TanDEM-X 30 m DEM and DCM over the Loess Plateau. Int. J. Appl. Earth Obs. Geoinf. 2025, 141, 104632.
51. Tan, L.; Wu, B.; Zhang, Y.; Zhao, S. GIS-Based Precise Predictive Model of Mountain Beacon Sites in Wenzhou, China. Sci. Rep. 2022, 12, 10773.
52. Yuan, L.; Li, Z.; Wang, Y.; Hao, Z.; Yu, C. Interactions between the Ming Yansui Great Wall Heritage and Geographical Environment via Monte Carlo Simulation. npj Herit. Sci. 2025, 13, 260.
53. Chen, F.; Zhou, W.; Xu, H.; Parcharidis, I.; Lin, H.; Fang, C. Space Technology Facilitates the Preventive Monitoring and Preservation of the Great Wall of the Ming Dynasty: A Comparative Study of the Qingtongxia and Zhangjiakou Sections in China. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5719–5729.

Figure 1. Location of the study area. ((a,b) are satellite images of two of the beacon towers).

Figure 2. Technical workflow.

Figure 3. Spatial distribution of environmental variables.

Figure 4. Terrain factor screening and optimization. (a) Results of VIF and Pearson correlation analysis for 10 environmental factors. (b) Results of VIF and Pearson correlation analysis after removing the TRI factor.

Figure 5. Visual comparison of DEM products.

Figure 6. Elevation profile comparisons between TLC DEM and different DEM products along the same transect: (a) ALOS12.5; (b) ASTER; (c) NASADEM; (d) TanDEM-X; (e) ALOS30; (f) Copernicus DEM; (g) SRTM1; (h) SRTM3.

Figure 7. DoD error maps.

Figure 8. Distribution comparison of four terrain-derived factors from different DEM products: (a) Elevation; (b) Slope; (c) Aspect; (d) TWI.

Figure 9. Prediction accuracy comparison across machine learning models.

Figure 10. SHAP feature importance summary.

Figure 11. Heatmap of machine learning AUC values across DEM products.

Figure 12. Diagram of dual-factor interaction.

Table 1. DEM data sources.

Dataset Name	Spatial Resolution	Height Type	Vertical Datum	Acquisition Sensor
ALOS PALSAR DEM	12.5 m	Ellipsoidal height	WGS84	L-band SAR (RTC/Resampled)
TanDEM-X	30 m	Ellipsoidal height	WGS84	X-band InSAR
Copernicus DEM	30 m	Orthometric height	EGM2008	X-band InSAR
SRTM1	30 m	Orthometric height	EGM96	C-band InSAR
NASADEM	30 m	Orthometric height	EGM96	C-band InSAR
ASTER GDEM	30 m	Orthometric height	EGM96	VNIR Optical Stereo
ALOS AW3D30	30 m	Orthometric height	EGM96	PRISM Optical Stereo
SRTM3	90 m	Orthometric height	EGM96	C-band InSAR

Table 2. DEM Accuracy Evaluation.

DEM Type	RMSE (m)	MAE (m)	r	R²
ALOS12.5	3.994	3.545	0.998563	0.997128
ALOS30	3.451	3.149	0.999227	0.998454
COP-DEM	2.193	1.974	0.999536	0.999072
SRTM3	3.986	3.509	0.998079	0.996162
SRTM1	3.797	3.351	0.998578	0.997157
NASADEM	2.529	2.056	0.998582	0.997166
ASTER	6.437	4.943	0.989559	0.979227
TanDEM-X	2.308	2.040	0.999365	0.998730
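
The four statistics in Table 2 can be computed from paired elevation samples as sketched below. This is an illustrative sketch, not the authors' code. It assumes that the DEM and reference elevations have already been co-registered and sampled at the same locations, and it takes R² as the square of Pearson's r, which is consistent with the tabulated values (for example, 0.999536² ≈ 0.999072). The function name `dem_accuracy` is hypothetical.

```python
import math
from typing import Dict, Sequence


def dem_accuracy(dem: Sequence[float], ref: Sequence[float]) -> Dict[str, float]:
    """Return RMSE, MAE, Pearson r, and r squared for paired elevations (m)."""
    if len(dem) != len(ref) or len(dem) < 2:
        raise ValueError("dem and ref must have equal length of at least 2")
    n = len(dem)

    # Error statistics on the elevation differences (DEM minus reference).
    diffs = [d - r for d, r in zip(dem, ref)]
    rmse = math.sqrt(sum(e * e for e in diffs) / n)
    mae = sum(abs(e) for e in diffs) / n

    # Pearson correlation between DEM and reference elevations.
    mean_d, mean_r = sum(dem) / n, sum(ref) / n
    cov = sum((d - mean_d) * (r - mean_r) for d, r in zip(dem, ref))
    var_d = sum((d - mean_d) ** 2 for d in dem)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    if var_d == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant elevations")
    r_value = cov / math.sqrt(var_d * var_r)

    # Assumption: R^2 is reported as the squared correlation coefficient.
    return {"rmse": rmse, "mae": mae, "r": r_value, "r2": r_value ** 2}
```

Because r is unaffected by a constant offset, a DEM shifted by 2 m relative to the reference would still return r = 1 while RMSE and MAE both equal 2 m. This helps explain why r exceeds 0.98 for every product in Table 2 even though RMSE ranges from 2.193 m to 6.437 m.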

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
