Article

AC-YOLOv11: A Deep Learning Framework for Automatic Detection of Ancient City Sites in the Northeastern Tibetan Plateau

1
Key Laboratory of Plateau Surface Process and Ecological Conservation, Ministry of Education, Qinghai Normal University, Xining 810016, China
2
College of Geographical Science, Qinghai Normal University, Xining 810016, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(24), 3997; https://doi.org/10.3390/rs17243997
Submission received: 23 October 2025 / Revised: 8 December 2025 / Accepted: 9 December 2025 / Published: 11 December 2025

Highlights

What are the main findings?
  • We developed a novel deep learning framework (AC-YOLOv11) integrating dual-path attention and deformable feature calibration for ancient city detection on the Tibetan Plateau.
  • The model achieved an F1-score of 94.2% and mAP@0.5 of 82.3% on the newly constructed Qinghai Lake Ancient City Dataset (QHACD), identifying 74 highly probable ancient sites.
What are the implications of the main findings?
  • This study demonstrates the feasibility of automated archaeological prospection in high-altitude environments using high-resolution satellite imagery.
  • The framework provides a scalable approach for cultural heritage mapping and environmental adaptation analysis across the Tibetan Plateau and beyond.

Abstract

Ancient walled cities represent key material evidence for early state formation and human–environment interaction on the northeastern Tibetan Plateau. However, traditional field surveys are often constrained by the vastness and complexity of the plateau environment. This study proposes an improved deep learning framework, AC-YOLOv11, to achieve automated detection of ancient city remains in the Qinghai Lake Basin using 0.8 m GF-2 satellite imagery. By integrating a dual-path attention residual network (AC-SENet) with multi-scale feature fusion, the model enhances sensitivity to faint geomorphic and structural features under conditions of erosion, vegetation cover, and modern disturbance. Training on the newly constructed Qinghai Lake Ancient City Dataset (QHACD) yielded a mean average precision (mAP@0.5) of 82.3% and F1-score of 94.2%. Model application across 7000 km² identified 309 potential sites, of which 74 were verified as highly probable ancient cities, and field investigations confirmed 3 new sites with typical rammed-earth characteristics. Spatial analysis combining digital elevation models and hydrological data shows that 75.7% of all ancient cities are located within 10 km of major rivers or the lake shoreline, primarily between 3500 and 4000 m a.s.l. These results reveal a clear coupling between settlement distribution and environmental constraints in the high-altitude arid zone. The AC-YOLOv11 model demonstrates strong potential for large-scale archaeological prospection and offers a methodological reference for automated heritage mapping on the Qinghai–Tibet Plateau.

1. Introduction

Accurate identification of ancient city sites on the northeastern Tibetan Plateau is fundamental for understanding regional settlement patterns, human–environment interactions, and cultural development at high altitudes [1]. However, the unique geomorphological conditions of this region—including sparse vegetation, severe erosion, periglacial landforms, and large areas of fragmented gravel surfaces—pose significant challenges to traditional archaeological prospection. Conventional survey methods are highly labor-intensive and often limited by the remote and harsh environment of the plateau. With the increasing availability of high-resolution satellite data, remote sensing has become an indispensable tool for site discovery, landscape reconstruction, and cultural heritage monitoring [2,3,4,5,6,7,8]. Yet, complex topography and the subtle surface expressions of archaeological features still restrict the efficiency and objectivity of manual visual interpretation [9].
In recent years, artificial intelligence has led to major advances in automatic archaeological feature detection [4,10]. Deep learning, especially convolutional neural networks (CNNs), has demonstrated strong potential in recognizing earthworks, burial mounds, qanat systems, charcoal kilns, defensive structures, and other anthropogenic features from LiDAR, multispectral, and very-high-resolution optical imagery [11,12,13,14,15,16]. Studies have shown that automated detection can significantly improve the efficiency, repeatability, and spatial coverage of archaeological prospection, enabling large-scale detection across forested, arid, and mountainous regions [17,18,19,20]. Among various algorithms, the YOLO (You Only Look Once) family of object detectors has become one of the most widely used frameworks because of its balance between detection accuracy and computational efficiency [21]. Improved versions such as YOLOv5, YOLOv7, and YOLOv8 have been successfully applied to cultural heritage mapping and LiDAR-based archaeological detection, providing a promising foundation for automated site-recognition pipelines [22,23,24,25,26].
Despite these advances, the northeastern Tibetan Plateau remains understudied in terms of AI-assisted archaeological detection. Compared with typical low-altitude regions, ancient city sites on the plateau often display weak spectral contrast, irregular outlines, multi-scale degradation, and partial burial by aeolian and fluvial sediments. These characteristics reduce the transferability of existing models and lead to frequent false positives when applying conventional YOLO architectures. Moreover, the lack of annotated training samples from high-altitude archaeological contexts further restricts model generalization. As a result, there is still a need for a task-specific detection framework that can accommodate the unique geomorphological and spectral signatures of mountain-plateau archaeological features [27].
To address these challenges, this study proposes AC-YOLOv11, an enhanced deep learning framework designed for automatic detection of ancient city sites using GF-2 and Sentinel-2 imagery. The model integrates improved feature-fusion modules, deformable convolution, and attention mechanisms to enhance sensitivity to subtle linear and polygonal archaeological structures. A multi-source remote sensing dataset was constructed, incorporating field-verified ancient city sites, multi-temporal samples, and negative samples representing complex backgrounds typical of the Qinghai Lake basin. We further optimized anchor assignments and introduced adaptive spatial-context encoding to reduce misclassification in gravel-desert and eroded-terrace environments. Experimental results demonstrate that AC-YOLOv11 significantly outperforms baseline models in precision, recall, and robustness, especially for small and degraded archaeological features. The integration of deep learning with high-resolution imagery provides a practical pathway for large-scale archaeological prospection and heritage protection strategies on the Tibetan Plateau.
This study not only contributes a new methodological framework for site detection in high-altitude regions but also provides a reproducible workflow integrating remote sensing, AI modeling, and archaeological validation [28]. The results highlight the potential of advanced deep learning tools for discovering and monitoring cultural heritage in challenging environments and offer an important reference for future archaeological research and heritage management on the Tibetan Plateau and beyond.

2. Study Area Overview

The Qinghai Lake Basin lies on the northeastern margin of the Qinghai–Tibet Plateau (97°50′–101°20′E, 35°19′–38°20′N), covering approximately 5.6 × 10⁴ km² and representing China’s largest inland saline lake basin. The terrain exhibits a northwest–southeast gradient in elevation, ranging from about 2600 to 5700 m a.s.l., and includes diverse geomorphic units such as mountains, river valleys, lakeshore plains and terraces. The basin is characterized by a typical cold, high-altitude semi-arid climate: mean annual temperature is approximately −4 °C, and precipitation ranges from 350 to 450 mm, while evaporation greatly exceeds precipitation, forming a classic inland closed dry environment [29]. Vegetation is dominated by alpine meadows and shrublands, which play a vital role in soil retention and constrain the development of ancient agro-pastoral activities. Geographically, the basin sits at the intersection of the East Asian monsoon region, the northwestern arid belt and the high-altitude cold zone, serving as a key node between the Yellow River source region and the Hexi Corridor. This unique setting has endowed the basin with central significance for Holocene human migration and civilizational exchange [30].
Archaeological and chronological evidence indicate that since the Neolithic period, the basin has functioned as an important agro-pastoral transition zone, with settlements mainly distributed along rivers and lakeshores, reflecting strong human dependence on water resources and pastures [31,32]. During historical times, the region evolved into a frontier interaction zone for the Qiang, Xianbei, Tuyuhun, Tubo and Central Plains dynasties, where numerous ancient city sites bear witness to the development of the northern route of the Silk Road [33,34]. Under the combined influence of natural environment and human activities, the distribution of ancient cities in the Qinghai Lake Basin presents a distinct “lakeshore–riparian” pattern: lakeshore plains and river valleys provided favorable conditions for agriculture and animal husbandry, while at the same time offering obvious advantages in defense and transportation.
Climatic and palaeo-environmental records further show that water-level fluctuations of Qinghai Lake responded closely to regional precipitation and temperature variability, highlighting its sensitivity to climate change [35,36]. Landscape-pattern analyses and remote sensing reconstructions reveal that human settlement expansion generally correlated with environmental stability and resource accessibility [37,38]. Overall, the Qinghai Lake Basin serves as a critical window for understanding human–environment interactions on the Tibetan Plateau and for deciphering the spatial patterns of Silk Road traffic networks in the highlands [39]. In this study, a distribution map of ancient city sites in the Qinghai Lake Basin is compiled, marking the main known sites and their geographical locations (Figure 1).

3. Materials and Methods

3.1. Sources and Processing of Ancient City Site Data

The ancient city inventory used in this study was primarily derived from published archaeological survey reports, historical–geographical records and regional gazetteers, and was cross-checked with modern toponyms and geomorphic clues [40]. Data compilation was based on open-access sources, including Atlas of Chinese Cultural Relics: Qinghai Volume and Studies on Ancient Cities in Qinghai (Qinghai Gucheng Kaobian), from which the names, spatial descriptions and coordinate information of sites within the Qinghai Lake Basin were systematically extracted. In total, 37 ancient city sites were identified. To ensure spatial accuracy and consistency, a multi-stage verification process was implemented. (1) Text–image cross-validation: directional semantics (e.g., “near the river,” “on the terrace,” “by the mountain pass”) described in the literature were matched to high-resolution remote sensing imagery. (2) Topographic consistency testing: DEM-derived slope, curvature and shaded-relief layers were analyzed to confirm whether each site was situated in a geomorphically plausible context, such as valley terraces, lacustrine plains or corridor nodes. All spatial data were unified under the WGS 1984 coordinate reference system and converted to standardized vector layers containing fields for site ID, Chinese and English names, cultural period, longitude, latitude, city shape and area. This integration of archaeological literature and remote sensing-based geospatial verification provides a robust and reproducible dataset for subsequent modeling and pattern analysis.

3.2. Remote Sensing Data and Preprocessing

The GF-2 satellite imagery used in this study was provided by the Qinghai Provincial Remote Sensing Center for Natural Resources (https://www.qhgfrs.cn/publiccms/, accessed on 4 June 2025). GF-2 data comprise four multispectral bands (red, green, blue, and near-infrared) with a spatial resolution of 3.2 m and a panchromatic band with 0.8 m resolution. Prior to analysis, all imagery underwent standard preprocessing in ENVI 5.6 (Harris Geospatial Solutions, Broomfield, CO, USA), including radiometric calibration, atmospheric correction, and orthorectification of both multispectral and panchromatic scenes. Subsequently, a nearest-neighbor diffusion-based pansharpening algorithm was applied to fuse the multispectral and panchromatic bands, yielding pan-sharpened products at 0.8 m spatial resolution [41,42,43,44]. This approach preserves spectral fidelity while enhancing spatial detail, which is critical for distinguishing fine geomorphic and archaeological features. Finally, the imagery was clipped and mosaicked to the Qinghai Lake Basin extent using a vector boundary mask derived from 1:100,000 topographic data. The resulting dataset provides high-resolution, radiometrically consistent coverage suitable for archaeological feature extraction and machine-learning-based detection [45].

3.3. Construction of the Ancient City Dataset (QHACD)

The QHACD was constructed from GF-2 and Sentinel-2 imagery, using the site inventory described in Section 3.1 and the data preprocessing steps described in Section 3.2. The pan-sharpening, geometric correction, and cloud masking operations in Section 3.2 ensured that all input data shared consistent spatial resolution and radiometric quality, which was essential for accurate sample annotation and subsequent model training. After preprocessing, ancient city sites were manually digitized based on GF-2 pan-sharpened images, while negative samples were extracted from non-archaeological areas with similar spectral or morphological characteristics. These procedures produced a standardized dataset suitable for training and evaluating AC-YOLOv11. For each site, image subsets were extracted within a 1.0–2.5 km window centered on the site, depending on city scale and topographic setting. Each subset was divided into uniform 450 m × 450 m (562 × 562 pixels) image tiles, covering a total ground area of approximately 57,372 km². Manual interpretation and annotation were carried out to delineate “Ancient City” as a single object category. Bounding boxes were drawn to cover the complete extent of city walls and associated features. For sites with fragmentary boundaries or dual enclosures, multiple bounding boxes were applied. To minimize false positives, negative samples were also collected from areas with man-made geometric structures resembling city forms (such as modern settlements, agricultural grids, salt pans, industrial fences and quarry boundaries), maintaining a positive/negative ratio of 1:3 (Figure 2).
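As a rough check on the tiling figures, the arithmetic can be sketched in Python. The sketch assumes the 0.8 m pan-sharpened pixel size and the 283,320-tile count reported for the study area; the variable names are illustrative and are not taken from the original workflow.

```python
# Tiling arithmetic, assuming 0.8 m pixels and 450 m x 450 m tiles.
PIXEL_SIZE_M = 0.8       # GF-2 pan-sharpened resolution
TILE_SIDE_M = 450.0      # ground length of one tile edge
N_TILES = 283_320        # tile count reported for the study area

# 450 / 0.8 = 562.5, truncated to a whole number of pixels.
tile_side_px = int(TILE_SIDE_M / PIXEL_SIZE_M)

# Ground area of a single tile, in square kilometres.
tile_area_km2 = (TILE_SIDE_M / 1000.0) ** 2

# Total ground area covered by all tiles.
total_area_km2 = N_TILES * tile_area_km2

print(tile_side_px, round(tile_area_km2, 4), round(total_area_km2, 1))
```

Under these assumptions each tile covers 0.2025 km² and the tiles together cover about 57,372 km², which agrees with the total quoted above.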
In total, 283,320 original tiles were generated across the study area, among which 37 contained verified ancient city instances. To mitigate overfitting and class imbalance inherent to the limited sample size, multiple data augmentation strategies were applied:
  • Geometric augmentation: random rotations (0–360°), horizontal and vertical flips, scaling (0.5–1.5×), random cropping and translation;
  • Radiometric augmentation: ±20% variation in brightness, contrast and saturation [46,47];
  • Multi-temporal expansion: incorporating GF-2 imagery acquired in different seasons and Sentinel-2 scenes from spring, summer and autumn to account for variations in vegetation cover, surface reflectance and snow;
  • Synthetic augmentation: employing Mosaic and MixUp strategies to preserve semantic integrity while diversifying training samples (Figure 3) [48,49,50].
After augmentation, the number of positive samples increased from 37 to 296. A random 20% subset of the enhanced dataset was manually cross-validated by both archaeology and remote sensing specialists, and samples with ambiguous or indistinct boundaries were discarded. The final Qinghai Lake Ancient City Dataset (QHACD) comprises 296 positive and 888 negative samples, totalling 1184 images, and serves as the foundation for subsequent model training and validation.
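Geometric augmentation requires the bounding-box labels to be transformed together with the pixels. The following is a minimal sketch, assuming labels are stored as normalized YOLO-style boxes (x_center, y_center, width, height) with the origin at the top-left corner; the storage format and the helper names are assumptions, not details given in the text. The last lines re-derive the sample counts reported above.

```python
def hflip(box):
    """Mirror a normalized box (xc, yc, w, h) left-to-right."""
    xc, yc, w, h = box
    return (1.0 - xc, yc, w, h)


def vflip(box):
    """Mirror a normalized box top-to-bottom."""
    xc, yc, w, h = box
    return (xc, 1.0 - yc, w, h)


def rot90_ccw(box):
    """Rotate the tile 90 degrees counter-clockwise; width and height swap."""
    xc, yc, w, h = box
    return (yc, 1.0 - xc, h, w)


# Hypothetical label for one ancient city inside a tile.
box = (0.25, 0.5, 0.125, 0.25)
flipped = hflip(box)

# Four quarter-turns should return the original label.
rotated = box
for _ in range(4):
    rotated = rot90_ccw(rotated)

# Sample counts reported in the text.
positives_before, positives_after = 37, 296
augmentation_factor = positives_after // positives_before  # copies per site
negatives = 3 * positives_after                            # 1:3 ratio
total_images = positives_after + negatives
print(flipped, rotated, augmentation_factor, negatives, total_images)
```

The counts are internally consistent: 37 sites expanded eightfold give 296 positives, the 1:3 ratio gives 888 negatives, and the total is 1184 images.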

3.4. Model Improvement Based on YOLOv11

To address the challenges posed by the blurred boundaries, weak linear textures, and fragmented or morphologically complex structures of ancient city remains in GF-2 remote sensing imagery of the Qinghai Lake Basin, this study proposes an enhanced deep residual architecture named AC-SENet (Ancient City Squeeze-and-Excitation Network) [51,52]. The AC-SENet replaces the original backbone of the YOLOv11 detector, aiming to strengthen the extraction of fine morphological features such as city walls, corner junctions, and moat edges. Built upon the ResNet-152 framework, AC-SENet integrates a Dual-Path Attention Block (DPAB) that jointly models channel-wise and spatial dependencies, alongside an adaptive multi-scale pooling strategy to enhance selective feature activation during the feature-fusion process [53,54,55]. These modifications enable the model to capture discriminative geometric information even under variable illumination, vegetation cover, and terrain complexity—conditions typical of high-altitude archaeological landscapes [56,57]. Compared with the baseline YOLOv11 configuration, the proposed AC-SENet introduces dual-path excitation and deformable calibration to expand the receptive field adaptively and emphasize structural continuity across broken walls and irregular urban perimeters [58,59]. The resulting AC-YOLOv11 architecture achieves improved robustness in detecting low-contrast, partially eroded, or geomorphically disturbed ancient sites—laying the foundation for high-precision archaeological mapping over the Qinghai–Tibet Plateau.

3.4.1. Compression Operation of AC-SENet

In the feature-extraction process of ancient city remote sensing imagery, the compression operation aims to achieve efficient feature dimensionality reduction and channel re-weighting, thereby enhancing the model’s sensitivity to key structural components such as city walls, corner junctions and moat edges, as well as the linear geometry of rectangular enclosures. Unlike the conventional Squeeze-and-Excitation Network (SENet), which relies solely on Global Average Pooling (GAP) for feature compression, the proposed AC-SENet introduces a dual-modal pooling strategy that combines Soft Pooling and Global Max Pooling (GMP) to mitigate the loss of locally salient information inherent to GAP [60].
Ancient city features in high-resolution satellite imagery often manifest as localized high-response regions (e.g., compacted-earth wall edges or angular corner nodes) whose responses may be suppressed by GAP’s averaging operation. To address this issue, Soft Pooling is adopted to dynamically weight each pixel’s contribution through an exponential attention mechanism. Given an input feature map $U \in \mathbb{R}^{H \times W \times C}$, the importance weight $w_i$ of each feature value $x_i$ is computed using a Softmax function:
$$w_i = \frac{e^{x_i}}{\sum_{j \in U} e^{x_j}}$$
Based on this weight distribution, a channel-wise descriptor vector is produced as:
$$y_c = \sum_{i \in U} w_i \times x_i$$
This formulation amplifies highly responsive local features via the exponential operator, enabling the network to concentrate on prominent archaeological morphologies such as ramparts and angular boundaries.
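The effect of the exponential weighting can be illustrated with a small sketch that applies the two formulas above to a single channel and compares the result with GAP. The activation values are invented for illustration, and subtracting the maximum before exponentiation is a standard numerical-stability step that is not mentioned in the text.

```python
import math


def soft_pool(values):
    """Softmax-weighted sum of one channel's activations (y_c in the text)."""
    m = max(values)  # shift by the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return sum((e / total) * v for e, v in zip(exps, values))


def global_avg_pool(values):
    """Plain average of one channel's activations (GAP)."""
    return sum(values) / len(values)


# Hypothetical channel: mostly weak background plus one strong wall-edge response.
channel = [0.1, 0.2, 0.1, 3.0]
y_soft = soft_pool(channel)
y_gap = global_avg_pool(channel)
print(round(y_soft, 3), round(y_gap, 3))
```

For this toy channel the soft-pooled descriptor lies well above the average and below the maximum, which is the behaviour the text attributes to Soft Pooling.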
To further compensate for the reduced sensitivity of soft pooling to sparse or weakly preserved traces, typical of partially eroded sites, the AC-SENet incorporates GMP in parallel, which retains the maximum activation within each channel, $\max(U_c)$, generating a descriptor $Z_{\max} \in \mathbb{R}^{1 \times 1 \times C}$. This addition enhances spatial invariance and improves robustness against illumination variation and discontinuous boundaries [61]. Finally, the outputs of the two pooling branches are fused via channel-wise concatenation followed by adaptive weighting:
$$Z_{\text{fused}} = \alpha \cdot y_c + (1 - \alpha) \cdot Z_{\max}, \quad \alpha \in [0, 1]$$
where $\alpha \in [0, 1]$ is a learnable parameter optimized during back-propagation.
The fused representation simultaneously preserves the fine-grained sensitivity of soft pooling and the noise-resilient global response of max pooling, achieving superior feature compression under low-contrast or topographically complex conditions—typical of archaeological landscapes in the Qinghai–Tibet Plateau.
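A minimal sketch of the adaptive weighting step follows. The text does not say how the learnable weight is kept inside [0, 1]; the sketch assumes it is the sigmoid of an unconstrained parameter, which is one common choice. The function name and the descriptor values are illustrative only.

```python
import math


def fuse_descriptors(y_soft, z_max, alpha_logit):
    """Blend soft-pooled and max-pooled channel descriptors.

    alpha_logit is an unconstrained scalar; the sigmoid maps it to (0, 1).
    This parameterization is an assumption, not taken from the paper.
    """
    alpha = 1.0 / (1.0 + math.exp(-alpha_logit))
    return [alpha * y + (1.0 - alpha) * z for y, z in zip(y_soft, z_max)]


# Hypothetical two-channel descriptors.
y_soft = [1.0, 2.0]
z_max = [3.0, 4.0]

balanced = fuse_descriptors(y_soft, z_max, 0.0)      # alpha = 0.5
soft_heavy = fuse_descriptors(y_soft, z_max, 50.0)   # alpha close to 1
print(balanced, soft_heavy)
```

With a logit of zero both branches contribute equally; as the logit grows the fused descriptor approaches the soft-pooled branch.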

3.4.2. Excitation Operation of AC-SENet

Ancient city remains in remote sensing imagery typically appear as low-contrast, locally high-reflectance, and structurally heterogeneous objects, where subtle variations in wall edges, corner junctions, and moat shadows play decisive roles in visual recognition. To enhance sensitivity to these morphological cues while suppressing background noise, the Dual-Path Excitation Block (DPEB) is embedded within the backbone of AC-SENet to achieve dynamic channel recalibration and spatial modulation in a coordinated manner. The detailed structure of the DPEB is illustrated in Figure 4.
Given the compressed feature descriptor $Z \in \mathbb{R}^{1 \times 1 \times C}$ obtained from the preceding compression stage, the DPEB first performs channel re-weighting. The input vector is passed through a fully connected layer with parameters $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ ($r = 16$), producing an intermediate representation $Z_{mid} \in \mathbb{R}^{1 \times 1 \times C/r}$, followed by a LeakyReLU activation (slope = 0.2) to alleviate gradient vanishing and preserve informative negative activations. A subsequent expansion layer $W_2 \in \mathbb{R}^{C \times C/r}$ restores dimensionality, giving $A_{channel} \in \mathbb{R}^{1 \times 1 \times C}$. The channel activation vector is normalized through a Hard-Sigmoid gating function to constrain values to the range $[0, 1]$, balancing computational efficiency with non-linear expressiveness [62]. The resulting weight vector $S \in \mathbb{R}^{1 \times 1 \times C}$ is then applied to the feature map $U \in \mathbb{R}^{H \times W \times C}$ via elementwise multiplication, selectively amplifying channels representing ramparts, corners and other structural edges, while suppressing those dominated by agricultural grids, modern infrastructure or shadow artifacts.
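The channel path described above can be sketched in plain Python. Several details are assumptions made for illustration: the Hard-Sigmoid is taken as clip((x + 3) / 6, 0, 1), the form used by PyTorch's `nn.Hardsigmoid`; bias terms are omitted; and the toy example uses C = 4 with r = 2 rather than r = 16 so that the matrices stay readable. The weights are invented, not learned.

```python
def leaky_relu(x, slope=0.2):
    """LeakyReLU with the slope stated in the text."""
    return x if x > 0 else slope * x


def hard_sigmoid(x):
    """Assumed piecewise-linear gate: clip((x + 3) / 6, 0, 1)."""
    return min(1.0, max(0.0, (x + 3.0) / 6.0))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def channel_gate(z, W1, W2):
    """Reduce C -> C/r, apply LeakyReLU, expand C/r -> C, gate to [0, 1]."""
    z_mid = [leaky_relu(a) for a in matvec(W1, z)]
    a_channel = matvec(W2, z_mid)
    return [hard_sigmoid(a) for a in a_channel]


def rescale(U, s):
    """Scale each channel of U (C x H x W nested lists) by its gate value."""
    return [[[s_c * px for px in row] for row in ch] for ch, s_c in zip(U, s)]


# Toy example with C = 4 channels and reduction ratio r = 2.
z = [1.0, -2.0, 0.5, 3.0]
W1 = [[0.5, 0.0, 0.0, 0.5],
      [0.0, 1.0, 0.0, 0.0]]
W2 = [[1.0, 0.0],
      [0.0, 1.0],
      [3.0, 0.0],
      [-3.0, 0.0]]
S = channel_gate(z, W1, W2)

U = [[[1.0, 2.0]], [[1.0, 2.0]], [[1.0, 2.0]], [[1.0, 2.0]]]  # C=4, H=1, W=2
U_recalibrated = rescale(U, S)
print(S)
```

In this toy case the third channel passes through unchanged (gate 1.0) and the fourth is suppressed entirely (gate 0.0), which mirrors the selective amplification and suppression described in the text.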
To further capture the fine-grained spatial patterns of moats and eroded foundations, the DPEB integrates a spatial attention branch. The recalibrated feature map $U_{recalibrated}$ is convolved with a depthwise separable convolution (3 × 3) to generate a spatial attention map $M_{spatial} \in \mathbb{R}^{H \times W}$: $M_{spatial} = \sigma(f_{DWConv}(U_{recalibrated}))$, where $f_{DWConv}$ denotes depthwise convolution and $\sigma$ is the sigmoid activation function. The attention map is multiplied elementwise with $U_{recalibrated}$ to form the spatially modulated output $U_{DPEB}$. This dual-path excitation mechanism allows AC-SENet to dynamically adapt to heterogeneous spectral–spatial conditions and degraded archaeological morphologies, substantially improving robustness against erosion, agricultural disturbance and surface sediment coverage [63].
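The spatial branch can be sketched in the same style. A depthwise convolution alone returns one map per channel, so the sketch assumes that the "separable" part is a 1 × 1 pointwise projection that collapses the C channels into a single H × W map before the sigmoid; zero padding and stride 1 are also assumptions. Kernels and inputs are toy values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def depthwise_conv3x3(U, kernels):
    """Apply one 3x3 kernel per channel, zero padding, stride 1."""
    out = []
    for ch, k in zip(U, kernels):
        H, W = len(ch), len(ch[0])
        res = [[0.0] * W for _ in range(H)]
        for i in range(H):
            for j in range(W):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < H and 0 <= jj < W:
                            acc += k[di + 1][dj + 1] * ch[ii][jj]
                res[i][j] = acc
        out.append(res)
    return out


def spatial_attention(U, kernels, pointwise):
    """Return the attention map M (H x W) and the modulated features."""
    dw = depthwise_conv3x3(U, kernels)
    H, W = len(U[0]), len(U[0][0])
    # Assumed pointwise (1x1) projection from C channels to one map.
    M = [[sigmoid(sum(p * dw[c][i][j] for c, p in enumerate(pointwise)))
          for j in range(W)] for i in range(H)]
    out = [[[M[i][j] * ch[i][j] for j in range(W)] for i in range(H)]
           for ch in U]
    return M, out


# Toy input: two 3x3 channels with a single strong response in the centre.
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
U = [[[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]],
     [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
M, U_dpeb = spatial_attention(U, [identity, identity], [1.0, 1.0])
print(round(M[1][1], 4), M[0][0])
```

With identity kernels the attention map is simply the sigmoid of the summed channels, so the centre pixel receives the largest weight and background pixels sit at 0.5.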

3.4.3. Adjustment Operation of AC-SENet

Based on the multi-scale feature maps generated from the excitation module, the adjustment operation dynamically redistributes weights and fuses features across scales to optimize the model’s adaptability to eroded, incomplete, and morphologically irregular city structures, while enhancing the detection specificity of corners and linear fortification edges. The output feature maps $\{X_1, X_2, X_3\} \in \mathbb{R}^{1 \times 1 \times C}$ correspond to different spatial scales of city-wall structures. Each feature set is processed through a lightweight 1 × 1 convolution to generate scale-wise weight vectors $W \in \mathbb{R}^{3 \times C}$. A channel-wise Softmax normalization is then applied to dynamically evaluate the contribution of each scale to the current channel response. The high-resolution feature $X_1$ primarily governs corner and short-edge localization, whereas the low-resolution feature $X_3$ emphasizes global rectangular contours. The weighted fusion process can be formulated as:
$$X_{\text{fused}} = \sum_{k=1}^{3} w_{k,c} \cdot X_k$$
where $w_{k,c}$ denotes the normalized weight of the $k$-th scale for channel $c$. Given that most ancient city walls in the Qinghai Lake Basin have undergone severe morphological degradation due to erosion, cultivation, and modern construction, fixed-kernel convolutions are often incapable of accurately capturing their non-rigid and discontinuous boundaries. To address this limitation, a Deformable Feature Calibration (DFC) mechanism (Figure 5) is introduced, which learns sampling offsets $\Delta p$ to dynamically adjust the receptive field of the convolution [64,65]. This allows the model to adaptively represent irregular geometric structures such as collapsed or partially eroded city perimeters while maintaining overall spatial consistency. The calibrated feature map $X_{\text{calibrated}}$ is subsequently merged with the original input $X_{\text{input}}$ through a residual connection, ensuring gradient stability and accelerating convergence:
$$X_{\text{final}} = X_{\text{calibrated}} + X_{\text{input}}$$
This residual fusion preserves shallow geometric cues while enhancing high-level semantic abstraction. Consequently, the AC-SENet adjustment module effectively balances detail sensitivity and structural robustness, yielding improved detection accuracy for ancient city sites under complex topographic and illumination conditions in the Qinghai–Tibet Plateau [66].
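The scale-wise Softmax fusion and the residual merge can be sketched as follows. The scale scores would normally come from the 1 × 1 convolution; here they are supplied directly as a hypothetical 3 × C table. The deformable calibration step itself is not implemented, so the calibrated features in the example are placeholder values.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a short list."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def fuse_scales(X, logits):
    """X: K scale descriptors of length C; logits: K x C scale scores.

    For each channel c, the K scores are normalized with a softmax and
    used to weight the K scale responses.
    """
    K, C = len(X), len(X[0])
    fused = []
    for c in range(C):
        w = softmax([logits[k][c] for k in range(K)])
        fused.append(sum(w[k] * X[k][c] for k in range(K)))
    return fused


def residual(x_calibrated, x_input):
    """Elementwise residual merge: X_final = X_calibrated + X_input."""
    return [a + b for a, b in zip(x_calibrated, x_input)]


# Hypothetical descriptors for three scales and two channels.
X = [[1.0, 0.0], [2.0, 0.0], [3.0, 6.0]]
equal_logits = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
x_fused = fuse_scales(X, equal_logits)

# Placeholder standing in for the output of the deformable calibration.
x_calibrated = [0.5, -0.5]
x_final = residual(x_calibrated, x_fused)
print(x_fused, x_final)
```

Equal scores reduce the fusion to a plain average across scales; unequal scores shift each channel toward the scale with the highest score.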

3.4.4. Integration of AC-SENet with the YOLOv11 Framework

After the optimization of AC-SENet, this network was integrated as the backbone of the YOLOv11 (https://github.com/ultralytics/ultralytics) detection framework to construct the AC-YOLOv11 (Ancient-City YOLOv11) model. In this configuration, the original YOLOv11 backbone was replaced by AC-SENet, while the Neck (PANet) and Head (Detection Head) structures were retained to ensure the stability of the detection pipeline. The overall architecture is illustrated in Figure 6. The AC-SENet extracts multi-scale hierarchical features (C3, C4, C5) from the pan-sharpened GF-2 imagery, which are then fused by the Path Aggregation Network (PANet) to enhance the flow of semantic and localization information across different feature levels. The resulting fused features are fed into the YOLOv11 detection head, where anchor-free detection layers predict bounding boxes and confidence scores through decoupled classification and regression branches.
The purpose of integrating AC-SENet into the YOLOv11 framework is to enhance multi-scale feature extraction, strengthen the representation of weak linear and polygonal structures, and reduce false detections in complex geomorphological backgrounds. The dual-path attention mechanism is designed to improve the response to rectangular fortification outlines, corner transitions, and trench-like edges, while the deformable convolution module provides adaptive feature calibration for irregular wall remnants and partially degraded enclosures [67]. In addition, the deep residual structure inherited from ResNet-152 offers strong representational capacity under small-sample conditions typical of archaeological remote sensing tasks. The comparative performance of AC-YOLOv11 and the baseline YOLOv11 is presented in Section 4.

4. Comparative Analysis and Results

4.1. Training Environment and Parameter Configuration

All model training and testing were performed on a local workstation environment running 64-bit Windows 11 (Microsoft Corporation, Redmond, WA, USA). The experiments were implemented using PyTorch 2.1.0 (https://pytorch.org) with the Python 3.9 programming language (https://www.python.org) [68,69]. The hardware configuration consisted of an NVIDIA GeForce RTX 4070 Ti GPU (NVIDIA Corporation, Santa Clara, CA, USA), with CUDA 11.2 support (NVIDIA Corporation, Santa Clara, CA, USA), and the entire workflow was managed within PyCharm 2023.3.7 (JetBrains s.r.o., Prague, Czech Republic).
To ensure the comparability and reproducibility of the results, no pre-trained weights were used, and all hyperparameters were kept fixed across runs. The dataset was divided into training and validation subsets in an 8:2 ratio, and model training was conducted using the parameter settings listed in Table 1.
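An 8:2 split of the 1184-image dataset can be sketched as below. The text does not describe how the split was drawn, so the seeded shuffle and the function name are assumptions made for illustration.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle with a fixed seed and cut into training and validation lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split repeatable
    cut = int(round(train_fraction * len(items)))
    return items[:cut], items[cut:]


# Image indices stand in for the 1184 QHACD samples.
train_ids, val_ids = split_dataset(range(1184))
print(len(train_ids), len(val_ids))
```

Under these assumptions the split yields 947 training and 237 validation images.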

4.2. Evaluation Metrics

The model performance was evaluated using five standard metrics: loss function, precision, recall, mean average precision (mAP), and the F1 score (Table 2) [70,71,72].
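For reference, the standard definitions behind these metrics can be sketched in a few lines. The formulas are the usual ones for object detection; the function names, the corner-format boxes (x1, y1, x2, y2), and the example counts are assumptions. The averaging over recall levels needed for a full mAP computation is omitted.

```python
def precision(tp, fp):
    """Fraction of predicted boxes that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of ground-truth objects that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical counts: 90 true positives, 10 false positives, 30 misses.
p = precision(90, 10)
r = recall(90, 30)
f1 = f1_score(p, r)

# At mAP@0.5 a prediction counts as correct when its IoU with a
# ground-truth box is at least 0.5.
overlap = iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
print(p, r, round(f1, 4), round(overlap, 4))
```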

4.3. Ablation Experiments

To verify the effectiveness of the proposed modules for ancient city detection, we conducted a series of ablation experiments on the Qinghai Lake ancient city remote sensing dataset (QHACD). The original YOLOv11 model served as the baseline. Using a controlled-variable approach, we incrementally introduced: (1) a ResNet-152 backbone, (2) a standard SE attention module, (3) the proposed Dual-Path Excitation Block (DPEB), and (4) the Deformable Feature Calibration mechanism. All experiments employed identical hyperparameters and data splits to ensure fair comparison [73]. Training settings were kept consistent: input image size = 450 × 450, initial learning rate = 0.001, optimizer = Adam, batch size = 16, and total epochs = 200. Evaluation metrics included Precision, Recall, F1-score, mAP@0.5 and mAP@0.5:0.95.
As shown in Table 3, the baseline YOLOv11 demonstrates moderate performance (Precision = 88.6%, Recall = 83.1%, mAP@0.5 = 78.5%). Replacing the backbone with ResNet-152 yields overall improvements, particularly in Recall and mAP. Incorporation of a standard SE attention module further increases Precision to 94.5%, indicating that channel attention enhances feature representation. However, SE alone offers limited robustness under complex terrain and noisy backgrounds. By contrast, the proposed DPEB achieves a more balanced enhancement across channel and spatial dimensions, yielding a Precision of 97.5%, Recall of 91.2%, F1 = 94.2%, and mAP@0.5/mAP@0.5:0.95 of 82.3% and 74.9%, respectively. These results demonstrate that the DPEB significantly improves the model’s ability to detect linear edges and corner features of ancient city remains in complex GF-2 imagery, validating the overall effectiveness of the AC-YOLOv11 architecture.
Compared with the baseline YOLOv11, integrating the DPEB module leads to a 9.9% relative increase in F1-score (from 85.7% to 94.2%) and a 3.8 percentage-point gain in mAP@0.5 (from 78.5% to 82.3%), indicating substantial improvements in both detection accuracy and robustness.
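The two kinds of gain quoted here differ in definition: the F1 figure is relative to the baseline value, whereas the mAP figure is a simple difference in percentage points. A short check using the values above (helper names are illustrative):

```python
def absolute_gain(old, new):
    """Difference in percentage points."""
    return new - old


def relative_gain(old, new):
    """Change expressed as a percentage of the old value."""
    return (new - old) / old * 100.0


f1_relative = relative_gain(85.7, 94.2)   # F1-score, baseline vs. proposed
f1_points = absolute_gain(85.7, 94.2)
map_points = absolute_gain(78.5, 82.3)    # mAP@0.5
print(round(f1_relative, 1), round(f1_points, 1), round(map_points, 1))
```

The same F1 improvement therefore corresponds to 8.5 percentage points in absolute terms.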

4.4. Model Comparison Experiments

To further validate the effectiveness of the proposed AC-YOLOv11 model for ancient city detection, we conducted a systematic comparison with several state-of-the-art object detection frameworks, including YOLOv3, YOLOv4, YOLOv7, and the lightweight EfficientDet-D3 [74]. All models were trained and evaluated under identical hyperparameter and data-split configurations to ensure fairness and comparability [75,76]. Evaluation metrics included Precision, Recall, F1-score, mAP@0.5, and mAP@0.5:0.95. As shown in Table 4, conventional YOLO variants exhibit stable performance in ancient city detection, with YOLOv7 outperforming earlier versions in terms of mean average precision [10]. EfficientDet-D3 demonstrates superior computational efficiency and inference speed but suffers from reduced accuracy under complex terrain, vegetation cover, and shadow interference [77]. In contrast, the proposed AC-YOLOv11 achieves the best results across all metrics, with Precision = 97.5%, Recall = 91.2%, F1 = 94.2%, mAP@0.5 = 82.3%, and mAP@0.5:0.95 = 74.9%. These outcomes clearly indicate that AC-YOLOv11 provides more accurate detection of city-wall segments, corners, and moat features in high-resolution GF-2 imagery, maintaining robustness even in low-contrast or topographically complex environments.
Compared with the best baseline (YOLOv7), AC-YOLOv11 yields a +7.4 percentage-point gain in Precision, +3.1 points in Recall, and a +2.9-point improvement in mAP@0.5, highlighting its superior balance between accuracy and generalization.

4.5. Full-Scale Evaluation and Model Performance

4.5.1. Training and Validation Losses

To comprehensively assess the performance of the proposed AC-YOLOv11 model in ancient city detection, we conducted full-scale training and evaluation on the QHACD. The model was trained using 450 × 450 input patches, the Adam optimizer, and 200 epochs. Both training and validation losses exhibit stable, monotonic convergence without signs of overfitting.
Specifically, the training losses (train/box_loss, train/cls_loss, and train/dfl_loss) decreased from 3.65/7.24/3.79 at epoch 1 to 0.98/0.61/1.89 at epoch 200 (Figure 7). Correspondingly, the validation losses (val/box_loss, val/cls_loss, and val/dfl_loss) declined from 3.62/6.46/4.37 to 1.21/0.70/2.04. The most rapid loss reduction occurred within the first 40 epochs, followed by a gradual convergence phase, stabilizing after approximately 100 epochs. The learning rate decayed monotonically from 6.75 × 10⁻² to 1.50 × 10⁻⁵, consistent with the smooth convergence of loss curves, confirming the model’s stable optimization process.
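To make the convergence figures easier to compare, the following sketch computes the relative reduction of each loss component and the final gap between validation and training losses, using only the epoch-1 and epoch-200 values quoted above. Reading a small final gap as a sign of limited overfitting is an informal heuristic assumed here, not a criterion stated by the authors.

```python
# Loss values at epoch 1 and epoch 200, as reported in the text.
train = {"box": (3.65, 0.98), "cls": (7.24, 0.61), "dfl": (3.79, 1.89)}
val = {"box": (3.62, 1.21), "cls": (6.46, 0.70), "dfl": (4.37, 2.04)}


def reduction(start: float, end: float) -> float:
    """Fraction of the initial loss removed by the end of training."""
    return (start - end) / start


# Relative reduction per loss component for each split.
train_reduction = {k: reduction(*v) for k, v in train.items()}
val_reduction = {k: reduction(*v) for k, v in val.items()}

# Final validation-minus-training gap for each loss component.
final_gap = {k: val[k][1] - train[k][1] for k in train}

for k in train:
    print(f"{k}: train -{train_reduction[k]:.1%}, "
          f"val -{val_reduction[k]:.1%}, final gap {final_gap[k]:.2f}")
```

The classification loss shows the largest relative reduction in both splits, and the final gaps stay within about 0.25 loss units for every component.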

4.5.2. Model Performance Metrics

Corresponding to the loss convergence trends, all major performance indicators of AC-YOLOv11 show continuous improvement throughout the training process (Figure 8) [78]. The values of Precision, Recall, mAP@0.5, and mAP@0.5:0.95 increased from 0.567/0.047/0.166/0.077 at epoch 1 to 0.956/0.932/0.967/0.843 at epoch 200, with a corresponding F1-score of 0.944. During the initial 20 epochs, the metrics improved rapidly, followed by a gradual enhancement phase between epochs 60 and 120, and stabilization after epoch 120. Several transient peaks were observed: Precision reached 0.983 at epoch 144, Recall peaked at 0.937 at epoch 196, while mAP@0.5 and mAP@0.5:0.95 achieved 0.972 and 0.849 at epoch 188, respectively. Considering both performance stability and generalization consistency, the final converged weights were adopted as the default model for subsequent evaluations.
During model training, the F1-score increased significantly with epoch progression (Figure 9). In the early stage (epochs 1–10), the model remained in the feature-learning phase, with F1 values fluctuating between 0.09 and 0.45. Between epochs 10 and 40, rapid convergence occurred, with F1 rising from 0.45 to 0.77. After epoch 60, the curve entered a steady growth phase, surpassing 0.90 after epoch 100 and reaching its maximum of 0.948 at epoch 188.
The evolution trend of F1 closely mirrors that of Precision and Recall, indicating that the model maintains a good balance between accuracy and completeness in detecting ancient city targets. Minor oscillations in the later epochs reflect ongoing optimization on high-confidence samples. Overall, the continuous rise and eventual stabilization of the F1-score confirm that AC-YOLOv11 achieves excellent convergence and generalization on the QHACD.

4.5.3. Validation Set Detection Results

In the QHACD validation set, the AC-YOLOv11 model achieved high-confidence detection of ancient city sites across most regions. The model not only accurately localized key structural elements such as rammed-earth walls, moats, and corner bastions, but also successfully identified multiple cities and their associated architectural units—such as gate enclosures (wengcheng), citadels, and inner–outer city layouts—within a single GF-2 scene (Figure 10).
Across representative sites in the Qinghai Lake Basin (Shinaihai, Jiangxigou, Ganzihekou, and the Quanji River valley), the model produced high-confidence detections whose locations closely matched the archaeological ground truth. Moreover, AC-YOLOv11 discriminated robustly between archaeological remains and environmental or anthropogenic features such as natural landforms, farmlands, and modern settlements. Even in imagery characterized by strong topographic relief, dense vegetation cover, or uneven illumination, the model maintained consistent detection performance, confirming its robustness to blurred boundaries and weak-texture targets.
The average inference time was approximately 6.2 ms per image (excluding I/O), indicating the model’s computational efficiency and suitability for large-scale archaeological site recognition and rapid regional mapping [79].

5. Discussion

5.1. Application and Detection Results of the Model in the Qinghai Lake Basin

After completing model training and validation, the AC-YOLOv11 model was applied to pan-sharpened GF-2 satellite imagery covering the Qinghai Lake Basin to evaluate its applicability and stability for large-scale archaeological site recognition. The imagery has a spatial resolution of 0.8 m and covers an area of approximately 7000 km². With a confidence threshold of 0.75 and an input size of 450 × 450 pixels, the model automatically performed inference on all image tiles across the study area.
A total of 309 potential ancient city targets were detected, including complete rectangular enclosures, degraded rammed-earth walls, moat-like linear features, and corner bastion structures (Figure 11). Following manual screening and archaeological background verification, 74 sites were identified as highly probable ancient city remains. These are primarily distributed along the western shore of Qinghai Lake, the Buha River Basin, and the Qiabuqia–Daotang River corridor (Figure 12).
Overall, the AC-YOLOv11 model demonstrated stable performance in complex surface environments. It exhibited strong capability in extracting high-response morphological features such as walls and moats, while maintaining high detection confidence even in areas with surface fragmentation, dense vegetation cover, or modern construction disturbance. Some detection results further revealed the presence of multiple adjacent settlement units—including main fortresses, subsidiary gate enclosures, and satellite settlements—within a single detection window, suggesting the model’s potential for hierarchical settlement system identification in archaeological landscape analysis [80].
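The tiled inference described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' pipeline: the helpers `tile_windows` and `keep_confident`, the `overlap` parameter, and the detection tuple layout `(x0, y0, x1, y1, confidence)` are all hypothetical. Only the tile size (450 pixels), the ground sampling distance (0.8 m), the confidence threshold (0.75), the survey area (about 7000 km²) and the per-image inference time (about 6.2 ms) come from the text.

```python
import math
from typing import Iterator, List, Tuple

TILE = 450             # tile edge in pixels (from the text)
GSD_M = 0.8            # ground sampling distance in metres (from the text)
CONF_THRESHOLD = 0.75  # confidence threshold (from the text)

Window = Tuple[int, int, int, int]
Detection = Tuple[float, float, float, float, float]  # x0, y0, x1, y1, conf


def tile_windows(width: int, height: int, tile: int = TILE,
                 overlap: int = 0) -> Iterator[Window]:
    """Yield (x0, y0, x1, y1) pixel windows covering a width x height raster.

    `overlap` is a hypothetical parameter: the paper does not state whether
    adjacent tiles overlap. Edge windows are clipped to the raster bounds.
    """
    step = tile - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    for y0 in range(0, height, step):
        for x0 in range(0, width, step):
            yield x0, y0, min(x0 + tile, width), min(y0 + tile, height)


def keep_confident(detections: List[Detection],
                   threshold: float = CONF_THRESHOLD) -> List[Detection]:
    """Keep detections whose confidence (last element) meets the threshold."""
    return [d for d in detections if d[-1] >= threshold]


# Back-of-envelope workload for the study area, assuming non-overlapping tiles.
tile_area_km2 = (TILE * GSD_M / 1000) ** 2    # each tile spans 0.36 km x 0.36 km
n_tiles = math.ceil(7000 / tile_area_km2)     # tiles needed for ~7000 km^2
est_seconds = n_tiles * 6.2e-3                # at ~6.2 ms per tile, excluding I/O

print(f"{n_tiles} tiles, about {est_seconds / 60:.1f} minutes of inference")
```

Under these assumptions the basin corresponds to roughly 54,000 tiles and under six minutes of pure inference time, so I/O and preprocessing, rather than the network itself, would likely dominate a full survey run.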

5.2. Field Survey and Verification

To validate the accuracy and archaeological relevance of the AC-YOLOv11 detection results, eight high-confidence suspected ancient city sites were selected across the Qinghai Lake Basin for on-site field surveys and artifact sampling. Field investigations included documentation of surface features, collection of pottery and porcelain fragments, geomorphological observations, and comparative analysis with known archaeological sites and historical records. The purpose of the field verification was to evaluate the model’s practical reliability under different landscape conditions and to clarify how well its detections correspond to genuine archaeological features.
Based on the fieldwork and cross-validation with archaeological data, three of the eight locations were confirmed as ancient city remains, one was identified as a peripheral settlement associated with Fusi City, and four sites lacked sufficient surface evidence to be classified as ancient cities (Table 5 and Table 6). Among them, Site 3, southwest of Fusi City, contained rammed-earth sections and sediment accumulations but no decorated gray pottery; it was therefore interpreted as a peripheral settlement of Fusi City. Site 4 preserved low rammed-earth platforms and wall foundations, yielding cord-marked and net-patterned gray pottery identical to finds from the inner Fusi City, confirming its identity as the southeastern segment of Fusi City’s outer-east wall. Site 8, located north of Dalai Mani II City, showed no exposed cultural layers or surface artifacts; however, satellite imagery revealed a clear square-enclosure pattern, and combined with epigraphic clues, the site is tentatively attributed to the Northern and Southern Dynasties period and provisionally named Dalai Mani III and IV Cities [81].
Analysis of the results shows that the false-positive cases fall primarily into three categories: (1) natural terraces and fan surfaces with sharp slope breaks or rectilinear erosion edges, which resemble degraded rammed-earth walls in GF-2 imagery; (2) agricultural grids, whose regular rectangular plots and high-contrast boundaries mimic the planimetric patterns of ancient enclosures; and (3) abandoned construction sites or industrial platforms, which exhibit artificial right-angled edges and compacted surfaces similar to archaeological ramparts [82].
These misidentifications reveal how AC-YOLOv11 internally prioritizes morphological cues. Visual inspection shows that the model responds strongly to continuous linear edges, corner angles, and banded micro-relief—diagnostic traits of ancient fortified structures. When similar geometric features occur in natural or modern contexts, the network may generalize its learned representations and assign high confidence to non-archaeological targets. This not only explains the specific cases of misclassification but also demonstrates that the model is extracting meaningful archaeological morphology rather than relying on spurious texture patterns.
Overall, the field verification confirms that AC-YOLOv11 is capable of detecting archaeologically significant remains while maintaining robustness in complex geomorphic environments. At the same time, the error analysis highlights the need to further enrich the negative-sample set—especially with terraces, agricultural grids, and modern infrastructures—to enhance the model’s discriminative ability in future iterations.

5.3. Spatial Pattern of Ancient City Distribution and Its Relationship with the Physical Environment

The spatial analysis, which combines 37 confirmed ancient cities with 74 highly probable model-identified sites (111 in total), reveals a clear coupling between settlement distribution and hydro-geomorphic conditions in the Qinghai Lake Basin [83]. About three-quarters of all sites (75.7%) are located within 10 km of major rivers or the lakeshore, and more than 90% fall within the 3500–4000 m elevation belt with gentle slopes below 2° (Figure 13, Figure 14 and Figure 15). This concentration along low-slope lacustrine plains, river terraces and piedmont fans indicates that ancient polities deliberately selected locations that balanced access to water resources, arable land and pastures with the need for defensive visibility and transport connectivity [84,85].
These patterns echo previous archaeological and palaeo-environmental studies that emphasize the role of the Qinghai Lake Basin as a key agro-pastoral transition zone and a strategic hub along the “Qinghai Road” of the Silk Road [86,87]. The clustering of cities along river corridors such as the Buha River and the Qiabuqia–Daotang River corridor suggests the existence of a corridor-like settlement system linking the interior plateau with the Hexi Corridor and the Central Plains [88]. Against the background of fluctuating lake levels and Holocene climate variability, the observed distribution thus reflects long-term human adaptation to water availability, terrain constraints and inter-regional exchange demands in a high-altitude arid environment [89,90].
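The proximity and elevation statistics discussed in this subsection can be approximated with a simple nearest-distance calculation. The sketch below is illustrative only: the paper does not describe its GIS workflow, the helper names are hypothetical, and the coordinates and elevations are invented toy values, not real site locations. Distances are measured to sampled vertices of a river or shoreline, which approximates the distance to the line itself only when the vertices are densely spaced.

```python
import math
from typing import Sequence, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_distance_km(site: LatLon, water: Sequence[LatLon]) -> float:
    """Distance from a site to the closest sampled river/shoreline vertex."""
    return min(haversine_km(site, v) for v in water)


def share_within(sites: Sequence[LatLon], water: Sequence[LatLon],
                 max_km: float = 10.0) -> float:
    """Fraction of sites lying within `max_km` of any water vertex."""
    hits = sum(nearest_distance_km(s, water) <= max_km for s in sites)
    return hits / len(sites)


def share_in_belt(elevations_m: Sequence[float],
                  low: float = 3500.0, high: float = 4000.0) -> float:
    """Fraction of sites whose elevation falls inside [low, high] metres."""
    return sum(low <= e <= high for e in elevations_m) / len(elevations_m)


# Toy inputs (hypothetical, for illustration only).
toy_water = [(36.90, 100.20)]
toy_sites = [(36.90, 100.20), (36.95, 100.20), (37.10, 100.20)]
toy_elevations = [3600.0, 3800.0, 4200.0]

print(f"within 10 km: {share_within(toy_sites, toy_water):.1%}")
print(f"in 3500-4000 m belt: {share_in_belt(toy_elevations):.1%}")
```

For a real analysis, projected coordinates and true point-to-line distances from a GIS library would be preferable to the vertex-based approximation used here.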

6. Conclusions

This study proposes AC-YOLOv11, a deep learning framework specifically designed for the automatic detection of ancient city sites in high-resolution satellite imagery on the northeastern Tibetan Plateau. By integrating a ResNet-152 backbone with squeeze-and-excitation attention, the Dual-Path Excitation Block, and deformable feature calibration, the model effectively enhances the recognition of eroded rectangular enclosures, linear wall segments, and weak geomorphic traces characteristic of archaeological remains in this region. Experimental results demonstrate that AC-YOLOv11 consistently outperforms YOLOv3, YOLOv4, YOLOv7, YOLOv11 and EfficientDet-D3 in both accuracy and robustness, while maintaining an efficient inference time suitable for large-area archaeological prospection.
Application of the model to the Qinghai Lake Basin generated 309 candidate detections, among which 74 were identified as highly probable ancient city remains. Field validation at eight selected locations confirmed three previously undocumented sites, highlighting the method’s potential to substantially expand archaeological inventories in areas where traditional survey coverage is limited. The spatial analysis further reveals that ancient cities cluster along low-slope lacustrine plains and river corridors within the 3500–4000 m elevation belt, offering new insights into high-altitude settlement strategies and human–environment interactions on the northeastern Tibetan Plateau.
Despite its advantages, the framework remains constrained by the limited number of positive samples and the reliance on high-resolution commercial data. Future work will focus on expanding training datasets through multi-regional collaboration, incorporating multi-temporal or multi-sensor imagery, and exploring semi-supervised learning strategies to enhance adaptability and reduce annotation requirements. Nevertheless, the present results demonstrate that AC-YOLOv11 provides a practical, interpretable and scalable approach for archaeological remote sensing, offering meaningful support for cultural heritage assessment, site monitoring and early-stage prospection in complex plateau environments.

Author Contributions

Conceptualization, X.S. and G.H.; methodology, X.S.; software, X.S.; validation, X.S. and G.H.; formal analysis, X.S.; investigation, X.S.; resources, G.H.; data curation, X.S.; writing—original draft preparation, X.S.; writing—review and editing, G.H.; visualization, X.S.; supervision, G.H.; project administration, G.H.; funding acquisition, G.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (NSFC), Grant No. 42571192, entitled “Lithic technology and its implications for human dispersal and ecological adaptation in the Qaidam Basin since the Late Pleistocene”.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. A portion of the source code and sample data used for model development are openly accessible on GitHub at: https://github.com/zxht0878/AC-YOLOv11-QinghaiLake (accessed on 7 December 2025). Due to restrictions related to cultural heritage protection and data licensing, full-resolution GF-2 satellite imagery and precise archaeological site coordinates are not publicly available. Only representative image samples and model outputs are provided for academic reproducibility and methodological reference.

Acknowledgments

The authors would like to express their sincere gratitude to Fubo Wang (College of Computer Science, Qinghai Normal University) for his valuable guidance on model design and algorithm optimization. We also thank Ying Sang and Tuzheng Chen, students at Qinghai Normal University, for their assistance during the field investigations, and Hongfei Shi, student, for his helpful advice and support in cartographic visualization.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
Abbreviation    Full Term
AC-SENet        Ancient City Squeeze-and-Excitation Network
AC-YOLOv11      Ancient City You Only Look Once Version 11
AI              Artificial Intelligence
CNN             Convolutional Neural Network
DEM             Digital Elevation Model
DPEB            Dual-Path Excitation Block
DPAB            Dual-Path Attention Block
FC              Fully Connected Layer
FPN             Feature Pyramid Network
GAP             Global Average Pooling
GF-2            Gaofen-2 Satellite
GIS             Geographic Information System
LiDAR           Light Detection and Ranging
mAP             Mean Average Precision
OBB             Oriented Bounding Box
PANet           Path Aggregation Network
QHACD           Qinghai Lake Ancient City Dataset
ReLU            Rectified Linear Unit
ResNet          Residual Network
SAR             Synthetic Aperture Radar
SE              Squeeze-and-Excitation
UAV             Unmanned Aerial Vehicle

References

  1. Cui, C.; Qi, X.; Ouzhu, L.; Wu, T.; Su, B. Prehistoric human settlement history on the Qinghai-Tibet Plateau and the adaptation mechanism of Tibetan population to the plateau hypoxic environment. Plateau Sci. Res. 2017, 1, 76–82.
  2. Orengo, H.A.; Garcia-Molsosa, A. A Brave New World for Archaeological Survey: Automated Machine Learning-Based Potsherd Detection Using High-Resolution Drone Imagery. J. Archaeol. Sci. 2019, 112, 105013.
  3. Li, J.; Li, Z.; Shan, P.; Lyu, X.; Tian, Y.; Du, C.; Wang, Y.; Zhao, Y. Adaptive Deformable Convolutional Neural Network Framework for Depression-Related Behavioral Analysis in Mice. Eng. Appl. Artif. Intell. 2025, 159, 111632.
  4. Mitra, B.; Craswell, N. An Introduction to Neural Information Retrieval. Found. Trends Inf. Retr. 2018, 13, 1–126.
  5. Chen, F.; Lasaponara, R.; Masini, N. An Overview of Satellite Synthetic Aperture Radar Remote Sensing in Archaeology: From Site Detection to Monitoring. J. Cult. Herit. 2015, 23, 5–11.
  6. Tapete, D.; Cigna, F. Detection of Archaeological Looting from Space: Methods, Achievements and Challenges. Remote Sens. 2019, 11, 2389.
  7. Licata, M. Good Practice in Archaeological Diagnostics, Non-Invasive Survey of Complex Archaeological Sites; Corsi, C., Slapšak, B., Vermeulen, F., Eds.; Springer International Publishing: Cham, Switzerland, 2016; Volume 2, pp. 1–2.
  8. Luo, L.; Wang, X.; Guo, H. Transitioning from Remote Sensing Archaeology to Space Archaeology: Towards a Paradigm Shift. Remote Sens. Environ. 2024, 308, 114200.
  9. Maktav, D.; Crow, J.; Kolay, C.; Yegen, B.; Onoz, B.; Sunar, F.; Coskun, G.; Karadogan, H.; Cakan, M.; Akar, I.; et al. Integration of Remote Sensing and GIS for Archaeological Investigations. Int. J. Remote Sens. 2009, 30, 1663–1673.
  10. Wang, R.; Pang, J.; Han, X.; Xiang, M.; Ning, X. Automated Magnetocardiography Classification Using a Deformable Convolutional Block Attention Module. Biomed. Signal Process. Control 2025, 105, 107602.
  11. Wang, J.; Feng, S.; Cheng, Y. A review of lightweight neural network architectures for deep learning. Comput. Eng. 2021, 47, 1–13.
  12. Zeynali, R.; Mandanici, E.; Bitelli, G. A Technical Note on AI-Driven Archaeological Object Detection in Airborne LiDAR Derivative Data, with CNN as the Leading Technique. Remote Sens. 2025, 17, 2733.
  13. Boone, L.W. Aerial and Satellite Remote Sensing for Aboriginal Archaeology: Past, Present and Future. Aust. Archaeol. 2024, 90, 61–63.
  14. Chase, A.F.; Chase, D.Z.; Weishampel, J.F.; Drake, J.B.; Shrestha, R.L.; Slatton, K.C.; Awe, J.J.; Carter, W.E. Airborne LiDAR, Archaeology, and the Ancient Maya Landscape at Caracol, Belize. J. Archaeol. Sci. 2010, 38, 387–398.
  15. Verschoof-van der Vaart, W.B.; Karsten, L. Applying Automated Object Detection in Archaeological Practice: A Case Study from the Southern Netherlands. Archaeol. Prospect. 2021, 29, 15–31.
  16. Soroush, M.; Mehrtash, A.; Khazraee, E.; Ur, J.A. Deep Learning in Archaeological Remote Sensing: Automated Qanat Detection in the Kurdistan Region of Iraq. Remote Sens. 2020, 12, 500.
  17. Sun, E.; Cui, Y.; Liu, P.; Yan, J. A Decade of Deep Learning for Remote Sensing Spatiotemporal Fusion: Advances, Challenges, and Opportunities. arXiv 2025, arXiv:2504.00901.
  18. Jesse, C. Global-Scale Archaeological Prospection Using CORONA Satellite Imagery: Automated, Crowd-Sourced, and Expert-Led Approaches. J. Field Archaeol. 2020, 45, S89–S100.
  19. Suh, J.W.; Anderson, E.; Ouimet, W.; Johnson, K.M.; Witharana, C. Mapping Relict Charcoal Hearths in New England Using Deep Convolutional Neural Networks and LiDAR Data. Remote Sens. 2021, 13, 4630.
  20. Cheng, G.; Han, J.; Lu, X. Remote Sensing Image Scene Classification: Benchmark and State of the Art. Proc. IEEE 2017, 105, 1865–1883.
  21. Wang, N.; Zhi, M. A review of research on single-stage general object detection algorithms under deep learning. Comput. Sci. Explor. 2025, 19, 1115–1140.
  22. Ma, S.; Chun, Q.; Zhang, C.; Li, D.; Zhai, F.; Yuan, Y. Automatic Damage Detection and Localization of Ancient City Walls—A Case Study of the Great Wall. npj Herit. Sci. 2025, 13, 174.
  23. Pierdicca, R.; Paolanti, M.; Matrone, F.; Martini, M.; Morbidoni, C.; Malinverni, E.S.; Frontoni, E.; Lingua, A.M. Point Cloud Semantic Segmentation Using a Deep Learning Framework for Cultural Heritage. Remote Sens. 2020, 12, 1005.
  24. Stott, D.; Kristiansen, S.M.; Sindbæk, S.M. Searching for Viking Age Fortresses with Automatic Landscape Classification and Feature Detection. Remote Sens. 2019, 11, 1881.
  25. Marek, B.; Miroslav, J.; Milan, K.; Tibor, L.; Peter, S.; Tomáš, T. Semantic Segmentation of Airborne LiDAR Data in Maya Archaeology. Remote Sens. 2020, 12, 3685.
  26. Tapete, D.; Cigna, F. Trends and Perspectives of Space-Borne SAR Remote Sensing for Archaeological Landscape and Cultural Heritage Applications. J. Archaeol. Sci. Rep. 2016, 14, 716–726.
  27. Alexandre, G.; Marc, L.; Thierry, L.; Laurence, H.M. Combined Detection and Segmentation of Archeological Structures from LiDAR Data Using a Deep Learning Approach. J. Comput. Appl. Archaeol. 2021, 4, 1.
  28. Mithen, S. Understanding Early Civilizations. J. R. Anthropol. Inst. 2006, 12, 683–685.
  29. Ma, J.; Ren, H.L.; Mao, X.; Liu, M.; Wang, T.; Ma, X. Spatiotemporal Evolution Disparities of Vegetation Trends over the Tibetan Plateau under Climate Change. Remote Sens. 2024, 16, 2585.
  30. Chen, F.; Xia, H.; Gao, Y.; Zhang, D.; Yang, X.; Dong, G. The process and stages of prehistoric human exploration, adaptation and settlement on the Qinghai-Tibet Plateau. Geogr. Sci. 2022, 42, 1–14.
  31. Hou, G.L.; Wei, H.C.; E, C.Y.; Sun, Y.J. Holocene human activities and environmental changes on the northeastern margin of the Qinghai-Tibet Plateau—A case study of the Jiangxigou No. 2 site at Qinghai Lake. Acta Geogr. Sin. 2013, 68, 380–388.
  32. Lü, H. New advances in prehistoric archaeology on the Qinghai-Tibet Plateau in the new era. China Tibetol. 2023, 3, 1–9.
  33. Michal, B. Along the Silk Roads in Mongol Eurasia: Generals, Merchants, and Intellectuals; University of California Press: Oakland, CA, USA, 2020.
  34. He, Y.; Li, Z.; Yang, X.; Jia, W.; He, X.; Song, B.; Zhang, N.; Liu, Q. Changes of the Hailuogou Glacier, Mt. Gongga, China, against the Background of Global Warming in the Last Several Decades. J. China Univ. Geosci. 2008, 19, 271–281.
  35. Liao, J.; Gao, L.; Wang, X. Numerical Simulation Forecasting of Water Level for Qinghai Lake Using Multi-Altimeter Data Between 2002 and 2012. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 609–622.
  36. Shang, Y.; Lu, R.; Jia, F.; Tian, L.; Tang, Q.; Chen, Y.; Zhao, C.; Wu, W. Paleoclimatic Evolution Indicated by Major Geochemical Elements from Aeolian Sediments on the East of Qinghai Lake. Sci. Cold Arid Reg. 2013, 5, 301–308.
  37. Wang, L.; Lu, R.; Ding, Z.; Bai, M. Holocene Aeolian Activity in the Ganzihe Sandy Land, Qinghai Lake Basin. Quat. Int. 2021, 598, 56–65.
  38. Berganzo-Besga, I.; Orengo, H.A.; Lumbreras, F.; Carrero-Pazos, M.; Fonte, J.; Vilas-Estévez, B. Hybrid MSRM-Based Deep Learning and Multitemporal Sentinel 2-Based Machine Learning Algorithm Detects Near 10k Archaeological Tumuli in North-Western Iberia. Remote Sens. 2021, 13, 4181.
  39. Chen, F.; Ding, L.; Piao, S.; Zhou, T.; Xu, B.; Yao, T.; Li, X. The Tibetan Plateau as the Engine for Asian Environmental Change: The Tibetan Plateau Earth System Research into a New Era. Sci. Bull. 2021, 66, 1263–1266.
  40. Smith, M.E. Form and Meaning in the Earliest Cities: A New Approach to Ancient Urban Planning. J. Plan. Hist. 2007, 6, 3–47.
  41. Li, S.; Li, C.; Kang, X. Current Status and Future Prospect of Multi-Source Remote Sensing Image Fusion. J. Remote Sens. 2021, 25, 148–166.
  42. Zhu, X.; Hu, H.; Lin, S.; Dai, J. Deformable ConvNets v2: More Deformable, Better Results. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019.
  43. Garima, P.; Umesh, G. Single Image Super-Resolution Using Multi-Scale Feature Enhancement Attention Residual Network. Optik 2021, 231, 166359.
  44. Thomas, C.; Ranchin, T.; Wald, L.; Chanussot, J. Synthesis of Multispectral Images to High Spatial Resolution: A Critical Review of Fusion Methods Based on Remote Sensing Physics. IEEE Trans. Geosci. Remote Sens. 2008, 46, 1301–1312.
  45. Zhao, L.; Zhang, J.; Yang, H.; Xiao, C.; Wei, Y. A Multi-Branch Deep Learning Network for Crop Classification Based on GF-2 Remote Sensing. Remote Sens. 2025, 17, 2852.
  46. Shorten, C.; Khoshgoftaar, T.M. A Survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60.
  47. Sreejam, M.; Agilandeeswari, L. Deep Multimodal Unmixing of Hyperspectral Images Using Convolutional Block Attention Module (CBAM) and LiDAR Features. Egypt. J. Remote Sens. Space Sci. 2025, 28, 666–680.
  48. Wang, S.; Zhao, M.; Dou, R.; Yu, S.; Liu, L.; Wu, N. A Compact High-Quality Image Demosaicking Neural Network for Edge-Computing Devices. Sensors 2021, 21, 3265.
  49. Pham, L.; Thanh, T.H.T.; Tran, D.M.N.; Le, B. Graph Data Augmentation Using Multi-Label Mixup. Knowl. Inf. Syst. 2025, 67, 8751–8766.
  50. Gu, Z.; Gao, Y.; Liu, X. Position-Robust Optronic Convolutional Neural Networks Dealing with Images Position Variation. Opt. Commun. 2022, 505, 127505.
  51. Wang, N.; Gao, Y.; Zhu, X.; Ren, W.; Zou, B.; Fang, T. Anchor-Free Multiscale Dilated Attention for UAV-Based Foreign Object Detection along Railroad Lines. J. Struct. Des. Constr. Pract. 2025, 31, 04025121.
  52. Schmitt, M.; Hänsch, R. Deep Learning for Synthetic Aperture Radar Remote Sensing; Elsevier Inc.: Amsterdam, The Netherlands, 2025; ISBN 978-0-443-36344-3.
  53. Muhammad, S.; Gu, Z. Deep Residual Learning for Image Recognition: A Survey. Appl. Sci. 2022, 12, 8972.
  54. Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019.
  55. Yuan, B.; Du, Y.; Xie, Z.; Chen, S. Squeeze-and-Excitation Networks and the Improved Informer Model for Bearing Fault Diagnosis. Algorithms 2025, 18, 700.
  56. Schumacher, T.; Pugacheva, P.; Allam, H.; Pinero, A.R.; Maier, B.; Rupfle, J.; Helal, K.; Popovych, O.; Hamza, A.G.; Sholqamy, M.; et al. Confirmation of the ScanPyramids North Face Corridor in the Great Pyramid of Giza Using Multi-Modal Image Fusion from Three Non-Destructive Testing Techniques. Sci. Rep. 2025, 15, 9275.
  57. Fu, Y.; Peng, H.; Zhao, T.; Li, Y.; Peng, J.; Zhang, D. Lightweight Remote Sensing Change Detection with Progressive Multi Scale Difference Aggregation. Sci. Rep. 2025, 15, 30203.
  58. Liu, X.; Liu, B. EGFE-Net: An Edge-Guided and Feature Elimination Network for Small Object Detection. Expert Syst. Appl. 2025, 299, 129989.
  59. Elsharkawy, Z.F. Enhanced YOLOv11 Framework for High Precision Defect Detection in Printed Circuit Boards. Sci. Rep. 2025, 15, 42550.
  60. Mekruksavanich, S.; Jitpattanakul, A. Efficient and Explainable Human Activity Recognition Using Deep Residual Network with Squeeze-and-Excitation Mechanism. Appl. Syst. Innov. 2025, 8, 57.
  61. Hamed, A.A.; Hossein, A.; Mohammad, A.; Vinay, R. DTM Extraction from DSM Using a Multi-Scale DTM Fusion Strategy Based on Deep Learning. Remote Sens. Environ. 2022, 274, 113014.
  62. Yang, G.Y.; Li, X.L.; Xiao, Z.K.; Mu, T.J.; Martin, R.R.; Hu, S.M. Sampling Equivariant Self-Attention Networks for Object Detection in Aerial Images. IEEE Trans. Image Process. 2023, 32, 6413–6425.
  63. Zhang, Z.; Cui, P.; Zhu, W. Deep Learning on Graphs: A Survey. IEEE Trans. Knowl. Data Eng. 2020, 34, 249–270.
  64. Hu, X.; Yuan, Y. Deep-Learning-Based Classification for DTM Extraction from ALS Point Cloud. Remote Sens. 2016, 8, 730.
  65. Pang, D.; Shan, T.; Ma, Y.; Ma, P.; Hu, T.; Tao, R. LRTA-SP: Low-Rank Tensor Approximation With Saliency Prior for Small Target Detection in Infrared Videos. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 2644–2658.
  66. Gui, L.; Gu, X.; Huang, F.; Ren, S.; Qin, H.; Fan, C. Road Extraction from Remote Sensing Images Using a Skip-Connected Parallel CNN-Transformer Encoder-Decoder Model. Appl. Sci. 2025, 15, 1427.
  67. Zhang, S.; Yuan, Q.; Li, J.; Sun, J.; Zhang, X. Scene-Adaptive Remote Sensing Image Super-Resolution Using a Multiscale Attention Network. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4764–4779.
  68. Dagal, I. AdaMoment: A Unified Adaptive-Momentum Framework for Robust Learning Rate Optimization. Knowl.-Based Syst. 2025, 332, 114739.
  69. Krill, P. PyTorch Team Unveils Framework for Programming Clusters. 2025. Available online: https://www.infoworld.com/article/4077449/pytorch-team-unveils-framework-for-programming-clusters.html (accessed on 7 December 2025).
  70. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019.
  71. Lin, T.-Y.; Maire, M.; Belongie, S.J.; Bourdev, L.D.; Girshick, R.B.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014.
  72. Everingham, M.; Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
  73. Hu, L.; Zhang, Y.; Zhao, Y.; Wu, T.; Li, Y. Micro-YOLO+: Searching Optimal Methods for Compressing Object Detection Model Based on Speed, Size, Cost, and Accuracy. SN Comput. Sci. 2022, 3, 391.
  74. Ding, P.; Li, T.; Qian, H.; Ma, L.; Chen, Z. A Lightweight Real-Time Object Detection Method for Complex Scenes Based on YOLOv4. J. Real-Time Image Process. 2025, 22, 68.
  75. Guo, D.; Wang, Y.; Zhu, S.; Li, X. A Vehicle Detection Method Based on an Improved U-YOLO Network for High-Resolution Remote-Sensing Images. Sustainability 2023, 15, 10397.
  76. Fiorucci, M.; Verschoof-Van Der Vaart, W.B.; Soleni, P.; Le Saux, B.; Traviglia, A. Deep Learning for Archaeological Object Detection on LiDAR: New Evaluation Measures and Insights. Remote Sens. 2022, 14, 1694.
  77. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787.
  78. Song, M.; Li, L.; Zhao, X.; Wang, J. Multi-Objective Parameter Stochastic Optimization Method for Time-Delayed Integration Optical Remote Sensing System Used for Kelvin Wake Imaging. Appl. Sci. 2025, 15, 11307.
  79. Shao, Z.; Lyu, H.; Yin, Y.; Cheng, T.; Gao, X.; Zhang, W.; Jing, Q.; Zhao, Y.; Zhang, L. Multi-Scale Object Detection Model for Autonomous Ship Navigation in Maritime Environment. J. Mar. Sci. Eng. 2022, 10, 1783.
  80. al-Hamdani, A. Protecting and Recording Our Archaeological Heritage in Southern Iraq. Near East. Archaeol. 2008, 71, 221–230.
  81. Ren, X.; Wang, Q.; Li, Y. Bronze Age and Han Dynasty Tombs in Gucheng, Ping’an County, Qinghai Province. Archaeology 2002, 12, 29–37.
  82. Hendrickx, M.; Gheyle, W.; Bonne, J.; Bourgeois, J.; Wulf, A.D.; Goossens, R. The Use of Stereoscopic Images Taken from a Microdrone for the Documentation of Heritage–An Example from the Tuekta Burial Mounds in the Russian Altay. J. Archaeol. Sci. 2011, 38, 2968–2978.
  83. Ran, J.; Liu, Y.; Wangdue, S.; Yang, X.; Wang, T.; Gao, Y.; Cao, P.; Tong, Y.; Dai, Q.; Chen, S.; et al. Ancient Genomes Reveal Basal Asian Ancestries and Dynamic Population Interactions over Time on the Southern Tibetan Plateau. iScience 2025, 28, 113676. [Google Scholar] [CrossRef] [PubMed]
  84. Menze, B.H.; Ur, J.A. Mapping Patterns of Long-Term Settlement in Northern Mesopotamia at a Large Scale. Proc. Natl. Acad. Sci. USA 2012, 109, E778–E787. [Google Scholar] [CrossRef]
  85. Yang, H.; Hu, Q.; Zou, Q.; Ai, M.; Zhao, P.; Wang, S. Predicting Ancient City Sites Using GEE Coupled with Geographic Element Features and Temporal Spectral Features: A Case Study of the Neolithic and Bronze Age of the Jianghan Region, China. npj Herit. Sci. 2025, 13, 11. [Google Scholar] [CrossRef]
  86. Mario, L. The Ancient Near East: History, Society and Economy; Routledge: Oxfordshire, UK, 2013; p. 648. [Google Scholar]
  87. Shelach-Lavi, G. The Archaeology of Early China; Cambridge University Press: Cambridge, UK, 2015. [Google Scholar]
  88. Argounova-Low, T. Landscapes of Movement: Trails, Paths, and Roads in Anthropological Perspective; Snead, J.E., Erickson, C.L., Darling, J.A., Eds.; University of Pennsylvania Press: Philadelphia, PA, USA, 2012; Volume 18, pp. 470–471. [Google Scholar]
  89. Zhong, Q.; Xie, L.; Wu, J. Reimagining Heritage Villages’ Sustainability: Machine Learning-Driven Human Settlement Suitability in Hunan. Humanit. Soc. Sci. Commun. 2025, 12, 661. [Google Scholar] [CrossRef]
  90. Flohr, P.; Bradbury, J.; ten Harkel, L. Tracing the Patterns: Fields, Villages, and Burial Places in Lebanon. Levant 2021, 53, 315–335. [Google Scholar] [CrossRef]
Figure 1. Distribution of ancient walled city sites in the Qinghai Lake Basin. Numbers indicate the locations of major known sites: (1) Yangchang South Ancient City; (2) Yangchang North Ancient City; (3) Hargai Ancient City; (4) Beixiangyang Ancient City; (5) Nanxiangyang Ancient City; (6) Cangkai Ancient City; (7) Xihai Commandery City; (8) Gahai Ancient City; (9) Fusi City; (10) Upper Jiala Ancient City; (11) Lower Jiala Ancient City; (12) Shangtama Ancient City; (13) Shangtama–Xiazhatan Ancient City; (14) Zhengdongba Ancient City; (15) North Zhujianliang Site of Dongba Ancient City; (16) Dongtai Western Han Lankasuo City (Dongba Township); (17) Xitai Gorge-edge Ancient City; (18) Xinsi Back Terrace Ancient City; (19) Shangmeitai Ancient City; (20) Xitai Ancient City (Dongba Brigade); (21) Kecai Ancient City; (22) Zhihai Ancient City; (23) Daoganri Ancient City; (24) Qunke Jiala Ancient City; (25) Western Extension of Qunke Jiala; (26) Eastern Extension of Qunke Jiala; (27) Southeastern Extension of Qunke Jiala; (28) Eastern Architectural Site of Qunke Jiala; (29) Hei (Black) Ancient City; (30) Nahailie Ancient City; (31) General’s Temple Ancient City; (32) Hudong Yangchang Ancient City; (33) Dacang Jianggai Ancient City; (34) Haixinshan Ancient City (Yinglong City); (35) Jinquan Ancient City; (36) Dalai Mani Ancient City No. 1; (37) Dalai Mani Ancient City No. 2.
Figure 2. Examples of positive and negative samples in the Qinghai Lake Ancient City Dataset (QHACD). (a–d) represent positive samples of confirmed ancient city sites: (a) Dachang Jianggai Ancient City, (b) Gala Upper Ancient City, (c) Xihai Jun Ancient City, and (d) Fusi City. (e–h) represent negative samples collected from non-archaeological areas with similar morphological or spectral characteristics, including (e) salt-lake textures, (f) agricultural grids, (g) modern industrial facilities, and (h) contemporary settlements. All samples are derived from 0.8 m GF-2 pan-sharpened imagery and annotated as part of the QHACD to enhance model discrimination between archaeological and modern anthropogenic features. Green bounding boxes indicate confirmed ancient-city targets, while purple bounding boxes denote negative samples.
Figure 3. Illustration of data augmentation strategies for the Qinghai Lake Ancient City Dataset (QHACD). Panels are shown in the same order as the code outputs: (a–e) five examples of geometric/radiometric augmentations on the same tile (random rotation/flip, scale–shift–rotate, random crop, and brightness/contrast/saturation jitter, ±20%); (f) Mosaic augmentation formed by randomly combining four tiles from the study area; (g) MixUp augmentation obtained by linear blending of two independently augmented tiles; (h) the original GF-2 pan-sharpened image (0.8 m). All tiles are 450 m × 450 m (562 × 562 px). These augmentations increase sample diversity and improve model generalization.
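The MixUp panel (g) is described as a linear blend of two tiles. A minimal pure-Python sketch of that blend follows; the function name `mixup`, the mixing coefficient `lam`, and the toy 2 × 2 "tiles" are illustrative assumptions, not values from the paper. For detection, MixUp implementations typically also keep the bounding boxes of both tiles, which is omitted here. The last lines check the stated tile size: 450 m at 0.8 m per pixel gives 562.5 px, which the caption reports as 562 px.

```python
def mixup(tile_a, tile_b, lam):
    """Blend two equally sized single-band tiles: lam * a + (1 - lam) * b."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(tile_a, tile_b)
    ]


# Toy 2 x 2 tiles with hypothetical pixel values.
tile_a = [[100.0, 200.0], [50.0, 0.0]]
tile_b = [[0.0, 100.0], [150.0, 200.0]]
blended = mixup(tile_a, tile_b, lam=0.5)
print(blended)  # [[50.0, 150.0], [100.0, 100.0]]

# Tile edge in pixels: 450 m ground extent at 0.8 m ground sampling distance.
tile_px = 450 / 0.8
print(int(tile_px))  # 562
```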
Figure 4. Simplified structure of the Dual-Path Excitation Block (DPEB) in AC-SENet. The module receives an input feature map $X \in \mathbb{R}^{H \times W \times C}$, performs global average pooling, and processes it through two fully connected layers (FC1, FC2) with ReLU and Sigmoid activations to generate a channel attention weight vector. This vector recalibrates feature responses, enhancing salient structural details such as city walls and corners while suppressing background noise. Different colors represent different functional components of the DPEB, including channel attention, spatial attention, and feature fusion paths.
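The channel path described in this caption (global average pooling, FC1 with ReLU, FC2 with Sigmoid, then rescaling) can be sketched in a few lines. This is a squeeze-and-excitation style illustration under stated assumptions, not the authors' DPEB: bias terms are omitted, the weights `w1` and `w2` are hand-picked toy values, and the spatial attention and fusion paths mentioned in the caption are not modeled.

```python
import math


def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style channel recalibration.

    x  : feature map as a list of C channels, each an H x W list of lists
    w1 : FC1 weights, shape (C_reduced, C)
    w2 : FC2 weights, shape (C, C_reduced)
    """
    # Squeeze: global average pooling gives one scalar per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC1 followed by ReLU.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    # FC2 followed by Sigmoid yields one weight in (0, 1) per channel.
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Recalibration: scale every channel by its attention weight.
    scaled = [[[weights[c] * v for v in row] for row in ch] for c, ch in enumerate(x)]
    return weights, scaled


# Toy input: C = 2 channels of 2 x 2 pixels, reduced to 1 hidden unit.
x = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[1.0, 1.0]]
w2 = [[2.0], [-2.0]]
weights, scaled = channel_attention(x, w1, w2)
print(weights)
```

With these toy weights the first channel receives the larger attention weight, so its responses are preserved more strongly than those of the second channel.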
Figure 5. Structure of the adjustment module in AC-SENet (based on a ResNet-152 residual block). The module fuses three scale-specific feature maps $\{X_1, X_2, X_3\}$ through 1 × 1 convolution and Softmax normalization to generate dynamic weights $W$. The fused feature $X_{fused}$ is then refined by a deformable convolution with learnable offsets $\Delta p$ and merged with the residual input $X_{input}$. This process enhances robustness to erosion-induced boundary degradation and irregular city-wall geometries in GF-2 imagery.
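The Softmax-weighted fusion and residual merge in this caption can be illustrated as follows. The sketch assumes one scalar weight per scale (the paper may use per-pixel or per-channel weights), takes the pre-Softmax scores `logits` as given rather than computing them with a 1 × 1 convolution, and leaves out the deformable convolution entirely. The names `softmax` and `fuse` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable Softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(features, logits, x_input):
    """Weighted sum of same-sized 2-D maps, then a residual add.

    features : list of feature maps X_1..X_3 (each H x W)
    logits   : one score per map; Softmax turns them into weights W
    x_input  : residual input of the same size
    """
    w = softmax(logits)
    rows, cols = len(x_input), len(x_input[0])
    fused = [
        [sum(w[k] * features[k][i][j] for k in range(len(features))) for j in range(cols)]
        for i in range(rows)
    ]
    # The deformable refinement step is omitted; only the residual merge is shown.
    out = [[fused[i][j] + x_input[i][j] for j in range(cols)] for i in range(rows)]
    return w, out


# Toy 1 x 2 maps; equal logits give equal weights of 1/3 each.
features = [[[1.0, 0.0]], [[2.0, 0.0]], [[3.0, 3.0]]]
w, out = fuse(features, logits=[0.0, 0.0, 0.0], x_input=[[10.0, 10.0]])
print(w, out)
```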
Figure 6. Overall architecture of the AC-YOLOv11 model for ancient city detection. The AC-YOLOv11 model integrates the AC-SENet backbone (ResNet-152 with SE and dual-path attention modules) with the YOLOv11 neck and detection head. Multi-scale features (C3–C5) extracted from GF-2 pan-sharpened images are fused via PANet for ancient city identification.
Figure 7. Training and validation loss curves of AC-YOLOv11. (a) Training losses, including box_loss, cls_loss, and dfl_loss; (b) Validation losses, including val/box_loss, val/cls_loss, and val/dfl_loss.
Figure 8. Performance curves of AC-YOLOv11 across 200 training epochs. (a) Precision; (b) Recall; (c) mAP@0.5; (d) mAP@0.5:0.95.
Figure 9. F1-score variation across 200 epochs of training for AC-YOLOv11 on the QHACD.
Figure 10. Visualized detection results of ancient city sites in the QHACD validation set. Panels (a–p) show representative prediction examples randomly selected from the validation set. Blue bounding boxes indicate the detected ancient-city targets, and the numbers displayed beside the boxes represent the corresponding confidence scores predicted by the model.
Figure 11. Model detection examples of typical ancient city sites in the Qinghai Lake Basin. AC-YOLOv11 detection results under different geomorphic and surface conditions: (a) complete rectangular enclosure; (b) degraded rammed-earth remains; (c) vegetated area; (d) main city with subsidiary Wengcheng; (e) partially cropped ancient city; (f) heavily eroded zone; (g) modern disturbance area; (h) simple geomorphic unit. All detections were performed at a confidence threshold of 0.75 (blue bounding boxes), showing robust geometric and contour extraction capability.
Figure 12. Spatial distribution of 74 highly probable ancient city sites detected in the Qinghai Lake Basin.
Figure 13. Spatial relationship between ancient city sites and river systems (10 km buffer) in the Qinghai Lake Basin.
Figure 14. Altitudinal distribution of ancient city sites in the Qinghai Lake Basin.
Figure 15. Spatial relationship between ancient city sites and slope gradients in the Qinghai Lake Basin.
Table 1. Parameter settings for AC-YOLOv11 training.

| Parameter | Setting |
|---|---|
| Epochs | 200 |
| Batch size | 16 |
| Workers | 4 |
| Learning rate | 0.001 |
| Optimizer | Adam |
| Image size (imgsz) | 640 |
| Ratio of training set to validation set | 8:2 |
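The settings in Table 1 can be collected in a small configuration sketch. The key names below (`epochs`, `batch`, `workers`, `lr0`, `optimizer`, `imgsz`) follow Ultralytics-style training arguments, which is an assumption, since the table does not name the training toolkit. The helper `split_dataset`, the seed, and the tile file names are likewise hypothetical; the helper only illustrates an 8:2 random split.

```python
import random

# Values from Table 1; key names are an assumed Ultralytics-style mapping.
TRAIN_CONFIG = {
    "epochs": 200,
    "batch": 16,
    "workers": 4,
    "lr0": 0.001,
    "optimizer": "Adam",
    "imgsz": 640,
}


def split_dataset(items, train_ratio=0.8, seed=0):
    """Shuffle reproducibly and split into training and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical tile names standing in for QHACD samples.
tiles = [f"tile_{i:04d}.tif" for i in range(1000)]
train, val = split_dataset(tiles)
print(len(train), len(val))  # 800 200
```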
Table 2. Evaluation metrics used for model performance assessment.

| Metric | Formula | Definition | Purpose |
|---|---|---|---|
| Loss function (L) | $L = -\sum_{i=1}^{C} y_i \cdot \log(p_i)$ | Measures the error between predicted and ground-truth values across C classes. In this single-class detection task, YOLO combines bounding box regression loss and distribution focal loss. | Evaluates training stability and effectiveness of learning rate scheduling. |
| Precision | $\text{Precision} = \frac{TP}{TP + FP}$ | Proportion of correctly predicted positives among all predicted positives. | Reflects the accuracy of positive predictions. |
| Recall | $\text{Recall} = \frac{TP}{TP + FN}$ | Proportion of actual positives correctly identified by the model. | Reflects the completeness of positive detection. |
| Mean Average Precision (mAP) | $\text{mAP} = \frac{1}{N} \sum_{i=1}^{N} AP_i$ | The mean of AP values across all N classes. Since N = 1, mAP equals AP for the “Ancient City” class. Includes $\text{mAP}_{50}$ (IoU = 0.5) and $\text{mAP}_{50-95}$ (averaged across IoU thresholds from 0.5 to 0.95). | Provides a comprehensive measure of detection precision and localization accuracy. |
| F1 Score | $F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$ | Harmonic mean of precision and recall. | Provides a balanced evaluation when precision and recall are uneven. |
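As a worked check of the formulas in Table 2, the sketch below computes precision, recall, F1, and box IoU in plain Python. The function names, the `(x1, y1, x2, y2)` box format, and the toy counts are assumptions for illustration. The final lines recompute F1 from the precision and recall reported for AC-YOLOv11 in Table 3.

```python
def precision(tp, fp):
    """TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """TP / (TP + FN)."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


# Toy counts (hypothetical): 9 true positives, 1 false positive, 3 misses.
p, r = precision(9, 1), recall(9, 3)
print(p, r)  # 0.9 0.75

# Reported AC-YOLOv11 values: precision 97.5 %, recall 91.2 %.
f1_reported = f1_score(97.5, 91.2)
print(round(f1_reported, 1))  # 94.2
```

A detection counts as a true positive at mAP@0.5 when its IoU with a ground-truth box is at least 0.5; for example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` is 1/7, which would not qualify.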
Table 3. Ablation experiment results of AC-YOLOv11 on the QHACD.

| ID | Model Variant | Precision (%) | Recall (%) | F1-Score (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Inference Speed (ms) |
|---|---|---|---|---|---|---|---|
| 1 | YOLOv11 (Original Backbone) | 88.6 | 83.1 | 85.7 | 78.5 | 70.4 | 5.1–5.3 |
| 2 | + ResNet-152 Backbone | 90.3 | 86.7 | 88.4 | 80.9 | 72.6 | 5.4–5.6 |
| 3 | + ResNet-152 + SE Attention | 94.5 | 89.2 | 91.7 | 81.8 | 73.4 | 5.7–5.9 |
| 4 | + ResNet-152 + DPEB (Ours) | 97.5 | 91.2 | 94.2 | 82.3 | 74.9 | 6.2 |
Table 4. Comparison of detection performance among different models on the QHACD.

| Model | Precision (%) | Recall (%) | F1-Score (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Inference Speed (ms) |
|---|---|---|---|---|---|---|
| YOLOv3 | 89.1 | 84.3 | 86.6 | 78.9 | 70.5 | 7.8 |
| YOLOv4 | 91.5 | 86.9 | 89.1 | 81.2 | 72.4 | 8.5 |
| YOLOv7 | 92.3 | 88.1 | 90.1 | 82 | 73.5 | 5 |
| EfficientDet-D3 | 90.2 | 85.4 | 87.7 | 80.1 | 72 | 12 |
| AC-YOLOv11 (Ours) | 97.5 | 91.2 | 94.2 | 82.3 | 74.9 | 6.2 |
Table 5. Field verification results of suspected ancient city sites detected by the AC-YOLOv11 model.

| ID | Coordinates (Lat, Long) | Distance to Known Site | Surface Remains | Surface Artifacts | Geomorphic Setting | Final Interpretation |
|---|---|---|---|---|---|---|
| 1 | 37°02′57.23″N, 99°31′47.39″E | ~5.2 km from Fusi City | Rammed-earth wall segments and low stone foundations; irregular plan | Painted porcelain sherds | Piedmont alluvial-fan terrace; gentle slope | Non-ancient city (modern disturbance) |
| 2 | 37°02′44.10″N, 99°32′15.70″E | ~4.4 km from Fusi City | Fragmented rammed-earth wall bases with fissures and voids | Red sandy pottery and glazed ware | Piedmont alluvial-fan terrace | Non-ancient city |
| 3 | 37°01′25.49″N, 99°34′26.08″E | ~0.7 km SW of Fusi City | Low mounds and exposed rammed layers | Dark-red sandy pottery, brown-green glaze | Second terrace on north bank of Qieji River | Peripheral settlement of Fusi City |
| 4 | 37°01′30.9″N, 99°36′14.5″E | ~1.2 km SE of Fusi City | Rammed-earth platforms and low wall remains | Decorated gray pottery with cord and grid patterns | Terrace of the Qieji River; flat terrain | Outer city of Fusi (east wall, southern section) |
| 5 | 36°52′15.00″N, 100°46′12.87″E | ~20 km from Xihai County City | Rammed-earth wall (height ~1 m) | Red sandy and brown-glazed pottery | Terminal fan terrace | Non-ancient city |
| 6 | 37°10′40.86″N, 99°44′10.32″E | ~16 km from Beixiangyang City | Elongated mound with residual rammed base | Dark-red sandy pottery, brown-green glaze | Junction of lake terrace and piedmont fan | Non-ancient city |
| 7 | 37°22′17.45″N, 98°48′55.93″E | ~4.5 km from Jinquan City | Earthen mounds with traces of compaction | Gray and glazed pottery, red sandy ware | Terrace near Buha River | Non-ancient city |
| 8 | 37°28′16.42″N, 98°36′8.20″E; 37°28′10.41″N, 98°35′56.38″E | ~200 m and 800 m from Dalai-Mane No. 2 City | No visible rammed-earth remains; covered by grass | No artifacts found; no cultural layer | Second terrace on north bank of Buha River | Ancient city remains (Dalai-Mane No. 3 and No. 4) |
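Distances such as those in the "Distance to Known Site" column, or the 10 km river buffer used in the spatial analysis, can be approximated from degree-minute-second coordinates with the haversine formula. The sketch below is an illustration only: the paper does not state how its distances were measured, the spherical Earth radius of 6,371 km is an assumption, and the function names are hypothetical. It uses the two coordinate pairs listed for ID 8.

```python
import math


def dms_to_decimal(degrees, minutes, seconds):
    """Convert degrees, minutes, seconds to decimal degrees (N/E positive)."""
    return degrees + minutes / 60.0 + seconds / 3600.0


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres on a sphere of the given radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# The two points listed for ID 8 (north latitude, east longitude).
p1 = (dms_to_decimal(37, 28, 16.42), dms_to_decimal(98, 36, 8.20))
p2 = (dms_to_decimal(37, 28, 10.41), dms_to_decimal(98, 35, 56.38))

separation_m = haversine_m(p1[0], p1[1], p2[0], p2[1])
print(f"separation between the two ID 8 points: {separation_m:.0f} m")

# The same helper supports a simple buffer test, e.g. a 10 km threshold.
within_10_km = separation_m <= 10_000.0
```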
Table 6. Comparison between model-detected sites and field verification results in the Qinghai Lake Basin. Columns: ID; Model Output; Field (Overview); Field (Detail); Artifacts. Rows 1–8 contain image panels only.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Shi, X.; Hou, G. AC-YOLOv11: A Deep Learning Framework for Automatic Detection of Ancient City Sites in the Northeastern Tibetan Plateau. Remote Sens. 2025, 17, 3997. https://doi.org/10.3390/rs17243997
